Updated Train-kernel-parameters example #430
Conversation
Merge of …iaGaussianProcesses/KernelFunctions.jl into st/examples--train-kernel-parameters, bringing in the following commits from master:

- f9bbd84 (st--, 2022-01-28): make nystrom work with AbstractVector (#427)
- d1c68a9 (st--, 2022-01-13): fix Distances compat (#423)
- 93d33c2 (st--, 2022-01-12): fix figure & cleanup (#422)
- 40cb59e (github-actions[bot], 2022-01-12): CompatHelper: bump compat for Kronecker to 0.5 for package docs, (keep existing compat) (#367)
- 7204529 (github-actions[bot], 2022-01-11): CompatHelper: bump compat for Kronecker to 0.5 for package test, (keep existing compat) (#366)
- 924925d (st--, 2022-01-11): switch SVM example to half-moon dataset (#421)
- 992b665 (github-actions[bot], 2021-12-24): CompatHelper: bump compat for SpecialFunctions to 2 for package test, (keep existing compat) (#412)
- 04fa7f7 (github-actions[bot], 2021-12-23): CompatHelper: bump compat for SpecialFunctions to 2, (keep existing compat) (#411)
- c0fc3e1 (github-actions[bot], 2021-12-23): CompatHelper: add new compat entry for Compat at version 3 for package test, (keep existing compat) (#418)
- 05fe340 (st--, 2021-12-21): use only() instead of first() (#403)
- 2d17212 (st--, 2021-12-18): Zygote AD failure workarounds & test cleanup (#414)
- 3c49949 (Théo Galy-Fajou, 2021-11-24): Fix typo in valid_inputs error (#408)
- 9955044 (st--, 2021-11-24): Fix for Zygote 0.6.30 breaking our tests (#409)
- 33d64d1 (Théo Galy-Fajou, 2021-11-04): Add benchmarking CI (#399)
- 360ce10 (David Widmann, 2021-11-02): Update docstring of `GibbsKernel` (#395)
Codecov Report
```
@@           Coverage Diff           @@
##           master     #430   +/-  ##
=======================================
  Coverage   93.13%   93.13%
=======================================
  Files          52       52
  Lines        1252     1252
=======================================
  Hits         1166     1166
  Misses         86       86
```

Continue to review full report at Codecov.
I didn't build the documentation locally; is there a way to preview from here?
There's a preview for PRs that are not from forks; the link to the deployed docs will show up among the GitHub Actions (see e.g. #429). However, there's currently no link since the documentation is not building successfully.
The problem is that BenchmarkTools is not installed: https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/runs/4983109277?check_suite_focus=true#step:5:31

Is it actually necessary to perform benchmarks?
Ah, that makes a lot of sense. Not strictly necessary, but there are interesting things to see.
```julia
function loss(θ)
    ŷ = f(x_train, x_train, y_train, θ)
    return sum(abs2, y_train - ŷ) + exp(θ[4]) * norm(ŷ)
```
Two questions:

a) why not

```suggestion
return norm(y_train - ŷ) + exp(θ[4]) * norm(ŷ)
```

b) can you explain where the second term comes from? Just thinking about "parameter regularisation" might make one think it ought to be `norm(θ)`.
This part is from your original example; I am honestly not sure.
Which in turn was part of a previously existing example 😅 @theogf @willtebbutt, how would you address this (comment to add / equation to fix / ...)?
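(For reference, a minimal sketch of the two loss variants being discussed, assuming the `f`, `x_train`, `y_train`, and `θ` defined earlier in the example; the parameter-regularised variant only illustrates the suggestion above and is not code from the PR.)

```julia
using LinearAlgebra: norm

# Loss as it currently appears in the example: squared data-fit term
# plus a penalty on the norm of the predictions ŷ.
function loss_prediction_penalty(θ)
    ŷ = f(x_train, x_train, y_train, θ)
    return sum(abs2, y_train - ŷ) + exp(θ[4]) * norm(ŷ)
end

# Hypothetical parameter-regularised alternative: penalise the kernel
# parameters themselves (θ[4] is the regularisation weight, so it is excluded).
function loss_parameter_penalty(θ)
    ŷ = f(x_train, x_train, y_train, θ)
    return sum(abs2, y_train - ŷ) + exp(θ[4]) * norm(θ[1:3])
end
```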
```julia
# ### Training
# Setting an initial value and initializing the optimizer:
θ = log.([1.1, 0.1, 0.01, 0.001]) # Initial vector
```
why not

```suggestion
θ = log.(ones(4)) # Initial vector
```

as you had before? (Alternatively, use the values from here above as well?)
NB: what do you think of commenting explicitly on this being a mutable vector that will be changed in-place by the optimiser?
I think this also comes from the original example; I can experiment with setting it to `ones`.
It's probably a reasonable setting to end up in the right local optimum - maybe it's simpler to just change the very first `ones(4)` to also be `[1.1, 0.1, 0.01, 0.001]` instead? :)
```julia
# Computational cost for one step

@benchmark let θt = θ[:], optt = Optimise.ADAGrad(0.5)
```
Why do you introduce a new `optt`? (I'm assuming it's due to the internal optimizer state that you don't want to affect for the actual optimisation loop, but this might not be clear to the casual reader!)

As it's within a `let` block, you could just call it `opt` and it wouldn't get outside the scope of the `let` block.

It might then also be cleaner if you move the `opt = ...` definition from line 80 to the section below (line 94 following) where it's actually used.
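(As a side note, not from the PR: a tiny sketch of the scoping point - a binding introduced inside `let` does not leak into the enclosing scope, so reusing the name `opt` there is safe. The string values are purely illustrative.)

```julia
opt = "outer optimiser"        # hypothetical outer binding
let
    opt = "inner optimiser"    # creates a new local, shadowing the outer binding
    println(opt)               # prints "inner optimiser"
end
println(opt)                   # still prints "outer optimiser"
```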
I tried a few things, but there seemed to be some leaking. This was my way of fixing that leakage.
I can take another look.
You definitely need to copy the `θ` so you don't change the initial values. But

```julia
@benchmark let
    θ = log.(ones(4))
    opt = Optimise.ADAGrad(0.5)
    ...
end
```

should work.
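(A slightly fuller, hypothetical version of that sketch - the gradient/update step is assumed to follow the training loop in the example and is not taken verbatim from the PR; `θ` and `loss` are the ones defined in the example.)

```julia
using BenchmarkTools
using Flux: Optimise
using Zygote

# Benchmark a single optimisation step without touching the θ used by the
# actual training loop: everything here is local to the let block.
@benchmark let
    θt = copy(θ)                             # copy so the initial values stay intact
    opt = Optimise.ADAGrad(0.5)
    grad = only(Zygote.gradient(loss, θt))   # gradient of the example's loss w.r.t. θt
    Optimise.update!(opt, θt, grad)          # one in-place ADAGrad step
end
```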
```julia
gif(anim, "train-kernel-param.gif"; show_msg=false, fps=15);
nothing; #hide

# ![](train-kernel-param.gif)
```
What's the benefit of this vs. having the output from `gif()` inline?
Locally it didn't display correctly.
Huh! Alright. Well, as long as it works in the deployed docs it doesn't matter too much one way or another :)
```julia
# ## Using ParameterHandling.jl
# Alternatively, we can use the [ParameterHandling.jl](https://github.com/invenia/ParameterHandling.jl) package
# to handle the requirement that all kernel parameters should be positive.
```
It might be nice to also comment on it allowing arbitrarily nested NamedTuples, so parameters can be accessed easily without worrying about where they sit in the flat parameter vector, their constraints, etc.
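(To illustrate that remark - a minimal, hypothetical sketch of nested NamedTuples with ParameterHandling.jl; the parameter names below are made up and not taken from the example.)

```julia
using ParameterHandling

# Constrained parameters in an arbitrarily nested NamedTuple.
raw_initial_θ = (
    kernel = (variance = positive(1.0), lengthscale = positive(0.5)),
    noise_var = positive(0.1),
)

# value_flatten returns a flat, unconstrained vector (for the optimiser) and a
# function that rebuilds the nested, constrained values from such a vector.
flat_θ, unflatten = ParameterHandling.value_flatten(raw_initial_θ)

θ = unflatten(flat_θ)
θ.kernel.lengthscale   # access by name, no need to know its position in flat_θ
```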
```julia
)

flat_θ, unflatten = ParameterHandling.value_flatten(raw_initial_θ)
nothing #hide
```
Do you actually need these? I think Literate.jl only splits up code cells at non-code markdown (comments), so all of this would run as a single cell and its return values wouldn't get printed anyways?
You are probably right, and this is some artifact from me moving things around.
```julia
# ## Flux.destructure
# If we don't want to write an explicit function to construct the kernel, we can alternatively use the `Flux.destructure` function.
# Again, we need to ensure that the parameters are positive. Note that the `exp` function is now part of the loss function, instead of part of the kernel construction.
# We could also use ParameterHandling.jl here, similar to the example above.
```
Not clear to me - how would you combine the two? I had thought it was either-or.
My impression was that one could do a `kernelc ∘ flatten` (or unflatten) construction.
if you think it's something one would actually want to do in practice, it'd be great to expand on it a bit more, but otherwise I would prefer removing the reference so as to not confuse people (like myself reading it now 😅 )
```julia
p, kernelc = Flux.destructure(kernel);

# This returns the `trainable` parameters of the kernel and a function to reconstruct the kernel.
```
```suggestion
# This returns the "trainable" parameters of the kernel and a function to reconstruct the kernel.
```

Or did you mean for `trainable` to be typeset as code?
As typeset in code, for `Flux.trainable` (which I understand from another PR might be going away, but still).
`Flux.trainable` isn't exported (and barely documented - the only reference I could find was in https://github.com/FluxML/Flux.jl/blob/4a3483efd8e13437b3d86371723c977fb61c2793/docs/src/models/advanced.md, but that doesn't seem relevant here), so as it is, it seems to me rather confusing. Maybe

```suggestion
# This returns the trainable `params` of the kernel and a function to reconstruct the kernel.
```

? Or just

```suggestion
# This returns the trainable parameters of the kernel and a function to reconstruct the kernel from these parameters.
```
```julia
kernel = (θ[1] * SqExponentialKernel() + θ[2] * Matern32Kernel()) ∘ ScaleTransform(θ[3])

p, kernelc = Flux.destructure(kernel);
```
What's the `p`? What is it needed for? What does it do?
Probably not needed anymore. Initially I wanted to show that `kernelc(p) == kernel`, but due to a mutability issue this does not evaluate as `true`, despite them being `==` in almost all respects.
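(For reference, a minimal sketch of the `Flux.destructure` round trip being discussed; the kernel line mirrors the one in the diff above, and the concrete parameter values are only illustrative.)

```julia
using Flux, KernelFunctions

θ = [1.1, 0.1, 0.01]   # illustrative positive parameter values
kernel = (θ[1] * SqExponentialKernel() + θ[2] * Matern32Kernel()) ∘ ScaleTransform(θ[3])

# p: flat vector of the kernel's trainable parameters,
# kernelc: function that rebuilds a kernel from such a vector.
p, kernelc = Flux.destructure(kernel)

k2 = kernelc(p)                    # reconstructed kernel
k2(0.3, 0.7) ≈ kernel(0.3, 0.7)    # evaluates identically, even if `==` does not hold
```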
Thanks, I think this is a useful addition to make clear that you can train kernel parameters (and how to do so). I've left minor comments as GitHub suggestions; there are a few other comments that it would be good to address before it's ready to merge. Let me know if anything doesn't make sense!
@Crown421 let me know once you've addressed my remaining comments (either through changes or by replying with what doesn't make sense to you / what you disagree with :)), and I'll finish off the review!
@st-- I think I addressed all comments now. Apologies for the delay; I also managed to catch COVID since we spoke.
This looks like a very helpful addition, thanks for contributing! There's just one remaining comment on the loss :) let me know what you want to do with that, then we can merge it.
Summary
Picking up #317, and adding more to it. The commits might be a bit of a mess, due to some unfortunate experiments with the GitHub Desktop app.
Proposed changes
What alternatives have you considered?