
Updated Train- kernel-parameters example #430

Merged: st-- merged 50 commits into master from train-kernel-ex2 on Mar 21, 2022

Conversation

Crown421 (Member)

Summary
Picking up #317, and adding more to it. The commits might be a bit of a mess, due to some unfortunate experiments with the GitHub Desktop app.

Proposed changes

What alternatives have you considered?

st-- and others added 19 commits June 30, 2021 17:13
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…iaGaussianProcesses/KernelFunctions.jl into st/examples--train-kernel-parameters
commit f9bbd84
Author: st-- <[email protected]>
Date:   Fri Jan 28 09:11:50 2022 +0100

    make nystrom work with AbstractVector (#427)

    * make nystrom work with AbstractVector

    * add test

    * Update test/approximations/nystrom.jl

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

    * patch bump

    * Update test/approximations/nystrom.jl

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

    * Apply suggestions from code review

    Co-authored-by: David Widmann <[email protected]>

    * Apply suggestions from code review

    * deprecate

    * Apply suggestions from code review

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

    * Apply suggestions from code review

    Co-authored-by: David Widmann <[email protected]>

    * Apply suggestions from code review

    Co-authored-by: Théo Galy-Fajou <[email protected]>

    * Update src/approximations/nystrom.jl

    Co-authored-by: Théo Galy-Fajou <[email protected]>

    * Update src/approximations/nystrom.jl

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: David Widmann <[email protected]>
    Co-authored-by: Théo Galy-Fajou <[email protected]>

commit d1c68a9
Author: st-- <[email protected]>
Date:   Thu Jan 13 22:33:43 2022 +0100

    fix Distances compat (#423)

    * CompatHelper: bump compat for Distances to 0.10 for package test, (keep existing compat)

    * try out Theo's fix

    * fix test compat

    * use ForwardDiff for chain rule test of SqMahalanobis

    * test on 1.4 instead of 1.3 - see if the chainrules test passes there

    * revert version branch

    * revert to 1.3

    * test_broken for older Julia versions

    Co-authored-by: CompatHelper Julia <[email protected]>

commit 93d33c2
Author: st-- <[email protected]>
Date:   Wed Jan 12 14:11:14 2022 +0100

    fix figure & cleanup (#422)

    * fix figure & cleanup

    * bump LIBSVM compat & Manifest

    * improve writing, replaces #321

commit 40cb59e
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Wed Jan 12 09:39:01 2022 +0200

    CompatHelper: bump compat for Kronecker to 0.5 for package docs, (keep existing compat) (#367)

    * CompatHelper: bump compat for Kronecker to 0.5 for package docs, (keep existing compat)

    * ] up

    Co-authored-by: CompatHelper Julia <[email protected]>
    Co-authored-by: st-- <[email protected]>

commit 7204529
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Tue Jan 11 18:37:23 2022 +0200

    CompatHelper: bump compat for Kronecker to 0.5 for package test, (keep existing compat) (#366)

    Co-authored-by: CompatHelper Julia <[email protected]>
    Co-authored-by: st-- <[email protected]>

commit 924925d
Author: st-- <[email protected]>
Date:   Tue Jan 11 16:26:02 2022 +0100

    switch SVM example to half-moon dataset (#421)

commit 992b665
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Fri Dec 24 12:18:56 2021 +0200

    CompatHelper: bump compat for SpecialFunctions to 2 for package test, (keep existing compat) (#412)

    Co-authored-by: CompatHelper Julia <[email protected]>
    Co-authored-by: Théo Galy-Fajou <[email protected]>
    Co-authored-by: st-- <[email protected]>

commit 04fa7f7
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Thu Dec 23 13:33:59 2021 +0200

    CompatHelper: bump compat for SpecialFunctions to 2, (keep existing compat) (#411)

    Co-authored-by: CompatHelper Julia <[email protected]>
    Co-authored-by: Théo Galy-Fajou <[email protected]>
    Co-authored-by: st-- <[email protected]>

commit c0fc3e1
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Thu Dec 23 01:10:40 2021 +0200

    CompatHelper: add new compat entry for Compat at version 3 for package test, (keep existing compat) (#418)

    Co-authored-by: CompatHelper Julia <[email protected]>

commit 05fe340
Author: st-- <[email protected]>
Date:   Tue Dec 21 00:49:37 2021 +0200

    use only() instead of first() (#403)

    * use only() instead of first() for 1-"vectors" that were for the benefit of Flux

    * fix one test that should not have worked as it was

    * add missing scalar Sinus constructor

commit 2d17212
Author: st-- <[email protected]>
Date:   Sat Dec 18 23:43:30 2021 +0200

    Zygote AD failure workarounds & test cleanup (#414)

    Zygote AD failures:

    * revert #409 (test_utils workaround for broken Zygote - now working again)

    * disable broken Zygote AD test for ChainTransform

    Improved tests:

    * finer-grained testsets

    * add missing test cases to test_AD

    * replace test_FiniteDiff with test_AD(..., :FiniteDiff, ...)

    * remove code duplication

commit 3c49949
Author: Théo Galy-Fajou <[email protected]>
Date:   Wed Nov 24 18:32:19 2021 +0100

    Fix typo in valid_inputs error (#408)

    * Fix typo in valid_inputs error

    * Update src/utils.jl

    Co-authored-by: David Widmann <[email protected]>

    Co-authored-by: David Widmann <[email protected]>

commit 9955044
Author: st-- <[email protected]>
Date:   Wed Nov 24 18:55:18 2021 +0200

    Fix for Zygote 0.6.30 breaking our tests (#409)

    * restrict Zygote to <0.6.30

    * revert Zygote test restriction and add finer-grained testset

    * Update test/utils.jl

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

    * revert testset

    * mark test_broken

    * Use `@test_throws` instead of `@test_broken`

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
    Co-authored-by: David Widmann <[email protected]>

commit 33d64d1
Author: Théo Galy-Fajou <[email protected]>
Date:   Thu Nov 4 14:23:57 2021 +0100

    Add benchmarking CI (#399)

    * Add benchmark file

    * delete old benchmarks

    * Add github action

    * Add Project

    * Apply suggestions from code review

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

    * Missing end of line

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

commit 360ce10
Author: David Widmann <[email protected]>
Date:   Tue Nov 2 11:09:58 2021 +0100

    Update docstring of `GibbsKernel` (#395)
codecov bot commented on Jan 28, 2022:

Codecov Report

Merging #430 (04a5d98) into master (aa9ac9f) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #430   +/-   ##
=======================================
  Coverage   93.13%   93.13%           
=======================================
  Files          52       52           
  Lines        1252     1252           
=======================================
  Hits         1166     1166           
  Misses         86       86           


@Crown421 (Member Author)

I didn't build the documentation locally; is there a way to preview it from here?

@devmotion (Member)

There's a preview for PRs that are not from forks; the link to the deployed docs will show up among the GitHub Actions (see e.g. #429). However, there's currently no link since the documentation is not building successfully.

@devmotion (Member)

The problem is that BenchmarkTools is not installed (see https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/runs/4983109277?check_suite_focus=true#step:5:31). Is it actually necessary to perform benchmarks?

@Crown421 (Member Author)

> The problem is that BenchmarkTools is not installed (see https://github.com/JuliaGaussianProcesses/KernelFunctions.jl/runs/4983109277?check_suite_focus=true#step:5:31). Is it actually necessary to perform benchmarks?

Ah, that makes a lot of sense. Not strictly necessary, but there are interesting things to see.


function loss(θ)
    ŷ = f(x_train, x_train, y_train, θ)
    return sum(abs2, y_train - ŷ) + exp(θ[4]) * norm(ŷ)
end
Member:

Two questions:
a) why not

Suggested change:
- return sum(abs2, y_train - ŷ) + exp(θ[4]) * norm(ŷ)
+ return norm(y_train - ŷ) + exp(θ[4]) * norm(ŷ)

b) can you explain where the second term comes from? Just thinking about "parameter regularisation" might make one think it ought to be norm(\theta).

Member Author:

This part is from your original example; I am honestly not sure.

Member:

which in turn was part of a previously existing example 😅 @theogf @willtebbutt how would you address this (comment to add / equation to fix / ...)?
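For context, here is a minimal, self-contained sketch of the objective under discussion. It is not the script's exact code: the toy data, the kernel-ridge predictor `f`, and the use of θ[4] as a (log) noise/ridge parameter are assumptions made for illustration.

```julia
using KernelFunctions, LinearAlgebra

# Toy data standing in for x_train / y_train from the example script (assumption).
x_train = range(-3.0, 3.0; length=20)
y_train = sin.(x_train)

# Kernel built from exp.(θ) so that all parameters stay positive.
kernel(θ) = (exp(θ[1]) * SqExponentialKernel() + exp(θ[2]) * Matern32Kernel()) ∘
            ScaleTransform(exp(θ[3]))

# Kernel ridge regression predictor; θ[4] is treated as a log noise parameter here.
function f(x, x_train, y_train, θ)
    k = kernel(θ)
    K = kernelmatrix(k, x_train) + exp(θ[4]) * I
    return kernelmatrix(k, x, x_train) * (K \ y_train)
end

# The loss in question: squared data-fit term plus the exp(θ[4]) * norm(ŷ) term
# whose interpretation is the open point in this thread.
function loss(θ)
    ŷ = f(x_train, x_train, y_train, θ)
    return sum(abs2, y_train - ŷ) + exp(θ[4]) * norm(ŷ)
end
```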


# ### Training
# Setting an initial value and initializing the optimizer:
θ = log.([1.1, 0.1, 0.01, 0.001]) # Initial vector
st-- (Member) commented on Feb 17, 2022:

why not

Suggested change:
- θ = log.([1.1, 0.1, 0.01, 0.001]) # Initial vector
+ θ = log.(ones(4)) # Initial vector

as you had before?
(alternatively, use the values from here above as well?)

Member:

NB: what do you think of commenting explicitly on this being a mutable vector that will be changed in-place by the optimiser?

Member Author:

I think this also comes from the original example; I can experiment with setting it to ones.

Member:

It's probably a reasonable setting to end up in the right local optimum; maybe it's simpler to just change the very first ones(4) to also be [1.1, 0.1, 0.01, 0.001] instead? :)
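For reference, the log-parameterisation being discussed works as in the short sketch below (the values are those from the script, but the snippet itself is only illustrative):

```julia
# θ is the unconstrained vector handed to the optimiser (and mutated in place by it);
# exponentiating it wherever it is used guarantees positive kernel parameters.
θ = log.([1.1, 0.1, 0.01, 0.001])
positive_params = exp.(θ)   # ≈ [1.1, 0.1, 0.01, 0.001], and always > 0 however θ changes
```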


# Computational cost for one step

@benchmark let θt = θ[:], optt = Optimise.ADAGrad(0.5)
Member:

why do you introduce a new optt? (I'm assuming it's due to the internal optimizer state that you don't want to affect for the actual optimisation loop, but this might not be clear to the casual reader!)

As it's within a let block, you could just call it opt and it wouldn't get outside the scope of the let block.
It might then also be cleaner if you move the opt = ... definition from line 80 to the section below (line 94 following) where it's actually used.

Member Author:

I tried a few things, but there seemed to be some leaking. This was my way of fixing that leakage.
I can take another look.

Member:

You definitely need to copy the \theta so you don't change the initial values. But

@benchmark let
    θ = log.(ones(4))
    opt = Optimise.ADAGrad(0.5)
    ...
end

should work
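A complete version of that pattern might look like the following sketch (assuming the `loss` function and the global `θ` defined earlier in the script; the single ADAGrad gradient step shown here is an assumption about what the benchmarked body contains):

```julia
using BenchmarkTools, Zygote
using Flux: Optimise

@benchmark let
    θ_local = copy(θ)             # copy so the benchmark never mutates the real θ
    opt = Optimise.ADAGrad(0.5)   # fresh optimiser state, confined to the let block
    grads = only(Zygote.gradient(loss, θ_local))
    Optimise.update!(opt, θ_local, grads)
end
```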

Comment on lines +107 to +110
gif(anim, "train-kernel-param.gif"; show_msg=false, fps=15);
nothing; #hide

# ![](train-kernel-param.gif)
Member:

What's the benefit of this vs. having the output from gif() inline?

Member Author:

Locally it didn't display correctly.

Member:

Huh! Alright. Well, as long as it works in the deployed docs, it doesn't matter too much one way or the other. :)


# ## Using ParameterHandling.jl
# Alternatively, we can use the [ParameterHandling.jl](https://github.com/invenia/ParameterHandling.jl) package
# to handle the requirement that all kernel parameters should be positive.
Member:

It might be nice to also comment on it allowing arbitrary nested NamedTuples, so parameters can be accessed easily without worrying about where they sit in the flat parameter vector, what their constraints are, etc.

)

flat_θ, unflatten = ParameterHandling.value_flatten(raw_initial_θ)
nothing #hide
Member:

Do you actually need these? I think Literate.jl only splits up code cells at non-code markdown (comments), so all of this would run as a single cell and its return values wouldn't get printed anyway?

Member Author:

You are probably right, and this is some artifact from me moving things around.
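For readers following along, the ParameterHandling.jl pattern under discussion looks roughly like the sketch below; the field names and values are illustrative and not necessarily those used in the script.

```julia
using ParameterHandling, KernelFunctions

# Nested NamedTuple of constrained parameters: positive(...) enforces positivity.
raw_initial_θ = (
    k1 = positive(1.1),
    k2 = positive(0.1),
    k3 = positive(0.01),
    noise_var = positive(0.001),
)

# flat_θ is a plain vector for the optimiser; unflatten maps such a vector
# back to a NamedTuple of positive values.
flat_θ, unflatten = ParameterHandling.value_flatten(raw_initial_θ)
θ = unflatten(flat_θ)

kernel = (θ.k1 * SqExponentialKernel() + θ.k2 * Matern32Kernel()) ∘ ScaleTransform(θ.k3)
```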

# ## Flux.destructure
# If we don't want to write an explicit function to construct the kernel, we can alternatively use the `Flux.destructure` function.
# Again, we need to ensure that the parameters are positive. Note that the `exp` function is now part of the loss function, instead of part of the kernel construction.
# We could also use ParameterHandling.jl here, similar to the example above.
Member:

Not clear to me: how would you combine the two? I had thought it was either-or.

Member Author:

My impression was that one could do a kernelc ∘ flatten (or unflatten) construction.

Member:

If you think it's something one would actually want to do in practice, it'd be great to expand on it a bit more; otherwise I would prefer removing the reference so as to not confuse people (like myself reading it now 😅).


p, kernelc = Flux.destructure(kernel);

# This returns the `trainable` parameters of the kernel and a function to reconstruct the kernel.
Member:

Suggested change:
- # This returns the `trainable` parameters of the kernel and a function to reconstruct the kernel.
+ # This returns the "trainable" parameters of the kernel and a function to reconstruct the kernel.

Or did you mean for trainable to be typeset as code?

Member Author:

As typeset in code, for Flux.trainable (which I understand from another PR might be going away, but still).

Member:

Flux.trainable isn't exported (and barely documented; the only reference I could find was in https://github.com/FluxML/Flux.jl/blob/4a3483efd8e13437b3d86371723c977fb61c2793/docs/src/models/advanced.md, but that doesn't seem relevant here), so as it is, it seems rather confusing to me. Maybe

Suggested change:
- # This returns the `trainable` parameters of the kernel and a function to reconstruct the kernel.
+ # This returns the trainable `params` of the kernel and a function to reconstruct the kernel.

?
or just

Suggested change:
- # This returns the `trainable` parameters of the kernel and a function to reconstruct the kernel.
+ # This returns the trainable parameters of the kernel and a function to reconstruct the kernel from these parameters.


kernel = (θ[1] * SqExponentialKernel() + θ[2] * Matern32Kernel()) ∘ ScaleTransform(θ[3])

p, kernelc = Flux.destructure(kernel);
Member:

What's the p? What is it needed for? What does it do?

Member Author:

Probably not needed anymore. Initially I wanted to show that kernelc(p) == kernel, but due to a mutability issue, this does not evaluate as true, despite them being == in almost all respects.
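To make the roles of `p` and `kernelc` concrete, here is a minimal sketch of the destructure pattern (parameter values are illustrative):

```julia
using Flux, KernelFunctions

θ = [1.1, 0.1, 0.01]
kernel = (θ[1] * SqExponentialKernel() + θ[2] * Matern32Kernel()) ∘ ScaleTransform(θ[3])

# p is a flat vector of the kernel's trainable parameters;
# kernelc(p) rebuilds a kernel with the same structure from any such vector.
p, kernelc = Flux.destructure(kernel)
k_rebuilt = kernelc(p)   # structurally the same kernel (an exact == check may fail, as noted above)
```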

st-- (Member) left a review comment:

Thanks, I think this is a useful addition to make it clear that you can train kernel parameters (and how to do so). I've left minor comments as GitHub suggestions; there are a few other comments that it would be good to address before it's ready to merge. Let me know if anything doesn't make sense!

@st-- (Member) commented on Feb 18, 2022:

@Crown421 let me know once you've addressed my remaining comments (either through changes or by replying with what doesn't make sense to you / what you disagree with :)), and I'll finish off the review!

@Crown421 (Member Author)

@st-- I think I addressed all comments now. Apologies for the delay; I also managed to catch COVID since we spoke.

Crown421 and others added 2 commits March 20, 2022 00:25
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
st-- (Member) left a review comment:

This looks like a very helpful addition, thanks for contributing!

There's just one remaining comment thread on the loss function :) Let me know what you want to do with that, then we can merge it.

st-- merged commit 6033b56 into master on Mar 21, 2022
st-- deleted the train-kernel-ex2 branch on March 21, 2022, 14:04