
Robust adaption for NUTS #324

Closed · yebai opened this issue Jul 26, 2017 · 16 comments

@yebai (Member) commented Jul 26, 2017

This is an umbrella issue for adaptation issues in the NUTS algorithm.

Before these three

@xukai92 (Member) commented Aug 11, 2017

Notebook with a simple illustration of the adaptation issue: https://github.com/xukai92/TuringDemo/blob/master/look-into-adapt.ipynb
Helper file for generating LDA data: https://github.com/xukai92/TuringDemo/blob/master/video/lda-gen-data.jl

@xukai92 (Member) commented Sep 24, 2017

https://github.com/stan-dev/stan/blob/develop/src/stan/mcmc/var_adaptation.hpp shows how Stan uses the Phase II window: the preconditioning matrix, which is estimated on the fly, is only set when a new interval (window) is started.
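
For reference, a minimal Julia sketch of that windowing scheme, assuming a diagonal preconditioner; the names (`WindowedDiagPrecond`, `adapt!`) are made up for illustration and are not Stan's or Turing's actual internals:

```julia
# Hypothetical sketch of Stan's Phase II ("slow") window logic: samples are
# accumulated during a window, and the preconditioning matrix is only
# replaced when the window closes.
mutable struct WindowedDiagPrecond
    n::Int                 # samples seen in the current window
    window::Int            # window length
    s::Vector{Float64}     # running sum of samples
    s2::Vector{Float64}    # running sum of squared samples
    var::Vector{Float64}   # current (frozen) diagonal preconditioner
end

WindowedDiagPrecond(d::Int, window::Int) =
    WindowedDiagPrecond(0, window, zeros(d), zeros(d), ones(d))

function adapt!(p::WindowedDiagPrecond, θ::Vector{Float64})
    p.n += 1
    p.s  .+= θ
    p.s2 .+= θ .^ 2
    if p.n == p.window
        # Window closes: only now is the preconditioner overwritten.
        μ = p.s ./ p.n
        p.var = p.s2 ./ p.n .- μ .^ 2
        # Reset accumulators for the next (typically doubled) window.
        p.n = 0
        fill!(p.s, 0.0); fill!(p.s2, 0.0)
        p.window *= 2
    end
    return p
end
```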

@yebai (Member, Author) commented Feb 26, 2018

> Notebook with a simple illustration of the adaptation issue: https://github.com/xukai92/TuringDemo/blob/master/look-into-adapt.ipynb
> Helper file for generating LDA data: https://github.com/xukai92/TuringDemo/blob/master/video/lda-gen-data.jl

@xukai92 Let's try to isolate the step-size and covariance adaptation issues and solve them one at a time. I suggest we first disable step-size adaptation (e.g. use a small enough fixed step size) and make sure covariance adaptation is robust by using the Welford trick (see #289 (comment)); then we come back to see why step-size adaptation is fragile.
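
For concreteness, a minimal sketch of the Welford trick mentioned above (online mean and variance in a single pass); names such as `WelfordVar` and `add_sample!` are illustrative, not Turing's actual internals:

```julia
# Numerically stable online mean/variance accumulator (Welford's algorithm).
mutable struct WelfordVar
    n::Int
    μ::Vector{Float64}
    M::Vector{Float64}   # sum of squared deviations from the running mean
end

WelfordVar(d::Int) = WelfordVar(0, zeros(d), zeros(d))

function add_sample!(w::WelfordVar, θ::Vector{Float64})
    w.n += 1
    δ = θ .- w.μ
    w.μ .+= δ ./ w.n
    w.M .+= δ .* (θ .- w.μ)   # uses the updated mean, as in Welford's algorithm
    return w
end

# Unbiased sample variance; Stan additionally regularizes this towards the identity.
variance(w::WelfordVar) = w.n > 1 ? w.M ./ (w.n - 1) : ones(length(w.μ))
```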

@xukai92 (Member) commented Feb 26, 2018

I see, that makes sense.

@xukai92 (Member) commented Mar 4, 2018

Note: there is a related issue in DiffEqBayes.jl (SciML/DiffEqBayes.jl#30).

  • Use that as a test case when solving the stability issue.

@yebai (Member, Author) commented Apr 2, 2018

@xukai92 Any update on this issue?

yebai added a commit that referenced this issue Apr 3, 2018
yebai pushed a commit that referenced this issue Apr 4, 2018
* Fix dep log in lad

* Dont send opt res

* Fix VarInfo.show bug

* Fix auto tune

* Change * to .* in leapfrog

* temp fix type

* Disable @suppress_err temporarily

* Fix a dep

* Workable ReverseDiff v0.1 done

* Add RevDiff to REQUIRE

* Fix bug in R-AD

* Fix some bugs

* Fix bugs

* Update test

* ReversedDiff.jl mutable bug fixed

* Any to Real

* update benchmark

* Resolve mem alloc for simplex dist

* Fix bug and improve mem alloc

* Improve implementaion of transformations

* Don't include compile time in benchk

* Resolve slowness caused by use of vi.logp

* Update benchmark files

* Add line to load pickle

* Bugfix with reject

* Using ReverseDiff.jl and unsafe model as default

* Fix bug in test file

* Rename vi.rs to vi.rvs

* Add Naive Bayes model in Turing

* Add NB to travis

* DA works

* Tune init

* Better init

* NB MNIST Stan added

* Improve ad assignment

* Improve ad assignment

* Add Stan SV model

* Improve transform typing

* Finish HMM model

* High dim gauss done

* Benchmakr v2.0 done

* Modulize var estimator and fix transform.jl

* Run with ForwardDiff

* Enable Stan for LDA bench

* Fix a bug in adapt

* Improve some code

* Fix bug in NUTS MH step (#324)

* Add interface for optionally enabling adaption.

* Do not adapt step size when numerical error is caught.

* Fix initial epsilon_bar.

* Fix missing t_valid.

* Drop incorrectly adapted step size when necessary  (#324)

* Edit warning message.

* Small tweaks.

* reset_da ==> restart_da

* address suggested naming

* Samler type for WarmUpManager.paras and notation tweaks.

* Bugfix and adapt_step_size == > adapt_step_size!
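
Two of the commit messages above, "Do not adapt step size when numerical error is caught" and "Drop incorrectly adapted step size when necessary", concern how step-size adaptation interacts with numerical errors. A rough sketch of that idea on top of the dual-averaging update of Hoffman & Gelman (2014); `DAState` and `adapt_step_size!` here are hypothetical names, not the actual implementation:

```julia
# Illustrative dual-averaging state (Hoffman & Gelman, 2014, Algorithm 6).
mutable struct DAState
    mu::Float64            # shrinkage target, typically log(10 * eps0)
    log_eps::Float64
    log_eps_bar::Float64   # averaged iterate, used after warm-up
    H_bar::Float64
    m::Int
end

DAState(eps0) = DAState(log(10 * eps0), log(eps0), 0.0, 0.0, 0)

function adapt_step_size!(s::DAState, accept_ratio, numerical_error::Bool;
                          delta = 0.8, gamma = 0.05, t0 = 10, kappa = 0.75)
    # If leapfrog integration hit a numerical error, skip the update so a
    # divergent trajectory cannot corrupt the adapted step size.
    numerical_error && return exp(s.log_eps)
    s.m += 1
    eta = 1 / (s.m + t0)
    s.H_bar = (1 - eta) * s.H_bar + eta * (delta - accept_ratio)
    s.log_eps = s.mu - sqrt(s.m) / gamma * s.H_bar
    w = s.m^(-kappa)
    s.log_eps_bar = w * s.log_eps + (1 - w) * s.log_eps_bar
    return exp(s.log_eps)
end
```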
@xukai92 (Member) commented Jul 3, 2018

#324 (comment) works on my local machine.

@xukai92 (Member) commented Jul 3, 2018

@yebai I tried the notebook with the master branch again on my local machine and a remote Linux machine. It seems that the adaptation is working now as long as the initialization is fine, i.e. the sampling only keeps throwing numerical errors if the initialization is bad. Do you mind trying it again on your machine to see if you agree?

@xukai92 (Member) commented Jul 4, 2018

The downstream package DiffEqBayes.jl has a test relying on Turing.jl that previously suffered from adaptation issues; it now also passes with the current master (related issue: SciML/DiffEqBayes.jl#30, related PR: SciML/DiffEqBayes.jl#48).

@ChrisRackauckas (Collaborator) commented:

Could you describe what changed to fix it? We thought the issue was related to parameters going negative when they were supposed to be non-negative, but didn't have a nice way to do domain transforms (are these going to be added in Turing?)

@yebai (Member, Author) commented Jul 4, 2018

> @yebai I tried the notebook with the master branch again on my local machine and a remote Linux machine. It seems that the adaptation is working now as long as the initialization is fine, i.e. the sampling only keeps throwing numerical errors if the initialization is bad. Do you mind trying it again on your machine to see if you agree?

@xukai92 thanks, I will do another test and come back with my findings.

P.S. Can you clarify, or give an example of, what you mean by bad initializations?

@xukai92 (Member) commented Jul 4, 2018

@yebai What I observed is: 1) if numerical errors occur during sampling, our adaptation can recover from them (i.e. the numerical errors disappear after a few iterations, which was not the case before); 2) if there is a numerical error at the very beginning, the sampling keeps throwing numerical errors.

My current understanding is: theta is initialized in a place where, after invlink, the model throws a numerical error when evaluating the log-joint or its gradient. When this happens we cannot "rewind" theta to a numerically OK state, because the initial state itself is numerically flawed. I haven't looked into it intensively, but I suspect we need some mechanism to resample the initial state if this is really the cause.
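
A minimal sketch of such a resampling mechanism, assuming hypothetical `logjoint` and `gradient` functions for the model in the unconstrained space (not Turing's actual API):

```julia
# Keep drawing initial values until both the log-joint and its gradient
# are finite, erroring out after a fixed number of attempts.
function find_good_init(logjoint, gradient, dim; max_tries = 100)
    for _ in 1:max_tries
        θ = 4 .* rand(dim) .- 2    # e.g. uniform on (-2, 2) in the unconstrained space
        lp = logjoint(θ)
        g  = gradient(θ)
        if isfinite(lp) && all(isfinite, g)
            return θ
        end
    end
    error("Could not find a numerically valid initial state after $max_tries tries.")
end
```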

@xukai92 (Member) commented Jul 4, 2018

@ChrisRackauckas Turing.jl has always had domain transforms. I didn't really change Turing.jl's functionality in that PR; I refactored the core code in a way that ensures no unexpected side effects, which I now believe was the reason why in-sampling numerical errors were not correctly handled (either in rejection or in adaptation). As I posted in the comment above, there is still an issue with initialization, which is especially critical when the domain is heavily constrained. I think this is still a problem for the model in SciML/DiffEqBayes.jl#30, as I see a later Travis job fail on DiffEqBayes.
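
For readers following from DiffEqBayes: a minimal sketch of the kind of domain transform meant here, mapping a non-negative parameter to the unconstrained space via log/exp with the log-abs-det-Jacobian correction; the function names are illustrative, not Turing's actual transform interface:

```julia
link(x)    = log(x)    # constrained (0, ∞) -> unconstrained real line
invlink(y) = exp(y)    # unconstrained real line -> constrained (0, ∞)

# Log density in the unconstrained space for a target logpdf defined on (0, ∞):
function unconstrained_logpdf(logpdf_constrained, y)
    x = invlink(y)
    # dx/dy = exp(y), so log|J| = y
    return logpdf_constrained(x) + y
end
```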

yebai modified the milestones: Release 0.3.2, 0.5 (Sep 13, 2018)
@xukai92 (Member) commented Sep 13, 2018

DynamicHMC.jl has a good adaptation design: https://github.com/tpapp/DynamicHMC.jl/blob/master/src/sampler.jl

@yebai (Member, Author) commented Sep 13, 2018

DynamicHMC is a very well designed and tested NUTS implementation, together with adaptation of the preconditioning matrix. We could try to plug DynamicHMC into Turing and compare its results against our NUTS sampler. We could also try to refactor our NUTS sampler following DynamicHMC's design. Ideally, the sampler code should be testable and benchmarkable without depending on other parts of Turing.

This also echoes the discussion in #456.

cc @willtebbutt @wesselb @mohamed82008
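
A minimal sketch of the decoupled design being suggested, where the sampler only sees a log-density and its gradient and can therefore be tested without the rest of Turing; the names are illustrative, not DynamicHMC's or Turing's actual interfaces:

```julia
# The sampler is written against this target type only.
struct LogDensityTarget{F,G}
    dim::Int
    logdensity::F   # θ -> log π(θ)
    gradient::G     # θ -> ∇ log π(θ)
end

# A unit test can then exercise the sampler on a simple standard-normal target:
target = LogDensityTarget(2, θ -> -0.5 * sum(abs2, θ), θ -> -θ)
```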

yebai pushed a commit that referenced this issue Sep 18, 2018
@xukai92 (Member) commented Nov 22, 2018

NUTS bug fixed in d0dafa9

xukai92 closed this as completed on Nov 22, 2018