Feature/issue 2799 robust no u turn #2800

betanalpha · 2019-08-13T02:42:42Z

Submission Checklist

Run unit tests: ./runTests.py src/test/unit
Run cpplint: make cpplint
Declare copyright holder and open-source license: see below

Summary

Resolves #2799.

Intended Effect

Adds additional no-u-turn checks across subtrees to avoid missing u-turns for approximately iid normal models.
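
As a rough illustration of what checking "across subtrees" can look like, the sketch below assumes a unit Euclidean metric (so the "sharp" momentum equals the momentum itself) and summarizes each subtree by its two boundary momenta plus `rho`, the sum of its momenta. The names `Subtree`, `persists`, and `merged_persists` are hypothetical and not taken from the Stan source, the numbers are made up, and the exact pairing of end points in the two extra checks is an assumption about how the cross-subtree checks are formed.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


@dataclass
class Subtree:
    p_start: Vector  # momentum at the first state of the subtree
    p_end: Vector    # momentum at the last state of the subtree
    rho: Vector      # sum of the momenta over all states in the subtree


def persists(p_start: Vector, p_end: Vector, rho: Vector) -> bool:
    """True while no U-turn is detected between the two end points."""
    return dot(p_start, rho) > 0 and dot(p_end, rho) > 0


def merged_persists(left: Subtree, right: Subtree,
                    extra_checks: bool = True) -> bool:
    """No-U-turn check for the tree formed by joining left and right."""
    ok = persists(left.p_start, right.p_end, add(left.rho, right.rho))
    if extra_checks:
        # Left subtree extended by the first state of the right subtree.
        ok = ok and persists(left.p_start, right.p_start,
                             add(left.rho, right.p_start))
        # Right subtree extended by the last state of the left subtree.
        ok = ok and persists(left.p_end, right.p_end,
                             add(right.rho, left.p_end))
    return ok


# Made-up momenta: the full-tree check passes, a cross-subtree check fails.
left = Subtree(p_start=[1.0, 0.0], p_end=[1.0, 0.0], rho=[2.0, 0.0])
right = Subtree(p_start=[-3.0, 0.0], p_end=[5.0, 0.0], rho=[2.0, 0.0])
print(merged_persists(left, right, extra_checks=False))  # full-tree check only
print(merged_persists(left, right, extra_checks=True))   # with the extra checks
```

In this toy case the end points of the merged tree still point along `rho`, so the single full-tree check keeps expanding, while the check that spans the junction between the two subtrees reports a turn.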

How to Verify

See https://discourse.mc-stan.org/t/nuts-misses-u-turns-runs-in-circles-until-max-treedepth/9727/36?u=betanalpha.

Side Effects

Decrease in antithetic behavior for component means in correlated models.

Documentation

Inline.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

Michael Betancourt

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

bbbales2

The code looks good. Do we have any hypothesis-test-style tests in place for this, like we do for the RNGs? That's how the last sampler problem got found, right?

bbbales2 · 2019-08-17T15:37:04Z

src/test/unit/mcmc/hmc/nuts/base_nuts_test.cpp

-  EXPECT_EQ(4 * init_momentum, sampler.rho_values.at(5));
-  EXPECT_EQ(8 * init_momentum, sampler.rho_values.at(6));
+  EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(5));
+  EXPECT_EQ(4 * init_momentum, sampler.rho_values.at(6));


There should be 14 more of these checks since there are now 3x the number of rho_values

betanalpha · 2019-08-18

There's only one aggregate rho exposed in the build_tree function. The others needed for the new tests are constructed from this aggregate rho on the fly internally and hence not accessible for unit testing. That internal construction could be repeated externally using the boundary momenta but then there's nothing to compare it to.

bbbales2 · 2019-08-24

I don't understand this. These values are being returned so why shouldn't they be tested?

bbbales2 · 2019-08-28 (edited)

This seems to work:

EXPECT_EQ(7 * 3, sampler.rho_values.size());
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(0));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(1));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(2));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(3));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(4));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(5));
EXPECT_EQ(4 * init_momentum, sampler.rho_values.at(6));
EXPECT_EQ(3 * init_momentum, sampler.rho_values.at(7));
EXPECT_EQ(3 * init_momentum, sampler.rho_values.at(8));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(9));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(10));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(11));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(12));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(13));
EXPECT_EQ(2 * init_momentum, sampler.rho_values.at(14));
EXPECT_EQ(4 * init_momentum, sampler.rho_values.at(15));
EXPECT_EQ(3 * init_momentum, sampler.rho_values.at(16));
EXPECT_EQ(3 * init_momentum, sampler.rho_values.at(17));
EXPECT_EQ(8 * init_momentum, sampler.rho_values.at(18));
EXPECT_EQ(5 * init_momentum, sampler.rho_values.at(19));
EXPECT_EQ(5 * init_momentum, sampler.rho_values.at(20));

If it makes sense, just add 'em in.

betanalpha · 2019-08-28

Ah, sorry, I finally get what you were saying. I was thinking that there aren't any new rho values to check because the tree depth is the same, but I forgot that this code was instrumented to add a new rho value for every check, not every tree depth. I added the new checks, split up into the values corresponding to the main check and the extra checks. Waiting to push in case any changes need to be made to the algorithm itself.

betanalpha · 2019-08-18T21:21:55Z

There's currently no validation of the samplers as explicit integration tests because there's not enough functionality to run everything from pure C++ (the main obstruction is tearing out the var_context code and replacing it with a clear data access layer and then writing C++ data access layer callbacks so that everything can be run in memory; I have designs that were long ago discussed and agreed upon but no one has had the time to implement them). In any case I did check this code against some IID models of varying dimension and a correlated Gaussian in which the last bug manifested (using 100000 iterations to be sufficiently sensitive to small effects) and everything looked good.

bob-carpenter · 2019-08-18T21:40:47Z

Why not use R dump or JSON format for var contexts? They're not in memory, but I/O performance shouldn't be an issue for testing.


bbbales2 · 2019-08-24T07:17:17Z

@betanalpha oh okay, got it. Can you e-mail me the models for the test? I guess I should just manually double check that it's all working okay since it's not automated.

@seantalts yo just at-ing you so you're aware of the state of tests. We should probably do something about this eventually.

betanalpha · 2019-08-24T16:17:28Z

They’re all in the benchmark repo, https://github.com/stan-dev/stat_comp_benchmarks/tree/master/benchmarks. The baselines for each expectation value are in https://github.com/stan-dev/stat_comp_benchmarks/tree/master/empirical_results/nuts. You can run them all or just check the basics like low_dim_corr_gauss and maybe a 50- or 100-dimensional IID Gaussian:

parameters { real x[100]; }
model { x ~ normal(0, 1); }


bbbales2 · 2019-08-25T08:06:35Z

Got it thanks. I'll run these checks Tuesday or Wednesday (still travelin'). And the extra return values from build_tree?

junpenglao · 2019-08-25T08:41:05Z

Thanks a lot for the discussion and the validation!!
Just a quick drive-by comment: you could skip doing the additional U-turn check at depth == 1 within each tree (as it has already been checked), and skip the U-turn check altogether at depth == 0 at the tree doubling level (i.e., the outer loop, as it has already been checked during the tree building).
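
To see why the extra checks can be skipped in the smallest case, the sketch below assumes that "depth == 1" refers to merging two single-state subtrees and that the metric is the unit Euclidean one. For a single state the start momentum, end momentum, and `rho` all coincide, so the two cross-subtree checks reduce to the main check. The helper names (`persists`, `three_checks`, `leaf`) are hypothetical, and the pairing of end points in the extra checks is an assumption rather than a transcription of the Stan code.

```python
from typing import List, Tuple

Vector = List[float]
Subtree = Tuple[Vector, Vector, Vector]  # (p_start, p_end, rho)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def persists(p_start: Vector, p_end: Vector, rho: Vector) -> bool:
    """True while no U-turn is detected between the two end points."""
    return dot(p_start, rho) > 0 and dot(p_end, rho) > 0


def three_checks(left: Subtree, right: Subtree) -> Tuple[bool, bool, bool]:
    """Return (main check, extra check on the left, extra check on the right)."""
    l_start, l_end, l_rho = left
    r_start, r_end, r_rho = right
    main = persists(l_start, r_end, add(l_rho, r_rho))
    extra_left = persists(l_start, r_start, add(l_rho, r_start))
    extra_right = persists(l_end, r_end, add(r_rho, l_end))
    return main, extra_left, extra_right


def leaf(p: Vector) -> Subtree:
    """A single-state subtree: both end points and rho are the same momentum."""
    return (p, p, p)


# Made-up momenta: one pair that keeps going and one that has turned around.
for p1, p2 in [([1.0, 0.5], [0.8, -0.2]), ([1.0, 0.0], [-2.0, 0.1])]:
    main, extra_left, extra_right = three_checks(leaf(p1), leaf(p2))
    print(main, extra_left, extra_right)
```

All three results agree for each pair, which is the redundancy the comment points out.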

betanalpha · 2019-08-25T15:15:13Z

The test you were looking at checks just the rho aggregation. The edge momenta and sharp momenta are checked in other tests, including the diag_e and softabs tests.

bbbales2

I tried out adding tests for the rho_values. See if you think they make sense.

bbbales2 · 2019-08-28T19:32:06Z

I'm having trouble with this. I ran tests with the model here: https://github.com/stan-dev/stat_comp_benchmarks/blob/master/benchmarks/low_dim_corr_gauss/low_dim_corr_gauss.stan

I ran a bunch of these models with different numbers of post-warmup samples with the new and old code. I made plots for each parameter (and generated quantity) of mean ± 2 × MCSE.

The left column shows the results with the new code and the right column the results with the old code. Different rows are different parameters. I plotted the true parameter value in red. The x axis is the number of post-warmup draws. The points are all jittered -- this plot shows results run for 64000, 256000, 1024000, 4096000, and 16384000 iters. For large numbers of samples, the delta_var1 and delta_var2 estimates are not consistent with zero.

I don't know if I'm testing this correctly or if I'm doing the check wrong, but can you try to reproduce this?

[Figure: https://user-images.githubusercontent.com/4742424/63885819-29d42080-c9a7-11e9-8d6a-0450a12aa9a4.png]

For reproducing, my test code looked something like this (I used the dense adaptation for the new and old codes):

for i in 1000 4000 8000 16000 64000
do
    for j in 8 9 10 11 12 13 14 15
    do
        ./corr sample num_samples=$i algorithm=hmc metric=dense_e >& /dev/null
        bin/stansummary --sig_figs=5 output.csv | grep "^z\|^delta_var" | tr -s " " "," | sed -e "s/^/$i,$j,/"
    done
done

Run this script for each code and dump the output in analysis.csv. Make the plots with something like:

library(tidyverse)
library(ggplot2)

results_new = read_csv("/home/bbales2/cmdstan-uturn/analysis.csv", col_names = FALSE) %>%
  mutate(which = "new")
results_orig = read_csv("/home/bbales2/cmdstan-current/analysis.csv", col_names = FALSE) %>%
  mutate(which = "original")

bind_rows(results_new,
          results_orig) %>%
  mutate(iters = X1,
         chain = X2,
         name = X3,
         mean = X4,
         ub = mean + 2 * X5,
         lb = mean - 2 * X5) %>%
  filter(iters > 50000) %>%
  ggplot(aes(iters, mean)) +
  geom_hline(data = tibble(name = c("delta_var1", "delta_var2", "z[1]", "z[2]"),
                           y = c(0.0, 0.0, 0.0, 3.0)), aes(yintercept = y), color = "red") +
  geom_point(aes(group = chain), position = position_dodge(width=0.5)) +
  geom_errorbar(aes(group = chain, ymin = lb, ymax = ub), position=position_dodge(width=0.5)) +
  scale_x_log10() +
  facet_grid(name ~ which, scales = "free_y") +
  theme(text = element_text(size=20))

betanalpha · 2019-08-28T20:12:39Z

That’s definitely not good. Do you get the same result when you run with the diagonal metric? Let me see if I can reproduce locally.


betanalpha · 2019-08-28T21:08:59Z

What is the j loop for in the code? Just for repeated runs?

Just to confirm -- the iteration lengths shown in your test code are different from those used to make the figure, right?

I'm still not seeing any bias for higher numbers of iterations using the default settings (sample num_samples=4096000 random seed=838389).

1 chains: each with iter=(4096000); warmup=(0); thin=(1); 4096000 iterations saved.

                    Mean     MCSE   StdDev     5%       50%       95%    N_Eff  N_Eff/s    R_hat
z[1]            -9.2e-05  7.4e-04  1.0e+00   -1.6   1.5e-04   1.6e+00  1.8e+06  1.1e+04  1.0e+00
z[2]             3.0e+00  1.5e-03  2.0e+00  -0.29   3.0e+00   6.3e+00  1.8e+06  1.1e+04  1.0e+00
delta_var1      -4.2e-04  9.7e-04  1.4e+00  -1.00  -5.4e-01   2.8e+00  2.1e+06  1.3e+04  1.0e+00
delta_var2       2.5e-03  4.0e-03  5.7e+00   -4.0  -2.2e+00   1.1e+01  2.0e+06  1.2e+04  1.0e+00
delta_corr       2.4e-03  8.2e-04  1.1e+00   -1.1  -3.3e-01   2.2e+00  1.9e+06  1.1e+04  1.0e+00

betanalpha · 2019-08-28T21:20:39Z

Same for a dense metric,

1 chains: each with iter=(4096000); warmup=(0); thin=(1); 4096000 iterations saved.

                    Mean     MCSE   StdDev     5%       50%       95%    N_Eff  N_Eff/s    R_hat
z[1]            -3.9e-04  4.8e-04  1.0e+00   -1.6   4.5e-04   1.6e+00  4.3e+06  2.9e+04  1.0e+00
z[2]             3.0e+00  1.0e-03  2.0e+00  -0.29   3.0e+00   6.3e+00  3.8e+06  2.6e+04  1.0e+00
delta_var1      -3.7e-04  9.9e-04  1.4e+00  -1.00  -5.5e-01   2.8e+00  2.1e+06  1.4e+04  1.0e+00
delta_var2      -5.4e-03  4.0e-03  5.6e+00   -4.0  -2.2e+00   1.1e+01  2.0e+06  1.4e+04  1.0e+00
delta_corr       2.8e-05  7.6e-04  1.1e+00   -1.1  -3.4e-01   2.2e+00  2.2e+06  1.5e+04  1.0e+00

bbbales2 · 2019-08-28T22:15:07Z

Yeah j is just repeated runs.

And yeah I go up to 16384000 in the figures. Try some repeated runs if you don't mind. It looks like it's messing up about half the time at 4096000 iters. I was a little concerned with the dense as well, but the plain Stan works with the dense metric.

I've checked git diff and I don't have extra changes. I will repeat this calculation overnight, but I just pulled down a clean repo and the md5sum of the model built with this clean repo and the model I used for these calculations in the figure is the same.

bbbales2 · 2019-08-28T22:16:58Z

This is the script I'll run:

for i in 1000 4000 16000 64000 256000 1024000 4096000 16384000
do
    for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
    do
	for metric in diag_e dense_e
	do
	    ./corr sample num_samples=$i algorithm=hmc metric=$metric random seed=$j >& /dev/null
	    bin/stansummary --sig_figs=5 output.csv | grep "^z\|^delta" | tr -s " " "," | sed -e "s/^/$i,$j,$metric,/"
	done
    done
done

(edited to use random seed)

betanalpha · 2019-08-28T23:14:14Z

I recommend fixing a seed and then using id to vary the seed for each run.

betanalpha · 2019-08-29T01:12:36Z

This is weeeeeeeeeird. For some seeds I do see the effect that you're seeing.

1 chains: each with iter=(16384000); warmup=(0); thin=(1); 16384000 iterations saved.

                    Mean     MCSE   StdDev     5%       50%       95%    N_Eff  N_Eff/s    R_hat
z[1]             4.9e-04  3.4e-04  1.0e+00   -1.6   4.4e-04   1.6e+00  8.4e+06  1.2e+04  1.0e+00
z[2]             3.0e+00  7.3e-04  2.0e+00  -0.28   3.0e+00   6.3e+00  7.4e+06  1.0e+04  1.0e+00
delta_var1      -6.2e-03  4.7e-04  1.4e+00  -1.00  -5.5e-01   2.8e+00  8.9e+06  1.3e+04  1.0e+00
delta_var2      -2.4e-02  2.0e-03  5.6e+00   -4.0  -2.2e+00   1.1e+01  8.2e+06  1.2e+04  1.0e+00
delta_corr      -2.3e-03  4.0e-04  1.1e+00   -1.1  -3.4e-01   2.2e+00  7.8e+06  1.1e+04  1.0e+00

What's odd, however, is that it doesn't look like there's some bias that manifests only once the standard error shrinks enough. Rather it looks like the standard error shrinks and then at some point a bias larger than the standard error suddenly appears. Then there's the fact that it's an inconsistent bias, and maybe correlated with slightly higher step sizes?

I'm going to go through the code carefully to see if I can see anything. On the other hand I'm not sure any algorithm has been tested to one million effective samples so if the current version didn't behave as expected then I'd suspect a numerical issue with the estimators themselves.

bob-carpenter · 2019-08-29T01:17:27Z

You mean things like the mean and quantile estimators? Adding a million numbers will lose a lot of precision if they're all the same sign. For example, imagine 1e6 numbers between 0 and 1. You wind up adding numbers on the scale of 5e5 (the expectation) and 1 (the individual numbers being added), which loses more than 5 decimal places of precision (out of a total of 16 or so in double-precision floating point).
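
The precision loss described here can be observed with a short experiment. The sketch below is only an illustration and is not how stansummary computes its estimates. It compares a plain left-to-right accumulation against Python's `math.fsum`, which returns a correctly rounded sum. The helper name `naive_sum` and the seed are arbitrary choices.

```python
import math
import random


def naive_sum(values):
    """Left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Small case where the rounding is visible directly.
tenths = [0.1] * 10
print(naive_sum(tenths) == 1.0)  # False: rounding error has accumulated
print(math.fsum(tenths) == 1.0)  # True: fsum keeps the lost low-order bits

# Larger case in the spirit of the comment: 1e6 draws between 0 and 1.
rng = random.Random(0)  # arbitrary seed, only for repeatability
draws = [rng.random() for _ in range(1_000_000)]
error = abs(naive_sum(draws) - math.fsum(draws))
print(f"absolute error of the naive sum: {error:.3e}")
```

An explicit loop is used instead of the built-in `sum` because recent Python versions apply a more accurate algorithm when summing floats, which would hide the effect.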


betanalpha · 2019-08-29T01:37:42Z

Of course, but the issue is that there is no corresponding bias in the baseline version of the sampler so it can’t be something shared between the two versions, such as the estimator code.


betanalpha · 2019-08-29T02:48:48Z

I can't find anything that looks suspect in the code -- anyone want to take a second look? I've been poking around at edge cases and places where redundant calculations in the added checks might be vulnerable to insufficient precision introducing errors but I can't find anything inconsistent.

bbbales2 · 2019-08-29T12:38:00Z

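Here are the results of running overnight.

[Figure: https://user-images.githubusercontent.com/4742424/63940195-01940280-ca37-11e9-9792-26b2b621aa99.png]

> anyone want to take a second look?

I'll make a post in the discourse thread. I will eventually, but this looks like it could be a difficult problem to figure out.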

betanalpha · 2019-08-29T17:50:37Z

I think I found it — there’s a bug in how the edge momenta and sharp momenta are updated before a new expansion/proposal. As in some of them aren’t updated as they should be, causing the new checks to be slightly off (and getting increasingly off as the tree depth increases). I’m testing the seeds that were previously manifesting the bias now.


betanalpha · 2019-08-29T18:11:57Z

Yup, that did the trick.

1 chains: each with iter=(16384000); warmup=(0); thin=(1); 16384000 iterations saved.

                    Mean     MCSE   StdDev     5%       50%       95%    N_Eff  N_Eff/s    R_hat
z[1]             4.9e-04  3.4e-04  1.0e+00   -1.6   4.4e-04   1.6e+00  8.4e+06  1.2e+04  1.0e+00
z[2]             3.0e+00  7.3e-04  2.0e+00  -0.28   3.0e+00   6.3e+00  7.4e+06  1.0e+04  1.0e+00
delta_var1      -6.2e-03  4.7e-04  1.4e+00  -1.00  -5.5e-01   2.8e+00  8.9e+06  1.3e+04  1.0e+00
delta_var2      -2.4e-02  2.0e-03  5.6e+00   -4.0  -2.2e+00   1.1e+01  8.2e+06  1.2e+04  1.0e+00
delta_corr      -2.3e-03  4.0e-04  1.1e+00   -1.1  -3.4e-01   2.2e+00  7.8e+06  1.1e+04  1.0e+00

goes to

1 chains: each with iter=(16384000); warmup=(0); thin=(1); 16384000 iterations saved.

                    Mean     MCSE   StdDev     5%       50%       95%    N_Eff  N_Eff/s    R_hat
z[1]             9.0e-05  3.5e-04  1.0e+00   -1.6   1.8e-04   1.6e+00  8.0e+06  1.2e+04  1.0e+00
z[2]             3.0e+00  7.3e-04  2.0e+00  -0.29   3.0e+00   6.3e+00  7.5e+06  1.2e+04  1.0e+00
delta_var1       2.6e-04  4.8e-04  1.4e+00  -1.00  -5.5e-01   2.8e+00  8.6e+06  1.3e+04  1.0e+00
delta_var2       2.4e-03  2.0e-03  5.7e+00   -4.0  -2.2e+00   1.1e+01  8.2e+06  1.3e+04  1.0e+00
delta_corr       2.3e-04  4.0e-04  1.1e+00   -1.1  -3.4e-01   2.2e+00  7.6e+06  1.2e+04  1.0e+00

I'll push once the tests are updated to the new behavior.
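
The before/after comparison can be made explicit by expressing each estimate's distance from its true value in units of MCSE. The sketch below copies the (Mean, MCSE) pairs from the two stansummary tables above and takes the true values from the plotting script earlier in the thread (0 for z[1], delta_var1, and delta_var2). The two-MCSE threshold mirrors the error bars used in the plots, and the function names are hypothetical.

```python
# True values as drawn by geom_hline in the plotting script.
TRUE_VALUES = {"z[1]": 0.0, "delta_var1": 0.0, "delta_var2": 0.0}

# (Mean, MCSE) pairs copied from the stansummary output before and after the fix.
before_fix = {
    "z[1]": (4.9e-04, 3.4e-04),
    "delta_var1": (-6.2e-03, 4.7e-04),
    "delta_var2": (-2.4e-02, 2.0e-03),
}
after_fix = {
    "z[1]": (9.0e-05, 3.5e-04),
    "delta_var1": (2.6e-04, 4.8e-04),
    "delta_var2": (2.4e-03, 2.0e-03),
}


def z_score(mean, mcse, truth):
    """Distance between the estimate and the true value in units of MCSE."""
    return (mean - truth) / mcse


def outside_two_mcse(estimates):
    """Names whose true value lies outside mean +/- 2 * MCSE."""
    return [
        name
        for name, (mean, mcse) in estimates.items()
        if abs(z_score(mean, mcse, TRUE_VALUES[name])) > 2.0
    ]


print(outside_two_mcse(before_fix))  # ['delta_var1', 'delta_var2']
print(outside_two_mcse(after_fix))   # []
```

Before the fix both variance deltas sit more than ten MCSEs from zero; after the fix all three quantities fall within two MCSEs.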

bbbales2 · 2019-08-29T18:41:50Z

Great thanks!

betanalpha · 2019-08-29T19:11:49Z

Changes pushed -- I updated base_nuts_test with a new instrumented mock sampler that tests the edge momenta that weren't being updated correctly before. I'm going to rerun the performance tests to verify that hasn't been affected, but given that this only shows up at <Dr. Evil voice>one million iterations</Dr. Evil voice> I don't think there will be much of an effect.

I don't think I've ever seen a sampler tested to this level of precision! Pretty awesome.

betanalpha · 2019-08-29T20:41:01Z

Performance tests look good -- small differences but the general story is pretty much the same. I'll post to Discourse.

bob-carpenter · 2019-08-29T23:25:23Z

Indeed. Ben's going to get a reputation for seriously heroic testing (after this and my autodiff test PR).

bbbales2 · 2019-08-30T18:37:08Z

Don't jinx us. This looks good. @bob-carpenter can you merge? I'm not authorized.

* [WIP] Robust U-turn check. Following the recent discussion on the Stan side: stan-dev/stan#2800. For experiment, do not merge.
* typo fix
* bug fix
* Additional U turn check only when depth > 1 (to avoid redundant work).
* further logic to reduce redundant U Turn check.
* bug fix: fix error in recording the end point of the reversed subtree
* Add release note.

Adds keyword argument to dynamic integration transitions and samplers to enable extra subtree termination criterion checks as described in stan-dev/stan#2800. Extra subtree checks are set to be enabled by default for the `DynamicMultinomialHMC` sampler.

Summary: Pull Request resolved: #864

An issue with the U-turn condition was discovered and discussed in [this post in the Stan forum](https://discourse.mc-stan.org/t/nuts-misses-u-turns-runs-in-circles-until-max-treedepth/9727).

TL;DR: we can make the U-turn condition more robust by introducing two additional checks across subtrees. This can help us avoid missing U-turns for approximately iid normal models.

Since the tree-combining code is almost identical in `_build_tree` and `propose`, I also took the chance to refactor it into a common function called `_combine_tree`. If you look closely you will notice that most parts of `_combine_tree` are moved from existing code as-is. The only addition is the two additional calls to `_is_u_turning`.

Related PRs that implement this change:
- Stan: stan-dev/stan#2800
- PyMC3: pymc-devs/pymc#3605
- Turing.jl: TuringLang/AdvancedHMC.jl#207
- DynamicHMC.jl: tpapp/DynamicHMC.jl#145

Reviewed By: neerajprad
Differential Revision: D28735950
fbshipit-source-id: ada4ebcad26a87ef5e697f422b5c5b17007afe42

betanalpha added 4 commits August 11, 2019 21:45

Expanded termination criterion

aa0c811

Add additional no-u-turn checks, clearer naming conventions

f2e6464

Fix typos

17ad3e2

Update tests

555efc0

betanalpha requested a review from bbbales2 August 13, 2019 02:42

Update performance test

cdf6541

bbbales2 requested changes Aug 17, 2019

View reviewed changes

junpenglao mentioned this pull request Aug 24, 2019

Implement robust U-turn check pymc-devs/pymc#3605

Merged

bbbales2 reviewed Aug 28, 2019

View reviewed changes

Update rho aggregation test

104973d

yebai mentioned this pull request Aug 28, 2019

Potential improvement for U-Turn detection TuringLang/AdvancedHMC.jl#94

Closed

betanalpha added 2 commits August 29, 2019 14:14

Update base nuts test

6041a87

Fix bug in updating edge momenta before each expansion

8a65f5b

betanalpha added 3 commits August 29, 2019 14:59

Lint

841c6a1

Add test sensitive to edge momenta in transition

fc7df12

Update performance test

b2deb9e

Hand tuning performance test to pass on Jenkins machine

7bab596

bbbales2 approved these changes Aug 30, 2019

View reviewed changes

bob-carpenter merged commit 6a98b12 into develop Aug 30, 2019

serban-nicusor-toptal added this to the 2.20.0++ milestone Oct 18, 2019

mcol deleted the feature/issue-2799-robust_no_u_turn branch February 22, 2020 12:27

horizon-blue mentioned this pull request Jun 4, 2021

Robust U-turn condition facebookresearch/beanmachine#864

Closed
