Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stan_sample generates R files in the tmpdir? #113

Closed
quantitative-ecologist opened this issue Feb 21, 2022 · 6 comments
Closed

stan_sample generates R files in the tmpdir? #113

quantitative-ecologist opened this issue Feb 21, 2022 · 6 comments

Comments

@quantitative-ecologist
Copy link

Good day,

First, I would like to thank you very much for the development of Stan for Julia. I am trying to understand the workflow of Stan.jl but something bugs me.

I am running this model from StatisticalRethinking.jl :

using Pkg, StatisticalRethinking, StanSample

testdat = Dict("N" => 100, "y" => rand(Normal(0, 1), 100))

stan8_4s = "
data{
  int N;
  vector[N] y;
}
parameters{
  real sigma;
  real alpha;
  real a1;
  real a2;
}
model{
  real mu;
  mu = a1 + a2;
  y ~ normal( mu , sigma );
}
"

path1 = joinpath(pwd(), "test")

m8_4s = SampleModel("m8.4s",
                    stan8_4s,
                    tmpdir = path1)

rc8_4s = stan_sample(m8_4s,
                     data = testdat,
                     seed = 123)

The models runs fine. However, what is wierd is that stan_sample generates four .R files as you can see below:
image

Why would this happen? I see that the four .R files re-generate the data that I already specified in my .jl script. Is this intended behavior from the package? I couldn't find any documentation pointing to this.

I am running the script on Julia REPL in VS Code under a Windows 10 computer OS with Julia version 1.6.1. The versions for StatisticalRethinking are 3.4.3 and 3.0.11 for StanSample.

Thank you very much for your help and I hope this is the proper way to report any issues/questions with the package.

Maxime

@goedman
Copy link
Collaborator

goedman commented Feb 21, 2022

Hi Maxime ( @maximerischard ),

That is a pretty old version of StanSample.jl. Are you sure this is StanSample v3.0.11? This looks more like v5 or v6?

By default the number of chains is indeed 4. Dependent on which version of StanSample you are running I can show you how to change that.

Which version of cmdstan are you using? You can see that in the first few lines of any of the .csv files in your path1.

Stan's executable (cmdstan) expects input data in the RDump format (or JSON format in StanSample.jl v6.3.0) hence I need to rework the Dict() you use for input data into the RDump format.

Let me know if these instructions help.

Best, Rob

@quantitative-ecologist
Copy link
Author

Hi Rob,

Thank you for your quick answer. So I was indeed using an older version of StanSample so I updated all my Julia packages and am now using StanSample v6.3.0. I switched from the example above to the one on the Stan.jl page : https://stanjulia.github.io/Stan.jl/latest/WALKTHROUGH2/

The files created by stan_sample() are now JSON files. However, the model fails to run and the .csv files are not created. It gives me this error message :

# Run the model
rc6_1s = stan_sample(m6_1s; data, seed=-1, delta=0.85);

num_chains=1 is either mistyped or misplaced.
Failed to parse command arguments, cannot run model.
num_chains=1 is either mistyped or misplaced.
Failed to parse command arguments, cannot run model.num_chains=1 is either mistyped or misplaced.num_chains=1 is either mistyped or 
misplaced.

F
ailed to parse command arguments, cannot run model.F
ailed to parse command arguments, cannot run model.
ERROR: LoadError: failed processes:
  Process(`'C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s.exe' sample num_chains=1 num_samples=1000 num_warmup=1000 save_warmup=0 thin=1 adapt engaged=1 gamma=0.05 delta=0.85 kappa=0.75 t0=10 init_buffer=75 term_buffer=75 window=25 algorithm=hmc engine=nuts max_depth=10 metric=diag_e stepsize=1.0 stepsize_jitter=0.0 random seed=-1 init=2 id=1 data 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_data_1.json' output 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_chain_1.csv' refresh=100`, ProcessExited(1)) [1]
  Process(`'C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s.exe' sample num_chains=1 num_samples=1000 num_warmup=1000 save_warmup=0 thin=1 adapt engaged=1 gamma=0.05 delta=0.85 kappa=0.75 t0=10 init_buffer=75 term_buffer=75 window=25 algorithm=hmc engine=nuts max_depth=10 metric=diag_e stepsize=1.0 stepsize_jitter=0.0 random seed=-1 init=2 id=2 data 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_data_2.json' output 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_chain_2.csv' refresh=100`, ProcessExited(1)) [1]
  Process(`'C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s.exe' sample num_chains=1 num_samples=1000 num_warmup=1000 save_warmup=0 thin=1 adapt engaged=1 gamma=0.05 delta=0.85 kappa=0.75 t0=10 init_buffer=75 term_buffer=75 window=25 algorithm=hmc engine=nuts max_depth=10 metric=diag_e stepsize=1.0 stepsize_jitter=0.0 random seed=-1 init=2 id=3 data 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_data_3.json' output 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_chain_3.csv' refresh=100`, ProcessExited(1)) [1]
  Process(`'C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s.exe' sample num_chains=1 num_samples=1000 num_warmup=1000 save_warmup=0 thin=1 adapt engaged=1 gamma=0.05 delta=0.85 kappa=0.75 t0=10 init_buffer=75 term_buffer=75 window=25 algorithm=hmc engine=nuts max_depth=10 metric=diag_e stepsize=1.0 stepsize_jitter=0.0 random seed=-1 init=2 id=4 data 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_data_4.json' output 'file=C:\Users\maxim\OneDrive\Bureau\tests_julia\test\m6.1s_chain_4.csv' refresh=100`, ProcessExited(1)) [1]

Stacktrace:
 [1] pipeline_error(procs::Base.ProcessChain)
   @ Base .\process.jl:538
 [2] run(::Base.CmdRedirect; wait::Bool)
   @ Base .\process.jl:440
 [3] run
   @ .\process.jl:438 [inlined]
 [4] stan_run(m::SampleModel; kwargs::Base.Iterators.Pairs{Symbol, Any, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:data, :seed, :delta), Tuple{NamedTuple{(:H, :LL, :LR, :N), Tuple{Vector{Float64}, Vector{Float64}, Vector{Float64}, Int64}}, Int64, Float64}}})     
   @ StanSample C:\Users\maxim\.julia\packages\StanSample\4ayOP\src\stanrun\stan_run.jl:129
 [5] top-level scope
   @ c:\Users\maxim\OneDrive\Bureau\tests_julia\code\test_model.jl:62
in expression starting at c:\Users\maxim\OneDrive\Bureau\tests_julia\code\test_model.jl:62

My Julia startup file points to this version of cmdstan : ENV["JULIA_CMDSTAN_HOME"]="C:\Users\maxim\cmdstan-2.27.0"

Thank you very much for your help again :)

Maxime

@goedman
Copy link
Collaborator

goedman commented Feb 22, 2022 via email

@goedman
Copy link
Collaborator

goedman commented Feb 22, 2022

Hi Maxine,

A new version of StanSample.jl (v6.3.1) just got merged which I believe will, by default, be compatible with older versions of the cmdstan as long as the keyword argument use_cpp_chains=false (in the stan_sample() call). That is the default and multiple chains will execute on Julia level ( a process for each chain).

So in the walkthrough examples:

rc6_1s = stan_sample(m6_1s; data, seed=-1, delta=0.85);
if success(rc6_1s)
   df = read_samples(m6_1s, :dataframe)
end

will return a single DataFrame with all 4000 draws. Don't think alpha will be returned (defined but not used in your Stan Language model).

Best, Rob

@quantitative-ecologist
Copy link
Author

Hi Rob,

I updated to the newest version and the model runs like a charm. Thank you for your help. I look forward to using Stan.jl for my PhD project. I might get back to you soon when I start running complex multivariate hierarchical models :)

Best,

Maxime

@goedman
Copy link
Collaborator

goedman commented Feb 24, 2022

Your welcome Maxime (@quantitative-ecologist ),

Glad it all works again, I should have come up with this approach much earlier.

There are a few minor updates in the Stan Language that have been introduces in v2.28 and I think will be rejected in v2.32 (should be released in about a year). They are minor, e.g.:
"vector[N] height;" in stead of "real height[N];"

Just a heads up.

Best,
Rob

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants