r_session$new() affects the random seed generator in a non-reproducible way on Windows #390

Closed
dfalbel opened this issue Jan 28, 2025 · 7 comments

Comments

@dfalbel

dfalbel commented Jan 28, 2025

For example, on Windows, if you run a few times:

set.seed(1)
digest::sha1(.Random.seed)
sess <- callr::r_session$new()
digest::sha1(.Random.seed)

You'll see that the RNG state is different after every execution.

@gaborcsardi gaborcsardi transferred this issue from r-lib/callr Jan 28, 2025
@gaborcsardi
Member

Indeed, to generate a unique pipe name:

processx/src/win/stdio.c

Lines 93 to 98 in 118704a

static void processx__unique_pipe_name(char* ptr, char* name, size_t size) {
  int r;
  GetRNGstate();
  r = (int)(unif_rand() * 65000);
  snprintf(name, size, "\\\\?\\pipe\\px\\%p-%lu", ptr + r, GetCurrentProcessId());
  PutRNGstate();

R CMD check forces me to use R's RNGs, so that is not OK, either?

@dfalbel
Author

dfalbel commented Jan 28, 2025

I believe it's OK to use R's RNG, but starting a session should modify the RNG state in a predictable manner, so that the RNG state is consistent across multiple runs. With the current behavior, if you call set.seed() and later use r_session$new(), then future outputs will not depend on the starting seed.

@gaborcsardi
Member

So how is the code in the original post different than this?

set.seed(1)
digest::sha1(.Random.seed)
rnorm(5)
digest::sha1(.Random.seed)

I am sorry, my argument is not with you, really. But I am also not entirely sure what to do here.

@dfalbel
Author

dfalbel commented Jan 28, 2025

Here are two runs from the above and another two runs using the r_session:

> set.seed(1)
+ digest::sha1(.Random.seed)
+ rnorm(1)
+ digest::sha1(.Random.seed)
[1] "26d295cbeca7a799eb5b849b285aacf83794392c"
[1] -0.6264538
[1] "20e54edc8504855ccc29c3f9abb77ac5651300e8"
> set.seed(1)
+ digest::sha1(.Random.seed)
+ rnorm(1)
+ digest::sha1(.Random.seed)
[1] "26d295cbeca7a799eb5b849b285aacf83794392c"
[1] -0.6264538
[1] "20e54edc8504855ccc29c3f9abb77ac5651300e8"
> set.seed(1)
+ digest::sha1(.Random.seed)
+ sess <- callr::r_session$new()
+ digest::sha1(.Random.seed)
[1] "26d295cbeca7a799eb5b849b285aacf83794392c"
[1] "2a53052f8dfedfe31831a157ee373c93c695eaeb"
> set.seed(1)
+ digest::sha1(.Random.seed)
+ sess <- callr::r_session$new()
+ digest::sha1(.Random.seed)
[1] "26d295cbeca7a799eb5b849b285aacf83794392c"
[1] "00b61021b84384fb2292842dc1347bc75a9529ce"

Note that r_session is modifying the state differently between runs.

@gaborcsardi
Member

Well, yes, because it keeps generating random file names until one does not exist, and you are creating the second session while the first is still around, so that pipe file already exists. If you re-run it in a different session, you get the same result:

> RS -q -e "set.seed(1); rs <- callr::r_session[['new']](); digest::digest(.Random.seed)"
> set.seed(1); rs <- callr::r_session[['new']](); digest::digest(.Random.seed)
[1] "34ef6514589f20e41c2f1da6bdce71b1"

> RS -q -e "set.seed(1); rs <- callr::r_session[['new']](); digest::digest(.Random.seed)"
> set.seed(1); rs <- callr::r_session[['new']](); digest::digest(.Random.seed)
[1] "34ef6514589f20e41c2f1da6bdce71b1"

But yeah, I guess the file name is not something that is supposed to be reproducible, so I should not use the R RNG. Sadly, it is kind of hard to run the equivalent of basename(tempfile()) from the Windows C API. I'll come up with something.
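
For illustration only, here is a minimal sketch of one way a unique pipe name could be built without touching R's RNG. It is not the fix that processx adopted. It assumes that a per-process counter combined with the process ID is unique enough, and that a caller-supplied callback reports whether a name is already taken. Every `px_sketch_*` name is hypothetical, and the non-Windows branch exists only so the sketch compiles elsewhere. Real code would presumably test a name by trying to create the pipe rather than by consulting a list.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
static unsigned long px_sketch_pid(void) {
  return (unsigned long) GetCurrentProcessId();
}
#else
#include <unistd.h>
/* Fallback so the sketch also builds on non-Windows systems. */
static unsigned long px_sketch_pid(void) {
  return (unsigned long) getpid();
}
#endif

/* Hypothetical callback type: returns nonzero if `name` is already in use. */
typedef int (*px_sketch_exists_fn)(const char *name, void *data);

/* Per-process counter. Assumption: calls are not concurrent (not thread-safe). */
static unsigned long px_sketch_counter = 0;

/* Example callback: `data` is a NULL-terminated array of names already taken. */
static int px_sketch_in_list(const char *name, void *data) {
  const char **taken = (const char **) data;
  for (; *taken != NULL; taken++) {
    if (strcmp(name, *taken) == 0) return 1;
  }
  return 0;
}

/* Writes a candidate pipe name made of the process ID and the counter into
 * `name`, retrying with the next counter value while `exists` reports a clash.
 * R's RNG is never used, so .Random.seed stays untouched.
 * Returns 0 on success, -1 if the buffer is too small or all retries clash. */
static int px_sketch_unique_pipe_name(char *name, size_t size,
                                      px_sketch_exists_fn exists, void *data) {
  int tries;
  assert(name != NULL && size > 0);
  for (tries = 0; tries < 1000; tries++) {
    int n = snprintf(name, size, "\\\\?\\pipe\\px\\%lu-%lu",
                     px_sketch_pid(), px_sketch_counter++);
    if (n < 0 || (size_t) n >= size) return -1;  /* name would be truncated */
    if (exists == NULL || !exists(name, data)) return 0;
  }
  return -1;
}
```

Because the counter only advances and the process ID is fixed for the lifetime of the process, two live sessions started from the same R process get different names. Starting a session would therefore draw nothing from the RNG, which is the behavior asked for earlier in the thread.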

@gaborcsardi
Member

I can do a release soon if this is important for you.

@dfalbel
Author

dfalbel commented Jan 28, 2025

Thanks! No hurry for a release, that's not urgent!
