Seeded RNGs only work reproducibly within a cell #192

penelopeysm · 2024-10-10T23:47:40Z

Description

Calling Random.seed! at the top of a notebook doesn't ensure reproducibility of random number generation in other cells; it only works within the same cell that seed! is called.

I'm not sure if this is intentional, but it's slightly counterintuitive behaviour so if it's intentional it might be good to document it somewhere.

A related issue (and perhaps an example of how this behaviour is confusing!) is that the Pumas tutorial on reproducibility says that the second batch of samples in Random.seed!(123); rand(3); rand(3) should always result in "the same behavior [...] every time we run the code". This is true if done in a REPL, but the Quarto source has the second rand(3) in a different cell and consequently the second rand(3) will not always return the same numbers.

MWE

The following notebook works to demonstrate the behaviour. The first call to randn() should always give the same result no matter how many times the notebook is rendered. The second call however will not.

---
engine: julia
---

```{julia}
versioninfo()
```

```{julia}
using Random
Random.seed!(42)
randn()
```

```{julia}
randn()
```

My `versioninfo` from the notebook above

Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_LOAD_PATH = @:@stdlib

The text was updated successfully, but these errors were encountered:

devmotion · 2024-10-11T07:33:52Z

I think this might be an upstream issue but I'm not too familiar with Malt so maybe it's intended and/or could be avoided:

julia> using Malt

julia> w = Malt.Worker();

julia> Malt.remote_eval_fetch(w, quote
           using Random
           Random.seed!(42)
           randn()
       end)
0.7883556016042917

julia> Malt.remote_eval_fetch(w, quote
           randn()
       end)
0.8653602653617717

julia> Malt.remote_eval_fetch(w, quote
           using Random
           Random.seed!(42)
           randn()
       end)
0.7883556016042917

julia> Malt.remote_eval_fetch(w, quote
           randn()
       end)
-0.6715764280647957

MichaelHatherly · 2024-10-11T07:35:13Z

Will investigate, thanks for the thorough issue description @penelopeysm.

jkrumbiegel · 2024-10-11T07:41:15Z

Maybe this has something to do with task-local RNG state? Could be that Malt evals in different tasks per fetch and that this has an effect on the RNG. If they were doing some random number generation themselves that changed the RNG state, then it should still lead to the same numbers if it was the same n operations between two remote_eval_fetches so that can't be it.

devmotion · 2024-10-11T07:56:17Z

Hmm, the RNG object id seems to be stable, so it seems to be the same RNG?

julia> Malt.remote_eval_fetch(w, quote
           objectid(Random.default_rng())
       end)
0xffffffff0968f0b4

julia> Malt.remote_eval_fetch(w, quote
           objectid(Random.default_rng())
       end)
0xffffffff0968f0b4

julia> Malt.remote_eval_fetch(w, quote
           objectid(Random.default_rng())
       end)
0xffffffff0968f0b4

julia> Malt.remote_eval_fetch(w, quote
           objectid(Random.default_rng())
       end)
0xffffffff0968f0b4

jkrumbiegel · 2024-10-11T08:07:35Z

I think that's because it's TaskLocalRNG and that's a singleton type:

julia> TaskLocalRNG() === TaskLocalRNG()
true

The RNG state for that is stored in the task itself.

MichaelHatherly · 2024-10-11T08:09:32Z

cc @fonsp in case you happen to have any insights.

jkrumbiegel · 2024-10-11T08:13:54Z

This shows that we get different task-local RNG state in a new remote_eval_fetch:

julia> Malt.remote_eval_fetch(w, quote
           using Random
           Random.seed!(123)

           rngfields = filter(fieldnames(Task)) do field
               startswith(string(field), "rng")
           end
           state = NamedTuple(map(rngfields) do field
               field => getfield(Base.current_task(), field)
           end)
       end)
(rngState0 = 0xfefa8d41b8f5dca5, rngState1 = 0xf80cc98e147960c1, rngState2 = 0x20e2ccc17662fc1d, rngState3 = 0xea7a7dcb2e787c01, rngState4 = 0xf4e85a418b9c4f80)

julia> Malt.remote_eval_fetch(w, quote
           rngfields = filter(fieldnames(Task)) do field
               startswith(string(field), "rng")
           end
           state = NamedTuple(map(rngfields) do field
               field => getfield(Base.current_task(), field)
           end)
       end)
(rngState0 = 0x4e291591396795b9, rngState1 = 0xc9a680b08b4c9006, rngState2 = 0x3bb5088d1c600f58, rngState3 = 0xe2bb063e7e800a67, rngState4 = 0xe264ff418d928577)

Probably because each invocation is a different task?

julia> Malt.remote_eval_fetch(w, quote
           objectid(Base.current_task())
       end)
0xb68e01fb526e2337

julia> Malt.remote_eval_fetch(w, quote
           objectid(Base.current_task())
       end)
0x49246d797797f2ad

jkrumbiegel · 2024-10-11T08:41:25Z

If this is hard to fix in Malt, I guess we could store the rngState fields after a task runs a cell, and then first thing in the next cell, mutate that one's rngState fields so they take the same values. Not sure if Malt can be configured to use only a single task.

devmotion · 2024-10-11T09:22:08Z

Maybe one could take some inspiration from how Test resets the task-local RNG: https://github.com/JuliaLang/julia/blob/1438b1578941d0f1cc8f8c958cf3bd2927fd482c/stdlib/Test/src/Test.jl#L1694-L1720

jkrumbiegel · 2024-10-11T09:28:41Z

Interesting, it seems this looked quite different on 1.10 where Random.get_tls_seed() did not exist

https://github.com/JuliaLang/julia/blob/6f3fdf7b36250fb95f512a2b927ad2518c07d2b5/stdlib/Test/src/Test.jl#L1567-L1578

fonsp · 2024-10-11T13:46:34Z

@Pangoraw knows best about RNGs and threading in Pluto :)

Pangoraw · 2024-10-11T14:48:55Z

remote_eval_fetch indeed spawns a new task at each call. One solution would be for QuartoNotebookWorker to manage its own persistent task for evaluation and then communicate with this task using channels.

devmotion · 2024-10-11T17:23:43Z

The more I thought about this issue, the more I got the feeling that this is an upstream issue in Malt. The behaviour seems completely surprising, in particular given that everything else is consistent between subsequent Malt.remote_eval_fetch calls and there were not tasks involved in the to-be-evaluated expressions sent to the worker (additionally, in my example above the worker used a single thread). Additionally, it seemed the behaviour was not documented, so I assumed possibly it was not even intentional.

But then I tried the same example with Distributed:

julia> using Distributed

julia> addprocs(1)
1-element Vector{Int64}:
 2

julia> remotecall_fetch(eval, 2, quote
           using Random
           Random.seed!(42)
           randn()
       end)
0.7883556016042917

julia> remotecall_fetch(eval, 2, quote
           randn()
       end)
0.08641660475950964

julia> remotecall_fetch(eval, 2, quote
           using Random
           Random.seed!(42)
           randn()
       end)
0.7883556016042917

julia> remotecall_fetch(eval, 2, quote
           randn()
       end)
1.559594774368473

julia> remotecall_fetch(eval, 2, quote
           objectid(Base.current_task())
       end)
0xebc117a52ab2a7be

julia> remotecall_fetch(eval, 2, quote
           objectid(Base.current_task())
       end)
0xe14f2bb195cecba7

So it seems Malt is consistent with Distributed. I couldn't find any issue related to it in the Distributed repo but in the Julia repo someone had opened an issue regarding this behaviour: JuliaLang/julia#51225 It also links to a Pluto issue: fonsp/Pluto.jl#2290

So my take away is that the behaviour of Malt is consistent with Distributed but that this behaviour - both in Distributed and in Malt - is confusing and surprising for users. Based on the discussion in the Julia issue, I assume Distributed won't be changed - but since there are issues from both Pluto and Quarto users, maybe it could be addressed in Malt? Maybe Malt could expose an API for evaluations within a persistent task?

devmotion · 2024-10-11T18:18:12Z

This is an approach for Distributed (maybe it can be done more elegantly or efficiently?):

julia> using Distributed

julia> addprocs(1)
1-element Vector{Int64}:
 2

julia> const in = RemoteChannel(() -> Channel{Expr}(0));

julia> const out = RemoteChannel(() -> Channel{Any}(0));

julia> @everywhere function evalloop(out, in)
           while true
               expr = take!(in)
               put!(out, eval(expr))
           end
       end

julia> remote_do(evalloop, 2, out, in)

julia> put!(in, quote
           using Random
           Random.seed!(42)
           randn()
       end);

julia> take!(out)
0.7883556016042917

julia> put!(in, quote
           randn()
       end);

julia> take!(out)
-0.8798585959543993

julia> put!(in, quote
           using Random
           Random.seed!(42)
           randn()
       end);

julia> take!(out)
0.7883556016042917

julia> put!(in, quote
           randn()
       end);

julia> take!(out)
-0.8798585959543993

What's the best way to do the same with Malt?

I tried the following without success:

julia> using Malt

julia> w = Malt.Worker();

julia> cin = Malt.worker_channel(w, :(cin = Channel{Expr}(0)));

julia> cout = Malt.worker_channel(w, :(cout = Channel{Any}(0)));

julia> Malt.remote_eval(w, quote
           while true
               expr = take!(cin)
               put!(cout, eval(expr))
           end
       end);

julia> put!(cin, quote
           using Random
           Random.seed!(42)
           randn()
       end);
ERROR: Remote exception from Malt.Worker on port 9591 with PID 96591:

MethodError: Cannot `convert` an object of type Float64 to an object of type Expr
The function `convert` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  Expr(::Any...)
   @ Core boot.jl:271
  convert(::Type{T}, !Matched::T) where T
   @ Base Base.jl:126

Stacktrace:
 [1] put!(c::Channel{Expr}, v::Float64)
   @ Base ./channels.jl:357
 [2] top-level scope
   @ REPL[7]:4
 [3] eval
   @ ./boot.jl:430 [inlined]
 [4] (::var"#1#2"{Sockets.TCPSocket, UInt64, Bool, @Kwargs{}, Tuple{Module, Expr}, typeof(Core.eval)})()
   @ Main ~/.julia/packages/Malt/Z3YQq/src/worker.jl:120
Stacktrace:
 [1] unwrap_worker_result(worker::Malt.Worker, result::Malt.WorkerResult)
   @ Malt ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:50
 [2] _wait_for_response(worker::Malt.Worker, msg_id::UInt64)
   @ Malt ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:325
 [3] _send_receive
   @ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:336 [inlined]
 [4] #remote_call_wait#43
   @ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:422 [inlined]
 [5] remote_call_wait
   @ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:421 [inlined]
 [6] remote_eval_wait
   @ ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:491 [inlined]
 [7] put!(rc::Malt.RemoteChannel{Any}, v::Expr)
   @ Malt ~/.julia/packages/Malt/Z3YQq/src/Malt.jl:532
 [8] top-level scope
   @ REPL[7]:1

OK, the Malt example doesn't error anymore when I remove the type parameter of the Channel (seems like a bug?). But then the random numbers in subsequent evaluations are still not reproducible:

julia> using Malt

julia> w = Malt.Worker();

julia> cin = Malt.worker_channel(w, :(cin = Channel(0)));

julia> cout = Malt.worker_channel(w, :(cout = Channel(0)));

julia> put!(cin, quote
           using Random
           Random.seed!(42)
           randn()
       end);

julia> take!(cout)
0.7883556016042917

julia> put!(cin, quote
           randn()
       end)

julia> take!(cout)
0.10690497033454253

julia> put!(cin, quote
           using Random
           Random.seed!(42)
           randn()
       end);

julia> take!(cout)
0.7883556016042917

julia> put!(cin, quote
           randn()
       end)

julia> take!(cout)
-0.047221952638015874

julia> put!(cin, quote
           using Random
           Random.seed!(42)
           randn()
       end);

julia> take!(cout)
0.7883556016042917

julia> put!(cin, quote
           randn()
       end)

julia> take!(cout)
0.37395225126097864

Should this discrepancy with Distributed be considered a bug? Or do I use Malt incorrectly?

jkrumbiegel · 2024-10-14T09:42:25Z

I thought it would look something like this:

using Malt

function initialize_special_channels(worker)
    @assert Malt.remote_eval_fetch(worker, quote
        var"__special_channel_out" = Channel()
        var"__special_channel_in" = Channel{Expr}() do chan
            for expr in chan
                result = Core.eval(Main, expr)
                put!(var"__special_channel_out", result)
            end
        end
        true
    end)
end

function remote_eval_fetch_channeled(worker, expr::Expr)
    code = quote
        Threads.@spawn put!(var"__special_channel_in", $(QuoteNode(expr)))
        take!(var"__special_channel_out")
    end
    return Malt.remote_eval_fetch(worker, code)
end

w = Malt.Worker()
initialize_special_channels(w)

I do get consistent random numbers with this technique:

devmotion · 2024-10-14T11:00:13Z

Should we go with that? The reproducibility issue came up in our triage call today, and we thought we should inform users in the documentation about this problem if it takes more time to fix it. But if we can fix it right away, this might be the better option?

MichaelHatherly · 2024-10-14T11:15:06Z

Should we go with that?

Sounds fine to me 👍

jkrumbiegel · 2024-10-14T11:31:11Z

I'll attempt a PR

fonsp · 2024-10-14T12:06:59Z

Nice! Could you also make a PR to Malt to add this consistency as a test?

jkrumbiegel · 2024-10-14T12:18:58Z

Nice! Could you also make a PR to Malt to add this consistency as a test?

@fonsp Wait I thought I'd implement this logic in QuartoNotebookRunner, did you think it was a good fit for Malt itself?

fonsp · 2024-10-14T13:50:00Z

No my suggestion is to add a test to Malt for this behaviour, to guarantee that this will continue working in future Malt versions, since you depend on it.

jkrumbiegel · 2024-10-14T14:09:05Z

To my understanding I'm not depending on anything new in Malt with this workaround, I'm depending on Julia's channel and task behavior. If you change your implementation some day to use a stable task for all evals then the workaround might not be necessary anymore but it shouldn't break.

penelopeysm mentioned this issue Oct 10, 2024

Random assertions can fail randomly (despite seeding) TuringLang/docs#533

Closed

MichaelHatherly self-assigned this Oct 11, 2024

fonsp mentioned this issue Oct 11, 2024

Use Threads.@spawn instead of @async JuliaPluto/Malt.jl#86

Open

MichaelHatherly assigned jkrumbiegel and unassigned MichaelHatherly Oct 14, 2024

jkrumbiegel mentioned this issue Oct 14, 2024

Deterministic seeded random streams across runs #194

Merged

MichaelHatherly closed this as completed in #194 Oct 15, 2024

penelopeysm mentioned this issue Oct 18, 2024

Bump QuartoNotebookRunner.jl to 0.11.5 quarto-dev/quarto-cli#11100

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Seeded RNGs only work reproducibly within a cell #192

Seeded RNGs only work reproducibly within a cell #192

penelopeysm commented Oct 10, 2024 •

edited

Loading

devmotion commented Oct 11, 2024 •

edited

Loading

MichaelHatherly commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024 •

edited

Loading

devmotion commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024

MichaelHatherly commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024 •

edited

Loading

jkrumbiegel commented Oct 11, 2024

devmotion commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024

fonsp commented Oct 11, 2024

Pangoraw commented Oct 11, 2024

devmotion commented Oct 11, 2024

devmotion commented Oct 11, 2024 •

edited

Loading

jkrumbiegel commented Oct 14, 2024

devmotion commented Oct 14, 2024

MichaelHatherly commented Oct 14, 2024

jkrumbiegel commented Oct 14, 2024

fonsp commented Oct 14, 2024

jkrumbiegel commented Oct 14, 2024

fonsp commented Oct 14, 2024

jkrumbiegel commented Oct 14, 2024

Seeded RNGs only work reproducibly within a cell #192

Seeded RNGs only work reproducibly within a cell #192

Comments

penelopeysm commented Oct 10, 2024 • edited Loading

Description

MWE

My versioninfo from the notebook above

devmotion commented Oct 11, 2024 • edited Loading

MichaelHatherly commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024 • edited Loading

devmotion commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024

MichaelHatherly commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024 • edited Loading

jkrumbiegel commented Oct 11, 2024

devmotion commented Oct 11, 2024

jkrumbiegel commented Oct 11, 2024

fonsp commented Oct 11, 2024

Pangoraw commented Oct 11, 2024

devmotion commented Oct 11, 2024

devmotion commented Oct 11, 2024 • edited Loading

jkrumbiegel commented Oct 14, 2024

devmotion commented Oct 14, 2024

MichaelHatherly commented Oct 14, 2024

jkrumbiegel commented Oct 14, 2024

fonsp commented Oct 14, 2024

jkrumbiegel commented Oct 14, 2024

fonsp commented Oct 14, 2024

jkrumbiegel commented Oct 14, 2024

penelopeysm commented Oct 10, 2024 •

edited

Loading

My `versioninfo` from the notebook above

devmotion commented Oct 11, 2024 •

edited

Loading

jkrumbiegel commented Oct 11, 2024 •

edited

Loading

jkrumbiegel commented Oct 11, 2024 •

edited

Loading

devmotion commented Oct 11, 2024 •

edited

Loading