pm.sample(random_seed) sets the random_seed of functions from other packages #5714

benslack19 · 2022-04-15T21:15:24Z

Description of your problem

The random_seed passed to pm.sample affects the output of random functions from other packages. This is true in all pymc versions I've tested (3.11.0, 3.11.5, 4.0.0b6). I find this unexpected, since I would expect the model context to confine the seed to the pymc code block.

import pymc as pm
import scipy.stats as stats

for i in range(5):
    print(f" --- loop{i} --- ", end='\n')
    test_vals = stats.norm.rvs(loc=1, scale=1, size=5)
    print(test_vals, end='\n\n')

    with pm.Model() as m:
        dummy = pm.Normal("dummy", 0, 0.5)
        trace_test = pm.sample(
            draws=100, random_seed=0, return_inferencedata=False, progressbar=False
        )

There's no error but you can see that the stats.norm.rvs output repeats itself after the first loop.

# I'm omitting the standard pymc comments for readability

--- loop0 --- 
[1.24009922 0.66126999 1.03865944 1.54606376 1.59789667]

 --- loop1 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

 --- loop2 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

 --- loop3 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

 --- loop4 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

Removing the random_seed or setting random_seed=None does not show this behavior.

Versions and main components

PyMC/PyMC3 Version: 4.0.0b6
Aesara/Theano Version:
Scipy version: 1.8.0
Python Version: 3.9.12
Operating system: macOS 11.6.2 (20G314)
How did you install PyMC/PyMC3: pip


ricardoV94 · 2022-04-15T21:23:56Z

Unfortunately (most of) our samplers are dependent on global seeding. It will require a lot of refactoring to overcome that.

I am closing this as a duplicate of #5093

benslack19 · 2022-04-16T08:33:19Z

Thank you @ricardoV94. I'm wondering if a simple "reset" of the seed could occur at the end of a model context block within the function, by setting np.random.seed(seed=None).

I was able to do this in a modification of what I had above. The result is that random_seed within pm.sample still gives the expected, reproducible output but no longer affects the stats.norm.rvs output.

import numpy as np  # needed for the np.random.seed call below

for i in range(5):
    print(f" --- loop{i} --- ", end='\n')
    test_vals = stats.norm.rvs(loc=1, scale=1, size=5)
    print("stats.norm.rvs values: ", test_vals, end='\n\n')

    with pm.Model() as m:
        dummy = pm.Normal("dummy", 0, 0.5)
        trace_test = pm.sample(
            draws=100, random_seed=0, return_inferencedata=False, progressbar=False
        )
        print("check pymc sampling with random_seed=0")
        print(trace_test['dummy'][0:5], end='\n')

        print("reset random seed")
        np.random.seed(seed=None)

Output. Pymc output omitted for readability.

--- loop0 --- 
stats.norm.rvs values:  [ 0.019048   -0.16396249 -1.36197385  1.33321176  1.46444872]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop1 --- 
stats.norm.rvs values:  [ 0.85087594  0.77433995  1.17812175  1.17992299 -0.34280457]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop2 --- 
stats.norm.rvs values:  [ 1.16744827 -1.30708103  2.54481578  1.74631898 -0.66179566]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop3 --- 
stats.norm.rvs values:  [0.03289028 1.4035661  1.56154409 2.43393023 2.47116958]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop4 --- 
stats.norm.rvs values:  [0.11886464 0.61403243 0.16277635 1.78810325 1.53460163]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

ricardoV94 · 2022-04-16T08:39:40Z

That would still be problematic: we would be erasing users' global seeds if they had set them.

AFAICT, a proper solution will require moving away completely from any use of global seeding. We can't make safe assumptions about how users are using global seeding outside of our library.
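Until then, one user-side way to avoid depending on the global seed is to hand other libraries their own generator. The following is a minimal sketch, not something PyMC does internally: the name `rng` and the seed value are illustrative, and the `np.random.seed(0)` calls only stand in for a library that reseeds the global state. It relies on `stats.norm.rvs` accepting a `random_state` argument.

```python
import numpy as np
from scipy import stats

# Local generator owned by the caller; it does not read the global NumPy state.
rng = np.random.default_rng(8927)

# Stand-in for a library call that reseeds the global state (assumption).
np.random.seed(0)
a = stats.norm.rvs(loc=1, scale=1, size=5, random_state=rng)

# Reseeding the global state again does not rewind the local generator.
np.random.seed(0)
b = stats.norm.rvs(loc=1, scale=1, size=5, random_state=rng)

print(a)
print(b)
```

With this pattern the two draws differ from each other, yet the whole sequence is reproducible from the seed given to `default_rng`.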

benslack19 · 2022-04-17T15:37:12Z

OK thank you @ricardoV94. Glad I checked with you before working on my first pymc pull request!

ricardoV94 · 2022-04-17T16:45:51Z

If you would like to work on this issue it would have a big impact! Even just exploring what needs to be done / possible solutions would be invaluable in itself.

benslack19 · 2022-04-18T02:51:41Z

Thanks. Not sure how much time I can really devote to this, but one thing I thought could be a workaround would be to set the random state back to what it was before. This can, in principle, work, but it also has drawbacks depending on the use case (e.g. my example of running in a loop and calling stats.norm.rvs).

RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
st8297 = np.random.get_state()         # <------- get the pre-set random state

for i in range(5):
    print(f" --- loop{i} --- ", end='\n')
    test_vals = stats.norm.rvs(loc=1, scale=1, size=5)
    print("stats.norm.rvs values: ", test_vals, end='\n\n')

    with pm.Model() as m:
        dummy = pm.Normal("dummy", 0, 0.5)
        trace_test = pm.sample(
            draws=100, random_seed=0, return_inferencedata=False, progressbar=False      # <------- pymc random state
        )
        print("check pymc sampling with random_seed=0")
        print(trace_test['dummy'][0:5], end='\n')
        
        print("reset random seed to what it was before")
        np.random.set_state(st8297)                                    # <------- go back to the initial random state

Output


 --- loop0 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]


check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop1 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]


check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop2 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]

check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop3 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]


check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop4 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]

check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]

This wouldn't work in my case, because I want stats.norm.rvs to give a different random sample on each draw, but it might be okay for someone else. For now, I'm just going to stop setting the random_seed parameter within the pymc model code, since it can affect things globally.
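A possible variation, sketched here under assumptions: capture the global state immediately before the sampling call instead of once before the loop, and restore it right after. The global stream then continues where it left off, so each loop iteration gets fresh values while the run as a whole stays reproducible. The helper `preserve_global_numpy_state` and the stand-in `reseeding_library_call` are hypothetical names; the stand-in only mimics what pm.sample(random_seed=0) appears to do to the global state in this thread, and this has not been verified against pm.sample itself.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def preserve_global_numpy_state():
    """Save the global NumPy random state on entry and restore it on exit."""
    state = np.random.get_state()
    try:
        yield
    finally:
        np.random.set_state(state)


def reseeding_library_call():
    # Hypothetical stand-in for a call that reseeds the global state.
    np.random.seed(0)
    return np.random.normal(size=3)


np.random.seed(8927)
draws = []
for i in range(3):
    # Draw from the global stream, as stats.norm.rvs does by default.
    draws.append(np.random.normal(size=2))
    with preserve_global_numpy_state():
        reseeding_library_call()

# Reference: the same global stream with no reseeding call in between.
np.random.seed(8927)
expected = [np.random.normal(size=2) for _ in range(3)]
```

In the loop from the example above, the `with preserve_global_numpy_state():` block would wrap the pm.sample call.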

ricardoV94 closed this as completed Apr 15, 2022
