pm.sample(random_seed) sets the random_seed of functions from other packages #5714

Closed
benslack19 opened this issue Apr 15, 2022 · 6 comments

Comments

@benslack19

Description of your problem

The random_seed passed to pm.sample affects the output of random functions from other packages. This is true in all pymc versions I've tested (3.11.0, 3.11.5, 4.0.0b6). I find this unexpected, since I would think that the model context would confine parameters to the pymc code block.

import pymc as pm
import scipy.stats as stats

for i in range(5):
    print(f" --- loop{i} --- ", end='\n')
    test_vals = stats.norm.rvs(loc=1, scale=1, size=5)
    print(test_vals, end='\n\n')

    with pm.Model() as m:
        dummy = pm.Normal("dummy", 0, 0.5)
        trace_test = pm.sample(
            draws=100, random_seed=0, return_inferencedata=False, progressbar=False
        )

There's no error but you can see that the stats.norm.rvs output repeats itself after the first loop.

# I'm omitting the standard pymc comments for readability

--- loop0 --- 
[1.24009922 0.66126999 1.03865944 1.54606376 1.59789667]

 --- loop1 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

 --- loop2 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

 --- loop3 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

 --- loop4 --- 
[1.97873798 3.2408932  2.86755799 0.02272212 1.95008842]

Removing random_seed, or setting random_seed=None, makes this behavior go away.
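The repetition can be reproduced without PyMC. The sketch below assumes only that some library call reseeds NumPy's global RNG; `library_call` is a hypothetical stand-in for that, not PyMC's actual code. By default, scipy.stats rvs methods draw from that same global NumPy state, which is why they are affected in the same way as the `np.random.normal` call used here.

```python
import numpy as np


def library_call(seed):
    """Hypothetical stand-in for a library function that seeds NumPy's global RNG."""
    np.random.seed(seed)              # mutates process-wide state
    return np.random.normal(size=3)   # the library's own draws


draws = []
for i in range(4):
    # This draw uses the same global RNG that the "library" reseeds below.
    draws.append(np.random.normal(loc=1, scale=1, size=5))
    library_call(seed=0)

# From the second iteration on, each draw follows an identical
# "reseed, then three library draws" sequence, so the values repeat.
repeats = all(np.array_equal(draws[1], d) for d in draws[2:])
print("draws repeat from loop1 onward:", repeats)
```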

Versions and main components

  • PyMC/PyMC3 Version: 4.0.0b6
  • Aesara/Theano Version:
  • Scipy version: 1.8.0
  • Python Version: 3.9.12
  • Operating system: macOS 11.6.2 (20G314)
  • How did you install PyMC/PyMC3: pip
@ricardoV94
Member

ricardoV94 commented Apr 15, 2022

Unfortunately (most of) our samplers are dependent on global seeding. It will require a lot of refactoring to overcome that.

I am closing this as a duplicate of #5093

@benslack19
Author

Thank you @ricardoV94. I'm wondering if a simple "reset" of the seed could occur at the end of a model context block within the function, by setting np.random.seed(seed=None).

I was able to do this in a modification of what I had above. As a result, random_seed within pm.sample still gives the expected, reproducible output, but it no longer affects the stats.norm.rvs output.

import numpy as np
import pymc as pm
import scipy.stats as stats

for i in range(5):
    print(f" --- loop{i} --- ", end='\n')
    test_vals = stats.norm.rvs(loc=1, scale=1, size=5)
    print("stats.norm.rvs values: ", test_vals, end='\n\n')

    with pm.Model() as m:
        dummy = pm.Normal("dummy", 0, 0.5)
        trace_test = pm.sample(
            draws=100, random_seed=0, return_inferencedata=False, progressbar=False
        )
        print("check pymc sampling with random_seed=0")
        print(trace_test['dummy'][0:5], end='\n')

        print("reset random seed")
        np.random.seed(seed=None)

Output. PyMC output omitted for readability.

--- loop0 --- 
stats.norm.rvs values:  [ 0.019048   -0.16396249 -1.36197385  1.33321176  1.46444872]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop1 --- 
stats.norm.rvs values:  [ 0.85087594  0.77433995  1.17812175  1.17992299 -0.34280457]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop2 --- 
stats.norm.rvs values:  [ 1.16744827 -1.30708103  2.54481578  1.74631898 -0.66179566]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop3 --- 
stats.norm.rvs values:  [0.03289028 1.4035661  1.56154409 2.43393023 2.47116958]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

 --- loop4 --- 
stats.norm.rvs values:  [0.11886464 0.61403243 0.16277635 1.78810325 1.53460163]

check pymc sampling with random_seed=0
[-0.46795611 -0.23220823  0.22529251  0.76628907 -0.71146223]
reset random seed

@ricardoV94
Member

ricardoV94 commented Apr 16, 2022

That would still be problematic: we would be erasing users' global seeds if they had set them.

AFAICT, a proper solution will require moving away completely from any use of global seeding. We can't make safe assumptions about how users are using global seeding outside of our library.
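On the user side, the same idea can be sketched by giving your own code a local generator instead of relying on the global one. This is a minimal illustration, not a description of how PyMC will be refactored; `library_call` is again a hypothetical stand-in for any code that reseeds the global RNG.

```python
import numpy as np


def library_call(seed):
    """Hypothetical stand-in for code that reseeds NumPy's global RNG."""
    np.random.seed(seed)


# A Generator owned by the caller. It does not share state with the
# np.random.* module-level functions, so global reseeding cannot touch it.
rng = np.random.default_rng()

draws = []
for i in range(3):
    draws.append(rng.normal(loc=1, scale=1, size=5))
    library_call(seed=0)  # has no effect on `rng`

for i, d in enumerate(draws):
    print(f"loop{i}:", d)
```

The scipy.stats rvs methods accept a `random_state` argument, so the original example could pass the same generator, e.g. `stats.norm.rvs(loc=1, scale=1, size=5, random_state=rng)`.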

@benslack19
Author

OK thank you @ricardoV94. Glad I checked with you before working on my first pymc pull request!

@ricardoV94
Member

If you would like to work on this issue it would have a big impact! Even just exploring what needs to be done / possible solutions would be invaluable in itself.

@benslack19
Author

Thanks. Not sure how much time I can really devote to this, but one thing I thought could be a workaround would be to set the random state back to what it was before. This can, in principle, work, but it also has drawbacks depending on the use case (e.g. my example of running in a loop and calling stats.norm.rvs).

import numpy as np
import pymc as pm
import scipy.stats as stats

RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
initial_state = np.random.get_state()  # <------- get the pre-set random state

for i in range(5):
    print(f" --- loop{i} --- ", end='\n')
    test_vals = stats.norm.rvs(loc=1, scale=1, size=5)
    print("stats.norm.rvs values: ", test_vals, end='\n\n')

    with pm.Model() as m:
        dummy = pm.Normal("dummy", 0, 0.5)
        trace_test = pm.sample(
            draws=100, random_seed=0, return_inferencedata=False, progressbar=False  # <------- pymc random state
        )
        print("check pymc sampling with random_seed=0")
        print(trace_test['dummy'][0:5], end='\n')

        print("reset random seed to what it was before")
        np.random.set_state(initial_state)  # <------- go back to the initial random state

Output


 --- loop0 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]


check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop1 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]


check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop2 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]

check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop3 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]


check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]
reset random seed to what it was before

 --- loop4 --- 
stats.norm.rvs values:  [ 0.70856787 -0.27033081  1.91979879  3.29447325  0.661021  ]

check pymc sampling with random_seed=0
[-0.11492257 -0.00398722  0.26818099  0.75019017  0.160017  ]

This wouldn't work for me, because I want stats.norm.rvs to return a different random sample on each draw, but it might be okay for someone else. For now, I'm just going to stop setting the random_seed parameter within the pymc model code, since it can affect things globally.
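A variation on the workaround above is to capture the global state immediately before the seeded call and restore it immediately after, so the global stream continues where it left off instead of restarting on every loop. The sketch below is an assumption-laden illustration: `preserve_global_numpy_state` is a hypothetical helper, the inner `np.random.seed(0)` stands in for `pm.sample(random_seed=0)`, and it has not been verified against PyMC itself. It only covers NumPy's global state in the calling process.

```python
import contextlib

import numpy as np


@contextlib.contextmanager
def preserve_global_numpy_state():
    """Hypothetical helper: restore NumPy's global RNG state on exit."""
    state = np.random.get_state()
    try:
        yield
    finally:
        np.random.set_state(state)


np.random.seed(8927)
first = np.random.normal(size=3)
with preserve_global_numpy_state():
    np.random.seed(0)           # stands in for pm.sample(random_seed=0)
    np.random.normal(size=10)   # the seeded call's own draws
second = np.random.normal(size=3)

# Reference: the same global stream with nothing in between.
np.random.seed(8927)
expected_first = np.random.normal(size=3)
expected_second = np.random.normal(size=3)

print(np.array_equal(first, expected_first) and np.array_equal(second, expected_second))
```

With this pattern a user-set global seed is kept rather than erased, and draws outside the block still differ from one loop iteration to the next.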
