The first selfplay worker uses the same seed for all parallel environments #27

rPortelas opened this issue May 25, 2022 · 2 comments

I might have found an unexpected behavior in how parallel training environments are being seeded.

I am referring to this line:

envs = [self.config.new_game(self.config.seed + self.rank * i) for i in range(env_nums)]

Because the rank of the first selfplay worker is 0, self.config.seed + self.rank * i reduces to self.config.seed for every i, so all of that worker's parallel environments are initialized with the same seed, which might reduce training data diversity.
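For illustration (with a hypothetical config.seed of 0 and env_nums of 4), the seeds produced for the rank-0 worker collapse to a single value:

rank, seed, env_nums = 0, 0, 4                  # hypothetical values, just for illustration
[seed + rank * i for i in range(env_nums)]
# -> [0, 0, 0, 0]  every environment receives the same seed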

We could go for a simple fix like replacing self.rank with (self.rank + 1); however, this is still problematic when running multiple workers, as their seed sequences will overlap anyway (see the example below).
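To make the overlap concrete (same hypothetical values as above, seed = 0 and env_nums = 4), two workers using (self.rank + 1) would produce:

seed, env_nums = 0, 4                             # hypothetical values, just for illustration
[seed + (0 + 1) * i for i in range(env_nums)]     # worker with rank 0 -> [0, 1, 2, 3]
[seed + (1 + 1) * i for i in range(env_nums)]     # worker with rank 1 -> [0, 2, 4, 6], overlaps at 0 and 2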

A good option might be to sample a seed for each parallel environment using numpy (which is seeded before launching data workers). For instance:

envs = [self.config.new_game(np.random.randint(10**9)) for i in range(env_nums)]

@jamesliu

Ditto, but using randint may make runs irreproducible.

@rPortelas (Author) commented May 25, 2022

Hmm, right. Thanks for the input.

Then we could use a dedicated random state created from the original seed:

rnd_state = np.random.RandomState(self.config.seed + self.rank)
envs = [self.config.new_game(rnd_state.randint(10**9)) for _ in range(env_nums)]
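For context, here is a minimal sketch of how that could sit inside a worker's setup. The class and its constructor arguments are hypothetical; only self.rank, self.config.seed and self.config.new_game mirror the snippets above:

import numpy as np

class SelfPlayWorker:                     # hypothetical wrapper, only to show the seeding pattern
    def __init__(self, rank, config, env_nums):
        self.rank = rank
        self.config = config
        # One RandomState per worker, derived from the global seed and the worker rank,
        # so each worker draws a distinct but reproducible sequence of environment seeds.
        rnd_state = np.random.RandomState(self.config.seed + self.rank)
        env_seeds = [rnd_state.randint(10**9) for _ in range(env_nums)]
        self.envs = [self.config.new_game(s) for s in env_seeds]

Re-running with the same config.seed reproduces exactly the same environment seeds, and workers with different ranks draw from different streams, so this addresses both the diversity and the reproducibility concerns (collisions across workers are still possible with randint, just unlikely).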
