Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: have access to terminal_observation in the infos. #233

Merged
merged 8 commits into from
Nov 28, 2023

Conversation

KaleabTessera
Copy link
Contributor

Closes #232.

This allows access to terminal_observation, which previously wasn't possible when using pettingzoo_env_to_vec_env_v1/MarkovVectorEnv.

@KaleabTessera
Copy link
Contributor Author

@elliottower Do you mind having a look at this? This is quite important since if you don't have the correct terminal obs, you can't do bootstrapping correctly.

@elliottower
Copy link
Contributor

Hey @KaleabTessera sorry about this, will take a look now. Just approved the ci workflows

@elliottower
Copy link
Contributor

Going to get confirmation from another Farama dev who has more experience with this type of thing than me cause I’m not 100% confident in my ability to check correctness for this

@elliottower
Copy link
Contributor

I see a few errors in the CI but I'm not sure they are anything to do with the changes you've made, testing it locally myself just to confirm and get a better idea (may be something with pettingzoo instead not sure)

@elliottower
Copy link
Contributor

Fixed the bug on PettingZoo's end, queued up the CI to run (will take a little while as the PZ CI is running as well), I think it should all pass now

@elliottower
Copy link
Contributor

Oh right I need to do a pettingzoo release for this to pass, this will have to wait for a few days as we are waiting on the AgileRL tutorial bugs to be fixed

@elliottower
Copy link
Contributor

PZ release is out in case you didn't see, so this should be unblocked now. Approved the workflows just now

@KaleabTessera
Copy link
Contributor Author

Thanks @elliottower ! I added this code to ensure that a seed used even when reset is called here. Not sure if it is necessary, what do you think?

A similar thing is done in stable baselines' vec env

@elliottower
Copy link
Contributor

That sounds reasonable to me, going to get input from another dev to see if it makes sense to them

@ffelten
Copy link
Contributor

ffelten commented Nov 27, 2023

Thanks @elliottower ! I added this code to ensure that a seed used even when reset is called here. Not sure if it is necessary, what do you think?

A similar thing is done in stable baselines' vec env

Hi,

I don't think it makes sense in this case since the underlying env (the parallel PZ env) is only a single entity.

@elliottower
Copy link
Contributor

Jet (@jjshoots) also said he thought this shouldn't be implemented, here's a screenshot of what he said.

Discord_6EwUEsGCck

@@ -52,6 +52,13 @@ def step_wait(self):
return self.step(self._saved_actions)

def reset(self, seed=None, options=None):
if seed is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused, won't this pass the same seed to all of the environments, therefore, this does the opposite of what you want.
Even so, this should be part of a second PR, not this one if possible

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this will create the same seed. Each time np.random.randint is called, np internally updates its internal state, meaning that a new seed is created in the next call - docs explaining this.

This would only create the same seed for all envs if np.random.seed is used in each process to set them all to have the same seed.

Nonetheless, I agree this should be removed from this PR.

@KaleabTessera
Copy link
Contributor Author

So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.

I still think there is a possible issue for seeding in this scenario :

  • You pass a seed into the environment when calling the first env.reset(.
  • Then the env loops itself, and manually calls env.reset here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when calling reset from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds.

I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.

@ffelten
Copy link
Contributor

ffelten commented Nov 27, 2023

So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.

I still think there is a possible issue for seeding in this scenario :

  • You pass a seed into the environment when calling the first env.reset(.
  • Then the env loops itself, and manually calls env.reset here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when calling reset from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds.

I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.

Normally the environment should handle this: the first reset() is made with a seed. Then an internal attribute keeps track of this seed so even if there are subsequent calls to reset() with seed being None, they are still seeded by the first call.

@jjshoots
Copy link
Member

jjshoots commented Nov 27, 2023

I agree to what @ffelten is saying. That was what I was getting at in the first place but Florian put better words to it.

@KaleabTessera
Copy link
Contributor Author

KaleabTessera commented Nov 27, 2023

So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.
I still think there is a possible issue for seeding in this scenario :

  • You pass a seed into the environment when calling the first env.reset(.
  • Then the env loops itself, and manually calls env.reset here. Although you have passed an initial seed for controlling the determinism, the env doesn't use a seed when calling reset from the step, meaning the underlying behaviour is not deterministic and relies on how base env handles none seeds.

I think the vec env should create a new seed deterministically - similar to how Jax handles random numbers. I think that is why stable baselines vec env ensure that a new seed is created or used.

Normally the environment should handle this: the first reset() is made with a seed. Then an internal attribute keeps track of this seed so even if there are subsequent calls to reset() with seed being None, they are still seeded by the first call.

This makes sense, thanks @ffelten @jjshoots ! I double-checked the base env I was using and this was the case 👍 So likely this is not an issue if base env handles seeding reasonably.

@elliottower
Copy link
Contributor

Thanks for the input guys. Looks like there's still a pytest failure which I'm not 100% sure why is the case

@KaleabTessera
Copy link
Contributor Author

Looks like the error is thrown by pettingzoo's api_test function here. knights_archers_zombies_v10 is returning an out of range obs (obs >1 from the logs), when running the py3.8 tests (py 3.11 pass).

The tests passed locally when I ran them.
Screenshot from 2023-11-27 17-25-03

My package versions, p3.10 + :

cffi==1.16.0
cfgv==3.4.0
cloudpickle==3.0.0
distlib==0.3.7
exceptiongroup==1.1.3
Farama-Notifications==0.0.4
filelock==3.13.0
gymnasium==0.29.1
identify==2.5.31
iniconfig==2.0.0
nodeenv==1.8.0
numpy==1.26.1
packaging==23.2
pettingzoo==1.24.2
platformdirs==3.11.0
pluggy==1.3.0
pre-commit==3.5.0
pycparser==2.21
pygame==2.5.2
pymunk==6.6.0
pytest==7.4.3
PyYAML==6.0.1
tinyscaler==1.2.7
tomli==2.0.1
typing_extensions==4.8.0
virtualenv==20.24.6

I also downgraded my numpy to 1.24.4 and tried py3.8, the same version in the tests, and the tests still passed. I assume there is some stochastic behaviour in knights_archers_zombies_v10 env or sticky actions wrapper that is not getting seeded deterministically.

I don't think it is related to this PR, since the test_sticky_actions test doesn't use the vec env in the test.

@elliottower
Copy link
Contributor

Looks like the error is thrown by pettingzoo's api_test function here. knights_archers_zombies_v10 is returning an out of range obs (obs >1 from the logs), when running the py3.8 tests (py 3.11 pass).

The tests passed locally when I ran them. Screenshot from 2023-11-27 17-25-03

My package versions, p3.10 + :

cffi==1.16.0
cfgv==3.4.0
cloudpickle==3.0.0
distlib==0.3.7
exceptiongroup==1.1.3
Farama-Notifications==0.0.4
filelock==3.13.0
gymnasium==0.29.1
identify==2.5.31
iniconfig==2.0.0
nodeenv==1.8.0
numpy==1.26.1
packaging==23.2
pettingzoo==1.24.2
platformdirs==3.11.0
pluggy==1.3.0
pre-commit==3.5.0
pycparser==2.21
pygame==2.5.2
pymunk==6.6.0
pytest==7.4.3
PyYAML==6.0.1
tinyscaler==1.2.7
tomli==2.0.1
typing_extensions==4.8.0
virtualenv==20.24.6

I also downgraded my numpy to 1.24.4 and tried py3.8, the same version in the tests, and the tests still passed. I assume there is some stochastic behaviour in knights_archers_zombies_v10 env or sticky actions wrapper that is not getting seeded deterministically.

I don't think it is related to this PR, since the test_sticky_actions test doesn't use the vec env in the test.

Thanks for looking into this, I'll try re-running the tests to see if it works. Weird that it passed on one python version but not the other as well.

@elliottower
Copy link
Contributor

Looks like it passed when I re-ran it, going to re-run on python 3.9 to see but yeah it's unrelated to this PR so not a big deal

@elliottower elliottower merged commit 7a16fe8 into Farama-Foundation:master Nov 28, 2023
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Final Observation is not returned when using pettingzoo_env_to_vec_env_v1/MarkovVectorEnv
6 participants