fix: have access to `terminal_observation` in the infos. #233

KaleabTessera · 2023-10-30T17:56:48Z

Closes #232.

This exposes `terminal_observation` in the infos, which previously wasn't accessible when using `pettingzoo_env_to_vec_env_v1`/`MarkovVectorEnv`.

KaleabTessera · 2023-11-02T13:24:18Z

@elliottower Do you mind having a look at this? This is quite important since if you don't have the correct terminal obs, you can't do bootstrapping correctly.
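
To illustrate why the terminal observation matters for bootstrapping, here is a minimal sketch (not SuperSuit's or any RL library's actual code). It assumes a Gymnasium-style step output with separate `terminated`/`truncated` arrays, and that an auto-resetting vec env stores the final observation under `infos[i]["terminal_observation"]`. The names `td_targets` and `value_fn` are hypothetical.

```python
import numpy as np


def td_targets(rewards, next_obs, terminated, truncated, infos, value_fn, gamma=0.99):
    """One-step TD targets for a batch of auto-resetting environments."""
    # After an auto-reset, next_obs[i] is the first observation of the NEW
    # episode, so the true final observation has to come from infos instead.
    boot_obs = np.array(next_obs, dtype=float, copy=True)
    for i, info in enumerate(infos):
        if (terminated[i] or truncated[i]) and "terminal_observation" in info:
            boot_obs[i] = info["terminal_observation"]
    next_values = value_fn(boot_obs)
    # Only true termination removes the bootstrap term; truncation still bootstraps.
    return rewards + gamma * next_values * (1.0 - terminated.astype(float))
```

Without the stored terminal observation, the truncated case would bootstrap from the value of the reset observation, which belongs to a different episode.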

elliottower · 2023-11-03T15:08:05Z

Hey @KaleabTessera sorry about this, will take a look now. Just approved the CI workflows.

supersuit/vector/markov_vector_wrapper.py

elliottower · 2023-11-03T15:12:06Z

Going to get confirmation from another Farama dev who has more experience with this type of thing than me cause I’m not 100% confident in my ability to check correctness for this

elliottower · 2023-11-07T17:30:21Z

I see a few errors in the CI, but I'm not sure they have anything to do with the changes you've made. Testing it locally myself to confirm and get a better idea (it may be something with PettingZoo instead, not sure).

elliottower · 2023-11-07T18:11:33Z

Fixed the bug on PettingZoo's end, queued up the CI to run (will take a little while as the PZ CI is running as well), I think it should all pass now

elliottower · 2023-11-07T22:11:52Z

Oh right I need to do a pettingzoo release for this to pass, this will have to wait for a few days as we are waiting on the AgileRL tutorial bugs to be fixed

elliottower · 2023-11-22T20:35:21Z

PZ release is out in case you didn't see, so this should be unblocked now. Approved the workflows just now

KaleabTessera · 2023-11-27T11:21:12Z

Thanks @elliottower! I added this code to ensure that a seed is used even when reset is called here. Not sure if it is necessary, what do you think?

A similar thing is done in stable baselines' vec env

elliottower · 2023-11-27T15:42:38Z

That sounds reasonable to me, going to get input from another dev to see if it makes sense to them

ffelten · 2023-11-27T15:49:36Z

Hi,

I don't think it makes sense in this case since the underlying env (the parallel PZ env) is only a single entity.

elliottower · 2023-11-27T15:51:38Z

Jet (@jjshoots) also said he thought this shouldn't be implemented, here's a screenshot of what he said.

pseudo-rnd-thoughts · 2023-11-27T16:17:02Z

supersuit/vector/markov_vector_wrapper.py

@@ -52,6 +52,13 @@ def step_wait(self):
        return self.step(self._saved_actions)

    def reset(self, seed=None, options=None):
+        if seed is None:


I'm a bit confused: won't this pass the same seed to all of the environments, and therefore do the opposite of what you want?
Even so, this should be part of a second PR, not this one if possible

KaleabTessera · 2023-11-27 (reply)

I don't believe this will create the same seed. Each time np.random.randint is called, NumPy advances its internal RNG state, so the next call produces a new seed - docs explaining this.

This would only create the same seed for all envs if np.random.seed is used in each process to set them all to have the same seed.

Nonetheless, I agree this should be removed from this PR.
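
A small sketch of the point about NumPy's RNG state, using a local `np.random.RandomState` as a stand-in for the global RNG behind `np.random.randint`. The helper names `draw_seeds` and `draw_seeds_reseeding` are hypothetical and only for illustration.

```python
import numpy as np


def draw_seeds(n, base_seed):
    """Draw n seeds from ONE RNG; each call advances its internal state."""
    rng = np.random.RandomState(base_seed)
    return [int(rng.randint(0, 2**31 - 1)) for _ in range(n)]


def draw_seeds_reseeding(n, base_seed):
    """Re-seed before every draw; every draw then repeats the same value."""
    seeds = []
    for _ in range(n):
        rng = np.random.RandomState(base_seed)
        seeds.append(int(rng.randint(0, 2**31 - 1)))
    return seeds
```

The first helper corresponds to repeated `randint` calls in one process; the second corresponds to the case where every process is set to the same seed before drawing.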

KaleabTessera · 2023-11-27T16:36:20Z

So I removed the manual seeding. I agree it should be in another PR and maybe it is not even useful.

I still think there is a possible seeding issue in this scenario:

You pass a seed into the environment when calling the first env.reset().
Then the env loops itself and manually calls env.reset here. Although you have passed an initial seed to control determinism, the env doesn't use a seed when calling reset from the step, meaning the underlying behaviour is not deterministic and relies on how the base env handles `None` seeds.

I think the vec env should create a new seed deterministically, similar to how Jax handles random numbers. I think that is why the stable baselines vec env ensures that a new seed is created or used.
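
One way to derive per-reset seeds deterministically from a single base seed, sketched here with NumPy's `SeedSequence`. This is only an illustration of the idea, not what this PR or SuperSuit implements; `derive_reset_seeds` is a hypothetical helper.

```python
import numpy as np


def derive_reset_seeds(base_seed, n_resets):
    """Derive n_resets reproducible integer seeds from one base seed."""
    # spawn() creates independent child sequences; the same base seed
    # always yields the same children, in the same order.
    children = np.random.SeedSequence(base_seed).spawn(n_resets)
    # generate_state(1) returns one uint32 word that can serve as a seed.
    return [int(child.generate_state(1)[0]) for child in children]
```

A wrapper could hand the i-th derived seed to the i-th automatic reset, so a run would be reproducible from the initial seed alone.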

ffelten · 2023-11-27T16:44:18Z

Normally the environment should handle this: the first reset() is made with a seed. Then an internal attribute keeps track of this seed so even if there are subsequent calls to reset() with seed being None, they are still seeded by the first call.
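
A toy sketch of that pattern, assuming the environment only re-creates its RNG when a seed is given. `ToyEnv` and `rollout` are hypothetical and not part of PettingZoo or SuperSuit.

```python
import numpy as np


class ToyEnv:
    """Toy environment whose reset() returns a random start state."""

    def __init__(self):
        self._rng = np.random.default_rng()

    def reset(self, seed=None):
        # Re-create the RNG only when a seed is given; with seed=None the
        # existing RNG keeps advancing from wherever the seeded run left it.
        if seed is not None:
            self._rng = np.random.default_rng(seed)
        return float(self._rng.random())


def rollout(seed, n_auto_resets=3):
    """One seeded reset followed by several unseeded (automatic) resets."""
    env = ToyEnv()
    return [env.reset(seed=seed)] + [env.reset() for _ in range(n_auto_resets)]
```

Two rollouts that start from the same seed produce the same sequence of start states, even though only the first reset receives the seed.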

jjshoots · 2023-11-27T16:53:34Z

I agree to what @ffelten is saying. That was what I was getting at in the first place but Florian put better words to it.

KaleabTessera · 2023-11-27T16:54:43Z

This makes sense, thanks @ffelten @jjshoots ! I double-checked the base env I was using and this was the case 👍 So likely this is not an issue if base env handles seeding reasonably.

elliottower · 2023-11-27T17:12:57Z

Thanks for the input guys. Looks like there's still a pytest failure, and I'm not 100% sure why.

KaleabTessera · 2023-11-27T17:39:25Z

Looks like the error is thrown by pettingzoo's api_test function here. knights_archers_zombies_v10 is returning an out-of-range obs (obs > 1 from the logs) when running the py3.8 tests (the py3.11 tests pass).

The tests passed locally when I ran them.

My package versions, py3.10 + :

cffi==1.16.0
cfgv==3.4.0
cloudpickle==3.0.0
distlib==0.3.7
exceptiongroup==1.1.3
Farama-Notifications==0.0.4
filelock==3.13.0
gymnasium==0.29.1
identify==2.5.31
iniconfig==2.0.0
nodeenv==1.8.0
numpy==1.26.1
packaging==23.2
pettingzoo==1.24.2
platformdirs==3.11.0
pluggy==1.3.0
pre-commit==3.5.0
pycparser==2.21
pygame==2.5.2
pymunk==6.6.0
pytest==7.4.3
PyYAML==6.0.1
tinyscaler==1.2.7
tomli==2.0.1
typing_extensions==4.8.0
virtualenv==20.24.6

I also downgraded my numpy to 1.24.4 and tried py3.8, the same version in the tests, and the tests still passed. I assume there is some stochastic behaviour in knights_archers_zombies_v10 env or sticky actions wrapper that is not getting seeded deterministically.

I don't think it is related to this PR, since the test_sticky_actions test doesn't use the vec env in the test.

elliottower · 2023-11-28T16:26:43Z

Thanks for looking into this, I'll try re-running the tests to see if it works. Weird that it passed on one python version but not the other as well.

elliottower · 2023-11-28T16:50:25Z

Looks like it passed when I re-ran it, going to re-run on python 3.9 to see but yeah it's unrelated to this PR so not a big deal

KaleabTessera added 2 commits October 30, 2023 17:55

fix: have access to terminal_observation in the infos.

65086cf

fix: fix empty reset_infs.

ba3299d

This was referenced Nov 2, 2023

Update ppo_pettingzoo_ma_atari.py vwxyzjn/cleanrl#408

Open

[Bug Report] Possible bug with bootstrapping when environment is truncated in CleanRL mutli-agent Atari example Farama-Foundation/PettingZoo#1126

Open

elliottower reviewed Nov 3, 2023

elliottower approved these changes Nov 3, 2023

KaleabTessera added 3 commits November 6, 2023 18:35

feat: Ensure infos list is of size n agents.

7096918

feat: test terminal_obs are returned when env reset.

9788c66

fix: add black death wrapper to test.

344ee29

elliottower approved these changes Nov 7, 2023

elliottower mentioned this pull request Nov 7, 2023

Fix minor bug in TerminateIllegal wrapper indexing empty info dict Farama-Foundation/PettingZoo#1129

Merged

KaleabTessera added 2 commits November 19, 2023 17:10

fix: ensure parallel vec env don't reset using same seed.

20af522

fix: remove debugging code.

bc36bdf

pseudo-rnd-thoughts reviewed Nov 27, 2023

fix: remove manual seeding if seed is none.

e49c5e1

elliottower merged commit 7a16fe8 into Farama-Foundation:master Nov 28, 2023
4 of 5 checks passed
