Skipping observation in multi agent env #6757
Comments
I think you should be able to model this using the multi-agent API without any changes to rllib. In your MultiAgentEnv class
Does that work?
Thank you for the fast reply.
1) Emitting empty obs for villagers during night time
In this case the observation dictionary keeps a constant number of elements (agent ids).
Edit 1: Using default preprocessor
Using the default preprocessor yields the following error:
2) Not emitting observation for villagers during night time
In this case the observation dict is dynamic, i.e. the number of agent ids changes between steps.
Let me know if I misunderstood you in some way.
I managed to fix the
At the moment I am getting a shape error:
where 6 is the number of players. Changing the number of players to 8 yields the same error:
I mean omitting the key for the player entirely. For example: during day: {"player1": obs1a, "werewolf1": obs1b}. During night: just {"werewolf1": obs2b}.
Yeah, you can't emit rewards for an agent if there is no obs for it. The reward must be delayed to the next step at which an obs for that agent shows up. Edit: Ah, I see this is resolved.
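The two points above (omit the agent's key entirely, and delay its reward until its next obs) can be sketched roughly as follows. This is only an illustration, not code from the thread: `NightDaySketch`, `_observe`, and `_game_rewards` are hypothetical names, the class is a plain Python stand-in that only mimics the dict-based `step()` contract of a MultiAgentEnv, and the game logic is replaced by a fixed reward of 1.0 per agent per step.

```python
from collections import defaultdict


class NightDaySketch:
    """Toy stand-in for a MultiAgentEnv (hypothetical, does not import RLlib)."""

    def __init__(self, agent_ids, wolf_ids):
        self.agent_ids = list(agent_ids)
        self.wolf_ids = set(wolf_ids)
        self.is_night = False
        # Rewards earned while an agent had no observation, paid out later.
        self.pending = defaultdict(float)

    def _observe(self, agent_id):
        # Placeholder observation: only the current phase flag.
        return {"phase": int(self.is_night)}

    def _game_rewards(self, action_dict):
        # Placeholder for the real game logic: every agent earns 1.0 per step.
        return {a: 1.0 for a in self.agent_ids}

    def step(self, action_dict):
        # Alternate between day and night on every step.
        self.is_night = not self.is_night

        # Accumulate rewards for everyone, including agents hidden this step.
        for agent_id, reward in self._game_rewards(action_dict).items():
            self.pending[agent_id] += reward

        # At night only wolves get a key in the returned dicts.
        visible = [
            a for a in self.agent_ids
            if not self.is_night or a in self.wolf_ids
        ]
        obs = {a: self._observe(a) for a in visible}
        # Pay out buffered rewards only to agents that receive an obs now.
        rewards = {a: self.pending.pop(a, 0.0) for a in visible}
        dones = {"__all__": False}
        return obs, rewards, dones, {}
```

With one villager and one wolf, the first step (night) returns dicts keyed only by the wolf, and the villager's reward for that step is added to what it receives on the following day step.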
Not sure what's going on with the gradient error (probably some incorrect shape emitted as an observation). Is it possible to post a script to run?
Sorry for the late reply,
Rather than:
Moreover, the second solution seems to work for this issue, so we can consider it closed.
Describe your feature request
I am working on an implementation of the werewolf game using the rllib wrapper for gym multi-agent envs. In this game there are wolves and villagers.
The game is divided into night and day phase.
During day every agent can perform an action while during night only wolves can.
Specifically, night observations should not be visible to villager agents.
I have an observation which specifies the current phase, and I would like to filter out night observations for villagers.
Is there a way to implement it easily?
What have I tried
I tried modifying the _process_observations function by adding a line after line 403. Using a custom Preprocessor I am able to return None if the current observation should be discarded (given an agent id). Then, if the processed observation is None, just skip the step with:
I don't know if this implementation is conceptually correct or if there is another way to do it.
Please let me know.
Edit 1
Applying the previous method yields:
{ValueError}The environment terminated for all agents, but we still don't have a last observation for agent villager_2 (policy vill_p). Please ensure that you include the last observations of all live agents when setting '__all__' done to True. Alternatively, set no_done_at_end=True to allow this.
In here.
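The error message itself suggests one way out: on the terminating step, return an observation for every live agent, even those that would normally be filtered out at night. A minimal sketch of that idea follows, assuming the env keeps a dict of delayed rewards; `final_step_outputs`, `observe`, and `pending_rewards` are hypothetical names for illustration and not part of rllib.

```python
def final_step_outputs(live_agent_ids, observe, pending_rewards):
    """Build the last (obs, rewards, dones) dicts for a terminating episode.

    Every live agent gets a final observation, regardless of the phase,
    so no agent is left without a last observation when '__all__' is True.
    """
    obs = {a: observe(a) for a in live_agent_ids}
    # Flush any rewards that were delayed while the agent had no obs.
    rewards = {a: pending_rewards.pop(a, 0.0) for a in live_agent_ids}
    dones = {a: True for a in live_agent_ids}
    dones["__all__"] = True
    return obs, rewards, dones
```

The alternative named in the message, setting no_done_at_end=True, avoids the check instead of satisfying it.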