
Skipping observation in multi agent env #6757

Closed
nicofirst1 opened this issue Jan 9, 2020 · 7 comments
Labels
question Just a question :)

Comments


nicofirst1 commented Jan 9, 2020

Describe your feature request

I am working on an implementation of the werewolf game using the RLlib wrapper for gym multi-agent envs. In this game there are wolves and villagers.

The game is divided into a night phase and a day phase.
During the day every agent can perform an action, while during the night only the wolves can.
In particular, night observations should not be visible to villager agents.
I have an observation field which specifies the current phase, and I would like to filter out the night observations for the villagers.
Is there a way to implement this easily?

What I have tried

I tried modifying the _process_observations function by adding a line after line 403. Using a custom Preprocessor I am able to return None when the current observation should be discarded (given an agent id). Then, if the processed observation is None, the step is simply skipped with:

if prep_obs is None:
    continue

I don't know if this implementation is conceptually correct or if there is another way to do it.
Please let me know.
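For reference, this is roughly what the custom Preprocessor hack looks like (a sketch only: the class name, phase encoding and flattening below are placeholders, and returning None only "works" together with the continue patch mentioned above):

import numpy as np
from ray.rllib.models.preprocessors import Preprocessor

NIGHT_PHASES = (2, 3)  # placeholder encoding of the night phases


class NightSkipPreprocessor(Preprocessor):
    """Sketch: flatten day observations, return None for night observations
    so the patched _process_observations can skip them."""

    def _init_shape(self, obs_space, options):
        # Fixed flat size expected by the villager policy (32 in this env).
        return (32,)

    def transform(self, observation):
        # The observation is assumed to be a dict with "day", "phase",
        # "status_map" and "targets" entries.
        if observation["phase"] in NIGHT_PHASES:
            return None  # only meaningful with the `continue` patch above
        return np.concatenate([
            np.array([observation["day"], observation["phase"]], dtype=np.float32),
            np.asarray(observation["status_map"], dtype=np.float32),
            np.asarray(observation["targets"], dtype=np.float32).ravel(),
        ])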

Edit 1

Applying the previous method yields:
ValueError: The environment terminated for all agents, but we still don't have a last observation for agent villager_2 (policy vill_p). Please ensure that you include the last observations of all live agents when setting '__all__' done to True. Alternatively, set no_done_at_end=True to allow this.
In here.

@nicofirst1 nicofirst1 added the enhancement Request for new feature and/or capability label Jan 9, 2020

ericl commented Jan 9, 2020

I think you should be able to model this using the multi-agent API without any changes to rllib.

In your MultiAgentEnv class

  1. during day phase: obs dict with all agent ids as keys is emitted. All agents return actions.
  2. during night phase: obs dict with only wolf agent ids is emitted. Only wolves return actions.
  3. termination: include an empty obs for all agent ids when setting all done

Does that work?
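Something like this, perhaps (sketch only: the agent ids, phase handling and dummy game logic are placeholders, not the actual rl-werewolf code):

import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class WerewolfSketchEnv(MultiAgentEnv):
    """Day obs go to every agent, night obs only to the wolves,
    and a last obs goes to all agents when '__all__' is done."""

    def __init__(self, config=None):
        self.wolves = ["werewolf_0", "werewolf_1"]
        self.villagers = ["villager_2", "villager_3", "villager_4"]
        self.agents = self.wolves + self.villagers
        self.phase, self.t = "day", 0

    def _obs(self, agent_id):
        return np.zeros(32, dtype=np.float32)  # dummy fixed-size observation

    def reset(self):
        self.phase, self.t = "day", 0
        return {aid: self._obs(aid) for aid in self.agents}

    def step(self, action_dict):
        self.t += 1
        self.phase = "night" if self.phase == "day" else "day"  # dummy phase logic
        done = self.t >= 20  # dummy terminal condition

        if done:
            # 3. termination: emit a last obs (and done) for *every* agent,
            # otherwise RLlib raises the "no last observation" ValueError.
            acting = self.agents
        elif self.phase == "night":
            # 2. night phase: only the wolves receive an obs and act next.
            acting = self.wolves
        else:
            # 1. day phase: every agent receives an obs and acts next.
            acting = self.agents

        obs = {aid: self._obs(aid) for aid in acting}
        rewards = {aid: 0.0 for aid in acting}  # only for agents that get an obs
        dones = {aid: done for aid in acting}
        dones["__all__"] = done
        return obs, rewards, dones, {}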

@ericl ericl added question Just a question :) and removed enhancement Request for new feature and/or capability labels Jan 9, 2020

nicofirst1 commented Jan 10, 2020

Thank you for the fast reply.
I didn't quite get what you are suggesting, but it should be one of the following.

1) Emitting empty obs for villagers during night time

In this case the observation dictionary keeps a constant number of elements (agent ids).
However, I get the following error:
ValueError: Cannot feed value of shape (3, 0) for Tensor 'vill_p/observation:0', which has shape '(?, 32)'
since I am trying to feed an empty list instead of one with the required size (32). I should add that I am using a custom Preprocessor class, which is the one setting the initial size, but I think the predefined one would lead to the same error.

Edit 1: Using default preprocessor

Using the default preprocessor yields the following error:
ValueError: ('Observation outside expected value range', Dict(day:Discrete(1000), phase:Discrete(4), status_map:MultiBinary(5), targets:Box(5, 5)), {})
which is kind of obvious, since an empty dict is different from the full one.

2) Not emitting observation for villagers during night time

In this case the observation dict is dynamic, i.e. the number of agent ids changes between steps.
For this one I get:
ValueError: Key set for obs and rewards must be the same: dict_keys(['werewolf_0', 'werewolf_1']) vs dict_keys(['werewolf_0', 'werewolf_1', 'villager_2', 'villager_3', 'villager_4'])
coming from the base env

Let me know if I misunderstood you in some way.


nicofirst1 commented Jan 10, 2020

I managed to fix the
ValueError: Key set for obs and rewards must be the same: dict_keys(['werewolf_0', 'werewolf_1']) vs dict_keys(['werewolf_0', 'werewolf_1', 'villager_2', 'villager_3', 'villager_4'])
in solution 2 by not returning rewards for villagers during the night phase.

At the moment I am getting a shape error:

File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1367, in _do_call
    return fn(*args)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1352, in _run_fn
    target_list, run_metadata)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1445, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [120] vs. [6]
	 [[{{node vill_p_1/tower_1/gradients_1/vill_p_1/tower_1/add_7_grad/BroadcastGradientArgs}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/giulia/Desktop/rl-werewolf/src/tests/simple_policy.py", line 73, in <module>
    trainer.train()
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 447, in train
    raise e
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 433, in train
    result = Trainable.train(self)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/tune/trainable.py", line 176, in train
    result = self._train()
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 129, in _train
    fetches = self.optimizer.step()
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_optimizer.py", line 204, in step
    self.per_device_batch_size)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_impl.py", line 260, in optimize
    return sess.run(fetches, feed_dict=feed_dict)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 960, in run
    run_metadata_ptr)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1183, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1361, in _do_run
    run_metadata)
  File "/usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1386, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [120] vs. [6]
	 [[node vill_p_1/tower_1/gradients_1/vill_p_1/tower_1/add_7_grad/BroadcastGradientArgs (defined at /usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/rllib/agents/ppo/ppo_policy.py:211) ]]

Where 6 is the number of players.

Changing the number of players to 8 yields the same error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [120] vs. [8] [[node vill_p_1/tower_1/gradients_1/vill_p_1/tower_1/add_9_grad/BroadcastGradientArgs (defined at usr/local/anaconda3/envs/ww/lib/python3.6/site-packages/ray/rllib/agents/ppo/ppo_policy.py:211) ]]


ericl commented Jan 10, 2020

In this case the observation dictionary stays constant in the number of elements (agent ids).
However I get the following error

I mean omitting the key for the player entirely. For example: during day: {"player1": obs1a, "werewolf1": obs1b}. During night: just {"werewolf1": obs2b}.

Key set for obs and rewards must be the same.

Yeah, you can't emit rewards if there are no obs. The reward must be delayed to the next step (whenever an obs shows up).

Edit: Ah, I see this is resolved.
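For reference, a minimal sketch of that reward deferral inside the env (the helper and buffer below are hypothetical bookkeeping, not RLlib API): buffer the reward of any agent that does not receive an obs this step and pay it out the next time that agent does observe.

from collections import defaultdict

# Per-agent buffer of rewards earned while the agent had no observation.
pending_rewards = defaultdict(float)


def emit_step(obs, step_rewards):
    """Return (obs, rewards) where rewards only contains agents present in obs;
    rewards for absent agents are buffered until their next observation."""
    rewards = {}
    for agent_id, r in step_rewards.items():
        if agent_id in obs:
            rewards[agent_id] = r + pending_rewards.pop(agent_id, 0.0)
        else:
            pending_rewards[agent_id] += r  # defer until the agent observes again
    # Agents that observe but earned nothing this step still need a reward entry.
    for agent_id in obs:
        if agent_id not in rewards:
            rewards[agent_id] = pending_rewards.pop(agent_id, 0.0)
    return obs, rewards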


ericl commented Jan 10, 2020

Not sure what's going on with the gradient error (probably some incorrect shape emitted as an observation). Is it possible to post a script to run?

@nicofirst1

Sorry for the late reply.
I managed to solve the problem by running:

analysis = tune.run(
    "PG",
    local_dir=Params.RAY_DIR,
    config=configs,
    trial_name_creator=trial_name_creator,
)
    

rather than:

trainer = PGTrainer(configs, PolicyWw)
for i in tqdm(range(20)):
    trainer.train()

@nicofirst1

Moreover, the second solution seems to work for the issue, so we can consider it closed.

@ericl ericl closed this as completed Jan 12, 2020