
MADDPG with horizon #5913

Closed
nicofirst1 opened this issue Oct 14, 2019 · 22 comments
Labels
stale: The issue is stale. It will be closed within 7 days unless there is further conversation.

Comments

@nicofirst1

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04, but the same error occurs on macOS 10.14.6
  • Ray installed from (source or binary): source
  • Ray version: 0.8.0 dev5
  • Python version: python 3.6.9
  • Exact command to reproduce:
    I'm trying to use the MADDPG algorithm to train 180 agents, divided into 60 agents with a DPG policy and 120 with a MADDPG one.

I've set the horizon to 1500 (I'd like to use 4000 later on), and the batch settings are the following (a minimal config sketch follows the list):

  • sample_batch_size = 100
  • train_batch_size = 400
  • learning_starts = 2
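
For reference, a minimal sketch (not my actual project code) of how settings like these can be passed to the trainer; "my_multi_agent_env" is a placeholder environment name and the multi-agent policy mapping is omitted:

    import ray
    from ray import tune

    # Minimal sketch, assuming the contrib MADDPG trainer registered as
    # "contrib/MADDPG". "my_multi_agent_env" is a placeholder and the
    # "multiagent" policy setup is omitted for brevity.
    ray.init()
    tune.run(
        "contrib/MADDPG",
        config={
            "env": "my_multi_agent_env",
            "horizon": 1500,
            "sample_batch_size": 100,
            "train_batch_size": 400,
            "learning_starts": 2,
        },
    )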

Describe the problem

When the policy tries to sample observations from the replay buffer, I get the following error:

Traceback (most recent call last):
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trial_runner.py", line 438, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/ray_trial_executor.py", line 351, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/worker.py", line 2121, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray_MADDPG:train() (pid=30410, ip=100.81.9.4)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 421, in train
    raise e
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 407, in train
    result = Trainable.train(self)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trainable.py", line 176, in train
    result = self._train()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer_template.py", line 129, in _train
    fetches = self.optimizer.step()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 142, in step
    self._optimize()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 162, in _optimize
    samples = self._replay()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 205, in _replay
    dones) = replay_buffer.sample_with_idxes(idxes)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 81, in sample_with_idxes
    return self._encode_sample(idxes)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 60, in _encode_sample
    data = self._storage[i]
IndexError: list index out of range

I've tried changing the above-mentioned parameters, but the only one that seems to make a difference is the horizon, which (if set to <= 15) does not trigger the IndexError.
Any idea on how to fix this?

@ericl
Contributor

ericl commented Oct 14, 2019

cc @wsjeon

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

Hi. I guess you've used an environment different from OpenAI MPE. Have you tested whether the code works on MPE and whether you get the same issue there? Additionally, do you have the same issue when there is a small number of agents?

@nicofirst1
Author

Thanks for the fast reply. I haven't tested the OpenAI environment, and I don't think I have the time to do that.
Concerning the number of agents, the same problem arises with just 3 agents (1 DPG and 2 MADDPG). If I remove the DPG agent, another error arises, due to the lack of agents using the second policy.

@nicofirst1
Author

However, I just noticed that a horizon as low as 15 does not prevent the error, so my previous considerations about it were wrong.

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

@nicofirst1 I've been checking the code with MPE and found no error with your dependencies. I also couldn't reproduce the error when varying the replay buffer size and the algorithm types (DDPG or MADDPG, for both good and adversary agents).

My guess is that the error is due to the environment class (rllib.MultiAgentEnv) you defined; a sketch of the interface RLlib expects is included after the questions below.

  • Would you please share some information on the environment?
  • Additionally, could you share the full configuration (hyperparameters)?
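
For context, here is a rough sketch (a placeholder class, not your project's code) of the contract a MultiAgentEnv should satisfy when the agent set is fixed: every agent appears in the obs/reward/done dicts at every step, and done["__all__"] marks the end of the episode.

    import gym
    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class FixedAgentSetEnv(MultiAgentEnv):
        """Toy sketch: returns entries for every agent at every step."""

        def __init__(self, num_agents=3, episode_len=25):
            self.agents = ["agent_{}".format(i) for i in range(num_agents)]
            self.episode_len = episode_len
            self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
            self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,))
            self.t = 0

        def reset(self):
            self.t = 0
            return {a: self.observation_space.sample() for a in self.agents}

        def step(self, action_dict):
            self.t += 1
            done = self.t >= self.episode_len
            obs = {a: self.observation_space.sample() for a in self.agents}
            rew = {a: 0.0 for a in self.agents}
            dones = {a: done for a in self.agents}
            dones["__all__"] = done
            return obs, rew, dones, {}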

@nicofirst1
Author

@wsjeon the whole project can be found here. The configuration is stored in the Params class (under Agent params), while the environment is this one.

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

I just noticed that you've set learning_starts=2. Could you please check whether the error still occurs after setting it larger than train_batch_size?

@nicofirst1
Author

I set learning_starts=train_batch_size and learning_starts=train_batch_size*2. I'm still getting the same error.

@nicofirst1
Author

Currently my parameters look like this:

    sample_batch_size = 10
    train_batch_size = 20
    # number of episodes after which training starts, and repeats itself
    learning_starts = train_batch_size

    # number of iterations for training
    training_iteration = 100
    episode_num = 15

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

What is the main code you're running in your project?

@nicofirst1
Author

The train script

@nicofirst1
Author

I printed the _storage length for each policy; they are different, even though the same idxes are used for all of them:

{'RL_coop_0': 20,
 'RL_coop_4': 30,
 'RL_coop_5': 20,
 'RL_selfish_0': 20,
 'RL_selfish_2': 30,
 'RL_coop_2': 19,
 'RL_selfish_1': 18,
 'RL_coop_1': 10,
 'RL_coop_3': 10}

I'm referring to the replay function
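
Roughly, the check looks like this (a sketch: the _storage attribute appears in the traceback above, while optimizer and replay_buffers are my reading of the Ray 0.8 SyncReplayOptimizer and may differ in other versions):

    # Rough sketch: inspect the per-policy replay buffers held by the
    # trainer's SyncReplayOptimizer. "trainer" stands for the MADDPG trainer
    # instance; attribute names may differ across Ray versions.
    sizes = {
        policy_id: len(buf._storage)
        for policy_id, buf in trainer.optimizer.replay_buffers.items()
    }
    print(sizes)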

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

It seems like the line where you invoke the MADDPG agent is this line, but I'm wondering where training_alg is used.

@nicofirst1
Author

nicofirst1 commented Oct 15, 2019

It is used here and here

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

I think you need to check the sample batch size after these lines, so that the agent-environment interaction gives experiences of the same length for every policy; a rough sketch of one way to check this is below.
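
As a sketch, one way to log this is through the old-style callbacks dict (the "on_sample_end" hook and the "samples" info key are how I recall the Ray 0.8 callbacks API; adjust if your version differs):

    # Sketch: print how many timesteps each policy contributed per sample phase.
    def on_sample_end(info):
        samples = info["samples"]  # a MultiAgentBatch
        counts = {pid: b.count for pid, b in samples.policy_batches.items()}
        print("per-policy sample counts:", counts)

    config = {
        # ... the rest of the trainer config ...
        "callbacks": {"on_sample_end": on_sample_end},
    }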

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

By the way, I'd like to ask about episode lengths and termination flags. Does the number of agents vary within a single episode?

@nicofirst1
Author

The number of agents is not varying, but after these lines the batch contains only some of the agents, not all of them. This is why I get different sample sizes.

@nicofirst1
Author

It seems like some agents just disappear during episodes.

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

Hmm, that is quite weird, and that's definitely the source of the error. Can you please check that your environment behaves as you intended, e.g., the number of observations and actions it handles during each step? A sketch of one way to check this is below.
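
As a concrete way to run that check, here is a sketch of a debugging wrapper (a hypothetical helper, not part of RLlib) that fails loudly if the env drops an agent mid-episode:

    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class CheckAllAgentsEnv(MultiAgentEnv):
        """Debugging sketch: assert that the obs dict contains every
        expected agent until the episode is over."""

        def __init__(self, env, agent_ids):
            self.env = env
            self.agent_ids = set(agent_ids)

        def reset(self):
            obs = self.env.reset()
            missing = self.agent_ids - set(obs)
            assert not missing, "reset() is missing agents: {}".format(missing)
            return obs

        def step(self, action_dict):
            obs, rew, done, info = self.env.step(action_dict)
            if not done.get("__all__", False):
                missing = self.agent_ids - set(obs)
                assert not missing, "step() dropped agents mid-episode: {}".format(missing)
            return obs, rew, done, info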

@nicofirst1
Author

I think you're right: the env just removes some agents during the episode. I will fix it and update you as soon as possible.

@stale

stale bot commented Nov 14, 2020

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale bot added the stale label Nov 14, 2020
@stale

stale bot commented Nov 28, 2020

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale bot closed this as completed Nov 28, 2020