
MADDPG with horizon #5913

Closed
nicofirst1 opened this issue Oct 14, 2019 · 22 comments
Labels
stale: The issue is stale. It will be closed within 7 days unless there is further conversation.

Comments

@nicofirst1

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04, but the same error occurs on macOS 10.14.6
  • Ray installed from (source or binary): source
  • Ray version: 0.8.0 dev5
  • Python version: python 3.6.9
  • Exact command to reproduce:
    I'm trying to use the MADDPG algorithm to train 180 agents, divided into 60 agents with a DPG policy and 120 with a MADDPG one.

I've set the horizon to 1500 (I'd like to use 4000 later on), and the batch settings are the following (a minimal config sketch follows the list):

  • sample_batch_size = 100
  • train_batch_size = 400
  • learning_starts = 2
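
For reference, a minimal sketch (not my actual project code) of how settings like these can be passed to the trainer; "my_multi_agent_env" is a placeholder environment name and the multi-agent policy mapping is omitted:

    import ray
    from ray import tune

    # Minimal sketch, assuming the contrib MADDPG trainer registered as
    # "contrib/MADDPG". "my_multi_agent_env" is a placeholder and the
    # "multiagent" policy setup is omitted for brevity.
    ray.init()
    tune.run(
        "contrib/MADDPG",
        config={
            "env": "my_multi_agent_env",
            "horizon": 1500,
            "sample_batch_size": 100,
            "train_batch_size": 400,
            "learning_starts": 2,
        },
    )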

Describe the problem

When the policy tries to sample observations from the replay buffer, I get the following error:

Traceback (most recent call last):
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trial_runner.py", line 438, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/ray_trial_executor.py", line 351, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/worker.py", line 2121, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(IndexError): ray_MADDPG:train() (pid=30410, ip=100.81.9.4)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 421, in train
    raise e
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer.py", line 407, in train
    result = Trainable.train(self)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/tune/trainable.py", line 176, in train
    result = self._train()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/agents/trainer_template.py", line 129, in _train
    fetches = self.optimizer.step()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 142, in step
    self._optimize()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 162, in _optimize
    samples = self._replay()
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/sync_replay_optimizer.py", line 205, in _replay
    dones) = replay_buffer.sample_with_idxes(idxes)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 81, in sample_with_idxes
    return self._encode_sample(idxes)
  File "/home/dizzi/.conda/envs/dmas/lib/python3.6/site-packages/ray-0.8.0.dev5-py3.6-linux-x86_64.egg/ray/rllib/optimizers/replay_buffer.py", line 60, in _encode_sample
    data = self._storage[i]
IndexError: list index out of range

I've tried changing the above-mentioned parameters, but the only one that seems to make a difference is the horizon, which (if set to <= 15) does not trigger the IndexError.
Any idea on how to fix this?

@ericl
Contributor

ericl commented Oct 14, 2019

cc @wsjeon

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

Hi. I guess you've used an environment different from OpenAI MPE. Have you tested whether the code works on MPE and whether you get the same issue there? Additionally, do you have the same issue when there is a small number of agents?

@nicofirst1
Author

Thanks for the fast reply. I haven't tested the OpenAI environment, and I don't think I have the time to do that.
Concerning the number of agents, the same problem arises with just 3 agents (1 DPG and 2 MADDPG). If I remove the DPG agent, another error arises, due to the lack of agents using the second policy.

@nicofirst1
Author

However, I just noticed that a horizon as low as 15 does not prevent the error, so my previous considerations about it were wrong.

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

@nicofirst1 I've been checking the code with MPE and found no error with your dependencies. I also couldn't reproduce the error when varying the replay buffer size and the algorithm types (DDPG or MADDPG, for both good and adversary agents).

My guess is that the error is due to the environment class (rllib.MultiAgentEnv) you defined; a sketch of the interface RLlib expects is included after the questions below.

  • Would you please share some information on the environment?
  • Additionally, could you share the full configuration (hyperparameters)?
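
For context, here is a rough sketch (a placeholder class, not your project's code) of the contract a MultiAgentEnv should satisfy when the agent set is fixed: every agent appears in the obs/reward/done dicts at every step, and done["__all__"] marks the end of the episode.

    import gym
    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class FixedAgentSetEnv(MultiAgentEnv):
        """Toy sketch: returns entries for every agent at every step."""

        def __init__(self, num_agents=3, episode_len=25):
            self.agents = ["agent_{}".format(i) for i in range(num_agents)]
            self.episode_len = episode_len
            self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
            self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,))
            self.t = 0

        def reset(self):
            self.t = 0
            return {a: self.observation_space.sample() for a in self.agents}

        def step(self, action_dict):
            self.t += 1
            done = self.t >= self.episode_len
            obs = {a: self.observation_space.sample() for a in self.agents}
            rew = {a: 0.0 for a in self.agents}
            dones = {a: done for a in self.agents}
            dones["__all__"] = done
            return obs, rew, dones, {}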

@nicofirst1
Author

@wsjeon the whole project can be found here. The configuration is stored in the Params class (under Agent params), while the environment is this one.

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

I just noticed that you've set learning_starts=2. Could you please check whether the error still occurs after setting it larger than train_batch_size?

@nicofirst1
Author

I set learning_starts=train_batch_size and learning_starts=train_batch_size*2. I'm still getting the same error.

@nicofirst1
Author

Currently my parameters look like this:

    sample_batch_size = 10
    train_batch_size = 20
    # number of episodes after which training starts, and repeats itself
    learning_starts = train_batch_size

    # number of iterations for training
    training_iteration = 100
    episode_num = 15

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

What is the main code you're running in your project?

@nicofirst1
Author

The train script

@nicofirst1
Author

I printed the _storage length for each policy; they are different, even though the same idxes are used for all of them:

{'RL_coop_0': 20,
 'RL_coop_4': 30,
 'RL_coop_5': 20,
 'RL_selfish_0': 20,
 'RL_selfish_2': 30,
 'RL_coop_2': 19,
 'RL_selfish_1': 18,
 'RL_coop_1': 10,
 'RL_coop_3': 10}

I'm referring to the replay function
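
Roughly, the check looks like this (a sketch: the _storage attribute appears in the traceback above, while optimizer and replay_buffers are my reading of the Ray 0.8 SyncReplayOptimizer and may differ in other versions):

    # Rough sketch: inspect the per-policy replay buffers held by the
    # trainer's SyncReplayOptimizer. "trainer" stands for the MADDPG trainer
    # instance; attribute names may differ across Ray versions.
    sizes = {
        policy_id: len(buf._storage)
        for policy_id, buf in trainer.optimizer.replay_buffers.items()
    }
    print(sizes)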

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

It seems like the line where you invoke the MADDPG agent is this line, but I'm wondering where training_alg is used.

@nicofirst1
Author

nicofirst1 commented Oct 15, 2019

It is used here and here

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

I think you need to check the sample batch size after these lines, so that the agent-environment interaction gives experiences of the same length for every policy; a rough sketch of one way to check this is below.
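
As a sketch, one way to log this is through the old-style callbacks dict (the "on_sample_end" hook and the "samples" info key are how I recall the Ray 0.8 callbacks API; adjust if your version differs):

    # Sketch: print how many timesteps each policy contributed per sample phase.
    def on_sample_end(info):
        samples = info["samples"]  # a MultiAgentBatch
        counts = {pid: b.count for pid, b in samples.policy_batches.items()}
        print("per-policy sample counts:", counts)

    config = {
        # ... the rest of the trainer config ...
        "callbacks": {"on_sample_end": on_sample_end},
    }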

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

By the way, I'd like to ask about episode lengths and termination flags. Does the number of agents vary within a single episode?

@nicofirst1
Author

The number of agents is not varying, but after these lines the batch contains only some of the agents, not all of them. This is why I get different sample sizes.

@nicofirst1
Author

It seems like some agents just disappear during episodes.

@wsjeon
Contributor

wsjeon commented Oct 15, 2019

Hmm, that is quite weird, and that's definitely the source of the error. Can you please check that your environment behaves as you intended, e.g., the number of observations and actions it handles during each step? A sketch of one way to check this is below.
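
As a concrete way to run that check, here is a sketch of a debugging wrapper (a hypothetical helper, not part of RLlib) that fails loudly if the env drops an agent mid-episode:

    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class CheckAllAgentsEnv(MultiAgentEnv):
        """Debugging sketch: assert that the obs dict contains every
        expected agent until the episode is over."""

        def __init__(self, env, agent_ids):
            self.env = env
            self.agent_ids = set(agent_ids)

        def reset(self):
            obs = self.env.reset()
            missing = self.agent_ids - set(obs)
            assert not missing, "reset() is missing agents: {}".format(missing)
            return obs

        def step(self, action_dict):
            obs, rew, done, info = self.env.step(action_dict)
            if not done.get("__all__", False):
                missing = self.agent_ids - set(obs)
                assert not missing, "step() dropped agents mid-episode: {}".format(missing)
            return obs, rew, done, info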

@nicofirst1
Author

I think you're right: the env just removes some agents during the episode. I will fix it and update you as soon as possible.

@stale

stale bot commented Nov 14, 2020

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale bot added the stale label Nov 14, 2020
@stale

stale bot commented Nov 28, 2020

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale bot closed this as completed Nov 28, 2020