MADDPG with horizon #5913
Comments
cc @wsjeon
Hi. I guess you've used an environment different from OpenAI MPE. Have you tested whether the code works on MPE and got the same issue? Additionally, do you have the same issue when there is a small number of agents?
Thanks for the fast reply. I haven't tested the OpenAI environment, and I don't think I have the time to do that.
However, I just noticed that having a horizon as low as 15 does not prevent the error, so my previous considerations about it were wrong.
@nicofirst1 I've been checking the code with MPE and found there's no error with your dependencies. I couldn't find an error when I varied the size of the replay buffer and the algorithm types (ddpg/maddpg for good and adversary agents). My guess is that the error is due to the environment class (rllib.MultiAgentEnv) you defined.
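Since the suspicion above falls on the custom environment class, here is a minimal sketch of the rllib.MultiAgentEnv contract for reference (the agent IDs, observation values, and episode length are purely illustrative): per-agent trajectories only stay aligned if every live agent appears in the returned dicts at every step.

```python
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class TwoAgentEnv(MultiAgentEnv):
    """Illustrative two-agent env following the MultiAgentEnv contract."""

    def __init__(self):
        self.agents = ["agent_0", "agent_1"]
        self.t = 0

    def reset(self):
        self.t = 0
        # One observation per agent, keyed by agent ID.
        return {aid: 0.0 for aid in self.agents}

    def step(self, action_dict):
        self.t += 1
        obs = {aid: float(self.t) for aid in self.agents}
        rewards = {aid: 0.0 for aid in self.agents}
        done = self.t >= 15
        # "__all__" signals whole-episode termination to RLlib.
        dones = {aid: done for aid in self.agents}
        dones["__all__"] = done
        infos = {aid: {} for aid in self.agents}
        return obs, rewards, dones, infos
```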
@wsjeon the whole project can be found here. The configuration is stored in the Params class (under Agent params), while the environment is this one.
I just found that you've set
I set learning_starts=train_batch_size and learning_starts=train_batch_size*2. Still getting the same error.
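For reference, a hedged sketch of how those keys sit together in an RLlib config dict (the numbers below are illustrative placeholders, not the actual values from the project's Params class):

```python
# Illustrative values only -- the real numbers live in the project's Params
# class. The intent is that replay sampling does not begin until at least
# one full training batch worth of experience has been collected.
config = {
    "train_batch_size": 1024,
    "learning_starts": 1024 * 2,   # also tried equal to train_batch_size
    "buffer_size": 100000,
    "horizon": 1500,
}
```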
Currently my parameters look like this:
What is the main code you're running in your project?
The train script.
I got the _storage length from each policy and they are different, while the sampled idxes stay the same:
I'm referring to the replay function.
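A hypothetical debugging sketch of that check (the `replay_buffers` mapping and the `_storage` attribute path are assumptions about the MADDPG implementation's internals; adjust them to wherever the per-policy buffers actually live):

```python
def check_buffer_sync(replay_buffers):
    """replay_buffers: assumed dict mapping policy IDs to per-policy
    replay buffers that expose an internal `_storage` list."""
    lengths = {pid: len(buf._storage) for pid, buf in replay_buffers.items()}
    print("per-policy buffer lengths:", lengths)
    # Shared sample idxes are only valid if every buffer has the same length.
    assert len(set(lengths.values())) == 1, (
        "replay buffers are out of sync: %s" % lengths)
```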
It seems like the line where you invoke the MADDPG agent is this line, but I'm wondering where
I think you need to check the sample batch size after these lines so that the agent-environment interaction gives experiences of the same length.
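One hedged way to do that check, assuming the sampling step produces an RLlib MultiAgentBatch (`multi_agent_batch` below stands in for whatever those lines return):

```python
def check_batch_counts(multi_agent_batch):
    """multi_agent_batch: an RLlib MultiAgentBatch, assumed to be what the
    sampling step referenced above returns."""
    counts = {pid: b.count
              for pid, b in multi_agent_batch.policy_batches.items()}
    print("per-policy sample counts:", counts)
    if len(set(counts.values())) > 1:
        print("WARNING: agents produced experiences of different lengths!")
```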
By the way, I'd like to ask about episode lengths and termination flags. Does the number of agents vary within a single episode?
The number of agents is not varying, but after these lines the batch contains only some of the agents, not all of them. This is why I get different sample sizes.
It seems like some agents just disappear during episodes.
Hmm, that is quite weird, and that is definitely the source of the error. Can you please check that your environment functions as you intended, e.g., the number of observations and actions it took during
I think you're right: the env just removes some agents during the episode. I will fix it and update you as soon as possible.
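A minimal sketch of the fix being discussed, assuming agents should stay present until the whole episode ends; the `_observe`, `_reward`, and `_is_inactive` helpers and the horizon bookkeeping are hypothetical. Keeping every agent in the returned dicts at each step keeps the per-policy sample counts, and hence the replay buffers, the same length.

```python
def step(self, action_dict):
    self.t += 1
    obs, rewards, dones, infos = {}, {}, {}, {}
    episode_over = self.t >= self.horizon
    for aid in self.agents:
        # Inactive agents still report a (dummy) observation and zero
        # reward instead of being dropped from the dicts.
        obs[aid] = self._observe(aid)              # hypothetical helper
        rewards[aid] = (0.0 if self._is_inactive(aid)
                        else self._reward(aid))    # hypothetical helpers
        dones[aid] = episode_over
        infos[aid] = {}
    dones["__all__"] = episode_over
    return obs, rewards, dones, infos
```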
Hi, I'm a bot from the Ray team :) To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity within 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message. Please feel free to reopen or open a new issue if you'd still like it to be addressed. Again, you can always ask for help on our discussion forum or Ray's public Slack channel. Thanks again for opening the issue!
System information
I'm trying to use the MADDPG algorithm to train 180 agents, divided into 60 agents with a ddpg policy and 120 with a maddpg one.
I've set the horizon to 1500 (though I would like to use 4000 later on), and the batch settings are the following:
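For context, a rough sketch of how such a setup is typically wired in RLlib's multiagent config, assuming one policy per agent as in the MADDPG examples; obs_space, act_space, the agent-ID convention, and the per-policy config keys are placeholders, not the project's actual definitions.

```python
from gym.spaces import Box, Discrete

# Placeholder spaces -- the real ones come from the project's environment.
obs_space = Box(low=-1.0, high=1.0, shape=(10,))
act_space = Discrete(5)

def gen_policy(i):
    # `None` lets RLlib pick the trainer's default policy class.
    return (None, obs_space, act_space, {"agent_id": i})

policies = {"policy_%d" % i: gen_policy(i) for i in range(180)}

config = {
    "multiagent": {
        "policies": policies,
        # Assumed convention: the environment uses integer agent IDs 0..179.
        "policy_mapping_fn": lambda agent_id: "policy_%d" % agent_id,
    },
    "horizon": 1500,
}
```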
Describe the problem
When the policy tries to sample observations from the batch, I get the following error:
I've tried changing the above-mentioned parameters, but the only one that seems to make a difference is the horizon, which (if set to <=15) does not trigger the IndexError.
Any idea on how to fix this?
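For anyone hitting the same thing, here is a toy, plain-Python reproduction of the failure mode eventually diagnosed in the comments above (not RLlib code): per-policy buffers of unequal length sampled with one shared set of indices raise IndexError.

```python
buffer_a = list(range(1500))   # policy whose agent stayed the whole episode
buffer_b = list(range(900))    # policy whose agent "disappeared" mid-episode

idxes = [10, 500, 1200]        # one shared set of sample indices
samples_a = [buffer_a[i] for i in idxes]   # fine
samples_b = [buffer_b[i] for i in idxes]   # IndexError: 1200 >= len(buffer_b)
```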