[Eval] How to choose the best checkpoint in the paper? #6

aopolin-lv opened this issue Aug 21, 2024 · 10 comments
@aopolin-lv

Hello, after completing training of the model, I don't know how to choose the right ckpt, so I would appreciate it if you could answer the following questions.

  1. When evaluating and testing, do you run the eval.py script on each ckpt saved every 10k steps and select the ckpt with the highest score, after training has completed? Specifically, given 40k training steps, the ckpts at 10k, 20k, 30k, and 40k would be evaluated one by one, and the one with the highest score would be selected for the final test on the test set (a rough sketch of this loop is below the list).

  2. How can I improve the speed of testing? Specifically, when I run the eval.py script, it takes 1 hour to complete 25 episodes of a single task. The hardware I'm using includes an Intel-8352V CPU with 72 cores and an A800-80G GPU with performance similar to the A100-80G. May I ask what your typical efficiency is when running eval.py?
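
Something like the following loop is what point 1 describes. This is only a sketch: the eval.py command-line flag and the way the score is read back are placeholders, not the script's actual interface.

```python
import subprocess

# NOTE: the command-line flag and output parsing below are placeholders,
# not the actual eval.py interface -- adapt them to your configuration.
CKPT_STEPS = [10_000, 20_000, 30_000, 40_000]
scores = {}

for step in CKPT_STEPS:
    # Evaluate this checkpoint on the validation episodes; here we pretend
    # eval.py prints the mean success rate as its last output line.
    out = subprocess.run(
        ["python", "eval.py", f"--checkpoint-step={step}"],  # placeholder flag
        capture_output=True, text=True, check=True,
    ).stdout
    scores[step] = float(out.strip().splitlines()[-1])

best = max(scores, key=scores.get)
print(f"Best checkpoint: step {best} (val score {scores[best]:.3f})")
# Only this checkpoint would then be run once on the held-out test set.
```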

@aopolin-lv
Author

By the way, the validation sets cannot be accessed from the FTP server. Could you please upload the relevant datasets?

@markusgrotz
Owner

  1. Training: I use SLURM for launching my jobs for training/evaluation. Everything is pretty automated. I will provide some details soon. The code / documentation is still a work in progress, and I hope I can find more time to work on it soon.
  2. Evaluation: That is too slow. Do you have more insight into your setup? Is it a headless setup?
  3. Dataset: I have been actively looking for an alternative hosting option and transferred the data today to https://dataset.cs.washington.edu/fox/bimanual/ (see the download sketch below).

Let me know if that helps
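
In case it is useful, a minimal Python sketch for grabbing a file from that mirror. The archive name below is a placeholder; browse the directory index at the URL above for the actual file names.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://dataset.cs.washington.edu/fox/bimanual/"
# Placeholder archive name -- check the directory index for the real files.
filename = "coordinated_lift_ball.zip"

out = Path("data") / filename
out.parent.mkdir(parents=True, exist_ok=True)
urllib.request.urlretrieve(BASE_URL + filename, out)
print(f"Saved {out} ({out.stat().st_size / 1e6:.1f} MB)")
```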

@aopolin-lv
Author

aopolin-lv commented Aug 24, 2024

Thank you for your reply. The data download is now very convenient! However, I still have some doubts about the time required for training/evaluation.

  1. Training: I used the bimanual_peract configuration with a batch_size of 4, which occupied about 46GB of GPU memory. Training for 40k iterations took approximately 15-16 hours.
  2. Evaluating: I used 25 episodes, with each task taking about 1 hour. Everything was done under the headless setting.

The paper mentions that using the bimanual setting would result in a total training time of about 54 hours. However, my single-task training takes 15 hours, and the total training time for all tasks would be 15 * 13 = 195 hours, which far exceeds the time reported in the paper. Is there anything I should improve? Evaluation also takes too much time; what can I do to reduce the cost?

@markusgrotz
Owner

That's great that you're able to train the network! Just to clarify, the paper doesn't mention the total training cost; instead, Table 4 reports the average training time. To estimate your total training time, you'd multiply this average by the number of tasks you’re running. Given that your setup may differ in hardware or other configurations, it's also expected that the actual time might vary.

Regarding the evaluation, I assume this is due to the headless mode, but I need more information about it. Is this some kind of HPC system? Happy to chat to speed things up.

@aopolin-lv
Author

I apologize for mistakenly considering the average task training time in the paper as the total training time. So far, I have only completed the training for the coordinated_lift_ball task and have not yet conducted a full test to verify the effectiveness of the training. Additionally, I am not very familiar with HPC. I am using a regular GPU computing server without any special modifications. By the way, could you please provide the specific configurations for training and validation? This would help us troubleshoot in case any issues arise.

@aopolin-lv
Author

Hi, could you release the model checkpoints (including ACT/RVT-LF/Peract-LF/Peract^2) for reproducing the results reported in the paper?

@markusgrotz
Owner

Hi aopolin-lv,

I have the first results for multi-task training! I will update the webpage with the results soon, but I would like to finish the documentation first. I can also share my checkpoints then.

Let me know if you have any further questions.

Kind regards,
Markus

@aopolin-lv
Author

aopolin-lv commented Oct 23, 2024

Hello, I have a question. How do I get the initial positions and rotation matrices of the two robotic arms in the scene? This way, I can use the initial position and rotation matrix to get the end-effector position.

@markusgrotz
Owner

Hi aopolin-lv,

The end-effector pose is in the dataset. Unfortunately, the pose of the robot arms is not stored, but it is static, so I can get it for you. I am writing a small tool to display the trajectory of the end-effectors and a point cloud. I hope to be done with the documentation soon.
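
For reference, here is a minimal sketch of reading those stored poses from an episode. It assumes the usual RLBench episode layout (a pickled demo in low_dim_obs.pkl), and the per-arm attribute names are assumptions for the bimanual fork, so inspect the observation object if they differ.

```python
import pickle
from pathlib import Path

# Hypothetical episode path; requires the (bimanual) RLBench fork to be
# installed so the pickled observation classes can be resolved.
episode = Path("data/coordinated_lift_ball/all_variations/episodes/episode0")

with open(episode / "low_dim_obs.pkl", "rb") as f:
    demo = pickle.load(f)

first = demo[0]  # observation at the first timestep
# Attribute names below are assumptions -- print vars(first) to see what
# the bimanual observations actually store.
print(first.right.gripper_pose)  # [x, y, z, qx, qy, qz, qw] in world frame
print(first.left.gripper_pose)
```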

Kind regards, Markus

PS: I am still working on the multi-task training and I hope to have some results soon.

markusgrotz self-assigned this Oct 24, 2024
@markusgrotz
Owner

Hi,

Robot poses: What would be your reference coordinate system? Do you need something like this?

[image attached]
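
In the meantime, a minimal numpy/scipy sketch (with made-up numbers) of how such a static base pose, stored as position plus quaternion, could be used to express a world-frame end-effector pose in one arm's base frame:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def pose_to_matrix(pos, quat_xyzw):
    """Build a 4x4 homogeneous transform from position + quaternion (x, y, z, w)."""
    T = np.eye(4)
    T[:3, :3] = R.from_quat(quat_xyzw).as_matrix()
    T[:3, 3] = pos
    return T

# Hypothetical static base pose of one arm in the world frame.
T_world_base = pose_to_matrix([0.1, -0.4, 0.75], [0, 0, 0, 1])

# End-effector pose in the world frame (e.g. taken from the dataset).
T_world_ee = pose_to_matrix([0.3, -0.1, 0.9], [0, 0, 0.7071, 0.7071])

# Express the end-effector pose relative to the arm's base frame.
T_base_ee = np.linalg.inv(T_world_base) @ T_world_ee
print(T_base_ee[:3, 3])                             # position in base frame
print(R.from_matrix(T_base_ee[:3, :3]).as_quat())   # orientation (x, y, z, w)
```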

Scene visualization:

I have added a tool to visualize a scene including the trajectory of the end-effectors. You can find it here

https://github.com/markusgrotz/RLBench/blob/main/tools/visualize_dataset.py

Does this help you?
