RT export questions #30
I saved R and T locally from pred_cameras.R and pred_cameras.T and visualized them with visdom, but the camera orientations differ noticeably from the visualization produced directly from pred_cameras. Why is this? Is the saved RT in NDC format? What should I do if I want a c2w (camera-to-world) RT?

Comments
The pred_cameras.R and pred_cameras.T are in NDC. If you use similar code to the snippet here for visualization, they should be exactly the same. You can construct new cameras from the saved R and T; the focal length can be omitted for visualization. (PoseDiffusion/pose_diffusion/demo.py, lines 134 to 136 at 57d6444)
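A minimal sketch of rebuilding PyTorch3D cameras from saved R and T for visualization (the file names and tensor shapes here are assumptions, not the repo's exact code):

```python
import torch
from pytorch3d.renderer import PerspectiveCameras

# Assumption: R and T were saved from pred_cameras as tensors of shape
# (N, 3, 3) and (N, 3), still in PyTorch3D's NDC convention.
R = torch.load("pred_R.pt")  # hypothetical file name
T = torch.load("pred_T.pt")  # hypothetical file name

# Rebuild the cameras; the focal length can stay at its default
# if you only want to visualize camera positions and orientations.
new_cameras = PerspectiveCameras(R=R, T=T)
```

Passing new_cameras to the same visualization call used for pred_cameras should then give an identical plot.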
Yeah, I know. What I want to do is extract the pose information (RT) for other back-end use, such as instant-ngp, but this RT is obviously in NDC format. I saved RT locally, and the second visualization differed from the original output, which confused me a little. Is there a way to get RT in a non-NDC format?
Please refer to issue #9, where I provided an example of how to convert NDC RT to COLMAP RT.
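For completeness, a hedged sketch of one way to do the conversion with a PyTorch3D utility (the image size and the extra camera-to-world step for instant-ngp are assumptions; see issue #9 for the author's own example):

```python
import torch
from pytorch3d.utils import opencv_from_cameras_projection

# Assumption: pred_cameras is the PerspectiveCameras object returned by the
# model, and all images are H x W pixels (replace with your real sizes).
H, W = 512, 512
image_size = torch.tensor([[H, W]] * pred_cameras.R.shape[0])

# R_cv, t_cv follow the OpenCV/COLMAP world-to-camera convention.
R_cv, t_cv, K = opencv_from_cameras_projection(pred_cameras, image_size)

# instant-ngp wants camera-to-world poses, so invert the extrinsics.
R_c2w = R_cv.transpose(1, 2)
t_c2w = -torch.einsum("nij,nj->ni", R_c2w, t_cv)
# Note: instant-ngp additionally uses OpenGL-style axes, so a further
# axis flip of the resulting matrices may be needed.
```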
Thank you for your reply. I have solved the pose-extraction problem. At present I am trying to train on the CO3D data myself and simply reproduce the results. Because of the large amount of data, I tried training on one category (109 GB), but the result was very poor, and I suspect some overfitting. Can you briefly share some training strategies, such as the composition of the data? Maybe I'll try to make the result as good as possible given the time cost. Also, which metric is more important? I mainly look at the AUC metric.
Well, in my experience, if you want to train on one category (such as teddybear), one GPU is enough. You may need to change the lr correspondingly, and other hyperparameters should be almost the same. From my observation, a model trained on one category performs better on the corresponding test set than the multi-category model (that is, trained on teddybear and tested on teddybear, although the test scenes are never seen during training). My suggestions: (1) start with teddybear, because I have tried it before; (2) you can even start by forcing the model to overfit on one scene; (3) in most cases, lr is the most important hyperparameter. Racc@15 and Tacc@15 are the indicators I care about most.
Thanks for your reply! The default stored weight file is model.safetensors. How can I convert it into a .pth file that I can load directly with torch? Also, is a loss around 0.03, with racc_15 and tacc_15 around 0.8 and 0.7 respectively, a good result (for only one category)?
I am a bit confused by "The default stored weight file is model.safetensors". In our code, the trained checkpoint should be saved by the line below: (PoseDiffusion/pose_diffusion/train.py, line 146 at 57d6444)
You should be able to find the .pth files in the corresponding path and reload them with accelerator or with torch itself; please refer to the accelerator documentation for details. A racc_15 of around 80% is not bad; at least it means something has been learned correctly.
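A minimal sketch of reloading such a checkpoint with plain torch (the path and file name are hypothetical and depend on the accelerate version; model stands for the already-instantiated network, built the same way as in demo.py):

```python
import torch

# Assumption: accelerator.save_state() wrote a plain torch checkpoint,
# e.g. pytorch_model.bin, inside the checkpoint directory.
state_dict = torch.load("path/to/ckpt/pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
```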
@sungh66 By the way, if you mean racc_15 on the training data, it should usually be more than 90% or even 95% for one-category training.
The default default_train.yaml does not have an exp_dir keyword, so I added it myself. The trained files are (ckpt_000055$ ls):
I am not sure why it is saved as model.safetensors (probably some version difference); the relevant operations are documented here: https://huggingface.co/docs/safetensors/index. Or I think you can directly use the built-in function accelerator.load_state (https://huggingface.co/docs/accelerate/v0.26.0/en/usage_guides/checkpoint#checkpointing):
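A minimal sketch of both routes (paths are hypothetical; model and accelerator are assumed to be set up as in train.py):

```python
# Option 1: load the safetensors weights directly into the model.
from safetensors.torch import load_file

state_dict = load_file("path/to/ckpt/model.safetensors")  # hypothetical path
model.load_state_dict(state_dict)

# Option 2: let accelerate restore the whole training state
# (model, optimizer, scheduler) from the checkpoint directory.
accelerator.load_state("path/to/ckpt")  # hypothetical path
```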
Thanks for your reply! demo.py did not seem able to directly load the weights from my multi-GPU training, so I modified demo.py according to the Accelerator documentation and successfully loaded the weights, but I don't have gt-cameras.npz. The GT information here doesn't affect the RT result generation, right? The info is as follows:
Hey, great to hear that you have resolved it. Yes, the GT information here doesn't affect the RT result generation; please just skip the corresponding code. However, the GGS log does not look correct here. The Sampson error shown is around 10, but ideally it should be reduced to something close to 1 or 2. It seems you have changed the GGS setting, e.g. from applying GGS starting at t=10 to t=1, or some other setting. Please note that it is very likely that a Sampson error of 10 will not lead to a good camera pose.
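For reference, the Sampson error discussed here is the standard first-order epipolar error between matched points under a fundamental matrix; a generic sketch of the metric (not the repo's exact GGS implementation):

```python
import torch

def sampson_error(F, x1, x2):
    """Sampson epipolar error for matched points.

    F:  (3, 3) fundamental matrix
    x1: (N, 2) points in image 1, in pixels
    x2: (N, 2) corresponding points in image 2, in pixels
    """
    ones = torch.ones(x1.shape[0], 1, dtype=x1.dtype)
    x1h = torch.cat([x1, ones], dim=1)   # homogeneous coordinates
    x2h = torch.cat([x2, ones], dim=1)
    Fx1 = x1h @ F.T                      # rows are F @ x1_i
    Ftx2 = x2h @ F                       # rows are F^T @ x2_i
    num = (x2h * Fx1).sum(dim=1) ** 2    # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den
```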
Thank you for reminding me. The problem was that there were still some errors in multi-GPU prediction. I changed it to load the multi-GPU weights on a single card and made some key modifications, and now the Sampson results are completely normal. Could you please tell me how many epochs the complete CO3D (5.5 TB) training took? Is len_train 16384? I would like to make a rough estimate of GPU compute, time, and equipment to purchase before reproducing it. It would be great if you could tell me!
Hey, I don't remember exactly how many epochs it was trained for, but it took around 2-3 days on 8 A100 GPUs.
To avoid safetensors, you can set:
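A hedged sketch of what this could look like, assuming an accelerate version in which save_state exposes a safe_serialization flag (check your installed version; the flag comes from the accelerate docs, not from this repo's code):

```python
# When saving the checkpoint in train.py: fall back to torch's .bin
# serialization instead of safetensors (assumes a recent accelerate release).
accelerator.save_state(save_path, safe_serialization=False)
```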