About GPU memory usage #10
Hi @Fan-Yixuan, thanks for your interest in our work. I have tried training TransFusion on 8 3090 GPUs and it fits into memory, so I am not sure what happens in your environment. But you could try using spconv 1.2 to reduce the memory. spconv is imported in TransFusion/mmdet3d/ops/spconv/__init__.py (lines 14 to 20, commit 5337046), and you can replace that import with something like the following.
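(A minimal sketch of the kind of swap meant here; the exact symbols re-exported are an assumption for illustration, not the repo's actual `__init__.py`.)

```python
# Hypothetical replacement for TransFusion/mmdet3d/ops/spconv/__init__.py,
# assuming the pip-installed spconv exposes the modules the detector uses.
# In spconv 1.2.x the symbols live at the top level; in spconv 2.x they
# moved under spconv.pytorch.
try:
    from spconv.pytorch import (SparseConv3d, SparseConvTensor,
                                SparseInverseConv3d, SparseModule,
                                SparseSequential, SubMConv3d)  # spconv 2.x
except ImportError:
    from spconv import (SparseConv3d, SparseConvTensor, SparseInverseConv3d,
                        SparseModule, SparseSequential, SubMConv3d)  # spconv 1.2
```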
Thanks a lot for your help. I'm using the latest spconv 2.1.21, and now I can train 200 queries with one sample per 3090 using ~22 GB of memory, but 2 samples per GPU is still not achievable. I will keep exploring to better solve this problem!
@XuyangBai Hi dear author, I noticed that TransFusion's prediction heads do not seem to contain branches for attribute prediction (moving, stopped, parked vehicle, etc.). I'm not familiar with this task (nuScenes); why is it done this way instead of adding such branches to reduce mAAE?
I basically follow mmdet3d and obtain the attribute prediction using some post-processing rules; check the code here:
Yes, I noticed, but it seems strange to directly use the default attribute. Is there any official statement as to why this is done?
Ah, sorry, I just use it as the de facto approach and never thought carefully about this issue.
OK, since mmdet3d implements it like this, there is probably a reason for it.
@XuyangBai Hi dear author, I finished training transfusion_nusc_voxel_L and got a val set performance of 64.63 mAP / 69.99 NDS. The previous problem with GPU memory has been solved; it was caused by the images not being resized to 448×800 due to some version issue.
BTW, I just realized that this might be the problem: in TransFusion/mmdet3d/models/dense_heads/transfusion_head.py (lines 944 to 948, commit 5187414) I assume that BS=1 means evaluation time, so I skip the ...
I'm using the latest version of the code with 2 samples per GPU. I have another question: RandomFlip3D's parent class, RandomFlip, doesn't support flipping a list of images; will that matter?
Yes, it might be the reason. If the ...
My concern is that maybe ...
@Fan-Yixuan I find ...
It is really weird that the bbox loss starts to increase at some point; the curve before 10k looks normal. I am not sure of the reason, but maybe you can first verify the projection of object queries onto the image through some visualization? If the LiDAR and image are not aligned well, the image features attached to the object queries will be wrong. BTW, you mentioned that mATE, mASE, and mAOE are all increasing, so how about mAP?
For the first three epochs after adding the camera, mAP is 62.49, 58.86, 59.66. I feel that the loss starts to increase probably because the learning rate becomes larger (I use 4×3090 with 2 samples per GPU, so I run two forward passes and then update the parameters to make the effective batch size 16; thus the learning rate reaches its maximum at around 40k iterations). Do you think this is normal if the LiDAR branch is not frozen?
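For context, a minimal PyTorch sketch of the gradient-accumulation scheme described above; `model`, `optimizer`, and `dataloader` are placeholders, not TransFusion's actual training loop.

```python
# Accumulate gradients over two forward passes so that 4 GPUs x 2 samples
# behave like an effective batch size of 16.
def train_with_accumulation(model, optimizer, dataloader, accumulate_steps=2):
    optimizer.zero_grad()
    for i, batch in enumerate(dataloader):
        loss = model(batch)                    # assume the model returns a scalar loss
        (loss / accumulate_steps).backward()   # scale so the summed gradient matches one large batch
        if (i + 1) % accumulate_steps == 0:
            optimizer.step()                   # one parameter update per two forward passes
            optimizer.zero_grad()
```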
The learning rate should not be the reason; I have also tried batch size 8×1. Yes, I freeze the LiDAR branch during the training of TransFusion, as it is already well trained in the first stage. If you would like to jointly optimize the LiDAR branch and the fusion component, maybe they should be optimized with different learning rates.
Yes, the order of images does not matter much, but freezing the backbone did.
Ok, thank you for your patience and your excellent work, I close this issue. |
@Fan-Yixuan Hello! Could you tell me the max learning rate you used in the first-stage and second-stage training, respectively?
Hi, my experiment follows the configs given by the author: TransFusion/configs/transfusion_nusc_voxel_L.py (lines 244 to 250, commit 5337046) and TransFusion/configs/transfusion_nusc_voxel_LC.py (lines 246 to 252, commit 5337046).
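For reference, a hedged sketch of what such an mmdet3d-style cyclic schedule typically looks like; the values below are assumptions for illustration, so check the two configs referenced above for the real numbers.

```python
# Assumed values only; not copied from the TransFusion configs.
optimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.01)
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),  # peak lr = 10 * base lr; final lr = 1e-4 * base lr
    cyclic_times=1,
    step_ratio_up=0.4,
)
```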
OK! Thanks! |
Hello! @XuyangBai May I ask why apply_3d_transformation is only used when projecting 3D queries to 2D, but not when adding the BEV LiDAR feature and BEV image feature for image-guided query initialization? Will this cause a mismatch between the LiDAR and image modalities due to RandomFlip3D and GlobalRotScaleTrans?
Hi @nmll That's a very good question that I didn't realize previously. Intuitively, the point clouds should also be transformed using the inverse of the data augmentation when projecting image features onto the BEV plane (or, equivalently, I should apply a similar rotation and flip to the images, which is somewhat complicated). However, the network still works under the current settings. My guess is that the network is able to 1) leverage the contextual relationship between image features and LiDAR features to associate the two sets of features and thus perform the projection, and 2) ignore the geometric relationship given by the position encodings of the image features and LiDAR features.

Furthermore, I have run another experiment that removes RandomFlip and GlobalRotScaleTrans during training to see whether forcing the two modalities to be consistent further improves the results. In this case, the network could also leverage the geometric relationship to build the association. The observation is that the training loss decreases more rapidly compared with the previous setting. The blue curve in the following figure is the one without RandomFlip & GlobalRotScaleTrans, while the gray curve is the original one. However, the final mAP and NDS are similar. So I assume that removing these two augmentations increases the convergence speed, but the final performance might already be saturated (although the heatmap_loss could be further reduced, the object queries selected by the heatmap already have good locations, so the improvement is not remarkable in terms of final mAP and NDS). I will remove the ...

Best,
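For readers following this thread, here is a hedged sketch of the projection step being discussed (not the repo's exact code; the helper name, tensor shapes, and the lidar2img handling are assumptions). It undoes the LiDAR-side augmentations with apply_3d_transformation before projecting query centers onto the image.

```python
import torch
from mmdet3d.models.fusion_layers import apply_3d_transformation

def project_queries_to_image(query_xyz, img_meta, lidar2img):
    """query_xyz: (N, 3) query centers in the augmented LiDAR frame;
    lidar2img: (4, 4) projection matrix from the un-augmented LiDAR frame."""
    # Undo RandomFlip3D / GlobalRotScaleTrans so the points match the camera calibration.
    xyz = apply_3d_transformation(query_xyz, 'LIDAR', img_meta, reverse=True)
    # Homogeneous coordinates, then project with lidar2img.
    xyz_hom = torch.cat([xyz, xyz.new_ones(xyz.shape[0], 1)], dim=1)  # (N, 4)
    pts = xyz_hom @ lidar2img.T                                       # (N, 4)
    depth = pts[:, 2:3].clamp(min=1e-5)
    return pts[:, :2] / depth                                         # (N, 2) pixel coordinates
```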
Hello @Fan-Yixuan Can you tell me what you did to solve the version issue? I am facing the same problem now.
Hi @heming7, you need to make sure that TransFusion/mmdet3d/datasets/pipelines/loading.py (lines 187 to 190, commit 8977b2b) ...
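In case it helps others hitting the same resize problem, here is a hedged sketch of the idea (a hypothetical standalone helper, not the repo's LoadMultiViewImageFromFiles): the loaded multi-view images have to be registered in img_fields, otherwise the later Resize/RandomFlip transforms never touch them.

```python
import mmcv

def load_multiview_images(results, filenames, color_type='unchanged'):
    """Hypothetical helper mirroring what the loading pipeline must do: load the
    camera images and register them in 'img_fields' so that later Resize /
    RandomFlip transforms actually operate on them."""
    imgs = [mmcv.imread(name, color_type) for name in filenames]
    results['img'] = imgs
    results['img_shape'] = imgs[0].shape
    results['ori_shape'] = imgs[0].shape
    # If 'img_fields' is missing, newer mmdet pipeline versions skip the images,
    # so they never get resized to 448x800.
    results['img_fields'] = ['img']
    return results
```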
Hello Yixuan, thank you for the suggestion. I checked the code and I think the author has pushed a commit that fixes this. But I managed to run it by reducing the value of samples_per_gpu. Anyway, thank you so much for the help!
Hi, @Fan-Yixuan, could you please share the torch/cuda/mmdet3d/spconv environment you used to reproduce the nuScenes val performance (64.63 mAP and 69.99 NDS)? It seems that you used 8×3090 with batch size 2 and lr 1e-4?
Hi, my env: #10 (comment); my spconv: 2.1.21; my total batch size: 16; lr: 1e-4.
@yinjunbo |
Thank you very much! |
@yinjunbo Sorry, I didn't save the training logs from before modifying the coordinate system, but if the coordinate system is not aligned, it should perform very poorly.
I totally agree with you. Since my reproduced performance is only slightly lower (~2 points) than yours, this could not be caused by the coordinate system. I'll continue looking for the problem. Thanks!
@Fan-Yixuan Hello, I am trying to reproduce TransFusion in mmdet3d 1.1.0, but I got wrong results when training the LiDAR-camera fusion stage. Could you please share your training log for this stage? Thanks! (email: [email protected])
Thanks for your great work! I am trying to reimplement your work with the new version (v1.0.0) of mmdet3d; my environment:
I have dealt with the coordinate system refactoring problem and also the img_fields issue, but I can only train with up to 50 query proposals with one sample per 24 GB RTX 3090 GPU; using the default config (nuScenes, LiDAR and camera, R50-FPN, SECOND LiDAR backbone, 200 queries) runs into CUDA OOM. Noting your practice in #6 (comment), I hereby seek help. I also couldn't find where spconv is used; I hope you can provide more details.
Thanks a lot.