
Multiple view overlaying problem. #12

Open
BHC1205 opened this issue Dec 8, 2022 · 3 comments

Comments


BHC1205 commented Dec 8, 2022

Thank you for your great work.

  1. I was wondering whether the overlap area here gets no special treatment and is simply overwritten with the information from the latter view. Is that because sum or averaging doesn't work well, or simply because it hasn't been implemented?
    https://github.com/fudan-zvg/DeepInteraction/blob/main/projects/mmdet3d_plugin/models/utils/decoder_utils.py#L758

  2. Is it because of its superior performance that DynamicConv replaces the cross-attention module in the decoder?

Collaborator

Alexander0Yang commented Dec 8, 2022

  1. I was wondering whether the overlap area here gets no special treatment and is simply overwritten with the information from the latter view. Is that because sum or averaging doesn't work well, or simply because it hasn't been implemented?
    https://github.com/fudan-zvg/DeepInteraction/blob/main/projects/mmdet3d_plugin/models/utils/decoder_utils.py#L758

Yes. In the current implementation, queries in the overlap area keep only the information from the latter view, because this is simple and the number of queries in overlap areas is negligible. Intuitively, using information from multiple views should help, but we haven't tried it.
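For readers comparing the two options, here is a minimal sketch of "latter view overwrites" versus "average over all views that see the query". The data layout (each view yielding `(query_index, feature)` pairs) and both function names are hypothetical illustrations, not the structure of the code at decoder_utils.py#L758; averaging corresponds to the multi-view variant that, per the reply above, has not been tried.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: each camera view yields (query_index, feature) pairs for the
# queries whose reference points project into that view's image.
ViewOutputs = Sequence[Sequence[Tuple[int, List[float]]]]


def merge_overwrite(num_queries: int, dim: int, views: ViewOutputs) -> List[List[float]]:
    """Scatter per-view features; a later view overwrites earlier ones on overlap."""
    merged = [[0.0] * dim for _ in range(num_queries)]
    for view in views:
        for q, feat in view:
            merged[q] = list(feat)  # last write wins
    return merged


def merge_average(num_queries: int, dim: int, views: ViewOutputs) -> List[List[float]]:
    """Average the features from every view that sees a query (untried alternative)."""
    sums = [[0.0] * dim for _ in range(num_queries)]
    counts = [0] * num_queries
    for view in views:
        for q, feat in view:
            counts[q] += 1
            for c in range(dim):
                sums[q][c] += feat[c]
    # Queries seen by no view keep a zero vector, matching merge_overwrite.
    return [[s / counts[q] for s in sums[q]] if counts[q] else sums[q]
            for q in range(num_queries)]


# Two views; query 1 lies in the overlap region.
views = [
    [(0, [1.0, 2.0]), (1, [3.0, 3.0])],  # view A
    [(1, [5.0, 1.0]), (2, [4.0, 4.0])],  # view B (the "latter" view)
]
print(merge_overwrite(3, 2, views))  # [[1.0, 2.0], [5.0, 1.0], [4.0, 4.0]]
print(merge_average(3, 2, views))    # [[1.0, 2.0], [4.0, 2.0], [4.0, 4.0]]
```

Only query 1 differs between the two strategies, which is consistent with the point above: when few queries fall in overlaps, the choice of merge rule affects only a small fraction of them.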

  2. Is it because of its superior performance that DynamicConv replaces the cross-attention module in the decoder?

Yes. As shown in Table 3(a) of our paper, replacing the vanilla transformer decoder layer with our predictive interaction layer for both modalities brings better performance.
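For context, the sketch below follows the dynamic instance interaction ("DynamicConv") formulation from Sparse R-CNN, in plain Python. It is an assumption that DeepInteraction's layer matches this formulation exactly; biases and LayerNorm affine parameters are omitted, and the names `dynamic_conv`, `w_gen`, `w_out`, and `hidden` are illustrative. The idea: the query generates two per-instance weight matrices that act as 1x1 convolutions on every RoI bin, and the result is flattened bin by bin and projected back to the query dimension.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, x) for x in row] for row in m]


def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    """LayerNorm without learnable scale/shift (simplification)."""
    mu = sum(row) / len(row)
    var = sum((x - mu) ** 2 for x in row) / len(row)
    return [(x - mu) / math.sqrt(var + eps) for x in row]


def dynamic_conv(query: List[float], roi_feat: Matrix, w_gen: Matrix,
                 w_out: Matrix, hidden: int) -> List[float]:
    """Update one query (length C) from its RoI features (S*S bins x C channels)."""
    c = len(query)
    params = matmul([query], w_gen)[0]  # query -> 2 * C * hidden generated weights
    split = c * hidden
    p1 = [params[i * hidden:(i + 1) * hidden] for i in range(c)]              # C x hidden
    p2 = [params[split + i * c:split + (i + 1) * c] for i in range(hidden)]   # hidden x C
    x = relu([layer_norm(r) for r in matmul(roi_feat, p1)])  # per-bin 1x1 conv: (S*S) x hidden
    x = relu([layer_norm(r) for r in matmul(x, p2)])         # per-bin 1x1 conv: (S*S) x C
    flat = [v for row in x for v in row]                     # flatten, keeping bin order
    out = matmul([flat], w_out)[0]                           # project (S*S*C) -> C
    return relu([layer_norm(out)])[0]


def rand_matrix(n: int, m: int) -> Matrix:
    return [[random.gauss(0.0, 0.5) for _ in range(m)] for _ in range(n)]


random.seed(0)
C, HIDDEN, BINS = 4, 2, 9  # 3x3 RoI grid
query = rand_matrix(1, C)[0]
roi = rand_matrix(BINS, C)
w_gen = rand_matrix(C, 2 * C * HIDDEN)
w_out = rand_matrix(BINS * C, C)
updated = dynamic_conv(query, roi, w_gen, w_out, HIDDEN)
print(len(updated))  # 4
```

Unlike cross-attention, where the query only reweights value vectors, here the query decides the filters applied to its own RoI features, so the interaction is conditioned per instance.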

Author

BHC1205 commented Dec 9, 2022

Thank you for your detailed response.

BHC1205 closed this as completed Dec 9, 2022
BHC1205 reopened this Dec 31, 2022
Author

BHC1205 commented Dec 31, 2022

I'm sorry to reopen this issue, but I still have some doubts. I compared DynamicConv with the vanilla transformer, and DynamicConv is indeed more effective, but I still don't understand why. Is it because the inputs being processed are RoI features, and the vanilla transformer may have difficulty encoding their positional information? If convenient, could you explain why DynamicConv brings better performance? Thank you.
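One way to make the positional-information hypothesis concrete, offered as an illustration rather than as the authors' explanation: single-head dot-product cross-attention over RoI bins with no positional encoding treats the bins as an unordered set, so rearranging them leaves the output unchanged. A Sparse R-CNN-style DynamicConv ends with a flatten plus linear layer, whose output does depend on bin order. The sketch assumes raw bins serve as keys and values (no projections); adding positional embeddings to the keys, as many decoders do, would remove this invariance, so it does not by itself prove why DynamicConv performs better.

```python
import math
import random
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(query: Vector, bins: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one query over RoI bins.

    No positional encoding is added, so the bins behave as an unordered set.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, b) * scale for b in bins]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]  # numerically stable softmax
    total = sum(weights)
    return [sum(w * b[c] for w, b in zip(weights, bins)) / total
            for c in range(len(query))]


def flatten_readout(bins: List[Vector], w_out: List[Vector]) -> Vector:
    """Flatten bins in spatial order, then apply a linear layer (DynamicConv-style output)."""
    flat = [v for b in bins for v in b]
    return [dot(flat, [row[j] for row in w_out]) for j in range(len(w_out[0]))]


random.seed(0)
C, S = 4, 3  # channels, RoI grid side
bins = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(S * S)]
query = [random.gauss(0.0, 1.0) for _ in range(C)]
w_out = [[random.gauss(0.0, 1.0) for _ in range(C)] for _ in range(S * S * C)]
rearranged = bins[::-1]  # same bin features, different spatial arrangement

attn_a, attn_b = cross_attention(query, bins), cross_attention(query, rearranged)
read_a, read_b = flatten_readout(bins, w_out), flatten_readout(rearranged, w_out)
print(all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(attn_a, attn_b)))  # True
print(all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(read_a, read_b)))
```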
