Questions about location tokens #4

Open
Deephome opened this issue Jul 10, 2023 · 4 comments

Comments

@Deephome

Hi, your work is great! But I am confused about the location tokens you used in the decoder. Could you provide more details about them?

@kahnchana

Same here, I am trying to figure out what they are.

Are they a fixed grid or some learnable parameter?

@kahnchana

I think it appears to be the same grid-like structure used in Deformable DETR. Basically, it's a uniform grid across image coordinates, and each grid centre is used as an anchor, relative to which the model regresses the offset to the correct bbox.
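
To make the grid idea concrete, here is a minimal sketch in plain Python. It is not taken from the VisionLLM or Deformable DETR code. The function names `make_grid_anchors` and `decode_box`, the normalized `(cx, cy, w, h)` box format, and the simple additive offset are all assumptions made for illustration. Deformable DETR itself adds the offset in inverse-sigmoid space rather than adding it directly.

```python
def make_grid_anchors(rows, cols):
    """Return normalized (cx, cy) centres of a rows x cols uniform grid."""
    return [((c + 0.5) / cols, (r + 0.5) / rows)
            for r in range(rows) for c in range(cols)]


def decode_box(anchor, offsets):
    """Combine an anchor centre with predicted (dx, dy, w, h).

    Returns the box as (cx, cy, w, h) in normalized image coordinates.
    The additive update is an assumption of this sketch.
    """
    cx, cy = anchor
    dx, dy, w, h = offsets
    return (cx + dx, cy + dy, w, h)


# A 2 x 2 grid gives four anchors, one per cell centre.
anchors = make_grid_anchors(2, 2)

# Hypothetical model output for the first anchor: a small shift to the
# right and a box of size 0.3 x 0.4.
box = decode_box(anchors[0], (0.05, 0.0, 0.3, 0.4))
```

In this picture the anchors are fixed, and only the offsets and sizes are predicted. Whether VisionLLM's location tokens work this way is exactly what this thread is asking.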

@SxJyJay

SxJyJay commented Jan 19, 2024

Basically, it's a uniform grid across image coordinates, and each grid centre is used as an anchor, relative to which the model regresses the offset to the correct bbox.

Hi, I have a similar question. If VisionLLM uses a Deformable DETR-like decoder and the object queries act as positional anchors, then Hungarian matching is required to assign GT boxes to object queries. However, the authors don't mention that in the paper. What do you think the training details of these object queries might be?
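
For reference, the assignment step being asked about can be sketched as follows. This is only an illustration, not the authors' training code. It uses a brute-force search over assignments in place of the Hungarian algorithm, which returns the same optimal matching on tiny inputs but does not scale. The L1-only cost and the names `l1_cost` and `match_queries_to_gt` are assumptions. DETR-style matchers typically also include class and GIoU terms in the cost.

```python
from itertools import permutations


def l1_cost(pred, gt):
    """L1 distance between two (cx, cy, w, h) boxes."""
    return sum(abs(p - g) for p, g in zip(pred, gt))


def match_queries_to_gt(pred_boxes, gt_boxes):
    """Return the min-cost one-to-one assignment as {gt_index: query_index}.

    Brute force over all assignments; assumes len(gt_boxes) <= len(pred_boxes).
    """
    best_cost, best_assign = float("inf"), None
    for queries in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1_cost(pred_boxes[q], gt_boxes[g])
                   for g, q in enumerate(queries))
        if cost < best_cost:
            best_cost, best_assign = cost, queries
    return {g: q for g, q in enumerate(best_assign)}


# Hypothetical predictions from three queries and two ground-truth boxes.
preds = [(0.25, 0.25, 0.2, 0.2), (0.75, 0.25, 0.2, 0.2), (0.25, 0.75, 0.2, 0.2)]
gts = [(0.7, 0.3, 0.2, 0.2), (0.3, 0.7, 0.2, 0.2)]
assignment = match_queries_to_gt(preds, gts)
```

Each GT box ends up paired with exactly one query, and unmatched queries would be trained as "no object". Whether VisionLLM does this is not stated in the paper, as noted above.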

@chagmgang

chagmgang commented Feb 6, 2024

Basically, it's a uniform grid across image coordinates, and each grid centre is used as an anchor, relative to which the model regresses the offset to the correct bbox.

Hi, I have a similar question. If VisionLLM uses a Deformable DETR-like decoder and the object queries act as positional anchors, then Hungarian matching is required to assign GT boxes to object queries. However, the authors don't mention that in the paper. What do you think the training details of these object queries might be?

Same here.

#7 (comment)

Maybe the link below will be helpful:

https://openreview.net/forum?id=Vx1JadlOIt&noteId=616Bhd6O5S

Maybe the image below will be helpful too:

image
