Position embedding in the DETR model #19833

Closed
SamuelCahyawijaya opened this issue Oct 24, 2022 · 11 comments

@SamuelCahyawijaya

System Info

According to the argument definition of the DetrDecoderLayer.forward() specified here:

position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the cross-attention layer.
query_position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the self-attention layer.

Based on this, the position_embeddings argument of the cross-attention call should be assigned the position_embeddings variable instead of query_position_embeddings.

hidden_states, cross_attn_weights = self.encoder_attn(
    hidden_states=hidden_states,
    position_embeddings=query_position_embeddings,
    key_value_states=encoder_hidden_states,
    attention_mask=encoder_attention_mask,
    key_value_position_embeddings=position_embeddings,
    output_attentions=output_attentions,
)

Is this an error in the argument definitions or in the code?

Thank you!

Who can help?

@NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The snippets below are from the transformers code.

Argument definitions:

position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the cross-attention layer.
query_position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the self-attention layer.

Cross-attention code:

hidden_states, cross_attn_weights = self.encoder_attn(
    hidden_states=hidden_states,
    position_embeddings=query_position_embeddings,
    key_value_states=encoder_hidden_states,
    attention_mask=encoder_attention_mask,
    key_value_position_embeddings=position_embeddings,
    output_attentions=output_attentions,
)
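
For context, the operation the docstring describes, adding a position embedding tensor to the hidden states before they are projected into queries and keys, can be sketched as follows. This is a simplified illustration of the DETR-style pattern, not the exact transformers implementation; the helper name with_pos_embed is used here only for illustration.

import torch
from typing import Optional

def with_pos_embed(tensor: torch.Tensor, pos: Optional[torch.Tensor]) -> torch.Tensor:
    # Add position embeddings element-wise when provided; this is the
    # "added to the queries and keys" operation the docstring refers to.
    return tensor if pos is None else tensor + pos

# In the decoder's cross-attention (sketch):
#   queries: decoder hidden states + object query embeddings
#   keys:    encoder hidden states + spatial position embeddings
# queries = q_proj(with_pos_embed(hidden_states, query_position_embeddings))
# keys    = k_proj(with_pos_embed(encoder_hidden_states, position_embeddings))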

Expected behavior

Either:

  1. The position_embeddings argument of the cross-attention call should be assigned the position_embeddings variable instead of query_position_embeddings, or
  2. the documentation of the arguments should be updated to match the code.
NielsRogge self-assigned this Oct 31, 2022
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@sgugger
Collaborator

sgugger commented Dec 5, 2022

Hey @NielsRogge, could you explain how to solve this issue? You just put the Good first issue label on it, but it's not clear what a contributor would have to do to fix it.

@daspartho
Contributor

Hi @NielsRogge, I would like to take this on.
As Sylvain suggested, could you offer some context on how to go about this?
Thanks :)

@NielsRogge
Contributor

Yeah, I marked this as a good first issue so that someone could take a deeper dive into DETR's position embeddings.

Reading the paper could definitely be helpful for that. But the implementation is correct; it's probably the internal variables/docstrings that need to be updated. From the paper:

Since the decoder is also permutation-invariant, the N input embeddings must be different to produce different results. These input embeddings are learnt positional encodings that we refer to as object queries, and similarly to the encoder, we add them to the input of each attention layer.

So the position_embeddings argument of the cross-attention layer corresponds exactly to these input embeddings, often also called "content embeddings" or "object queries".

Then a bit later on in the paper they state:

There are two kinds of positional encodings in our model: spatial positional encodings and output positional encodings (object queries).

So the key_value_position_embeddings argument of the cross-attention layer refers to these spatial position encodings. These are added to the keys and values in the cross-attention operation.

So, for clarity, we could update the "position_embeddings" argument to "object_queries", and the "key_value_position_embeddings" argument to "spatial_position_embeddings".
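
For illustration, here is a sketch of what the cross-attention call in DetrDecoderLayer.forward() could look like after such a rename; the argument names object_queries and spatial_position_embeddings simply follow the suggestion above and are not necessarily the names that will end up in transformers.

# Hypothetical rename sketch, not the actual transformers code.
hidden_states, cross_attn_weights = self.encoder_attn(
    hidden_states=hidden_states,
    # learnt decoder embeddings ("object queries"), added to the queries
    object_queries=query_position_embeddings,
    key_value_states=encoder_hidden_states,
    attention_mask=encoder_attention_mask,
    # the encoder's spatial position encodings, added to the keys/values
    spatial_position_embeddings=position_embeddings,
    output_attentions=output_attentions,
)

The local variable names would presumably be renamed in the same way; they are kept as-is here only to show how the old names map onto the proposed ones.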

@adit299
Contributor

adit299 commented Jan 22, 2023

Hello @daspartho @NielsRogge, I wanted to ask whether any progress has been made on this? I'd like to take a look.

@Lorenzobattistela
Contributor

Hello @NielsRogge, I am currently working on this issue. I've read the paper and I understand what has to be changed. My question is whether we only have to change the DetrDecoderLayer class (in the respective forward function mentioned above) or whether all position_embeddings args have to change too.

I did some local tests too, and noticed that when I changed only the forward function I mentioned to use object_queries and spatial_position_embeddings, many tests broke because the argument names no longer matched. In order to change these arguments, do we need to change them in the tests as well?

I looked at some of the tests, but I do think the changes belong in the code itself, since classes related to this one would otherwise be passing arguments under the wrong names.

This is my first contribution to an open source project this size, and I'm really happy to do it. Thanks in advance.

@hackpk
Contributor

hackpk commented Jul 13, 2023

Hey @NielsRogge, is this issue still open? If so, can I take it?

@Lorenzobattistela
Contributor

Hey @hackpk, I'm putting the finishing touches on my PR to fix this issue, so I don't know...

@hackpk
Contributor

hackpk commented Jul 13, 2023

That's great. I'll look for another issue then. Thanks.

@Lorenzobattistela
Contributor

No problem, good luck :D

@Lorenzobattistela
Contributor

@NielsRogge @amyeroberts I think this can be closed due to #24652
