Position embedding in the DETR model #19833

Closed
SamuelCahyawijaya opened this issue Oct 24, 2022 · 11 comments

@SamuelCahyawijaya

System Info

According to the argument definition of the DetrDecoderLayer.forward() specified here:

position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the cross-attention layer.
query_position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the self-attention layer.

Based on this, the position_embeddings argument of the cross-attention call should be assigned the position_embeddings variable instead of query_position_embeddings.

hidden_states, cross_attn_weights = self.encoder_attn(
    hidden_states=hidden_states,
    position_embeddings=query_position_embeddings,
    key_value_states=encoder_hidden_states,
    attention_mask=encoder_attention_mask,
    key_value_position_embeddings=position_embeddings,
    output_attentions=output_attentions,
)

Is this an error in the argument definitions or in the code?

Thank you!

Who can help?

@NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The snippets below are from the transformers code.

Argument definitions:

position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the cross-attention layer.
query_position_embeddings (`torch.FloatTensor`, *optional*):
    position embeddings that are added to the queries and keys
    in the self-attention layer.

Cross-attention code:

hidden_states, cross_attn_weights = self.encoder_attn(
    hidden_states=hidden_states,
    position_embeddings=query_position_embeddings,
    key_value_states=encoder_hidden_states,
    attention_mask=encoder_attention_mask,
    key_value_position_embeddings=position_embeddings,
    output_attentions=output_attentions,
)
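
For context, the operation the docstring describes, adding a position embedding tensor to the hidden states before they are projected into queries and keys, can be sketched as follows. This is a simplified illustration of the DETR-style pattern, not the exact transformers implementation; the helper name with_pos_embed is used here only for illustration.

import torch
from typing import Optional

def with_pos_embed(tensor: torch.Tensor, pos: Optional[torch.Tensor]) -> torch.Tensor:
    # Add position embeddings element-wise when provided; this is the
    # "added to the queries and keys" operation the docstring refers to.
    return tensor if pos is None else tensor + pos

# In the decoder's cross-attention (sketch):
#   queries: decoder hidden states + object query embeddings
#   keys:    encoder hidden states + spatial position embeddings
# queries = q_proj(with_pos_embed(hidden_states, query_position_embeddings))
# keys    = k_proj(with_pos_embed(encoder_hidden_states, position_embeddings))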

Expected behavior

Either:

  1. The position_embeddings argument of the cross-attention call should be assigned the position_embeddings variable instead of query_position_embeddings, or
  2. the documentation of the arguments should be updated to match the code.
NielsRogge self-assigned this Oct 31, 2022
@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@sgugger
Collaborator

sgugger commented Dec 5, 2022

Hey @NielsRogge, could you explain how to solve this issue? You just put the Good first issue label on it, but it's not clear what a contributor would have to do to fix it.

@daspartho
Contributor

Hi @NielsRogge, I would like to take this on.
As Sylvain suggested, could you offer some context on how to go about this?
Thanks :)

@NielsRogge
Contributor

Yeah, I marked this as a good first issue so that someone could take a deeper dive into DETR's position embeddings.

Reading the paper could definitely be helpful for that. But the implementation is correct; it's probably the internal variables/docstrings that need to be updated. From the paper:

Since the decoder is also permutation-invariant, the N input embeddings must be different to produce different results. These input embeddings are learnt positional encodings that we refer to as object queries, and similarly to the encoder, we add them to the input of each attention layer.

So the position_embeddings argument of the cross-attention layer corresponds exactly to these input embeddings, often also called "content embeddings" or "object queries".

Then a bit later on in the paper they state:

There are two kinds of positional encodings in our model: spatial positional encodings and output positional encodings (object queries).

So the key_value_position_embeddings argument of the cross-attention layer refers to these spatial position encodings. These are added to the keys and values in the cross-attention operation.

So, for clarity, we could update the "position_embeddings" argument to "object_queries", and the "key_value_position_embeddings" argument to "spatial_position_embeddings".
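
For illustration, here is a sketch of what the cross-attention call in DetrDecoderLayer.forward() could look like after such a rename; the argument names object_queries and spatial_position_embeddings simply follow the suggestion above and are not necessarily the names that will end up in transformers.

# Hypothetical rename sketch, not the actual transformers code.
hidden_states, cross_attn_weights = self.encoder_attn(
    hidden_states=hidden_states,
    # learnt decoder embeddings ("object queries"), added to the queries
    object_queries=query_position_embeddings,
    key_value_states=encoder_hidden_states,
    attention_mask=encoder_attention_mask,
    # the encoder's spatial position encodings, added to the keys/values
    spatial_position_embeddings=position_embeddings,
    output_attentions=output_attentions,
)

The local variable names would presumably be renamed in the same way; they are kept as-is here only to show how the old names map onto the proposed ones.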

@adit299
Contributor

adit299 commented Jan 22, 2023

Hello @daspartho @NielsRogge, I wanted to ask whether any progress has been made on this? I'd like to take a look.

@Lorenzobattistela
Contributor

Hello @NielsRogge, I am currently working on this issue. I've read the paper and I understand what has to be changed. My question is whether we only have to change the DetrDecoderLayer class (in the respective forward function mentioned above) or whether all position_embeddings args have to change too.

I did some local tests too, and noticed that when I changed only the forward function I mentioned to use object_queries and spatial_position_embeddings, many tests broke because the argument names no longer matched. In order to change these arguments, do we need to change them in the tests as well?

I looked at some of the tests, but I do think the changes belong in the code itself, since classes related to this one would otherwise be passing arguments under the wrong names.

This is my first contribution to an open source project this size, and I'm really happy to do it. Thanks in advance.

@hackpk
Contributor

hackpk commented Jul 13, 2023

Hey @NielsRogge, is this issue still open? If so, can I take it?

@Lorenzobattistela
Contributor

Hey @hackpk, I'm putting the finishing touches on my PR to fix this issue, so I don't know...

@hackpk
Contributor

hackpk commented Jul 13, 2023

That's great. I'll look for another issue then. Thanks.

@Lorenzobattistela
Contributor

No problem, good luck :D

@Lorenzobattistela
Contributor

@NielsRogge @amyeroberts I think this can be closed due to #24652
