Is sequence information leaking into the structure tokens? #129
OliviaViessmann asked this question in Q&A (unanswered, 0 replies)
Hi,
I would like to discuss and understand to what extent the ESM-3 structure tokens contain sequence information.
From the Appendix I understand that an inverse folding loss was added to the overall structure auto-reconstruction loss.
This inverse folding loss predicts the sequence from the structure tokens, so the cross-entropy against the ground-truth sequence encodes sequence information into the learned structure tokens.
How is this not harmful for downstream sequence generation tasks? In other words, is this not a form of leakage?
Are the sequence-structure pairs used for decoder training excluded from downstream partial-masking training?
From the appendix:
"Finally, an inverse folding token prediction loss (i.e., a crossentropy loss between predicted sequence and ground truth sequence) is an auxiliary loss used to encourage the learned representations to contain information pertinent to sequence related tasks. [...]
Inverse Folding Loss: Pass final layer representations of the decoder through a regression head to produce logits z. Using ground truth residues as labels y, compute cross-entropy for the classification task of predicting residues from final layer representations."