Hi!

Thank you again for this work, and especially for sharing the data with the community! I have a few questions regarding the files in the "PDBs.zip" file downloaded from here:
For protein pairs with an exact PDB structure (which I assume are the ones with "exact" in the "PDB" column on the "Final Prediction" page), are the contacts in the `.contacts` file the residue pairs with a heavy-atom distance < 6 Å in the `.pdb` file? I read the following description in section M6.2 of the Supplementary Information:
"For every predicted PPI, we exploited the ColabFold pipeline to generate 5 AF2 models and 5 AFmm models (see M5.5). We used these 3D models to identify the inter-protein contacts (interaction probability > 0.5 and inter-residue distance < 6Å). Residues participating in such contacts were considered as interface residues. We integrated the inter-protein contacts in 10 models (5 from AF2 and 5 from AFmm) to identify consistently predicted contacts present in ≥ 50% of models. The model containing the largest number of such consistently predicted contacts was selected as the representative structure model for each predicted PPI.
We compared the structural features of interfaces for predicted PPIs and interacting PDB chain pairs that are orthologous to human proteins (see M6.1). Interface residues in predicted PPIs were identified as above, whereas the interface residues in PDB chain pairs were identified only by inter-residue distances (< 6Å)."
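As one reading of the quoted consensus step (not the authors' actual code), the selection of consistently predicted contacts and of the representative model could be sketched as below. The function names and the representation of each model as a set of residue-pair tuples are assumptions made for illustration; the interaction-probability filter (> 0.5) is assumed to have been applied already when each model's contact set was built.

```python
from collections import Counter


def consistent_contacts(model_contacts, min_fraction=0.5):
    """Return contacts present in at least `min_fraction` of the models.

    `model_contacts` is a list with one set of contacts per model
    (hypothetical representation: each contact is a hashable residue pair).
    """
    counts = Counter(c for contacts in model_contacts for c in set(contacts))
    threshold = min_fraction * len(model_contacts)
    return {c for c, n in counts.items() if n >= threshold}


def pick_representative(model_contacts, min_fraction=0.5):
    """Return (index of the model with most consensus contacts, consensus set).

    Ties are broken by taking the first such model, which is an assumption;
    the quoted text does not say how ties are handled.
    """
    consensus = consistent_contacts(model_contacts, min_fraction)
    scores = [len(set(contacts) & consensus) for contacts in model_contacts]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, consensus
```

With 10 models (5 from AF2 and 5 from AFmm), `min_fraction=0.5` keeps contacts seen in at least 5 of them.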
However, when I tried to extract the contacts from the provided PDB files myself for a few examples with an exact structure, I found fewer contacts than in the provided contact file. For example, for the pair `Q6UXV0_Q99988`, a 6 Å distance cutoff gives 73 contacts, while the contact file lists 194. If I relax the cutoff to 8 Å, I get 214 contacts, and all 194 contacts from the contact file are included. I have not done this comparison comprehensively, so I wanted to reach out and confirm: what exactly is the procedure for extracting the contacts in the contact file, both for pairs with an exact PDB structure and for pairs with predicted structures, if the two differ?
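For reference, a distance-based extraction of this kind can be sketched as follows. This is a minimal illustration, not necessarily the exact script behind the numbers above and not the authors' procedure. It assumes standard fixed-column PDB `ATOM` records, treats a residue pair as a contact when any two heavy atoms are closer than the cutoff, and ignores alternate locations and `HETATM` records. The function names are hypothetical.

```python
from collections import defaultdict


def parse_heavy_atoms(pdb_text):
    """Map chain ID -> residue key -> list of heavy-atom coordinates.

    Residue key is (residue number, insertion code). Hydrogens are skipped,
    using the element column when present and the atom name otherwise.
    """
    chains = defaultdict(lambda: defaultdict(list))
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        element = line[76:78].strip().upper()
        if not element:
            # Fallback guess from the atom name, e.g. "1HB" -> "H", "CA" -> "C".
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):
            continue
        res_key = (int(line[22:26]), line[26].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        chains[line[21]][res_key].append(xyz)
    return chains


def interchain_contacts(chains, chain_a, chain_b, cutoff):
    """Return residue pairs (one per chain) with any heavy-atom pair < cutoff (in Å)."""
    cutoff_sq = cutoff * cutoff
    contacts = set()
    for res_a, atoms_a in chains[chain_a].items():
        for res_b, atoms_b in chains[chain_b].items():
            if any(
                (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2 < cutoff_sq
                for xa, ya, za in atoms_a
                for xb, yb, zb in atoms_b
            ):
                contacts.add((res_a, res_b))
    return contacts
```

Calling `interchain_contacts(chains, "A", "B", 6.0)` and again with `8.0` on the same parsed file, then comparing both sets with the pairs read from the contact file, gives the kind of comparison described above. The chain IDs `"A"` and `"B"` are placeholders, and reading the contact file is left out because its format is not described here.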
I also noticed that some protein pairs come with multiple associated PDB and contact files, e.g. `O95239_S2__Q2VIQ3_S1.pdb`, `O95239_S1__Q2VIQ3_S2.pdb`, `O95239_S1__Q2VIQ3_S1.pdb` and `O95239_S2__Q2VIQ3_S2.pdb`. What do the suffixes, e.g. `S1` or `S2`, correspond to?
Thank you!
Best,
Bear