Question about contact files and PDB structures #5

junhaobearxiong · 2024-12-05T23:18:38Z

Hi!

Thank you again for this work, and especially for sharing the data with the community! I have a few questions regarding the files in the "PDBs.zip" file downloaded from here:

For protein pairs with an exact PDB structure (which I assume are the ones marked "exact" in the "PDB" column on the "Final Prediction" page), are the contacts in the .contacts file the residue pairs with a heavy-atom distance < 6 Å in the .pdb file? I read the following description in section M6.2 of the Supplementary Information:

"For every predicted PPI, we exploited the ColabFold pipeline to generate 5 AF2 models and 5 AFmm models (see M5.5). We used these 3D models to identify the inter-protein contacts (interaction probability > 0.5 and inter-residue distance < 6Å). Residues participating in such contacts were considered as interface residues. We integrated the inter-protein contacts in 10 models (5 from AF2 and 5 from AFmm) to identify consistently predicted contacts present in ≥ 50% of models. The model containing the largest number of such consistently predicted contacts was selected as the representative structure model for each predicted PPI.

We compared the structural features of interfaces for predicted PPIs and interacting PDB chain pairs that are orthologous to human proteins (see M6.1). Interface residues in predicted PPIs were identified as above, whereas the interface residues in PDB chain pairs were identified only by inter-residue distances (< 6Å)."

However, when I tried to extract the contacts from the provided PDB files myself for a few examples with an exact structure, I found fewer contacts than are listed in the provided contact file. As an example, for the pair Q6UXV0_Q99988, a 6 Å distance cutoff gives 73 contacts, while the contact file lists 194. If I relax the cutoff to 8 Å, I get 214 contacts, and all 194 contacts from the contact file are among them. I have not done this comparison comprehensively, so I wanted to reach out and confirm: what exactly is the procedure for extracting the contacts in the contact file, both for pairs with an exact PDB structure and for pairs with predicted structures, if the two differ?
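To make the comparison above concrete, here is a minimal sketch of a distance-based contact extraction. It is not necessarily the exact script behind the numbers above, and it is not the authors' procedure. It assumes standard fixed-column PDB `ATOM` records, treats two residues on different chains as in contact when any pair of their heavy atoms is closer than the cutoff, and ignores insertion codes and alternate locations. The function names `parse_heavy_atoms` and `interchain_contacts` and the chain IDs passed in are hypothetical.

```python
import math
from collections import defaultdict


def parse_heavy_atoms(pdb_text):
    """Map (chain, residue number) -> list of heavy-atom coordinates.

    Assumes fixed-column PDB ATOM records; insertion codes are ignored.
    """
    residues = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        element = line[76:78].strip().upper()
        if not element:
            # Element column missing: fall back to the first letter of the atom name.
            atom_name = line[12:16].strip()
            element = next((c for c in atom_name if c.isalpha()), "").upper()
        if element in ("H", "D"):
            continue  # skip hydrogens (and deuterium)
        key = (line[21], int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[key].append(xyz)
    return residues


def interchain_contacts(pdb_text, chain_a, chain_b, cutoff=6.0):
    """Return the set of (residue number in chain_a, residue number in chain_b)
    pairs whose minimum heavy-atom distance is below `cutoff` (in Å)."""
    residues = parse_heavy_atoms(pdb_text)
    res_a = {k: v for k, v in residues.items() if k[0] == chain_a}
    res_b = {k: v for k, v in residues.items() if k[0] == chain_b}
    contacts = set()
    for (_, num_a), atoms_a in res_a.items():
        for (_, num_b), atoms_b in res_b.items():
            if any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b):
                contacts.add((num_a, num_b))
    return contacts
```

Calling `interchain_contacts(text, "A", "B", cutoff=6.0)` and again with `cutoff=8.0` on the same file (chain IDs depend on the file) is the kind of comparison described above. The all-pairs loop is quadratic and is only meant for a single complex; a spatial index would be a better fit for bulk processing.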

I also noticed that some protein pairs come with multiple associated PDB and contact files, e.g. O95239_S2__Q2VIQ3_S1.pdb, O95239_S1__Q2VIQ3_S2.pdb, O95239_S1__Q2VIQ3_S1.pdb and O95239_S2__Q2VIQ3_S2.pdb. What do the suffixes, e.g. S1 or S2, correspond to?

Thank you!

Best,
Bear
