protein.pdb files are not valid PDB files #20

Closed
jchodera opened this issue Jan 30, 2022 · 7 comments · Fixed by #52
Labels: bug (Something isn't working)

Comments

@jchodera
Member

The thrombin protein.pdb file appears to have several defects that make it noncompliant with the PDB format specification.

  • It lacks any header information, such as SEQRES sequence information
  • It contains multiple chain breaks that are capped with ACE and NME, but residues are numbered sequentially
  • There are no TER records denoting the chain breaks
  • There are no CONECT records that would be required by ACE and NME since these are nonstandard residues.
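Most of the defects listed above can be checked mechanically. The following is a minimal sketch, not a validator: it only counts record types and capping residues using the fixed-column layout of the PDB format. The function name `summarize_pdb_records` and the keys of the returned dictionary are hypothetical, chosen for illustration.

```python
from collections import Counter


def summarize_pdb_records(pdb_text: str) -> dict:
    """Count PDB record types and ACE/NME capping residues.

    Assumes fixed-column PDB lines: record name in columns 1-6,
    residue name in 18-20, chain ID in 22, residue number in 23-26.
    """
    records = Counter()
    cap_residues = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if not record:
            continue
        records[record] += 1
        if record in ("ATOM", "HETATM"):
            res_name = line[17:20].strip()
            if res_name in ("ACE", "NME"):
                # One entry per capping residue, not per atom.
                cap_residues.add((res_name, line[21:22], line[22:26].strip()))
    return {
        "has_seqres": records["SEQRES"] > 0,
        "n_ter": records["TER"],
        "n_conect": records["CONECT"],
        "n_cap_residues": len(cap_residues),
    }
```

A file with capping residues but `n_ter == 0` and `n_conect == 0` would match the description of the thrombin file given here. The sketch does not check whether residue numbering restarts or jumps at chain breaks.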

I'm not quite sure where the current file comes from. Was it generated by Spruce?

Is there an alternative PDB file that others have been using and that complies more closely with the PDB format?

@jchodera changed the title from "Thrombin protein.pdb appears to be a non-compliant PDB file" to "protein.pdb files are not valid PDB files" on Jan 31, 2022
@jchodera
Member Author

I've changed the issue title to reflect the fact that many of these protein.pdb files are not actually valid PDB files.

We will likely have to remediate all of these files in order for them to be useful.

Here's another example: the CDK2 protein.pdb contains a non-standard TPO residue, which, according to the PDB standard, requires CONECT records be specified:

ATOM   2597  N   TYR A 161     -14.655  30.937  31.508  1.00  0.00           N
ATOM   2598  H   TYR A 161     -13.806  30.774  32.030  1.00  0.00           H
ATOM   2599  CA  TYR A 161     -14.629  30.568  30.094  1.00  0.00           C
ATOM   2600  HA  TYR A 161     -15.559  30.851  29.599  1.00  0.00           H
ATOM   2601  CB  TYR A 161     -13.446  31.294  29.412  1.00  0.00           C
ATOM   2602  HB1 TYR A 161     -12.516  30.934  29.849  1.00  0.00           H
ATOM   2603  HB2 TYR A 161     -13.401  31.051  28.352  1.00  0.00           H
ATOM   2604  CG  TYR A 161     -13.470  32.808  29.524  1.00  0.00           C
ATOM   2605  CD1 TYR A 161     -14.659  33.530  29.280  1.00  0.00           C
ATOM   2606  HD1 TYR A 161     -15.553  33.021  28.951  1.00  0.00           H
ATOM   2607  CE1 TYR A 161     -14.701  34.919  29.499  1.00  0.00           C
ATOM   2608  HE1 TYR A 161     -15.620  35.461  29.336  1.00  0.00           H
ATOM   2609  CZ  TYR A 161     -13.553  35.588  29.964  1.00  0.00           C
ATOM   2610  OH  TYR A 161     -13.611  36.916  30.260  1.00  0.00           O
ATOM   2611  HH  TYR A 161     -12.794  37.239  30.609  1.00  0.00           H
ATOM   2612  CE2 TYR A 161     -12.349  34.884  30.147  1.00  0.00           C
ATOM   2613  HE2 TYR A 161     -11.459  35.395  30.483  1.00  0.00           H
ATOM   2614  CD2 TYR A 161     -12.307  33.496  29.927  1.00  0.00           C
ATOM   2615  HD2 TYR A 161     -11.386  32.958  30.101  1.00  0.00           H
ATOM   2616  C   TYR A 161     -14.528  29.040  29.979  1.00  0.00           C
ATOM   2617  O   TYR A 161     -14.614  28.329  30.983  1.00  0.00           O
ATOM   2618  N   TPO A 162     -14.366  28.534  28.747  1.00  0.00           N
ATOM   2619  H   TPO A 162     -14.353  29.177  27.962  1.00  0.00           H
ATOM   2620  CA  TPO A 162     -14.197  27.125  28.393  1.00  0.00           C
ATOM   2621  HA  TPO A 162     -15.096  26.583  28.689  1.00  0.00           H
ATOM   2622  CB  TPO A 162     -14.005  27.029  26.861  1.00  0.00           C
ATOM   2623  HB  TPO A 162     -13.084  27.543  26.577  1.00  0.00           H
ATOM   2624  CG2 TPO A 162     -13.932  25.585  26.343  1.00  0.00           C
ATOM   2625 1HG2 TPO A 162     -13.795  25.574  25.262  1.00  0.00           H
ATOM   2626 2HG2 TPO A 162     -13.099  25.037  26.778  1.00  0.00           H
ATOM   2627 3HG2 TPO A 162     -14.853  25.046  26.560  1.00  0.00           H
ATOM   2628  OG  TPO A 162     -15.125  27.615  26.214  1.00  0.00           O
ATOM   2629  P   TPO A 162     -14.964  28.920  25.281  1.00  0.00           P
ATOM   2630  O1P TPO A 162     -16.333  29.241  24.839  1.00  0.00           O
ATOM   2631  O2P TPO A 162     -14.390  29.922  26.198  1.00  0.00           O
ATOM   2632  O3P TPO A 162     -14.059  28.490  24.197  1.00  0.00           O
ATOM   2633  C   TPO A 162     -12.971  26.558  29.124  1.00  0.00           C
ATOM   2634  O   TPO A 162     -11.873  27.097  28.999  1.00  0.00           O
ATOM   2635  N   HIE A 163     -13.164  25.483  29.895  1.00  0.00           N
ATOM   2636  H   HIE A 163     -14.097  25.100  29.967  1.00  0.00           H
ATOM   2637  CA  HIE A 163     -12.107  24.820  30.654  1.00  0.00           C
ATOM   2638  HA  HIE A 163     -11.469  25.589  31.095  1.00  0.00           H
ATOM   2639  CB  HIE A 163     -12.749  24.035  31.806  1.00  0.00           C
ATOM   2640  HB1 HIE A 163     -13.331  24.712  32.433  1.00  0.00           H
ATOM   2641  HB2 HIE A 163     -13.448  23.292  31.425  1.00  0.00           H
ATOM   2642  CG  HIE A 163     -11.741  23.353  32.681  1.00  0.00           C
ATOM   2643  ND1 HIE A 163     -11.520  21.972  32.634  1.00  0.00           N
ATOM   2644  CE1 HIE A 163     -10.500  21.771  33.452  1.00  0.00           C
ATOM   2645  HE1 HIE A 163     -10.064  20.801  33.639  1.00  0.00           H
ATOM   2646  NE2 HIE A 163     -10.065  22.896  34.012  1.00  0.00           N
ATOM   2647  HE2 HIE A 163      -9.261  22.978  34.626  1.00  0.00           H
ATOM   2648  CD2 HIE A 163     -10.834  23.935  33.537  1.00  0.00           C
ATOM   2649  HD2 HIE A 163     -10.660  24.966  33.810  1.00  0.00           H
ATOM   2650  C   HIE A 163     -11.201  23.941  29.766  1.00  0.00           C
ATOM   2651  O   HIE A 163     -10.034  23.719  30.092  1.00  0.00           O

No CONECT records appear in the PDB file.
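To find every residue of this kind across the protein.pdb files, one option is to list residue names that are not among the 20 standard amino acid names. This is a rough sketch under that assumption; the function name `nonstandard_residues` is hypothetical. Note that it also flags force-field-specific names such as HIE (the AMBER-style name for one histidine protonation state), as well as any water, ions, or capping groups present.

```python
STANDARD_AMINO_ACIDS = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})


def nonstandard_residues(pdb_text: str) -> list:
    """Return sorted (chain, residue number, residue name) tuples for
    residues whose name is not one of the 20 standard amino acids."""
    found = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            res_name = line[17:20].strip()
            if res_name and res_name not in STANDARD_AMINO_ACIDS:
                # Chain ID is column 22; residue number is columns 23-26.
                found.add((line[21:22], int(line[22:26]), res_name))
    return sorted(found)
```

Applied to the excerpt above, this would report TPO 162 and HIE 163 on chain A, and TYR 161 would pass.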

@dfhahn: Can you point me to the scripts you used to generate these files? I can see if I can find a different route that uses much the same geometry/models but produces valid PDB files that can be processed by programs that expect PDB files to comply with the PDB format specification.

@dfhahn
Collaborator

dfhahn commented Feb 17, 2022

@jchodera I do not have the scripts that generated these files. They come from public sources to ensure compatibility with former calculations. For thrombin, for example, the file comes from Vytas Gapsys's work. I think these files were generated with GROMACS pdb2gmx.

@dfhahn
Collaborator

dfhahn commented Feb 17, 2022

I agree these files should comply with the PDB format specifications. It would be great if we changed the format without touching the coordinates.
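One format-only change of this kind is adding the missing TER records. The sketch below inserts a TER line after each C-terminal cap residue and copies every other line through unchanged, so coordinates are untouched. It assumes the chain breaks are capped with NME, as described for the thrombin file earlier in this thread. The function name `insert_ter_after_caps` is hypothetical, and the bare `TER` line is a simplification: the PDB format gives TER its own serial number and residue fields, and this sketch does not renumber atoms or add SEQRES or CONECT records.

```python
def insert_ter_after_caps(pdb_text: str, c_cap: str = "NME") -> str:
    """Insert a bare TER line after the last atom of each c_cap residue.

    All existing lines are copied through byte-for-byte.
    """
    out = []
    prev_key = None  # (chain, residue number, residue name) of the previous atom line
    for line in pdb_text.splitlines():
        is_atom = line.startswith(("ATOM  ", "HETATM"))
        key = (line[21:22], line[22:26], line[17:20]) if is_atom else None
        left_cap = prev_key is not None and prev_key[2] == c_cap and key != prev_key
        # Skip the insert if the file already has a TER here.
        if left_cap and not line.startswith("TER"):
            out.append("TER")
        out.append(line)
        prev_key = key
    # Handle a file that ends directly on a cap residue.
    if prev_key is not None and prev_key[2] == c_cap:
        out.append("TER")
    return "\n".join(out) + "\n"
```

Because an existing TER is left in place, running the function twice gives the same result as running it once.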

@vgapsys

vgapsys commented Feb 18, 2022

Hey @jchodera @dfhahn,

The PDB files in the repository were generated by pdb2gmx and are compatible with the GROMACS-based topologies in the same GitHub repository. This is why the connectivity records are absent, and why the residue numbering and some nonstandard residue names look the way they do: this information is in the topology files.

@jchodera
Member Author

@vgapsys: Do we have the original topology and coordinate files from which these were generated? I wonder if there is a way to generate new PDB files from the source information that comply with the PDB standard, so that other packages could use these files as well.

@dotsdl added this to the Release 0.3.0 milestone Mar 23, 2022
@dotsdl assigned jchodera and unassigned ldamore Apr 19, 2022
@jchodera removed their assignment Apr 20, 2022
@dotsdl mentioned this issue Apr 26, 2022
@dotsdl
Member

dotsdl commented May 3, 2022

This issue is blocking several others in the 0.3.0 milestone; is it possible to resolve this issue within this week, or at least before EOW next week? We are tentatively aiming for 0.3.0 release by 2022.05.31, and there will be follow-up work required following this issue.

@bobym
Collaborator

bobym commented May 3, 2022

> This issue is blocking several others in the 0.3.0 milestone; is it possible to resolve this issue within this week, or at least before EOW next week? We are tentatively aiming for 0.3.0 release by 2022.05.31, and there will be follow-up work required following this issue.

Shooting for end of this week for fixed PDBs and re-docked ligands.
