Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDBs written without coordinates with OpenEye, sometimes #1967

Closed
lilyminium opened this issue Nov 15, 2024 · 4 comments · Fixed by #1971
Closed

PDBs written without coordinates with OpenEye, sometimes #1967

lilyminium opened this issue Nov 15, 2024 · 4 comments · Fixed by #1971
Labels

Comments

@lilyminium
Copy link
Collaborator

lilyminium commented Nov 15, 2024

Describe the bug

Sometimes PDB files are written without coordinates with OpenEye. Both times I observed this was from creating a molecule from RDKit.

To Reproduce

(uses MDAnalysis to read and parse the PDB)

from openff.toolkit import Molecule
import MDAnalysis as mda

mol = Molecule.from_smiles("CC")
mol.generate_conformers()
mol.to_file("test.pdb", "PDB")

u = mda.Universe("test.pdb")
u.add_TopologyAttr("elements", u.atoms.types)
rdmol = u.atoms.convert_to("RDKIT")
mol2 = Molecule.from_rdkit(rdmol)
assert len(mol2.conformers)
mol2.to_file("test2.pdb", "PDB")

Output

test.pdb contains coordinates:

> cat test.pdb
ATOM      1  C   UNL     1       0.815  -0.538   0.493  1.00 20.00 
ATOM      2  C   UNL     1       2.155  -0.052  -0.012  1.00 20.00 
ATOM      3  H   UNL     1       0.000   0.001   0.001  1.00 20.00 
ATOM      4  H   UNL     1       0.693  -1.607   0.291  1.00 20.00 
ATOM      5  H   UNL     1       0.730  -0.380   1.572  1.00 20.00 
ATOM      6  H   UNL     1       2.970  -0.591   0.480  1.00 20.00 
ATOM      7  H   UNL     1       2.277   1.017   0.191  1.00 20.00 
ATOM      8  H   UNL     1       2.240  -0.210  -1.091  1.00 20.00 
TER       9      UNL     1 
CONECT    1    2    3    4    5
CONECT    2    6    7    8
MASTER        0    0    0    0    0    0    0    0    8    1    2    0
END

test2.pdb does not:

ATOM      1 C    UNL 
ATOM      2 C    UNL 
ATOM      3 H    UNL 
ATOM      4 H    UNL 
ATOM      5 H    UNL 
ATOM      6 H    UNL 
ATOM      7 H    UNL 
ATOM      8 H    UNL 
TER       9      UNL 
CONECT    1    2    3    4    5
CONECT    2    6    7    8
MASTER        0    0    0    0    0    0    0    0    8    1    2    0
END

If I write it using the RDKitToolkitWrapper though:

ATOM      1  C   UNL     1       0.815  -0.538   0.493  1.00  0.00           C  
ATOM      2  C   UNL     1       2.155  -0.052  -0.012  1.00  0.00           C  
ATOM      3  H   UNL     1       0.000   0.001   0.001  1.00  0.00           H  
ATOM      4  H   UNL     1       0.693  -1.607   0.291  1.00  0.00           H  
ATOM      5  H   UNL     1       0.730  -0.380   1.572  1.00  0.00           H  
ATOM      6  H   UNL     1       2.970  -0.591   0.480  1.00  0.00           H  
ATOM      7  H   UNL     1       2.277   1.017   0.191  1.00  0.00           H  
ATOM      8  H   UNL     1       2.240  -0.210  -1.091  1.00  0.00           H  
CONECT    1    2    3    4    5
CONECT    2    6    7    8
END

Computing environment (please complete the following information):

Additional context

@j-wags
Copy link
Member

j-wags commented Dec 3, 2024

I've reproduced the issue and it looks like there's something with the residue/metadata info that affects how the PDB is written out. If I clear the hierarchy metadata before writing the PDB the coordinates are written as usual:

from openff.toolkit import Molecule
import MDAnalysis as mda

mol = Molecule.from_smiles("CC")
mol.generate_conformers()
mol.to_file("test.pdb", "PDB")

u = mda.Universe("test.pdb")
u.add_TopologyAttr("elements", u.atoms.types)
rdmol = u.atoms.convert_to("RDKIT")
mol2 = Molecule.from_rdkit(rdmol)
assert len(mol2.conformers)
for atom in mol2.atoms:
    keys = [*atom.metadata.keys()]
    for key in keys:
        del atom.metadata[key]
mol2.to_file("test2.pdb", "PDB")
!cat test2.pdb

returns

ATOM      1 C    UNL     1       0.247  -0.604   0.382  1.00 20.00 
ATOM      2 C    UNL     1      -0.247   0.604  -0.382  1.00 20.00 
ATOM      3 H    UNL     1       1.338  -0.594   0.455  1.00 20.00 
ATOM      4 H    UNL     1      -0.056  -1.527  -0.122  1.00 20.00 
ATOM      5 H    UNL     1      -0.166  -0.613   1.395  1.00 20.00 
ATOM      6 H    UNL     1      -1.338   0.593  -0.455  1.00 20.00 
ATOM      7 H    UNL     1       0.167   0.614  -1.395  1.00 20.00 
ATOM      8 H    UNL     1       0.056   1.527   0.122  1.00 20.00 
TER       9      UNL     1 
CONECT    1    2    3    4    5
CONECT    2    6    7    8
MASTER        0    0    0    0    0    0    0    0    8    1    2    0
END

I'll keep looking into this.

@j-wags
Copy link
Member

j-wags commented Dec 3, 2024

I can reproduce without MDA - Basically if either insertion_code or chain_id is set on an atom, OETKW doesn't write its coords.

from openff.toolkit import Molecule, OpenEyeToolkitWrapper, RDKitToolkitWrapper

mol_no_mda = Molecule.from_smiles("CC")
mol_no_mda.generate_conformers()
for atom in mol_no_mda.atoms[:3]:
    atom.metadata.update({
        #'residue_name': 'UNL', 
        #'residue_number': 1, 
        'insertion_code': '', 
        #'chain_id': ''
                         })
OpenEyeToolkitWrapper().to_file(mol_no_mda, "test_oe.pdb", file_format='pdb')
!cat test_oe.pdb

yields

ATOM      1  C   UNL     1
ATOM      2  C   UNL     1
ATOM      3  H   UNL     1
ATOM      4  H   UNL     1      -0.056  -1.527  -0.122  1.00 20.00 
ATOM      5  H   UNL     1      -0.166  -0.613   1.395  1.00 20.00 
ATOM      6  H   UNL     1      -1.338   0.593  -0.455  1.00 20.00 
ATOM      7  H   UNL     1       0.167   0.614  -1.395  1.00 20.00 
ATOM      8  H   UNL     1       0.056   1.527   0.122  1.00 20.00 
TER       9      UNL     1 
CONECT    1    2    3    4    5
CONECT    2    6    7    8
MASTER        0    0    0    0    0    0    0    0    8    1    2    0
END

@j-wags
Copy link
Member

j-wags commented Dec 3, 2024

Can be fixed by replacing insertion_code or chain_id = "" with " ":

from openff.toolkit import Molecule, OpenEyeToolkitWrapper, RDKitToolkitWrapper

mol_no_mda = Molecule.from_smiles("CC")
mol_no_mda.generate_conformers()
for atom in mol_no_mda.atoms[:3]:
    atom.metadata.update({
        #'residue_name': '', 
        #'residue_number': 1, 
        'insertion_code': ' ', 
        #'chain_id': ' '
                         })
OpenEyeToolkitWrapper().to_file(mol_no_mda, "test_oe.pdb", file_format='pdb')
!cat test_oe.pdb

returns

ATOM      1  C   UNL     1       0.247  -0.604   0.382  1.00 20.00 
ATOM      2  C   UNL     1      -0.247   0.604  -0.382  1.00 20.00 
ATOM      3  H   UNL     1       1.338  -0.594   0.455  1.00 20.00 
ATOM      4  H   UNL     1      -0.056  -1.527  -0.122  1.00 20.00 
ATOM      5  H   UNL     1      -0.166  -0.613   1.395  1.00 20.00 
ATOM      6  H   UNL     1      -1.338   0.593  -0.455  1.00 20.00 
ATOM      7  H   UNL     1       0.167   0.614  -1.395  1.00 20.00 
ATOM      8  H   UNL     1       0.056   1.527   0.122  1.00 20.00 
TER       9      UNL     1 
CONECT    1    2    3    4    5
CONECT    2    6    7    8
MASTER        0    0    0    0    0    0    0    0    8    1    2    0
END

@lilyminium
Copy link
Collaborator Author

Thanks for looking into this Jeff, what an odd bug.

j-wags added a commit that referenced this issue Dec 3, 2024
j-wags added a commit that referenced this issue Dec 4, 2024
…pace in to_openeye (#1971)

* Fix issue #1967 and add test

* update releasehistory

* remove unused stringio import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants