Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve Molecule core properties through RDMol/OEMol conversions #135

Open
j-wags opened this issue Nov 6, 2018 · 9 comments
Open

Preserve Molecule core properties through RDMol/OEMol conversions #135

j-wags opened this issue Nov 6, 2018 · 9 comments

Comments

@j-wags
Copy link
Member

j-wags commented Nov 6, 2018

The goals in this issue are

  1. Make sure that core Molecule properties survive conversion to/from OEMol/RDMol.

  2. Determine whether the core properties survive filo I/O in supported file formats.

  3. Determine whether secondary properties survive conversion to/from OEMol/RDMol.

  4. Determine whether secondary properties survive filo I/O in supported file formats.

Core properties of Molecule include:

  • name
  • atoms
    • name
    • element
    • formal charge
    • is_aromatic
    • stereochemistry
  • bonds
    • atom1
    • atom2
    • bond_order
    • fractional_bond_order
    • is_aromatic
    • stereochemistry
  • partial charges
  • conformers

Note that core properties do not include VirtualSites

Secondary properties are anything serializeable that users add to the molecule._properties dict.

@j-wags
Copy link
Member Author

j-wags commented Nov 6, 2018

Also, test specifically that charges are read from MOL2's

@j-wags j-wags self-assigned this Nov 6, 2018
@j-wags
Copy link
Member Author

j-wags commented Nov 7, 2018

I'm thinking that the strict test of "successful round trips" is to pass:

off_mol1 = (any valid openforcefield.topology.molecule.Molecule object with just core properties)
toolkit_mol = toolkit_wrapper.to_openeye(off_mol) # (or any toolkit molecule object)
off_mol2 = toolkit_wrapper.from_object(toolkit_mol)
assert off_mol1 == off_mol2

In this round-trip framework, we need to think of how each property of an OFFMol can map in and out of a toolkit molecule representation. For example, I hit a problem with OEMols, where oemol.SetTitle only works on strings, while OFFMols can have their name be a string or None. There's no safe mapping for None in and out of an OEMol. We could always stash these core properties into arbitrary property dictionaries (both OEMols and RDMols have these), but it would be weird to convert a OFFMol to an OEmol, and not have it's name go into the OEMol's Title field.

Many fields are natively supported, so fields like "atom element" and "integer bond orders" are no problem.

Generally, not knowing what interfaces we'll need to comply with in the future, I think this means we want to limit the OFFMol's property space when possible. For example, I'm removing None as a valid OFFMol name, and strictly requiring strings ("" is the default).

Goal 1 has a few implications.

  • OEMols require a string for a molecule title. Therefore we can no longer support Molecule._name values of None, as that won't be losslessly converted to and from OE.
  • OEMols with no coordinates give all atoms coordinates 0,0,0. RDMols with no coordinates have no entries in their "conformers" list. I think, in this case, we'll let OFFMols have _conformers=None, but this will map to all (0,0,0)s in OE conversion, and an empty list in RDKit conversion.

I'll update this post with more interface notes as they are implemented.

@j-wags
Copy link
Member Author

j-wags commented Nov 7, 2018

Molecule Core Property Mapping Spec

(in progress, this should go somewhere official eventually)

OFFMol OEMol RDMol
No Name molecule.name = "" oemol.SetTitle("") rdmol.SetProp('name','')
No partial charges molecule._partial_charges = None each oeAtom.GetPartialCharge() returns 0.0 each rdatom.HasProp('partial_charge' returns False
No conformers molecule._conformers = None oemol.GetConformers() yields a dictionary mapping all atom indices to (0,0,0)'s rdmol.GetConformers() returns ()

@j-wags
Copy link
Member Author

j-wags commented Nov 7, 2018

Atom Core Property Mapping Spec

Note: We only care about round trips from OFFMol --> ToolkitMol --> OFFMol, therefore if OFFMol doesn't a permit a blank value for a field, then we don't have to worry about the toolkit representation of the blank value. (?)

OFFAtom OEAtom RDAtom
No name Atom.name = "" oeatom.SetName("") rdatom.SetProp("name","")
No element Not permitted N/A N/A
No formal_charge Not permitted N/A N/A
No is_aromatic Not permitted N/A N/A
No stereochemistry None oechem.OEPerceiveCIPStereo(oemol, oeatom) returns oechem.OECIPAtomStereo_NotStereo, and oeatom.SetStereo() is not called on atom during construction (might open a low-priority issue to inspect closer) rdatom.GetChiralTag() returns rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED OR returns something other than CHI_TETRAHEDRAL_CCW or CHI_TETRAHEDRAL_CW (Are there other possible return values? Need to inspect)

@j-wags j-wags changed the title Preserve Molecule.properties through RDMol/OEMol conversions Preserve Molecule core properties through RDMol/OEMol conversions Nov 7, 2018
@j-wags
Copy link
Member Author

j-wags commented Nov 7, 2018

Bond core property mapping spec

OFFBond OEBond RDBond
No atom1 Not permitted N/A N/A
No atom2 Not permitted N/A N/A
No bond_order Not permitted N/A N/A
No fractional_bond_order None oebond.HasData('fractional_bond_order') returns False rdbond.HasProp("fractional_bond_order") returns 0 or False
No is_aromatic Not permitted N/A N/A
No stereochemistry None oebond.HasStereoSpecified() returns False OR oechem.OECIPBondStereo_NotStereo (This will require further inspection) rdb.GetStereo() returns none of [Chem.BondStereo.STEREOCIS, Chem.BondStereo.STEREOZ, Chem.BondStereo.STEREOTRANS, Chem.BondStereo.STEREOE] OR running Chem.AssignStereochemistry(rdmol, cleanIt=True, force=True) then rdmol.GetProp('_CIPCode') returns... ? (Incomplete)

@j-wags
Copy link
Member Author

j-wags commented Nov 7, 2018

Meeting notes, also posted to Slack:

2018_11_07 Jeff Caitlin Chat

@j-wags
Copy link
Member Author

j-wags commented Nov 14, 2018

First pass implemented in 0d29212 7651ecd and 6a03e26

Will wait for review to close issue.

@j-wags
Copy link
Member Author

j-wags commented Apr 19, 2019

Note - the name of the atom field that we send partial charges to will be changed as a result of work in #250. Update the table in this issue when changes are merged.

@mattwthompson
Copy link
Member

Is there anything left to do here? Is roundtripping through formats and object models we don't control an absolutely essential behavior?

If there's nothing to act on here we should close this and consider moving the notes to somewhere like developer docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants