BioSimSpace and OpenForcefieldTK interoperability #117

Closed
jmichel80 opened this issue Sep 6, 2019 · 5 comments

@jmichel80
Contributor

Hi,

I am starting a thread to discuss efficient solutions for reliably exchanging information back and forth between the OpenFF Toolkit and BioSimSpace libraries.

The purpose would be to facilitate development of code that makes use of functionality available in both toolkits.

We ran an exploratory coding session in August 2019 at the ACS San Diego meeting, the results of which can be found here

What came out of this was that it could be desirable to add functionality in BioSimSpace to enable conversion of a BioSimSpace system into an OpenMM system, and to add functionality in OpenFF to enable conversion of an OpenMM system into a BioSimSpace system. This may well be easier and more extensible than a solution that relies on reliably writing and reading file formats supported by both toolkits, particularly as we may want to move towards supporting functionality that is not covered by all legacy file formats (e.g. virtual sites, new functional forms for potential energy functions).

Before any significant work is undertaken it would be good to collect opinions about the best way forward.

One immediate question I have is whether the conversion should operate at the level of a ‘system’, a ‘molecule’ or a collection of molecules. Both toolkits seem to have slightly different definitions of what these mean, so it would be good if @lohedges, @j-wags, @ppxasjsm and @chryswoods could comment on this.

@lohedges
Member

lohedges commented Sep 6, 2019

Hi @jmichel80, thanks for putting this together. With regards to your question:

One immediate question I have is whether the conversion should operate at the level of a ‘system’, a ‘molecule’ or a collection of molecules.

In BioSimSpace you can directly upconvert any Sire wrapped object. This means that you can go from a Molecule or Molecules container to a System directly using, e.g., molecule.toSystem(). In this sense, it doesn't really matter what we use since we can always convert as necessary. For example, we could create a collection of BioSimSpace molecules from OpenMM ones, then add them together to create a system. However, it might make sense to convert at the smallest scale, i.e. the molecule, since this would provide the greatest flexibility: you might only want to operate on a single molecule, and it seems silly to create a one-molecule system only to extract the molecule again.
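As a rough illustration of the "convert per molecule, then combine" idea, here is a minimal sketch in plain Python. MoleculeSketch, SystemSketch, convert_molecule and convert_all are hypothetical stand-ins invented for this example; they are not BioSimSpace, Sire or OpenMM classes, and the dictionary input format is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple, Union


@dataclass(frozen=True)
class MoleculeSketch:
    """Hypothetical stand-in for a molecule object (not a BioSimSpace class)."""
    name: str
    n_atoms: int

    def to_system(self) -> "SystemSketch":
        # Upconvert a single molecule to a one-molecule system.
        return SystemSketch((self,))


@dataclass(frozen=True)
class SystemSketch:
    """Hypothetical stand-in for a system: an ordered collection of molecules."""
    molecules: Tuple[MoleculeSketch, ...] = ()

    def __add__(self, other: Union["SystemSketch", MoleculeSketch]) -> "SystemSketch":
        # Accept either another system or a single molecule.
        extra = other.molecules if isinstance(other, SystemSketch) else (other,)
        return SystemSketch(self.molecules + extra)


def convert_molecule(foreign: Dict[str, object]) -> MoleculeSketch:
    """Convert one foreign molecule description (assumed dict format)."""
    return MoleculeSketch(name=str(foreign["name"]), n_atoms=int(foreign["n_atoms"]))


def convert_all(foreign_molecules: Iterable[Dict[str, object]]) -> SystemSketch:
    """Convert molecule by molecule, then add the results into one system."""
    system = SystemSketch()
    for foreign in foreign_molecules:
        system = system + convert_molecule(foreign)
    return system
```

With a converter at the molecule level, a caller who only needs one molecule never has to build a system, while convert_all covers the whole-system case on top of the same function.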

In BioSimSpace/Sire, the only additional property of a System (at least the only one we use) is the box information in space. However, this could probably be inferred, e.g. by computing the axis-aligned bounding box, or passed through in other ways.
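To make the bounding-box suggestion concrete, here is a minimal sketch that infers an orthorhombic box from a list of Cartesian coordinates. The function name and the padding argument are assumptions made for this example, not part of either toolkit. Note that a tight bounding box around the atoms will generally be smaller than the true periodic box, so some padding (or passing the real box through) would likely be needed in practice.

```python
from typing import Iterable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def axis_aligned_bounding_box(
    coords: Iterable[Sequence[float]], padding: float = 0.0
) -> Tuple[Vec3, Vec3]:
    """Return (origin, box_lengths) of the axis-aligned box enclosing coords.

    padding is added on every side; it is a hypothetical knob for this sketch.
    """
    points = [tuple(float(x) for x in p) for p in coords]
    if not points:
        raise ValueError("need at least one coordinate")
    # Smallest and largest value along each Cartesian axis.
    mins = tuple(min(p[i] for p in points) - padding for i in range(3))
    maxs = tuple(max(p[i] for p in points) + padding for i in range(3))
    lengths = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return mins, lengths
```

For example, axis_aligned_bounding_box([(0, 0, 0), (1, 2, 3)]) gives an origin at (0, 0, 0) and box lengths of (1, 2, 3).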

@j-wags

j-wags commented Sep 25, 2019

pinging @davidlmobley and @jchodera to keep them in the loop

Sorry for the delay -- OFF is somewhat shorthanded right now and I had to help with some science work this month.

I've been thinking about this for a while, and it is probably best to talk about which information we hold in which objects.

OFF object overview

In terms of major classes, the OpenFF Toolkit uses OFF ForceFields, OFF Molecules, OFF Topologys, and OpenMM Systems.

OFF Molecule and OFF Topology classes are defined entirely in our toolkit.

OFF Systems don't exist yet (but will eventually!), so for now we just use OpenMM's System class to hold the parameters we assign.

OFF ForceFields are specialized enough towards our internal use cases that it doesn't make much sense to talk about converting them.

OFF Molecules

An OFF Molecule object is a description of a molecule that contains enough information to parameterize it.

OFF Molecules must include at least {element, formal charge, is_aromatic, and stereochemistry (R/S/None)} for all atoms and {order, is_aromatic, and stereochemistry (E/Z/None)} for bonds. They can include more information, but the above is the minimum needed for parameterization. Our current Molecule "spec" can be found here. These are the fields we check in round-trip tests in our other existing toolkit integrations (RDKit and OpenEye).
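The minimum field list above can be sketched as a simple completeness check that a converter might run before handing a molecule over. AtomRecord, BondRecord and missing_for_parameterization are hypothetical names for this illustration; they do not mirror the actual OFF Molecule class, only the fields named in this comment.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class AtomRecord:
    """Hypothetical per-atom record holding the minimum fields listed above."""
    element: Optional[str] = None
    formal_charge: Optional[int] = None
    is_aromatic: Optional[bool] = None
    stereochemistry: Optional[str] = None  # "R", "S" or None (None is valid)


@dataclass
class BondRecord:
    """Hypothetical per-bond record holding the minimum fields listed above."""
    atom1: int
    atom2: int
    order: Optional[int] = None
    is_aromatic: Optional[bool] = None
    stereochemistry: Optional[str] = None  # "E", "Z" or None (None is valid)


def missing_for_parameterization(
    atoms: Sequence[AtomRecord], bonds: Sequence[BondRecord]
) -> List[str]:
    """List every required field that is absent or has an invalid value."""
    problems: List[str] = []
    for i, atom in enumerate(atoms):
        for name in ("element", "formal_charge", "is_aromatic"):
            if getattr(atom, name) is None:
                problems.append(f"atom {i}: missing {name}")
        if atom.stereochemistry not in ("R", "S", None):
            problems.append(f"atom {i}: invalid stereochemistry")
    for i, bond in enumerate(bonds):
        for name in ("order", "is_aromatic"):
            if getattr(bond, name) is None:
                problems.append(f"bond {i}: missing {name}")
        if bond.stereochemistry not in ("E", "Z", None):
            problems.append(f"bond {i}: invalid stereochemistry")
    return problems
```

An empty list would mean the record carries everything named above; anything else tells the converter which information it would have to infer or ask for.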

So, one big potential difference between BSS and OFF Molecules is that OFF Molecules do not have parameters (bond lengths, dihedral terms, vdW radii, etc) assigned. OFF Molecules are a description of a molecule that can be parameterized. Based on my reading of BSS docs, it seems like BSS Molecules may have parameters attached earlier on. In OFFTK, the parameters don't get defined until we make an OpenMM System.

It would be good to get more clarification on whether BSS Molecules require parameters to be attached. If so, round-tripping may still be possible; we'd just need to figure out a scheme to stash the parameters in data dictionaries when making the OFF Molecule, and then unpack them on the way back to a BSS Molecule.
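One possible shape for that stash-and-unpack scheme, as a minimal sketch: the parameters travel in a free-form dictionary on the parameter-free molecule under a reserved key, and are restored on the way back. ParameterizedMolecule, BareMolecule, STASH_KEY and both functions are hypothetical and stand in for the real BSS and OFF objects.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict

# Reserved dictionary key for the stashed parameters (hypothetical name).
STASH_KEY = "stashed_parameters"


@dataclass
class ParameterizedMolecule:
    """Stand-in for a molecule that may carry force field parameters."""
    name: str
    chemistry: Dict[str, Any]
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class BareMolecule:
    """Stand-in for a parameter-free molecule with a free-form data dictionary."""
    name: str
    chemistry: Dict[str, Any]
    properties: Dict[str, Any] = field(default_factory=dict)


def to_bare(mol: ParameterizedMolecule) -> BareMolecule:
    """Drop parameters from the molecule proper, but stash a copy of them."""
    bare = BareMolecule(mol.name, copy.deepcopy(mol.chemistry))
    if mol.parameters:
        bare.properties[STASH_KEY] = copy.deepcopy(mol.parameters)
    return bare


def from_bare(bare: BareMolecule) -> ParameterizedMolecule:
    """Rebuild the parameterized molecule, unpacking any stashed parameters."""
    parameters = copy.deepcopy(bare.properties.get(STASH_KEY, {}))
    return ParameterizedMolecule(bare.name, copy.deepcopy(bare.chemistry), parameters)
```

Deep copies keep the two objects independent, so editing one side after conversion cannot silently change the other.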

Also, OFF Molecules are a bit unusual in that they really can't exist without bond orders and formal charges (since the primary reason for them to exist is to be parameterized, we need to know bond orders and such to assign parameters). A BSS Molecule <--> OFF Molecule converter will probably need to include logic on this front, which might take some tinkering.

OFF Topologies

An OFF Topology is basically just a bunch of OFF Molecules, with an additional layer that allows the Topology's atom indexing to differ from the original Molecule's, plus optional box vectors. This seems to be analogous to BSS's System.
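To illustrate the description above (molecules plus an indexing layer plus optional box vectors), here is a minimal sketch. TopologySketch and its index_map are hypothetical and say nothing about how the OFF Topology is actually implemented; molecules are reduced to lists of element symbols to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TopologySketch:
    """Hypothetical topology: molecules, an atom index layer, optional box."""
    molecules: List[Sequence[str]]  # each molecule is a list of element symbols
    # Entry i gives the (molecule index, atom index) of topology atom i.
    index_map: Optional[List[Tuple[int, int]]] = None
    box_vectors: Optional[Tuple[Vec3, Vec3, Vec3]] = None

    def __post_init__(self) -> None:
        # Default ordering: molecules concatenated in their own atom order.
        default = [
            (m, a) for m, mol in enumerate(self.molecules) for a in range(len(mol))
        ]
        if self.index_map is None:
            self.index_map = default
        elif sorted(self.index_map) != default:
            raise ValueError("index_map must list every (molecule, atom) pair once")

    def atom(self, topology_index: int) -> str:
        """Look up an atom by its topology-level index."""
        m, a = self.index_map[topology_index]
        return self.molecules[m][a]
```

Keeping the reordering in a separate map means the molecules themselves stay untouched, which is what makes it possible to map topology-level results back onto the original molecule indexing.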

Figuring out which converters to make

Using the functionality in the OFFTK, it's a one-way road from OFF Molecule --> OFF Topology --> OpenMM System. So, the most functionality is available if the user starts with a Molecule.

Given my current understanding, I'd advocate making the following converters:

  1. BSS Molecule --> OFF Molecule (and maybe the other direction too)
  2. OpenMM System --> BSS Molecule

My logic is that most of the functionality in the OFFTK is centered around assigning parameters. So, someone making a workflow in BSS and looking to integrate OFFTK is probably looking to take a BSS Molecule, parameterize it using OFFTK, and bring it back to a BSS Molecule. Until we make a dedicated OFF System object, this conversion will need to use an OpenMM System to hold the parameters that OFFTK assigns.

I could use your input on whether OFFMol --> BSSMol conversion would be a popular use case, knowing that OFFMols don't have parameters assigned.

Alternatively, we could wait until the OFF System class is built, since that may be a better exchange point between OFFTK and BSS. But with current manpower that's at least several months away, so we may not want to delay making useful converters now while waiting for it.

@jmichel80
Contributor Author

Hi @j-wags
Sorry for the very slow reply. Thanks for the clear and detailed post. We are now shorter on manpower than I anticipated earlier this year and had to push back plans for OFFTK and BSS integration to make time to complete a few other tasks. I'm hoping to revisit this in a few months, at which stage it may make sense to exchange data directly with an OFF System class, depending on how OFFTK has evolved.

As for your question on whether parameters have to be attached to molecules in BSS: that's not a strict requirement, because BSS Molecules wrap Sire Molecule objects, which can be very flexible in terms of what they contain (they could lack coordinates, etc.). A missing property just stops you from doing certain things, like computing a potential energy. Maybe @lohedges can chime in on possible restrictions I may have overlooked.

@lohedges
Member

Hi all. Apologies for the slow reply too. I think @j-wags's post appeared when I was on holiday and I had forgotten to check back in afterwards.

As @jmichel80 says, BSS molecules are very flexible since they are simply wrappers around Sire objects. These contain properties, such as charge, coordinates, bonds, etc, which are typically translated from records in coordinate or topology files, but in practice can come from anywhere. There is also no restriction on what a property can be, e.g. they could be other Sire objects, or the contents of a file.

Based on my reading of BSS docs, it seems like BSS Molecules may have parameters attached earlier on.

Not necessarily. In BioSimSpace, a Molecule will typically first be generated by reading an input file. If this is just a PDB or Mol2 file then the molecules in the system will not have parameters. You could then use the BioSimSpace.Parameters package to parameterise the molecule(s).

It would be good to get more clarification on whether BSS Molecules require parameters to be attached.

No, it's not a requirement. As mentioned above, you can load an unparameterised molecule from file and then use BioSimSpace to parameterise it, or read in pre-parameterised molecules from, e.g., AMBER prm/rst files.

@lohedges
Member

lohedges commented Jun 1, 2021

Closing since discussion has moved to this thread.

@lohedges lohedges closed this as completed Jun 1, 2021
annamherz pushed a commit that referenced this issue Sep 18, 2023