Enable GLYCAM06j-1 forcefield conversion #156

zhang-ivy · 2021-02-27T05:12:36Z

Convert leaprc.GLYCAM_06j-1 to an opemm ffxml.

~~TODO:~~
- [ ] The protein atoms in GLYCAM_06j-1 are named N, CA, etc. Need to decide whether we should use ff14SB to match the existing protein atom names or enforce use of protein.ff14SB (which will require changing the protein atom names in GLYCAM_06j-1 to protein-N, protein-CA etc).
~~- [x] Allow override_level parameter in yaml input to convert_amber.py~~
~~- [ ] Write test for validating energies (openmm vs tleap system) -- I've started this, but its not working. See below.~~

Currently, the primary showstopper for the test is that ffxml/GLYCAM_06j-1.xml does not contain residues that are actually part of glycans. The only residues that are present are the modified protein residues to which glycans can bond (e.g. NLN, which is the modified version of ASN).
For example, here are some of the residues that should be present: [‘UYB’, ‘4YB’, ‘VMB’, ‘2MA’, ‘0YB’, ‘0FA’, ‘0LB’]

This problem shows up as an error when I run python convert_amber.py -i glycan.yaml at
https://github.com/zhang-ivy/openmmforcefields/blob/convert-glycans/amber/convert_amber.py#L626

ValueError: No template found for residue 197 (UYB).  The set of atoms is similar to C5, but it is missing 3 atoms.

Summary of the conversion pipeline:

In openmmforcefields/amber/glycam/, convert the glycam prep file (GLYCAM_06j-1.prep) into a .lib file (GLYCAM_06j-1.lib)

Make a tleap.in file to do this:

# Read lines from prep file
with open("GLYCAM_06j-1.prep", "r") as f:
    lines_in = f.readlines()

# Make a list of residues
residues = []
# d_enantiomers = []
for line in lines_in:
    if "INT 0" in line:
        residue = line[:3]
        residues.append(residue)

# Remove CA2, as this isn't present as an atom type in ff14SB.xml or GLYCAM06j-1.xml or protein.ff14SB.xml or tip3p_standard.xml
residues.remove('CA2')

# Prep lines to write to tleap in file
lines = ["loadamberprep GLYCAM_06j-1.prep\n"]
for i in range(200, len(residues), 200): # Write 200 residues per line 
    lines.append("saveOff {" + " ".join(residues[i-200:i]) + "} GLYCAM_06j-1.lib\n")
lines.append("quit\n")

# Write lines
with open("make_glycam_lib.in", "w") as f:
    f.writelines(lines)

Then run tleap -s -f make_glycam_lib.in > make_glycam_lib.out
Output: GLYCAM_06j-1.lib

In openmmforcefields/amber/glycam/, create a modified glycam leaprc file (that replaces loading the .prep with loading the .lib)
- Output: leaprc.GLYCAM_06j-1
In openmmforcefields/amber/glycam/, create a glycan.yaml and make sure to load the modified leaprc.
From openmmforcefields/amber, run python convert_amber.py -i glycam/glycan.yaml --verbose
- Note you will see warnings like this that are safe to ignore. ParmEd thinks that there are a lot of identical residues because it doesn’t take into stereochemistry or anomeric forms

/home/zhangi/miniconda3/envs/glycan-ff-2/lib/python3.8/site-packages/ParmEd-3.4.0+21.gffe7aaf-py3.8-linux-x86_64.egg/parmed/openmm/parameters.py:666: UserWarning: Residue <ResidueTemplate YzB: 18 atoms; 18 bonds; head=C1; tail=O2> will be considered by OpenMM to be identical to <ResidueTemplate YZB: 18 atoms; 18 bonds; head=C1; tail=O2>.
warnings.warn(f'Residue {residue} will be considered by OpenMM to be identical to {residue_collision}.')

Use the converted glycam ffxml to parametrize a system:

from simtk import openmm
from simtk.openmm import app

# Load the ffxml
ffxml = ['../ffxml/protein.ff14SB.xml', "../ffxml/GLYCAM_06j-1_modified.xml"]

ff = app.ForceField(*ffxml)

# Load the test pdb
pdb = app.PDBFile("../test.pdb")

# Create system
system_omm = ff.createSystem(pdb.topology)

Summary of changes to existing codebases:

Parmed

Allow some 1-4 interactions to be unscaled: Allow some 1-4 interactions to be unscaled ParmEd/ParmEd#1149
Base unscaled pairs on all atoms in a torsion: Base unscaled pairs on all atoms in a torsion ParmEd/ParmEd#1182
Perceive external bonds: Perceive external bonds (necessary for glycam ffxml conversion) ParmEd/ParmEd#1184
Fix external bond perception: Add fixes to external bond perception for glycans ParmEd/ParmEd#1206

Openmmforcefields

Enable GLYCAM06j-1 forcefield conversion: Enable GLYCAM06j-1 forcefield conversion #156
Fix GLYCAM tests: Fix tests to test GLYCAM and use new way of importing for openmm #177
Fix GLYCAM conversion pipeline: Fix glycam ffxml and conversion script #180

OpenMM

Allow custom template matchers for residues: Allow custom template matchers for residues openmm#3161
Support initialization scripts in force fields: Support initialization scripts in force fields openmm#3167
Add GLYCAM ffxml: Added GLYCAM openmm#3303

zhang-ivy · 2021-02-27T05:17:57Z

@jchodera @mikemhenry @rafwiewiora : could you help me identify why there are no glycan residues in ffxml/GLYCAM_06j-1.xml?
@peastman and @jchodera : could you help me decide whether we should enforce protein atom names in ffxml/GLYCAM_06j-1.xml to contain the protein- prefix (to be used with protein.ff14SB)? Or should we leave ffxml/GLYCAM_06j-1.xml as is and use ff14SB as the protein ff?

mikemhenry · 2021-03-01T12:31:18Z

Not sure if it helps or not but this site seems to have the parameter files:
http://glycam.org/docs/forcefield/all-parameters/
And the missing residues are in this file:
http://glycam.org/docs/forcefield/wp-content/uploads/sites/6/2014/03/GLYCAM_06j-1.prep

jchodera · 2021-03-01T15:14:40Z

We managed to get to the root of the issue, and I will try a full conversion today!

… test

zhang-ivy · 2021-03-05T22:28:28Z

I was able to address the "primary showstopper" (i.e. getting glycan residues into the ffxml, described in detail above) with the help of John's comment. What I did:

Converted the glycam .prep file into a .lib file (using tleap -s -f amber/glycam/make_glycam_lib_trunc.in > make_glycam_lib_trunc.out): glycam/GLYCAM_06j-1.lib
Created a modified glycam leaprc file (that replaces loading the .prep with loading the .lib): glycam/leaprc.GLYCAM_06j-1_trunc
Updated glycam/glycan.yaml to load the modified leaprc.
From amber/, ran: python convert_amber.py -i glycam/glycan.yaml --verbose

This generated ffxml/GLYCAM_06j-1_trunc.xml. Note that this ffxml is truncated because it only includes the following glycan residues (which are present in RBD:ACE2): ['UYB', '4YB', 'VMB', '2MA', '0YB', '0fA', '0LB']. I initially tried to include all glycan residues in the ffxml, but came across many problems related to OpenMM perceiving different templates as the same, so I figured we could deal with those issues later.

After generating the ffxml, I was getting errors like:
ValueError: No template found for residue 197 (UYB). The set of atoms matches UYB, but the bonds are different.
This was because the ExternalBonds in the ff templates were not correct. In GLYCAM_06j-1_trunc.xml, for the UYB residue, I was seeing:

     <ExternalBond atomName="C1"/>
     <ExternalBond atomName="O4"/>

But there should be a third line for atomName="O6". Since Parmed was not writing out all of the atoms with external bonds correctly, I added a temporary fix to parmed/openmm/parameters.py. This fix involved adding a function called _get_atoms_with_external_bonds() that gets atoms with external bonds.

Here is the function:

def _get_atoms_with_external_bonds(self, residue_name):
        # Determine the atoms with external bonds
        d = {'C': 4, 'N': 3, 'O': 2, 'H': 1} # Key: atom element name, Value: number of bonds it should have
        external_bonds = []
        for atom in self.residues[residue_name].atoms:
            bonds = 0

            # Add an extra bond if the atom is the O in a carbonyl
            if atom.name in ['OD1', 'O', 'O2N', 'OXT']:
                bonds += 1

            for bond in self.residues[residue_name].bonds:
                if bond.atom1 == atom or bond.atom2 == atom:
                    bonds += 1

                # Add extra bonds if the atom is the C in a carbonyl
                if (bond.atom1.name == 'OD1' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'OD1'):
                    bonds += 1
                elif (bond.atom1.name == 'O' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'O'):
                    bonds += 1
                elif (bond.atom1.name == 'O2N' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'O2N'):
                    bonds += 1
                elif (bond.atom1.name == 'OXT' and bond.atom2 == atom) or (bond.atom1 == atom and bond.atom2.name == 'OXT'):
                    bonds += 1
            if atom.element_name == 'Og': # If the atom is in a glycan, the element names are not set properly
                element_name = atom.name[0]
            else:
                element_name = atom.element_name
            if d[element_name] != bonds:
                external_bonds.append(atom)

        return external_bonds

And I added the following lines after this line:

external_bonds = self._get_atoms_with_external_bonds(name)
            for atom in external_bonds:
                if atom != residue.head and atom != residue.tail:
                    etree.SubElement(xml_residue, 'ExternalBond', atomName=atom.name)

After implementing this fix, I re-ran python convert_amber.py -i glycam/glycan.yaml --verbose and got an error in the test I wrote in convert_amber.py. The error occurs here and the message is:

parmed.exceptions.ParameterError: Cannot find necessary parameters

My temporary fix: At this line, I replace parm_omm with parm_amber.

When I re-run with this fix, I can compare the omm vs amber energies:

amber energies [('HarmonicBondForce', 786.2137738648838), ('HarmonicAngleForce', 1573.0058453514364), ('PeriodicTorsionForce', 9090.943943094859), ('PeriodicTorsionForce', 13.594734381835927), ('NonbondedForce', -15185.572665049249), ('CMMotionRemover', 0.0)]

omm energies [('HarmonicBondForce', 785.9383122713264), ('HarmonicAngleForce', 1567.5857045366686), ('PeriodicTorsionForce', 9034.076934658806), ('NonbondedForce', -4528.287677760537), ('CMMotionRemover', 0.0)]

Notice there are two PeriodicTorsionForces in the amber energies..? And the NonbondedForce energies differ by a lot..

amber/convert_amber.py

jchodera · 2021-03-05T23:46:41Z

amber/convert_amber.py

@@ -1247,6 +1258,68 @@ def validate_merged_lipids(ffxml_name, leaprc_name):
            os.unlink(f[1])
    if verbose: print('Lipids energy validation for %s done!' % ffxml_name)

+def validate_glyco_protein(ffxml_name, leaprc_name,


Use this test idea?

ParmEd/ParmEd#1149 (comment)

jchodera · 2021-03-30T16:36:30Z

When I re-run with this fix, I can compare the omm vs amber energies:

@zhang-ivy Can you paste the XML file you generated as a gist?

zhang-ivy · 2021-03-30T17:00:52Z

@jchodera : Here are the system xmls ( amber.xml and omm.xml): https://gist.github.com/zhang-ivy/0d7d192e0f8296d609b8d4d94e4098ce

jchodera · 2021-03-30T22:25:31Z

@jchodera : Here are the system xmls ( amber.xml and omm.xml): https://gist.github.com/zhang-ivy/0d7d192e0f8296d609b8d4d94e4098ce

Oh, sorry, I was hoping to see the ffxml!

zhang-ivy · 2021-03-30T23:55:00Z

@jchodera : The ffxml I generated is included in this PR: https://github.com/openmm/openmmforcefields/blob/b73b6b0e420a737bb315da79b1ba4a66397caf33/amber/ffxml/GLYCAM_06j-1_trunc.xml
Note that it only includes the glycan residues necessary for my RBD:ACE2 system, as I was running to problems when I tried converting all glycan residues

zhang-ivy · 2021-06-14T21:00:54Z

@peastman helped me identify that the reason why there is an energy discrepancy between the amber vs openmm-generated test systems is because the prefixes are not correct in GLYCAM_06j-1_trunc.xml.

I think I'll want to modify this function such that it iterates through all atom types in GLYCAM_06j.dat and if the atom type is present in ff14SB, add the protein- prefix, otherwise add the glycan- prefix.

@peastman and @jchodera : Please let me know what other considerations I need to take into account in order to write the appropriate changes necessary to address this prefix issue.

peastman · 2021-06-14T22:03:40Z

Let me try to go over all the issues we need to consider. Hopefully I'm not missing any!

Prefixes were a compromise to deal with differences in how OpenMM and Amber think about atom types. OpenMM expects them to be globally unique. If you load two files together, they can't each define a type with the same name. But Amber sometimes has multiple files listing the same type. The solution was to add prefixes like protein-, DNA-, and RNA-. That way if you need to include both protein and DNA in a single system, you don't get conflicts.

Each atom type definition also defines a class, which is the same name but without the prefix:

openmmforcefields/amber/ffxml/protein.ff14SB.xml

Line 8 in ca9319f

When defining parameters, they can be defined either by type="protein-C" or class="C". It appears that in all cases they do the former:

openmmforcefields/amber/ffxml/protein.ff14SB.xml

Line 2823 in ca9319f

    
           <Bond type1="protein-C" type2="protein-C" length="0.1525" k="259407.99999999994"/>

and

openmmforcefields/amber/ffxml/DNA.bsc1.xml

Line 1113 in ca9319f

These are basically the same definition, but one using protein types and the other using DNA types. This allows you to load either just the protein force field or just the DNA force field, and it has all parameters it needs. Or you can load both of them together, and you don't have to worry about duplicate or conflicting definitions. In principle we could separate all the common definitions into a different file that would specify them by class rather than type. That would be a substantial change though.

All of this works fine for protein, DNA, and RNA. Those are separate molecules so they can each be parametrized independently. But glycans are bonded to other molecules. A single bond/angle/torsion might include both protein and glycan atom types. We need to figure out how to handle that.

Atom types appear in one other place. ParmEd/ParmEd#1149 added a list of atom types that should have 1-4 interactions scaled differently. That list appears in an embedded script generated by ParmEd. We could make the conversion script modify them the same way it modifies types in other places, but it might be cleaner to modify ParmEd so it can generate types with correct prefixes right from the start.

peastman · 2021-06-22T00:04:20Z

While working on this, I ran into what I think is a bug in the glycam force field. GLYCAM_06j.dat contains these lines.

NH-Cg-Cg-Sm   1    0.45          0.0            -3.         SCEE=1.0 SCNB=1.0 Thiol-linkages
              1    0.25          0.0             2.         SCEE=1.0 SCNB=1.0

However, the atom type NH is not defined. This is the only place it is mentioned anywhere in the file. It also is not a standard type used in the protein, DNA, or RNA force fields.

peastman · 2021-06-22T00:57:38Z

Based on our meeting today, I created a script to post-process the XML file. It makes the following assumptions about the input file:

No prefixes have been added.
All parameters are specified by type, not class.
The bad torsion noted above has been removed. (Otherwise it throws an exception.)

It creates a modified XML file with the following changes.

All atom type definitions that are not GLYCAM-specific are removed.
All parameter definitions that do not involve at least one GLYCAM-specific type are removed.
For standard (not GLYCAM-specific) atom types, it adds the protein- prefix.

Here's a first draft of it.

import xml.etree.ElementTree as etree
import re
import textwrap

# Define atom types

protein_types = set(["C", "CA", "CB", "CC", "CN", "CR", "CT", "CV", "CW", "C*", "CX", "H", "HC", "H1", "HA", "H4", "H5",
                     "HO", "HS", "HP", "N", "NA", "NB", "N2", "N3", "O", "O2", "OH", "S", "SH", "CO", "2C", "3C", "C8"])
solvent_types = set(["HW", "OW", "Li+", "Na+", "K+", "Rb+", "Cs+", "F-", "Cl-", "Br-", "I-"])
glycam_types = set()
replacements = {}
for type in protein_types:
    replacements[type] = 'protein-'+type

# Process <AtomTypes>

tree = etree.parse('glycam.xml')
root = tree.getroot()
types = root.find('AtomTypes')
for type in types.findall('Type'):
    name = type.get('name')
    if name in protein_types or name in solvent_types:
        types.remove(type)
    else:
        glycam_types.add(name)
        replacements[name] = 'glycam-'+name
        type.set('name', replacements[name])

# Process <Residues>

residues = root.find('Residues')
for residue in residues.findall('Residue'):
    for atom in residue.findall('Atom'):
        atom.set('type', replacements[atom.get('type')])

# Process <HarmonicBondForce>

force = root.find('HarmonicBondForce')
for bond in force.findall('Bond'):
    types = [bond.get('type1'), bond.get('type2')]
    if any(t in glycam_types for t in types):
        bond.set('type1', replacements[types[0]])
        bond.set('type2', replacements[types[1]])
    else:
        force.remove(bond)

# Process <HarmonicAngleForce>

force = root.find('HarmonicAngleForce')
for angle in force.findall('Angle'):
    types = [angle.get('type1'), angle.get('type2'), angle.get('type3')]
    if any(t in glycam_types for t in types):
        angle.set('type1', replacements[types[0]])
        angle.set('type2', replacements[types[1]])
        angle.set('type3', replacements[types[2]])
    else:
        force.remove(angle)

# Process <PeriodicTorsionForce>

force = root.find('PeriodicTorsionForce')
for tag in ['Proper', 'Improper']:
    for torsion in force.findall(tag):
        types = [torsion.get('type1'), torsion.get('type2'), torsion.get('type3'), torsion.get('type4')]
        if any(t in glycam_types for t in types):
            torsion.set('type1', replacements[types[0]])
            torsion.set('type2', replacements[types[1]])
            torsion.set('type3', replacements[types[2]])
            torsion.set('type4', replacements[types[3]])
        else:
            force.remove(torsion)

# Process <NonbondedForce>

force = root.find('NonbondedForce')
for atom in force.findall('Atom'):
    type = atom.get('type')
    if type in glycam_types:
        atom.set('type', replacements[type])
    else:
        force.remove(atom)

# Update the unscaled types in the script

script = root.find('Script')
text = script.text
types = ', '.join('"%s"' % replacements[s] for s in sorted(glycam_types))
types = '\n    '.join(textwrap.wrap(types))
pattern = re.compile('unscaled_types = \[(.*\n)*?.*?\]')
script.text = pattern.sub('unscaled_types = ['+types+']', text)

tree.write('modified.xml')

zhang-ivy · 2021-06-24T13:19:32Z

Thanks @peastman ! Excited to try that out.
As discussed in our meeting on Monday, my task was to adapt the tleap.in file used to convert the GLYCAM_06j-1.prep to a GLYCAM_06j-1.lib to contain only one enantiomer of each residue. I've done that, but when I run python convert_amber.py -i glycam/glycan.yaml --verbose, I get an error.

Here is what I did:

Wrote a python script that creates a tleap.in file that takes in GLYCAM_06j-1.prep and generates a .lib file. Grabs all residue names from GLYCAM_06j-1.prep and keeps only one enantiomer for each residue. Output: make_glycam_lib_one_enantiomer.in
Script:

# Read lines from prep file
with open("leaprc_GLYCAM_06j-1_2014-03-14/GLYCAM_06j-1.prep", "r") as f:
    lines_in = f.readlines()

# Grab residues, keeping only one enantiomer of each
residues = []
for line in lines_in:
    residue = line[:3]
    if "INT 0" in line and residue.upper() not in residues: # Keep only one enantiomer
        residues.append(residue)

# Make list of lines to write to the tleap.in file
lines = ["loadamberprep GLYCAM_06j-1.prep\n"]
for i in range(200, len(residues), 200):
    lines.append("saveOff {" + " ".join(residues[i-200:i]) + "} GLYCAM_06j-1.lib\n")
lines.append("quit\n")

# Write lines
with open("make_glycam_lib_one_enantiomer.in", "w") as f:
    f.writelines(lines)

Converted the glycam .prep file into a .lib file (using tleap -s -f make_glycam_lib_one_enantiomer.in > make_glycam_lib_one_enantiomer.out). Output: GLYCAM_06j-1.lib
Created a modified glycam leaprc file (that replaces loading the .prep with loading the .lib): leaprc.GLYCAM_06j-1_trunc
Updated glycan.yaml to load the modified leaprc.
From amber/, ran: python convert_amber.py -i glycam/glycan.yaml --verbose

Got error:

Converting ['./glycam/leaprc.GLYCAM_06j-1_trunc'] to ffxml...
Preprocessing the leaprc for GLYCAM_06j-1_trunc...
Converting to ffxml ffxml/GLYCAM_06j-1_trunc.xml...
leaprc addAtomTypes {
        { "C"   "C" "sp2" }
        { "Cg"  "C" "sp3" }
        { "Cy"  "C" "sp3" }
        { "Ck"  "C" "sp2" }
        { "CT"  "C" "sp3" }
        { "Cj"  "C" "sp2" }
        { "Cp"  "C" "sp3" }
        { "H"   "H" "sp3" }
        { "H1"  "H" "sp3" }
        { "H2"  "H" "sp3" }
        { "Ha"  "H" "sp3" }
        { "Hp"  "H" "sp3" }
        { "Hc"  "H" "sp3" }
        { "Ho"  "H" "sp3" }
        { "HW"  "H" "sp3" }
	{ "Ng"  "N" "sp2" }
        { "NT"  "N" "sp3" }
        { "N3"  "N" "sp3" }
        { "Oh"  "O" "sp3" }
        { "Os"  "O" "sp3" }
        { "O"   "O" "sp2" }
        { "O2"  "O" "sp2" }
        { "OW"  "O" "sp3" }
        { "Oy"  "O" "sp3" }
        { "S"   "S" "sp3" }
        { "Sm"  "S" "sp3" }
        { "P"   "P" "sp3" }
}
glycam_06 = loadamberparams GLYCAM_06j.dat
loadOff glycam/GLYCAM_06j-1.lib
loadOff GLYCAM_amino_06j_12SB.lib
loadOff GLYCAM_aminoct_06j_12SB.lib
loadOff GLYCAM_aminont_06j_12SB.lib
addPdbResMap {
        { 0 "OLS" "NOLS" } { 1 "OLS" "COLS" }
        { 0 "OLT" "NOLT" } { 1 "OLT" "COLT" }
        { 0 "OLP" "NOLP" } { 1 "OLP" "COLP" }
        { 0 "HYP" "NHYP" } { 1 "HYP" "CHYP" }
        { 0 "NLN" "NNLN" } { 1 "NLN" "CNLN" }
}
HOH = TP3  WAT = TP3
ionsff = loadamberparams frcmod.ionsjc_tip3p
/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py:361: ParameterWarning: Some residue templates are using unavailable AtomTypes
  warnings.warn('Some residue templates are using unavailable AtomTypes',
Traceback (most recent call last):
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 1420, in <module>
    main()
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 113, in main
    convert_yaml(args.input, ffxml_dir='ffxml/')
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 434, in convert_yaml
    ffxml_name = convert_leaprc(files, ffxml_dir=ffxml_dir, ignore=ignore,
  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 222, in convert_leaprc
    params.write(ffxml_name, provenance=provenance, write_unused=write_unused, improper_dihedrals_ordering='amber')
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py", line 377, in write
    self._write_omm_residues(root, skip_residues, skip_duplicates,
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py", line 689, in _write_omm_residues
    external_bonds = self._get_atoms_with_external_bonds(name)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/ParmEd-3.4.0+19.g310f15e.dirty-py3.9-linux-x86_64.egg/parmed/openmm/parameters.py", line 645, in _get_atoms_with_external_bonds
    if d[element_name] != bonds:
KeyError: 'S'

~~Is this potentially related to the atom prefix issue that Peter pointed out?~~
This is because the hacky code I wrote to perceive the number of external bonds does not cover S. Fixing this now

zhang-ivy · 2021-06-24T14:39:59Z

After fixing the external bonds error above, I realized that the approach we outlined in our meeting (keep only one enantiomer for each glycan residue) will not work.

When I run python convert_amber.py -i glycam/glycan.yaml --verbose after fixing the above error, I am not able to parametrize my test system (glycosylated RBD) in tleap because the glycan residues needed (NLN UYB 4YB VMB 2MA 0YB 0fA 0LB) are not all present in the GLYCAM_06j-1.lib file.

However, if I do not filter out any residues when converting the .prep to a .lib (except for CA2, which must be filtered out because it has atom type C0 which is not in ff14SB.xml, GLYCAM06j-1.xml, proteiin.ff14SB.xml, or tip3p_standard.xml), I have no issues parametrizing my system in tleap, but get the following errror when parametrizing in OpenMM:

  File "/lila/home/zhangi/choderalab/openmmforcefields/amber/convert_amber.py", line 637, in assert_energies2
    system_omm = ff.createSystem(parm_amber.topology)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py", line 1177, in createSystem
    templateForResidue = self._matchAllResiduesToTemplates(data, topology, residueTemplates, ignoreExternalBonds)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py", line 1359, in _matchAllResiduesToTemplates
    [template, matches] = self._getResidueTemplateMatches(res, data.bondedToAtom, ignoreExternalBonds=ignoreExternalBonds, ignoreExtraParticles=ignoreExtraParticles)
  File "/home/zhangi/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py", line 977, in _getResidueTemplateMatches
    raise Exception('Multiple non-identical matching templates found for residue %d (%s): %s.' % (res.index+1, res.name, ', '.join(match[0].name for match in allMatches)))
Exception: Multiple non-identical matching templates found for residue 197 (UYB): UVA, UVB, UWA, UWB, UYA, UYB.

It seems that the UYB template actually matches many other residues that do not just differ by upper/lower case as I had initially thought, so my code for keeping only one enantiomer above is not going to work.

For now, I'll proceed with testing Peter's fix on the ffxml generated for the small subset of glycan residues ((NLN UYB 4YB VMB 2MA 0YB 0fA 0LB) in my RBD system. But in order to make a ffxml including all glycan residues, we'll need to think of a more clever approach for how to circumvent the template matching issue.

@peastman and @jchodera : Let me know if you have any thoughts on how to resolve this. If not, maybe we can discuss with the amber dev folks.

zhang-ivy · 2021-06-24T15:26:37Z

I've just modified the truncated ffxml (with only a small subset of glycan residues) using Peter's code and tested the ffxml on a test system (glycosylated RBD):

from simtk.openmm import app

# Load the ffxml
ffxml = [ffxml/protein.ff14SB.xml', "ffxml/GLYCAM_06j-1_trunc_modified.xml"]
if isinstance(ffxml, str):
    ff = app.ForceField(ffxml)
else:
    ff = app.ForceField(*ffxml)

# Load the test pdb
pdb = app.PDBFile("../test.pdb")

# Create system
system_omm = ff.createSystem(pdb.topology)

The last line yields this error:

----------
ValueErrorTraceback (most recent call last)
<ipython-input-11-f9fd22c0d06a> in <module>
----> 1 system_omm = ff.createSystem(pdb.topology)

~/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py in createSystem(self, topology, nonbondedMethod, nonbondedCutoff, constraints, rigidWater, removeCMMotion, hydrogenMass, residueTemplates, ignoreExternalBonds, switchDistance, flexibleConstraints, drudeMass, **args)
   1326 
   1327         for force in self._forces:
-> 1328             force.createForce(sys, data, nonbondedMethod, nonbondedCutoff, args)
   1329         if removeCMMotion:
   1330             sys.addForce(mm.CMMotionRemover())

~/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py in createForce(self, sys, data, nonbondedMethod, nonbondedCutoff, args)
   2433         force = mm.NonbondedForce()
   2434         for atom in data.atoms:
-> 2435             values = self.params.getAtomParameters(atom, data)
   2436             force.addParticle(values[0], values[1], values[2])
   2437         force.setNonbondedMethod(methodMap[nonbondedMethod])

~/miniconda3/envs/glycan-ff/lib/python3.9/site-packages/simtk/openmm/app/forcefield.py in getAtomParameters(self, atom, data)
    926                 return result
    927             else:
--> 928                 raise ValueError('%s: No parameters defined for atom type %s' % (self.forceName, t))
    929 
    930         def getExtraParameters(self, atom, data):

ValueError: NonbondedForce: No parameters defined for atom type glycam-Cg

Here are the files necessary to reproduce the error (test.pdb and GLYCAM_06j-1_trunc_modified.xml) :
debug_peters_fix.zip

jchodera · 2021-06-24T15:32:28Z

When I run python convert_amber.py -i glycam/glycan.yaml --verbose after fixing the above error, I am not able to parametrize my test system (glycosylated RBD) in tleap because the glycan residues needed (NLN UYB 4YB VMB 2MA 0YB 0fA 0LB) are not all present in the GLYCAM_06j-1.lib file.

It's important to remember that tleap matches by residue name, but OpenMM matches by template. It sounds like we'll need to generate a complete translation of the .prep file to .lib to feed into tleap and a pruned version of the .lib file to feed into ParmEd to generate the OpenMM ffxml. (Alternatively, we could modify ForceField to permit multiple matches, or add a mode to ParmEd to prune duplicate templates.)

jchodera · 2021-06-24T15:33:20Z

except for CA2, which must be filtered out because it has atom type C0 which is not in ff14SB.xml, GLYCAM06j-1.xml, proteiin.ff14SB.xml, or tip3p_standard.xml

Weird! This sounds like another bug. Let's make sure to add this and the other potential bug in the .prep file to the list of issues to raise with the glycam folks.

jchodera · 2021-06-24T15:34:13Z

It seems that the UYB template actually matches many other residues that do not just differ by upper/lower case as I had initially thought, so my code for keeping only one enantiomer above is not going to work.

OK, this suggests we will really need to add a mode to ParmEd to filter out identical residue templates on export to ffxml, or perhaps add this to @peastman's post-processing script as a short-term workaround.

zhang-ivy · 2021-06-24T16:05:09Z

Weird! This sounds like another bug.

Actually, it seems like C0 is present as a protein atom type in peter's script. Just not sure why CA2 is present in GLYCAM_06j-1 and not in ff14SB, as CA2 seems to correspond to a Ca2+ ion, meaning it should have no parameters that are specific to the glycam forcefield. Will make a note to bring this up in the chat with the amber folks.

and the other potential bug in the .prep file to the list of issues to raise with the glycam folks.

Which other potential bug are you referring to here?

zhang-ivy · 2021-06-24T16:26:41Z

OK, this suggests we will really need to add a mode to ParmEd to filter out identical residue templates on export to ffxml, or perhaps add this to @peastman's post-processing script as a short-term workaround.

Yes, I'm just not sure how we want to do this.
Here is the glycan naming system: http://glycam.org/docs/forcefield/glycam-naming-2/
Not sure I fully understand it, but it seems that there is a set of core monosaccharides (Table 1), where lowercase indicates L- and uppercase indicates D-. The third letter indicates the anomeric configuration (α or β).
But it seems like the naming system also allows encoding of ring form (pyranosyl or furanosyl) and occupied linkage positions (2-, 2,3-, 2,4,6-, etc.) -- not sure how these are encoded. For these two properties, given the same monosaccharide, I'm pretty sure we want there to be different parameters for different ring forms and linkage positions? In which case we would need ParmEd's residue template matcher to be able to distinguish between residues with these diff properties so that it can properly match the right residue template to a given residue?

peastman · 2021-06-24T16:34:11Z

Just not sure why CA2 is present in GLYCAM_06j-1 and not in ff14SB, as CA2 seems to correspond to a Ca2+ ion

Ions get defined along with water, since their parameters can be different for different water models. Add it to solvent_types in my script. I only included the ions that were present in your converted XML file, but it sounds like the full force field includes more of them.

jchodera · 2021-07-05T17:44:29Z

Woohoo!

peastman · 2021-07-05T18:21:31Z

Sounds good!

…iles/glycan -> files/glycam name

zhang-ivy · 2021-08-09T20:09:25Z

I've finished cleaning up this PR!

I've updated the first comment in this PR with a summary of the conversion pipeline and changes required by other repos. Note that I'm still waiting on ParmEd/ParmEd#1184 to be merged.

I've also spent the afternoon modifying the validate_glyco_protein test in convert_amber.py to use the glycoprotein files that John suggested here. In order to get the tests working, I modified the files John initially sent and included them in this PR
(in amber/files/glycam/Glycoprotein_shortened*). Here are the changes I made:

The .parm7 and .rst7 files were generated by a .leapin file that loads oldff/leaprc.ff10 and oldff/leaprc.GLYCAM_06j_10. In order to generate .parm7 and .rst7 files using updated forcefields that will be used in practice, I updated the .leapin file to load leaprc.ff14SB and leaprc.GLYCAM_06j-1 instead.
The .parm7 and .rst7 files include 7 molecules, but I removed the first 2 molecules (by editing the .leapin file) because they involve zwitterionic molecules, which I wasn't sure how to parametrize in OpenMM).

The test now outputs:

Validating the conversion...
Glycosylated protein energy validation for ffxml/GLYCAM_06j-1.xml
HarmonicBondForce 166.05257416728875 166.0525741672887
HarmonicAngleForce 443.64924500308035 443.6478171109116
PeriodicTorsionForce 548.348291428228 548.3482921829175
NonbondedForce 169459044.11620137 169459044.18473148
CMMotionRemover 0.0 0.0

I should note that I've set the energy tolerance to 1e-1 so that it passes.

@jchodera @peastman : Could you please review and merge when you get a chance?

peastman · 2021-08-11T16:56:27Z

Excellent, congratulations on getting this done!

zhang-ivy · 2021-08-11T18:58:55Z

@peastman Thanks so much for your help!!

jchodera · 2021-08-28T00:38:07Z

@peastman : We should cut a new release with the glycam functionality once you've had a chance to review and merge this.

peastman · 2021-08-31T20:48:38Z

It looks good to me. If all the tests pass, it should be good to merge. (I notice that CI is currently failing due to lack of an OpenEye license key, but that's not related to this.)

jchodera · 2021-08-31T23:34:34Z

(I notice that CI is currently failing due to lack of an OpenEye license key, but that's not related to this.)

Will fix!

jchodera · 2021-09-01T02:31:47Z

@peastman : I've updated the OpenEye toolkit license and have triggered all the tests to run again.

zhang-ivy · 2021-09-01T13:05:05Z

@jchodera : It looks like there's still a problem with the open eye license

jchodera · 2021-09-01T21:21:19Z

Investigating.

jchodera · 2021-09-01T21:24:19Z

The issue here is that secrets are not disclosed on PRs opened from forks.

I'll merge this and we can debug from here.

jchodera · 2021-09-01T21:26:24Z

@zhang-ivy I've added you to this repo so you can use branches instead of pushing from forks. That should get around this issue next time.

add override to yaml and initial test for glycoprotein

b3416c3

jchodera mentioned this pull request Feb 28, 2021

Allow some 1-4 interactions to be unscaled ParmEd/ParmEd#1149

Merged

zhang-ivy added 2 commits March 4, 2021 20:47

add ffxml

44109f2

add files used for ff conversion and changes needed to run validation…

4b0a7b3

… test

jchodera reviewed Mar 5, 2021

View reviewed changes

amber/convert_amber.py Show resolved Hide resolved

jchodera reviewed Mar 5, 2021

View reviewed changes

amber/convert_amber.py Outdated Show resolved Hide resolved

jchodera reviewed Mar 5, 2021

View reviewed changes

zhang-ivy added 2 commits March 30, 2021 11:59

simplify assert energies

be75ca7

remove accidentally change

b73b6b0

peastman mentioned this pull request Jul 5, 2021

Base unscaled pairs on all atoms in a torsion ParmEd/ParmEd#1182

Merged

zhang-ivy added 4 commits July 13, 2021 13:42

clean up convert_amber.py and update helper files

788b6e4

remove _modified from ffxml name

ba00644

add global flag controlling external bond code in parmed and change f…

352fceb

…iles/glycan -> files/glycam name

switch test case to be glyco proteins john sent

de8aab5

zhang-ivy mentioned this pull request Aug 9, 2021

Perceive external bonds (necessary for glycam ffxml conversion) ParmEd/ParmEd#1184

Merged

jchodera added this to the 0.10.0 milestone Aug 28, 2021

Merge branch 'master' into convert-glycans

0f0d014

jchodera merged commit 65fb59e into openmm:master Sep 1, 2021

This was referenced Sep 4, 2021

Update imports and tests for OpenMM 7.6 #173

Merged

GLYCAM PR now causing failures #176

Closed

Add support for GLYCAM06j-1 glycan force fields #150

Closed

zhang-ivy mentioned this pull request Sep 20, 2021

Optimized computing long range correction for CustomNonbondedForce openmm/openmm#3236

Merged

peastman mentioned this pull request Oct 22, 2021

Planning for 7.7/8.0 openmm/openmm#3191

Closed

zhang-ivy mentioned this pull request Apr 11, 2022

Add fixes to external bond perception for glycans ParmEd/ParmEd#1206

Open

zhang-ivy mentioned this pull request Jul 1, 2022

Add all Amber ion params openmm/openmm#3663

Open

gohdavid mentioned this pull request Jul 16, 2023

Amber Lipid21 two 1-4 scaling error #291

Open

Enable GLYCAM06j-1 forcefield conversion #156

Enable GLYCAM06j-1 forcefield conversion #156

Conversation

zhang-ivy commented Feb 27, 2021 • edited Loading

Summary of the conversion pipeline:

Summary of changes to existing codebases:

zhang-ivy commented Feb 27, 2021

mikemhenry commented Mar 1, 2021

jchodera commented Mar 1, 2021

zhang-ivy commented Mar 5, 2021

jchodera Mar 5, 2021

Choose a reason for hiding this comment

jchodera commented Mar 30, 2021

zhang-ivy commented Mar 30, 2021

jchodera commented Mar 30, 2021

zhang-ivy commented Mar 30, 2021 • edited Loading

zhang-ivy commented Jun 14, 2021

peastman commented Jun 14, 2021

peastman commented Jun 22, 2021

peastman commented Jun 22, 2021

zhang-ivy commented Jun 24, 2021 • edited Loading

zhang-ivy commented Jun 24, 2021 • edited Loading

zhang-ivy commented Jun 24, 2021 • edited Loading

jchodera commented Jun 24, 2021

jchodera commented Jun 24, 2021

jchodera commented Jun 24, 2021

zhang-ivy commented Jun 24, 2021 • edited Loading

zhang-ivy commented Jun 24, 2021 • edited Loading

peastman commented Jun 24, 2021

jchodera commented Jul 5, 2021

peastman commented Jul 5, 2021

zhang-ivy commented Aug 9, 2021 • edited Loading

peastman commented Aug 11, 2021

zhang-ivy commented Aug 11, 2021

jchodera commented Aug 28, 2021

peastman commented Aug 31, 2021

jchodera commented Aug 31, 2021

jchodera commented Sep 1, 2021

zhang-ivy commented Sep 1, 2021

jchodera commented Sep 1, 2021

jchodera commented Sep 1, 2021

jchodera commented Sep 1, 2021

zhang-ivy commented Feb 27, 2021 •

edited

Loading

zhang-ivy commented Mar 30, 2021 •

edited

Loading

zhang-ivy commented Jun 24, 2021 •

edited

Loading

zhang-ivy commented Jun 24, 2021 •

edited

Loading

zhang-ivy commented Jun 24, 2021 •

edited

Loading

zhang-ivy commented Jun 24, 2021 •

edited

Loading

zhang-ivy commented Jun 24, 2021 •

edited

Loading

zhang-ivy commented Aug 9, 2021 •

edited

Loading