Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Which procedures require OpenEye? #1111

Closed
ljmartin opened this issue Aug 7, 2023 · 11 comments
Closed

Which procedures require OpenEye? #1111

ljmartin opened this issue Aug 7, 2023 · 11 comments

Comments

@ljmartin
Copy link

ljmartin commented Aug 7, 2023

Hi all - thanks for making this package available.
I'm interested in running RBFE calculations, but do not have OpenEye access. I noticed that one of the tests does not require OpenEye:

def benchmark_hif2a(verbose: bool = False, num_batches: int = 100, steps_per_batch: int = 1000):

It looks like OE would ordinarily be used at the partial charge generation step, but after that point the 'systems' are OpenMM System objects. That suggests to me that I could use openforcefields to generate System objects, with AM1BCC partial charges, for instance, and carry on from there.
Is that feasible more generally for RBFE calculations?
Thanks

@proteneer
Copy link
Owner

Currently, the small molecule charge parameters do not go through OpenMM System objects, only the protein/water does (using 99sb-ildn/tip3p.xmls). The simplest way is probably to just re-implement an AM1BCCHandler in the ff/handlers/nonbonded.py file that shell drops into AmberTools.

@proteneer
Copy link
Owner

(PR welcome)

@ljmartin
Copy link
Author

ljmartin commented Aug 8, 2023

OK - thank you! My understanding is this would be achieved by recapitulating this function:

def oe_assign_charges(mol, charge_model=AM1BCCELF10):

using something like this:

https://github.com/openforcefield/openff-toolkit/blob/dff0d46a06be062edb47f7f6af5f6885ab16fc60/openff/toolkit/utils/ambertools_wrapper.py#L215

I'll have a go.

@proteneer
Copy link
Owner

Yep exactly, just remember to pre-multiply the charges by sqrt of EPS0

@ljmartin
Copy link
Author

alrighty, is this kind of thing suitable? See the function at the end of this post. I'm not much of a SWE so this might be missing error checking or style elements that you want for this repo, but it works so far.

from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np
from timemachine import constants

mol = Chem.MolFromSmiles('c1ccccc1')
ac_assign_charges(mol)

out: array([-1.53232151, -1.53232151, -1.53232151, -1.53232151, -1.53232151, -1.53232151, 1.53232151, 1.53232151, 1.53232151, 1.53232151, 1.53232151, 1.53232151])

I'm still yet to figure out what to change upstream in order to integrate this. But if you think this is suitable then I'll look into it.
Cheers

def ac_assign_charges(rdkit_mol):
    import copy
    from shutil import which
    import tempfile
    import subprocess
    import os
    
    mol = copy.deepcopy(rdkit_mol)

    #add a conformer if there are none
    nconf = mol.GetNumConformers()
    any3D = any(mol.GetConformer(c).Is3D() for c in range(nconf))
    none3D = ~any3D
    
    if (nconf==0) or (none3D):
        #ensure it has explicit Hs already:
        if mol.GetNumAtoms() != Chem.AddHs(mol).GetNumAtoms():
            mol = Chem.AddHs(mol)
        
        #mol was 2D, it has explicit Hs, so embed in 3D:
        AllChem.EmbedMolecule(mol)

    #now run antechamber
    ANTECHAMBER_PATH = which("antechamber")
    if ANTECHAMBER_PATH is None:
        raise RuntimeError("antechamber not installed")

    with tempfile.TemporaryDirectory() as tmpdir:
        net_charge = sum([a.GetFormalCharge() for a in mol.GetAtoms()])
        #if the mol has some 2d conformers for some reason, and then 
        #some 3d conformers of interest, we want to save only a 3d conformer...
        #taking the first 3D conformer for convenience
        for c in range(mol.GetNumConformers()):
            if mol.GetConformer(c).Is3D():
                with Chem.SDWriter(f"{tmpdir}/molecule.sdf") as writer:
                    writer.write(mol, confId=c)

        subprocess.check_output(
                [
                    "antechamber",
                    "-i",
                    "molecule.sdf",
                    "-fi",
                    "sdf",
                    "-o",
                    "charged.mol2",
                    "-fo",
                    "mol2",
                    "-pf",
                    "yes",
                    "-dr",
                    "n",
                    "-c",
                    "bcc",
                    "-nc",
                    str(net_charge),
                ],
                cwd=tmpdir,
            )
        
        # Write out just charges
        subprocess.check_output(
            [
                "antechamber",
                "-dr",
                "n",
                "-i",
                "charged.mol2",
                "-fi",
                "mol2",
                "-o",
                "charges2.mol2",
                "-fo",
                "mol2",
                "-c",
                "wc",
                "-cf",
                "charges.txt",
                "-pf",
                "yes",
            ],
            cwd=tmpdir,
        )

        # Check to ensure charges were actually produced
        if not os.path.exists(f"{tmpdir}/charges.txt"):
            # TODO: copy files into local directory to aid debugging?
            raise RuntimeError(
                "Antechamber/sqm partial charge calculation failed")
            
            
        # Read the charges
        with open(f"{tmpdir}/charges.txt", "r") as infile:
            contents = infile.read()
        text_charges = contents.split()
        partial_charges = np.zeros([mol.GetNumAtoms()], np.float64)
        for index, token in enumerate(text_charges):
            partial_charges[index] = float(token)
        # TODO: Ensure that the atoms in charged.mol2 are in the same order as in molecule.sdf
    
    
    #back to standard timemachine stuff:
    # Verify that the charges sum up to an integer
    net_charge = np.sum(partial_charges)
    net_charge_is_integral = np.isclose(net_charge, np.round(net_charge), atol=1e-5)
    assert net_charge_is_integral, f"Charge is not an integer: {net_charge}"

    # https://github.com/proteneer/timemachine#forcefield-gotchas
    # "The charges have been multiplied by sqrt(ONE_4PI_EPS0) as an optimization."
    inlined_constant = np.sqrt(constants.ONE_4PI_EPS0)

    return inlined_constant * partial_charges

@ljmartin
Copy link
Author

now that I think of it: would it be preferable to allow users to supply their own charges? That way antechamber/AM1BCC, XTB, gasteiger, or whatever could all be supported in one hit.

@proteneer
Copy link
Owner

That could be an option too - ie. reading pre-computed off a property in the mol block. It would be agnostic to the charging method as you mentioned. IIRC the OpenFF proposed a spec for the name of the property, might be good to standardize on it.

@proteneer
Copy link
Owner

@ljmartin
Copy link
Author

ljmartin commented Aug 22, 2023

Now that I figured out where charges are generated (by the handful of ***Handler objects such as

class AM1BCCHandler(SerializableMixIn):
, which are hardcoded into the forcefields that the user chooses) I'm struggling to come up with a general solution. I'd like the user to be able to pick any forcefield - let's say, either smirnoff_1_1_0_sc.py or smirnoff_2_0_0_ccc.py, and just swap out the charge generation step. But that would require re-writing every ***Handler separately, or refactoring to pass a 'ChargeHandler' into the forcefield object, for example.

This does work for me as a general solution, however I grant that it's very hacky:

from rdkit.Chem import rdMolDescriptors

charges_a = rdMolDescriptors.CalcEEMcharges(mol_a)
charges_b = rdMolDescriptors.CalcEEMcharges(mol_b)

for c, atom in enumerate(mol_a.GetAtoms()):
    atom.SetDoubleProp('PartialCharge', charges_a[c])
for c, atom in enumerate(mol_b.GetAtoms()):
    atom.SetDoubleProp('PartialCharge', charges_b[c])


class UserChargesHandler(object):
    def __init__(self, q_handle):
        self.params = q_handle.params
        self.smirks = q_handle.smirks

    def parameterize(self, mol):
        partial_charges = np.array([at.GetDoubleProp('PartialCharge') for at in mol.GetAtoms()])
        optimization_constant = np.sqrt(constants.ONE_4PI_EPS0)
        return optimization_constant * partial_charges

    def partial_parameterize(self, params, mol):
        return self.parameterize(mol)

import dataclasses

forcefield = Forcefield.load_from_file("smirnoff_2_0_0_ccc.py")

#intervene here before the `parameterize` method gets called on the charge handlers:
forcefield = dataclasses.replace(forcefield, q_handle=UserChargesHandler(forcefield.q_handle))
forcefield = dataclasses.replace(forcefield, q_handle_solv=UserChargesHandler(forcefield.q_handle_solv))
forcefield = dataclasses.replace(forcefield, q_handle_intra=UserChargesHandler(forcefield.q_handle_intra))

#this will call `parameterize`
st = SingleTopology(mol_a, mol_b, core, forcefield)

I think this solves my immediate issue but I'd still like to make a more general solution as a PR - if you like the ChargeHandler refactor lmk. If I can't figure that out then I'll close this issue up.

@proteneer
Copy link
Owner

@proteneer
Copy link
Owner

(closing this - feel free to re-open if still an issue)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants