-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement AmberTools WBO #508
Conversation
Ok, so for our "detailed" test molecule (protonated and deprotonated versions of [1]), AmberTools narrowly failed some tests that we had calibrated against OETK. Basically, we tested to ensure that every aromatic bond fell in the range of 1.15 to 1.60 in both the neutral molecule and anion. AmberTools returns values in the range 1.06 to 1.61, which caused these tests to fail. Here's a full summary:
So basically AmberTools returns slightly more extreme values, but everything is around the same magnitude. Because these differences are small, so I'm going to relax the aromatic tolerances for these tests to accept between 1.05 and 1.65. Huge thanks to @sukanyasasmal and @ChayaSt and for making this thoughtful test. It was very quick to plug in AmberTools and get results! [2] Printout of aromatic bond orders in these tests (molecule is loaded from SDF so indexing is the same):
|
It's likely the case that OpenEye |
Reviewer RouletteA review has been requested! To spread load more evenly across eligible reviewers, Danger has randomly picked a candidate for this review and assigned them to this PR. If you need to run the roulette again (for example, if the assigned reviewer is unavailable), simply un-request a review from me, then request again. After about 45 seconds, I will update the message with a new random reviewer.
Review guidelinesSome notes from @j-wags that we can refine as we do this more Timeline and responsibilitiesThe PR reviewer should perform a review within 48 hours of being assigned. The review may take up to three hours. If few or insignificant changes are needed, the reviewer should accept the PR. If substantial fixes should be made (see categories below), the reviewer should request changes, indicating which comments are high-priority (blocking), as opposed to questions or comments. The PR author will then correct any blocking issues and re-request review. The re-review should focus just on the changes that were requested and any new code that was added in response. The PR author is the only person who should modify the code in the branch, and it is customary to let them press the "merge" button once the PR is approved. Either person can "resolve" non-blocking comment threads, but only the reviewer should "resolve" comment threads that prompt a re-review. The PR authorThe person requesting the review should ensure that the purpose of the review is clearly explained, by linking relevant GitHub Issues in the PR body text (ex "Closes #12"), using clear variable names, commenting non-obvious code, and identifying areas of the diff that are unusual (ex. "the molecule_to_string function was cut and pasted to a different file and didn't change, so don't review it"). If the PR diff is larger than 300 lines, they should identify the area to prioritize for the review, to let the reviewer add as much value as possible if they are time-constrained. The PR assigneeThe newly-assigned reviewer should acknowledge that they received the request, and confirm that they can perform the review within 48 hours. Generally, a good review strategy is to:
Types of commentsI've found that my comments fall into a few rough categories. This is a list of them in descending order of value: [If any of these are present, you should request changes]
[These may be blocking at the reviewer's discretion]
[Simple fixes that generally aren't blocking]
A few other tips for reviewers:
What am I?I am the Open Force Field Dangerbot. You can find my code and installation instructions This reviewer was selected out of a list of Open Force Field volunteers: ["j-wags", "jaimergp", "simonboothroyd", "trevorgokey", "vtlim", "dfhahn", "jthorton", "chayast", "dgasmith", "maxentile", "jchodera"] |
@@ -33,18 +47,18 @@ Behavior changed | |||
"""""""""""""""" | |||
- `PR #469 <https://github.com/openforcefield/openforcefield/pull/469>`_: | |||
When running :py:meth:`Topology.to_openmm <openforcefield.topology.Topology.to_openmm>`, unique atom names | |||
are generated (overriding any existing atom names) if the provided atom names are not unique. This | |||
are generated if the provided atom names are not unique (overriding any existing atom names). This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to releasehistory
below here are small grammar/formatting/syntax fixes and are not related to this PR.
# Prepare sqm.in file as if we were going to run charge calc | ||
# TODO: Add error handling if antechamber chokes | ||
subprocess.check_output([ | ||
"antechamber", "-i", "molecule.sdf", "-fi", "sdf", "-o", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't used subprocess.check_output
before so I am not sure if you can get the same effect using subprocess.run
but this is recommended after python 3.5 https://docs.python.org/3/library/subprocess.html.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgasmith recently recommended using check_output
, but that was in the context of replacing several os.system
calls.
The big difference between run
and check_output
[1] according to the docs is that check_output
operates like run
, except the keyword argument check=True
is always set. This is good for our use case, because if some external dependency throws an error on the command line, we should raise an exception for the user [2]. I'm happy to standardize on either solution, but right now it will save us a few characters to use check_output
everywhere.
[1] https://docs.python.org/3/library/subprocess.html#subprocess.check_output
[2] Except I've heard tales that tleap is shady about this -- apparently it can raise a non-zero exit value, but finish successfully
openforcefield/utils/toolkits.py
Outdated
raise (IOError("Antechamber not found, cannot run " | ||
"AmberToolsToolkitWrapper.assign_fractional_bond_orders()")) | ||
|
||
if len(molecule._conformers) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand this point but this could be a pain for users, is it not better to assume we need to generate a conformer if one is not supplied and just do this for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, and actually this parallels an API change that we'll make with compute_partial_charges
. I'll make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addresses in 8bd7fec
"molecule.generate_conformers() before calling molecule.assign_fractional_bond_orders" | ||
) | ||
if len(molecule._conformers) > 1: | ||
logger.warning(f"Warning: In AmberToolsToolkitWrapper.assign_fractional_bond_orders: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how slow would it be to generate the bond order for all of the supplied conformers and average over them in some way? Could this even be done in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think conformer averaging would be a nice thing to do. But, like ELF10, it's not clear that there's a single common-sense way to do it. So, I'd like to leave conformer averaging algorithms as something for the community to experiment with. Once someone wants to try it, they can invent a new keyword for bond_order_model
, and implement support for it in a ToolkitWrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely seems like a thing to do separately/unresolved science.
# Note that sqm calculate WBOs for ALL PAIRS of atoms, not just those that have | ||
# bonds defined in the original molecule. So here we iterate over the bonds in | ||
# the original molecule and only nab the WBOs for those. | ||
for bond in molecule.bonds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible this could go wrong and we pull out a bond order for the wrong pair of atoms due to file reordering somewhere in the workflow? Should we compare atom symbols as well to make sure the bond is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible this could go wrong and we pull out a bond order for the wrong pair of atoms due to file reordering somewhere in the workflow
Yes.
Right now, AmberToolsToolkitWrapper.assign_fractional_bond_orders
and compute_partial_charges
assume that the atom indexing is not modified by antechamber. I don't know if this is guaranteed everywhere, but nobody has reported a bug so far. The toolkit makes several assumptions like this, and beyond our "throw a lot of molecules through it, occasionally reversing the atom ordering, and see if it's alright" tests, I doubt even the AMBER manual says anything about this.
So, this is something that occasionally keeps me up at night, but other than sifting through a bunch of fortran, I'm not sure that we can really prove this assumption.
Long story short, "added in 43377fe "
Overall looks really good, with great test coverage! One thing to check is subprocess I think you could get the same result using run with |
Thanks for the quick review, @jthorton. I've addressed your comments and will merge when tests pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great excited to see the bond order interpolation now!
This parallels plans for the `assign_partial_charges` API
compute_wiberg_bond_orders
in AmberToolsToolkitWrapper #507 by implementingAmberToolsToolkitWrapper.assign_fractional_bond_orders
Molecule.compute_wiberg_bond_orders
toMolecule.assign_fractional_bond_orders
(This goes along with another planned change, fromMolecule.compute_partial_charge
toMolecule.assign_partial_charges
)ToolkitWrapper.compute_wiberg_bond_orders
toToolkitWrapper.assign_fractional_bond_orders
use_conformers
kwarg toassign_fractional_bond_orders
, which defaults to None. If None, the function generates its own conformer using an available ToolkitWrapper. Otherwise, if the user provides an iterable of conformers, the FBO calculation will use those.assign_fractional_bond_orders
fromcharge_model
tobond_order_model
. The previous use ofcharge_model
as a keyword name is a copy-paste mistake, since we copied so much of that function fromMolecule.compute_partial_charges
.assign_fractional_bond_orders
fromam1
andpm3
toam1-wiberg
andpm3-wiberg
. This is because we may introduce bond order population analyses other than Wiberg in the future.