Implement AmberTools WBO #508

Merged
merged 17 commits into master from ambertools-wbo on Feb 13, 2020
Conversation

j-wags
Member

@j-wags j-wags commented Feb 7, 2020

  • Closes Implement and test compute_wiberg_bond_orders in AmberToolsToolkitWrapper  #507 by implementing AmberToolsToolkitWrapper.assign_fractional_bond_orders
  • Change Molecule.compute_wiberg_bond_orders to Molecule.assign_fractional_bond_orders (This goes along with another planned change, from Molecule.compute_partial_charge to Molecule.assign_partial_charges)
  • Change ToolkitWrapper.compute_wiberg_bond_orders to ToolkitWrapper.assign_fractional_bond_orders
  • Add use_conformers kwarg to assign_fractional_bond_orders, which defaults to None. If None, the function generates its own conformer using an available ToolkitWrapper. Otherwise, if the user provides an iterable of conformers, the FBO calculation will use those.
  • Change keyword name for assign_fractional_bond_orders from charge_model to bond_order_model. The previous use of charge_model as a keyword name is a copy-paste mistake, since we copied so much of that function from Molecule.compute_partial_charges.
  • Change accepted keyword values for assign_fractional_bond_orders from am1 and pm3 to am1-wiberg and pm3-wiberg. This is because we may introduce bond order population analyses other than Wiberg in the future.
  • Change the expected Wiberg bond orders for the existing test molecule. Due to small differences between the values returned by OpenEye and AmberTools, I've relaxed the allowed aromatic bond order range.
  • Add tests
  • Update docstrings/documentation, if applicable
  • Update changelog

@codecov-io

codecov-io commented Feb 7, 2020

Codecov Report

Merging #508 into master will increase coverage by 0.66%.
The diff coverage is n/a.

@j-wags
Member Author

j-wags commented Feb 9, 2020

Ok, so for our "detailed" test molecule (protonated and deprotonated versions of [1]), AmberTools narrowly failed some tests that we had calibrated against OETK. Basically, we tested to ensure that every aromatic bond fell in the range of 1.15 to 1.60 in both the neutral molecule and anion. AmberTools returns values in the range 1.06 to 1.61, which caused these tests to fail. Here's a full summary:

| Charge state | AmberTools WBO range | OE WBO range |
| ------------ | -------------------- | ------------ |
| neutral      | 1.34 - 1.44          | 1.35 - 1.44  |
| anion        | 1.06 - 1.61          | 1.12 - 1.55  |

So basically AmberTools returns slightly more extreme values, but everything is around the same magnitude. Because these differences are small, I'm going to relax the aromatic tolerances for these tests to accept between 1.05 and 1.65.
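The relaxed check can be sketched as a simple range assertion. This is a minimal illustration, not the toolkit's actual test code (the helper name and data layout are invented here); the values are the AmberTools anion WBOs printed below.

```python
# Hypothetical sketch of the relaxed aromatic-WBO range check.
# Values are the AmberTools anion bond orders from the printout below.

def check_aromatic_wbo_range(orders, low=1.05, high=1.65):
    """Return True if every aromatic bond order falls within [low, high]."""
    return all(low <= o <= high for o in orders)

ambertools_anion = [
    1.42697576, 1.40317476, 1.40317794, 1.42697246,
    1.31119999, 1.31119746, 1.23414108, 1.23414195,
    1.61292386, 1.06733526, 1.06733486, 1.61292474,
]

# The old 1.15-1.60 window rejects these values; the relaxed one accepts them.
assert not check_aromatic_wbo_range(ambertools_anion, 1.15, 1.60)
assert check_aromatic_wbo_range(ambertools_anion)
```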

Huge thanks to @sukanyasasmal and @ChayaSt for making this thoughtful test. It was very quick to plug in AmberTools and get results!

[1]
(screenshot of the test molecule, 2020-02-08)

[2] Printout of aromatic bond orders in these tests (molecule is loaded from SDF so indexing is the same):

OpenEye neutral aromatic order: 1.4249782910821187
OpenEye neutral aromatic order: 1.4247535696661562
OpenEye neutral aromatic order: 1.3993880829721475
OpenEye neutral aromatic order: 1.386557474305936
OpenEye neutral aromatic order: 1.3864636016050376
OpenEye neutral aromatic order: 1.3721272356508605
OpenEye neutral aromatic order: 1.3798238226278605
OpenEye neutral aromatic order: 1.4308855238777132
OpenEye neutral aromatic order: 1.3485824537022644
OpenEye neutral aromatic order: 1.3484389250490467
OpenEye neutral aromatic order: 1.4361705759121075
OpenEye anion aromatic order: 1.4214715268113562
OpenEye anion aromatic order: 1.4082261746981803
OpenEye anion aromatic order: 1.4083384738085785
OpenEye anion aromatic order: 1.421413561882213
OpenEye anion aromatic order: 1.3418572306137813
OpenEye anion aromatic order: 1.3418629033811214
OpenEye anion aromatic order: 1.282103697809885
OpenEye anion aromatic order: 1.2819662464523884
OpenEye anion aromatic order: 1.5524099209448596
OpenEye anion aromatic order: 1.1228566259220674
OpenEye anion aromatic order: 1.1228857329592095
OpenEye anion aromatic order: 1.5523016417471136
AmberTools neutral aromatic order: 1.42847694
AmberTools neutral aromatic order: 1.42838216
AmberTools neutral aromatic order: 1.38847364
AmberTools neutral aromatic order: 1.38967052
AmberTools neutral aromatic order: 1.38955476
AmberTools neutral aromatic order: 1.36395535
AmberTools neutral aromatic order: 1.38584981
AmberTools neutral aromatic order: 1.42623811
AmberTools neutral aromatic order: 1.3548221
AmberTools neutral aromatic order: 1.3413636
AmberTools neutral aromatic order: 1.44752967
AmberTools anion aromatic order: 1.42697576
AmberTools anion aromatic order: 1.40317476
AmberTools anion aromatic order: 1.40317794
AmberTools anion aromatic order: 1.42697246
AmberTools anion aromatic order: 1.31119999
AmberTools anion aromatic order: 1.31119746
AmberTools anion aromatic order: 1.23414108
AmberTools anion aromatic order: 1.23414195
AmberTools anion aromatic order: 1.61292386
AmberTools anion aromatic order: 1.06733526
AmberTools anion aromatic order: 1.06733486
AmberTools anion aromatic order: 1.61292474

@j-wags changed the title from "[WIP] Implement AmberTools WBO" to "Implement AmberTools WBO" on Feb 9, 2020
@jchodera
Member

jchodera commented Feb 9, 2020

It's likely the case that OpenEye quacpac uses some "quick and dirty" tolerances and convergence criteria to give a semiquantitative result faster, so this is a perfectly reasonable approach!

@openff-dangerbot

openff-dangerbot commented Feb 11, 2020

Reviewer Roulette

A review has been requested!

To spread load more evenly across eligible reviewers, Danger has randomly picked a candidate for this review and assigned them to this PR.

If you need to run the roulette again (for example, if the assigned reviewer is unavailable), simply un-request a review from me, then request again. After about 45 seconds, I will update the message with a new random reviewer.

Reviewer
@jthorton

Review guidelines

Some notes from @j-wags that we can refine as we do this more

Timeline and responsibilities

The PR reviewer should perform a review within 48 hours of being assigned.

The review may take up to three hours.

If few or insignificant changes are needed, the reviewer should accept the PR.

If substantial fixes should be made (see categories below), the reviewer should request changes, indicating which comments are high-priority (blocking), as opposed to questions or comments.

The PR author will then correct any blocking issues and re-request review. The re-review should focus just on the changes that were requested and any new code that was added in response.

The PR author is the only person who should modify the code in the branch, and it is customary to let them press the "merge" button once the PR is approved. Either person can "resolve" non-blocking comment threads, but only the reviewer should "resolve" comment threads that prompt a re-review.

The PR author

The person requesting the review should ensure that the purpose of the review is clearly explained, by linking relevant GitHub Issues in the PR body text (ex "Closes #12"), using clear variable names, commenting non-obvious code, and identifying areas of the diff that are unusual (ex. "the molecule_to_string function was cut and pasted to a different file and didn't change, so don't review it"). If the PR diff is larger than 300 lines, they should identify the area to prioritize for the review, to let the reviewer add as much value as possible if they are time-constrained.

The PR assignee

The newly-assigned reviewer should acknowledge that they received the request, and confirm that they can perform the review within 48 hours. Generally, a good review strategy is to:

  • Ensure that you understand the overall purpose of the codebase (consider looking at the codebase's tests or examples to understand how the functions are used)
  • Read through the body text of the PR (click through to any relevant tagged issues to understand the discussion around the changes)
  • Ensure that you understand the purpose of the changes in the PR
  • Read over the entire PR diff (minus data files) without commenting
  • Then, begin adding comments or thoughts, in the order of
    • Examples
    • Tests
    • Core code

Types of comments

I've found that my comments fall into a few rough categories. This is a list of them in descending order of value:

[If any of these are present, you should request changes]

  • Conceptual -- "I don't understand what these changes aim to do"
  • Scientific -- "This won't work because nitrogen can have four bonds"
  • Algorithmic -- "This may crash because the input list is empty"
  • Testing -- "This functionality was added, but never tested". If the codebase is very young, this may be waived, but after a few months, test coverage should strictly increase with each PR.

[These may be blocking at the reviewer's discretion]

  • API -- "The name of this function is confusing, and I'd expect it to do something else"
  • Documentation -- "Add docstrings to these functions/Improve the ones that are there"

[Simple fixes that generally aren't blocking]

  • Grammar -- "It's its, not its'"
  • Readability -- "This triple list comprehension is unnecessary and would be really difficult to debug"

A few other tips for reviewers:

  • In larger-scale software development, the code review begins almost before any code is written. These conversations cover architecture, API design and specifications, and provide a blueprint for the work to be done. It's hard to receive code reviews after you've written an entire new module saying "actually, these two classes should be three classes", because it's requesting a large-scale change that would add days if not weeks to the development time. We are here to experiment with cool science, so be sure to give the code author an "out" if you're asking for large, not-completely-essential changes, by recommending that the refactor become an Issue that can be discussed over time and addressed as the code becomes more mature.
  • Give concrete suggestions whenever possible. It's better to say "Consider renaming this to molecule_to_string" than "this function name is confusing".
  • If you give open-ended feedback, be active in the discussion thread as the author asks for clarification or proposes solutions.
  • If there are three or fewer instances of the same concern, it's OK to point out each with a separate comment. However, if there are lots of repeats, just say "this issue appears throughout the proposed changes".
  • Say some nice things. I've learned a lot from doing code reviews, and it's cool when people point out a trick from my code that they'll use later, or something that they thought was particularly well-designed. As a dispersed team, we don't have the lab banter that usually clears up interpersonal tension, so be positive whenever you have the chance.
  • Every codebase has a different history and level of maturity, and each PR should ensure that the changes improve its overall quality a little bit. So adjust the level of the review based on what stage you see the software in.
  • Aim for around 10 to 15 comments. More than that is overwhelming.
  • Watch out for toolkit- or file-dependent behavior in tests. Just because the example molecule puts all the heavy atoms before the hydrogens doesn't mean that will always be the case.
  • You are certainly not limited by the above. If you'd like to learn more, consider reading Google's Code Review Guide and feel free to suggest changes to this message.
  • Keeping a checklist while you review might be useful!
Checklist example

- [X] This task has been finished
- [ ] This item is still pending

What am I?

I am the Open Force Field Dangerbot. You can find my code and installation instructions
at https://github.com/openforcefield/dangerbot.

This reviewer was selected out of a list of Open Force Field volunteers: ["j-wags", "jaimergp", "simonboothroyd", "trevorgokey", "vtlim", "dfhahn", "jthorton", "chayast", "dgasmith", "maxentile", "jchodera"]

@@ -33,18 +47,18 @@ Behavior changed
""""""""""""""""
- `PR #469 <https://github.com/openforcefield/openforcefield/pull/469>`_:
When running :py:meth:`Topology.to_openmm <openforcefield.topology.Topology.to_openmm>`, unique atom names
are generated (overriding any existing atom names) if the provided atom names are not unique. This
are generated if the provided atom names are not unique (overriding any existing atom names). This
Member Author

Changes to releasehistory below here are small grammar/formatting/syntax fixes and are not related to this PR.

# Prepare sqm.in file as if we were going to run charge calc
# TODO: Add error handling if antechamber chokes
subprocess.check_output([
"antechamber", "-i", "molecule.sdf", "-fi", "sdf", "-o",
Collaborator

I haven't used subprocess.check_output before, so I'm not sure if you can get the same effect using subprocess.run, but run is what's recommended since Python 3.5: https://docs.python.org/3/library/subprocess.html.

Member Author

@dgasmith recently recommended using check_output, but that was in the context of replacing several os.system calls.

The big difference between run and check_output [1] according to the docs is that check_output operates like run, except the keyword argument check=True is always set. This is good for our use case, because if some external dependency throws an error on the command line, we should raise an exception for the user [2]. I'm happy to standardize on either solution, but right now it will save us a few characters to use check_output everywhere.

[1] https://docs.python.org/3/library/subprocess.html#subprocess.check_output
[2] Except I've heard tales that tleap is shady about this -- apparently it can return a non-zero exit code but still finish successfully
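The equivalence described above is easy to demonstrate. A minimal sketch (using harmless shell commands, not the toolkit's actual antechamber invocation): both spellings raise CalledProcessError on a non-zero exit code, and check_output additionally captures stdout by default.

```python
# Sketch: subprocess.check_output behaves like subprocess.run with
# check=True and stdout captured. Not the toolkit's actual calls --
# "echo" and "false" stand in for antechamber/sqm here.
import subprocess

out1 = subprocess.check_output(["echo", "hello"])
out2 = subprocess.run(["echo", "hello"], check=True,
                      stdout=subprocess.PIPE).stdout
assert out1 == out2 == b"hello\n"

# A failing command raises CalledProcessError either way.
failed = False
try:
    subprocess.run(["false"], check=True)
except subprocess.CalledProcessError as e:
    failed = e.returncode == 1
assert failed
```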

raise (IOError("Antechamber not found, cannot run "
"AmberToolsToolkitWrapper.assign_fractional_bond_orders()"))

if len(molecule._conformers) == 0:
Collaborator

I understand this point, but this could be a pain for users. Isn't it better to assume we need to generate a conformer if one is not supplied, and just do this for them?

Member Author

Good call, and actually this parallels an API change that we'll make with compute_partial_charges. I'll make this change.

Member Author

Addressed in 8bd7fec

"molecule.generate_conformers() before calling molecule.assign_fractional_bond_orders"
)
if len(molecule._conformers) > 1:
logger.warning(f"Warning: In AmberToolsToolkitWrapper.assign_fractional_bond_orders: "
Collaborator

How slow would it be to generate the bond order for all of the supplied conformers and average over them in some way? Could this even be done in parallel?

Member Author

I think conformer averaging would be a nice thing to do. But, like ELF10, it's not clear that there's a single common-sense way to do it. So, I'd like to leave conformer averaging algorithms as something for the community to experiment with. Once someone wants to try it, they can invent a new keyword for bond_order_model, and implement support for it in a ToolkitWrapper.
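For anyone who wants to experiment, one naive averaging scheme might look like the following. This is a purely hypothetical sketch, not a toolkit feature; the function name and data layout are invented, and (as noted above) the right averaging scheme is an open scientific question.

```python
# Hypothetical sketch: compute WBOs per conformer, then take the
# arithmetic mean per bond. NOT a toolkit feature.

def average_bond_orders(per_conformer_wbos):
    """per_conformer_wbos: list of dicts mapping (i, j) atom-index
    pairs to Wiberg bond orders, one dict per conformer."""
    n = len(per_conformer_wbos)
    keys = per_conformer_wbos[0].keys()
    return {k: sum(c[k] for c in per_conformer_wbos) / n for k in keys}

conf_a = {(0, 1): 1.40, (1, 2): 1.10}
conf_b = {(0, 1): 1.44, (1, 2): 1.06}
avg = average_bond_orders([conf_a, conf_b])
assert abs(avg[(0, 1)] - 1.42) < 1e-9
assert abs(avg[(1, 2)] - 1.08) < 1e-9
```

Running each conformer's sqm calculation is independent, so the per-conformer step could indeed be parallelized, e.g. with concurrent.futures.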

Contributor

Definitely seems like a thing to do separately/unresolved science.

# Note that sqm calculates WBOs for ALL PAIRS of atoms, not just those that have
# bonds defined in the original molecule. So here we iterate over the bonds in
# the original molecule and only nab the WBOs for those.
for bond in molecule.bonds:
Collaborator

Is it possible this could go wrong and we pull out a bond order for the wrong pair of atoms due to file reordering somewhere in the workflow? Should we compare atom symbols as well to make sure the bond is correct?

Member Author

Is it possible this could go wrong and we pull out a bond order for the wrong pair of atoms due to file reordering somewhere in the workflow?

Yes.

Right now, AmberToolsToolkitWrapper.assign_fractional_bond_orders and compute_partial_charges assume that the atom indexing is not modified by antechamber. I don't know if this is guaranteed everywhere, but nobody has reported a bug so far. The toolkit makes several assumptions like this, and beyond our "throw a lot of molecules through it, occasionally reversing the atom ordering, and see if it's alright" tests, I doubt even the AMBER manual says anything about this.

So, this is something that occasionally keeps me up at night, but other than sifting through a bunch of fortran, I'm not sure that we can really prove this assumption.

Long story short: added in 43377fe
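The element-symbol sanity check discussed above can be sketched like this. All names here are hypothetical illustrations, not the toolkit's actual internals: the idea is just that when pulling a WBO for a bond out of sqm's all-pairs output, you verify the element symbols at those atom indices still match the source molecule, which would catch a silent reordering.

```python
# Hypothetical sketch of a defensive WBO lookup with an element-symbol
# check against silent atom reordering. Invented names throughout.

def lookup_wbo(bond, sqm_orders, source_symbols, sqm_symbols):
    """bond: (i, j) atom-index pair; sqm_orders: dict of sorted
    (i, j) pairs -> WBO; *_symbols: element symbols by atom index."""
    i, j = sorted(bond)
    if (source_symbols[i], source_symbols[j]) != (sqm_symbols[i], sqm_symbols[j]):
        raise ValueError(f"Atom ordering mismatch at bond {bond}")
    return sqm_orders[(i, j)]

symbols = ["C", "C", "O", "H"]
orders = {(0, 1): 1.41, (1, 2): 1.05, (0, 3): 0.94}
assert lookup_wbo((1, 0), orders, symbols, symbols) == 1.41
```

Symbol agreement can't prove the indexing is identical (two atoms of the same element could still be swapped), but it would catch many gross reordering bugs cheaply.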

@jthorton
Collaborator

jthorton commented Feb 12, 2020

Overall looks really good, with great test coverage! One thing to check is subprocess: I think you could get the same result using run with check=True.

@j-wags
Member Author

j-wags commented Feb 13, 2020

Thanks for the quick review, @jthorton. I've addressed your comments and will merge when tests pass

Collaborator

@jthorton jthorton left a comment

Looks great, excited to see the bond order interpolation now!

@j-wags j-wags merged commit 4eda341 into master Feb 13, 2020
@j-wags j-wags deleted the ambertools-wbo branch February 13, 2020 22:31
@j-wags j-wags added this to the 0.7.0 milestone Feb 14, 2020
Successfully merging this pull request may close these issues.

Implement and test compute_wiberg_bond_orders in AmberToolsToolkitWrapper