
Fix runtime issues in toolkit showcase #1391

Merged: 16 commits, merged Sep 1, 2022


mattwthompson
Member

No description provided.

@review-notebook-app
Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov

codecov bot commented Aug 31, 2022

Codecov Report

Merging #1391 (904749a) into main (7a6c7b5) will increase coverage by 0.22%.
The diff coverage is n/a.

@mattwthompson
Member Author

I suspect the missing trajectory file could be due to the simulation not reaching 100 steps within 1 minute of walltime.

@mattwthompson
Member Author

Timings on my machine (M1 Pro):

========================================================================= slowest durations ==========================================================================
155.27s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
150.74s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
63.26s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
60.98s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
42.83s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
16.25s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
11.70s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 5
8.44s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
7.24s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
3.22s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
2.84s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 8
1.84s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
0.93s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 14
0.80s setup    examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
0.28s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 4
0.22s teardown examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
0.05s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 3
0.05s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 2
0.01s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 1
0.01s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 9

@mattwthompson
Member Author

Here are timings from a recent run (in CI):

============================== slowest durations ===============================
1056.66s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
489.38s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
426.00s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
165.37s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
72.06s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
62.54s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
40.20s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
30.86s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
22.18s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 5
9.78s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 8
8.46s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
2.24s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 14
1.99s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
1.72s setup    examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
1.51s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 3
0.68s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 4
0.24s teardown examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
0.13s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 2
0.02s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 1
0.02s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 9

Cell 15 does the energy minimization, Cell 12 calls ForceField.create_interchange, and Cell 13 calls Interchange.to_openmm.

- NB_ARGS: -v --nbval-lax --ignore=examples/deprecated
+ NB_ARGS: -v --nbval-lax --ignore=examples/deprecated --durations=20
Member Author

$ pytest --help | grep durations
  --durations=N         show N slowest setup/test durations (N=0 for all).
  --durations-min=N     Minimal duration in seconds for inclusion in slowest
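For comparing runs like the ones in this thread, a small hypothetical helper (not part of pytest or nbval) can parse the lines printed by --durations, in the format shown above (seconds with an "s" suffix, the phase, then the test ID), and total the time per notebook. Note that the report only lists the slowest N entries, so the totals cover only what was printed.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# One "--durations" report line, e.g.
# "155.27s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12"
DURATION_LINE = re.compile(
    r"^\s*(?P<seconds>\d+(?:\.\d+)?)s\s+(?P<phase>setup|call|teardown)\s+(?P<path>\S+?)::(?P<item>.+?)\s*$"
)


def parse_durations(report: str) -> List[Tuple[float, str, str, str]]:
    """Return (seconds, phase, file, item) for every duration line; headers and blanks are skipped."""
    rows = []
    for line in report.splitlines():
        match = DURATION_LINE.match(line)
        if match:
            rows.append((float(match["seconds"]), match["phase"], match["path"], match["item"]))
    return rows


def total_by_file(report: str) -> Dict[str, float]:
    """Sum the listed setup, call and teardown times per notebook file."""
    totals: Dict[str, float] = defaultdict(float)
    for seconds, _phase, path, _item in parse_durations(report):
        totals[path] += seconds
    return dict(totals)
```

Pasting one of the timing blocks above into total_by_file makes it easy to see how much of a run a single notebook accounts for.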

@mattwthompson
Member Author

We were able to isolate two possible issues:

  • The energy minimization takes a very long time with the default OpenMM arguments; raising the tolerance to 20 kJ/mol or higher cuts the minimization time to a few seconds.
  • The water padding creates a fairly large box because the protein is not very spherical.
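A rough back-of-the-envelope sketch of why padding hurts with a non-spherical protein. It assumes a cubic box whose edge is the solute's largest extent plus the padding on each side (one common convention; the showcase notebook's own solvation code may size the box differently) and a water number density of roughly 33.3 molecules per nm³ near 298 K. The solute extents are made-up numbers, not measurements of the showcase protein.

```python
WATERS_PER_NM3 = 33.3  # approximate number density of liquid water near 298 K


def cubic_box_volume(extents_nm, padding_nm):
    """Volume (nm^3) of a cubic box sized by the solute's largest extent plus padding on both sides."""
    edge = max(extents_nm) + 2 * padding_nm
    return edge ** 3


def approx_water_atoms(volume_nm3, atoms_per_water=3):
    """Upper-bound estimate of solvent atoms: ignores the volume the solute itself displaces."""
    return round(volume_nm3 * WATERS_PER_NM3 * atoms_per_water)


# Hypothetical solutes: an elongated one and a compact one of similar bounding-box volume.
elongated = (8.0, 3.0, 3.0)  # 72 nm^3 bounding box
compact = (4.2, 4.2, 4.2)    # about 74 nm^3 bounding box

for name, extents in [("elongated", elongated), ("compact", compact)]:
    volume = cubic_box_volume(extents, padding_nm=1.0)
    print(name, round(volume, 1), approx_water_atoms(volume))
```

Under these assumptions the elongated solute needs a box roughly four times larger than the compact one, even though the two solutes occupy about the same volume, which is the effect described above.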

Comment on lines -142 to +139
- python -m pytest $NB_ARGS examples
+ python -m pytest $PYTEST_ARGS $NB_ARGS examples
Member Author

Adding $PYTEST_ARGS mostly serves to count the examples toward code coverage, if that's something we want to do.

@mattwthompson
Member Author

Timings are much better now:

============================= slowest 20 durations =============================
272.31s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
161.80s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
122.00s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
107.73s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
101.41s call     examples/virtual_sites/vsite_showcase.ipynb::Cell 4
97.95s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 14
86.21s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 2
80.95s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 15
78.26s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 2
60.61s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
54.90s call     examples/conformer_energies/conformer_energies.ipynb::Cell 3
38.19s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
36.78s call     examples/forcefield_modification/forcefield_modification.ipynb::Cell 3
20.87s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
19.28s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
18.72s call     examples/external/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 1
15.53s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 4
15.41s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 5
13.70s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 3
8.46s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 11

@mattwthompson mattwthompson changed the title Debug toolkit showcase failures Fix runtime issues in toolkit showcase Sep 1, 2022
@Yoshanuikabundi
Collaborator

OK, these changes are great. I initially didn't love the change to the water box padding, but I'm convinced it's essential to get the runtime down. I've added a little warning so no one misses it when adapting their own workflows. I'm going to look for alternative protein-ligand systems in the benchmarks that are smaller or more spherical, to see if we can bring the time down further.

@Yoshanuikabundi
Collaborator

Okay, Galectin from the protein-ligand benchmark seems promising: it has some very funky ligands and comes to about 31k atoms with a 1 nm buffer and 19k with a 0.5 nm buffer. It's the smallest-radius protein in either the protein-ligand benchmark set or the Merck FEP benchmark set. However, it's apparently being removed by PR 52 in that repo. I've got to run somewhere now, but I'll try to get Galectin into this branch either later tonight or tomorrow morning.

@Yoshanuikabundi
Collaborator

Wait, no, lol: if you calculate the radius instead of a completely random and meaningless quantity, there's an even smaller protein in the Merck dataset. I should get more sleep.
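The thread doesn't say which radius was used to rank the benchmark proteins, so here is a plain-Python sketch of two common choices computed from a list of (x, y, z) coordinates: the largest distance of any atom from the centroid, and the unweighted radius of gyration. In practice the coordinates would come from a structure file; the helpers here are illustrative only.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(coords: Sequence[Point]) -> Point:
    """Geometric centre (unweighted mean position) of the coordinates."""
    n = len(coords)
    return (
        sum(p[0] for p in coords) / n,
        sum(p[1] for p in coords) / n,
        sum(p[2] for p in coords) / n,
    )


def max_radius(coords: Sequence[Point]) -> float:
    """Largest distance of any point from the centroid (a rotation-independent size measure)."""
    c = centroid(coords)
    return max(math.dist(p, c) for p in coords)


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """Unweighted radius of gyration: root-mean-square distance from the centroid."""
    c = centroid(coords)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in coords) / len(coords))
```

The maximum radius is sensitive to a single protruding loop, while the radius of gyration averages over all atoms, so the two can rank elongated proteins differently.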

Collaborator

@Yoshanuikabundi Yoshanuikabundi left a comment

Old protein with 0.5 nm buffer has 43 636 atoms.

New protein with 1.0 nm buffer has 23 978 atoms. If we're still hungry for speed, we can try making the buffer smaller again, which reduces the count to 13 985 atoms.

Ligand is now extra funky. It has fluorines, a sulfone, an indane and a nitrile (two of those names I didn't have to look up!)

New timings (ubuntu-latest, Python 3.8, RDKit=false, OpenEye=true):

 ============================= slowest 20 durations =============================
129.19s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
115.29s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 2
107.08s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 14
103.11s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 2
81.60s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 15
67.76s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
65.03s call     examples/forcefield_modification/forcefield_modification.ipynb::Cell 3
62.39s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
49.18s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
22.96s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
20.35s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
13.85s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 3
12.63s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
9.47s call     examples/virtual_sites/vsite_showcase.ipynb::Cell 4
8.47s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 11
6.74s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
6.68s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 9
5.02s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 4
4.11s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 3
3.45s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 16
=========== 109 passed, 3 skipped, 11 warnings in 968.21s (0:16:08) ============

@mattwthompson If you're happy with the state of this, it looks good to me. Thanks also for looking over the code with an eye to readability; using MDTraj's selection language and renaming mdt_traj to trajectory are both great improvements.

@@ -5,40 +5,49 @@ channels:
dependencies:
Collaborator

I updated the examples environment to more closely match the environments used in CI. @j-wags @mattwthompson At some point it would be good to talk about what we want to do with this environment: it's the user-facing examples environment, and at some point it stopped being the environment used in CI, which was news to me today.

Member Author

We only made that switch during this release; weird things were happening when updating an existing environment, and with which environment the shell had access to later on.

IIUC, users aren't expected to install from an environment themselves, since we distribute an openff-toolkit-examples package.

@mattwthompson
Member Author

Thanks! I'll double-check the timings before merging, but as long as things outside of our control aren't too slow, I think this is good. Making Interchange creation and .to_openmm run faster is an immediate priority, so I'm not worried about those cells at the moment.

Just noting a few surprising coverage changes (each of which is probably the result of calls from other notebooks being included):

Now included (and ideally included in unit tests):

Now excluded (surprisingly):

@mattwthompson
Member Author

I'm pretty happy with this. (I forget why I'm re-checking timings; in my defense, this is the first comment I've written today with coffee brewed.)

============================= slowest 20 durations =============================
137.57s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
127.17s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 2
112.79s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 2
100.58s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 14
91.74s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
85.57s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 15
61.12s call     examples/forcefield_modification/forcefield_modification.ipynb::Cell 3
60.89s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
55.69s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
26.21s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
23.62s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
17.35s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 3
14.47s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
9.92s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 11
9.26s call     examples/virtual_sites/vsite_showcase.ipynb::Cell 4
8.06s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
7.61s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 9
5.19s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 4
4.13s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 3
3.96s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 16
=========== 109 passed, 3 skipped, 11 warnings in 1043.40s (0:17:23) ===========

@mattwthompson mattwthompson merged commit 9b61d77 into main Sep 1, 2022
@mattwthompson mattwthompson deleted the fix-toolkit-showcase branch September 1, 2022 15:23
@j-wags
Member

j-wags commented Sep 1, 2022

Thanks for the quick work and decisiveness on this PR; it was awesome to wake up to this! I'm just now seeing that the new notebook is 40 MB and won't render on GitHub, so I'm going to prune it a bit and push directly to main so I can link it in the release announcement :-)

[Screenshot taken 2022-09-01 at 9:50:14 AM]

@mattwthompson
Member Author

If in the future you'd prefer that notebooks not include their output, here's a nice tool that automatically strips it all out: https://github.com/kynan/nbstripout
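For a one-off cleanup without installing anything, the same idea can be sketched with the standard library, assuming the notebook uses the nbformat 4 JSON layout (a top-level "cells" list in which code cells carry "outputs" and "execution_count"). This is not how nbstripout is implemented, and it does not touch the extra metadata that nbstripout can also clean.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from every code cell (nbformat 4 layout), in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path: str) -> None:
    """Rewrite a notebook file on disk with its code-cell outputs removed."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    strip_outputs(notebook)
    with open(path, "w", encoding="utf-8") as handle:
        # indent=1 mirrors the formatting Jupyter itself uses when saving notebooks
        json.dump(notebook, handle, indent=1, ensure_ascii=False)
        handle.write("\n")
```

Markdown cells are left alone, so only the (potentially large) rendered outputs, such as embedded nglview widget state, are dropped.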

@j-wags
Member

j-wags commented Sep 1, 2022

In most cases I like keeping it, since the notebooks have meaningful output. But this one is all nglview output that won't render online anyway, so the cost of saving the output outweighs the benefit.

3 participants