Fix runtime issues in toolkit showcase #1391

mattwthompson · 2022-08-31T16:06:00Z

No description provided.

review-notebook-app · 2022-08-31T16:06:05Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

codecov · 2022-08-31T16:18:38Z

Codecov Report

Merging #1391 (904749a) into main (7a6c7b5) will increase coverage by 0.22%.
The diff coverage is n/a.


mattwthompson · 2022-08-31T17:29:55Z

I suspect the missing trajectory file could be due to the simulation not reaching 100 steps within 1 minute of wall time.
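To make that hypothesis concrete, here is a back-of-the-envelope sketch. The function name `frames_written` and the step rates are hypothetical. It assumes a reporter that writes a frame only when the step count reaches a multiple of its interval, so a run cut off before the first interval produces zero frames. That would be consistent with an empty or missing trajectory.

```python
def frames_written(steps_per_second: float, walltime_seconds: float, report_interval: int) -> int:
    """Estimate how many frames a fixed-interval trajectory reporter emits.

    Hypothetical model: the integrator advances at a constant rate, the run is
    cut off when the wall-clock budget is spent, and the reporter writes one
    frame each time the step count reaches a multiple of ``report_interval``.
    """
    steps_completed = int(steps_per_second * walltime_seconds)
    return steps_completed // report_interval


# Illustrative rates only (not measured): 1.5 steps/s for 60 s is 90 steps,
# which never reaches the first 100-step report, so no frame is written.
print(frames_written(1.5, 60, 100))  # 0
print(frames_written(5.0, 60, 100))  # 3
```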

mattwthompson · 2022-08-31T19:20:50Z

Timings on my machine (M1 Pro):

========================================================================= slowest durations ==========================================================================
155.27s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
150.74s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
63.26s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
60.98s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
42.83s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
16.25s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
11.70s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 5
8.44s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
7.24s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
3.22s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
2.84s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 8
1.84s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
0.93s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 14
0.80s setup    examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
0.28s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 4
0.22s teardown examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
0.05s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 3
0.05s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 2
0.01s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 1
0.01s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 9

mattwthompson · 2022-08-31T20:12:20Z

Here are timings from a recent run (in CI):

============================== slowest durations ===============================
1056.66s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
489.38s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
426.00s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
165.37s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
72.06s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
62.54s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
40.20s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
30.86s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
22.18s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 5
9.78s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 8
8.46s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
2.24s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 14
1.99s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
1.72s setup    examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
1.51s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 3
0.68s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 4
0.24s teardown examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 17
0.13s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 2
0.02s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 1
0.02s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 9

Cell 15 is the one that does the energy minimization, Cell 12 calls ForceField.create_interchange, and Cell 13 calls Interchange.to_openmm.
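When comparing runs like the two above, it can help to parse the `--durations` report instead of eyeballing it. The following is a minimal stdlib sketch, assuming the line layout shown in these reports (`<seconds>s <phase> <path>::<item>`). The regex and helper names are illustrative, not part of pytest.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches pytest --durations lines such as
# "155.27s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12"
DURATION_LINE = re.compile(
    r"^\s*(?P<seconds>\d+(?:\.\d+)?)s\s+(?P<phase>setup|call|teardown)\s+"
    r"(?P<path>\S+?)::(?P<item>.+?)\s*$"
)


def parse_durations(report: str) -> list[tuple[float, str, str, str]]:
    """Return (seconds, phase, path, item) for every duration line in a report."""
    rows = []
    for line in report.splitlines():
        match = DURATION_LINE.match(line)
        if match:  # banner lines like "=== slowest durations ===" do not match
            rows.append(
                (float(match["seconds"]), match["phase"], match["path"], match["item"])
            )
    return rows


def total_per_notebook(rows: list[tuple[float, str, str, str]]) -> dict[str, float]:
    """Sum setup, call, and teardown time for each notebook path."""
    totals = defaultdict(float)
    for seconds, _phase, path, _item in rows:
        totals[path] += seconds
    return dict(totals)


report = """\
============ slowest durations ============
155.27s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
150.74s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
0.80s setup    examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 0
"""
rows = parse_durations(report)
print(max(rows)[3])  # Cell 12 (tuples compare by seconds first)
print(round(total_per_notebook(rows)["examples/toolkit_showcase/toolkit_showcase.ipynb"], 2))  # 306.81
```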

mattwthompson · 2022-08-31T20:21:21Z

.github/workflows/examples.yml

-      NB_ARGS: -v --nbval-lax --ignore=examples/deprecated
+      NB_ARGS: -v --nbval-lax --ignore=examples/deprecated --durations=20


$ pytest --help | grep durations
  --durations=N         show N slowest setup/test durations (N=0 for all).
  --durations-min=N     Minimal duration in seconds for inclusion in slowest

mattwthompson · 2022-08-31T20:44:39Z

We were able to isolate two possible issues:

1. The energy minimization takes a very long time with the default OpenMM arguments; loosening the tolerance to 20 kJ/mol or higher drops the energy minimization time to a few seconds.
2. The water padding creates a pretty large box because the protein is not very spherical.
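A rough way to see why a non-spherical protein inflates the box is to compare cubic boxes built around two hypothetical solutes. This sketch assumes the box edge is the solute's longest dimension plus the padding on each side. It estimates the water count from a number density of about 33.4 molecules/nm^3 and ignores the solute's own volume. The helper names, the extents, and the box convention are all illustrative assumptions, not what the notebook actually does.

```python
from typing import Sequence

WATERS_PER_NM3 = 33.4  # approximate number density of liquid water near 1 g/cm^3


def cubic_box_edge(extents_nm: Sequence[float], padding_nm: float) -> float:
    """Cubic box edge leaving `padding_nm` on both sides of the longest solute axis.

    Assumed convention for illustration; real packing routines may differ.
    """
    return max(extents_nm) + 2 * padding_nm


def approx_waters(extents_nm: Sequence[float], padding_nm: float) -> int:
    """Rough water count: box volume times water density, ignoring the solute volume."""
    return round(cubic_box_edge(extents_nm, padding_nm) ** 3 * WATERS_PER_NM3)


compact = (4.0, 4.0, 4.0)      # hypothetical roughly spherical protein, in nm
elongated = (8.0, 2.83, 2.83)  # hypothetical protein with a similar bounding volume, stretched out

for padding in (1.0, 0.5):
    for name, extents in (("compact", compact), ("elongated", elongated)):
        edge = cubic_box_edge(extents, padding)
        print(f"{name:9s} padding={padding} nm  edge={edge} nm  waters~{approx_waters(extents, padding)}")
```

With these made-up extents, the elongated solute needs several times as much water as the compact one at the same padding, because a cubic box scales with the solute's longest axis.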

Co-authored-by: Jeff Wagner <[email protected]>

mattwthompson · 2022-08-31T21:23:58Z

.github/workflows/examples.yml

-          python -m pytest $NB_ARGS examples
+          python -m pytest $PYTEST_ARGS $NB_ARGS examples


Adding $PYTEST_ARGS mostly serves to count the examples toward code coverage, if that's something we want to do.

mattwthompson · 2022-08-31T22:02:48Z

Timings are much better now:

============================= slowest 20 durations =============================
272.31s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
161.80s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
122.00s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
107.73s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
101.41s call     examples/virtual_sites/vsite_showcase.ipynb::Cell 4
97.95s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 14
86.21s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 2
80.95s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 15
78.26s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 2
60.61s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
54.90s call     examples/conformer_energies/conformer_energies.ipynb::Cell 3
38.19s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
36.78s call     examples/forcefield_modification/forcefield_modification.ipynb::Cell 3
20.87s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
19.28s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
18.72s call     examples/external/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 1
15.53s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 4
15.41s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 5
13.70s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 3
8.46s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 11

Yoshanuikabundi · 2022-09-01T05:40:11Z

OK, these changes are great. I initially didn't love the change to the water box padding, but I'm convinced it's essential to get the runtime down. I've added a little warning so no one misses it when adapting their own workflows. I'm going to look for alternative protein-ligand systems in the benchmarks that are smaller or more spherical to see if we can bring the time down further.

Yoshanuikabundi · 2022-09-01T06:52:24Z

Okay, galectin from the protein-ligand benchmark seems promising: it has some very funky ligands and comes to about 31k atoms with a 1 nm buffer and 19k with a 0.5 nm buffer. It's the smallest-radius protein in either the protein-ligand benchmark set or the Merck FEP benchmark set. However, it's apparently being removed by PR 52 in that repo. I've gotta run somewhere now, but I'll try to get galectin into this branch either later tonight or tomorrow morning.

Yoshanuikabundi · 2022-09-01T09:59:16Z

Wait, no, lol: if you calculate the radius instead of a completely random and meaningless quantity, there's an even smaller protein in the Merck dataset. I should get more sleep.
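The comment doesn't say which radius was used. The sketch below shows two common choices computed from raw coordinates: the largest atom distance from the centroid and the unweighted radius of gyration. The toy "proteins" and helper names are hypothetical. Real structures would be loaded from PDB files, and a mass-weighted radius of gyration would weight each term by atomic mass.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(coords: Sequence[Point]) -> Point:
    """Unweighted geometric center of a set of 3D coordinates."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def max_radius(coords: Sequence[Point]) -> float:
    """Largest atom distance from the centroid (an upper bound, not the minimal enclosing sphere)."""
    center = centroid(coords)
    return max(math.dist(center, c) for c in coords)


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """Unweighted radius of gyration about the centroid."""
    center = centroid(coords)
    return math.sqrt(sum(math.dist(center, c) ** 2 for c in coords) / len(coords))


# Hypothetical toy "proteins" (coordinates in nm), not real benchmark structures.
candidates = {
    "compact": [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)],
    "elongated": [(3.0, 0.0, 0.0), (-3.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)],
}
smallest = min(candidates, key=lambda name: max_radius(candidates[name]))
print(smallest)  # compact
```

The maximum radius is the more relevant of the two for box size, since a padded box must enclose the outermost atoms, not the average ones.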

Yoshanuikabundi

The old protein with a 0.5 nm buffer has 43 636 atoms.

The new protein with a 1.0 nm buffer has 23 978 atoms. If we're still hungry for speed, we can try making the buffer smaller again; that reduces the count to 13 985 atoms.

The ligand is now extra funky: it has fluorines, a sulfone, an indane, and a nitrile (two of those names I didn't have to look up!).

New timings (ubuntu-latest, Python 3.8, RDKit=false, OpenEye=true):

============================= slowest 20 durations =============================
129.19s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
115.29s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 2
107.08s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 14
103.11s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 2
81.60s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 15
67.76s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
65.03s call     examples/forcefield_modification/forcefield_modification.ipynb::Cell 3
62.39s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
49.18s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
22.96s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
20.35s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
13.85s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 3
12.63s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
9.47s call     examples/virtual_sites/vsite_showcase.ipynb::Cell 4
8.47s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 11
6.74s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
6.68s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 9
5.02s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 4
4.11s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 3
3.45s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 16
=========== 109 passed, 3 skipped, 11 warnings in 968.21s (0:16:08) ============

@mattwthompson If you're happy with the state of this, it looks good to me. Thanks also for looking over the code with an eye to readability; using MDTraj's selection language and renaming mdt_traj to trajectory are both great improvements.

Yoshanuikabundi · 2022-09-01T11:26:25Z

examples/environment.yaml

@@ -5,40 +5,49 @@ channels:
 dependencies:


I updated the examples environment to more closely match the environments used in CI. @j-wags @mattwthompson At some point it would be good to talk about what we want to do with this environment: it's the user-facing examples environment, and at some point it stopped being the environment used in CI, which was news to me today.

We only made that switch during this release; weird things were happening around updating an existing environment and which environment the shell had access to later on.

IIUC, users aren't expected to install from an environment themselves, since we distribute an openff-toolkit-examples package ourselves.

mattwthompson · 2022-09-01T14:57:59Z

Thanks! I'll double-check the timings before merging but as long as things outside of our control aren't too slow, I think this is good. Making Interchange creation and .to_openmm run faster is an immediate priority so I'm not worried about those cells at the moment.

Just noting a few surprising coverage changes (each of which is probably the result of other notebooks' calls being included):

Now included (and ideally included in unit tests):

openff-toolkit/openff/toolkit/topology/topology.py

Lines 1627 to 1630 in 7a6c7b5

    
    if off_topology.box_vectors is not None:
        from openff.units.openmm import to_openmm

        omm_topology.setPeriodicBoxVectors(to_openmm(off_topology.box_vectors))

openff-toolkit/openff/toolkit/utils/rdkit_wrapper.py

Line 2027 in 7a6c7b5

rdmol.SetDoubleProp(name, value)

Now excluded (surprisingly):

openff-toolkit/openff/toolkit/topology/molecule.py

Line 981 in 7a6c7b5

raise ValueError(

openff-toolkit/openff/toolkit/topology/molecule.py

Lines 994 to 999 in 7a6c7b5

    
    msg = (
        f"Cannot construct openff.toolkit.topology.Molecule from {other}\n"
    )
    for value_error in value_errors:
        msg += str(value_error)
    raise ValueError(msg)

openff-toolkit/openff/toolkit/utils/openeye_wrapper.py

Line 2479 in 7a6c7b5

bond_order_model = "am1-wiberg"

mattwthompson · 2022-09-01T15:23:00Z

I'm pretty happy with this (I forget why I'm re-checking timings; in my defense, this is the first comment I've written today with coffee brewed).

============================= slowest 20 durations =============================
137.57s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
127.17s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 2
112.79s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 2
100.58s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 14
91.74s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
85.57s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 15
61.12s call     examples/forcefield_modification/forcefield_modification.ipynb::Cell 3
60.89s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
55.69s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
26.21s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
23.62s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
17.35s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 3
14.47s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
9.92s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 11
9.26s call     examples/virtual_sites/vsite_showcase.ipynb::Cell 4
8.06s call     examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
7.61s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 9
5.19s call     examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 4
4.13s call     examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 3
3.96s call     examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 16
=========== 109 passed, 3 skipped, 11 warnings in 1043.40s (0:17:23) ===========

j-wags · 2022-09-01T16:57:50Z

Thanks for the quick work and decisiveness on this PR; it was awesome to wake up to this! I'm just seeing that the new notebook is 40 MB and won't render on GitHub, so I'm going to prune it a bit and push directly to main so I can link this in the release announcement :-)

mattwthompson · 2022-09-01T17:00:18Z

If in the future you'd prefer that notebooks not include their output, here's a nice tool to automatically strip it all out: https://github.com/kynan/nbstripout
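For a sense of what output stripping involves, here is a minimal stdlib-only sketch that works directly on the nbformat 4 JSON. It is not nbstripout's implementation, which also handles metadata cleanup and git filter integration. The helper names are hypothetical. Dropping the `widgets` key from notebook metadata assumes that saved widget state lives there, which is where Jupyter normally writes it.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs cleared."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Saved widget state (e.g. from interactive viewers) is kept in notebook metadata, if present.
    stripped.get("metadata", {}).pop("widgets", None)
    return stripped


def strip_file(path: str) -> None:
    """Overwrite a .ipynb file in place with its outputs removed."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(strip_outputs(notebook), handle, indent=1, ensure_ascii=False)
        handle.write("\n")


# Toy notebook with one bulky output, standing in for large embedded output.
toy = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["view"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": ["x" * 10_000]},
                    "metadata": {},
                }
            ],
        }
    ],
}
before = len(json.dumps(toy))
after = len(json.dumps(strip_outputs(toy)))
print(before > after)  # True
```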

j-wags · 2022-09-01T17:04:05Z

In most cases I like it, since the notebooks have meaningful output. But this one is just all nglview output that won't render online anyway, so the cost/benefit ratio of saving output is pretty high.

mattwthompson added 2 commits August 31, 2022 10:51

Use MDTraj API for atom selection

5e9c1a9

Use DCD reporter from OpenMM, debug

ad857df

Fix typo, debug

af93fe7

Do not close reporter

5d349b6

Debug timings

7ccbb19

Remove some debug code

65120fc

mattwthompson commented Aug 31, 2022

Run notebook again

b094de4

mattwthompson mentioned this pull request Aug 31, 2022

Toolkit showcase example broken/unworkably slow #1392

Closed

mattwthompson and others added 2 commits August 31, 2022 15:58

Decrease water padding, loosen energy minimization tolerance

5b6d17e

Co-authored-by: Jeff Wagner <[email protected]>

Merge remote-tracking branch 'upstream/main' into fix-toolkit-showcase

6bd7dbb

mattwthompson commented Aug 31, 2022

mattwthompson requested a review from Yoshanuikabundi August 31, 2022 22:02

mattwthompson changed the title from "Debug toolkit showcase failures" to "Fix runtime issues in toolkit showcase" Sep 1, 2022

Yoshanuikabundi added 5 commits September 1, 2022 12:46

Add DCD files to gitignore

437b0f2

Try a much higher EM tolerance with 1nm padding

76e949a

Revert "Try a much higher EM tolerance with 1nm padding"

798fbcd

This reverts commit 76e949a.

Update examples environment

04d3a8d

Add warning about box size

4d947e6

Switch to smaller protein

6acdbbb

Yoshanuikabundi approved these changes Sep 1, 2022

Remove brackets

904749a

mattwthompson merged commit 9b61d77 into main Sep 1, 2022

mattwthompson deleted the fix-toolkit-showcase branch September 1, 2022 15:23

mattwthompson mentioned this pull request Sep 1, 2022

Add nbqa-flake8 to safeguard against unused imports #1396

Merged

5 tasks
