-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix runtime issues in toolkit showcase #1391
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
I suspect the trajectory file not existing could be due to the simulation not reaching 100 steps in 1 minute of walltime |
Timings on my machine (M1 Pro):
|
Here are timings from a recent run (in CI):
Cell 15 is the one that does the energy minimization, Cell 12 calls |
NB_ARGS: -v --nbval-lax --ignore=examples/deprecated | ||
NB_ARGS: -v --nbval-lax --ignore=examples/deprecated --durations=20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$ pytest --help | grep durations
--durations=N show N slowest setup/test durations (N=0 for all).
--durations-min=N Minimal duration in seconds for inclusion in slowest
We were able to isolate two possible issues
|
python -m pytest $NB_ARGS examples | ||
python -m pytest $PYTEST_ARGS $NB_ARGS examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding $PYTEST_ARGS
mostly serves the role of counting examples for code coverage. If that's something we want to do.
Timings are much better now:
|
OK, these changes are great. I initially didn't love the change to the water box padding, but I'm convinced its essential to get the runtime down. I've added a little warning so no one misses it when adapting their own work flows. I'm going to have a look for alternative protein-ligand systems in benchmarks that are smaller/more spherical to see if we can bring the time down further. |
Okay Galectin from the protein ligand benchmark seems promising; it has some very funky ligands and comes up to about 31k atoms with a 1nm buffer, and 19k with a 0.5nm buffer. it's the smallest radius protein in either the protein-ligand benchmark set or the Merck FEP benchmark set. However, it's apparently being removed by PR 52 in that repo. I've gotta run somewhere now but I'll try and get galectin into this branch either later tonight or tomorrow morning. |
Wait no lol if you calculate the radius instead of a completely random and meaningless quantity there's an even smaller protein in the merck dataset. I should get more sleep. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old protein with 0.5nm buffer has 43 636 atoms.
New protein with 1.0nm buffer has 23 978 atoms. If we're still hungry for speed we can try making the buffer smaller again, it reduces the count down to 13 985 atoms.
Ligand is now extra funky. It has fluorines, a sulfone, an indane and a nitrile (two of those names I didn't have to look up!)
New timings (ubuntu-latest, Python 3.8, RDKit=false, OpenEye=true):
============================= slowest 20 durations =============================
129.19s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 12
115.29s call examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 2
107.08s call examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 14
103.11s call examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 2
81.60s call examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 15
67.76s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 15
65.03s call examples/forcefield_modification/forcefield_modification.ipynb::Cell 3
62.39s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 16
49.18s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 13
22.96s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 6
20.35s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 11
13.85s call examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 3
12.63s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 10
9.47s call examples/virtual_sites/vsite_showcase.ipynb::Cell 4
8.47s call examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 11
6.74s call examples/toolkit_showcase/toolkit_showcase.ipynb::Cell 7
6.68s call examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 9
5.02s call examples/using_smirnoff_with_amber_protein_forcefield/toluene_in_T4_lysozyme.ipynb::Cell 4
4.11s call examples/using_smirnoff_with_amber_protein_forcefield/BRD4_inhibitor_benchmark.ipynb::Cell 3
3.45s call examples/external/swap_amber_parameters/swap_existing_ligand_parameters.ipynb::Cell 16
=========== 109 passed, 3 skipped, 11 warnings in 968.21s (0:16:08) ============
@mattwthompson If you're happy with the state of this, it looks good to me. Thanks also for looking over the code with an eye to readability - using MDTraj's selection language, renaming mdt_traj
to trajectory
are both great improvements.
@@ -5,40 +5,49 @@ channels: | |||
dependencies: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the examples environment to more closely match the environments used in CI. @j-wags @mattwthompson At some point it would be good to talk about what we want to do with this environment - its the user-facing examples environment and at some point it stopped being the environment being used in CI, which was news to me today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only made that switch during this release, weird things were happening around updating an existing environment and which environment the shell had access to later on.
IIUC users aren't expected to install from an environment themselves, since we distribute an openff-toolkit-examples
package ourselves.
Thanks! I'll double-check the timings before merging but as long as things outside of our control aren't too slow, I think this is good. Making Just noting a few surprising coverage changes (each of which are probably the result of other notebooks' calls being included): Now included (and ideally included in unit tests):
Now excluded (surprisingly):
|
I'm pretty happy with this (I forget why I'm re-checking timings - in my defense this is the first comment I've written today with coffee brewed)
|
If in the future you didn't prefer notebooks also include their output, here's a nice tool to automatically strip everything out: https://github.com/kynan/nbstripout |
In most cases I like it, since the notebooks have meaningful output. But this one is just all nglview that won't render online anyway, so the cost/benefit of saving output is pretty high. |
No description provided.