Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Select dihedrals faster #2706

Merged
merged 4 commits into from
Jun 3, 2020
Merged

Conversation

lilyminium
Copy link
Member

@lilyminium lilyminium commented Jun 1, 2020

Fixes (part of) #2671

Changes made in this Pull Request:

  • made phi/psi_selection longer, but faster with manual boolean array matching of atom names
  • made Ramachandran really long and ugly, but really faster with manual boolean array matching to atom names
  • no longer hardcodes phi/psi selection names in Ramachandran

The original Ramachandran.__init__ with the GRO/XTC files was 180 s on my laptop. When I changed only the phi_selection code (not psi_selection), it dropped to 122 s. When I changed the Ramachandran code, it dropped to 0.25 s. Most of that is selecting the protein.

Screenshot 2020-06-01 at 9 23 10 PM

How much testing does this need? lots

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

To do

  • update omega_selection and chi1_selection too
  • add c_name, n_name etc. as kwargs to those, it's annoying that the names are hard-coded in (edit: will remove pending disapproval from another dev)
  • add the group selection methods to ResidueGroup
  • update Ramachandran.__init__ to use the group selection methods

Edit: Ramachandran is no longer in the slowest 50 tests.

========================== slowest 50 test durations ===========================

39.76s call     MDAnalysisTests/analysis/test_density.py::Test_density_from_Universe::test_notwithin

31.18s call     MDAnalysisTests/analysis/test_density.py::TestNotWithin::test_within

30.83s call     MDAnalysisTests/analysis/test_density.py::TestNotWithin::test_not_within

29.70s call     MDAnalysisTests/analysis/test_density.py::TestNotWithin::test_has_DeprecationWarning

29.58s call     MDAnalysisTests/analysis/test_hbonds.py::TestHydrogenBondAnalysis::test_true_traj

29.21s call     MDAnalysisTests/analysis/test_hbonds.py::TestHydrogenBondAnalysisPBC::test_true_traj

29.15s call     MDAnalysisTests/analysis/test_hbonds.py::TestHydrogenBondAnalysisHeuristic::test_true_traj

29.11s call     MDAnalysisTests/analysis/test_hbonds.py::TestHydrogenBondAnalysisHeavy::test_true_traj

29.11s call     MDAnalysisTests/analysis/test_hbonds.py::TestHydrogenBondAnalysisHeavyFail::test_true_traj

21.78s call     MDAnalysisTests/visualization/test_streamlines.py::test_streamplot_2D

14.44s call     MDAnalysisTests/analysis/test_hole2.py::TestHoleAnalysis::test_context_manager

12.24s call     MDAnalysisTests/coordinates/test_pdb.py::test_conect_bonds_all

10.71s call     MDAnalysisTests/analysis/test_density.py::Test_density_from_Universe::test_update_selection

9.67s call     MDAnalysisTests/analysis/test_density.py::TestDensityAnalysis::test_run[dynamic]

8.56s call     MDAnalysisTests/coordinates/test_gro.py::TestGROLargeWriter::test_writer_large

8.44s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis

8.34s call     MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis_weights_None

7.49s call     MDAnalysisTests/analysis/test_encore.py::TestEncore::test_ces_error_estimation_ensemble_bootstrap

@richardjgowers
Copy link
Member

richardjgowers commented Jun 1, 2020 via email

@lilyminium
Copy link
Member Author

@richardjgowers that's a great idea, thanks! While we're looking at phi/psi_selection, though, and given the recent discussion on residx vs resids, should we be using segid and resid to select atoms? (I kept this logic in the manual matching for now). I can see the importance of resid when the protein is missing a chunk in the middle, but how should segid be treated? If multiple segments are called the same thing, are they supposed to be treated as the same segment?

@richardjgowers
Copy link
Member

richardjgowers commented Jun 1, 2020 via email

@lilyminium lilyminium marked this pull request as draft June 1, 2020 12:05
@codecov
Copy link

codecov bot commented Jun 1, 2020

Codecov Report

Merging #2706 into develop will decrease coverage by 0.03%.
The diff coverage is 91.21%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2706      +/-   ##
===========================================
- Coverage    91.19%   91.16%   -0.04%     
===========================================
  Files          174      174              
  Lines        23700    23881     +181     
  Branches      3099     3120      +21     
===========================================
+ Hits         21614    21771     +157     
- Misses        1472     1488      +16     
- Partials       614      622       +8     
Impacted Files Coverage Δ
package/MDAnalysis/core/topologyattrs.py 95.37% <89.55%> (-2.12%) ⬇️
package/MDAnalysis/analysis/dihedrals.py 96.72% <100.00%> (+0.19%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0911ff3...c1827e7. Read the comment docs.

@richardjgowers richardjgowers self-assigned this Jun 1, 2020
@lilyminium
Copy link
Member Author

lilyminium commented Jun 1, 2020

Ramachandran.__init__ is now ~25 lines shorter but takes 0.32 s on my laptop :-( I'll let other people decide if it's worth the slowdown.

Edit: closer to 15 lines, actually.

@lilyminium lilyminium marked this pull request as ready for review June 1, 2020 16:01
"""Parameters
----------
atomgroup : AtomGroup or ResidueGroup
atoms for residues for which :math:`\phi` and :math:`\psi` are
calculated
c_name: str (optional)
name for the backbone C atom
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specify defaults for these in the docstring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and maybe add a line showing how they can be used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember discussing this with @orbeckst and coming to the agreement that defaults are in the function signature anyway, so it's easier for docs not to diverge if they're not specified in the docstring. Happy to go either way though


.. versionadded:: 1.0.0
"""
u = residues[0].universe
nxres = rview = np.array([None]*len(residues), dtype=object)
nxres = np.array([None]*len(residues), dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, seems like a candidate for atomgroupgroup #1861

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I'll let someone else have a look before I merge it

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok test times look sub 40mins now, thanks @lilyminium !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants