Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 2656 nsgrid segfault #2665

Merged
merged 7 commits into from
Jun 6, 2020
Merged

Conversation

richardjgowers
Copy link
Member

Fixes #2656

Changes made in this Pull Request:

  • fixes issue with 0-sized capped distance causing a segfault
  • various performance improvements to FastNS (capped distances)

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@@ -1102,9 +1102,9 @@ def _nsgrid_capped_self(reference, max_cutoff, min_cutoff=None, box=None,
gridsearch = FastNS(max_cutoff, reference, box=box)
results = gridsearch.self_search()

pairs = results.get_pairs()[::2, :]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we used to double save pairs then slice them, so instead you can just not double save

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone depend on that behavior? This is the unit test change below?


cdef ns_int i, cellindex = -1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old data structure was a sort of 2 dimensional array, which required two passes through the data to first calculate the size to allocate, then another to fill this array. Instead a simple linked list (of fixed and smaller size) can be used

searchcoords_bbox = self.box.fast_put_atoms_in_bbox(searchcoords)
searchgrid = _NSGrid(searchcoords_bbox.shape[0], self.grid.cutoff, self.box, self.max_gridsize, force=True)
searchgrid.fill_grid(searchcoords_bbox)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second grid didn't need to get populated, just sized out for later calculations

Comment on lines 872 to 880
j = self.grid.cell_head[cellindex_probe]
while j != -1:
# find distance between search coords[i] and coords[j]
d2 = self.box.fast_distance2(&searchcoords_bbox[i, XX],
&self.coords_bbox[j, XX])
if d2 <= cutoff2:
results.add_neighbors(current_beadid, bid, d2)
npairs += 1
results.add_neighbors(i, j, d2)

j = self.grid.next_id[j]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now looping through a linked list of unknown size (-1 terminates)

@@ -231,3 +231,14 @@ def test_nsgrid_probe_close_to_box_boundary():
expected_dists = np.array([2.3689647], dtype=np.float64)
assert_equal(results.get_pairs(), expected_pairs)
assert_allclose(results.get_pair_distances(), expected_dists, rtol=1.e-6)


def test_zero_max_dist():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This previously used to segfault (see #2656)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comparison to the expected value of no pairs found?

Copy link
Member

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to supersede my PR to fix the same issue, you should probably reference it: #2657

Some initial comments:

  1. Why combine performance improvements with a bug fix in the same PR?
  2. Why not include the unit tests Lily and I wrote in BUG: fix segfault with 0.0 around sels #2657 for around 0.0 selections?

@codecov
Copy link

codecov bot commented Apr 20, 2020

Codecov Report

Merging #2665 into develop will decrease coverage by 0.06%.
The diff coverage is 90.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2665      +/-   ##
===========================================
- Coverage    91.22%   91.15%   -0.07%     
===========================================
  Files          176      159      -17     
  Lines        24033    21936    -2097     
  Branches      3140     3175      +35     
===========================================
- Hits         21923    19996    -1927     
+ Misses        1488     1327     -161     
+ Partials       622      613       -9     
Impacted Files Coverage Δ
package/MDAnalysis/lib/nsgrid.pyx 83.72% <90.00%> (+0.05%) ⬆️
package/MDAnalysis/lib/mdamath.py 95.55% <0.00%> (-4.45%) ⬇️
package/MDAnalysis/analysis/waterdynamics.py 90.51% <0.00%> (-3.43%) ⬇️
package/MDAnalysis/coordinates/GRO.py 93.46% <0.00%> (-1.97%) ⬇️
package/MDAnalysis/analysis/pca.py 95.29% <0.00%> (-1.40%) ⬇️
package/MDAnalysis/coordinates/MOL2.py 94.11% <0.00%> (-0.24%) ⬇️
package/MDAnalysis/coordinates/PDB.py 90.17% <0.00%> (-0.21%) ⬇️
package/MDAnalysis/analysis/dihedrals.py 96.52% <0.00%> (-0.20%) ⬇️
package/MDAnalysis/auxiliary/base.py 88.92% <0.00%> (-0.15%) ⬇️
package/MDAnalysis/analysis/base.py 100.00% <0.00%> (ø)
... and 31 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d7d77f...62c7305. Read the comment docs.

@@ -213,7 +213,7 @@ def test_nsgrid_selfsearch(box, result):
searcher = nsgrid.FastNS(cutoff, points, box=box)
searchresults = searcher.self_search()
pairs = searchresults.get_pairs()
assert_equal(len(pairs)//2, result)
assert_equal(len(pairs), result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explanation for changing the result of an old unit test?

@tylerjereddy
Copy link
Member

I think the performance improvements need to be split out to a separate PR with an indication of the benchmarks reflecting the performance change.

None of the comments here currently clearly explain which change actually fixed the issue, which is detracting from the review process.

@richardjgowers richardjgowers force-pushed the issue-2656-nsgrid_segfault branch 2 times, most recently from ab0cf0e to 1b7d43f Compare April 21, 2020 08:18
Comment on lines 254 to 261
def test_around_superposed_small_res(u_pbc_triclinic):
ag = u_pbc_triclinic.select_atoms('around 0.0 resid 10')
assert len(ag) == 0


def test_around_superposed_large_res(u_pbc_triclinic):
ag = u_pbc_triclinic.select_atoms('around 0.0 resid 3')
assert len(ag) == 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are a little different to Lily's tests, I think the correct answer is 0 atoms, because an Around selection won't select the thing it's around-ing (https://www.mdanalysis.org/docs/documentation_pages/selections.html#geometric)

I think the small box (0.001) is causing an infinite loop here but I'm looking into it....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split out the tests by residue size because the smaller one uses the brute-force method and the larger one uses the nsgrid method. bruteforce never seg-faulted so I'm not sure that test_around_small_res is needed, and it would be nice to save time on tests given #2671.

To nitpick the test_around_superposed_large_res name -- the larger box means there are no more atoms in exactly the same spot anymore, so the superposed isn't really accurate anymore. Also, it would be helpful for future readers to rename or comment the test to make clear which method of capped_distance is being tested here.

if not force:
# Calculate best cutoff, with 0.01A minimum
cutoff = max(cutoff, 0.01)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix here

@lilyminium
Copy link
Member

lilyminium commented Apr 21, 2020

I'm still getting a segmentation fault here and I think I'm working off your branch. 😕 #2657 with pkdtree successfully returns an AtomGroup with 48 atoms, so maybe the original resid tests were wrong or disliked the small box as well.

Edit: sorry, that was wrong, sick brain. Haven't looked at your code but it's nice that the below finds the mirrored atoms! 🎉 Thanks!

>>> import MDAnalysis as mda
>>> from MDAnalysis.lib import distances
>>> import numpy as np
>>> u = mda.Universe.empty(60, trajectory=True)
>>> xyz = np.zeros((60, 3))
>>> x = np.tile(np.arange(12), (5,))+np.repeat(np.arange(5)*100, 12)
>>> x  # 5 images of 12 atoms
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11, 100,
       101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 200, 201,
       202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 300, 301, 302,
       303, 304, 305, 306, 307, 308, 309, 310, 311, 400, 401, 402, 403,
       404, 405, 406, 407, 408, 409, 410, 411])
>>> xyz[:, 0] = x  # y and z are 0
>>> u.load_new(xyz)
<Universe with 60 atoms>
>>> u.dimensions = [100, 100, 100, 60, 60, 60]
>>> dist = distances.distance_array(u.atoms[:12].positions,
...                                 u.atoms[12:].positions,
...                                 box=u.dimensions)
>>> np.count_nonzero(np.any(dist <= 0.0, axis=0))
48
>>> u.select_atoms('around 0.0 index 0:11')
<AtomGroup with 48 atoms>

@tylerjereddy
Copy link
Member

The patch looks more focused here now, I'll close my other PR and let you and Lily wrap this up then.

@tylerjereddy tylerjereddy dismissed their stale review April 21, 2020 14:35

Performance changes have been separate out & Lily is checking Richard's unit tests

@richardjgowers
Copy link
Member Author

@lilyminium your original test:

box = np.array([boxsize, boxsize, boxsize, 60., 60., 60], dtype=np.float32)

u = mda.Universe(PDB)
u.dimensions = box

u.select_atoms('around 0.0 resid 3')

still causes the interpreter to hang. And it doesn't like being interrupted so I think it's something in the C/Cython layer. I'm going to try and fix that in this PR too as it's related then hopefully this is good to go.

@richardjgowers
Copy link
Member Author

@lilyminium ok I made a new issue for the tiny box - #2670 it seems separate from the zero sized box issue. I think this PR is finished for fixing #2656 and can be reviewed

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding the test in #2665 (comment) would be beneficial as it is easily interpreted and all the other tests find 0 atoms overlapping.

Otherwise LGTM but I'm not very familiar with Cython or the lib module.

@@ -231,3 +231,14 @@ def test_nsgrid_probe_close_to_box_boundary():
expected_dists = np.array([2.3689647], dtype=np.float64)
assert_equal(results.get_pairs(), expected_pairs)
assert_allclose(results.get_pair_distances(), expected_dists, rtol=1.e-6)


def test_zero_max_dist():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comparison to the expected value of no pairs found?

Comment on lines 254 to 261
def test_around_superposed_small_res(u_pbc_triclinic):
ag = u_pbc_triclinic.select_atoms('around 0.0 resid 10')
assert len(ag) == 0


def test_around_superposed_large_res(u_pbc_triclinic):
ag = u_pbc_triclinic.select_atoms('around 0.0 resid 3')
assert len(ag) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I split out the tests by residue size because the smaller one uses the brute-force method and the larger one uses the nsgrid method. bruteforce never seg-faulted so I'm not sure that test_around_small_res is needed, and it would be nice to save time on tests given #2671.

To nitpick the test_around_superposed_large_res name -- the larger box means there are no more atoms in exactly the same spot anymore, so the superposed isn't really accurate anymore. Also, it would be helpful for future readers to rename or comment the test to make clear which method of capped_distance is being tested here.

@lilyminium lilyminium mentioned this pull request Jun 5, 2020
6 tasks
@richardjgowers richardjgowers force-pushed the issue-2656-nsgrid_segfault branch from 1b66dd8 to 96a074d Compare June 6, 2020 11:34
@richardjgowers
Copy link
Member Author

Ok @lilyminium I think I've addressed comments

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming Travis is green, LGTM! Thank you!

package/CHANGELOG Outdated Show resolved Hide resolved
Co-authored-by: Lily Wang <[email protected]>
@richardjgowers richardjgowers merged commit abfd748 into develop Jun 6, 2020
@IAlibay IAlibay deleted the issue-2656-nsgrid_segfault branch May 29, 2022 12:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

select_atoms("around 0.0 protein") produces segmentation fault.
3 participants