Issue 2656 nsgrid segfault #2665

richardjgowers · 2020-04-20T17:31:00Z

Changes made in this Pull Request:

fixes issue with 0-sized capped distance causing a segfault
various performance improvements to FastNS (capped distances)

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

richardjgowers · 2020-04-20T17:31:40Z

package/MDAnalysis/lib/distances.py

@@ -1102,9 +1102,9 @@ def _nsgrid_capped_self(reference, max_cutoff, min_cutoff=None, box=None,
            gridsearch = FastNS(max_cutoff, reference, box=box)
            results = gridsearch.self_search()

-        pairs = results.get_pairs()[::2, :]


Previously each pair was saved twice and the duplicates were then sliced away here. Instead, the pairs are now simply saved once.

Does anyone depend on that behavior? Is this the reason for the unit test change below?
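For illustration, the double-save-then-slice pattern discussed in this thread can be sketched in pure Python. The helper names are hypothetical and this is not the FastNS code; it assumes the duplicate of each pair is stored immediately after the original, which is what makes a `[::2]` slice recover the unique pairs.

```python
from itertools import combinations


def pairs_double_saved(n):
    """Assumed old behaviour: store every pair as (i, j) and then (j, i)."""
    out = []
    for i, j in combinations(range(n), 2):
        out.append((i, j))
        out.append((j, i))  # the duplicate sits right after the original
    return out


def pairs_single_saved(n):
    """New behaviour: store every pair exactly once."""
    return list(combinations(range(n), 2))


doubled = pairs_double_saved(4)
single = pairs_single_saved(4)

# Slicing every second entry of the doubled list gives the single-saved list,
# so callers that used len(pairs) // 2 now need plain len(pairs).
assert doubled[::2] == single
assert len(doubled) // 2 == len(single)
```

Under that assumption, any caller that relied on the doubled list would see half as many rows after the change, which is why a test that divided the length by two has to be adjusted.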

richardjgowers · 2020-04-20T17:36:11Z

package/MDAnalysis/lib/nsgrid.pyx


-        cdef ns_int i, cellindex = -1


The old data structure was effectively a 2-dimensional array, which required two passes through the data: one to calculate the size to allocate, and another to fill the array. A simple linked list (of fixed, smaller size) can be used instead.
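A minimal pure-Python sketch of the two layouts, for comparison. The array names `cell_head` and `next_id` are taken from the diff in this PR; the function names, the insertion order (new atoms are pushed at the head of the chain) and the toy data are assumptions made for illustration only.

```python
def build_two_pass(cell_of_atom, ncells):
    """Per-cell arrays: pass 1 counts atoms per cell, pass 2 fills them."""
    counts = [0] * ncells
    for c in cell_of_atom:                    # pass 1: sizes to allocate
        counts[c] += 1
    cells = [[None] * k for k in counts]
    filled = [0] * ncells
    for atom, c in enumerate(cell_of_atom):   # pass 2: fill the arrays
        cells[c][filled[c]] = atom
        filled[c] += 1
    return cells


def build_linked_list(cell_of_atom, ncells):
    """Single pass; storage is always ncells + natoms integers."""
    cell_head = [-1] * ncells                 # first atom of each cell, -1 if empty
    next_id = [-1] * len(cell_of_atom)        # next atom in the same cell, -1 at the end
    for atom, c in enumerate(cell_of_atom):
        next_id[atom] = cell_head[c]          # old head becomes this atom's successor
        cell_head[c] = atom                   # this atom becomes the new head
    return cell_head, next_id


def atoms_in_cell(cell_head, next_id, c):
    """Walk the chain of cell c until the -1 terminator."""
    atoms = []
    j = cell_head[c]
    while j != -1:
        atoms.append(j)
        j = next_id[j]
    return atoms


cell_of_atom = [2, 0, 2, 1, 2, 0]             # toy data: cell index of each atom
cells = build_two_pass(cell_of_atom, 4)
cell_head, next_id = build_linked_list(cell_of_atom, 4)
```

Both layouts describe the same cell membership; the linked list just returns the atoms of a cell in reverse insertion order.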

richardjgowers · 2020-04-20T17:36:39Z

package/MDAnalysis/lib/nsgrid.pyx

        searchcoords_bbox = self.box.fast_put_atoms_in_bbox(searchcoords)
-        searchgrid = _NSGrid(searchcoords_bbox.shape[0], self.grid.cutoff, self.box, self.max_gridsize, force=True)
-        searchgrid.fill_grid(searchcoords_bbox)


The second grid doesn't need to be populated; it only needs to be sized for the later calculations.

richardjgowers · 2020-04-20T17:37:19Z

package/MDAnalysis/lib/nsgrid.pyx

+                            j = self.grid.cell_head[cellindex_probe]
+                            while j != -1:
+                                # find distance between search coords[i] and coords[j]
+                                d2 = self.box.fast_distance2(&searchcoords_bbox[i, XX],
+                                                             &self.coords_bbox[j, XX])
                                if d2 <= cutoff2:
-                                    results.add_neighbors(current_beadid, bid, d2)
-                                    npairs += 1
+                                    results.add_neighbors(i, j, d2)
+
+                                j = self.grid.next_id[j]


This now loops through a linked list whose length isn't known in advance (an index of -1 terminates it).
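The loop above can be sketched end to end in a simplified setting. This is a 1D, non-periodic toy version in pure Python, assuming a strictly positive cutoff; `build_cells` and `search` are hypothetical names, and only `cell_head` and `next_id` come from the diff.

```python
def build_cells(coords, cutoff, length):
    """Bin coords into cells of width >= cutoff along a line of given length."""
    ncells = max(1, int(length // cutoff))
    width = length / ncells
    cell_head = [-1] * ncells
    next_id = [-1] * len(coords)
    for j, x in enumerate(coords):
        c = min(int(x / width), ncells - 1)
        next_id[j] = cell_head[c]
        cell_head[c] = j
    return cell_head, next_id, width


def search(search_coords, coords, cutoff, length):
    """Return (i, j) with |search_coords[i] - coords[j]| <= cutoff."""
    cell_head, next_id, width = build_cells(coords, cutoff, length)
    ncells = len(cell_head)
    pairs = []
    for i, x in enumerate(search_coords):
        c = min(int(x / width), ncells - 1)
        for probe in (c - 1, c, c + 1):
            if probe < 0 or probe >= ncells:
                continue                      # no periodic wrapping in this sketch
            j = cell_head[probe]
            while j != -1:                    # chain length is not known up front
                if abs(x - coords[j]) <= cutoff:
                    pairs.append((i, j))
                j = next_id[j]
    return pairs


coords = [0.5, 2.2, 2.9, 7.0, 9.5]
search_coords = [2.5, 9.9]
pairs = search(search_coords, coords, cutoff=1.0, length=10.0)
```

Because every cell is at least one cutoff wide, probing a cell and its two neighbours is enough to find all partners in 1D; the real code probes neighbouring cells in three dimensions and handles periodic boundaries.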

richardjgowers · 2020-04-20T17:37:44Z

testsuite/MDAnalysisTests/lib/test_nsgrid.py

@@ -231,3 +231,14 @@ def test_nsgrid_probe_close_to_box_boundary():
    expected_dists = np.array([2.3689647], dtype=np.float64)
    assert_equal(results.get_pairs(), expected_pairs)
    assert_allclose(results.get_pair_distances(), expected_dists, rtol=1.e-6)
+
+
+def test_zero_max_dist():


This used to segfault (see #2656)


tylerjereddy

If you're going to supersede my PR to fix the same issue, you should probably reference it: #2657

Some initial comments:

Why combine performance improvements with a bug fix in the same PR?
Why not include the unit tests Lily and I wrote in #2657 (BUG: fix segfault with 0.0 around sels) for around 0.0 selections?

codecov · 2020-04-20T21:09:28Z

Codecov Report

Merging #2665 into develop will decrease coverage by 0.06%.
The diff coverage is 90.00%.

@@             Coverage Diff             @@
##           develop    #2665      +/-   ##
===========================================
- Coverage    91.22%   91.15%   -0.07%     
===========================================
  Files          176      159      -17     
  Lines        24033    21936    -2097     
  Branches      3140     3175      +35     
===========================================
- Hits         21923    19996    -1927     
+ Misses        1488     1327     -161     
+ Partials       622      613       -9

Impacted Files	Coverage Δ
package/MDAnalysis/lib/nsgrid.pyx	`83.72% <90.00%> (+0.05%)`	⬆️
package/MDAnalysis/lib/mdamath.py	`95.55% <0.00%> (-4.45%)`	⬇️
package/MDAnalysis/analysis/waterdynamics.py	`90.51% <0.00%> (-3.43%)`	⬇️
package/MDAnalysis/coordinates/GRO.py	`93.46% <0.00%> (-1.97%)`	⬇️
package/MDAnalysis/analysis/pca.py	`95.29% <0.00%> (-1.40%)`	⬇️
package/MDAnalysis/coordinates/MOL2.py	`94.11% <0.00%> (-0.24%)`	⬇️
package/MDAnalysis/coordinates/PDB.py	`90.17% <0.00%> (-0.21%)`	⬇️
package/MDAnalysis/analysis/dihedrals.py	`96.52% <0.00%> (-0.20%)`	⬇️
package/MDAnalysis/auxiliary/base.py	`88.92% <0.00%> (-0.15%)`	⬇️
package/MDAnalysis/analysis/base.py	`100.00% <0.00%> (ø)`
... and 31 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d7d77f...62c7305. Read the comment docs.

tylerjereddy · 2020-04-21T01:37:24Z

testsuite/MDAnalysisTests/lib/test_nsgrid.py

@@ -213,7 +213,7 @@ def test_nsgrid_selfsearch(box, result):
        searcher = nsgrid.FastNS(cutoff, points, box=box)
        searchresults = searcher.self_search()
    pairs = searchresults.get_pairs()
-    assert_equal(len(pairs)//2, result)
+    assert_equal(len(pairs), result)


Explanation for changing the expected result of an old unit test?

tylerjereddy · 2020-04-21T01:39:23Z

I think the performance improvements need to be split out to a separate PR with an indication of the benchmarks reflecting the performance change.

None of the comments here currently clearly explain which change actually fixed the issue, which is detracting from the review process.

richardjgowers · 2020-04-21T08:21:50Z

testsuite/MDAnalysisTests/lib/test_nsgrid.py

+def test_around_superposed_small_res(u_pbc_triclinic):
+    ag = u_pbc_triclinic.select_atoms('around 0.0 resid 10')
+    assert len(ag) == 0
+
+
+def test_around_superposed_large_res(u_pbc_triclinic):
+    ag = u_pbc_triclinic.select_atoms('around 0.0 resid 3')
+    assert len(ag) == 0


These are a little different to Lily's tests. I think the correct answer is 0 atoms, because an Around selection won't select the thing it's around-ing (https://www.mdanalysis.org/docs/documentation_pages/selections.html#geometric)

I think the small box (0.001) is causing an infinite loop here, but I'm looking into it.


richardjgowers · 2020-04-21T08:22:07Z

package/MDAnalysis/lib/nsgrid.pyx

        if not force:
+            # Calculate best cutoff, with 0.01A minimum
+            cutoff = max(cutoff, 0.01)


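This is the fix: the cutoff is clamped to a minimum of 0.01 Å before the grid is sized, so a zero cutoff is never used there.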
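To illustrate why a floor on the cutoff helps, here is a hedged sketch of a grid-sizing step. The function names and the sizing formula are assumptions (the actual `_NSGrid` sizing code is not shown in this thread); only the `max(cutoff, 0.01)` clamp is taken from the diff.

```python
MIN_CUTOFF = 0.01  # Angstrom; the floor used in the diff above


def cells_per_dimension(box_length, cutoff):
    """Hypothetical sizing: how many cells of width >= cutoff fit along an edge."""
    return max(1, int(box_length / cutoff))


def cells_per_dimension_clamped(box_length, cutoff):
    """Same sizing, but with the cutoff clamped to a small positive minimum."""
    cutoff = max(cutoff, MIN_CUTOFF)
    return cells_per_dimension(box_length, cutoff)


# Without the clamp, a zero cutoff divides by zero.
try:
    cells_per_dimension(10.0, 0.0)
    unclamped_failed = False
except ZeroDivisionError:
    unclamped_failed = True

# With the clamp, the grid size stays finite and well defined.
ncells = cells_per_dimension_clamped(10.0, 0.0)
```

Python raises an exception on the unclamped call; compiled C/Cython arithmetic has no such safety net, which is consistent with the "division by zero in NSGrid" described in the commit message for this PR.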

lilyminium · 2020-04-21T10:54:10Z

I'm still getting a segmentation fault here and I think I'm working off your branch. 😕 #2657 with pkdtree successfully returns an AtomGroup with 48 atoms, so maybe the original resid tests were wrong or disliked the small box as well.

Edit: sorry, that was wrong, sick brain. Haven't looked at your code but it's nice that the below finds the mirrored atoms! 🎉 Thanks!

>>> import MDAnalysis as mda
>>> from MDAnalysis.lib import distances
>>> import numpy as np
>>> u = mda.Universe.empty(60, trajectory=True)
>>> xyz = np.zeros((60, 3))
>>> x = np.tile(np.arange(12), (5,))+np.repeat(np.arange(5)*100, 12)
>>> x  # 5 images of 12 atoms
array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11, 100,
       101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 200, 201,
       202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 300, 301, 302,
       303, 304, 305, 306, 307, 308, 309, 310, 311, 400, 401, 402, 403,
       404, 405, 406, 407, 408, 409, 410, 411])
>>> xyz[:, 0] = x  # y and z are 0
>>> u.load_new(xyz)
<Universe with 60 atoms>
>>> u.dimensions = [100, 100, 100, 60, 60, 60]
>>> dist = distances.distance_array(u.atoms[:12].positions,
...                                 u.atoms[12:].positions,
...                                 box=u.dimensions)
>>> np.count_nonzero(np.any(dist <= 0.0, axis=0))
48
>>> u.select_atoms('around 0.0 index 0:11')
<AtomGroup with 48 atoms>
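The count of 48 can be cross-checked without MDAnalysis. The sketch below is a simplification, not how `distance_array` treats triclinic boxes in general: it assumes that all atoms lie on the x axis and that the first box vector lies along x with length 100, so wrapping along x alone is enough to find the zero-distance images.

```python
BOX_X = 100  # length of the first box vector, taken from u.dimensions above

# Same x coordinates as the NumPy expression above: 5 images of 12 atoms.
x = [image * 100 + k for image in range(5) for k in range(12)]
reference = x[:12]
others = x[12:]


def wrapped_distance(a, b, period=BOX_X):
    """Minimum-image distance between two points on a periodic line."""
    d = abs(a - b) % period
    return min(d, period - d)


# Count atoms outside the reference group that sit exactly on a periodic
# image of some reference atom.
n_overlapping = sum(
    1
    for xi in others
    if any(wrapped_distance(xi, xr) <= 0.0 for xr in reference)
)
assert n_overlapping == 48
```

Every atom outside the first twelve is a periodic image of one of them, so all 48 are found at distance zero, while the reference atoms themselves are not counted.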

tylerjereddy · 2020-04-21T14:32:43Z

The patch looks more focused here now, I'll close my other PR and let you and Lily wrap this up then.

Performance changes have been separated out & Lily is checking Richard's unit tests

richardjgowers · 2020-04-22T09:09:53Z

@lilyminium your original test:

box = np.array([boxsize, boxsize, boxsize, 60., 60., 60], dtype=np.float32)

u = mda.Universe(PDB)
u.dimensions = box

u.select_atoms('around 0.0 resid 3')

still causes the interpreter to hang. It also doesn't like being interrupted, so I think it's something in the C/Cython layer. I'm going to try to fix that in this PR too, as it's related; then hopefully this is good to go.

richardjgowers · 2020-04-23T09:04:08Z

@lilyminium ok, I made a new issue for the tiny box (#2670); it seems separate from the zero sized box issue. I think this PR is finished for fixing #2656 and can be reviewed.

lilyminium

I think adding the test in #2665 (comment) would be beneficial as it is easily interpreted and all the other tests find 0 atoms overlapping.

Otherwise LGTM but I'm not very familiar with Cython or the lib module.

lilyminium · 2020-04-24T23:31:05Z

testsuite/MDAnalysisTests/lib/test_nsgrid.py

@@ -231,3 +231,14 @@ def test_nsgrid_probe_close_to_box_boundary():
    expected_dists = np.array([2.3689647], dtype=np.float64)
    assert_equal(results.get_pairs(), expected_pairs)
    assert_allclose(results.get_pair_distances(), expected_dists, rtol=1.e-6)
+
+
+def test_zero_max_dist():


Could you add a comparison to the expected value of no pairs found?

lilyminium · 2020-04-24T23:35:32Z

testsuite/MDAnalysisTests/lib/test_nsgrid.py

+def test_around_superposed_small_res(u_pbc_triclinic):
+    ag = u_pbc_triclinic.select_atoms('around 0.0 resid 10')
+    assert len(ag) == 0
+
+
+def test_around_superposed_large_res(u_pbc_triclinic):
+    ag = u_pbc_triclinic.select_atoms('around 0.0 resid 3')
+    assert len(ag) == 0


I split out the tests by residue size because the smaller one uses the brute-force method and the larger one uses the nsgrid method. bruteforce never seg-faulted so I'm not sure that test_around_small_res is needed, and it would be nice to save time on tests given #2671.

To nitpick the test_around_superposed_large_res name: with the larger box, no atoms sit in exactly the same spot any more, so "superposed" is no longer accurate. Also, it would be helpful for future readers to rename or comment the test to make clear which method of capped_distance is being tested here.


richardjgowers · 2020-06-06T11:35:55Z

Ok @lilyminium I think I've addressed comments

lilyminium

Assuming Travis is green, LGTM! Thank you!


tylerjereddy previously requested changes Apr 20, 2020

tylerjereddy added the Component-lib and defect labels Apr 20, 2020

tylerjereddy reviewed Apr 21, 2020

richardjgowers force-pushed the issue-2656-nsgrid_segfault branch 2 times, most recently from ab0cf0e to 1b7d43f, on April 21, 2020 08:18

lilyminium requested changes Apr 24, 2020

lilyminium mentioned this pull request Jun 5, 2020: Release 1.0 #2443 (closed, 6 tasks)

richardjgowers and others added 5 commits June 6, 2020 12:34

d5e0d13 added test for issue #2656 (segfaulting on zero box size)
2e37dac fixed division by zero in NSGrid (fixes Issue #2656)
c3b0009 added more tests for issue-2656
e4417ef update comment
96a074d fixed up tests from review

richardjgowers force-pushed the issue-2656-nsgrid_segfault branch from 1b66dd8 to 96a074d on June 6, 2020 11:34

62c7305 changelog for #2656

lilyminium approved these changes Jun 6, 2020

2853d1e Update package/CHANGELOG (Co-authored-by: Lily Wang <[email protected]>)

richardjgowers merged commit abfd748 into develop Jun 6, 2020

IAlibay deleted the issue-2656-nsgrid_segfault branch May 29, 2022 12:56
