
Increase maximum tree depth #13

Closed
Huite opened this issue Apr 23, 2024 · 2 comments

Comments


Huite commented Apr 23, 2024

The current max depth is set to 32. I chose this smallish value, but working with some small but somewhat perverse data (an earcut triangulation of polygons), it produces out-of-bounds errors. Setting the depth to 128 fixes it.


Huite commented Jul 17, 2024

Just checked the VTK CellTreeLocator, which uses a C++ std::stack; I assume that is heap allocated.

It would be worthwhile to compare with a numba typed list and do some benchmarking; the stack-allocated stack might be a premature optimization on my part.
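
For reference, a minimal sketch of the two approaches being compared: a fixed-size, array-backed stack versus a numba typed-list stack. The (n_nodes, 2) child-index layout and the traversal are illustrative only, not the actual numba_celltree internals.

import numpy as np
from numba import njit
from numba.typed import List

MAX_DEPTH = 32  # capacity of the fixed, array-backed stack


@njit(cache=True)
def count_nodes_fixed_stack(children):
    # Array-backed stack: a single allocation, indexed by a size counter.
    stack = np.empty(MAX_DEPTH, dtype=np.int64)
    stack[0] = 0  # start at the root node
    size = 1
    count = 0
    while size > 0:
        size -= 1
        node = stack[size]
        count += 1
        left = children[node, 0]
        right = children[node, 1]
        if left != -1:
            stack[size] = left  # silently out of bounds if depth exceeds MAX_DEPTH
            size += 1
        if right != -1:
            stack[size] = right
            size += 1
    return count


@njit(cache=True)
def count_nodes_list_stack(children):
    # Typed-list stack: grows as needed, but append/pop go through the heap.
    stack = List()
    stack.append(0)
    count = 0
    while len(stack) > 0:
        node = stack.pop()
        count += 1
        left = children[node, 0]
        right = children[node, 1]
        if left != -1:
            stack.append(left)
        if right != -1:
            stack.append(right)
    return count


# A five-node binary tree as a (n_nodes, 2) child-index array; -1 means "no child".
children = np.array([[1, 2], [3, 4], [-1, -1], [-1, -1], [-1, -1]], dtype=np.int64)
print(count_nodes_fixed_stack(children), count_nodes_list_stack(children))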

Generating a largish mesh, 2.6 million triangles:

import geopandas as gpd
import numpy as np
import shapely.geometry as sg

import pandamesh as pm

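# Square outer ring with a square hole, scaled up by 1e4; the line is defined but not used below.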
outer_coords = np.array([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]) * 1e4
inner_coords = np.array([(3.0, 3.0), (7.0, 3.0), (7.0, 7.0), (3.0, 7.0)]) * 1e4
line_coords = np.array([(2.0, 8.0), (8.0, 2.0)]) * 1e4
inner = sg.LinearRing(inner_coords)
outer = sg.LinearRing(outer_coords)
line = sg.LineString(line_coords)
donut = sg.Polygon(outer, holes=[inner])

gdf = gpd.GeoDataFrame(data={"cellsize": [100.0]}, geometry=[donut])

mesher = pm.TriangleMesher(gdf)
vertices, triangles = mesher.generate()

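# Dump the mesh to disk so the benchmark script below can reuse it.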
vertices.astype(np.float64).tofile("vertices.bin")
triangles.astype(np.int64).tofile("triangles.bin")

Benchmarking:

import numpy as np
import numba_celltree

# %%

triangles = np.fromfile("c:/src/pandamesh/triangles.bin", dtype=np.int64).reshape((-1, 3))
vertices = np.fromfile("c:/src/pandamesh/vertices.bin", dtype=np.float64).reshape((-1, 2))
# %%

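# Build the cell tree; -1 is the fill value for the face connectivity array.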
tree = numba_celltree.CellTree2d(vertices, triangles, -1)

# %%

xmin = vertices[:, 0].min()
ymin = vertices[:, 1].min()
xmax = vertices[:, 0].max()
ymax = vertices[:, 1].max()
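# One million random query points, uniformly distributed over the mesh bounding box.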
points = np.random.rand(int(1e6), 2)
points[:, 0] = points[:, 0] * (xmax - xmin) + xmin 
points[:, 1] = points[:, 1] * (ymax - ymin) + ymin 
# %%

%timeit result = tree.locate_points(points)

Static stack-memory stack, size 32:

64.2 ms ± 1.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
63.3 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
64 ms ± 738 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
63.3 ms ± 738 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67 ms ± 877 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Static stack, size 128 gives basically the same result.

A list-based stack (which allows us to simply append and pop) is at least twice as slow:

260 ms ± 81.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
312 ms ± 9.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
135 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
317 ms ± 2.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
127 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

No clue why the timings are so inconsistent.

An ordinary heap-memory stack, size 32:

68.9 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.2 ms ± 701 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.1 ms ± 804 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
64.3 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
65 ms ± 265 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Heap memory, size 128:

67.6 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.2 ms ± 438 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.7 ms ± 669 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.1 ms ± 406 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
64.8 ms ± 3.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

It's still a pretty tiny allocation, all things considered; it does allocate once per query, since the queries are parallelized and we don't want to share pre-allocated memory between threads.

I'll keep the stack-allocated memory and bump up the stack size for now.
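
To illustrate the per-query allocation pattern mentioned above (a rough sketch only, not the actual locate_points implementation): each prange iteration allocates its own small stack, so threads never share traversal state.

import numpy as np
from numba import njit, prange

MAX_DEPTH = 128


@njit(parallel=True, cache=True)
def traverse_per_query(children, n_queries):
    # Each parallel iteration owns a tiny, freshly allocated stack; the "query"
    # here is just a placeholder walk over a (n_nodes, 2) child-index array.
    out = np.empty(n_queries, dtype=np.int64)
    for i in prange(n_queries):
        stack = np.empty(MAX_DEPTH, dtype=np.int64)  # per-query allocation
        stack[0] = 0  # root node
        size = 1
        visited = 0
        while size > 0:
            size -= 1
            node = stack[size]
            visited += 1
            left = children[node, 0]
            right = children[node, 1]
            if left != -1:
                stack[size] = left
                size += 1
            if right != -1:
                stack[size] = right
                size += 1
        out[i] = visited
    return out


children = np.array([[1, 2], [-1, -1], [-1, -1]], dtype=np.int64)
print(traverse_per_query(children, 4))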

Huite closed this as completed in ff34af1 on Jul 17, 2024

Huite commented Jul 17, 2024

Alternatively, we could implement bounds checking; ideally, the stack would then be reallocated at a larger size when it runs out of room. That might have some performance ramifications, although the heap-allocation timings above suggest it would be negligible. Then again, at least for somewhat normal meshes that result in a reasonably balanced tree, the current stack size of 128 should readily support billions of cells.

If someone runs into an issue in the future, they should enable bounds checking for Numba; a resulting index error is then almost certainly due to the fixed stack size.
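
For completeness, bounds checking can be enabled globally via the NUMBA_BOUNDSCHECK environment variable, or per function with the boundscheck decorator option; the snippet below is a generic illustration, not numba_celltree code.

# Globally, before running the script:
#   NUMBA_BOUNDSCHECK=1 python script.py
# Or per function, on your own jitted code:
import numpy as np
from numba import njit


@njit(boundscheck=True)
def push(stack, size, value):
    # With boundscheck=True this raises IndexError instead of silently writing
    # past the end of the array when size == stack.shape[0].
    stack[size] = value
    return size + 1


stack = np.empty(4, dtype=np.int64)
push(stack, 4, 1)  # raises IndexError under bounds checking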
