
Increase maximum tree depth #13

Closed
Huite opened this issue Apr 23, 2024 · 2 comments

Comments


Huite commented Apr 23, 2024

The current max depth is set to 32. I chose this smallish value, but working with some small but somewhat perverse data (an earcut triangulation of polygons), it produces out-of-bounds errors. Setting the depth to 128 fixes it.


Huite commented Jul 17, 2024

Just checked the VTK CellTreeLocator, which uses a C++ std::stack; I assume that is heap allocated.

It would be worthwhile to compare with a numba typed list and do some benchmarking; the stack-allocated stack might be a premature optimization on my part.
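
For reference, a minimal sketch of the two approaches being compared: a fixed-size, array-backed stack versus a numba typed-list stack. The (n_nodes, 2) child-index layout and the traversal are illustrative only, not the actual numba_celltree internals.

import numpy as np
from numba import njit
from numba.typed import List

MAX_DEPTH = 32  # capacity of the fixed, array-backed stack


@njit(cache=True)
def count_nodes_fixed_stack(children):
    # Array-backed stack: a single allocation, indexed by a size counter.
    stack = np.empty(MAX_DEPTH, dtype=np.int64)
    stack[0] = 0  # start at the root node
    size = 1
    count = 0
    while size > 0:
        size -= 1
        node = stack[size]
        count += 1
        left = children[node, 0]
        right = children[node, 1]
        if left != -1:
            stack[size] = left  # silently out of bounds if depth exceeds MAX_DEPTH
            size += 1
        if right != -1:
            stack[size] = right
            size += 1
    return count


@njit(cache=True)
def count_nodes_list_stack(children):
    # Typed-list stack: grows as needed, but append/pop go through the heap.
    stack = List()
    stack.append(0)
    count = 0
    while len(stack) > 0:
        node = stack.pop()
        count += 1
        left = children[node, 0]
        right = children[node, 1]
        if left != -1:
            stack.append(left)
        if right != -1:
            stack.append(right)
    return count


# A five-node binary tree as a (n_nodes, 2) child-index array; -1 means "no child".
children = np.array([[1, 2], [3, 4], [-1, -1], [-1, -1], [-1, -1]], dtype=np.int64)
print(count_nodes_fixed_stack(children), count_nodes_list_stack(children))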

Generating a largish mesh, 2.6 million triangles:

import geopandas as gpd
import numpy as np
import shapely.geometry as sg

import pandamesh as pm

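# Square outer ring with a square hole, scaled up by 1e4; the line is defined but not used below.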
outer_coords = np.array([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]) * 1e4
inner_coords = np.array([(3.0, 3.0), (7.0, 3.0), (7.0, 7.0), (3.0, 7.0)]) * 1e4
line_coords = np.array([(2.0, 8.0), (8.0, 2.0)]) * 1e4
inner = sg.LinearRing(inner_coords)
outer = sg.LinearRing(outer_coords)
line = sg.LineString(line_coords)
donut = sg.Polygon(outer, holes=[inner])

gdf = gpd.GeoDataFrame(data={"cellsize": [100.0]}, geometry=[donut])

mesher = pm.TriangleMesher(gdf)
vertices, triangles = mesher.generate()

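# Dump the mesh to disk so the benchmark script below can reuse it.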
vertices.astype(np.float64).tofile("vertices.bin")
triangles.astype(np.int64).tofile("triangles.bin")

Benchmarking:

import numpy as np
import numba_celltree

# %%

triangles = np.fromfile("c:/src/pandamesh/triangles.bin", dtype=np.int64).reshape((-1, 3))
vertices = np.fromfile("c:/src/pandamesh/vertices.bin", dtype=np.float64).reshape((-1, 2))
# %%

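# Build the cell tree; -1 is the fill value for the face connectivity array.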
tree = numba_celltree.CellTree2d(vertices, triangles, -1)

# %%

xmin = vertices[:, 0].min()
ymin = vertices[:, 1].min()
xmax = vertices[:, 0].max()
ymax = vertices[:, 1].max()
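# One million random query points, uniformly distributed over the mesh bounding box.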
points = np.random.rand(int(1e6), 2)
points[:, 0] = points[:, 0] * (xmax - xmin) + xmin 
points[:, 1] = points[:, 1] * (ymax - ymin) + ymin 
# %%

%timeit result = tree.locate_points(points)

Static stack-memory stack, size 32:

64.2 ms ± 1.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
63.3 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
64 ms ± 738 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
63.3 ms ± 738 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67 ms ± 877 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Static stack, size 128 gives basically the same result.

A list-based stack (which allows us to simply append and pop) is at least twice as slow:

260 ms ± 81.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
312 ms ± 9.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
135 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
317 ms ± 2.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
127 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

No clue why the timings are so inconsistent.

An ordinary heap-memory stack, size 32:

68.9 ms ± 2.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.2 ms ± 701 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.1 ms ± 804 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
64.3 ms ± 2.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
65 ms ± 265 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Heap memory, size 128:

67.6 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.2 ms ± 438 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.7 ms ± 669 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
67.1 ms ± 406 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
64.8 ms ± 3.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

It's still a pretty tiny allocation, all things considered; it does allocate once per query, since the queries are parallelized and we don't want to share pre-allocated memory between threads.

I'll keep the stack-allocated memory and bump up the stack size for now.
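
To illustrate the per-query allocation pattern mentioned above (a rough sketch only, not the actual locate_points implementation): each prange iteration allocates its own small stack, so threads never share traversal state.

import numpy as np
from numba import njit, prange

MAX_DEPTH = 128


@njit(parallel=True, cache=True)
def traverse_per_query(children, n_queries):
    # Each parallel iteration owns a tiny, freshly allocated stack; the "query"
    # here is just a placeholder walk over a (n_nodes, 2) child-index array.
    out = np.empty(n_queries, dtype=np.int64)
    for i in prange(n_queries):
        stack = np.empty(MAX_DEPTH, dtype=np.int64)  # per-query allocation
        stack[0] = 0  # root node
        size = 1
        visited = 0
        while size > 0:
            size -= 1
            node = stack[size]
            visited += 1
            left = children[node, 0]
            right = children[node, 1]
            if left != -1:
                stack[size] = left
                size += 1
            if right != -1:
                stack[size] = right
                size += 1
        out[i] = visited
    return out


children = np.array([[1, 2], [-1, -1], [-1, -1]], dtype=np.int64)
print(traverse_per_query(children, 4))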

Huite closed this as completed in ff34af1 on Jul 17, 2024

Huite commented Jul 17, 2024

Alternatively, we could implement bounds checking; ideally, the stack would then be reallocated at a larger size when it runs out of room. That might have some performance ramifications, although the heap-allocation timings above suggest it would be negligible. Then again, at least for somewhat normal meshes that result in a reasonably balanced tree, the current stack size of 128 should readily support billions of cells.

If someone runs into an issue in the future, they should enable bounds checking for Numba; a resulting index error is then almost certainly due to the fixed stack size.
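
For completeness, bounds checking can be enabled globally via the NUMBA_BOUNDSCHECK environment variable, or per function with the boundscheck decorator option; the snippet below is a generic illustration, not numba_celltree code.

# Globally, before running the script:
#   NUMBA_BOUNDSCHECK=1 python script.py
# Or per function, on your own jitted code:
import numpy as np
from numba import njit


@njit(boundscheck=True)
def push(stack, size, value):
    # With boundscheck=True this raises IndexError instead of silently writing
    # past the end of the array when size == stack.shape[0].
    stack[size] = value
    return size + 1


stack = np.empty(4, dtype=np.int64)
push(stack, 4, 1)  # raises IndexError under bounds checking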
