-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better slicing via union_offsets
#776
Better slicing via union_offsets
#776
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this is the solution to get the right offsets and coordinate arrays. Do you think there's a chance for optimization if the slices are simple contiguous range that can be represented by offset
and size
, the resulting child column slices can also be zero-copy?
@pytest.mark.parametrize("slice_index", [0, 1, 2, 3, 4]) | ||
def test_multipoint_multilinestring_sliced_many( | ||
multipoint_generator, multilinestring_generator, slice_twenty, slice_index | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you make changes according to my suggestion of fixture above, you don't need slice_index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reflects a pattern that I'm trying to build for some other test cases as well. There are a few tests where I use premade slicing patterns, then iterate over them to get a fair coverage of ways a column can be sliced. In those tests I just rewrite the same slice group a few different times inside of the test file, because you can't parameterize a feature in pytest. I realized while writing this that I could put the slices in conftest.py
as a fixture, then simply iterate over the fixture using a @pytest.mark.parametrize
over a range, as here. The goal is not fundamentally to iterate the segments of 0,4...4,8...
but to iterate over any predefined slicing range. Given our limited resources I didn't opt to refactor those tests yet, but I'd like to leave this, unless you know of a better way to iterate over a fixture in multiple tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is not fundamentally to iterate the segments of 0,4...4,8... but to iterate over any predefined slicing range.
If you need other slicing range, can't you just create other fixtures?
rerun tests |
@gpucibot merge |
Description
Closes #771
This PR modifies the production of
geometry_offset
,part_offset
, andring_offset
by sampling the existing values in aGeoSeries
before returning their various offsets. This has the effect of usingcudf.ListSeries
to re-pack any features into a new, denseGeoColumn
, then returning the offsets based on it.Previously,
GeoSeries
that had been modified by slicing would have the appearance of the sliced elements, but when_offset
buffers were used they would return the full original offset buffer that the slicedGeoSeries
had originated from. This was a problem because it made slicing useless for our algorithms.I also modify the
core/spatial/distance.py
andcore/spatial/nearest_points.py
files to useas_column(linestrings.lines.geometry_offset)
instead oflinestrings.lines.geometry_offset._column
because there doesn't appear to be a reason to use acudf.Series
to wrap the offset buffers. They are private methods essentially, don't need indexes, and will eventually be factored out so that they're hidden from the user.I wrote a new test file
test_geocolumn_accessor.py
to exercise the new {geometry_buffer
...} accessors for all geometry types.Finally I added tests for a base case, a more complicated case, and a case with noncontiugous slices to the inputs of
test_point_linestring_distance.py
, validating that the changes have exactly the effect we need.Checklist