-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Header-only Refactor of point_in_polygon
#587
Header-only Refactor of point_in_polygon
#587
Conversation
…into refactor/pip_header_only
|
||
if (y_in_bounds && test_point.x < (run / rise) * rise_to_point + a.x) { | ||
point_is_within = not point_is_within; | ||
for (auto point_idx = ring_begin; point_idx < ring_end; point_idx++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to know how much of the speedup came just from eliminating the division vs. all the other changes. Because there is potential for more thread divergence in the new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will include the comparison data of division in the PR description. I will also include a comment stating the assumption we have here is that the cycles it costs on division is greater than that of warp syncs. It's apparently dependent on many factors such as whether user compiles the code with fast-math
, or how fast the hardware synchronizes divergence.
That said, I don't think it's appropriate to mention benchmark results in the comments as it's hardware dependent and will get stale very quickly.
struct PointInPolygonUnsupportedChronoTypesTest : public BaseFixture { | ||
}; | ||
|
||
TYPED_TEST_CASE(PointInPolygonUnsupportedChronoTypesTest, ChronoTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember why we have unsupported types tests specifically for Chrono types but not for other non-numerical types (e.g. nested types). Is it necessary to test these at all? Every type we test increases compile time, so I wonder if we can just test one unsupported type as a proxy for all the others?
Don't necessarily need to decide this in the present PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization!
Co-authored-by: Mark Harris <[email protected]>
@gpucibot merge |
Overview
This PR closes #569. This PR rewrites the core pip test with Eric Haine's "crossings-multiply" algorithm: http://erich.realtimerendering.com/ptinpoly/
And this gives 30%~70% kernel time reduction (please ignore
Status
column):Potential impact introducing divergence and removal of division
EDIT: there's no divergence: #587 (comment)
@zhangjianting and @harrism mentions in different occasions of the changes in
cuspatial/cpp/include/cuspatial/experimental/detail/point_in_polygon.cuh
Lines 85 to 88 in e3a2134
--fast-math
flag or how fast the hardware supports warp convergence. On a Tesla V100 the benchmark between divergence branch and division branch is as follows:which gives visible speedups. This PR adopts the change based on this benchmark.
Minor improvement and follow ups
This PR also introduces benchmarking suites for
point_in_polygon
API.Follow ups to this PR includes: