-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Refactor cuspatial::point-in-polygon() to header-only #569
Labels
Milestone
Comments
rapids-bot bot
pushed a commit
that referenced
this issue
Jul 27, 2022
## Overview This PR closes #569. This PR rewrites the core pip test with Eric Haine's "crossings-multiply" algorithm: http://erich.realtimerendering.com/ptinpoly/ And this gives 30%~70% kernel time reduction (please ignore `Status` column): ``` ['before_optim.json', 'optim_2.json'] # point_in_polygon_benchmark ## [0] Tesla V100-SXM2-32GB | CoordsType | NumTestPoints | NumSidesPerRing | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |--------------|-----------------|-------------------|------------|-------------|------------|-------------|---------------|---------|----------| | F32 | 1000 | 4 | 63.706 us | 6.09% | 36.483 us | 6.33% | -27.223 us | -42.73% | FAIL | | F32 | 100000 | 4 | 113.862 us | 0.90% | 55.468 us | 1.75% | -58.394 us | -51.29% | FAIL | | F32 | 10000000 | 4 | 9.167 ms | 0.02% | 3.785 ms | 0.14% | -5382.554 us | -58.71% | FAIL | | F32 | 1000 | 10 | 109.911 us | 1.04% | 55.120 us | 1.82% | -54.791 us | -49.85% | FAIL | | F32 | 100000 | 10 | 190.151 us | 0.57% | 81.772 us | 1.14% | -108.380 us | -57.00% | FAIL | | F32 | 10000000 | 10 | 16.934 ms | 0.17% | 5.987 ms | 0.06% | -10947.199 us | -64.65% | FAIL | | F32 | 1000 | 100 | 813.854 us | 0.14% | 314.469 us | 0.30% | -499.384 us | -61.36% | FAIL | | F32 | 100000 | 100 | 1.292 ms | 0.18% | 343.747 us | 0.37% | -948.724 us | -73.40% | FAIL | | F32 | 10000000 | 100 | 124.471 ms | 0.09% | 29.318 ms | 0.06% | -95152.593 us | -76.45% | FAIL | | F64 | 1000 | 4 | 73.915 us | 1.40% | 46.365 us | 2.23% | -27.550 us | -37.27% | FAIL | | F64 | 100000 | 4 | 149.814 us | 2.77% | 74.001 us | 1.42% | -75.813 us | -50.60% | FAIL | | F64 | 10000000 | 4 | 10.541 ms | 0.08% | 5.186 ms | 0.05% | -5354.802 us | -50.80% | FAIL | | F64 | 1000 | 10 | 133.100 us | 0.65% | 76.647 us | 1.25% | -56.453 us | -42.41% | FAIL | | F64 | 100000 | 10 | 277.337 us | 3.51% | 117.104 us | 0.89% | -160.233 us | -57.78% | FAIL | | F64 | 10000000 | 10 | 19.648 ms | 0.06% | 9.214 ms | 0.05% | -10434.298 us | -53.11% | FAIL | | F64 | 1000 | 100 | 1.017 ms | 0.19% | 485.950 us | 0.37% | -531.440 us | -52.24% | FAIL | | F64 | 100000 | 100 | 1.992 ms | 0.63% | 517.869 us | 0.25% | -1474.597 us | -74.01% | FAIL | | F64 | 10000000 | 100 | 137.110 ms | 0.08% | 40.617 ms | 0.07% | -96493.030 us | -70.38% | FAIL | # Summary - Total Matches: 18 - Pass (diff <= min_noise): 0 - Unknown (infinite noise): 0 - Failure (diff > min_noise): 18 ``` ## Potential impact introducing divergence and removal of division **EDIT:** there's no divergence: #587 (comment) @zhangjianting and @harrism mentions in different occasions of the changes in https://github.com/rapidsai/cuspatial/blob/e3a21341054ea5bf92896267800aed31a3493ee0/cpp/include/cuspatial/experimental/detail/point_in_polygon.cuh#L85-L88 may cause unwanted overhead due to divergence. The assumption to do so is that the cycles cost by division is greater than that of warp convergence. This is apparently dependent on many factors, such as compiling with/without `--fast-math` flag or how fast the hardware supports warp convergence. On a Tesla V100 the benchmark between divergence branch and division branch is as follows: ``` ['optim_1.json', 'optim_2.json'] # point_in_polygon_benchmark ## [0] Tesla V100-SXM2-32GB | CoordsType | NumTestPoints | NumSidesPerRing | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status | |--------------|-----------------|-------------------|------------|-------------|------------|-------------|--------------|---------|----------| | F32 | 1000 | 4 | 37.351 us | 6.39% | 36.483 us | 6.33% | -0.868 us | -2.32% | PASS | | F32 | 100000 | 4 | 56.265 us | 1.63% | 55.468 us | 1.75% | -0.797 us | -1.42% | PASS | | F32 | 10000000 | 4 | 3.858 ms | 0.02% | 3.785 ms | 0.14% | -72.843 us | -1.89% | FAIL | | F32 | 1000 | 10 | 56.490 us | 2.04% | 55.120 us | 1.82% | -1.370 us | -2.43% | FAIL | | F32 | 100000 | 10 | 85.581 us | 1.08% | 81.772 us | 1.14% | -3.810 us | -4.45% | FAIL | | F32 | 10000000 | 10 | 6.327 ms | 0.11% | 5.987 ms | 0.06% | -340.206 us | -5.38% | FAIL | | F32 | 1000 | 100 | 314.513 us | 0.41% | 314.469 us | 0.30% | -0.044 us | -0.01% | PASS | | F32 | 100000 | 100 | 347.750 us | 0.49% | 343.747 us | 0.37% | -4.003 us | -1.15% | FAIL | | F32 | 10000000 | 100 | 29.822 ms | 0.07% | 29.318 ms | 0.06% | -504.290 us | -1.69% | FAIL | | F64 | 1000 | 4 | 49.142 us | 1.83% | 46.365 us | 2.23% | -2.777 us | -5.65% | FAIL | | F64 | 100000 | 4 | 78.167 us | 1.23% | 74.001 us | 1.42% | -4.166 us | -5.33% | FAIL | | F64 | 10000000 | 4 | 5.641 ms | 0.05% | 5.186 ms | 0.05% | -454.642 us | -8.06% | FAIL | | F64 | 1000 | 10 | 80.904 us | 1.13% | 76.647 us | 1.25% | -4.257 us | -5.26% | FAIL | | F64 | 100000 | 10 | 125.702 us | 0.65% | 117.104 us | 0.89% | -8.598 us | -6.84% | FAIL | | F64 | 10000000 | 10 | 10.074 ms | 0.03% | 9.214 ms | 0.05% | -860.423 us | -8.54% | FAIL | | F64 | 1000 | 100 | 487.959 us | 0.40% | 485.950 us | 0.37% | -2.009 us | -0.41% | FAIL | | F64 | 100000 | 100 | 528.070 us | 0.35% | 517.869 us | 0.25% | -10.201 us | -1.93% | FAIL | | F64 | 10000000 | 100 | 44.663 ms | 0.18% | 40.617 ms | 0.07% | -4045.961 us | -9.06% | FAIL | ``` which gives visible speedups. This PR adopts the change based on this benchmark. ## Minor improvement and follow ups This PR also introduces benchmarking suites for `point_in_polygon` API. Follow ups to this PR includes: - Introduce a set of functions to do input validation. This can help alleviate API misuse of #492. - Improve test cases of this function. The function supports polygons of any sides, but test cases contains largely only quads. Authors: - Michael Wang (https://github.com/isVoid) Approvers: - H. Thomson Comer (https://github.com/thomcom) - Mark Harris (https://github.com/harrism) URL: #587
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Create header-only APIs for the following and refactor existing APIs / tests on top of the header-only API:
See https://github.com/rapidsai/cuspatial/blob/branch-22.08/cpp/include/cuspatial/point_in_polygon.hpp
The text was updated successfully, but these errors were encountered: