Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Refactor cuspatial::point-in-polygon() to header-only #569

Closed
harrism opened this issue Jun 15, 2022 · 0 comments · Fixed by #587
Closed

[FEA] Refactor cuspatial::point-in-polygon() to header-only #569

harrism opened this issue Jun 15, 2022 · 0 comments · Fixed by #587
Assignees
Labels
feature request New feature or request libcuspatial Relates to the cuSpatial C++ library

Comments

@harrism
Copy link
Member

harrism commented Jun 15, 2022

Create header-only APIs for the following and refactor existing APIs / tests on top of the header-only API:

  • cuspatial::point_in_polygon()

See https://github.com/rapidsai/cuspatial/blob/branch-22.08/cpp/include/cuspatial/point_in_polygon.hpp

@harrism harrism added feature request New feature or request Needs Triage Need team to review and classify c++ labels Jun 15, 2022
@harrism harrism added this to the header-only C++ API milestone Jun 15, 2022
@isVoid isVoid self-assigned this Jul 6, 2022
@rapids-bot rapids-bot bot closed this as completed in #587 Jul 27, 2022
rapids-bot bot pushed a commit that referenced this issue Jul 27, 2022
## Overview

This PR closes #569. This PR rewrites the core pip test with Eric Haine's "crossings-multiply" algorithm: http://erich.realtimerendering.com/ptinpoly/

And this gives 30%~70% kernel time reduction (please ignore `Status` column):

```
['before_optim.json', 'optim_2.json']
# point_in_polygon_benchmark

## [0] Tesla V100-SXM2-32GB

|  CoordsType  |  NumTestPoints  |  NumSidesPerRing  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |          Diff |   %Diff |  Status  |
|--------------|-----------------|-------------------|------------|-------------|------------|-------------|---------------|---------|----------|
|     F32      |      1000       |         4         |  63.706 us |       6.09% |  36.483 us |       6.33% |    -27.223 us | -42.73% |   FAIL   |
|     F32      |     100000      |         4         | 113.862 us |       0.90% |  55.468 us |       1.75% |    -58.394 us | -51.29% |   FAIL   |
|     F32      |    10000000     |         4         |   9.167 ms |       0.02% |   3.785 ms |       0.14% |  -5382.554 us | -58.71% |   FAIL   |
|     F32      |      1000       |        10         | 109.911 us |       1.04% |  55.120 us |       1.82% |    -54.791 us | -49.85% |   FAIL   |
|     F32      |     100000      |        10         | 190.151 us |       0.57% |  81.772 us |       1.14% |   -108.380 us | -57.00% |   FAIL   |
|     F32      |    10000000     |        10         |  16.934 ms |       0.17% |   5.987 ms |       0.06% | -10947.199 us | -64.65% |   FAIL   |
|     F32      |      1000       |        100        | 813.854 us |       0.14% | 314.469 us |       0.30% |   -499.384 us | -61.36% |   FAIL   |
|     F32      |     100000      |        100        |   1.292 ms |       0.18% | 343.747 us |       0.37% |   -948.724 us | -73.40% |   FAIL   |
|     F32      |    10000000     |        100        | 124.471 ms |       0.09% |  29.318 ms |       0.06% | -95152.593 us | -76.45% |   FAIL   |
|     F64      |      1000       |         4         |  73.915 us |       1.40% |  46.365 us |       2.23% |    -27.550 us | -37.27% |   FAIL   |
|     F64      |     100000      |         4         | 149.814 us |       2.77% |  74.001 us |       1.42% |    -75.813 us | -50.60% |   FAIL   |
|     F64      |    10000000     |         4         |  10.541 ms |       0.08% |   5.186 ms |       0.05% |  -5354.802 us | -50.80% |   FAIL   |
|     F64      |      1000       |        10         | 133.100 us |       0.65% |  76.647 us |       1.25% |    -56.453 us | -42.41% |   FAIL   |
|     F64      |     100000      |        10         | 277.337 us |       3.51% | 117.104 us |       0.89% |   -160.233 us | -57.78% |   FAIL   |
|     F64      |    10000000     |        10         |  19.648 ms |       0.06% |   9.214 ms |       0.05% | -10434.298 us | -53.11% |   FAIL   |
|     F64      |      1000       |        100        |   1.017 ms |       0.19% | 485.950 us |       0.37% |   -531.440 us | -52.24% |   FAIL   |
|     F64      |     100000      |        100        |   1.992 ms |       0.63% | 517.869 us |       0.25% |  -1474.597 us | -74.01% |   FAIL   |
|     F64      |    10000000     |        100        | 137.110 ms |       0.08% |  40.617 ms |       0.07% | -96493.030 us | -70.38% |   FAIL   |

# Summary

- Total Matches: 18
  - Pass    (diff <= min_noise): 0
  - Unknown (infinite noise):    0
  - Failure (diff > min_noise):  18
```

## Potential impact introducing divergence and removal of division
**EDIT:** there's no divergence: #587 (comment)


@zhangjianting and @harrism mentions in different occasions of the changes in https://github.com/rapidsai/cuspatial/blob/e3a21341054ea5bf92896267800aed31a3493ee0/cpp/include/cuspatial/experimental/detail/point_in_polygon.cuh#L85-L88 may cause unwanted overhead due to divergence. The assumption to do so is that the cycles cost by division is greater than that of warp convergence. This is apparently dependent on many factors, such as compiling with/without `--fast-math` flag or how fast the hardware supports warp convergence. On a Tesla V100 the benchmark between divergence branch and division branch is as follows:

```
['optim_1.json', 'optim_2.json']
# point_in_polygon_benchmark

## [0] Tesla V100-SXM2-32GB

|  CoordsType  |  NumTestPoints  |  NumSidesPerRing  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|--------------|-----------------|-------------------|------------|-------------|------------|-------------|--------------|---------|----------|
|     F32      |      1000       |         4         |  37.351 us |       6.39% |  36.483 us |       6.33% |    -0.868 us |  -2.32% |   PASS   |
|     F32      |     100000      |         4         |  56.265 us |       1.63% |  55.468 us |       1.75% |    -0.797 us |  -1.42% |   PASS   |
|     F32      |    10000000     |         4         |   3.858 ms |       0.02% |   3.785 ms |       0.14% |   -72.843 us |  -1.89% |   FAIL   |
|     F32      |      1000       |        10         |  56.490 us |       2.04% |  55.120 us |       1.82% |    -1.370 us |  -2.43% |   FAIL   |
|     F32      |     100000      |        10         |  85.581 us |       1.08% |  81.772 us |       1.14% |    -3.810 us |  -4.45% |   FAIL   |
|     F32      |    10000000     |        10         |   6.327 ms |       0.11% |   5.987 ms |       0.06% |  -340.206 us |  -5.38% |   FAIL   |
|     F32      |      1000       |        100        | 314.513 us |       0.41% | 314.469 us |       0.30% |    -0.044 us |  -0.01% |   PASS   |
|     F32      |     100000      |        100        | 347.750 us |       0.49% | 343.747 us |       0.37% |    -4.003 us |  -1.15% |   FAIL   |
|     F32      |    10000000     |        100        |  29.822 ms |       0.07% |  29.318 ms |       0.06% |  -504.290 us |  -1.69% |   FAIL   |
|     F64      |      1000       |         4         |  49.142 us |       1.83% |  46.365 us |       2.23% |    -2.777 us |  -5.65% |   FAIL   |
|     F64      |     100000      |         4         |  78.167 us |       1.23% |  74.001 us |       1.42% |    -4.166 us |  -5.33% |   FAIL   |
|     F64      |    10000000     |         4         |   5.641 ms |       0.05% |   5.186 ms |       0.05% |  -454.642 us |  -8.06% |   FAIL   |
|     F64      |      1000       |        10         |  80.904 us |       1.13% |  76.647 us |       1.25% |    -4.257 us |  -5.26% |   FAIL   |
|     F64      |     100000      |        10         | 125.702 us |       0.65% | 117.104 us |       0.89% |    -8.598 us |  -6.84% |   FAIL   |
|     F64      |    10000000     |        10         |  10.074 ms |       0.03% |   9.214 ms |       0.05% |  -860.423 us |  -8.54% |   FAIL   |
|     F64      |      1000       |        100        | 487.959 us |       0.40% | 485.950 us |       0.37% |    -2.009 us |  -0.41% |   FAIL   |
|     F64      |     100000      |        100        | 528.070 us |       0.35% | 517.869 us |       0.25% |   -10.201 us |  -1.93% |   FAIL   |
|     F64      |    10000000     |        100        |  44.663 ms |       0.18% |  40.617 ms |       0.07% | -4045.961 us |  -9.06% |   FAIL   |
```
which gives visible speedups. This PR adopts the change based on this benchmark.

## Minor improvement and follow ups

This PR also introduces benchmarking suites for `point_in_polygon` API.

Follow ups to this PR includes:
- Introduce a set of functions to do input validation. This can help alleviate API misuse of #492.
- Improve test cases of this function. The function supports polygons of any sides, but test cases contains largely only quads.

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - H. Thomson Comer (https://github.com/thomcom)
  - Mark Harris (https://github.com/harrism)

URL: #587
@rapids-bot rapids-bot bot moved this to Done in cuSpatial Jul 27, 2022
@harrism harrism added libcuspatial Relates to the cuSpatial C++ library and removed c++ labels Nov 16, 2022
@jarmak-nv jarmak-nv removed the Needs Triage Need team to review and classify label Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcuspatial Relates to the cuSpatial C++ library
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants