[DISCUSSION] Remove dependency on libcudf? #474

harrism · 2022-01-11T02:15:18Z

We made a decision in the past to base the libcuspatial API on libcudf. Specifically, cuspatial:: APIs take cudf::column and cudf::table inputs. Internally, these APIs use cudf::type_dispatcher to dispatch on the underlying column types at run time. As in libcudf, this may make it easier to provide multiple language support (Python, JAVA). Type erasure also avoids having to template all public APIs on data type.

libcudf supports many types that cuSpatial algorithms currently don't need. Currently, libcuspatial only uses floating-point and integer types for most APIs. Only one module uses anything else, and that is the trajectory module which uses cudf::timestamp. Also, the concept of columns and tables may not be familiar to many users of cuSpatial, who may not be interested in using cuDF. We have had at least one potential user request a simpler API with just vectors of floats/doubles/ints. Also, some users may need Windows support, and libcudf currently doesn't support Windows and supporting it would be a huge lift for libcudf (whereas libcuspatial is currently a much smaller library).

Note also that currently libcudf limits column size to 2B elements (cudf::size_type is a 32-bit signed integer). The intent is that large columns would be partitioned at a higher level (e.g. Dask or Spark). But depending on the application, this limit may not be palatable to users of libcuspatial.

The purpose of this issue is to discuss the costs and benefits of converting libcuspatial to remove the dependency on libcudf.

Risks

Eliminating the tight coupling of libcuspatial to libcudf may make integration of the two libraries in workflows more difficult. Currently the cuSpatial Python API also relies heavily on cuDF Series and Dataframe types. It's possible to keep this tight integration at the Python level while removing the dependency on libcudf at the C++ level. It's even possible to provide a second Python API that does not have tight integration with cuDF. But I don't yet know what is most useful at the Python level, or what the costs are. We should discuss this.

Possible Approaches

I see three possible approaches.

Convert libcuspatial to a fixed-type library
Convert libcuspatial to a type-templated header library.
Implement custom type-erased vector types and type dispatcher for libcuspatial.

Without much analysis, Approach 3 seems like overkill to me. Specifically, current column parameters that represent indices always dispatch to integer types. Parameters that represent timestamps always dispatch timestamp types. And coordinate / position columns always dispatch floating point numbers (it's imaginable, but perhaps unlikely that users might like to dispatch fixed-point or integer coordinates, but I think that is a stretch). So my gut feeling is that Approaches 1 and 2 are preferred.

Currently any scalar parameters (e.g. the position of the origin) are provided using double precision. And (I believe) indices use 32-bit integers. So Approach 1 would be to use vectors of double for all current floating-point column inputs / outputs, and vectors of int32_t for all current index / integer column inputs. (And see below for timestamps). The trouble with this is that double precision is at least 2x as expensive as single precision, in terms of both storage and computation cost. On some GPUs, the computation cost of FP64 is even greater relative to FP32. Moreover, there are other floating point types (e.g. FP16) which may be desirable in some use cases.

Approach 2 adds more flexibility, by templating all APIs on the type of the inputs/outputs. This would enable supporting whatever floating-point precision is needed, and also using a larger index space via int64_t indices. The tradeoff is that the library would then be a header library, which has both benefits and costs.

Timestamps

For Approach 1 we would still need a timestamp type for libcuspatial, but libcudf's own timestamp is implemented in terms of std::chrono types, e.g.

template <class Duration>
using time_point = cuda::std::chrono::sys_time<Duration>;

template <class Duration>
using timestamp = time_point<Duration>;

libcu++ provides std::cuda::chrono which can be used for host-device support of these types.

Windows support

We have not attempted to build cuSpatial on Windows to date. cuSpatial depends on RMM, which currently doesn't support Windows. However, the only known reason that RMM doesn't support Windows is its use of a Myers singleton for the current device resource, which uses a function-local static. Apparently this is not guaranteed to be shared across DLLs on Windows since RMM is a header-only library. This could potentially be fixed by having a compiled binary component of RMM on Windows systems.

There may be other obstacles to Windows support, but eliminating the libcudf dependency is likely to greatly reduce the number and size of obstacles.

PS

I take full responsibility for leaning heavily on the original decision to depend on libcudf. A big part of the reason was to avoid duplicating functionality. But with new information I thought it important to discuss it again.

The text was updated successfully, but these errors were encountered:

cwharris · 2022-01-11T02:35:00Z

I'm for the removal of libcudf as a dependency of cuspatial, especially if cuspatial and cudf can be used side-by-side. cudf can track features, and cuspatial can number crunch. If we can perform a spatial search in cuspatial, and use the resulting indices to search a subset of features in a libcudf table, that might be the best of both worlds.

KirkDybvik · 2022-01-11T12:40:30Z

I would be interested in a reduced dependency cuspatial, and also in the ability to use it on Windows.

jrhemstad · 2022-01-11T14:38:44Z

The tradeoff is that the library would then be a header library, which has both benefits and costs.

I think there's an option 2.5 where the cuSpatial C++ library is a header-only template library (whether it's templated for iterators, ranges, span<T>, or mdspan<T> is a decision that would need to be made), but then the cuSpatial Python library can instantiate the versions of the template it needs and dispatch accordingly. In essence, it would move the type dispatch to the Python layer.

I don't think this would be too onerous as the number of instantiations would likely be much smaller than libcudf and there wouldn't need to be any specialization of the different instantiations like in libcudf.

Likewise, this would likely be appealing to anyone using cuSpatial purely in C++.

harrism · 2022-01-11T20:33:11Z

think there's an option 2.5 where the cuSpatial C++ library is a header-only template library, but then the cuSpatial Python library can instantiate the versions of the template it needs and dispatch accordingly.

This is exactly what I was thinking of as Approach 2. Any run-time-typed language would need to dispatch to the compile-time-typed C++ library.

jrhemstad · 2022-01-11T21:05:24Z

think there's an option 2.5 where the cuSpatial C++ library is a header-only template library, but then the cuSpatial Python library can instantiate the versions of the template it needs and dispatch accordingly.

This is exactly what I was thinking of as Approach 2. Any run-time-typed language would need to dispatch to the compile-time-typed C++ library.

That is definitely the approach I like the best.

harrism · 2022-01-12T00:27:01Z

@KirkDybvik can you provide any more information on your use case, and if possible any details of application or software in which you are (or want to) using cuSpatial? Also any organization / company details you are willing to share. Feel free to contact me privately (mharris at my company name dot com) or on the RAPIDS-GoAI Slack Channel.

KirkDybvik · 2022-01-12T03:31:30Z

@harrism - I will send you a private note with additional details. Overview is that we have some Windows applications that perform geospatial operations, and have a variety of GPUs available (Quadro and GeForce) that could be used. We also have some server side geospatial and image processing that could also be accelerated. Since these are currently running on Windows OS, I would like to see cuSpatial supported there as well. I would be interested in the C++ library only for these apps.

github-actions · 2022-02-11T05:03:09Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Following rapidsai/cudf#10589, this PR removes the dependency to `cudf::CUDA_TRY` and introduces `CUSPATIAL_CUDA_TRY`. Contributes to #474 Authors: - Michael Wang (https://github.com/isVoid) Approvers: - Mark Harris (https://github.com/harrism) - H. Thomson Comer (https://github.com/thomcom) URL: #516

harrism · 2022-05-12T00:20:21Z

We are proceeding with adding a lower-level header-only API that is independent of libcudf. The existing cuDF-based API is a layer on top. So far we have merged one PR and have one under review and I have one that is in progress but not open yet. I created this milestone to track.

github-actions · 2022-06-11T02:38:33Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-09-09T02:57:08Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

harrism · 2023-03-07T21:37:04Z

The decision on this was to create the header-only API which contains all algorithm implementations and base the column-based API on that. The column-based API simply handles type dispatching and type-based error checking. Closing this issue now.

This is tracked by the header-only refactoring milestone

harrism added question Further information is requested cuDF Python Related to Python code libcuspatial Relates to the cuSpatial C++ library labels Jan 11, 2022

github-actions bot added the inactive-30d label Feb 11, 2022

harrism mentioned this issue Feb 15, 2022

[FEA] Make return value order of quadtree_point_to_nearest_polyline match quadtree_point_in_polygon #491

Open

harrism added this to cuSpatial Mar 17, 2022

harrism moved this to In Progress in cuSpatial Apr 4, 2022

isVoid mentioned this issue Apr 8, 2022

Replace CUDA_TRY with CUSPATIAL_CUDA_TRY #516

Merged

harrism added this to the header-only C++ API milestone May 12, 2022

github-actions bot removed the inactive-30d label May 12, 2022

github-actions bot added the inactive-30d label Jun 11, 2022

github-actions bot added the inactive-90d label Sep 9, 2022

jarmak-nv removed inactive-30d labels Nov 15, 2022

harrism self-assigned this Jan 23, 2023

harrism closed this as completed Mar 7, 2023

github-project-automation bot moved this from In Progress to Done in cuSpatial Mar 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DISCUSSION] Remove dependency on libcudf? #474

[DISCUSSION] Remove dependency on libcudf? #474

harrism commented Jan 11, 2022 •

edited

Loading

cwharris commented Jan 11, 2022

KirkDybvik commented Jan 11, 2022

jrhemstad commented Jan 11, 2022 •

edited

Loading

harrism commented Jan 11, 2022

jrhemstad commented Jan 11, 2022

harrism commented Jan 12, 2022 •

edited

Loading

KirkDybvik commented Jan 12, 2022

github-actions bot commented Feb 11, 2022

harrism commented May 12, 2022

github-actions bot commented Jun 11, 2022

github-actions bot commented Sep 9, 2022

harrism commented Mar 7, 2023 •

edited

Loading

[DISCUSSION] Remove dependency on libcudf? #474

[DISCUSSION] Remove dependency on libcudf? #474

Comments

harrism commented Jan 11, 2022 • edited Loading

Risks

Possible Approaches

Timestamps

Windows support

PS

cwharris commented Jan 11, 2022

KirkDybvik commented Jan 11, 2022

jrhemstad commented Jan 11, 2022 • edited Loading

harrism commented Jan 11, 2022

jrhemstad commented Jan 11, 2022

harrism commented Jan 12, 2022 • edited Loading

KirkDybvik commented Jan 12, 2022

github-actions bot commented Feb 11, 2022

harrism commented May 12, 2022

github-actions bot commented Jun 11, 2022

github-actions bot commented Sep 9, 2022

harrism commented Mar 7, 2023 • edited Loading

harrism commented Jan 11, 2022 •

edited

Loading

jrhemstad commented Jan 11, 2022 •

edited

Loading

harrism commented Jan 12, 2022 •

edited

Loading

harrism commented Mar 7, 2023 •

edited

Loading