Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Fix build with latest RAFT branch-21.08 #4012

Merged
merged 10 commits into from
Jun 29, 2021

Conversation

trxcllnt
Copy link
Collaborator

These things are breaking cuML builds with the latest raft branch-21.08. Blocked until rapidsai/raft#279 is resolved.

@trxcllnt trxcllnt requested review from a team as code owners June 24, 2021 01:41
@trxcllnt trxcllnt changed the title Update to latest RAFT changes [FIX] Fix build with latest RAFT branch-21.08 Jun 24, 2021
cpp/src/tsne/distances.cuh Outdated Show resolved Hide resolved
@ajschmidt8 ajschmidt8 added non-breaking Non-breaking change bug Something isn't working labels Jun 24, 2021
Copy link
Contributor

@robertmaynard robertmaynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CMake changes LGTM

@dantegd
Copy link
Member

dantegd commented Jun 24, 2021

@trxcllnt: CI will fail without the (really small) change needed for numpy 1.21: #4008, could you apply the fix here (I'll go ahead and close 4008)

@trxcllnt trxcllnt requested a review from a team as a code owner June 24, 2021 16:58
@trxcllnt
Copy link
Collaborator Author

@dantegd sure thing, pulled

@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jun 24, 2021
@cjnolet
Copy link
Member

cjnolet commented Jun 24, 2021

@trxcllnt in the past we have locked cuml into a specific raft commit hash so that updates to raft don't break cuml. It looks like the pinned tag is being set automatically in cuml now which means changes to raft are now strongly coupled to cuml. Do you think there's a way we could decouple these two and sync them back at release time? This would allow us to make updates to raft and update the cuml side intentionally (without breaking everyone else).

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. I would like to update the sparse bfknn API on the raft side to make the handle const. It's a super simple change and I can file a PR for it if you prefer.

handle.get_cusparse_handle(), handle.get_device_allocator(), stream,
ML::Sparse::DEFAULT_BATCH_SIZE, ML::Sparse::DEFAULT_BATCH_SIZE,
DEFAULT_DISTANCE_METRIC);
const_cast<raft::handle_t &>(handle), ML::Sparse::DEFAULT_BATCH_SIZE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy. I don't know how I missed this one- that should be a const on the raft side for sure.

handle.get_cusparse_handle(), d_alloc, stream,
ML::Sparse::DEFAULT_BATCH_SIZE, ML::Sparse::DEFAULT_BATCH_SIZE,
raft::distance::DistanceType::L2Expanded);
const_cast<raft::handle_t &>(handle), ML::Sparse::DEFAULT_BATCH_SIZE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one as well.

@cjnolet
Copy link
Member

cjnolet commented Jun 24, 2021

@trxcllnt I opened rapidsai/raft#280 to add the const qualifier to the raft handle in the sparse bfknn

@trxcllnt
Copy link
Collaborator Author

@cjnolet cuML and cuGraph using separate commits isn't compatible with using a shared build cache via CMake's FETCHCONTENT_BASE_DIR. We're hoping to use this strategy to speed up local and CI builds soon (and have been using it in node-rapids), since all the duplicated dependencies can be built once in a common location and reused for the next project that declares them.

For example, cuML and cuGraph both declare a dependency on faiss. It's built as a byproduct of cuML, and cuGraph can find it already built in the common build location. The common build dir path can also be stable, enabling new clean-builds of faiss to be sped up via ccache (or sccache).

This relies on the two using the same version of any common dependencies, since FETCHCONTENT_BASE_DIR can only be set globally, not for individual dependencies. It's an all-or-nothing approach unfortunately.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cjnolet
Copy link
Member

cjnolet commented Jun 25, 2021

rerun tests

@dantegd
Copy link
Member

dantegd commented Jun 25, 2021

Argh it seems I missed this instance of the abs NumPy call that requires an update:

X_dist = np.abs(Xc - Xc.T, dtype=datatype)

@trxcllnt
Copy link
Collaborator Author

rerun tests

@cjnolet
Copy link
Member

cjnolet commented Jun 25, 2021

@gpucibot merge

@levsnv
Copy link
Contributor

levsnv commented Jun 26, 2021

same failures:

FAILED cuml/test/dask/test_dbscan.py::test_dbscan_precomputed[int32-500-float32-1]
FAILED cuml/test/dask/test_dbscan.py::test_dbscan_precomputed[int64-500-float32-1]

rerun tests

@cjnolet
Copy link
Member

cjnolet commented Jun 26, 2021

@levsnv i think the CI wasn't running the most recent code because the trace in the test failure is no longer in @trxcllnt's branch. Wondering if there was a hiccup in CI and it didn't properly pre-empt a previous execution from "rerun tests"

@cjnolet
Copy link
Member

cjnolet commented Jun 26, 2021

@levsnv @trxcllnt I see what happened now. The same changes to the precomputed test also need to be made to the Dask test. It should be copy/paste of those 2 updates to the non-dask test.

@ajschmidt8
Copy link
Member

hidden in the log output is the following error. I'm experiencing this locally when trying to build cuml as well.

image

@trxcllnt
Copy link
Collaborator Author

rerun tests

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@ac37465). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.08    #4012   +/-   ##
===============================================
  Coverage                ?   85.44%           
===============================================
  Files                   ?      230           
  Lines                   ?    18088           
  Branches                ?        0           
===============================================
  Hits                    ?    15455           
  Misses                  ?     2633           
  Partials                ?        0           
Flag Coverage Δ
dask 48.04% <0.00%> (?)
non-dask 77.79% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac37465...1a760ed. Read the comment docs.

@dantegd
Copy link
Member

dantegd commented Jun 29, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f71d369 into rapidsai:branch-21.08 Jun 29, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
These things are breaking cuML builds with the latest raft `branch-21.08`. Blocked until rapidsai/raft#279 is resolved.

Authors:
  - Paul Taylor (https://github.com/trxcllnt)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Dante Gama Dessavre (https://github.com/dantegd)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CMake CUDA/C++ Cython / Python Cython or Python issue non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants