Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix building cugraph with CCCL main #4404

Merged
merged 43 commits into from
May 29, 2024

Conversation

trxcllnt
Copy link
Collaborator

@trxcllnt trxcllnt commented May 8, 2024

Similar to rapidsai/cudf#15552, we are testing building RAPIDS with CCCL's main branch to get ahead of any breaking changes.

@trxcllnt trxcllnt requested a review from a team as a code owner May 8, 2024 21:53
@trxcllnt trxcllnt added improvement Improvement / enhancement to an existing function DO NOT MERGE Hold off on merging; see PR for details non-breaking Non-breaking change labels May 8, 2024
@trxcllnt trxcllnt requested a review from a team as a code owner May 8, 2024 22:07
@github-actions github-actions bot added the CMake label May 8, 2024
@trxcllnt
Copy link
Collaborator Author

trxcllnt commented May 9, 2024

The new kv_store_t constructor is necessary because there are code paths that attempt to construct a thrust::tuple from a scalar (invalid_vertex_id<vertex_t>::value here).

While this is possible in CCCL 2.2.0's version of thrust::tuple, I checked with @jrhemstad it was definitely never intended to work. CCCL 2.4+ makes thrust::tuple an alias of cuda::std::tuple, so this behavior is now gone, and cuGraph needs to be updated.

I commented out the prims/mg_extract_transform_e.cu test for now, because the compiler resisted my attempts to rewrite this functor to be compatible with the new CCCL behavior.

The problem case is when the key_t template parameter is thrust::tuple<vertex_t, int32_t>, but the functor is called with the invalid value scalar -1.

Alternatively, I attempted to specialize the invalid_idx struct to return a thrust::tuple if the vertex_t template parameter is a tuple, but that seemed to break in other ways.

cc: @BradReesWork if you have any ideas or know the right person to help fix this.

@trxcllnt trxcllnt requested a review from a team as a code owner May 9, 2024 20:31
@trxcllnt
Copy link
Collaborator Author

trxcllnt commented May 9, 2024

The nvcc error tells us half of the issue:

mg_extract_transform_e.cu(210): error: no instance of function template "cugraph::extract_transform_e" matches the argument list
              argument types are: (raft::handle_t, cugraph::graph_view_t<int32_t, int32_t, false, true, void>, cugraph::detail::edge_major_property_view_t<int32_t, const int32_t *, int32_t>, cugraph::detail::edge_minor_property_view_t<int32_t, const int32_t *, int32_t>, cugraph::edge_dummy_property_view_t, e_op_t<thrust::tuple<int32_t, int32_t>, int32_t, int32_t, int32_t>)
          cugraph::extract_transform_e(*handle_,
          ^

The error from clang tells us the other half:

extract_transform_e.cuh(91, 1): Candidate template ignored: substitution failure [with
  GraphViewType = graph_view_t<int, int, false, true>,
  EdgeSrcValueInputWrapper = edge_major_property_view_t<int, const_value_iterator, int>,
  EdgeDstValueInputWrapper = edge_minor_property_view_t<int, const_iterator, int>,
  EdgeValueInputWrapper = edge_dummy_property_view_t,
  EdgeOp = e_op_t<key_t, int, result_t, int>
]: implicit instantiation of undefined template 'cugraph::detail::edge_op_result_type<int, int, int, int, thrust::nullopt_t, e_op_t<int, int, int, int>>'

Combining info from both errors, the issue is that when extract_transform_e instantiates detail::edge_op_result_type, it tries to get the return type of invoking e_opt_t with int, int, int, int.

But e_opt_t was declared with template parameters thrust::tuple<int32_t, int32_t>, int32_t, int32_t, int32_t, and since thrust tuples can no longer be default constructed from a scalar, the operator()() overload cannot be invoked with int as the first argument.

@ChuckHastings
Copy link
Collaborator

The new kv_store_t constructor is necessary because there are code paths that attempt to construct a thrust::tuple from a scalar (invalid_vertex_id<vertex_t>::value here).

While this is possible in CCCL 2.2.0's version of thrust::tuple, I checked with @jrhemstad it was definitely never intended to work. CCCL 2.4+ makes thrust::tuple an alias of cuda::std::tuple, so this behavior is now gone, and cuGraph needs to be updated.

I commented out the prims/mg_extract_transform_e.cu test for now, because the compiler resisted my attempts to rewrite this functor to be compatible with the new CCCL behavior.

The problem case is when the key_t template parameter is thrust::tuple<vertex_t, int32_t>, but the functor is called with the invalid value scalar -1.

Alternatively, I attempted to specialize the invalid_idx struct to return a thrust::tuple if the vertex_t template parameter is a tuple, but that seemed to break in other ways.

cc: @BradReesWork if you have any ideas or know the right person to help fix this.

@seunghwak is our expert in this area

@trxcllnt trxcllnt requested a review from seunghwak May 13, 2024 18:23
.github/workflows/pr.yaml Outdated Show resolved Hide resolved
@trxcllnt
Copy link
Collaborator Author

It looks like the main build is working, but the wheel builds are blocked by #4436?

@github-actions github-actions bot added the ci label May 24, 2024
ci/build_cpp.sh Outdated Show resolved Hide resolved
ci/build_wheel.sh Outdated Show resolved Hide resolved
@BradReesWork BradReesWork added this to the 24.06 milestone May 28, 2024
@github-actions github-actions bot removed the ci label May 28, 2024
Copy link
Contributor

@seunghwak seunghwak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@trxcllnt
Copy link
Collaborator Author

/merge

@rapids-bot rapids-bot bot merged commit 4c797bf into rapidsai:branch-24.06 May 29, 2024
132 checks passed
trxcllnt added a commit to nv-rliu/cugraph that referenced this pull request Jun 3, 2024
rapids-bot bot pushed a commit to rapidsai/rapids-cmake that referenced this pull request Jun 7, 2024
Updates CCCL to v2.5.0 and a more recent cuCollections commit.

Using cuco at [6923b3](NVIDIA/cuCollections@6923b3b) because it was before NVIDIA/cuCollections#479, which is a breaking change for libcudf.

CCCL PR:
* NVIDIA/cccl#1667

RAPIDS PRs:
* rapidsai/cudf#15552
* rapidsai/cuml#5886
* rapidsai/cugraph#4404
* rapidsai/cuspatial#1382

Authors:
  - Paul Taylor (https://github.com/trxcllnt)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Bradley Dice (https://github.com/bdice)

URL: #607
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake cuGraph improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants