Implement predict_per_tree() in FIL #5303

Merged: 19 commits merged into rapidsai:branch-23.04 from the predict_per_tree branch, Mar 30, 2023

Conversation

@hcho3 (Contributor) commented Mar 27, 2023

No description provided.

@github-actions bot added the CUDA/C++ and Cython / Python labels Mar 27, 2023
@hcho3 added the non-breaking and improvement labels Mar 27, 2023
@hcho3 marked this pull request as ready for review March 27, 2023 22:35
@hcho3 requested review from a team as code owners March 27, 2023 22:35
@hcho3 changed the title from "Implement predict_per_tree()" to "Implement predict_per_tree() in FIL" Mar 27, 2023
@wphicks (Contributor) left a comment

Looking great! I left some inline feedback on details, but overall this is in very good shape.

The one larger thing I'd like to see us clean up is how we handle the shared-memory-to-global-memory fallback. Rather than having custom logic at every point where we touch it, it would be nice to encapsulate all of that in shared_memory_buffer. Specifically, I would love to see the signature of fill change to:

```cpp
fill(index_type element_count, T value = T{}, T* fallback_buffer = nullptr)
```

Then we don't need any special logic for whether or not we're using the fallback. Before we launch the kernel, we allocate global memory if we detect that there isn't enough shared memory; otherwise, we have an empty buffer. We then pass that pointer through the layers, untouched, down to shared_memory_buffer. If it sees that the fill would overflow shared memory, it fills the fallback buffer and returns that pointer instead. That keeps the logic the same for all the different prediction types, and it encapsulates any special handling in the shared_memory_buffer object.
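To make that concrete, here is a minimal sketch of what such a fill() could look like, assuming a simplified shared_memory_buffer; the member names and layout below are illustrative, not the actual cuML implementation:

```cpp
#include <cstdint>

using index_type = std::uint32_t;  // placeholder for cuML's real index_type

// Illustrative stand-in for shared_memory_buffer; the members are assumptions.
template <typename T>
struct shared_memory_buffer {
  T* data;           // pointer into dynamic shared memory
  index_type size;   // number of elements available in shared memory

  // Fill element_count entries with value. If the request fits in shared
  // memory, use it; otherwise fall back to the caller-provided global-memory
  // buffer (allocated on the host before launch; nullptr when unused).
  __device__ T* fill(index_type element_count,
                     T value = T{},
                     T* fallback_buffer = nullptr) {
    T* out = (element_count <= size) ? data : fallback_buffer;
    for (auto i = threadIdx.x; i < element_count; i += blockDim.x) {
      out[i] = value;
    }
    __syncthreads();
    return out;  // callers consume whichever buffer was actually filled
  }
};
```

With this shape, callers never branch on which memory space is in use; they simply use the returned pointer.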

Other than that, I think we're looking pretty great!

Inline review threads (all resolved):
cpp/include/cuml/experimental/fil/detail/infer.hpp (1, outdated)
cpp/include/cuml/experimental/fil/detail/infer/gpu.cuh (4, outdated)
cpp/include/cuml/experimental/fil/output_kind.hpp (1, outdated)
python/cuml/experimental/fil/fil.pyx (4, of which 3 outdated)
@hcho3 mentioned this pull request Mar 28, 2023
@hcho3 (Contributor, Author) commented Mar 29, 2023

FYI, I renamed the enum throughout the codebase so that it's not confused with the array type.

  • output_type -> infer_type or predict_type, depending on context.
  • output_kind -> infer_kind
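
For illustration only, the renamed enum might look roughly like this; the exact enumerators and values are assumptions based on the rename described above, not copied from the PR:

```cpp
// Illustrative sketch of the renamed enum; enumerator names and values
// are assumptions, not the actual cuML definition.
enum class infer_kind : unsigned char {
  default_kind = 0,  // standard aggregated model output
  per_tree     = 1   // one output per tree, used by predict_per_tree()
};
```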

@wphicks (Contributor) left a comment

Beautiful! Everything about this is great. I had one small request about where we put the output_t logic, but it shouldn't hold up the merge.

I'm testing for perf regressions now, and we can merge as soon as that gets cleared. If we get that output_t change in before that's done, great, but otherwise I'm fine with merging as is.

Inline review thread (resolved):
cpp/include/cuml/experimental/fil/detail/output_type.hpp (outdated)
@wphicks (Contributor) commented Mar 30, 2023

/merge

@rapids-bot merged commit ecd4d02 into rapidsai:branch-23.04 Mar 30, 2023
@hcho3 deleted the predict_per_tree branch March 30, 2023 22:20
wphicks added a commit that referenced this pull request Apr 4, 2023
wphicks added a commit to wphicks/cuml that referenced this pull request Apr 6, 2023: "This reverts commit ecd4d02 to avoid a race condition."