Test categorical features with column-split gpu quantile #9595
I will submit a PR for fixing the macOS test tomorrow.
src/common/quantile.cu
Outdated
@@ -657,7 +657,9 @@ void SketchContainer::MakeCuts(HistogramCuts* p_cuts, bool is_column_split) {
size_t column_size = std::max(static_cast<size_t>(1ul), this->Column(i).size());
if (IsCat(h_feature_types, i)) {
// column_size is the number of unique values in that feature.
CheckMaxCat(max_values[i].value, column_size);
if (!is_column_split) {
Why not? (Please add a comment to the code.)
Hmm there seems to be a bug dealing with missing columns. Trying to fix it.
Fixed the underlying issue. Please take another look.
// The device vector needs to be initialized explicitly since we may have some missing columns.
SketchEntry default_entry{};
dh::caching_device_vector<SketchEntry> d_max_results(d_in_columns_ptr.size() - 1,
Are you sure the caching device vector initializes the values (i.e., calls the constructor)?
Also verified it in the debugger.
Got it. I think we ran into trouble with it before, as commented in the XGBCachingDeviceAllocatorImpl. But you are correct.
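To illustrate why the explicit fill value matters when some columns are missing, here is a minimal host-side sketch in plain C++. It is not the code from the PR: `Entry` is a hypothetical stand-in for `SketchEntry`, `ScatterWithDefault` is an illustrative name, and a simple loop stands in for the device-side scatter.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for SketchEntry; only the field needed here is modelled.
struct Entry {
  float value;
};

// Host-side analogue of scattering per-feature results into a vector that was
// explicitly filled with a default entry. Features that never appear in `keys`
// (missing columns) keep the default instead of holding unspecified data.
// Assumes keys.size() == values.size() and every key is < n_features.
std::vector<Entry> ScatterWithDefault(std::size_t n_features,
                                      std::vector<std::size_t> const& keys,
                                      std::vector<Entry> const& values) {
  // Explicit fill value, mirroring the (size, default_entry) constructor call.
  std::vector<Entry> results(n_features, Entry{});
  for (std::size_t i = 0; i < keys.size(); ++i) {
    results[keys[i]] = values[i];
  }
  return results;
}
```

With four features and results only for features 0 and 2, the entries for features 1 and 3 stay at the zero-valued default.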
thrust::cuda::par(alloc), key_it, key_it + in_cut_values.size(), val_it, d_max_keys.begin(),
d_max_values.begin(), thrust::equal_to<bst_feature_t>{},
[] __device__(auto l, auto r) { return l.value > r.value ? l : r; });
d_max_keys.erase(new_end.first, d_max_keys.end());
I'm a bit confused by these two erases. What are they doing?
Shrink the two vectors to actual size. If we have missing columns, they won't be fully populated.
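As a rough illustration of the reduce-then-shrink pattern described here, the following host-side C++ sketch mimics a reduce-by-key that keeps the maximum value per run of equal keys, then erases the unused tail of both output vectors. The function name `MaxByKey` and the use of `int`/`float` are assumptions made for the example; the PR itself operates on device vectors of sketch entries.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Host-side analogue of a reduce-by-key over runs of consecutive equal keys,
// keeping the largest value in each run. The outputs are allocated for the
// worst case (one run per feature) and shrunk to the number of runs found.
// Assumes keys are sorted, keys.size() == values.size(), and every key lies
// in [0, n_features), so there are at most n_features runs.
std::pair<std::vector<int>, std::vector<float>> MaxByKey(
    std::vector<int> const& keys, std::vector<float> const& values,
    std::size_t n_features) {
  std::vector<int> out_keys(n_features);
  std::vector<float> out_values(n_features);
  std::size_t n_out = 0;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    if (n_out > 0 && out_keys[n_out - 1] == keys[i]) {
      // Same run: keep the larger value.
      if (values[i] > out_values[n_out - 1]) {
        out_values[n_out - 1] = values[i];
      }
    } else {
      // New run: start a new output slot.
      out_keys[n_out] = keys[i];
      out_values[n_out] = values[i];
      ++n_out;
    }
  }
  // Shrink both outputs to the populated prefix; with missing columns,
  // n_out is smaller than n_features.
  auto const n = static_cast<std::ptrdiff_t>(n_out);
  out_keys.erase(out_keys.begin() + n, out_keys.end());
  out_values.erase(out_values.begin() + n, out_values.end());
  return std::make_pair(std::move(out_keys), std::move(out_values));
}
```

Without the two `erase` calls, the trailing slots for absent features would still be part of the vectors and would be read by any later step that iterates to `end()`.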
src/common/quantile.cu
Outdated
SketchEntry default_entry{};
dh::caching_device_vector<SketchEntry> d_max_results(d_in_columns_ptr.size() - 1,
default_entry);
thrust::scatter(d_max_values.begin(), d_max_values.end(), d_max_keys.begin(),
exec policy?
Done.