Replace default stream for scalars and column factories usages (because of defaulted arguments) #14354
cpp/src/io/csv/writer_impl.cu
Outdated
cudf::string_scalar newline{options.get_line_terminator(), true, stream};
auto p_str_col_w_nl = cudf::strings::detail::join_strings(str_column_view,
newline,
string_scalar("", false),
string_scalar("", false, stream),
`newline` is defined separately while the 3rd parameter is defined inline. Can you make them consistent please?
I don't think so; it's not a matter of being consistent. That temporary scalar is not required after this usage, so it's not given a name; its scope ends after the function call. `newline` is used below.
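The lifetime argument in the reply above can be illustrated with a small standalone sketch. It does not use cudf: `mock_scalar`, `join`, `live_after_join`, and the `live_scalars` counter are hypothetical stand-ins introduced only to show that an inline temporary is destroyed at the end of the statement that creates it, while a named object stays alive for later use.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Count of live mock_scalar objects, used only to observe lifetimes.
int live_scalars = 0;

// Hypothetical stand-in for cudf::string_scalar (no device memory, no stream).
struct mock_scalar {
  std::string value;
  explicit mock_scalar(std::string v) : value(std::move(v)) { ++live_scalars; }
  mock_scalar(mock_scalar const& other) : value(other.value) { ++live_scalars; }
  ~mock_scalar() { --live_scalars; }
};

// Hypothetical stand-in for join_strings: appends both scalar values to the input.
std::string join(std::string const& input, mock_scalar const& separator, mock_scalar const& narep)
{
  return input + separator.value + narep.value;
}

// Returns how many scalars are still alive right after the join call.
int live_after_join()
{
  mock_scalar newline{"\n"};                            // named because it is reused below
  auto joined = join("a,b", newline, mock_scalar{""});  // temporary dies at the end of this statement
  int const live_now = live_scalars;                    // counts only `newline`
  joined += newline.value;                              // the later use that justifies the name
  assert(!joined.empty());
  return live_now;
}
```

Calling `live_after_join()` reports a single live scalar after the join, which is the named `newline`; the inline argument no longer exists at that point.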
cpp/src/io/csv/writer_impl.cu
Outdated
cudf::string_scalar(delimiter_str, true, stream),
narep,
One parameter is inline defined while the other is not. Please make them consistent.
That temporary scalar is not required after this usage, so it's not given a name; its scope ends after the function call. Here `narep` is used below this line too.
I started a review earlier and didn't submit it -- sorry. Flushing comments for now, and I'll try to get back to completing the review soon.
cpp/src/groupby/groupby.cu
Outdated
empty_like(values),
0,
{},
cudf::get_default_stream());
This is introducing a use of the default stream, which we want to avoid. There are public APIs accepting a user stream that can call this, which won't use the desired stream. We need to refactor all callers of this code path, and pass their stream through.
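The concern above, that a public API accepting a user stream can end up running work on the default stream, can be sketched without cudf. Everything below is a hypothetical stand-in (`stream_view`, `mock_column`, the two factories, and the two API functions); the real types would be `rmm::cuda_stream_view` and `cudf::column`. The sketch only shows how a defaulted stream argument lets a dropped stream compile silently.

```cpp
#include <cassert>

// Hypothetical stand-ins; no CUDA work happens here, only stream ids are recorded.
struct stream_view { int id; };
stream_view get_default_stream() { return stream_view{0}; }
struct mock_column { int size; int stream_id; };

// Factory with a defaulted stream: callers can omit it without any diagnostic.
mock_column make_column_defaulted(int size, stream_view stream = get_default_stream())
{
  assert(size >= 0);
  return mock_column{size, stream.id};
}

// Factory without a default: every caller has to decide which stream to use.
mock_column make_column_explicit(int size, stream_view stream)
{
  assert(size >= 0);
  return mock_column{size, stream.id};
}

// Public API that accepts a user stream but forgets to forward it.
mock_column api_dropping_stream(int size, stream_view user_stream)
{
  (void)user_stream;                   // silently ignored
  return make_column_defaulted(size);  // compiles, but uses stream 0
}

// Public API that passes the caller's stream through.
mock_column api_forwarding_stream(int size, stream_view user_stream)
{
  return make_column_explicit(size, user_stream);
}
```

With a user stream of id 7, `api_dropping_stream` produces a column tagged with stream 0 while `api_forwarding_stream` produces one tagged with stream 7. Removing the default, or spelling out the default stream at the call site, makes the first case visible in review.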
Is the stream relevant in this case? Is any allocation actually happening when creating an empty column?
So before that issue is closed, we don't have any input stream to put into line 115 above and have to use the default stream anyway.
groupby needs a bigger cleanup; I can exclude it as part of this PR. It should be a separate PR.
Removed groupby from this PR.
Looks good.
Posted a few follow-up questions.
We should not add new calls to `cudf::get_default_stream()` in internal implementations. @vuule just raised this here: https://github.com/rapidsai/cudf/pull/14354/files#r1393097787
It's the same problem as mentioned here: #14354 (comment)
Is there a way we can fix this in order, so that those calls have the stream passed through by their caller?
I'm okay with explicitly passing the default stream in cases where we don't have the user stream available (yet). It makes the default stream use very visible so we'll find these places easily once we get to updating those APIs to take a stream. IMO this is a good change, even though it does not really change the stream.
I would recommend just adding a stream parameter with a default value and then forwarding that to the column constructor. Is there any reason we can't do that? It shouldn't break any existing APIs. While we're at it, we probably want to add a default mr parameter too. The table shouldn't be default copy-constructible. I think either leaving the implicit default stream or adding an explicit argument to the column constructor is fine as a short-term solution, but since columns already support streams in the public API I don't see any reason to delay adding that support for tables.
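A rough sketch of that suggestion, using hypothetical stand-in types rather than the actual cudf classes: `mock_table` gets a copy constructor with defaulted `stream` and `mr` parameters and forwards both to each column copy. The names `stream_view`, `memory_resource`, `get_current_resource`, `mock_column`, and `mock_table` are assumptions made for illustration, not the library's signatures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the stream and memory-resource types.
struct stream_view { int id; };
stream_view get_default_stream() { return stream_view{0}; }
struct memory_resource { int id; };
memory_resource* get_current_resource()
{
  static memory_resource current{0};
  return &current;
}

// Column whose copy operation takes the stream and resource explicitly.
struct mock_column {
  int stream_id = -1;
  int mr_id     = -1;
  mock_column() = default;
  mock_column(mock_column const&, stream_view stream, memory_resource* mr)
    : stream_id(stream.id), mr_id(mr->id)
  {
    assert(mr != nullptr);
  }
};

// Table whose copy constructor gains defaulted stream and mr parameters.
struct mock_table {
  std::vector<mock_column> columns;
  explicit mock_table(int num_columns) : columns(num_columns) {}
  mock_table(mock_table const& other,
             stream_view stream  = get_default_stream(),
             memory_resource* mr = get_current_resource())
  {
    columns.reserve(other.columns.size());
    // Forward the same stream and resource to every column copy.
    for (auto const& col : other.columns) { columns.emplace_back(col, stream, mr); }
  }
};
```

Because both new parameters are defaulted, an existing call such as `mock_table copy(original);` keeps compiling, while new callers can write `mock_table copy(original, stream, mr);` to control where the copy runs.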
As @vuule mentioned, the default stream is added to make its use visible, which allows us to replace it in future PRs. I thought about moving this PR to 24.02.
Removing my blocking review, after further conversation.
cpp/src/io/csv/writer_impl.cu
Outdated
cudf::string_scalar(options_.get_true_value(), true, stream_),
cudf::string_scalar(options_.get_false_value(), true, stream_),
Would you mind making a change similar to the one in #14444?
None of the Python changes look like they belong in this PR. Is it because you were previously targeted to 23.12 and have some changes from there that are not yet in 24.02 due to the forward-merge not coming across yet?
@wence- pulled 24.02. Python changes are gone.
Thanks! Removed the now unnecessary codeowner review requests.
/merge
Description
This PR contributes to #13744
Adds missing stream for scalars, column factories
Uses the right `set_null_mask` for moving null masks instead of copying them (due to the defaulted stream argument).
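The last item can be illustrated with a standalone sketch of overload selection. It assumes, as a guess at the situation being fixed, that there is one overload taking the mask by rvalue reference and another taking it by const reference with a defaulted stream. `mock_buffer`, `mock_column`, and `last_set_copied` are hypothetical names, not the cudf definitions.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the stream and device buffer types.
struct stream_view { int id; };
stream_view get_default_stream() { return stream_view{0}; }
using mock_buffer = std::vector<unsigned char>;

struct mock_column {
  mock_buffer null_mask;
  int null_count       = 0;
  bool last_set_copied = false;  // records which overload ran last

  // Rvalue overload: takes ownership of the buffer, so no copy and no stream.
  void set_null_mask(mock_buffer&& new_mask, int new_null_count)
  {
    assert(new_null_count >= 0);
    null_mask       = std::move(new_mask);
    null_count      = new_null_count;
    last_set_copied = false;
  }

  // Const-reference overload: copies the buffer. The defaulted stream lets a
  // two-argument call with an lvalue compile and copy without any warning.
  void set_null_mask(mock_buffer const& new_mask,
                     int new_null_count,
                     stream_view stream = get_default_stream())
  {
    (void)stream;  // a real implementation would order the copy on this stream
    assert(new_null_count >= 0);
    null_mask       = new_mask;
    null_count      = new_null_count;
    last_set_copied = true;
  }
};
```

In this sketch, `col.set_null_mask(mask, 4)` with an lvalue `mask` picks the copying overload, while `col.set_null_mask(std::move(mask), 4)` picks the moving one. Without the default on `stream`, the first call would not compile, which would flag the unintended copy.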