-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add strings udf C++ classes and functions for phase II #11912
Add strings udf C++ classes and functions for phase II #11912
Conversation
Codecov ReportBase: 87.40% // Head: 88.11% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-22.12 #11912 +/- ##
================================================
+ Coverage 87.40% 88.11% +0.71%
================================================
Files 133 133
Lines 21833 22003 +170
================================================
+ Hits 19084 19389 +305
+ Misses 2749 2614 -135
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good! I have a bunch of nitty-gritty comments but the core of the code is sound.
I notice some redundancy in udf_string
methods (at
vs []
, =
vs assign
). From the ones that I checked they all match std::string
methods so I assume that's intentional, just noting that in case there are any extra ones it would be good to clean them out.
Out of curiosity, what prompted the change in name from the earlier d_string
?
} | ||
} | ||
|
||
__device__ inline udf_string& udf_string::replace(cudf::size_type pos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the wrong variable in my example, which is probably adding to the confusion. I think I had two questions, of which only one is still relevant.
First question: what happens if overwritingpos + count
characters leads to out of bounds accesses? Plugging the snippet below into the compiler explorer suggests that the code prevents out of bounds accesses. Is that something we should promise for our implementation too?
#include <iostream>
#include <string>
int main()
{
std::string str{"First"};
// This line has the same behavior for any value of count >= 5. Is that expected
auto count = 100;
str.replace(0, count, "Second");
std::cout << str << std::endl;
}
Second question: what happens if the replacement string is larger than count
? That use case seems like it is fine and the reallocation is the behavior expected by the standard. I don't know if we need to document that since on second thought it seems fairly obvious that this would be standard behavior since otherwise the API would be very limited if it could only replace a string of equal length.
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks fine to me. I have a few small comments, resolve them how you see fit. Thanks @davidwendt!
} | ||
} | ||
|
||
__device__ inline udf_string& udf_string::replace(cudf::size_type pos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might deserve docs (for developers) since the behavior differs from https://en.cppreference.com/w/cpp/string/basic_string/replace in a non-obvious way. As I understand it, no exception is thrown (not possible), but a partial replacement (up to the end of the string) is performed.
} // namespace detail | ||
|
||
// external APIs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove this comment.
This works exactly like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for iterating!
@gpucibot merge |
Description
Adds the C++ classes and functions for the phase II of strings udf. This specifically includes the device side string class which can be used for building udfs the create or modify strings.
Also included are some basic helper functions for split, strip, case, and numeric conversion.
Checklist