-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad #11016
Conversation
ZMZ91
commented
Aug 27, 2021
- add max/min return length for space/lpad/rpad udfs
- correct return length
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename pull request title in the following format?
or
See also: |
EXPECT_EQ(std::string(out, out_len), std::string(65536, ' ')); | ||
out = space_int64(ctx_ptr, 9223372036854775807, &out_len); | ||
EXPECT_EQ(std::string(out, out_len), std::string(65536, ' ')); | ||
out = space_int64(ctx_ptr, -2639077559, &out_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error is appearing in the AppVeyor build is because you are using a value that is lower than the minimum integer.
Try to change to -2639077559L
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems still not working. But thank you anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try -2639077559LL
. long long is guaranteed to be atleast 64 bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ZMZ91 I think Projjal's suggestion worked! I looked at the failed tests and it seems that was caused by environment issues when it tries to install dependencies.
Do the rebase with Arrow's master repository and push it to execute the builds again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @anthonylouisbsb! BTW, could you help check my another pr #10884? That one was opened even much earlier than this one.
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32 | |||
// case (return_length > text_char_count) | |||
// case where it needs to copy "fill_text" on the string left. The total number | |||
// of chars to copy is given by (return_length - text_char_count) | |||
char* ret = | |||
reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length)); | |||
gdv_int32 return_char_length = evaluate_return_char_length( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return_char_length seems to be used only for allocation. Shouldn't the while loop in 1832 also use return_char_length?
Never mind, the text_char_count was confusing. I think the text_char_count is actually text_len - invalid utf8..maybe rename 'text_char_count' to 'actual_text_len'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to actual_text_len. @rkavanap if no more issues, could you help merge this pr since it's been quite a while since open. Thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ZMZ91 I am done with my review and this PR looks good. Please let me know what else I need to do to help merge the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should I do next? I have no permission to merge this pr, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check with @projjal ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALso you may have to rebase and make the failing checks pass
@ZMZ91 rebase your branch with master and I think the failing builds you be fixed. |
@ZMZ91 I think the error in the falling workflow is not related to your changes in this PR. We need to wait for the fix in the master branch and then rebase your branch. |
@ZMZ91 try to rebase with master again. The error in the workflow was fixed, I tested with a PR. |
Thank you! It worked! Could you help merge it then? |
@ZMZ91 I will ask for @pravindra to merge it. |
Benchmark runs are scheduled for baseline = 8aa1ead and contender = 3346fa6. 3346fa6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
- add max/min return length for space/lpad/rpad udfs - correct return length Closes apache#11016 from ZMZ91/bugfix/limit_return_chars_count Authored-by: ZMZ <[email protected]> Signed-off-by: Pindikura Ravindra <[email protected]> Signed-off-by: Yuan Zhou <[email protected]>
- add max/min return length for space/lpad/rpad udfs - correct return length Closes apache#11016 from ZMZ91/bugfix/limit_return_chars_count Authored-by: ZMZ <[email protected]> Signed-off-by: Pindikura Ravindra <[email protected]> Signed-off-by: Yuan Zhou <[email protected]>
* ARROW-12567: [C++][Gandiva] Implement LPAD and RPAD functions for string input values - LPAD([string] basetext, [number] x, [optional string] padtext) - RPAD([string] basetext, [number] x, [optional string] padtext) lpad - Prepends padtext to basetext in a way that allows as many characters as possible from padtext given an output string length of x. When x is less than or equal to the length of basetext, only characters from basetext are printed in the output. If padtext is omitted then spaces are prepended. rpad - Appends padtext to basetext in a way that allows as many characters as possible from padtext given an output string length of x. When x is less than or equal to the length of basetext, only characters from basetext are printed in the output. If padtext is omitted then spaces are appended. Closes apache#10173 from jpedroantunes/feature/lpad-rpad-functions and squashes the following commits: 4efc0fe <João Pedro> Add utf8_length method that ignore invalid char considering size 1 33a5a14 <João Pedro> Fix identation on function string registry 4c4b2f4 <João Pedro> Change lpad and rpad functions signature and definition 26b90b0 <João Pedro> Correct ci lint errors on gandiva 66594a0 <João Pedro> Correct lint local errors on gandiva b6b63e9 <João Pedro> Add projector test for RPAD string function dc72148 <João Pedro> Add function registry for RPAD string function without pad text c270fb1 <João Pedro> Add base implementation and tests for RPAD functions 08d2053 <João Pedro> Add function registry for LPAD string function without pad text 585cad3 <João Pedro> Add base implementation and tests for LPAD function without pad texts considering string input values 73927fc <João Pedro> Add projector test for LPAD string function 2c929a9 <João Pedro> Add function registry for LPAD string function aecaff6 <João Pedro> Add base implementation and tests for LPAD function considering string input values Authored-by: João Pedro <[email protected]> Signed-off-by: Praveen <[email protected]> * ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad - add max/min return length for space/lpad/rpad udfs - correct return length Closes apache#11016 from ZMZ91/bugfix/limit_return_chars_count Authored-by: ZMZ <[email protected]> Signed-off-by: Pindikura Ravindra <[email protected]> Signed-off-by: Yuan Zhou <[email protected]> * fix concat Signed-off-by: Yuan Zhou <[email protected]> Co-authored-by: João Pedro <[email protected]> Co-authored-by: ZMZ <[email protected]>