Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad #11016

Closed
wants to merge 8 commits into from

Conversation

ZMZ91
Copy link
Contributor

@ZMZ91 ZMZ91 commented Aug 27, 2021

  • add max/min return length for space/lpad/rpad udfs
  • correct return length

@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW

Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@ZMZ91 ZMZ91 changed the title Bugfix/limit return chars count ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad Aug 27, 2021
@github-actions
Copy link

EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
out = space_int64(ctx_ptr, -2639077559, &out_len);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error is appearing in the AppVeyor build is because you are using a value that is lower than the minimum integer.

Try to change to -2639077559L

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems still not working. But thank you anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@projjal can you give us help with that problem? I do not know why the build is still failing even with the tip I gave to @ZMZ91

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try -2639077559LL. long long is guaranteed to be atleast 64 bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZMZ91 I think Projjal's suggestion worked! I looked at the failed tests and it seems that was caused by environment issues when it tries to install dependencies.

Do the rebase with Arrow's master repository and push it to execute the builds again!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @anthonylouisbsb! BTW, could you help check my another pr #10884? That one was opened even much earlier than this one.

@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
// case (return_length > text_char_count)
// case where it needs to copy "fill_text" on the string left. The total number
// of chars to copy is given by (return_length - text_char_count)
char* ret =
reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
gdv_int32 return_char_length = evaluate_return_char_length(
Copy link

@rkavanap rkavanap Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return_char_length seems to be used only for allocation. Shouldn't the while loop in 1832 also use return_char_length?

Never mind, the text_char_count was confusing. I think the text_char_count is actually text_len - invalid utf8..maybe rename 'text_char_count' to 'actual_text_len'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to actual_text_len. @rkavanap if no more issues, could you help merge this pr since it's been quite a while since open. Thanks a lot.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZMZ91 I am done with my review and this PR looks good. Please let me know what else I need to do to help merge the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should I do next? I have no permission to merge this pr, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check with @projjal ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALso you may have to rebase and make the failing checks pass

@anthonylouisbsb
Copy link
Contributor

@ZMZ91 rebase your branch with master and I think the failing builds you be fixed.

@anthonylouisbsb
Copy link
Contributor

@ZMZ91 I think the error in the falling workflow is not related to your changes in this PR. We need to wait for the fix in the master branch and then rebase your branch.

@anthonylouisbsb
Copy link
Contributor

@ZMZ91 try to rebase with master again. The error in the workflow was fixed, I tested with a PR.

@ZMZ91
Copy link
Contributor Author

ZMZ91 commented Nov 24, 2021

@ZMZ91 try to rebase with master again. The error in the workflow was fixed, I tested with a PR.

Thank you! It worked! Could you help merge it then?

@anthonylouisbsb
Copy link
Contributor

@ZMZ91 I will ask for @pravindra to merge it.

@pravindra pravindra closed this in 3346fa6 Nov 25, 2021
@ursabot
Copy link

ursabot commented Nov 25, 2021

Benchmark runs are scheduled for baseline = 8aa1ead and contender = 3346fa6. 3346fa6 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed] ursa-i9-9960x
[Finished ⬇️0.09% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

zhouyuan pushed a commit to zhouyuan/arrow that referenced this pull request Jun 9, 2022
- add max/min return length for space/lpad/rpad udfs
- correct return length

Closes apache#11016 from ZMZ91/bugfix/limit_return_chars_count

Authored-by: ZMZ <[email protected]>
Signed-off-by: Pindikura Ravindra <[email protected]>
Signed-off-by: Yuan Zhou <[email protected]>
zhouyuan pushed a commit to zhouyuan/arrow that referenced this pull request Jun 13, 2022
- add max/min return length for space/lpad/rpad udfs
- correct return length

Closes apache#11016 from ZMZ91/bugfix/limit_return_chars_count

Authored-by: ZMZ <[email protected]>
Signed-off-by: Pindikura Ravindra <[email protected]>
Signed-off-by: Yuan Zhou <[email protected]>
zhouyuan added a commit to oap-project/arrow that referenced this pull request Jun 14, 2022
* ARROW-12567: [C++][Gandiva] Implement LPAD and RPAD functions for string input values

- LPAD([string] basetext, [number] x, [optional string] padtext)
- RPAD([string] basetext, [number] x, [optional string] padtext)

lpad - Prepends padtext to basetext in a way that allows as many characters as possible from padtext given an output string length of x. When x is less than or equal to the length of basetext, only characters from basetext are printed in the output. If padtext is omitted then spaces are prepended.

rpad - Appends padtext to basetext in a way that allows as many characters as possible from padtext given an output string length of x. When x is less than or equal to the length of basetext, only characters from basetext are printed in the output. If padtext is omitted then spaces are appended.

Closes apache#10173 from jpedroantunes/feature/lpad-rpad-functions and squashes the following commits:

4efc0fe <João Pedro> Add utf8_length method that ignore invalid char considering size 1
33a5a14 <João Pedro> Fix identation on function string registry
4c4b2f4 <João Pedro> Change lpad and rpad functions signature and definition
26b90b0 <João Pedro> Correct ci lint errors on gandiva
66594a0 <João Pedro> Correct lint local errors on gandiva
b6b63e9 <João Pedro> Add projector test for RPAD string function
dc72148 <João Pedro> Add function registry for RPAD string function without pad text
c270fb1 <João Pedro> Add base implementation and tests for RPAD functions
08d2053 <João Pedro> Add function registry for LPAD string function without pad text
585cad3 <João Pedro> Add base implementation and tests for LPAD function without pad texts considering string input values
73927fc <João Pedro> Add projector test for LPAD string function
2c929a9 <João Pedro> Add function registry for LPAD string function
aecaff6 <João Pedro> Add base implementation and tests for LPAD function considering string input values

Authored-by: João Pedro <[email protected]>
Signed-off-by: Praveen <[email protected]>

* ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

- add max/min return length for space/lpad/rpad udfs
- correct return length

Closes apache#11016 from ZMZ91/bugfix/limit_return_chars_count

Authored-by: ZMZ <[email protected]>
Signed-off-by: Pindikura Ravindra <[email protected]>
Signed-off-by: Yuan Zhou <[email protected]>

* fix concat

Signed-off-by: Yuan Zhou <[email protected]>

Co-authored-by: João Pedro <[email protected]>
Co-authored-by: ZMZ <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants