This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-581] Improve GetArrayItem(Split()) performance #933

Merged: 8 commits, May 25, 2022


zhouyuan
Collaborator

@zhouyuan zhouyuan commented May 24, 2022

What changes were proposed in this pull request?

This patch rewrites GetArrayItem(Split()) into a single SplitPart() expression, because Gazelle does not yet support array-based functions.
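The rewrite described above can be illustrated with a minimal sketch. It uses a toy expression tree, not Spark's Catalyst classes or Gazelle's actual code: `Expr`, `Col`, `Lit`, and `FuseSplit` are hypothetical names introduced only for this example, and the real patch may structure the conversion differently.

```scala
// Toy expression tree for illustration only; these are NOT Catalyst classes.
sealed trait Expr
case class Col(name: String) extends Expr
case class Lit(value: Any) extends Expr
case class Split(str: Expr, regex: Expr, limit: Expr) extends Expr
case class GetArrayItem(child: Expr, ordinal: Expr) extends Expr
case class SplitPart(str: Expr, regex: Expr, limit: Expr, index: Expr) extends Expr

object FuseSplit {
  // Rewrite bottom-up so that nested occurrences are fused as well.
  def rewrite(e: Expr): Expr = e match {
    case GetArrayItem(child, ordinal) =>
      (rewrite(child), rewrite(ordinal)) match {
        // The array is never materialized: the index moves into SplitPart.
        case (Split(s, r, l), o) => SplitPart(s, r, l, o)
        case (c, o)              => GetArrayItem(c, o)
      }
    case Split(s, r, l) =>
      Split(rewrite(s), rewrite(r), rewrite(l))
    case SplitPart(s, r, l, i) =>
      SplitPart(rewrite(s), rewrite(r), rewrite(l), rewrite(i))
    case leaf => leaf
  }
}
```

Under these assumptions, `FuseSplit.rewrite(GetArrayItem(Split(Col("s"), Lit(","), Lit(-1)), Lit(1)))` yields `SplitPart(Col("s"), Lit(","), Lit(-1), Lit(1))`, so an engine without array support only has to evaluate a string-to-string function.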

How was this patch tested?

Passes Jenkins CI.

@zhouyuan zhouyuan changed the title Wip splitpart WIP splitpart May 24, 2022
@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/native-sql-engine/issues

Then could you also rename commit message and pull request title in the following format?

[NSE-${ISSUES_ID}] ${detailed message}

See also:

@zhouyuan zhouyuan force-pushed the wip_splitpart branch 2 times, most recently from 8810e41 to 5a41c36 Compare May 24, 2022 02:38
@zhouyuan zhouyuan changed the title WIP splitpart Implement SplitPart May 24, 2022
@zhouyuan zhouyuan changed the title Implement SplitPart [NSE-581] Implement SplitPart May 24, 2022
@github-actions

#581

copy(str = newFirst, regex = newSecond, limit = newThird)
}

case class StringSplitPart(str: Expression, regex: Expression, limit: Expression, index: Expression)
Contributor


nit: I would prefer to register StringSplitPart through UDFRegistration rather than adding the whole expression code to Gazelle, since carrying it here will make later Spark version upgrades harder.

Collaborator Author


Yes, this code will either be moved to the Spark321 shim layer or be registered dynamically when the session starts. I will also try to upstream SplitPart to Spark.

    StringSplit(str.expr, Literal(pattern), Literal(limit))
  }
  def splitpart(str: Column, pattern: String, limit: Int, index: Int): Column = withExpr {
    StringSplitPart(str.expr, Literal(pattern), Literal(limit), Literal(index))
Contributor


ditto
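For readers following the thread, here is a rough sketch of what a fused split-and-index function computes, written in plain Scala without Spark. `SplitPartSketch` and its `splitPart` method are hypothetical helpers, not the code under review. Two assumptions are made: an out-of-range index yields `None` (standing in for SQL NULL), and `limit` follows `java.lang.String.split`, which may differ from Spark's handling in edge cases such as `limit = 0`.

```scala
object SplitPartSketch {
  // Split `str` by `regex` with the given `limit`, then pick element `index`.
  // Assumption: out-of-range indices return None instead of throwing.
  def splitPart(str: String, regex: String, limit: Int, index: Int): Option[String] = {
    val parts = str.split(regex, limit) // java.lang.String.split(String, int)
    if (index >= 0 && index < parts.length) Some(parts(index)) else None
  }
}
```

With a positive `limit` the last element keeps the unsplit remainder, so `splitPart("a,b,c", ",", 2, 1)` returns `Some("b,c")`, whereas `splitPart("a,b,c", ",", -1, 1)` returns `Some("b")`.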

@PHILO-HE
Collaborator

To align with Spark's behavior, please apply this patch to Arrow: oap-project/arrow#107.

Signed-off-by: Yuan Zhou <[email protected]>
@zhouyuan zhouyuan changed the title [NSE-581] Implement SplitPart [NSE-581] Improve GetArrayItem(Split()) performance May 24, 2022
@zhouyuan zhouyuan marked this pull request as ready for review May 24, 2022 12:15
@zhouyuan zhouyuan merged commit 47af257 into oap-project:main May 25, 2022

3 participants