HADOOP-18395. Performance improvement in hadoop-common Text#find #4714
base: trunk
🎊 +1 overall
This message was automatically generated.
@huxinqiu Thanks for your report. LGTM.
Can you add a unit test with a very long string and a timeout to check the performance of this method?
🎊 +1 overall
This message was automatically generated.
💔 -1 overall
This message was automatically generated.
🎊 +1 overall
This message was automatically generated.
Description of PR
JIRA: HADOOP-18395
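The current implementation resets src and tgt to their marks and continues searching when tgt still has bytes remaining but src has been exhausted first, which is not necessary: if the remaining source is shorter than the target at one start position, it is also shorter at every later start position, so the method can return -1 immediately.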
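A minimal sketch of the search loop with the early exit, written as a stand-alone helper over a raw byte array rather than Hadoop's Text. The class FindSketch and its find signature are hypothetical and not the Hadoop API; it assumes what is non-empty and start is within the array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone helper, modeled on a ByteBuffer mark/reset search loop.
public class FindSketch {
  /**
   * Returns the byte offset of the first occurrence of {@code what} in
   * {@code text} at or after {@code start}, or -1 if there is none.
   * Assumes {@code what} is non-empty.
   */
  public static int find(byte[] text, String what, int start) {
    ByteBuffer src = ByteBuffer.wrap(text);
    ByteBuffer tgt = ByteBuffer.wrap(what.getBytes(StandardCharsets.UTF_8));
    byte first = tgt.get();
    src.position(start);

    while (src.hasRemaining()) {
      if (first == src.get()) {          // candidate: first byte matches
        int pos = src.position() - 1;
        src.mark();                      // resume point if this candidate fails
        tgt.mark();
        boolean found = true;
        while (tgt.hasRemaining()) {
          if (!src.hasRemaining()) {
            // src ran out before tgt: every later start position has even
            // fewer remaining bytes, so stop instead of resetting and rescanning.
            return -1;
          }
          if (tgt.get() != src.get()) {
            src.reset();                 // continue right after the candidate byte
            tgt.reset();
            found = false;
            break;
          }
        }
        if (found) {
          return pos;
        }
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] text = "hadoop common".getBytes(StandardCharsets.UTF_8);
    System.out.println(find(text, "common", 0));  // 7
    System.out.println(find(text, "commons", 0)); // -1 (source exhausted first)
  }
}
```

The only behavioral difference from a reset-and-continue loop is the early `return -1` when the source is exhausted; mismatches still reset and resume one byte after the candidate.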
In some cases, for example searching a long run of 'a' bytes for a target made of the same run followed by 'b', this change reduces the complexity from O(n²) to O(n), which can improve performance significantly.
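To illustrate the complexity claim, the hypothetical FindCostSketch below counts inner-loop byte comparisons for a simplified index-based search (not the actual Text#find code) on a source of n 'a' bytes and a target of n 'a' bytes followed by 'b'. Without the early exit, every start position rescans until the source runs out, giving n(n-1)/2 comparisons; with it, the search stops after the first n-1.

```java
import java.util.Arrays;

// Hypothetical cost model: counts byte comparisons, assumes a non-empty target.
public class FindCostSketch {
  /**
   * earlyExit == false: keep trying later start positions after the source
   * runs out (pre-patch behavior). earlyExit == true: return as soon as the
   * source is exhausted (patched behavior).
   */
  public static long comparisons(byte[] src, byte[] tgt, boolean earlyExit) {
    long count = 0;
    for (int i = 0; i < src.length; i++) {
      if (src[i] != tgt[0]) continue;          // first-byte check, not counted
      int j = 1;
      boolean exhausted = false;
      while (j < tgt.length) {
        if (i + j >= src.length) {             // source ran out before target
          exhausted = true;
          break;
        }
        count++;
        if (src[i + j] != tgt[j]) break;       // ordinary mismatch
        j++;
      }
      if (j == tgt.length) return count;       // full match found
      if (exhausted && earlyExit) return count;
    }
    return count;
  }

  public static void main(String[] args) {
    for (int n : new int[] {1_000, 10_000}) {
      byte[] src = new byte[n];
      Arrays.fill(src, (byte) 'a');
      byte[] tgt = new byte[n + 1];            // n 'a' bytes followed by 'b'
      Arrays.fill(tgt, (byte) 'a');
      tgt[n] = 'b';
      // prints n=1000 old=499500 new=999, then n=10000 old=49995000 new=9999
      System.out.printf("n=%d old=%d new=%d%n", n,
          comparisons(src, tgt, false), comparisons(src, tgt, true));
    }
  }
}
```

The quadratic term comes entirely from start positions that cannot possibly fit the target; inputs that fail by ordinary mismatches still cost O(n·m) either way, which is why the improvement applies only in some cases.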
How was this patch tested?
Unit test in org.apache.hadoop.io.TestText#testFind.