Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data analysts should be able to reverse strings using Text.reverse #3377

Merged
merged 4 commits into from
Apr 5, 2022

Conversation

indiv0
Copy link
Contributor

@indiv0 indiv0 commented Apr 4, 2022

Pull Request Description

This commit implements Text.reverse as an extension on Text.
Text.reverse reverses strings. For example: "Hello World!".reverse
results in "!dlroW olleH".

Strings are reversed by their Extended Grapheme Clusters not by their
characters. This has some performance implications because we need to
find these grapheme cluster boundaries when iterating. To do so,
BreakIterator.getCharacterInstance is used.

Implements: https://www.pivotaltracker.com/n/projects/2539304/stories/181265419

Important Notes

Checklist

Please include the following checklist in your PR:

  • The documentation has been updated if necessary.
  • All code conforms to the Scala, Java, and Rust style guides.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed: Enso GUI was tested when built using BOTH ./run dist and ./run watch.

@indiv0 indiv0 requested review from 4e6 and jdunkerley as code owners April 4, 2022 19:28
@indiv0 indiv0 self-assigned this Apr 4, 2022
@indiv0 indiv0 requested a review from radeusgd as a code owner April 4, 2022 19:28
@CLAassistant
Copy link

CLAassistant commented Apr 4, 2022

CLA assistant check
All committers have signed the CLA.

@indiv0 indiv0 force-pushed the wip/npekin/text-reverse-181265419 branch 2 times, most recently from ca03516 to cb9eff4 Compare April 4, 2022 21:18
This commit implements `Text.reverse` as an extension on `Text`.
`Text.reverse` reverses strings. For example: `"Hello World!".reverse`
results in `"!dlroW olleH"`.

Strings are reversed by their Extended Grapheme Clusters not by their
characters. This has some performance implications because we need to
find these grapheme cluster boundaries when iterating. To do so,
`BreakIterator.getCharacterInstance` is used.

Implements: https://www.pivotaltracker.com/n/projects/2539304/stories/181265419
@indiv0 indiv0 force-pushed the wip/npekin/text-reverse-181265419 branch from cb9eff4 to 7b283f3 Compare April 4, 2022 21:20
@jdunkerley
Copy link
Member

Looks good to me (once @4e6 fix applied).

Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the comment to fix otherwise looks good,

@indiv0 indiv0 requested review from jdunkerley and 4e6 April 5, 2022 13:01
Comment on lines 69 to 72
> Example
Reverse the text "Hello, world!".

"Hello, world!".reverse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> Example
Reverse the text "Hello, world!".
"Hello, world!".reverse
> Example
Reverse the text "Hello, world!".
"Hello, world!".reverse

I think the code in examples should also indented 4 spaces relative to the caption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, missed that. Should be fixed now!

iterator.setText this

iterate prev next = if next == -1 then reverseStringBuilder.toString else
reverseStringBuilder.append (Text_Utils.substring this next prev)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a StringBuilder works, but I'm wondering if using our + wouldn't be more idiomatic - for example our Vector.join does in fact use +.

Performance-wise it should be quite comparable, because our + uses ropes under the hood to achieve O(1) amortized append time. Still, probably StringBuilder has a bit less of an overhead, so likely still a bit faster.

So I'm not really saying to change it - because this definitely works and is fast, but wondering what best practices we should adopt - because I think it would be good to have consistent approach to this between this function, Vector.join and possibly other ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Text ropes cache collapsed strings - see ToJavaStringNode::inplaceFlatten.

As for Vector.join - I'm talking about the function which concatenates a vector of texts into a single text, did you mean Vector.+ instead?

Copy link
Contributor Author

@indiv0 indiv0 Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idiomatically, + does seem like the right choice.

When I was implementing this, I actually did compare both approaches and the + was ~2-3x slower for a 10000 length string. Though that may be due to me making some other error in the implementation. I can re-draft that benchmark and include it in the PR if you'd like to compare for yourself. I didn't include the benchmark in the commit because it didn't seem like this is something likely to change, performance-wise, over time.

Not sure if I follow the Vector.join and Vector.+ comments.

StringBuilder impl:

Text.reverse 10000 characters/iteration:0: 595.53ms                                                                                                                                     
Text.reverse 10000 characters/iteration:1: 405.00ms                                                                                                                                     
Text.reverse 10000 characters/iteration:2: 402.30ms                                                                                                                                     
Text.reverse 10000 characters/iteration:3: 402.01ms                                                                                                                                     
Text.reverse 10000 characters/iteration:4: 400.68ms                                                                                                                                     
Text.reverse 10000 characters/iteration:5: 401.56ms                                                                                                                                     
Text.reverse 10000 characters/iteration:6: 409.98ms                                                                                                                                     
Text.reverse 10000 characters/iteration:7: 417.73ms                                                                                                                                     
Text.reverse 10000 characters/iteration:8: 408.80ms                                                                                                                                     
Text.reverse 10000 characters/iteration:9: 400.80ms 

+ impl:

Text.reverse 10000 characters/iteration:0: 1083.13ms                                        
Text.reverse 10000 characters/iteration:1: 1274.55ms                                        
Text.reverse 10000 characters/iteration:2: 1267.65ms                                        
Text.reverse 10000 characters/iteration:3: 1195.36ms                                        
Text.reverse 10000 characters/iteration:4: 1140.85ms                                        
Text.reverse 10000 characters/iteration:5: 1167.93ms                                        
Text.reverse 10000 characters/iteration:6: 1162.50ms                                        
Text.reverse 10000 characters/iteration:7: 1199.97ms                                        
Text.reverse 10000 characters/iteration:8: 1203.35ms                                        
Text.reverse 10000 characters/iteration:9: 1242.65ms

Naive (StringBuilder.new this).reverse (not really a fair comparison because not extended-grapheme-cluster-aware):

Text.reverse 10000 characters/iteration:0: 6.57ms                                           
Text.reverse 10000 characters/iteration:1: 3.47ms                                           
Text.reverse 10000 characters/iteration:2: 3.96ms                                           
Text.reverse 10000 characters/iteration:3: 3.45ms                                           
Text.reverse 10000 characters/iteration:4: 4.05ms                                           
Text.reverse 10000 characters/iteration:5: 4.81ms                                           
Text.reverse 10000 characters/iteration:6: 6.69ms                                           
Text.reverse 10000 characters/iteration:7: 6.56ms                                           
Text.reverse 10000 characters/iteration:8: 7.33ms                                           
Text.reverse 10000 characters/iteration:9: 8.02ms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@radeusgd ping ^

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, then I think you are right that it may be better to keep the stringbuilder. I guess the difference is because the ropes still have some more allocation overhead.

As for Vector.join and Vector.+ - no worry it was a reply to a later deleted comment, please ignore 😅

@indiv0 indiv0 added the CI: Ready to merge This PR is eligible for automatic merge label Apr 5, 2022
@indiv0 indiv0 removed the CI: Ready to merge This PR is eligible for automatic merge label Apr 5, 2022
Copy link
Contributor

@4e6 4e6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Btw, you don't need approvals of all reviewers. A code owner approve is enough 😃

@4e6 4e6 added the CI: Ready to merge This PR is eligible for automatic merge label Apr 5, 2022
@mergify mergify bot merged commit 22e3941 into develop Apr 5, 2022
@mergify mergify bot deleted the wip/npekin/text-reverse-181265419 branch April 5, 2022 16:45
mergify bot pushed a commit that referenced this pull request Apr 5, 2022
This pull request adds a benchmark for the `Text.reverse` function added in #3377 as part of https://www.pivotaltracker.com/n/projects/2539304/stories/181265419.

Per discussion with @jdunkerley on Discord it is useful to have this benchmark as this is a low-level item we want to track.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants