-
Notifications
You must be signed in to change notification settings - Fork 326
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data analysts should be able to reverse strings using Text.reverse
#3377
Conversation
ca03516
to
cb9eff4
Compare
This commit implements `Text.reverse` as an extension on `Text`. `Text.reverse` reverses strings. For example: `"Hello World!".reverse` results in `"!dlroW olleH"`. Strings are reversed by their Extended Grapheme Clusters not by their characters. This has some performance implications because we need to find these grapheme cluster boundaries when iterating. To do so, `BreakIterator.getCharacterInstance` is used. Implements: https://www.pivotaltracker.com/n/projects/2539304/stories/181265419
cb9eff4
to
7b283f3
Compare
distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso
Outdated
Show resolved
Hide resolved
Looks good to me (once @4e6 fix applied). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the comment to fix otherwise looks good,
> Example | ||
Reverse the text "Hello, world!". | ||
|
||
"Hello, world!".reverse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> Example | |
Reverse the text "Hello, world!". | |
"Hello, world!".reverse | |
> Example | |
Reverse the text "Hello, world!". | |
"Hello, world!".reverse |
I think the code in examples should also indented 4 spaces relative to the caption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, missed that. Should be fixed now!
iterator.setText this | ||
|
||
iterate prev next = if next == -1 then reverseStringBuilder.toString else | ||
reverseStringBuilder.append (Text_Utils.substring this next prev) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a StringBuilder
works, but I'm wondering if using our +
wouldn't be more idiomatic - for example our Vector.join
does in fact use +
.
Performance-wise it should be quite comparable, because our +
uses ropes under the hood to achieve O(1) amortized append time. Still, probably StringBuilder
has a bit less of an overhead, so likely still a bit faster.
So I'm not really saying to change it - because this definitely works and is fast, but wondering what best practices we should adopt - because I think it would be good to have consistent approach to this between this function, Vector.join
and possibly other ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Text ropes cache collapsed strings - see ToJavaStringNode::inplaceFlatten
.
As for Vector.join
- I'm talking about the function which concatenates a vector of texts into a single text, did you mean Vector.+
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idiomatically, +
does seem like the right choice.
When I was implementing this, I actually did compare both approaches and the +
was ~2-3x slower for a 10000 length string. Though that may be due to me making some other error in the implementation. I can re-draft that benchmark and include it in the PR if you'd like to compare for yourself. I didn't include the benchmark in the commit because it didn't seem like this is something likely to change, performance-wise, over time.
Not sure if I follow the Vector.join
and Vector.+
comments.
StringBuilder
impl:
Text.reverse 10000 characters/iteration:0: 595.53ms
Text.reverse 10000 characters/iteration:1: 405.00ms
Text.reverse 10000 characters/iteration:2: 402.30ms
Text.reverse 10000 characters/iteration:3: 402.01ms
Text.reverse 10000 characters/iteration:4: 400.68ms
Text.reverse 10000 characters/iteration:5: 401.56ms
Text.reverse 10000 characters/iteration:6: 409.98ms
Text.reverse 10000 characters/iteration:7: 417.73ms
Text.reverse 10000 characters/iteration:8: 408.80ms
Text.reverse 10000 characters/iteration:9: 400.80ms
+
impl:
Text.reverse 10000 characters/iteration:0: 1083.13ms
Text.reverse 10000 characters/iteration:1: 1274.55ms
Text.reverse 10000 characters/iteration:2: 1267.65ms
Text.reverse 10000 characters/iteration:3: 1195.36ms
Text.reverse 10000 characters/iteration:4: 1140.85ms
Text.reverse 10000 characters/iteration:5: 1167.93ms
Text.reverse 10000 characters/iteration:6: 1162.50ms
Text.reverse 10000 characters/iteration:7: 1199.97ms
Text.reverse 10000 characters/iteration:8: 1203.35ms
Text.reverse 10000 characters/iteration:9: 1242.65ms
Naive (StringBuilder.new this).reverse
(not really a fair comparison because not extended-grapheme-cluster-aware):
Text.reverse 10000 characters/iteration:0: 6.57ms
Text.reverse 10000 characters/iteration:1: 3.47ms
Text.reverse 10000 characters/iteration:2: 3.96ms
Text.reverse 10000 characters/iteration:3: 3.45ms
Text.reverse 10000 characters/iteration:4: 4.05ms
Text.reverse 10000 characters/iteration:5: 4.81ms
Text.reverse 10000 characters/iteration:6: 6.69ms
Text.reverse 10000 characters/iteration:7: 6.56ms
Text.reverse 10000 characters/iteration:8: 7.33ms
Text.reverse 10000 characters/iteration:9: 8.02ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@radeusgd ping ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, then I think you are right that it may be better to keep the stringbuilder. I guess the difference is because the ropes still have some more allocation overhead.
As for Vector.join
and Vector.+
- no worry it was a reply to a later deleted comment, please ignore 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Btw, you don't need approvals of all reviewers. A code owner approve is enough 😃
This pull request adds a benchmark for the `Text.reverse` function added in #3377 as part of https://www.pivotaltracker.com/n/projects/2539304/stories/181265419. Per discussion with @jdunkerley on Discord it is useful to have this benchmark as this is a low-level item we want to track.
Pull Request Description
This commit implements
Text.reverse
as an extension onText
.Text.reverse
reverses strings. For example:"Hello World!".reverse
results in
"!dlroW olleH"
.Strings are reversed by their Extended Grapheme Clusters not by their
characters. This has some performance implications because we need to
find these grapheme cluster boundaries when iterating. To do so,
BreakIterator.getCharacterInstance
is used.Implements: https://www.pivotaltracker.com/n/projects/2539304/stories/181265419
Important Notes
Checklist
Please include the following checklist in your PR:
If GUI codebase was changed: Enso GUI was tested when built using BOTH./run dist
and./run watch
.