Text.take and Text.drop #3287

Merged: 32 commits, Feb 22, 2022
Commits
- ea962db Adding Range.contains (jdunkerley, Feb 17, 2022)
- 66f2c77 Adding Runtime.lazy (jdunkerley, Feb 18, 2022)
- 987202d First, Last, Before and After working (jdunkerley, Feb 18, 2022)
- 11a2121 Removing Runtime.lazy (jdunkerley, Feb 18, 2022)
- e1fb16d Support for While (jdunkerley, Feb 18, 2022)
- a6616c0 Support for Range for take (jdunkerley, Feb 18, 2022)
- 6c49709 Move to Range over Pair for internals (jdunkerley, Feb 19, 2022)
- aa26c94 Working drop and take (jdunkerley, Feb 21, 2022)
- bc7ed35 Emoji tests (jdunkerley, Feb 21, 2022)
- 2b3c078 Take accent tests (jdunkerley, Feb 21, 2022)
- 481bd60 Drop on graphemes. (jdunkerley, Feb 21, 2022)
- c584926 More tests (jdunkerley, Feb 21, 2022)
- 6a96756 Final tests and changelog (jdunkerley, Feb 21, 2022)
- 71d2ac2 PR comment (jdunkerley, Feb 21, 2022)
- 15508dd Line length (jdunkerley, Feb 22, 2022)
- cc1e4d7 Merge branch 'develop' into wip/jd/text-take-drop-181265131 (jdunkerley, Feb 22, 2022)
- bb5c52f Failing test. (jdunkerley, Feb 22, 2022)
- bd1929d Failing test. (jdunkerley, Feb 22, 2022)
- 88e0dad New line (jdunkerley, Feb 22, 2022)
- 0043fee Update SELECT library documentation (jdunkerley, Feb 22, 2022)
- b5533b8 Merge branch 'develop' into wip/jd/text-take-drop-181265131 (jdunkerley, Feb 22, 2022)
- bee5298 Merge branch 'develop' into wip/jd/text-take-drop-181265131 (jdunkerley, Feb 22, 2022)
- 7abbecd PR comments on Range (jdunkerley, Feb 22, 2022)
- a59d3c1 More PR comments (jdunkerley, Feb 22, 2022)
- 1e20ab9 More PR comments (jdunkerley, Feb 22, 2022)
- 0d48153 Move to BreakIterator.next count (jdunkerley, Feb 22, 2022)
- 4a95875 More detail for Input_Indices_Already_Matched (jdunkerley, Feb 22, 2022)
- 9f7b706 Better description (jdunkerley, Feb 22, 2022)
- a03cf3c Merge branch 'develop' into wip/jd/text-take-drop-181265131 (jdunkerley, Feb 22, 2022)
- f2fdd48 Merge branch 'develop' into wip/jd/text-take-drop-181265131 (jdunkerley, Feb 22, 2022)
- 16c0289 Flip _Last for "" (jdunkerley, Feb 22, 2022)
- e64d382 Move to next count for 2 more cases (jdunkerley, Feb 22, 2022)
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -48,6 +48,8 @@
- [Made `Text.compare_to` correctly handle Unicode normalization][3282]
- [Extend `Text.contains` API to support regex and case insensitive
search.][3285]
- [Implemented new `Text.take` and `Text.drop` functions, replacing existing
functions][3287]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -75,6 +77,7 @@
[3283]: https://github.com/enso-org/enso/pull/3283
[3282]: https://github.com/enso-org/enso/pull/3282
[3285]: https://github.com/enso-org/enso/pull/3285
[3287]: https://github.com/enso-org/enso/pull/3287

#### Enso Compiler

11 changes: 11 additions & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso
@@ -171,3 +171,14 @@ type Range
to_vector =
length = Math.max 0 (this.end - this.start)
Vector.new length (i -> i + this.start)

## Checks whether the range contains the specified value.

> Example
Check if an index is in the range of a Vector

vec = ["A", "B", "C", "D", "E"]
0.up_to vec.length . contains 3
contains : Integer -> Boolean
contains value =
(this.start <= value) && (this.end > value)
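`contains` treats the range as half-open: `start` is included and `end` is not, so `0.up_to vec.length . contains 3` is `True` for the five-element vector while `contains 5` would be `False`. This is also why `range_to_char_indices` validates positions with `Range 0 len+1`, which lets `len` itself through. A minimal Java model of the check (the class `HalfOpenRange` is hypothetical and not part of Enso):

```java
// Hypothetical Java model of an Enso `Range`: `start` inclusive, `end` exclusive.
final class HalfOpenRange {
    final int start;
    final int end;

    HalfOpenRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Same test as the Enso `contains`: (start <= value) && (end > value).
    boolean contains(int value) {
        return start <= value && end > value;
    }
}
```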
225 changes: 110 additions & 115 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso
@@ -7,6 +7,7 @@ import Standard.Base.Data.Text.Regex
import Standard.Base.Data.Text.Regex.Mode
import Standard.Base.Data.Text.Line_Ending_Style
import Standard.Base.Data.Text.Split_Kind
import Standard.Base.Data.Text.Text_Sub_Range
import Standard.Base.Data.Locale
import Standard.Base.Meta

@@ -45,13 +46,10 @@ Text.length : Integer
Text.length =
iterator = BreakIterator.getCharacterInstance
iterator.setText this
nxt = iterator.next

count accum iter = if iter == -1 then accum else
counter = accum + 1
next_nxt = iterator.next
@Tail_Call count counter next_nxt
count 0 nxt
@Tail_Call count (accum + 1) iterator.next
count 0 iterator.next
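`Text.length` counts grapheme clusters by calling `iterator.next` until it returns -1. A rough Java equivalent follows. It assumes the JDK's `java.text.BreakIterator` as a stand-in for the ICU4J `BreakIterator` the Enso code appears to use (older JDKs may segment some emoji sequences differently), and the class name `GraphemeLength` is hypothetical:

```java
import java.text.BreakIterator;

// Assumption: java.text.BreakIterator stands in for ICU4J's BreakIterator.
final class GraphemeLength {
    // Counts grapheme clusters by advancing one boundary at a time until DONE (-1),
    // like the tail-recursive `count` helper in `Text.length`.
    static int length(String text) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        int count = 0;
        while (iterator.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }
}
```

An `e` followed by a combining acute accent (`"e\u0301"`) counts as one character here, although `String.length()` reports two UTF-16 units.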

## Applies the provided `function` to each character in `this`.

@@ -72,15 +70,10 @@ Text.each function =
iterator = BreakIterator.getCharacterInstance
iterator.setText this

fst = iterator.first
nxt = iterator.next

iterate prev nxt = if nxt == -1 then Nothing else
function (Text_Utils.substring this prev nxt)
next_nxt = iterator.next
@Tail_Call iterate nxt next_nxt
iterate fst nxt
Nothing
@Tail_Call iterate nxt iterator.next
iterate iterator.first iterator.next
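`Text.each` walks consecutive boundary pairs and passes the substring between each pair to `function`. A sketch of the same walk in Java, again assuming `java.text.BreakIterator` in place of ICU4J and using hypothetical names (`GraphemeEach`, `toList`):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Assumption: java.text.BreakIterator stands in for ICU4J's BreakIterator.
final class GraphemeEach {
    // Passes each grapheme cluster to `function`, walking (prev, next) boundary
    // pairs the way the Enso `iterate` helper does.
    static void each(String text, Consumer<String> function) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        int prev = iterator.first();
        int next = iterator.next();
        while (next != BreakIterator.DONE) {
            function.accept(text.substring(prev, next));
            prev = next;
            next = iterator.next();
        }
    }

    // Hypothetical convenience wrapper that collects the clusters into a list.
    static List<String> toList(String text) {
        List<String> out = new ArrayList<>();
        each(text, out::add);
        return out;
    }
}
```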

## ALIAS Get Character

@@ -112,17 +105,10 @@ Text.at index =
False ->
iterator = BreakIterator.getCharacterInstance
iterator.setText this

loop prev next count = if count == index then (Text_Utils.substring this prev next) else
next_next = iterator.next
if next_next == -1 then count else
@Tail_Call loop next next_next (count + 1)

first = iterator.next
result = if (first == -1) then 0 else (loop 0 first 0)
case result of
Integer -> Error.throw (Index_Out_Of_Bounds_Error index result)
_ -> result
first = iterator.next index
next = if first == -1 then -1 else iterator.next
if (next == -1) then (Error.throw (Index_Out_Of_Bounds_Error index this.length)) else
Text_Utils.substring this first next
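The rewritten `Text.at` jumps straight to boundary `index` with `iterator.next index` and reads one more boundary to find where the character ends, instead of looping one boundary at a time. A Java sketch under the same `java.text.BreakIterator` assumption; `GraphemeAt` is hypothetical, it only handles non-negative indices, and it throws `IndexOutOfBoundsException` where Enso raises `Index_Out_Of_Bounds_Error`:

```java
import java.text.BreakIterator;

// Assumption: java.text.BreakIterator stands in for ICU4J's BreakIterator.
final class GraphemeAt {
    // Returns the grapheme cluster at a non-negative `index`.
    static String at(String text, int index) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text); // resets the position to the first boundary (0)
        // Jump `index` boundaries in one call, as `iterator.next index` does in Enso.
        int first = iterator.next(index);
        // One more boundary marks the end of the character.
        int next = (first == BreakIterator.DONE) ? BreakIterator.DONE : iterator.next();
        if (next == BreakIterator.DONE) {
            throw new IndexOutOfBoundsException("Index " + index + " is out of bounds.");
        }
        return text.substring(first, next);
    }
}
```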

## ALIAS Get Characters

@@ -850,106 +836,116 @@ Text.repeat : Integer -> Text
Text.repeat count =
0.up_to count . fold "" acc-> _-> acc + this

## Creates a new text by removing the first `count` characters of `this`,
returning an empty text if `count` is greater than or equal to the length of
`this`.

Arguments:
- count: The number of characters to remove from the start of `this`.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
Standard Annex 29. This is the smallest unit that still has semantic
meaning in most text-processing applications.

> Example
Removing the first three characters from the text "ABBA".

"ABBA".drop_first 3
Text.drop_first : Integer -> Text
Text.drop_first count =
iterator = BreakIterator.getCharacterInstance
iterator.setText this
iterator.first
boundary = iterator.next count
if boundary == -1 then '' else Text_Utils.drop_first this boundary

## Creates a new text by removing the last `count` characters of `this`,
returning an empty text if `count` is greater than or equal to the length of
`this`.

Arguments:
- count: The number of characters to remove from the end of `this`.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
Standard Annex 29. This is the smallest unit that still has semantic
meaning in most text-processing applications.

> Example
Removing the last three characters from the text "ABBA".

"ABBA".drop_last 3
Text.drop_last : Integer -> Text
Text.drop_last count =
iterator = BreakIterator.getCharacterInstance
iterator.setText this
iterator.last
boundary = iterator.next -count
if boundary == -1 then '' else Text_Utils.substring this 0 boundary

## Creates a new text by selecting the first `count` characters of `this`,
returning `this` if `count` is greater than or equal to the length of `this`.
## PRIVATE
Utility function that takes a range over grapheme clusters and converts it to the corresponding range of indices into the underlying string (the indices that `Text_Utils.substring` expects).
range_to_char_indices : Text -> Range -> Range ! Index_Out_Of_Bounds_Error
range_to_char_indices text range =
len = text.length
start = if range.start < 0 then range.start + len else range.start
end = if range.end == Nothing then len else (if range.end < 0 then range.end + len else range.end)
is_valid = (Range 0 len+1).contains

case (Pair (is_valid start) (is_valid end)) of
Pair False _ -> Error.throw (Index_Out_Of_Bounds_Error range.start len)
Pair True False -> Error.throw (Index_Out_Of_Bounds_Error range.end len)
Pair True True ->
if start>=end then (Range 0 0) else
iterator = BreakIterator.getCharacterInstance
iterator.setText text

start_index = iterator.next start
end_index = iterator.next (end - start)
Range start_index end_index
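`range_to_char_indices` normalises negative and `Nothing` bounds, checks them against `0..len`, and then converts grapheme positions to string indices with two relative `next` calls. A Java sketch of the same steps, assuming `java.text.BreakIterator` in place of ICU4J and representing `Nothing` as `null` (`GraphemeRanges` and `toCharIndices` are hypothetical names):

```java
import java.text.BreakIterator;

// Assumption: java.text.BreakIterator stands in for ICU4J's BreakIterator.
final class GraphemeRanges {
    // Converts the grapheme range [start, end) to string indices {charStart, charEnd}.
    // Negative bounds count from the end; a null `end` means "to the end of the text".
    static int[] toCharIndices(String text, int start, Integer end) {
        int len = graphemeCount(text);
        int s = start < 0 ? start + len : start;
        int e = (end == null) ? len : (end < 0 ? end + len : end);
        // Valid positions are 0..len inclusive, i.e. `(Range 0 len+1).contains`.
        if (s < 0 || s > len) {
            throw new IndexOutOfBoundsException("Start " + start + " is out of bounds.");
        }
        if (e < 0 || e > len) {
            throw new IndexOutOfBoundsException("End " + end + " is out of bounds.");
        }
        if (s >= e) {
            return new int[] {0, 0}; // empty selection
        }
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        int charStart = iterator.next(s);   // s boundaries from the start
        int charEnd = iterator.next(e - s); // e - s more boundaries from there
        return new int[] {charStart, charEnd};
    }

    static int graphemeCount(String text) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        int count = 0;
        while (iterator.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }
}
```

For `"e\u0301x"`, the grapheme range 1 to 2 (just `x`) maps to string indices 2 to 3, because the accented `e` occupies two UTF-16 units.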

## ALIAS first, last, left, right, mid, substring
Creates a new Text by selecting the specified range of the input.

This can select a section of text from the beginning, end, or middle of the
input using various criteria defined by the range parameter.

Arguments:
- count: The number of characters to take from the start of `this`.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
Standard Annex 29. This is the smallest unit that still has semantic
meaning in most text-processing applications.

> Example
Make a new text from the first two characters of "boo".

"boo".take_first 2
Text.take_first : Integer -> Text
Text.take_first count =
iterator = BreakIterator.getCharacterInstance
iterator.setText this
iterator.first
boundary = iterator.next count
if boundary == -1 then this else Text_Utils.substring this 0 boundary

## Creates a new text by selecting the last `count` characters of `this`,
returning `this` if `count` is greater than or equal to the length of `this`.
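- range: The section of `this` text to return.
If a `Text_Sub_Range`, the selection is interpreted following the rules of that type.
If a `Range`, the selection is specified by a start index (inclusive) and an end index (exclusive). Negative indices count back from the end of the text, and an end of `Nothing` selects up to the end of the text.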

Returns:
The part of the input as specified by the range parameter.

> Examples
Various different ways to take part of "Hello World!"

"Hello World!".take First.new == "H"
"Hello World!".take (First 5) == "Hello"
"Hello World!".take (First 0) == ""
"Hello World!".take Last.new == "!"
"Hello World!".take (Last 6) == "World!"
"Hello World!".take (Before " ") == "Hello"
"Hello World!".take (Before_Last "o") == "Hello W"
"Hello World!".take (After " ") == "World!"
"Hello World!".take (After_Last "o") == "rld!"
"Hello World!".take (While c->c!=" ") == "Hello"
"Hello World!".take (Range 3 5) == "lo"
"Hello World!".take (Range -3 -1) == "ld"
"Hello World!".take (Range -3 Nothing) == "ld!"
"Hello World!".take (Range 5 Nothing) == " World!"
"Hello World!".take (Range 5 12) == " World!"
"Hello World!".take (Range 12 12) == ""
Comment on lines +874 to +892
Member:

I'm not fully aware of the design there, but I think the idea of examples is that (probably in the future) you can select one in the IDE and it will be placed on the stage, so each example should probably be self-contained instead of consisting of multiple variants.
That would make a LOT of examples here, but I guess it could be useful to users.
It may be worth asking @wdanilo or someone from the IDE team whether the examples should be 'single separate units' or whether grouping them like this is OK. I'm not sure how important that is, since I'm not sure if or when this example insertion is going to be implemented, but I remember hearing something about it, so it would be good to clarify.

Awesome set of examples, by the way!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add to open questions

Text.take : (Text_Sub_Range | Range) -> Text ! Index_Out_Of_Bounds_Error
Text.take range =
char_range = case range of
Range _ _ -> here.range_to_char_indices this range
_ -> range.to_char_range this
Comment on lines +896 to +897
Member:

Suggested change, replacing
Range _ _ -> here.range_to_char_indices this range
_ -> range.to_char_range this
with
Range _ _ -> here.range_to_char_indices this range
Text_Sub_Range -> range.to_char_range this

It would be cool to be able to do this, but I think it may be impossible. Did you try it? If it doesn't work, it may be worth reporting to the engine team: if I understand the typeset design correctly, this should be possible, and in this case it would make the code much clearer than a `_`. (It would also catch errors earlier, rather than failing later because some random value does not define a `to_char_range` method.)

Member Author:

It didn't work when I tried it earlier, so I ended up with this approach.

Will raise it with the engine team.

if char_range.is_error then char_range else
Text_Utils.substring this char_range.start char_range.end
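The `Text_Sub_Range` conversion (`to_char_range`) is not part of this diff. One plausible mapping for `First n` and `Last n`, modeled on the removed `take_first` and `take_last` helpers in this diff, is to move `n` boundaries forward from the start or backward from the end. A Java sketch under the same `java.text.BreakIterator` assumption (`GraphemeTake`, `takeFirst` and `takeLast` are hypothetical):

```java
import java.text.BreakIterator;

// Assumption: java.text.BreakIterator stands in for ICU4J's BreakIterator.
final class GraphemeTake {
    // Like the removed `take_first`: move `count` boundaries forward from the start.
    static String takeFirst(String text, int count) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        iterator.first();
        int boundary = iterator.next(count);
        // DONE means the text has fewer than `count` characters: keep all of it.
        return boundary == BreakIterator.DONE ? text : text.substring(0, boundary);
    }

    // Like the removed `take_last`: move `count` boundaries backward from the end.
    static String takeLast(String text, int count) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        iterator.last();
        int boundary = iterator.next(-count);
        return boundary == BreakIterator.DONE ? text : text.substring(boundary);
    }
}
```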

## Creates a new Text by removing the specified range of the input.

This can remove a section of text from the beginning, end, or middle of the
input using various criteria defined by the range parameter.

Arguments:
- count: The number of characters to take from the end of `this`.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
Standard Annex 29. This is the smallest unit that still has semantic
meaning in most text-processing applications.

> Example
Make a new text from the last two characters of "boo".

"boo".take_last 2
Text.take_last : Integer -> Text
Text.take_last count =
iterator = BreakIterator.getCharacterInstance
iterator.setText this
iterator.last
boundary = iterator.next -count
if boundary == -1 then this else Text_Utils.drop_first this boundary
- range: The section of `this` text to remove.
If a `Text_Sub_Range`, the section is interpreted following the rules of that type.
If a `Range`, the section is specified by a start index (inclusive) and an end index (exclusive). Negative indices count back from the end of the text, and an end of `Nothing` extends to the end of the text.

Returns:
The input with the section specified by the range parameter removed.

> Examples
Various different ways to drop part of "Hello World!"

"Hello World!".drop First.new == "ello World!"
"Hello World!".drop (First 5) == " World!"
"Hello World!".drop (First 0) == "Hello World!"
"Hello World!".drop Last.new == "Hello World"
"Hello World!".drop (Last 6) == "Hello "
"Hello World!".drop (Before " ") == " World!"
"Hello World!".drop (Before_Last "o") == "orld!"
"Hello World!".drop (After " ") == "Hello "
"Hello World!".drop (After_Last "o") == "Hello Wo"
"Hello World!".drop (While c->c!=" ") == " World!"
"Hello World!".drop (Range 3 5) == "Hel World!"
"Hello World!".drop (Range -3 -1) == "Hello Wor!"
"Hello World!".drop (Range -3 Nothing) == "Hello Wor"
"Hello World!".drop (Range 5 Nothing) == "Hello"
"Hello World!".drop (Range 5 12) == "Hello"
"Hello World!".drop (Range 12 12) == "Hello World!"
Text.drop : (Text_Sub_Range | Range) -> Text ! Index_Out_Of_Bounds_Error
Text.drop range =
char_range = case range of
Range _ _ -> here.range_to_char_indices this range
_ -> range.to_char_range this
if char_range.start == 0 then Text_Utils.drop_first this char_range.end else
prefix = Text_Utils.substring this 0 char_range.start
if char_range.end == (Text_Utils.char_length this) then prefix else
prefix + Text_Utils.drop_first this char_range.end
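`Text.drop` converts the range once and then splices the text in one of three ways: no prefix to keep (`start == 0`), no suffix to keep (the end is the end of the text), or prefix plus suffix. A Java sketch of that splice for an in-bounds, non-negative grapheme range, assuming `java.text.BreakIterator` in place of ICU4J (`GraphemeDrop` is hypothetical):

```java
import java.text.BreakIterator;

// Assumption: java.text.BreakIterator stands in for ICU4J's BreakIterator.
final class GraphemeDrop {
    // Removes graphemes [start, end). Assumes 0 <= start and end <= the grapheme
    // count; bounds checks and negative indices are left out of this sketch.
    static String drop(String text, int start, int end) {
        if (start >= end) {
            return text; // empty range: nothing to remove
        }
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        int charStart = iterator.next(start);
        int charEnd = iterator.next(end - start);
        // The same three cases as `Text.drop`.
        if (charStart == 0) {
            return text.substring(charEnd); // no prefix to keep
        }
        String prefix = text.substring(0, charStart);
        if (charEnd == text.length()) {
            return prefix; // no suffix to keep
        }
        return prefix + text.substring(charEnd);
    }
}
```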

## ALIAS Lower Case

Converts each character in `this` to lower case.

Arguments:
- locale: specifies the locale for charater case mapping. Defaults to the
- locale: specifies the locale for character case mapping. Defaults to the
`Locale.default` locale.

! What is a Character?
@@ -978,7 +974,7 @@ Text.to_lower_case locale=Locale.default =
Converts each character in `this` to upper case.

Arguments:
- locale: specifies the locale for charater case mapping. Defaults to
- locale: specifies the locale for character case mapping. Defaults to
`Locale.default`.

! What is a Character?
@@ -1001,4 +997,3 @@ Text.to_lower_case locale=Locale.default =
Text.to_upper_case : Locale.Locale -> Text
Text.to_upper_case locale=Locale.default =
UCharacter.toUpperCase locale.java_locale this
