Text.take and Text.drop #3287

Merged
merged 32 commits into from
Feb 22, 2022
Changes from 16 commits
Commits
ea962db
Adding Range.contains
jdunkerley Feb 17, 2022
66f2c77
Adding Runtime.lazy
jdunkerley Feb 18, 2022
987202d
First, Last, Before and After working
jdunkerley Feb 18, 2022
11a2121
Removing Runtime.lazy
jdunkerley Feb 18, 2022
e1fb16d
Support for While
jdunkerley Feb 18, 2022
a6616c0
Support for Range for take
jdunkerley Feb 18, 2022
6c49709
Move to Range over Pair for internals
jdunkerley Feb 19, 2022
aa26c94
Working drop and take
jdunkerley Feb 21, 2022
bc7ed35
Emoji tests
jdunkerley Feb 21, 2022
2b3c078
Take accent tests
jdunkerley Feb 21, 2022
481bd60
Drop on graphemes.
jdunkerley Feb 21, 2022
c584926
More tests
jdunkerley Feb 21, 2022
6a96756
Final tests and changelog
jdunkerley Feb 21, 2022
71d2ac2
PR comment
jdunkerley Feb 21, 2022
15508dd
Line length
jdunkerley Feb 22, 2022
cc1e4d7
Merge branch 'develop' into wip/jd/text-take-drop-181265131
jdunkerley Feb 22, 2022
bb5c52f
Failing test.
jdunkerley Feb 22, 2022
bd1929d
Failing test.
jdunkerley Feb 22, 2022
88e0dad
New line
jdunkerley Feb 22, 2022
0043fee
Update SELECT library documentation
jdunkerley Feb 22, 2022
b5533b8
Merge branch 'develop' into wip/jd/text-take-drop-181265131
jdunkerley Feb 22, 2022
bee5298
Merge branch 'develop' into wip/jd/text-take-drop-181265131
jdunkerley Feb 22, 2022
7abbecd
PR comments on Range
jdunkerley Feb 22, 2022
a59d3c1
More PR comments
jdunkerley Feb 22, 2022
1e20ab9
More PR comments
jdunkerley Feb 22, 2022
0d48153
Move to BreakIterator.next count
jdunkerley Feb 22, 2022
4a95875
More detail for Input_Indices_Already_Matched
jdunkerley Feb 22, 2022
9f7b706
Better description
jdunkerley Feb 22, 2022
a03cf3c
Merge branch 'develop' into wip/jd/text-take-drop-181265131
jdunkerley Feb 22, 2022
f2fdd48
Merge branch 'develop' into wip/jd/text-take-drop-181265131
jdunkerley Feb 22, 2022
16c0289
Flip _Last for ""
jdunkerley Feb 22, 2022
e64d382
Move to next count for 2 more cases
jdunkerley Feb 22, 2022
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -45,6 +45,8 @@
methods added to `Standard.Test`][3276]
- [Implemented `Integer.parse`][3283]
- [Made `Text.compare_to` correctly handle Unicode normalization][3282]
- [Implemented new `Text.take` and `Text.drop` functions, replacing existing
functions][3287]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -70,6 +72,7 @@
[3276]: https://github.com/enso-org/enso/pull/3276
[3283]: https://github.com/enso-org/enso/pull/3283
[3282]: https://github.com/enso-org/enso/pull/3282
[3287]: https://github.com/enso-org/enso/pull/3287

#### Enso Compiler

10 changes: 10 additions & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso
@@ -171,3 +171,13 @@ type Range
to_vector =
length = Math.max 0 (this.end - this.start)
Vector.new length (i -> i + this.start)

## Checks whether the range contains the specified value.

> Example
Check if an index is in the range of a Vector

vec = [1,3,4,5,7]
contains : Integer -> Boolean
contains value =
(this.start <= value) && (this.end > value)
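
The check above treats the range as half-open: `start` is included and `end` is excluded. A minimal Java sketch of the same rule follows; `HalfOpenRange` is a hypothetical stand-in for illustration, not part of the Enso standard library.

```java
// Hypothetical stand-in for the Enso Range type; the name is illustrative only.
final class HalfOpenRange {
    final long start; // inclusive lower bound
    final long end;   // exclusive upper bound

    HalfOpenRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Same comparison as `(this.start <= value) && (this.end > value)` in the diff.
    boolean contains(long value) {
        return start <= value && end > value;
    }

    public static void main(String[] args) {
        HalfOpenRange r = new HalfOpenRange(0, 10);
        System.out.println(r.contains(9));  // true: 9 is below the exclusive end
        System.out.println(r.contains(10)); // false: the end itself is excluded
    }
}
```

With this rule an empty range such as `0.up_to 0` contains nothing, which matches the expectations in `Range_Spec.enso`.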
@@ -7,6 +7,7 @@ import Standard.Base.Data.Text.Regex
import Standard.Base.Data.Text.Regex.Mode
import Standard.Base.Data.Text.Line_Ending_Style
import Standard.Base.Data.Text.Split_Kind
import Standard.Base.Data.Text.Text_Sub_Range
import Standard.Base.Data.Locale
import Standard.Base.Meta

@@ -840,58 +841,119 @@ Text.drop_last count =
boundary = iterator.next -count
if boundary == -1 then '' else Text_Utils.substring this 0 boundary

## Creates a new text by selecting the first `count` characters of `this`,
returning `this` if `count` is greater than or equal to the length of `this`.
## PRIVATE
Utility function taking a range and getting char indices
range_to_char_indices : Text -> Range -> Range ! Index_Out_Of_Bounds_Error
range_to_char_indices text range =
len = text.length
start = if range.start < 0 then range.start + len else range.start
end = if range.end == Nothing then len else (if range.end < 0 then range.end + len else range.end)
valid = (Range 0 len+1).contains

Arguments:
- count: The number of characters to take from the start of `this`.
case (Pair (valid start) (valid end)) of
Pair False _ -> Error.throw (Index_Out_Of_Bounds_Error range.start len)
Pair True False -> Error.throw (Index_Out_Of_Bounds_Error range.end len)
Pair True True ->
if start>=end then (Range 0 0) else
iterator = BreakIterator.getCharacterInstance
iterator.setText text

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
Standard Annex 29. This is the smallest unit that still has semantic
meaning in most text-processing applications.
loop index start_char end_char start_index =
if index == end then (Range start_index (start_char)) else
@Tail_Call loop (index + 1) end_char iterator.next (if index == start then start_char else start_index)

> Example
Make a new text from the first two characters of "boo".
loop 0 0 iterator.next Nothing

"boo".take_first 2
Text.take_first : Integer -> Text
Text.take_first count =
iterator = BreakIterator.getCharacterInstance
iterator.setText this
iterator.first
boundary = iterator.next count
if boundary == -1 then this else Text_Utils.substring this 0 boundary
## ALIAS first, last, left, right, mid, substring
Creates a new Text by selecting the specified range of the input.

## Creates a new text by selecting the last `count` characters of `this`,
returning `this` if `count` is greater than or equal to the length of `this`.
This can select a section of text from the beginning, end, or middle of the
input using various criteria defined by the range parameter.

Arguments:
- count: The number of characters to take from the end of `this`.
- range: The section of this text to return.
If a `Text_Sub_Range`, then the selection is interpreted following the rules of that type.
If a `Range`, the selection is specified by two indices, from and to.

Returns:
The part of the input as specified by the range parameter.

> Examples
Various different ways to take part of "Hello World!"

"Hello World!".take First.new == "H"
"Hello World!".take (First 5) == "Hello"
"Hello World!".take (First 0) == ""
"Hello World!".take Last.new == "!"
"Hello World!".take (Last 6) == "World!"
"Hello World!".take (Before " ") == "Hello"
"Hello World!".take (Before_Last "o") == "Hello W"
"Hello World!".take (After " ") == "World!"
"Hello World!".take (After_Last "o") == "rld!"
"Hello World!".take (While c->c!=" ") == "Hello"
"Hello World!".take (Range 3 5) == "lo"
"Hello World!".take (Range -3 -1) == "ld"
"Hello World!".take (Range -3 Nothing) == "ld!"
"Hello World!".take (Range 5 Nothing) == " World!"
"Hello World!".take (Range 5 12) == " World!"
"Hello World!".take (Range 12 12) == ""
Comment on lines +874 to +892
Member
I'm not fully aware of the design there, but I think the idea of examples is that (probably in the future) you can select one in the IDE and it will be placed on the stage, so each example should probably be self-contained instead of consisting of multiple variants.
That would make a LOT of examples here, but I guess it can be useful to the users.
It may be worth asking @wdanilo or someone from IDE whether the examples should be 'single separate units' or if grouping them like this is OK. Not sure how important that is, since I'm not sure if/when this example insertion is going to be implemented, but I remember hearing something about it, so it would be good to clarify this.

Awesome set of examples by the way!

Member Author
will add to open questions

Text.take : (Text_Sub_Range | Range) -> Text ! Index_Out_Of_Bounds_Error
Text.take range =
char_range = case range of
Range _ _ -> here.range_to_char_indices this range
_ -> range.to_char_range this
Comment on lines +896 to +897
Member
Suggested change
Range _ _ -> here.range_to_char_indices this range
_ -> range.to_char_range this
Range _ _ -> here.range_to_char_indices this range
Text_Sub_Range -> range.to_char_range this

would be cool to be able to do it, but I think it may be impossible. Did you try it? If it doesn't work, it may be worth reporting to the engine - if I understand correctly the typeset design, this should be possible - and in this case would make the code much clearer than a _. (It would also catch errors earlier, rather than failing with some random value does not define a to_char_range method).

Member Author
didn't work when I tried it earlier so ended up with this approach.

will raise with engine team

if char_range.is_error then char_range else
Text_Utils.substring this char_range.start char_range.end
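
For the `Range` case, the logic in `range_to_char_indices` comes down to three steps: shift negative indices by the text length, check that both ends fall within `0..len` inclusive, then convert grapheme indices into offsets in the underlying char array. The Java sketch below illustrates those steps under stated assumptions. It uses the JDK's `java.text.BreakIterator` in place of ICU's `com.ibm.icu.text.BreakIterator`, so cluster boundaries may differ for some inputs. The class and method names are hypothetical.

```java
import java.text.BreakIterator;

// Illustrative sketch only; not the Enso implementation.
final class GraphemeTakeSketch {
    // Counts user-perceived characters by walking character boundaries.
    static int graphemeLength(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    // Converts a grapheme index (0..length) into an offset in the char array.
    static int charOffset(String text, int graphemeIndex) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int offset = it.first();
        for (int i = 0; i < graphemeIndex; i++) {
            offset = it.next();
        }
        return offset;
    }

    // A null `end` plays the role of `Nothing`: take up to the end of the text.
    static String take(String text, int start, Integer end) {
        int len = graphemeLength(text);
        int s = start < 0 ? start + len : start;
        int e = end == null ? len : (end < 0 ? end + len : end);
        if (s < 0 || s > len) {
            throw new IndexOutOfBoundsException("start " + start + " for length " + len);
        }
        if (e < 0 || e > len) {
            throw new IndexOutOfBoundsException("end " + end + " for length " + len);
        }
        if (s >= e) {
            return ""; // empty or inverted range selects nothing
        }
        return text.substring(charOffset(text, s), charOffset(text, e));
    }

    public static void main(String[] args) {
        System.out.println(take("Hello World!", 3, 5));     // lo
        System.out.println(take("Hello World!", -3, null)); // ld!
    }
}
```

The sketch walks the text more than once for clarity. The Enso code in the diff finds both offsets in a single pass with one iterator.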

## Creates a new Text by removing the specified range of the input.

This can select a section of text from the beginning, end, or middle of the
input using various criteria defined by the range parameter.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
Standard Annex 29. This is the smallest unit that still has semantic
meaning in most text-processing applications.

> Example
Make a new text from the last two characters of "boo".

"boo".take_last 2
Text.take_last : Integer -> Text
Text.take_last count =
iterator = BreakIterator.getCharacterInstance
iterator.setText this
iterator.last
boundary = iterator.next -count
if boundary == -1 then this else Text_Utils.drop_first this boundary
Arguments:
- range: The section of this text to remove.
If a `Text_Sub_Range`, then the selection is interpreted following the rules of that type.
If a `Range`, the selection is specified by two indices, from and to.

Returns:
The input with the part specified by the range parameter removed.

> Examples
Various different ways to drop part of "Hello World!"

"Hello World!".drop First.new == "ello World!"
"Hello World!".drop (First 5) == " World!"
"Hello World!".drop (First 0) == "Hello World!"
"Hello World!".drop Last.new == "Hello World"
"Hello World!".drop (Last 6) == "Hello "
"Hello World!".drop (Before " ") == " World!"
"Hello World!".drop (Before_Last "o") == "orld!"
"Hello World!".drop (After " ") == "Hello "
"Hello World!".drop (After_Last "o") == "Hello Wo"
"Hello World!".drop (While c->c!=" ") == " World!"
"Hello World!".drop (Range 3 5) == "Hel World!"
"Hello World!".drop (Range -3 -1) == "Hello Wor!"
"Hello World!".drop (Range -3 Nothing) == "Hello Wor"
"Hello World!".drop (Range 5 Nothing) == "Hello"
"Hello World!".drop (Range 5 12) == "Hello"
"Hello World!".drop (Range 12 12) == "Hello World!"
Text.drop : (Text_Sub_Range | Range) -> Text ! Index_Out_Of_Bounds_Error
Text.drop range =
char_range = case range of
Range _ _ -> here.range_to_char_indices this range
_ -> range.to_char_range this
if char_range.is_error then char_range else
if char_range.start == 0 then Text_Utils.drop_first this char_range.end else
prefix = Text_Utils.substring this 0 char_range.start
if char_range.end == (Text_Utils.char_length this) then prefix else
prefix + Text_Utils.drop_first this char_range.end
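
Once the char range is known, `drop` is a splice: keep what precedes `start`, keep what follows `end`, and skip a concatenation when either side is empty. A minimal Java sketch of that branching follows. It works directly on char indices, `DropSpliceSketch` and `dropChars` are hypothetical names, and it assumes `Text_Utils.drop_first` behaves like `String.substring(int)`.

```java
// Illustrative sketch only; operates on char indices, not grapheme indices.
final class DropSpliceSketch {
    // Removes the chars in [start, end) and joins what is left.
    static String dropChars(String text, int start, int end) {
        if (start == 0) {
            return text.substring(end); // nothing before the range: keep the tail
        }
        String prefix = text.substring(0, start);
        if (end == text.length()) {
            return prefix; // nothing after the range: keep the head
        }
        return prefix + text.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(dropChars("Hello World!", 3, 5)); // Hel World!
    }
}
```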

## ALIAS Lower Case

Converts each character in `this` to lower case.

Arguments:
- locale: specifies the locale for charater case mapping. Defaults to the
- locale: specifies the locale for character case mapping. Defaults to the
`Locale.default` locale.

! What is a Character?
@@ -943,4 +1005,3 @@ Text.to_lower_case locale=Locale.default =
Text.to_upper_case : Locale.Locale -> Text
Text.to_upper_case locale=Locale.default =
UCharacter.toUpperCase locale.java_locale this

@@ -0,0 +1,123 @@
from Standard.Base import all
from Standard.Base.Data.Text.Extensions import Index_Out_Of_Bounds_Error

polyglot java import com.ibm.icu.text.BreakIterator
polyglot java import org.enso.base.Text_Utils

## Type defining a substring of a Text
type Text_Sub_Range
## Select the first `count` characters.
Select an empty string if `count` is less than or equal to 0.
Select the entire string if `count` is greater than the length of the input.
type First (count : Integer = 1)

## Select the last `count` characters.
Select an empty string if `count` is less than or equal to 0.
Select the entire string if `count` is greater than the length of the input.
type Last (count : Integer = 1)

## Select characters until the first instance of `delimiter`.
Select an empty string if `delimiter` is empty.
Select the entire string if the input does not contain `delimiter`.
type Before (delimiter : Text)

## Select characters until the last instance of `delimiter`.
Select an empty string if `delimiter` is empty.
Select the entire string if the input does not contain `delimiter`.
type Before_Last (delimiter : Text)

## Select characters after the first instance of `delimiter`.
Select an empty string if the input does not contain `delimiter`.
type After (delimiter : Text)

## Select characters after the last instance of `delimiter`.
Select an empty string if the input does not contain `delimiter`.
type After_Last (delimiter : Text)

## Select characters while the predicate returns `True`.
type While (predicate : (Text -> Boolean))

## PRIVATE
Evaluates the Text_Sub_Range returning the underlying char array indices
to_char_range : Text -> Range
to_char_range text =

## Utility function to find char indices for Text_Sub_Range.
Arguments:
- text: Text to search
- predicate: Function to test each character, receives:
- index: current grapheme index
- start: index in the char array of the start of the grapheme cluster
- end: index in the char array of the start of the next grapheme cluster
Returns True to exit the loop.
Returns: either a Pair of char indices for current grapheme cluster or
Pair -1 (char array length) if not found.
find_sub_range_end = text->predicate->
iterator = BreakIterator.getCharacterInstance
iterator.setText text

loop index start end =
if end == -1 then (Pair -1 start) else
if predicate index start end then (Pair start end) else
@Tail_Call loop (index + 1) end iterator.next

loop 0 0 iterator.next

case this of
First count ->
if count <= 0 then (Range 0 0) else
indices = find_sub_range_end text (index->_->_-> index+1 == count)
Range 0 indices.second
Last count ->
if count <= 0 then (Range 0 0) else
first_count = text.length - count
indices = find_sub_range_end text (index->_->_-> index+1 == first_count)
if indices.first == -1 then (Range 0 indices.second) else
(Range indices.second (Text_Utils.char_length text))
Before delimiter ->
if delimiter.is_empty then (Range 0 0) else
index = Text_Utils.index_of text delimiter
if index == -1 then (Range 0 (Text_Utils.char_length text)) else
(Range 0 index)
Before_Last delimiter ->
if delimiter.is_empty then (Range 0 0) else
index = Text_Utils.last_index_of text delimiter
if index == -1 then (Range 0 (Text_Utils.char_length text)) else
(Range 0 index)
After delimiter ->
if delimiter.is_empty then (Range 0 0) else
index = Text_Utils.index_of text delimiter
if index == -1 then (Range 0 0) else
(Range (index + Text_Utils.char_length delimiter) (Text_Utils.char_length text))
After_Last delimiter ->
if delimiter.is_empty then (Range 0 0) else
index = Text_Utils.last_index_of text delimiter
if index == -1 then (Range 0 0) else
(Range (index + Text_Utils.char_length delimiter) (Text_Utils.char_length text))
While predicate ->
wrapped start end = predicate (Text_Utils.substring text start end) . not
indices = find_sub_range_end text (_->wrapped)
if indices.first == -1 then (Range 0 indices.second) else
Range 0 indices.first
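
The `While` case above walks the text one grapheme cluster at a time and stops at the first cluster the predicate rejects. If no cluster is rejected, the whole text is selected. A minimal Java sketch of that walk follows. It uses the JDK's `java.text.BreakIterator` rather than ICU's, and `TakeWhileSketch` with its methods is a hypothetical name for illustration.

```java
import java.text.BreakIterator;
import java.util.function.Predicate;

// Illustrative sketch only; not the Enso implementation.
final class TakeWhileSketch {
    // Returns the char offset where the leading run of accepted graphemes ends.
    static int whileEnd(String text, Predicate<String> predicate) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (!predicate.test(text.substring(start, end))) {
                return start; // first rejected grapheme: stop just before it
            }
        }
        return text.length(); // every grapheme was accepted
    }

    static String takeWhile(String text, Predicate<String> predicate) {
        return text.substring(0, whileEnd(text, predicate));
    }

    public static void main(String[] args) {
        System.out.println(takeWhile("Hello World!", c -> !c.equals(" "))); // Hello
    }
}
```

The predicate receives each cluster as a `String` rather than a `char`, since one cluster can span several chars.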

## UNSTABLE
A temporary workaround to allow the `First` constructor to work with default
arguments.

It is needed, because there are issues with relying on default arguments of
Atom constructors, as described in the following issue:
https://github.com/enso-org/enso/issues/1600
Once that issue is fixed, it can be removed.
First.new : Integer -> First
First.new (count = 1) = First count

## UNSTABLE
A temporary workaround to allow the `Last` constructor to work with default
arguments.

It is needed, because there are issues with relying on default arguments of
Atom constructors, as described in the following issue:
https://github.com/enso-org/enso/issues/1600
Once that issue is fixed, it can be removed.
Last.new : Integer -> Last
Last.new (count = 1) = Last count
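
The four delimiter-based cases of `to_char_range` (`Before`, `Before_Last`, `After`, `After_Last`) share one pattern: find the delimiter, then fall back to a fixed result when it is empty or missing. The Java sketch below restates those fallbacks. It uses `String.indexOf` and `String.lastIndexOf` as a stand-in for the ICU `StringSearch` calls in `Text_Utils`, so it compares chars exactly and is not collation-aware. `DelimiterRangeSketch` is a hypothetical name.

```java
// Illustrative sketch only; exact char matching, not ICU collation-aware search.
final class DelimiterRangeSketch {
    static String before(String text, String delimiter) {
        if (delimiter.isEmpty()) return "";
        int index = text.indexOf(delimiter);
        return index == -1 ? text : text.substring(0, index); // missing: whole text
    }

    static String beforeLast(String text, String delimiter) {
        if (delimiter.isEmpty()) return "";
        int index = text.lastIndexOf(delimiter);
        return index == -1 ? text : text.substring(0, index); // missing: whole text
    }

    static String after(String text, String delimiter) {
        if (delimiter.isEmpty()) return "";
        int index = text.indexOf(delimiter);
        return index == -1 ? "" : text.substring(index + delimiter.length()); // missing: empty
    }

    static String afterLast(String text, String delimiter) {
        if (delimiter.isEmpty()) return "";
        int index = text.lastIndexOf(delimiter);
        return index == -1 ? "" : text.substring(index + delimiter.length()); // missing: empty
    }

    public static void main(String[] args) {
        System.out.println(beforeLast("Hello World!", "o")); // Hello W
        System.out.println(afterLast("Hello World!", "o"));  // rld!
    }
}
```

The asymmetry is deliberate in the diff: a missing delimiter selects the whole text for the `Before` variants and nothing for the `After` variants.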
45 changes: 45 additions & 0 deletions std-bits/base/src/main/java/org/enso/base/Text_Utils.java
@@ -1,6 +1,7 @@
package org.enso.base;

import com.ibm.icu.text.Normalizer;
import com.ibm.icu.text.StringSearch;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

@@ -225,4 +226,48 @@ public static boolean contains(String string, String substring) {
public static String replace(String str, String oldSequence, String newSequence) {
return str.replace(oldSequence, newSequence);
}

/**
* Gets the length of the char array of a string.
*
* @param str the string to measure
* @return the number of UTF-16 chars in the string
*/
public static long char_length(String str) {
return str.length();
}

/**
* Find the first index of needle in the haystack
*
* @param haystack the string to search
* @param needle the substring that is searched for
* @return index of the first needle or -1 if not found.
*/
public static long index_of(String haystack, String needle) {
StringSearch search = new StringSearch(needle, haystack);
int pos = search.first();
return pos == StringSearch.DONE ? -1 : pos;
}

/**
* Find the last index of needle in the haystack
*
* @param haystack the string to search
* @param needle the substring that is searched for
* @return index of the last needle or -1 if not found.
*/
public static long last_index_of(String haystack, String needle) {
StringSearch search = new StringSearch(needle, haystack);
int pos = search.first();
if (pos == StringSearch.DONE) {
return -1;
}

for (int next = search.next(); next != StringSearch.DONE; next = search.next()) {
pos = next;
}

return pos;
}
}
11 changes: 11 additions & 0 deletions test/Tests/src/Data/Range_Spec.enso
@@ -45,3 +45,14 @@ spec = Test.group "Range" <|
1.up_to 10 . find (> 10) . should_be_a Nothing
Test.specify "should allow conversion to vector" <|
1.up_to 6 . to_vector . should_equal [1, 2, 3, 4, 5]

Test.specify "should allow checking if a value is in the range" <|
0.up_to 10 . contains 5 . should_be_true
0.up_to 10 . contains 0 . should_be_true
0.up_to 10 . contains 9 . should_be_true
0.up_to 10 . contains 10 . should_be_false
0.up_to 0 . contains 10 . should_be_false
0.up_to 0 . contains 0 . should_be_false
3.up_to 5 . contains 2 . should_be_false

main = Test.Suite.run_main here.spec