bytes, strings: add iterator forms of existing functions #61901
Comments
Will |
Should |
Is |
This proposal has been added to the active column of the proposals project |
That relies on specialized optimizations to avoid an allocation and a copy. I'm not sure if those optimizations exist right now. The current best way to do it, as far as I know, is
for i := 0; i < len(s); i++ {
	c := s[i]
	// ...
}
I don't think that's particularly difficult, personally, but I think that |
Unless I misunderstand I think you just mean Split and Fields. And there is another misplaced reference to Runes a few lines down. |
I like this overall. Two things I wonder about:
|
Don't think so. And shouldn't there be only one way of doing things in Go? |
You can pass iterators to other functions instead of just immediately using it in a range so being able to make iterators for built in things is still useful. |
ISTM that strings.Lines encourages people to write:
I'd rather see an iterator version of bufio.Scanner. |
@aarzilli I think it's pretty trivial to add an |
I wonder if it's necessary to return the index for bytes, since it's trivial to count them yourself. I guess it's nice to mirror the for/range statement.
Should this return the start index of the string too?
Should we have SplitAfterNSeq and SplitNSeq too? If not, why? These are the only ones left out.
If you need to know the length, the existing functions can handle that case. |
Join would be useful to have. For example, just now I wanted to make a String() method for a named slice. The easy way to do it would be |
Finishing this proposal discussion is blocked on #61405. |
Change https://go.dev/cl/558735 mentions this issue: |
Have all remaining concerns about this proposal been addressed? The proposal is to add these to both bytes and strings:
|
I would not expect them to include the terminating newlines if I didn't read the documentation. |
Good reason to read the documentation! 😄 If the newlines are not included, then the user of the iterator cannot distinguish Lines("abc") from Lines("abc\n"). It is often important to know whether a file has a non-terminated final line. |
If the 'last' line does end in a newline, does Edit: And if it does yield that last line, how is |
I know there are cases where it matters if the file ends with a newline or not but I don't think I have ever had to deal with that in any code I have ever written. |
I think the reason to keep the \n is then you don't have to document what happens to \r\n. |
It does not yield a final "". |
I'm late to the party, but I noticed that the doc comments (at tip) of
By inspecting the implementation of those functions, I understand why the iterators they return are single-use. For example, the push iterator returned by Lines:
func Lines(s string) iter.Seq[string] {
	return func(yield func(string) bool) {
		for len(s) > 0 {
			var line string
			if i := IndexByte(s, '\n'); i >= 0 {
				line, s = s[:i+1], s[i+1:] // <-------------
			} else {
				line, s = s, ""
			}
			if !yield(line) {
				return
			}
		}
		return
	}
}
However, I don't understand why the resulting iterators have to be single-use. I may have missed something, but I don't think their single-use nature was discussed in this proposal. What's the rationale for it? For convenience, shouldn't iterators be reusable whenever possible/inexpensive? If reusability is indeed desired, achieving it in this case would be straightforward; for instance, for Lines:
func Lines(s string) iter.Seq[string] {
	return func(yield func(string) bool) {
		s := s // <--------- local copy
		for len(s) > 0 {
			var line string
			if i := IndexByte(s, '\n'); i >= 0 {
				line, s = s[:i+1], s[i+1:]
			} else {
				line, s = s, ""
			}
			if !yield(line) {
				return
			}
		}
		return
	}
} |
We wouldn't want to do that for |
@ianlancetaylor Good point. I guess consistency between |
I know the proposal has already been accepted (which I'm delighted about), but here is some anecdotal argument in favour of |
Today I wished I had strings.JoinSeq (and could have sworn it was already in Go 1.23), so I opened #70034. |
Change https://go.dev/cl/637176 mentions this issue: |
For newly funcs SplitSeq, SplitAfterSeq, FieldsSeq, FieldsFuncSeq. Updates #61901. Change-Id: I3c97bfd9c2250de68aaea348c82a05635ee797af Reviewed-on: https://go-review.googlesource.com/c/go/+/637176 Auto-Submit: Ian Lance Taylor Reviewed-by: Robert Griesemer Reviewed-by: Ian Lance Taylor LUCI-TryBot-Result: Go LUCI
Change https://go.dev/cl/637358 mentions this issue: |
For newly funcs SplitSeq, SplitAfterSeq, FieldsSeq, FieldsFuncSeq. Updates golang#61901. Change-Id: I3c97bfd9c2250de68aaea348c82a05635ee797af Reviewed-on: https://go-review.googlesource.com/c/go/+/637176 Auto-Submit: Ian Lance Taylor Reviewed-by: Robert Griesemer Reviewed-by: Ian Lance Taylor LUCI-TryBot-Result: Go LUCI
Change https://go.dev/cl/647875 mentions this issue: |
The bytes package iterators return subslices, not substrings. Updates #61901. Change-Id: Ida91d3e33a0f178edfe9a267861adf4f13f9a965 Reviewed-on: https://go-review.googlesource.com/c/go/+/647875 Reviewed-by: Ian Lance Taylor LUCI-TryBot-Result: Go LUCI
Change https://go.dev/cl/648095 mentions this issue: |
…in doc comments The bytes package iterators return subslices, not substrings. Updates #61901. Change-Id: Ida91d3e33a0f178edfe9a267861adf4f13f9a965 Reviewed-on: https://go-review.googlesource.com/c/go/+/647875 Reviewed-by: Ian Lance Taylor LUCI-TryBot-Result: Go LUCI (cherry picked from commit ff27d27) Reviewed-on: https://go-review.googlesource.com/c/go/+/648095 Reviewed-by: Dmitri Shuralyov Reviewed-by: Dmitri Shuralyov TryBot-Bypass: Cherry Mui
We propose to add the following functions to package bytes and package strings, to allow callers to iterate over these results without having to allocate the entire result slice. This text shows only the strings package form.
This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.
Iterating over lines is an incredibly common operation that we’ve resisted adding only because we didn’t want to encourage allocation of a potentially large slice. Iterators provide a way to finally add it.
Iterating over bytes in a string is common and too difficult, since range ranges over runes. This function will inline to the obvious for loop (because we will make sure it does):
Iterating over runes is served by a regular range loop, but like slices.All and maps.All, it could be useful as an input to other iterator adapters. The name is Runes, not Seq or All, so that it's clear at call sites what is being iterated over (runes not bytes).
Similar to Lines, there should be iterator forms of Split, Fields, and Runes, to avoid requiring the allocation of a slice when the caller only wants to iterate over the individual results. If we were writing the library from scratch, we might use the names Split, Fields, and Runes for the iterator-returning versions, and code that wanted the full slice could use slices.Collect. But that's not an option here, so we add a distinguishing Seq suffix. We do not expect that new functions will use the Seq suffix. For example the function above is Lines, not LinesSeq.