Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: strings, bytes: Add JoinSeq #70034

Open
earthboundkid opened this issue Oct 24, 2024 · 10 comments
Open

proposal: strings, bytes: Add JoinSeq #70034

earthboundkid opened this issue Oct 24, 2024 · 10 comments
Labels
Milestone

Comments

@earthboundkid
Copy link
Contributor

Proposal Details

#61901 added SplitSeq and friends to bytes, but nothing for joining. Right now you have to do slices.Collect first or use a bytes.Buffer/strings.Builder, which feels unfortunate. I propose to add strings.JoinSeq(seq iter.Seq[string], sep string) string and the bytes equivalent as well.

@gabyhelp
Copy link

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@seh
Copy link
Contributor

seh commented Oct 24, 2024

Do you imagine that it would collect all the items from the sequence first to figure out the total length of the final string before concatenating them?

@earthboundkid
Copy link
Contributor Author

I imagine it would just use a strings.Builder and opt into whatever the default buffer there is. Maybe there could be optimization to use a slice for sequences shorter than N.

@zigo101
Copy link

zigo101 commented Oct 25, 2024

1.23+ iterators will change Go culture. But should it?

@earthboundkid
Copy link
Contributor Author

I don't think this is a change on the scale of adding iter.Map or iter.Filter where it would materially affect the experience of using Go. This is pretty much just the same as #61901. In fact I proposed it in that issue already.

@apparentlymart
Copy link

apparentlymart commented Oct 28, 2024

My intuition for a function like this is that by choosing to use an iterator I'm implying that it's not important to preallocate exactly the right number of bytes for the result, and so I expect the implementation to be like using bytes.Buffer.

If avoiding potential intermediate allocations were important to me then I think using just an iterator would be applying the wrong tool.

However, for []byte in particular this does seem to prompt the usual question of whether callers should have the option of providing their own backing array to use, append-style. I think in some other issue there'd been discussion of a convention of adding Append to the front of the name of a function that supports that, which in this case would lead to a second function:

package bytes

func AppendJoinSeq(s []byte, seq iter.Seq[string], sep string) []byte

...and then a caller that does know (or can at least make a good estimate of) the expected result length can provide some excess capacity of that size for the function to write into.

Assuming a bytes.Buffer-wrapping implementation, argument s would presumably be passed as the argument to bytes.NewBuffer.

(This comment is effectively making a separate proposal, but I'm posting it here as indirect support for the proposed JoinSeq with the claim that AppendJoinSeq could be added later if warranted. I'm not sure yet if it is warranted, since perhaps that situation is rare enough that it's okay to directly use bytes.Buffer when it arises.)

@seankhliao
Copy link
Member

This should be a sum function for Reduce in #61898

@jimmyfrasche
Copy link
Member

another way to cut this is

package iterx
func Join[T any](seq iter.Seq[T], sep T) iter.Seq[T]

and

package bytes
func Flatten(seq iter.Seq[[]byte]) []byte

and for strings as well.

@earthboundkid
Copy link
Contributor Author

This should be a sum function for Reduce in #61898

In the case of strings at least, adding the substrings in a reduce operation is likely to be significantly less efficient than using a dedicated function that can do some smart buffering.

@DeedleFake
Copy link

DeedleFake commented Oct 28, 2024

This should be a sum function for Reduce in #61898

In the case of strings at least, adding the substrings in a reduce operation is likely to be significantly less efficient than using a dedicated function that can do some smart buffering.

You could use Reduce() or a hypothetical Sum() shortcut function for this if you also had something like the following:

func Separate[T any](seq iter.Seq[T], sep T) iter.Seq[T] {
  return func(yield func(T) bool) {
    first := true
    for v := range seq {
      if !first && !yield(sep) {
        return
      }
      first = false
      if !yield(v) {
        return
      }
    }
  }
}

Then it would just be Sum(Separate(stringSeq, ", ")).

Edit: I realize that I misread your comment and missed the part about optimizing the buffering of the separators. Whoops. Ah well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

9 participants