Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slices: add Repeat function #65238

Closed
adamroyjones opened this issue Jan 23, 2024 · 28 comments
Closed

slices: add Repeat function #65238

adamroyjones opened this issue Jan 23, 2024 · 28 comments

Comments

@adamroyjones
Copy link

adamroyjones commented Jan 23, 2024

Proposal Details

Note

The current version of this proposal is here.

The signature in that version of the proposal is func Repeat[T any](count int, xs ...T) []T.

I propose a small, trivial addition to the slices package.

// Repeat returns a new slice containing count copies of x.
func Repeat[T any](x T, count int) []T {
	if count < 0 {
		panic("count must be non-negative")
	}
	xs := make([]T, count)
	for i := 0; i < count; i++ {
		xs[i] = x
	}
	return xs
}

There's prior, related art in other languages with list-like types, including the following. (The Python and JavaScript examples were provided by @Jorropo and @earthboundkid; see below.) Not all of the below are identical in their behaviour, but they're thematically similar.

I've personally found it a useful helper to have—particularly when combined with Concat, which looks like it'll make its way into Go 1.22.

I appreciate that slices oughtn't become a dumping ground for everyone's ideas about what's useful, but this feels like a natural addition to me.

I'm happy to contribute a pull request with tests if this is considered a good idea.

@gopherbot gopherbot added this to the Proposal milestone Jan 23, 2024
@apparentlymart
Copy link

apparentlymart commented Jan 23, 2024

At least some of the examples you shared in other languages are of repeating (something equivalent to) []T, rather than just T. For example, in Rust the Vec::repeat repeats all of the elements of the vector n times, rather than just a single element value.

The hypothetical Go Repeat function could potentially support both usage patterns at once by swapping the argument order and using a variadic signature:

func Repeat[T any](count int, vals ...T) []T {
	l := count * len(vals)
	xs := make([]T, l)
	for i := 0; i < l; i += len(vals) {
		copy(xs[i:], vals)
	}
	return xs
}

This new signature still supports repeating just one item, at the expense of the program having to quietly construct a single-element slice for that case.

@Jorropo
Copy link
Member

Jorropo commented Jan 23, 2024

If we repeat slices and not elements then it should work in-place if the underlying array has enough capacity.

@apparentlymart
Copy link

apparentlymart commented Jan 24, 2024

I would find it quite surprising if I asked this function to repeat a sub-slice of a larger slice and it reacted by overwriting other elements outside of the sub-slice I provided.

I think writing outside of the bounds of a slice into the excess capacity -- particularly if it only happens under specific circumstances like only when there is enough excess capacity -- is likely to lead to bugs where callers don't fully understand the implications. In particular, I imagine an author writing unit tests where the source slice always has no excess capacity because that's the easiest thing to write as a literal, not considering that a caller might provide a sub-slice of a larger backing array in more practical usage

If it's desirable to repeat while avoiding a new allocation, then I would suggest changing the signature to pass a separate slice to write the result into, and document that it's acceptable for the target slice to overlap the source slice only if the two slices share a common prefix, so that the function doesn't have to jump through hoops to avoid overwriting its own input during its work.

However, I would also tend to think that repeating into excess capacity of the backing array of the input slice would be a rare enough need that I would hand-write it that way if I needed it, rather than expecting the standard library to provide it.

YMMV, of course!


Possible alternate signature meeting that additional requirement:

func RepeatCopy[S ~[]E, E any](dst []E, src S)

The function would keep copying src into consecutive indices of dst until it has written to every element of dst.

If src is shorter than dst then src gets repeated to fill that space. If len(dst) is not an integer multiple of len(src) then the final copy is truncated.

If src is longer than dst then it's exactly equivalent to copy(dst, src). Not particularly useful, but consistent with the rest of the behavior.

If dst and src both share the same backing array and &dst[0] == &src[0], this would then perform the in-place repeat-extension as the previous comment suggests.

Repeat as I suggested it in my earlier comment could then potentially be a wrapper around this RepeatCopy, optimizing for the (presumed) more common case:

func Repeat[T any](count int, vals ...T) []T {
	l := count * len(vals)
	xs := make([]T, l)
	RepeatCopy(xs, vals)
	return xs
}

@Jorropo
Copy link
Member

Jorropo commented Jan 24, 2024

I would find it quite surprising if I asked this function to repeat a sub-slice of a larger slice and it reacted by overwriting other elements outside of the sub-slice I provided.

My original idea is that you want to do that for anything that adds new elements else you risk quadratic time complexity when the operation is repeated on the same slice.
However I now think this is wrong, you can't not have quadratic complexity anyway for this particular operation (at least without doing fancy things like ropes).

Things would be much better if we had type system support for slices that are only up to length long.
It's currently OOB knowledge what functions append and which one never touch extra capacity.
The best hint is if Append is in the name, then I think AppendRepeat would be the better name.

@Jorropo
Copy link
Member

Jorropo commented Jan 24, 2024

To add to

There's prior, related art in other languages with list-like types, including the following

Python has this builtin on all sequence types (including lists):

s + t the concatenation of s and t
s * n or n * s equivalent to adding s to itself n times

@apparentlymart
Copy link

Looking again at the motivating examples for other languages, I notice that the Scala one would probably translate into Go more like this:

func RepeatFunc[T any](count int, makeElem func () T) []T {
	xs := make([]T, count)
	for i := 0; i < count; i++ {
		xs[i] = makeElem()
	}
	return xs
}

I'm not sure if this additional variant is actually useful enough to justify inclusion, but just exploring the different design tradeoffs that each of the other languages made in their similar library features.

@gophun
Copy link

gophun commented Jan 24, 2024

Another option: make it return an iter.Seq[T] that can be collected via slices.Collect.

@earthboundkid
Copy link
Contributor

FWIW, Python uses *:

>>> ["a"] * 5
['a', 'a', 'a', 'a', 'a']

JavaScript is weirder. You first make an Array with capacity and no items then you fill it.

> new Array(5).fill("a")
< ["a", "a", "a", "a", "a"]

It seems like a pretty common slice utility across languages.

@adamroyjones
Copy link
Author

adamroyjones commented Jan 24, 2024

@apparentlymart: You're quite right. The "related" in "prior, related art" is bearing more load than I made clear. I'm sorry about that!


Thanks for all of the illuminating comments. So far, I'd categorise things into two mirrored families of three variants.

  1. With slices.
    a. The original, func(x T, count int) []T.
    b. The variadic extension, func(count int, xs ...T) []T.
    c. The functional extension, func(count int, fn func() T) []T.
  2. With sequences.
    a. As 1a, except it returns iter.Seq[T].
    b. As 1b, except it returns iter.Seq[T].
    c. As 1c, except it returns iter.Seq[T].

We might think of 1c and 2c as "RepeatFunc", as suggested.

I think there are two (current) weaknesses of 2a and 2b. Firstly, the iter package is crystallising, though it looks like it'll be accepted imminently. Secondly, the slices package operates on and returns slices. I'm not sure what the implications of new ideas like range-over-func and iter will be for existing packages, but I expect there'd be a general preference to keep the standard library's signatures frozen (to avoid breaking changes) and to keep each package's signatures internally "consistent" (and so precluding a mixture of slices and sequences in the slices package).

I think 1b is strictly preferable to 1a. It's simple, it's predictable in its behaviour and performance (with a single natural implementation), and it directly extends 1a without contortion. I think it'd slot easily into the existing slices package.

1c and 2c are interesting. Repeatedly calling a function is a natural way to generate a sequence. As well as the Scala example, there's a loosely-related (lazy, infinite) version of this in Elixir called Stream.unfold/2 which has a kind of beauty to it. But I'm not sure exactly how 1c and 2c would fit with Go just yet; I'll confess that I've not played around with the iter (and xiter, etc.) experiments. It all seems in flux.

@earthboundkid
Copy link
Contributor

earthboundkid commented Jan 25, 2024

1c strikes me as not any easier than just writing a loop.

n := -2
s :=  slices.Repeat(5, func() int {
  n += 2
  return n
})

// vs

s := make([]int, 5)
n := 0
for i := range s {
  s[i] = n
  n += 2
}

@adamroyjones
Copy link
Author

adamroyjones commented Jan 25, 2024

I think there's a case for 1c, but that it's marginal, with the advantages probably being small enough to not warrant extending the slices package. Go's syntax for closures is a bit noisy, but things can be visually compressed to

package main

import "fmt"

func main() {
	n := -2
	fmt.Printf("%v\n", repeat(10, func() int { n += 2; return n }))
}

func repeat[T any](count int, fn func() T) []T {
	xs := make([]T, count)
	for i := 0; i < count; i++ {
		xs[i] = fn()
	}
	return xs
}

which does turn the 6 lines to 2 (but at what cost?). I think it'd be a better fit if Go had syntax like

func main() {
	n := -2
	fmt.Printf("%v\n", repeat(10, () => n += 2))
}

but Go's not such a language—and it probably shouldn't become such a language.

@adonovan
Copy link
Member

adonovan commented Feb 1, 2024

I agree that slices.Repeat should repeat slices (not elements), just as strings.Repeat repeats strings.

The implementation needs to check for overflow. It should panic with a message containing the values of both n and len(values), instead of unhelpfully calling make with a negative argument, or, worse, spuriously returning an empty slice when the product is a multiple of the word size.

Also, there's a nice optimization you can do here which is to make fewer calls to copy by working in powers of two:
https://github.com/google/starlark-go/blob/f86470692795f8abcf9f837a3c53cf031c5a3d7e/starlark/eval.go#L1190-L1197

@adamroyjones
Copy link
Author

@adonovan: Thanks for the suggestions and guidance. The log improvement's pleasant and something I'm embarrassed to have missed with the first cut.

An updated implementation follows. My check for wrapping around is likely suboptimal. (It's at least not obviously wrong.) Below that, in a <details> block, I've attempted to write some tests.

// Repeat returns a new slice that repeats the provided slice the given number of times.
func Repeat[E any](count int, xs ...E) []E {
	if count < 0 {
		panic(fmt.Sprintf("the count (%d) must be non-negative", count))
	}
	if count > 1 && len(xs) > math.MaxInt/count {
		panic(fmt.Sprintf("wraparound: the count (%d) and the given input slice (len: %d) are mutually too large", count, len(xs)))
	}

	out := make([]E, count*len(xs))
	n := copy(out, xs)
	for n < len(out) {
		copy(out[n:], out[:n])
		n *= 2
	}
	return out
}
An attempt at a test suite.
package main

import (
	"slices"
	"testing"
	"unsafe"
)

func TestRepeat(t *testing.T) {
	for _, tc := range []struct {
		count int
		xs    []int
		exp   []int
	}{
		{count: 0, xs: []int{}, exp: []int{}},
		{count: 0, xs: []int{1}, exp: []int{}},
		{count: 1, xs: []int{}, exp: []int{}},
		{count: 1, xs: []int{0}, exp: []int{0}},
		{count: 2, xs: []int{0}, exp: []int{0, 0}},
		{count: 2, xs: []int{1}, exp: []int{1, 1}},
		{count: 3, xs: []int{1, 2}, exp: []int{1, 2, 1, 2, 1, 2}},
	} {
		if out := Repeat(tc.count, tc.xs...); !slices.Equal(tc.exp, out) {
			t.Errorf("Repeat(%d, %v...): expected %v, given %v", tc.count, tc.xs, tc.exp, out)
		}
	}
}

func TestRepeat_panics(t *testing.T) {
	var x int
	wordSize := 8 * unsafe.Sizeof(x)

	for _, tc := range []struct {
		name        string
		count       int
		xs          []struct{}
		shouldPanic bool
	}{
		{name: "negative_count", count: -1, xs: []struct{}{}, shouldPanic: true},
		{name: "wraparound/1", count: 1 << (wordSize - 2), xs: make([]struct{}, 2), shouldPanic: true},
		{name: "wraparound/2", count: 2, xs: make([]struct{}, 1<<(wordSize-2)), shouldPanic: true},
		{name: "wraparound/3", count: 1 << (wordSize/2 - 1), xs: make([]struct{}, 1<<(wordSize/2)), shouldPanic: true},
		{name: "wraparound/4", count: 1 << (wordSize / 2), xs: make([]struct{}, 1<<((wordSize/2)-1)), shouldPanic: true},
		{name: "wraparound/5", count: 1 << (wordSize/2 - 1), xs: make([]struct{}, 1<<(wordSize/2-1)), shouldPanic: false},
	} {
		var r any
		t.Run(tc.name, func(t *testing.T) {
			func() {
				defer func() { r = recover() }()
				_ = Repeat(tc.count, tc.xs...)
			}()

			if panics := r != nil; panics != tc.shouldPanic {
				if panics {
					t.Errorf("Repeat: unexpectedly panicked")
				} else {
					t.Errorf("Repeat: unexpectedly failed to panic")
				}
			}
		})
	}
}

@adonovan
Copy link
Member

adonovan commented Feb 4, 2024

The log improvement's pleasant and something I'm embarrassed to have missed with the first cut.

No reason to be embarrassed. I wasn't aware of it either until @josharian suggested it while reviewing the linked Starlark code.

You might want to add this comment at the overflow check:

// See Hacker's Delight 2-12, detecting overflow of signed multiplication for nonnegative arguments.

@adamroyjones
Copy link
Author

I've found a copy of it (the 2nd edition) and the relevant section and table. This book is amazing—thanks for the suggestion!

I've added the comment below. I'll also update the original proposal to link to this comment.

// Repeat returns a new slice that repeats the provided slice the given number of times.
func Repeat[E any](count int, xs ...E) []E {
	if count < 0 {
		panic(fmt.Sprintf("the count (%d) must be non-negative", count))
	}
	// See Section 2-13 of Hacker's Delight (2nd ed.) on detecting overflow of signed multiplication
	// for non-negative arguments, in particular Table 2-2 (p. 32).
	if count > 1 && len(xs) > math.MaxInt/count {
		panic(fmt.Sprintf("overflow: the count (%d) and the given input slice (len: %d) are mutually too large", count, len(xs)))
	}

	out := make([]E, count*len(xs))
	n := copy(out, xs)
	for n < len(out) {
		copy(out[n:], out[:n])
		n *= 2
	}
	return out
}
An attempt at a test suite.
package main

import (
	"slices"
	"testing"
	"unsafe"
)

func TestRepeat(t *testing.T) {
	for _, tc := range []struct {
		count int
		xs    []int
		exp   []int
	}{
		{count: 0, xs: []int{}, exp: []int{}},
		{count: 0, xs: []int{1}, exp: []int{}},
		{count: 1, xs: []int{}, exp: []int{}},
		{count: 1, xs: []int{0}, exp: []int{0}},
		{count: 2, xs: []int{0}, exp: []int{0, 0}},
		{count: 2, xs: []int{1}, exp: []int{1, 1}},
		{count: 3, xs: []int{1, 2}, exp: []int{1, 2, 1, 2, 1, 2}},
	} {
		if out := Repeat(tc.count, tc.xs...); !slices.Equal(tc.exp, out) {
			t.Errorf("Repeat(%d, %v...): expected %v, given %v", tc.count, tc.xs, tc.exp, out)
		}
	}
}

func TestRepeat_panics(t *testing.T) {
	var x int
	wordSize := 8 * unsafe.Sizeof(x)

	for _, tc := range []struct {
		name        string
		count       int
		xs          []struct{}
		shouldPanic bool
	}{
		{name: "negative_count", count: -1, xs: []struct{}{}, shouldPanic: true},
		{name: "wraparound/1", count: 1 << (wordSize - 2), xs: make([]struct{}, 2), shouldPanic: true},
		{name: "wraparound/2", count: 2, xs: make([]struct{}, 1<<(wordSize-2)), shouldPanic: true},
		{name: "wraparound/3", count: 1 << (wordSize/2 - 1), xs: make([]struct{}, 1<<(wordSize/2)), shouldPanic: true},
		{name: "wraparound/4", count: 1 << (wordSize / 2), xs: make([]struct{}, 1<<((wordSize/2)-1)), shouldPanic: true},
		{name: "wraparound/5", count: 1 << (wordSize/2 - 1), xs: make([]struct{}, 1<<(wordSize/2-1)), shouldPanic: false},
	} {
		var r any
		t.Run(tc.name, func(t *testing.T) {
			func() {
				defer func() { r = recover() }()
				_ = Repeat(tc.count, tc.xs...)
			}()

			if panics := r != nil; panics != tc.shouldPanic {
				if panics {
					t.Errorf("Repeat: unexpectedly panicked")
				} else {
					t.Errorf("Repeat: unexpectedly failed to panic")
				}
			}
		})
	}
}

@randall77
Copy link
Contributor

You can also use math/bits.Mul to detect overflow. We use something similar in the runtime. It avoids a divide, which is slow.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Feb 7, 2024
@adamroyjones
Copy link
Author

adamroyjones commented Feb 8, 2024

@randall77: Thanks for directing me to math/bits and the sharpened approach. My check for wrapping around was indeed "likely suboptimal". (It's also shown me that math/bits.UintSize exists, so the tests no longer import unsafe.)

Things are now as follows.

package main

import (
	"fmt"
	"math"
	"math/bits"
)

// Repeat returns a new slice that repeats the provided slice the given number of times.
func Repeat[E any](count int, xs ...E) []E {
	if count < 0 {
		panic(fmt.Sprintf("the count (%d) must be non-negative", count))
	}
	if hi, lo := bits.Mul(uint(count), uint(len(xs))); hi > 0 || lo > math.MaxInt {
		panic(fmt.Sprintf("overflow: the count (%d) and the given input slice (len: %d) are mutually too large", count, len(xs)))
	}

	out := make([]E, count*len(xs))
	n := copy(out, xs)
	for n < len(out) {
		copy(out[n:], out[:n])
		n *= 2
	}
	return out
}
An attempt at a test suite.
package main

import (
	"math/bits"
	"slices"
	"testing"
)

func TestRepeat(t *testing.T) {
	for _, tc := range []struct {
		count int
		xs    []int
		exp   []int
	}{
		{count: 0, xs: []int{}, exp: []int{}},
		{count: 0, xs: []int{1}, exp: []int{}},
		{count: 1, xs: []int{}, exp: []int{}},
		{count: 1, xs: []int{0}, exp: []int{0}},
		{count: 2, xs: []int{0}, exp: []int{0, 0}},
		{count: 2, xs: []int{1}, exp: []int{1, 1}},
		{count: 3, xs: []int{1, 2}, exp: []int{1, 2, 1, 2, 1, 2}},
	} {
		if out := Repeat(tc.count, tc.xs...); !slices.Equal(tc.exp, out) {
			t.Errorf("Repeat(%d, %v...): expected %v, given %v", tc.count, tc.xs, tc.exp, out)
		}
	}
}

func TestRepeat_panics(t *testing.T) {
	for _, tc := range []struct {
		name        string
		count       int
		xs          []struct{}
		shouldPanic bool
	}{
		{name: "negative_count", count: -1, xs: []struct{}{}, shouldPanic: true},
		{name: "overflow/1", count: 1 << (bits.UintSize - 2), xs: make([]struct{}, 2), shouldPanic: true},
		{name: "overflow/2", count: 2, xs: make([]struct{}, 1<<(bits.UintSize-2)), shouldPanic: true},
		{name: "overflow/3", count: 1 << (bits.UintSize/2 - 1), xs: make([]struct{}, 1<<(bits.UintSize/2)), shouldPanic: true},
		{name: "overflow/4", count: 1 << (bits.UintSize / 2), xs: make([]struct{}, 1<<((bits.UintSize/2)-1)), shouldPanic: true},
		{name: "overflow/5", count: 1 << (bits.UintSize/2 - 1), xs: make([]struct{}, 1<<(bits.UintSize/2-1)), shouldPanic: false},
	} {
		var r any
		t.Run(tc.name, func(t *testing.T) {
			func() {
				defer func() { r = recover() }()
				_ = Repeat(tc.count, tc.xs...)
			}()

			if panics := r != nil; panics != tc.shouldPanic {
				if panics {
					t.Errorf("Repeat: unexpectedly panicked")
				} else {
					t.Errorf("Repeat: unexpectedly failed to panic")
				}
			}
		})
	}
}

@rsc rsc changed the title proposal: slices: Adding a Repeat function proposal: slices: add Repeat function Feb 8, 2024
@rsc
Copy link
Contributor

rsc commented Feb 9, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Feb 9, 2024
@rsc
Copy link
Contributor

rsc commented Feb 14, 2024

The rationale for adding this is that it is already in strings and bytes, so having it in slices fits with those. Let's not worry about sequences here; the ones in strings and bytes do not take sequences. Also typically Repeat is of a short amount of data, so having a sequence form is not too helpful.

@rsc
Copy link
Contributor

rsc commented Feb 14, 2024

Also, for consistency with strings and bytes it seems like the signature should be

func Repeat[E any](x []E, count int) []E

This does mean we can't finesse the question of one element vs a slice argument, but a slice argument is consistent with bytes and strings.

@adamroyjones
Copy link
Author

That's a good point—I hadn't given enough thought to consistency across Go's packages.

It'd mean that (what I've seen as) the common case would be a bit unfortunate-looking, namely

slices.Repeat([]T{t}, count)

for some t in T, but I think consistency trumps a touch of beauty.

@rsc
Copy link
Contributor

rsc commented Mar 1, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to add to package slices:

// Repeat returns a new slice that repeats the provided slice the given number of times.
// The result has length and capacity len(x) * count.
// The result is never nil.
// Repeat panics if count is negative or if the result of (len(x) * count)
// overflows.
func Repeat[E any](x []E, count int) []E

@Jorropo
Copy link
Member

Jorropo commented Mar 1, 2024

 // Repeat returns a new slice that repeats the provided slice the given number of times.
-// The result has length and capacity len(x) * count.
+// The result has length len(x) * count.
 // The result is never nil.
 // Repeat panics if count is negative or if the result of (len(x) * count)
 // overflows.
 func Repeat[E any](x []E, count int) []E

I don't think capacity should be specified, if the underlying implementation can get some free capacity (like by rounding up by classes) it should be allowed to do so.

@rsc
Copy link
Contributor

rsc commented Mar 6, 2024

If you have a global like

var x = slices.Repeat([]string{"hi"}, 3)

then it matters a lot whether the capacity is equal to the len. Multiple goroutines can safely append to x simultaneously to produce different variants. If x has extra capacity, that stops being safe. We've been prioritizing safety over the potential efficiency of leaving a tiny bit of extra capacity in some cases depending on the memory allocator. That only leads to portability problems.

bytes.Repeat also returns a trimmed slice.

@rsc
Copy link
Contributor

rsc commented Mar 8, 2024

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

The proposal is to add to package slices:

// Repeat returns a new slice that repeats the provided slice the given number of times.
// The result has length and capacity len(x) * count.
// The result is never nil.
// Repeat panics if count is negative or if the result of (len(x) * count)
// overflows.
func Repeat[E any](x []E, count int) []E

@rsc rsc moved this from Active to Likely Accept in Proposals Mar 8, 2024
@rsc
Copy link
Contributor

rsc commented Mar 15, 2024

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

The proposal is to add to package slices:

// Repeat returns a new slice that repeats the provided slice the given number of times.
// The result has length and capacity len(x) * count.
// The result is never nil.
// Repeat panics if count is negative or if the result of (len(x) * count)
// overflows.
func Repeat[E any](x []E, count int) []E

@rsc rsc moved this from Likely Accept to Accepted in Proposals Mar 15, 2024
@rsc rsc changed the title proposal: slices: add Repeat function slices: add Repeat function Mar 15, 2024
@rsc rsc modified the milestones: Proposal, Backlog Mar 15, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/571895 mentions this issue: slices: add func Repeat

@ianlancetaylor
Copy link
Member

Sorry for not noticing this earlier, but every other function in package slices that takes a slice as an argument takes the type of the slice as a type argument. For consistency the signature here should be

func Repeat[S ~[]E, E any](x S, count int) S

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

No branches or pull requests