This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

kv: add cycle-length to limit set of written keys #50

Merged
merged 1 commit into from
May 9, 2017

Conversation

tbg
Member

@tbg tbg commented May 8, 2017

This parameter truncates & loops the sequence around. As a consequence, the
number of keys is bounded by `concurrency * cycleLength`.

Something odd I've noticed is that this:

```
go run main.go --cycle-length 1 --concurrency 8
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)
      1s        0          344.6          344.6    100.7    134.2    285.2
      2s        0          319.2          331.9    104.9    209.7    285.2
      3s        0          325.0          329.6     96.5    184.5    251.7
      4s        0          322.9          327.9    117.4    167.8    209.7
```

is much slower than this:

```
go run main.go --cycle-length 10 --concurrency 8
_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p95(ms)__p99(ms)_pMax(ms)
      1s        0         3165.0         3164.9      5.5      7.9     10.5
      2s        0         3194.6         3179.8      5.2      8.9     12.6
      3s        0         3093.7         3151.1      5.5      8.4     12.1
      4s        0         2807.6         3065.3      6.3     11.0     13.1
```

Unless I'm mistaken, the eight clients "never" intersect, so the only
difference is that in the first example each client hammers a single key of
its own, while in the second each works over a small set of its own keys.
Perhaps there is more range parallelism in the latter, but you wouldn't
expect it. The difference disappears with `--concurrency=1`.

The motivation for this change is
cockroachdb/cockroach#15756:

Bad behavior in the GC queue can be reproduced by running a single-node
cluster and

```
go run main.go --cycle-length 1 --concurrency 1 --min-block-bytes $((1024*1024)) --max-block-bytes $((1024*1024*2))
```

and, after a few gigs of data have piled up,

```
./cockroach zone set .default -f - --insecure <<EOF
gc:
  ttlseconds: 600
EOF
```

In one run, this resulted in

```
queue_gc_processingnanos{store="1"} 4.26018985343e+11
```

or roughly 426 seconds of cumulative GC processing.

There's also a "mystery" that I haven't really looked into: Replace the `*2`
in the `go run` invocation above with `*10`, and immediately get `driver: bad connection`.


This change is Reviewable

@tbg
Member Author

tbg commented May 8, 2017

@petermattis I looked into adding `--delete-percent` as well (actually wrote the code), but it's less useful for me: I need few total keys with many versions, not many keys each carrying a deletion tombstone; those would split in practice. That could be prevented, but it's easier to deal with low cardinality.

@tbg
Member Author

tbg commented May 8, 2017

(And there's also the issue that a high percentage of deletes would leave few actual values, and a double delete is idempotent, which doesn't help my cause.)


@petermattis petermattis left a comment


LGTM

kv/main.go Outdated
```diff
@@ -129,7 +131,7 @@ type generator struct {
 func newGenerator(seq *sequence) *generator {
 	return &generator{
 		seq: seq,
-		rand: rand.New(rand.NewSource(int64(time.Now().UnixNano()))),
+		rand: rand.New(rand.NewSource(rand.Int63())),
```
Contributor


What motivated this change?

Member Author


If `time.Now().UnixNano()` returns the same timestamp for two generators (it surely shouldn't on reasonable systems, but Never Trust Clocks™), you end up with multiple writers overlapping on the same keys, which is unintentional. Seeding from `rand.Int63()` seems more legit.

Contributor


Do we seed the global rand? If we don't, then this is not doing what you expect.

Contributor


https://golang.org/src/math/rand/rand.go#L235:

```go
var globalRand = New(&lockedSource{src: NewSource(1).(Source64)})
```

Member Author


Good point about seeding the global rand, I've indeed set us up for a problem here. I'll back this out (the concern above is synthetic anyway).

@tbg tbg merged commit b9ecf80 into master May 9, 2017
@tbg tbg deleted the cycle branch May 9, 2017 12:58