-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload/ycsb: support workload D, E and fix key selection for insertion and reading #37804
Conversation
a7ffc5a
to
7f1a6f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Have you tried running them on a roachprod cluster? Consider adding these workloads to the set of ycsb roachtests and then running them and seeing what you see. Both of these are useful and realistic workloads.
Reviewed 1 of 1 files at r1, 2 of 2 files at r2, 2 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/workload/ycsb/ycsb.go, line 338 at r1 (raw file):
updateStmts []*gosql.Stmt // The next row index to insert.
Seems like you never want these to be nil pointers when you interact with them below. How about avoiding the allocations above and just taking the address of the field for use with the atomics?
pkg/workload/ycsb/ycsb.go, line 126 at r3 (raw file):
g.flags.StringVar(&g.workload, `workload`, `B`, `Workload type. Choose from A-F.`) g.flags.StringVar(&g.requestDistribution, `request-distribution`, `zipfian`, `Distribution for request key generation [zipfian, uniform, latest].`) g.flags.StringVar(&g.scanLengthDistribution, `scan-length-distribution`, `uniform`, `Distribution for scan length generation [zipfian, uniform].`)
nit: can you augment the help text for these 3 flags to mention that this is for workload E.
pkg/workload/ycsb/ycsb.go, line 510 at r3 (raw file):
defer res.Close() for res.Next() {
nit: no need for this blank line. I guess if you feel like it's visually important you can add a comment inside the loop.
7f1a6f1
to
702552d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
pkg/workload/ycsb/ycsb.go, line 338 at r1 (raw file):
Previously, ajwerner wrote…
Seems like you never want these to be nil pointers when you interact with them below. How about avoiding the allocations above and just taking the address of the field for use with the atomics?
The row indexes and counters are shared between workers, so they have to be allocated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
702552d
to
8b2f3e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
Reviewed 2 of 2 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 4 of 4 files at r6, 2 of 2 files at r7, 1 of 1 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jeffrey-xiao)
pkg/workload/ycsb/latest_generator.go, line 24 at r2 (raw file):
) // LatestGenerator is a random number generator that generates draws from a
The official YCSB benchmark seems to use a skewed latest generator for workload D. Why did you decide to go with a uniform distribution here?
EDIT: never mind, looks like you realized this already. Want to call this SkewedLatestGenerator
to mirror their implementation?
pkg/workload/ycsb/uniform_generator.go, line 62 at r3 (raw file):
} // Uint64 returns a random Uint64 between iMin and iMax, drawn from a uniform
How does this ensure that the value is above iMin
?
pkg/workload/ycsb/ycsb.go, line 406 at r1 (raw file):
// See YCSB paper section 5.3 for a complete description of how keys are chosen. func (yw *ycsbWorker) nextReadKey() string { rowCount := atomic.LoadUint64(yw.rowCount)
Do we ever modify rowCount
? Do we need this atomic load?
pkg/workload/ycsb/ycsb.go, line 144 at r2 (raw file):
return workload.Hooks{ Validate: func() error { switch g.workload {
Do you think we should be defaulting the request distribution differently depending on the workload?
pkg/workload/ycsb/ycsb.go, line 307 at r3 (raw file):
return workload.QueryLoad{}, errors.Errorf("Unknown request distribution: %s", g.requestDistribution) } if err != nil {
Nice catch!
8b2f3e8
to
46598bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/workload/ycsb/uniform_generator.go, line 62 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How does this ensure that the value is above
iMin
?
Oops, forgot to make this change.
pkg/workload/ycsb/ycsb.go, line 406 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we ever modify
rowCount
? Do we need this atomic load?
Also forgot to increment this when iMax gets incremented.
pkg/workload/ycsb/ycsb.go, line 144 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you think we should be defaulting the request distribution differently depending on the workload?
I think that's a reasonable change.
46598bf
to
b3b98e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/workload/ycsb/uniform_generator.go, line 67 at r12 (raw file):
z.mu.Lock() defer z.mu.Unlock() return rand.Uint64()%(z.mu.iMax-z.iMin+1) + z.iMin
This should be using z.mu.r.Uint64()
, not the global rand.Uint64()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/workload/ycsb/uniform_generator.go, line 67 at r12 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This should be using
z.mu.r.Uint64()
, not the globalrand.Uint64()
.
Also, using mod to get integers within a range is slightly biased. This code should probably be using math/rand.Rand.Int63n
. I realize this wasn't changed in this PR, but as long as you're in here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/workload/ycsb/ycsb.go, line 409 at r12 (raw file):
// distribution and modding the samples. To properly support INSERTS, we need // to maintain a separate rownum counter. rownum := yw.randGen.IMaxHead()
This was the only call to IMaxHead
. I think we can remove that method now, which was a bit of an oddity in the random generators.
pkg/workload/ycsb/ycsb.go, line 438 at r12 (raw file):
func (yw *ycsbWorker) nextReadKey() string { rowCount := atomic.LoadUint64(yw.rowCount) rowIndex := yw.hashKey(yw.requestGen.Uint64()) % rowCount
buildKeyName
also calls hashKey
, so this will cause us to call hashKey
twice for each read key. Is that expected?
pkg/workload/ycsb/ycsb.go, line 472 at r12 (raw file):
return err } atomic.AddUint64(yw.rowCount, 1)
Can you confirm that this is what the official YCSB implementation does? I ask because it is still possible to generate a read key which hasn't been inserted yet due to concurrency in the inserts.
b3b98e9
to
3308e2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @petermattis)
pkg/workload/ycsb/uniform_generator.go, line 67 at r12 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Also, using mod to get integers within a range is slightly biased. This code should probably be using
math/rand.Rand.Int63n
. I realize this wasn't changed in this PR, but as long as you're in here...
Done. Unfortunate that there is no Uint64n
.
pkg/workload/ycsb/ycsb.go, line 409 at r12 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This was the only call to
IMaxHead
. I think we can remove that method now, which was a bit of an oddity in the random generators.
Done.
pkg/workload/ycsb/ycsb.go, line 438 at r12 (raw file):
Previously, petermattis (Peter Mattis) wrote…
buildKeyName
also callshashKey
, so this will cause us to callhashKey
twice for each read key. Is that expected?
The issue with the previous implementation is that it gets hashed after the mod which always causes the hotspot to be the first key. The yscb paper suggests to hash then mod. Hashing twice for each read key is also consistent with the official ycsb implementation. When the request distribution is zipfian
, the official yscb implementation uses a ScrambledZipfianGenerator
which hashes the output of a ZipfianGenerator
. Then they hash the generated key num to obtain the key name.
pkg/workload/ycsb/ycsb.go, line 472 at r12 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Can you confirm that this is what the official YCSB implementation does? I ask because it is still possible to generate a read key which hasn't been inserted yet due to concurrency in the inserts.
Thanks for the catch. The official implementation uses an AcknowledgedCounterGenerator
which I've now added.
af20f54
to
eb5e742
Compare
Previously, the keys for insertion were chosen based on the iMaxHead of the zipfian distribution generator. As suggested in the ycsb paper, the zipfian distribution generator was later changed to use a much larger key space, but the key selection for insertion was never changed. This commit adds an index for the next row to be inserted and a row counter that keeps track of the largest contiguous sequence of keys able to be read. Since IMaxHead is no longer used after these changes, it was removed from the randGenerator interface and all implementors. Additionally, we perform the convertion from random number to key in a different manner than the yscb paper. The yscb paper performs a hash then mods the number, while we mod the number then hash it. This causes the hotspot of keys when using the zipfian to be record 0 (I.E. the oldest record). Another hash was added to ensure that the hotspot of keys changes. Hashing twice is also consistent with the official yscb implementation. Finally, there was no logic to handle reading keys that were inserted since the key selection for reading only picked from the initial rows. Instead of modding by the initial rows, we now mod by the total number of rows recorded by the row counter. Release note: None
The skewed latest generator is used for workload D and skews to the most recently inserted keys. Additionally, changed the API for zipfGenerator to generate an inclusive bound in [0, defaultIMax - 1], instead of [1, defaultIMax]. Release note: None
As described in the ycsb paper, the key should be chosen by a zipfian distribution and the number of keys to scan should be chosen by a uniform distribution. Add a command-line flag for choosing the scan length distribution and also changed the API for the uniform generate to accept a lower bound and upper bound. This makes it more consistent with the zipfian generator and the latest generator and is needed for generating a scan length. Release note: None
Release note: None
eb5e742
to
e5e1116
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @jeffrey-xiao and @nvanbenschoten)
pkg/workload/ycsb/acknowledged_counter.go, line 45 at r16 (raw file):
c := &AcknowledgedCounter{} c.mu.count = initialCount c.mu.window = make([]bool, windowSize)
I believe this will use windowSize
bytes. You could reduce that by a factor of 8 by using a bitmap. Not sure if that is worth it as we're only talking about 1MB of memory and the current code is clear and concise.
pkg/workload/ycsb/uniform_generator.go, line 67 at r12 (raw file):
Previously, jeffrey-xiao (Jeffrey Xiao) wrote…
Done. Unfortunate that there is no
Uint64n
.
Someday (not today) we may upgrade to golang.org/x/exp/rand
which does have Uint64n
.
bors r+ |
37804: workload/ycsb: support workload D, E and fix key selection for insertion and reading r=jeffrey-xiao a=jeffrey-xiao This PR does the following items: 1. Support workload D (95% reads skewed to newer keys, 5% writes) by adding a latest generator 2. Support workload E (95% scans, 5% writes) by implementing scans 3. Fix key selection for insertion by having a separate index for next row to insert rather than using the `iMaxHead` in the zipfian generator 4. Fix key selection for reading by having a separate counter for number of rows instead of using the number of initial rows. I didn't fully understand the method proposed in the ycsb paper in section 5.3, so if this isn't the best way, I'm open to changes. 5. Improve key selection by hashing before modding. This follows the logic proposed in the ycsb paper. Originally we were hashing after applying a modulo, but that always causes the first inserted key to be the hotspot rather than distribute the hotspot around. 6. Change API for generators to be more consistent. All generators now take in a lower bound and upper bound and generate an inclusive range between the lower and upper bound (I.E. [lower, upper]). Co-authored-by: Jeffrey Xiao <[email protected]>
Build succeeded |
This change fixes the handling of the --request-distribution flag so that any user overrides are respected. Before, the default request distribution for a given workload was always used, regardless of the flag. This bug dates back to cockroachdb#37804. I fear this has caused a good deal of confusion in the past, as we often turn to YCSB-A uniform as an example of an uncontended read/update workload. Release justification: workload only.
85920: backupccl: elide data from offline tables during RESTORE r=adityamaru a=msbutler A previous PR (#83915) enabled BACKUP to target offline descriptors (including IMPORTing and RESTOREing tables), but didn't fully handle their rollback on RESTORE. This PR ensures RESTORE properly handles these offline tables: - An offline table created by RESTORE or IMPORT PGDUMP is fully discarded. The table does not exist in the restoring cluster. - An offline table created by an IMPORT INTO has all importing data elided in the restore processor and is restored online to its pre import state. If the RESTORE occurs AOST after the table returns online, the table is RESTOREd to its post offline state. Currently, all tables that return online are fully re-backed up. For many situations above, this is unecessary. Future work should ensure that these tables are not re-backed up when we know all operations within the incremental backup interval on the table were MVCC. This patch only applies to BACKUP and RESTORE flavors that implicitly target an offline importing/restoring table. A future PR should allow a user to explicitly target an existing offline table undergoing an IMPORT. Release justification: fixes bug where offline tables were not properly handled in restore. 86826: workload/ycsb: fix --request-distribution flag r=nvanbenschoten a=nvanbenschoten This change fixes the handling of the `--request-distribution` flag so that any user overrides are respected. Before, the default request distribution for a given workload was always used, regardless of the flag. This bug dates back to #37804. I fear this has caused a good deal of confusion in the past, as we often turn to YCSB-A uniform as an example of an uncontended read/update workload. Release justification: workload only. 87047: pgwire: fix txn options for prepared BEGIN r=ZhouXing19 a=rafiss fixes #87012 Release note (bug fix): The statement tag for SHOW command results in the pgwire protocol no longer contain the number of returned rows. Release note (bug fix): Fixed a bug where the options given to the BEGIN TRANSACTION command would be ignored if the BEGIN was a prepared statement. Release justification: low risk bug fix 87113: ci: run `acceptance` test in `ci` config r=jlinder a=rickystewart We need this to get detailed info in the `test.xml`. Release justification: Non-production code changes Release note: None 87114: ci: fix nightlies that are invoking `bazci` incorrectly r=rail a=rickystewart Some of these nightlies were broken as a result of `c8d66a9c334dd87acd62190f314a4f3ec236ce82`. Release justification: Non-production code changes Release note: None 87116: bazci: delete unused `testdata` directory r=celiala a=rickystewart Release justification: Non-production code changes Release note: None Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Rafi Shamim <[email protected]> Co-authored-by: Ricky Stewart <[email protected]>
This change fixes the handling of the --request-distribution flag so that any user overrides are respected. Before, the default request distribution for a given workload was always used, regardless of the flag. This bug dates back to #37804. I fear this has caused a good deal of confusion in the past, as we often turn to YCSB-A uniform as an example of an uncontended read/update workload. Release justification: workload only.
This change fixes the handling of the --request-distribution flag so that any user overrides are respected. Before, the default request distribution for a given workload was always used, regardless of the flag. This bug dates back to #37804. I fear this has caused a good deal of confusion in the past, as we often turn to YCSB-A uniform as an example of an uncontended read/update workload. Release justification: workload only.
This PR does the following items:
iMaxHead
in the zipfian generator