Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachpb: investigate disabling roachpb.Value checksum computation #21435

Closed
petermattis opened this issue Jan 13, 2018 · 9 comments
Closed

roachpb: investigate disabling roachpb.Value checksum computation #21435

petermattis opened this issue Jan 13, 2018 · 9 comments
Assignees
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. no-issue-activity X-stale
Milestone

Comments

@petermattis
Copy link
Collaborator

petermattis commented Jan 13, 2018

#21395 disabled verification of the roachpb.Value checksums, providing a 15-20% speed boost to large MVCC scans. Checksums are still computed when values are written, they are just not verified. The justification for removing the verification is that TLS and RocksDB checksums already provide adequate coverage. RocksDB checksums have caught disk corruption in the wild. The roachpb.Value checksums never have.

Computation of the checksums was left in because disabling them may be problematic. We'd need to verify that checksums are never computed below Raft or perform a migration. But before doing any of that work, we should verify that disabling the checksum provides a speed boost. Workloads to investigate are simple KV operations are bulk insert operations.

@nvanbenschoten for triage.

Jira issue: CRDB-5886

@petermattis petermattis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Jan 13, 2018
@petermattis petermattis added this to the 2.0 milestone Jan 13, 2018
@nvanbenschoten
Copy link
Member

I did some experimentation with this, both by simply disabling checksum computation and keeping the first 4 bytes in each value empty in nvanbenschoten/someChecksum and by removing the entire 4-byte prefix of each value entirely in nvanbenschoten/noChecksum. I then ran kv -concurrency=32 -read-percent=0 against a local single-node cluster. This produced the following results after three trials in each configuration:

computing checksum leaving checksum empty removing checksum prefix
avg. latency (ms) 7.7 7.666666667 7.4
latency delta - -0.4329004329% -3.896103896%
avg. throughput (ops/sec) 4162.533333 4186.533333 4327.866667
throughput delta - +0.5765719594% +3.971940165%

While leaving the checksum field in roachpb.Values empty is a simple change, it doesn't provide much of a speedup at all. Because of the minimal performance impact and because it may put us in a difficult situation if we decide to turn on checksum verification in the future, it's probably not worth doing.

On the other hand, removing the entire 4-byte checksum field from roachpb.Values does provide a relatively significant speedup. Unfortunately, doing this in a backward compatible manner is much more challenging. roachpb.Values are not versioned, so in order to change their header size, we'd need to perform some kind of migration. The migration could look like:

  1. wait for a new cluster version
  2. add a version prefix to all values
  3. begin writing new values with a new version and without a checksum
  4. migrate all existing values to the new version

This seems like a pretty big change and undoubtedly has some nasty subtleties, so I don't think there's much to do here for 2.0.

@nvanbenschoten nvanbenschoten modified the milestones: 2.0, Later Jan 17, 2018
@petermattis
Copy link
Collaborator Author

Heh, I was about to take a look at this. I think it makes sense to disabling checksum computation completely, but leave the 4-byte checksum field blank. Note that the 0-value for checksum indicates "checksum uninitialized", so I don't think it will be problematic to turn on later.

Also, we're still verifying the checksums when a ScanResponse is returned over the wire via ScanResponse.Verify -> Value.Verify.

@nvanbenschoten
Copy link
Member

If this doesn't complicate turning checksum verification back on then I'm fine with disabling the checksum computation. As you mentioned, the only complication here would be if we created roachpb.Values beneath Raft. I didn't find any instance of us doing this when I first looked, but I'll check again.

@nvanbenschoten nvanbenschoten modified the milestones: Later, 2.0 Jan 17, 2018
@petermattis
Copy link
Collaborator Author

The below Raft thing might be a deal breaker. I think you should try to hook into the below-raft protos test to really be sure we're never calling Value.InitChecksum below Raft.

@nvanbenschoten
Copy link
Member

TestBelowRaftProtos doesn't allow roachpb.Values to be marshaled beneath Raft, so that alone should prove that we don't call Value.InitChecksum for anything that could cause inconsistencies, right? Is there more we'd want to assert here?

A grep shows that the only call to Value.InitChecksum near Raft is in Replica.append, but this is called on Raft log entries, which are unreplicated, so this shouldn't be a problem.

Note that the 0-value for checksum indicates "checksum uninitialized", so I don't think it will be problematic to turn on later.

This also means that the change won't even require a new cluster version, since any uninitialized checksums are simply ignored.

@petermattis
Copy link
Collaborator Author

TestBelowRaftProtos doesn't allow roachpb.Values to be marshaled beneath Raft, so that alone should prove that we don't call Value.InitChecksum for anything that could cause inconsistencies, right? Is there more we'd want to assert here?

We don't marshal roachpb.Value into the values of key/values. Instead we store roachpb.Value.RawBytes directly in the value. So I'm not sure if it shows anything that roachpb.Value is not marshaled below Raft. Rather, I think we need to make any call to Value.InitChecksum act the same as marshaling of roachpb.Value and use the below raft protos infrastructure to verify there are no such calls below Raft.

This also means that the change won't even require a new cluster version, since any uninitialized checksums are simply ignored.

Yep.

@nvanbenschoten
Copy link
Member

We don't marshal roachpb.Value into the values of key/values. Instead we store roachpb.Value.RawBytes directly in the value.

Good point. Hooking into Value.InitChecksum directly shows that it is called twice beneath Raft, both through MVCCPutProto.

The first caller of MVCCPutProto is StateLoader.SetHardState. This one is fine because the RaftHardStateKey is unreplicated.

The second caller of MVCCPutProto is StateLoader.SetMVCCStats. This is more problematic because the RangeStatsKey is replicated, so changing things here naively would cause issues with the consistency checker. That said, we already have other changes that are going to require us to massage the MVCCStats while computing checksums. We should be able to wrap the changes up together, migrating the v2 stats key if present and stripping any checksum that may exist from either version before performing the checksum. This would allow us to make the rest of the proposed change here without any risk of inconsistent state.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 2, 2018
…State key

Fixes cockroachdb#13392.
Unblocks cockroachdb#21435.

This change creates a new replicated range-local key called the
RangeAppliedState key. The key holds a combination of the Raft applied
index, the lease applied index, and the MVCC stats. Each of these pieces
of metadata is written on every single Raft application, so combining them
into single RocksDB key reduces the required number of keys touched on
each Raft application down from 3 to 1.

The applied indices and the stats used to be stored separately in different
keys. We now deem those keys to be "legacy" because they have been replaced
by the range applied state key. However, in order to maintain compatibility
with existing clusters without requiring a migration, we allow them to exist
side-by-side with the new range applied state key. Because we need to support
an upgrade path from these legacy keys to the single new key, we can continue
to write to the legacy keys as long as the range applied state key does not
yet exist. We take advantage of this in two ways:
1. The existence of these legacy keys, even if they are out of date, turns
   out to be useful in cases where we need to synthesize their up-to-date
   values (snapshots, stats computations, and consistency checks). The reason
   for this is that the legacy keys can serve as markers for where these
   legacy values need to be synthesized and injected during MVCC iteration.
   This is all handled by a new iterator type called MigrationIter, which
   translates new keys that may not exist on all replicas in a mixed-version
   cluster into their corresponding legacy representations.
2. They existence of the keys also helps keeps the MVCCStats consistent
   across pre- and post-"range applied state" versions of Cockroach.

For these reasons, we still require that these legacy keys be written during
the initial bootstrapping of each range. We then wait for the first Raft
application to "upgrade" from the use of these keys to the use of the new
range applied state key.

Preliminary benchmarking of this change confirms the expected performance
improvement. When running `./kv --read-percent=0` locally I see a 5-8%
improvement in throughput and 4-6% reduction in avg latency.

The change is still a WIP because the changes made to ComputeStatsGo needs
to be ported over to C++. This shouldn't be particularly challenging because
unlike in Go where we needed to synthesize the legacy keys, in C++ we just
need to make sure we account for their impact on the stats. It also
needs more mixed-version cluster testing like that done in cockroachdb#21120.

Release note (performance improvement): Fewer disk writes are required
for each database write, increasing write throughput and reducing
write latency.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 7, 2018
Fixes cockroachdb#21435.

This change disables roachpb.Value checksum computation, leaving
the checksum field blank on roachpb.Values.

See cockroachdb#21435 (comment)
for details on the measured performance improvement from this change.

Release note (performance improvement): Unnecessary value
checksums are no longer computed, speeding up database writes.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 9, 2018
Fixes cockroachdb#21435.

This change disables roachpb.Value checksum computation, leaving
the checksum field blank on roachpb.Values.

See cockroachdb#21435 (comment)
for details on the measured performance improvement from this change.

Release note (performance improvement): Unnecessary value
checksums are no longer computed, speeding up database writes.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 9, 2018
Fixes cockroachdb#21435.

This change disables roachpb.Value checksum computation, leaving
the checksum field blank on roachpb.Values.

See cockroachdb#21435 (comment)
for details on the measured performance improvement from this change.

Release note (performance improvement): Unnecessary value
checksums are no longer computed, speeding up database writes.
@andreimatei
Copy link
Contributor

#22487 was reverted in #22637. I think this should be re-opened, right?

@andreimatei andreimatei reopened this Jun 19, 2020
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. no-issue-activity X-stale
Projects
None yet
Development

No branches or pull requests

3 participants