
importccl: set roachpb.Value checksums when creating data #24128

Merged: 1 commit into cockroachdb:master on Mar 24, 2018

Conversation

maddyblue
Contributor

IMPORT was not setting value checksums when generating KV pairs from
CSV rows. This caused CPuts (on secondary indexes, for example) to
fail because the checksum didn't match.

Fixes #23984

Release note (bug fix): correctly generate on-disk checksums during
IMPORT. If there is existing data created by IMPORT that cannot
be recreated by this fixed version of IMPORT, use `cockroach dump`
to rewrite any affected tables.
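
To illustrate the failure mode, here is a minimal standalone sketch using roachpb.Value (the key and payload are made up): a value without a checksum differs byte-for-byte from an otherwise identical value with one, so an expected-value comparison over raw bytes, which is roughly what a CPut performed against the stored value at the time, reports a mismatch even when the payloads are equal.

	package main

	import (
		"bytes"
		"fmt"

		"github.com/cockroachdb/cockroach/pkg/roachpb"
	)

	func main() {
		key := roachpb.Key("a")

		// Two values carrying the same logical payload.
		var fromSQL, fromImport roachpb.Value
		fromSQL.SetBytes([]byte("payload"))
		fromImport.SetBytes([]byte("payload"))

		// Only one gets a checksum stamped into its header, mirroring a
		// normal SQL write vs. pre-fix IMPORT output.
		fromSQL.InitChecksum(key)

		// The raw bytes now differ, so a byte-wise expected-value check
		// fails even though the payloads are equal.
		fmt.Println(bytes.Equal(fromSQL.RawBytes, fromImport.RawBytes)) // false
	}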

maddyblue requested review from dt, nvanbenschoten, and a team on March 21, 2018 at 20:08

@nvanbenschoten
Member

:lgtm: Did you confirm that this fixes the issue?


pkg/ccl/importccl/csv.go, line 711 at r1 (raw file):

		count += int64(len(kvBatch))
		for _, kv := range kvBatch {
			if err := writer.Put(kv.Key, kv.Value.RawBytes); err != nil {

Adding the call to InitChecksum would be a bit more natural here, but it's up to you.
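
In context, the suggestion would amount to something like the following (a sketch only; as the author explains below, the call ultimately lives at the shared KV-generation site instead):

		count += int64(len(kvBatch))
		for _, kv := range kvBatch {
			// Hypothetical placement of the suggestion: stamp the checksum
			// right before handing the raw bytes to the writer.
			// InitChecksum writes into the value's header in place and is
			// a no-op if a checksum is already set.
			kv.Value.InitChecksum(kv.Key)
			if err := writer.Put(kv.Key, kv.Value.RawBytes); err != nil {
				return err
			}
		}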



@maddyblue
Contributor Author

Yes, the test I added reproduces the error from the bug.




pkg/ccl/importccl/csv.go, line 711 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Adding the call to InitChecksum would be a bit more natural here, but it's up to you.

Sadly this function (writeRocksDB) is only called in the local version of import, so adding it here would also mean adding it in a second place for the distributed version. The place where it is now is best because that's the latest point before the implementations diverge.
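
The shape of that shared-site fix, as a rough sketch (the helper name and signature are illustrative, not the actual code in csv.go):

	import "github.com/cockroachdb/cockroach/pkg/roachpb"

	// initChecksums is a hypothetical helper marking where the fix
	// lives: at KV generation from CSV rows, the last common point
	// before the local (writeRocksDB) and distributed import paths
	// diverge, so both inherit checksummed values.
	func initChecksums(kvs []roachpb.KeyValue) {
		for i := range kvs {
			// The checksum covers the key and the value payload,
			// matching what a regular SQL write would produce.
			kvs[i].Value.InitChecksum(kvs[i].Key)
		}
	}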



@bdarnell
Contributor

bdarnell commented Mar 21, 2018

How does this interact with #24126? Do we still need this for mixed-version clusters, or is it obsolete?





@nvanbenschoten
Member

#24126 will fix #23984 for any 2.0 clusters, but the change here is still a good idea for mixed-version clusters where some of the nodes may require checksums.

@maddyblue
Contributor Author

I'm not opposed to closing this PR if #24126 is a sufficient fix. Will that make it into 2.0 or 2.0.1? We could choose not to merge this PR, since it would then just add a performance cost for no benefit.

@bdarnell
Contributor

We should merge and cherry-pick this (for 2.0.1, I think) since it will help for mixed-version clusters. Then we can remove the checksum computation at the same time we remove all the others.

maddyblue merged commit 6b1b600 into cockroachdb:master on Mar 24, 2018
maddyblue deleted the import-checksum branch on March 24, 2018 at 05:57
Merging this pull request closed issue #23984: sql: duplicate key value on non-unique index.