sql: duplicate key value on non-unique index #23984
So the most surprising thing from the UX perspective in this error is that not only does the UPDATE not change anything in the PK, it doesn't change anything in the conflicting index either. However, I think the logic in rowwriter's UpdateRow re-encodes every index when any value in the row is changed.

I think this FK violation error can be encountered if the txn gets a conflict in between the update to a PK and the update to a secondary index -- when another txn successfully writes to the same entry in the index "underneath". UpdateRow doesn't "see" that the txn is aborted and instead runs into a KV conflict and thinks it's a uniqueness violation. This is precisely the scenario that @bdarnell and @nvanbenschoten were discussing yesterday.

@vivekmenezes I'm off for a week, so this should be looked at by someone in the meantime. Can you please divert this to someone until I'm back?

@jordanlewis can you check whether this is related to the FK issues that were seen during tpcc testing? I think it is. /cc @tschottdorf
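To make the suspected mechanism concrete -- UpdateRow rewriting every index entry and a KV-level byte mismatch surfacing as a uniqueness error -- here is a minimal, self-contained Go sketch. It is not the actual rowwriter or ConditionalPut code; the in-memory map, the key strings, and the error text are stand-ins for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
)

// kv stands in for the KV store: index entries keyed by their encoded key.
var kv = map[string][]byte{}

// errDuplicateKey mimics how a failed conditional put on an index entry gets
// surfaced by the SQL layer as a uniqueness violation.
var errDuplicateKey = fmt.Errorf(`duplicate key value violates unique constraint "players_build_map_mode_hero_level_idx"`)

// conditionalPut writes value at key only if the stored bytes match expected
// byte-for-byte.
func conditionalPut(key string, value, expected []byte) error {
	if existing, ok := kv[key]; ok && !bytes.Equal(existing, expected) {
		// The condition failure is reported as a duplicate-key error, even
		// though the logical row may not conflict with anything.
		return errDuplicateKey
	}
	kv[key] = value
	return nil
}

// updateRow re-encodes an index entry for the updated row and rewrites it,
// expecting to find the previously written bytes at that entry.
func updateRow(indexKey string, oldEncoded, newEncoded []byte) error {
	return conditionalPut(indexKey, newEncoded, oldEncoded)
}

func main() {
	// The entry as it actually sits in the store.
	kv["/players_idx/67/14/1/1/pk"] = []byte("stored-bytes")

	// The writer's expectation differs from the stored bytes (an intervening
	// write, or, as discussed later in this issue, a mismatched checksum
	// header), so the rewrite fails even though nothing logically conflicts.
	err := updateRow("/players_idx/67/14/1/1/pk", []byte("expected-bytes"), []byte("new-bytes"))
	fmt.Println(err)
}
```

The point of the sketch is that the "duplicate key" message comes from a byte-level condition failure on the index entry, not from any logical uniqueness check on the row.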
@mjibson the working hypothesis for that problem is that if your client retries the txn upon encountering this error, the error will disappear. I'm not sure in which scenario you found this, but perhaps a txn retry would be a workaround until the underlying problem is fixed? (Assuming the hypothesis is correct -- we're not yet 100% sure on that)
I reran the job an hour later and had the exact same error, so it doesn't appear to be a txn-related problem.
The code running this is at https://github.com/mjibson/hots.dog/blob/36a7b7834677505b6d342a19195c1de873e26757/elo.go#L203 It sets up a pgx pool and runs a bunch of updates from data incoming over a channel, so there are certainly lots of statements going through the same session, assuming the pool isn't making a new session each execution.
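For context, the update loop is roughly of this shape: a shared pool draining a channel of results and issuing the same UPDATE over and over. This is only a sketch -- it uses database/sql with the lib/pq driver instead of the pgx pool in the linked code, and the connection string, struct fields, and sample values are placeholders.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // stand-in driver; the real code uses a pgx pool directly
)

// result is a hypothetical stand-in for the per-game data arriving on the channel.
type result struct {
	skill         float64
	game, blizzid int64
}

// runUpdates drains the channel and issues every UPDATE through the same
// shared pool, so many statements go through the same few sessions.
func runUpdates(db *sql.DB, ch <-chan result) error {
	for r := range ch {
		if _, err := db.Exec(
			`UPDATE players SET skill = $1 WHERE game = $2 AND blizzid = $3`,
			r.skill, r.game, r.blizzid,
		); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Placeholder connection string.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/hots?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ch := make(chan result, 1)
	ch <- result{skill: 1200, game: 14, blizzid: 67} // made-up values
	close(ch)

	if err := runUpdates(db, ch); err != nil {
		log.Fatal(err)
	}
}
```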
This involves an update to the STORING index where the value of the index is being modified. Note that an update to skill results in an update to this index.
Running these manually, I can see this has nothing to do with retries. With no other transactions going on, if I just run one of these by hand it fails with this same error. I'm not sure what I can do to get around it. It's delaying a feature :(. Is there any suspicion that using the 2.0 branch would improve this? I'm already on cluster version 2.0-1, so it's not trivial to just drop in a 2.0 binary; otherwise I'd try.
So do you get different results for these two queries?
Ok, update: I did a BACKUP, started up a release-2.0 cluster, ran RESTORE, and now everything seems fine. Would it be useful to spin up a master cluster again to test those queries? Can do.
I was able to (accidentally) recreate this situation on a release-2.0 cluster. The above query returned identical results (13 rows each, no diff). I find it medium weird that a backup/restore cycle did something to change this. I was able to replicate that again just now (i.e., do a backup/restore and it works fine). So... this maybe got weirder.
I did a test to verify some things.
I'll continue investigating tomorrow, but this appears to rule out things like intents or other on-disk transaction stuff.
More updates: doing a […]
Tonight's analysis: added some debug prints during cputs and failures. When I drop all indexes except one, the error is the same. The CPut expected value is […]
Values contain a checksum, which is not included in their pretty-printed representation. Try printing that out too. We tried phasing out the checksum here, but reverted it (#22637) because ConditionalPut considers checksums in deciding whether things match (#22636). CC @nvanbenschoten
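A toy illustration of the hidden-checksum point: two encodings that pretty-print identically can still differ byte-for-byte in a checksum header that the printer hides. The 4-byte CRC-32 header, the layout, and the `/TUPLE/...` strings below are simplifications for the sketch, not the real roachpb.Value encoding.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// encode builds a simplified value: a 4-byte checksum header followed by the
// payload. withChecksum=false leaves the header zeroed, mimicking a writer
// that never set it.
func encode(payload []byte, withChecksum bool) []byte {
	raw := make([]byte, 4+len(payload))
	copy(raw[4:], payload)
	if withChecksum {
		binary.LittleEndian.PutUint32(raw[:4], crc32.ChecksumIEEE(payload))
	}
	return raw
}

// pretty mimics a pretty-printer that shows only the payload, hiding the header.
func pretty(raw []byte) string { return string(raw[4:]) }

func main() {
	a := encode([]byte("/TUPLE/5:5:Int/1200"), true)  // written with a checksum
	b := encode([]byte("/TUPLE/5:5:Int/1200"), false) // written without one

	fmt.Println(pretty(a) == pretty(b)) // true: identical when pretty-printed
	fmt.Println(bytes.Equal(a, b))      // false: the raw bytes differ in the header
	fmt.Printf("%x\n%x\n", a, b)        // printing the raw bytes reveals the difference
}
```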
Interesting. If my memory serves me correctly, @mjibson did upgrade to the version with the checksum change before we pulled it. If the lack of a checksum on certain keys is causing this issue then it would make sense that dumping and restoring the data would fix it.
I just talked to @mjibson and I was confused before. He's able to reproduce this on master with a fresh store, which means that this wasn't caused by the reverted change referenced above. That said, it does sound similar. He's going to increase the logging of the […]
Fixes cockroachdb#23984. Fixes cockroachdb#22636.

In cockroachdb#22636 we noticed that conditional puts were not respecting the fact that Value checksums are meant to be optional. This is because they include checksums when performing the equality check between expected and existing values. This caused serious issues and forced us to roll back 75cbeb8. This was unfortunate, but the damage was contained. It demonstrated that we need to be very careful about changing checksums on values.

Later, we found in cockroachdb#23984 that a path somewhere in IMPORT was forgetting to set checksums on Values. This was causing SQL operations at a much later time to fail for the same reason that 75cbeb8 caused issues. This bug will be fixed in a different change, but the fact that it exists shows how fragile the current approach is because despite the intention, checksums are absolutely not optional.

This change addresses both of these problems by making checksums optional, as they were intended to be. CPut and InitPut no longer compare the checksums of the expected and existing values. Instead, they call into a new `EqualData` method, which ignores the checksum header when comparing byte values.

This is a bit higher risk than most of the changes we've been accepting at this stage of the release cycle, but it would be really nice to get this change in to 2.0 so that we can stay on track with the timeline proposed in cockroachdb#22636 (comment) (although I'm not positive that the last step of the timeline is correct because I don't think we can rely on the "baked in" protection with such a low-level change). It will also save anyone who has used IMPORT already and hit cockroachdb#22636.

This change should be safe from beneath Raft inconsistency for the same reason that 75cbeb8 was safe. All CPuts beneath Raft are still careful to set checksums, so we should not see any divergence in Raft state because of this error handling change.

After this change, the only uses of `bytes.Equal(v1.RawBytes, v2.RawBytes)` are in tests and the `value.Equal` method. There may be an argument for making even `value.Equal` ignore the checksum header, although it isn't used outside of other `proto.Equal` methods.

```
Nathans-MacBook-Pro:cockroach$ grep -lRE 'bytes\.Equal\(.*\.RawBytes,*,.*\.RawBytes,*\)' pkg
pkg/roachpb/data.pb.go
pkg/storage/engine/mvcc_test.go
```

Release note: None
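A rough sketch of the checksum-agnostic comparison the commit describes: compare only the payloads, skipping the checksum header. The fixed 4-byte header size and the function body here are assumptions for illustration; the actual `EqualData` implementation lives in the roachpb package.

```go
package main

import (
	"bytes"
	"fmt"
)

// checksumHeaderSize assumes a fixed-size checksum header at the front of the
// encoded value (a simplification of the real layout).
const checksumHeaderSize = 4

// equalData compares two encoded values while ignoring their checksum
// headers, in the spirit of the EqualData method described above.
func equalData(a, b []byte) bool {
	if len(a) < checksumHeaderSize || len(b) < checksumHeaderSize {
		return bytes.Equal(a, b)
	}
	return bytes.Equal(a[checksumHeaderSize:], b[checksumHeaderSize:])
}

func main() {
	withChecksum := append([]byte{0xde, 0xad, 0xbe, 0xef}, []byte("/TUPLE/1200")...)
	noChecksum := append([]byte{0, 0, 0, 0}, []byte("/TUPLE/1200")...)

	fmt.Println(bytes.Equal(withChecksum, noChecksum)) // false: the old CPut check rejects the match
	fmt.Println(equalData(withChecksum, noChecksum))   // true: checksum-agnostic comparison accepts it
}
```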
Version (master branch at d23c18a + a patch dealing with the job registry): CockroachDB CCL v2.0-alpha.20180212-1682-g3d9738f

Running a job that does a bunch of updates (`update players set skill = $1 where game = $2 and blizzid = $3`) and got an error:

ERROR: duplicate key value (build,map,mode,hero_level)=(67,14,1,1) violates unique constraint "players_build_map_mode_hero_level_idx" (SQLSTATE 23505)
See the schema below. There is clearly no unique index besides the PK, and the UPDATE didn't change any columns in the PK.
For the schema: