mvcc/backend: Fix corruption bug in defrag #11613
Conversation
Force-pushed from 7e36b79 to ce8464e
Codecov Report

@@            Coverage Diff             @@
##           master   #11613      +/-   ##
==========================================
- Coverage   66.56%   66.00%   -0.56%
==========================================
  Files         403      403
  Lines       36630    37165     +535
==========================================
+ Hits        24381    24529     +148
- Misses      10768    11144     +376
- Partials     1481     1492      +11
Good catch!
Can we also add fail points? Something like:

        }
    +
    +   // gofail: var defragBeforeRename struct{}
        err = os.Rename(tdbp, dbp)
        if err != nil {
            if b.lg != nil {

Thanks! /cc @xiang90
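For context on the suggestion above: gofail turns comments like that into failpoints a test can arm at runtime, so a test can force the process to die right before the os.Rename and leave an orphaned db.tmp behind. Below is a minimal sketch of how a test might arm it, assuming a build processed by the gofail preprocessor and the go.etcd.io/gofail/runtime package; the test name and surrounding harness are illustrative, not etcd's actual tests.

    package backend_test

    import (
        "testing"

        gofail "go.etcd.io/gofail/runtime"
    )

    // Sketch: arm the suggested defragBeforeRename failpoint so the process
    // panics just before db.tmp is renamed over db, simulating etcd being
    // terminated inside the window discussed in this PR.
    func TestDefragInterruptedBeforeRename(t *testing.T) {
        if err := gofail.Enable("defragBeforeRename", `panic("defragBeforeRename")`); err != nil {
            t.Skipf("gofail failpoints not compiled in: %v", err)
        }
        defer gofail.Disable("defragBeforeRename")

        // ... open a test backend and call Defrag() here; the armed failpoint
        // fires before the rename, leaving an orphaned db.tmp on disk.
    }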
Thanks for catching this!
This might also be able to corrupt the keyspace, but I haven't been able to find any way to make that happen (in boltdb, the etcd keyspace is keyed by <revision, key> pairs, and any data older than the "last compacted" revision is ignored, so the data accidentally included from a previous defrag shouldn't matter).
I agree for the key bucket. Data accidentally included from a previous defrag's orphaned file will be deleted in the next compaction. For other buckets such as member and lease, could they cause damage?
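A rough illustration of why the key bucket is self-healing while ID-keyed buckets are not (a sketch of the idea only; the real encoding lives in etcd's mvcc package): key-bucket entries are stored under monotonically increasing revisions, so anything older than the compaction horizon is removed or ignored by the next compaction, whereas member and lease entries are keyed by IDs and never age out this way.

    package mvccsketch

    import "encoding/binary"

    // revToBytes approximates etcd's mvcc revision encoding: 8 bytes of main
    // revision, a '_' separator, 8 bytes of sub revision (17 bytes total).
    func revToBytes(main, sub int64) []byte {
        b := make([]byte, 17)
        binary.BigEndian.PutUint64(b[0:8], uint64(main))
        b[8] = '_'
        binary.BigEndian.PutUint64(b[9:17], uint64(sub))
        return b
    }

    // staleAfterCompaction reports whether a key-bucket entry written at rev is
    // older than the compaction horizon and thus a candidate for removal (the
    // real compactor keeps the newest revision of each live key). Leftovers
    // from an orphaned db.tmp therefore eventually disappear from the key
    // bucket, but not from ID-keyed buckets like member or lease.
    func staleAfterCompaction(rev, compactRev int64) bool {
        return rev <= compactRev
    }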
Thanks Joe! Good catch, let's make sure we backport this to 3.2, 3.3 and 3.4, and add a changelog note.
Yeah. Sounds like you have the same understanding as I do. If this were to somehow modify the keyspace, the only way I can imagine it happening would be: a lease that was previously expired reappears due to this issue and, when it expired again, somehow deleted keys it wasn't supposed to, modifying the keyspace. Hopefully this is not possible?
It can, I've been able to reproduce it on my local machine. It's pretty easy; the steps are basically:
Force-pushed from ce8464e to 07a26f0
LGTM. Thanks!
lgtm
Good catch. This sounds possible but unlikely to me.
OK. This can corrupt the keyspace. Here's how to reproduce:
Thanks to @lavalamp for suggesting this approach.
Force-pushed from f970561 to 9cb6fd5
Force-pushed from 9cb6fd5 to 213f7f7
Great work Joe! So in your example, the deleted key
Yes, and note that once a bad
…-origin-release-3.3 Automated cherry pick of #11613 to release-3.3
…-origin-release-3.2 Automated cherry pick of #11613 to release-3.2
…-origin-release-3.4 Automated cherry pick of #11613 to release-3.4
changelog: Add #11613 backport to 3.2, 3.3 and 3.4 changelogs
@gyuho @wenjiaswe may I ask when the new etcd version containing this fix will be released?
I would like to have that release too! Also cc @hexfusion
I would like a new 3.4 release including this backport. Thanks.
etcd v3.3.19 includes an important bugfix: Fix corruption bug in defrag. etcd-io/etcd#11613
So, what versions contain this fix?
@socketpair It's in 3.2.29+, 3.3.19+ and 3.4.4+
If etcd is terminated during the defrag operation, the db.tmp file that it creates can be orphaned. If this happens, the next defragmentation operation that occurs will open the orphaned db.tmp instead of creating an empty db.tmp file and starting with a fresh slate, as it should.

Once the defragmentation operation opens db.tmp, it traverses all key-values in the main db file and writes them to db.tmp. Any key-values already in the db.tmp file that are not overwritten by this copy remain in it, corrupting the boltdb keyspace. When the defragmentation operation completes successfully, db.tmp replaces db via a file move, and the main db file is now corrupt.
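To make the copy step concrete, here is a heavily simplified sketch of a defrag-style copy over bbolt buckets (illustrative only; etcd's real defragdb batches writes and tunes the bucket fill percent). Nothing in the loop clears the destination, which is why keys already present in a leftover db.tmp survive unless the copy happens to overwrite them.

    package defragsketch

    import (
        bolt "go.etcd.io/bbolt"
    )

    // copyAll copies every bucket and key from src into tmp. If tmp is an
    // orphaned db.tmp from an interrupted defrag, its old keys are never
    // deleted here; only keys that happen to collide get overwritten.
    func copyAll(src, tmp *bolt.DB) error {
        return src.View(func(stx *bolt.Tx) error {
            return tmp.Update(func(ttx *bolt.Tx) error {
                return stx.ForEach(func(name []byte, b *bolt.Bucket) error {
                    tb, err := ttx.CreateBucketIfNotExists(name)
                    if err != nil {
                        return err
                    }
                    return b.ForEach(func(k, v []byte) error {
                        return tb.Put(k, v)
                    })
                })
            })
        })
    }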
Impact:
on must also become the leader.)
See #11613 (comment) and #11613 (comment) for examples of how to reproduce the issue.
There is a narrow window, between when the bolt db transaction that populates db.tmp is committed and when db.tmp is moved to replace the db file, during which etcd must be terminated in order to trigger this.

The fix is simple: ensure the temporary file used for defragmentation is always a new, empty file.
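A minimal sketch of what "always a new, empty file" can look like in practice: delete any stale db.tmp before starting and have bolt open the temp path with O_EXCL so it can only ever create a fresh file. This illustrates the approach named above; it is not necessarily the exact code merged in this PR.

    package defragsketch

    import (
        "os"

        bolt "go.etcd.io/bbolt"
    )

    // openFreshTmp guarantees the temporary defrag file at tdbp is brand new
    // and empty before the copy starts.
    func openFreshTmp(tdbp string) (*bolt.DB, error) {
        // Remove an orphaned db.tmp left behind by an interrupted defrag, if any.
        if err := os.Remove(tdbp); err != nil && !os.IsNotExist(err) {
            return nil, err
        }
        options := &bolt.Options{
            // O_EXCL makes the open fail if the file still exists, instead of
            // silently reopening it and inheriting its old key-values.
            OpenFile: func(path string, flag int, mode os.FileMode) (*os.File, error) {
                return os.OpenFile(path, flag|os.O_EXCL, mode)
            },
        }
        return bolt.Open(tdbp, 0600, options)
    }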
cc @wenjiaswe @jingyih @YoyinZyc @gyuho