Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: check data corruption on boot #8554

Merged
merged 4 commits into from
Nov 23, 2017
Merged

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Sep 13, 2017

Fix #8313.


func (f *fakeConsistentIndex) ConsistentIndex() uint64 { return f.rev }

func alarmCorruptTest(cx ctlCtx) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't this an integration test? there's already a corruption test there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fatal in corruption, fatal logger calls os.Exit(1) (so not testable)?

cx.t.Fatalf("expected error %v after %s", rpctypes.ErrCorrupt, 5*time.Second)
}

// corrupt alarm should now be on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster that's currently running can be expected to be OK since it's periodically checking its hash with its other members. The newly corrupted member should fatal out before joining raft / doing any damage / causing need for an alarm.

Copy link
Contributor Author

@gyuho gyuho Sep 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly corrupted member should fatal out before joining raft

Instead, if a member can contact the client addresses of its peers, it should first fetch hashes from the other members at a known revision and compare before serving any client requests. (from original issue)

Agree.

So should the checking be before s.publish request, in s.start, using Raft entries?

https://github.com/coreos/etcd/blob/master/etcdserver/server.go

func (s *EtcdServer) Start() {
	s.start()
	s.goAttach(func() { s.publish(s.Cfg.ReqTimeout()) })
	s.goAttach(s.purgeFile)
	s.goAttach(func() { monitorFileDescriptor(s.stopping) })
	s.goAttach(s.monitorVersions)
	s.goAttach(s.linearizableReadLoop)
	s.goAttach(s.monitorKVHash)
}

I am trying to figure out the best way to fetch hashes from other members without starting gRPC server, or starting Raft. Or on restart, we still start Raft and gRPC server but do not accept new requests other than hash kv RPC?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably just before s.start() entirely; load the cluster membership info and hash from the backend, issue the hashkv RPCs to the member addresses and compare with the backend, then call start() if there's no mismatch.


corrupted := false
for i := 0; i < 5; i++ {
presp, perr := cli0.Put(context.TODO(), "abc", "aaa")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the boot check should prevent member 0 from servicing any KV RPCs, otherwise it doesn't provide any better guarantees than the periodic corruption check

@gyuho gyuho added the WIP label Sep 15, 2017
@gyuho gyuho force-pushed the initial-hash-checking branch from 6e3b11e to c68824c Compare September 19, 2017 13:15
@gyuho gyuho changed the title etcdserver: initial hash checking *: initial hash checking Sep 19, 2017
@gyuho gyuho changed the title *: initial hash checking *: check initial kv hashes for consistency Sep 19, 2017
@etcd-io etcd-io deleted a comment from codecov-io Sep 20, 2017
@gyuho gyuho force-pushed the initial-hash-checking branch 12 times, most recently from c15b9f1 to 260eae2 Compare September 21, 2017 09:15
@xiang90
Copy link
Contributor

xiang90 commented Sep 22, 2017

if this PR itself is done, can we remove the WIP label?

}
plog.Infof("corruption checking on %s (%d members)", s.ID().String(), n)

h, rev, _, err := s.kv.HashByRev(0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens when this local node is slow, and the revision has been compacted on other peers?

@@ -24,6 +24,69 @@ import (
"github.com/coreos/etcd/pkg/types"
)

func (s *EtcdServer) checkHashKVInit() {
// TODO: separate configuration for initial hash check?
if s.Cfg.CorruptCheckTime < 3*time.Second {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

}
mbs := s.cluster.Members()
n := len(mbs)
if n < 3 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

cli.Close()

if resp == nil && cerr != nil {
plog.Fatal(cerr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if all its peers are dead, then this peer cannot start. the cluster cannot be bootstrapped after a full shutdown?

@gyuho gyuho added WIP and removed WIP labels Sep 27, 2017
@gyuho gyuho force-pushed the initial-hash-checking branch from 260eae2 to 89bcaa4 Compare November 20, 2017 17:45
@gyuho gyuho force-pushed the initial-hash-checking branch from 4ab97dd to c342c4f Compare November 22, 2017 21:00
embed/etcd.go Outdated
// since this is before "EtcdServer.Start()"
// if not nil, it will block on "EtcdServer.Close()"
e.Server = nil
return e, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just return nil, err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need this to close peer listeners after startPeerListeners

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the problem to return a non-nil server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.Server is not nil, then it will block forever inside defer with EtcdServer.Close, since the server never got started.

}
}
if mismatch > 0 {
return fmt.Errorf("%s is corrupt", s.ID())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not exactly true. maybe other node is corrupted. all we know is that there is a state mismatch inside the cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. How about %s found data inconsistency with peers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

@gyuho gyuho force-pushed the initial-hash-checking branch from c342c4f to 54c6784 Compare November 22, 2017 23:01
@@ -66,7 +66,8 @@ type ServerConfig struct {

AuthToken string

CorruptCheckTime time.Duration
InitialCorruptCheck bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add doc string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@gyuho gyuho force-pushed the initial-hash-checking branch from 54c6784 to b58694d Compare November 22, 2017 23:31
@xiang90
Copy link
Contributor

xiang90 commented Nov 22, 2017

LGTM

@gyuho gyuho force-pushed the initial-hash-checking branch from 63df258 to d6d2585 Compare November 22, 2017 23:52
for _, p := range peers {
if p.resp != nil {
peerID := types.ID(p.resp.Header.MemberId)
if h != p.resp.Hash {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need to reason about compaction revision. Calling on hashKV(100) on all different members does not guarantee that the hash is same among all member even those there is no corruption.

Suppose local member's highest rev is 100 and has no compaction. then HashKV hashes keys from rev = 0 to rev 100. If one of peer member has compacted at 50, then calling hashKV results HashKV hashes keys from rev= 50 to 100.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, just added compact revision checks to handle that case.

@codecov-io
Copy link

codecov-io commented Nov 23, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@f739853). Click here to learn what that means.
The diff coverage is 40.74%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #8554   +/-   ##
=========================================
  Coverage          ?   75.96%           
=========================================
  Files             ?      359           
  Lines             ?    29786           
  Branches          ?        0           
=========================================
  Hits              ?    22626           
  Misses            ?     5567           
  Partials          ?     1593
Impacted Files Coverage Δ
embed/config.go 61.63% <ø> (ø)
etcdserver/config.go 81.81% <ø> (ø)
etcdmain/config.go 81.39% <100%> (ø)
etcdserver/corrupt.go 54.42% <36.95%> (ø)
embed/etcd.go 68.9% <57.14%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f739853...0e4e8ed. Read the comment docs.

for _, p := range peers {
if p.resp != nil {
peerID := types.ID(p.resp.Header.MemberId)
if h != p.resp.Hash && crev == p.resp.CompactRevision {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we also need to log that we ignore checking as warning if the crev is not matched.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fixed.

if p.err != nil {
switch p.err {
case rpctypes.ErrFutureRev:
plog.Errorf("%s cannot check the hash of peer(%q) at revision %d: peer is lagging behind(%q)", s.ID(), p.eps, rev, p.err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning

case rpctypes.ErrFutureRev:
plog.Errorf("%s cannot check the hash of peer(%q) at revision %d: peer is lagging behind(%q)", s.ID(), p.eps, rev, p.err.Error())
case rpctypes.ErrCompacted:
plog.Errorf("%s cannot check the hash of peer(%q) at revision %d: local node is lagging behind(%q)", s.ID(), p.eps, rev, p.err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning

@gyuho gyuho force-pushed the initial-hash-checking branch 2 times, most recently from 50fdf9f to e4d8791 Compare November 23, 2017 05:12
plog.Errorf("%s's hash %d != %s's hash %d (revision %d, peer revision %d, compact revision %d)", s.ID(), h, peerID, p.resp.Hash, rev, p.resp.Header.Revision, crev)
mismatch++
} else {
plog.Warningf("%s hash mismatch with peer %s at revision %d (compact revision %d, peer compact revision %d)", s.ID(), peerID, rev, crev, p.resp.CompactRevision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this mismatch is expected. we should log this as cannot check hash since the compact reversion is different

@gyuho gyuho force-pushed the initial-hash-checking branch from e4d8791 to f7816fd Compare November 23, 2017 05:15
plog.Errorf("%s's hash %d != %s's hash %d (revision %d, peer revision %d, compact revision %d)", s.ID(), h, peerID, p.resp.Hash, rev, p.resp.Header.Revision, crev)
mismatch++
} else {
plog.Warningf("%s cannot check hash since the compact reversion is different at revision %d (compact revision %d, peer %s compact revision %d)", s.ID(), rev, crev, peerID, p.resp.CompactRevision)
Copy link
Contributor

@xiang90 xiang90 Nov 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well. i mean

%s cannot check hash of peer(%s): peer has a different compact revision %d (revision:%d)

etcdserver: only compare hash values if any

It's possible that peer has higher revision than local node.
In such case, hashes will still be different on requested
revision, but peer's header revision is greater.

etcdserver: count mismatch only when compact revisions are same

Signed-off-by: Gyu-Ho Lee <[email protected]>
@gyuho gyuho force-pushed the initial-hash-checking branch from f7816fd to 0e4e8ed Compare November 23, 2017 05:20
@xiang90
Copy link
Contributor

xiang90 commented Nov 23, 2017

lgtm if CI passes

@gyuho gyuho merged commit d84d3f2 into etcd-io:master Nov 23, 2017
@gyuho gyuho deleted the initial-hash-checking branch November 23, 2017 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants