Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: fix blocking etcd process #7546

Merged
merged 2 commits into from
Mar 21, 2017
Merged

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Mar 20, 2017

Fix #7512.

If a server starts and aborts due to config error,
it is possible to get stuck in ReadyNotify waits.
This adds select case to get notified on stop channel.

etcdmain/etcd.go Outdated
select {
case <-e.Server.ReadyNotify(): // wait for e.Server to join the cluster
case <-e.Server.StopNotify(): // publish aborted from 'ErrStopped'
e.Server.HardStop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary? StopNotify returns done, so it should only hit this if the etcdserver is already torn down and HardStop isn't needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. It's closed when (s *EtcdServer) run() exits. Will remove.

@@ -465,6 +465,8 @@ type member struct {

// serverClient is a clientv3 that directly calls the etcdserver.
serverClient *clientv3.Client

keepDataDirStop bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keepDataDirTerminate bool? Another option is Stop the member and remove from c.Members, before calling removeMember on the ID so it never hits Terminate.

@gyuho gyuho added the WIP label Mar 21, 2017
gyuho added 2 commits March 21, 2017 10:22
Fix etcd-io#7512.

If a server starts and aborts due to config error,
it is possible to get stuck in ReadyNotify waits.
This adds select case to get notified on stop channel.

Signed-off-by: Gyu-Ho Lee <[email protected]>
@gyuho gyuho force-pushed the fix-blocking-etcd-process branch from 7532413 to 2d5f890 Compare March 21, 2017 17:29
@xiang90
Copy link
Contributor

xiang90 commented Mar 21, 2017

OK... Interesting bug. Good fix.

<-e.Server.ReadyNotify() // wait for e.Server to join the cluster
select {
case <-e.Server.ReadyNotify(): // wait for e.Server to join the cluster
case <-e.Server.StopNotify(): // publish aborted from 'ErrStopped'
Copy link
Contributor

@heyitsanthony heyitsanthony Mar 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etcdserver/server.go still needs updating? ErrStopped on publish won't hardstop the server

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying HardStop on ErrStopped case, but process was still halting.
I will add handling in server.go.

@gyuho gyuho removed the WIP label Mar 21, 2017
@heyitsanthony
Copy link
Contributor

lgtm. Thanks!

@gyuho
Copy link
Contributor Author

gyuho commented Mar 21, 2017

Just note: We send errMemberRemoved to https://github.com/coreos/etcd/blob/master/etcdserver/server.go#L493 from rafthttp, which will be returned to EtcdServer.errorc, so no change is needed in etcdserver.

@gyuho gyuho merged commit 327e255 into etcd-io:master Mar 21, 2017
@gyuho gyuho deleted the fix-blocking-etcd-process branch March 21, 2017 19:04
@codecov-io
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (master@cd70ea3). Click here to learn what that means.
The diff coverage is 50%.

@@            Coverage Diff            @@
##             master    #7546   +/-   ##
=========================================
  Coverage          ?   70.55%           
=========================================
  Files             ?      320           
  Lines             ?    26183           
  Branches          ?        0           
=========================================
  Hits              ?    18473           
  Misses            ?     6262           
  Partials          ?     1448
Impacted Files Coverage Δ
integration/cluster.go 85.3% <33.33%> (ø)
etcdmain/etcd.go 45.37% <66.66%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd70ea3...2d5f890. Read the comment docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants