Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: quit command waits for server shutdown #7603

Merged
merged 1 commit into from
Jul 5, 2016
Merged

cli: quit command waits for server shutdown #7603

merged 1 commit into from
Jul 5, 2016

Conversation

tamird
Copy link
Contributor

@tamird tamird commented Jul 2, 2016

Closes #6585.


This change is Reviewable

ctx := stream.Context()

select {
case <-s.server.stopper.IsStopped():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the stopper close all the listeners? And the connection is still available and can communicate with the client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the stopper closes the connection - see the code on the client side that expressly waits for this to happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the stopper closes the connection first and then stops some other stuff after, why does the client know that the server has completely terminated by the time its connection is closed? I see what the intention of the code is, but I don't see at all why it works.

Copy link
Contributor Author

@tamird tamird Jul 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All those things happen already after the stopper is Quiesce()d and Wait()ed, so all the stopper is doing before process death is:

    for _, c := range s.closers {
        c.Close()
    }
    close(s.stopped)

There's a race here, but I just don't see how we can do any better (short of adding yet another phase to the stopper so that connections are the last thing to go).

@tbg
Copy link
Member

tbg commented Jul 2, 2016

I just can't convince myself that there isn't a race that actually makes the shutdown experience worse. A clean solution would close all the server listeners (and generally file descriptors) first, and then the client connection (as you point out above).
That said, if you and @knz are convinced that nothing funky can happen and that this is a strict improvement, it'll do for me.

@tbg tbg removed their assignment Jul 2, 2016
@tamird
Copy link
Contributor Author

tamird commented Jul 2, 2016

Well, I do think it's a strict improvement. Do I have your blessing to merge?

@tbg
Copy link
Member

tbg commented Jul 2, 2016

I don't see much won here except a chance for random errors on shutdown, so
I'd like to defer that decision to @knz (who likely feels more strongly
about this going in than I do).

On Sat, Jul 2, 2016 at 10:53 AM Tamir Duberstein [email protected]
wrote:

Well, I do think it's a strict improvement. Do I have your blessing to
merge?


You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
#7603 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AE135PyXmnR-OLMY0z0xDEFiJgar8qjdks5qRnttgaJpZM4JDlxi
.

-- Tobias

@knz
Copy link
Contributor

knz commented Jul 2, 2016

I do not really like the complexity added by this PR. In my view the simpler approach would be to:

  1. client sends drain message, waits for shutdownReady
  2. server processes drain locally; when the drain is completed, server sends a shutdownReady to the client, waits for Ack
  3. the client sends Ack back to the server
  4. server issues os.Exit
  5. meanwhile client waits for socket to close remotely (which will be an acknowledgement from the server OS that the server process has died!)

Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


cli/start.go, line 521 [r1] (raw file):

              fmt.Println("ok")
          }
          return err

I do not like the meaning of encountering an error here. What is the state of the server then?


Comments from Reviewable

@bdarnell
Copy link
Contributor

bdarnell commented Jul 5, 2016

Introducing bidirectional communication to the quit command seems a lot more complex than what's being changed here (a server that has drained but is unable to exit and restart due to an unresponsive client would be a bad failure mode). This change isn't perfect, but it looks like the best we can do with a reasonable amount of effort. LGTM.


Review status: 0 of 5 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


cli/start.go, line 521 [r1] (raw file):

Previously, knz (kena) wrote…

I do not like the meaning of encountering an error here. What is the state of the server then?

If an error occurs during `quit`, the state of the server is unknown. What else could we do?

Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jul 5, 2016

As I explained to tamir I do not have a better solution in mind. So of course I'll sign off on this since it brings us closer to a solution. See my comments about code structure below however.


Reviewed 5 of 5 files at r1.
Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


cli/start.go, line 526 [r1] (raw file):

  }

  {

I;d suggest turning this entire block into its own function and call the same functions 2 times. The code replication here is unnerving.


cli/start.go, line 528 [r1] (raw file):

  {
      stream, err := c.Drain(ctx, &serverpb.DrainRequest{
          Shutdown: true,

Please add a comment here to remind the reader why not having "onModes" means the server should perform a hard shutdown.


server/admin.go, line 691 [r1] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

All those things happen already after the stopper is Quiesce()d and Wait()ed, so all the stopper is doing before process death is:


  for _, c := range s.closers {

      c.Close()

  }

  close(s.stopped)

There's a race here, but I just don't see how we can do any better (short of adding yet another phase to the stopper so that connections are the last thing to go).

How much work would the additional phase mean?

Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Jul 5, 2016

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


cli/start.go, line 526 [r1] (raw file):

Previously, knz (kena) wrote…

I;d suggest turning this entire block into its own function and call the same functions 2 times. The code replication here is unnerving.

Please suggest some code. I looked at this for a while and failed to find an adequate refactor.

cli/start.go, line 528 [r1] (raw file):

Previously, knz (kena) wrote…

Please add a comment here to remind the reader why not having "onModes" means the server should perform a hard shutdown.

Done.

server/admin.go, line 691 [r1] (raw file):

Previously, knz (kena) wrote…

How much work would the additional phase mean?

Worse than the work that it would require is that it would further complicate the stopper, which I'd quite like to avoid.

Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jul 5, 2016

Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


cli/start.go, line 526 [r1] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Please suggest some code. I looked at this for a while and failed to find an adequate refactor.

What about this:
func waitForShutdown(stream serverpb.Admin_DrainClient) error {
        for {
                if _, err := stream.Recv(); !grpcutil.IsClosedConnection(err) {
                        return err
                }
                return nil
        }
}

func doShutdown(onModes []int32) error {
        stream, err := c.Drain(ctx, &serverpb.DrainRequest{
                On:       onModes,
                Shutdown: true,
        })
        if err != nil {
                return err
        }
        _, err = stream.Recv()
        if err == nil {
                err = waitForShutdown(stream)
                if err == nil {
                        fmt.Println("ok")
                }
                return err
        }
        return nil
}
func blah() {
        if err := doShutdown(onModes); err != nil {
                fmt.Printf("graceful shutdown failed, proceeding with hard shutdown: %v\n", err)

                // Not passing drain modes so the server doesn't bother and goes
                // straight to shutdown.
                if err := doShutdown(nil); err != nil {
                        return fmt.Errorf("hard shutdown failed: %v", err)
                }
        }
        return nil
}

server/admin.go, line 691 [r1] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Worse than the work that it would require is that it would further complicate the stopper, which I'd quite like to avoid.

Ok I see. Fine with me.

Comments from Reviewable

@tamird
Copy link
Contributor Author

tamird commented Jul 5, 2016

Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


cli/start.go, line 526 [r1] (raw file):

Previously, knz (kena) wrote…

What about this:

func waitForShutdown(stream serverpb.Admin_DrainClient) error {
        for {
                if _, err := stream.Recv(); !grpcutil.IsClosedConnection(err) {
                        return err
                }
                return nil
        }
}

func doShutdown(onModes []int32) error {
        stream, err := c.Drain(ctx, &serverpb.DrainRequest{
                On:       onModes,
                Shutdown: true,
        })
        if err != nil {
                return err
        }
        _, err = stream.Recv()
        if err == nil {
                err = waitForShutdown(stream)
                if err == nil {
                        fmt.Println("ok")
                }
                return err
        }
        return nil
}
func blah() {
        if err := doShutdown(onModes); err != nil {
                fmt.Printf("graceful shutdown failed, proceeding with hard shutdown: %v\n", err)

                // Not passing drain modes so the server doesn't bother and goes
                // straight to shutdown.
                if err := doShutdown(nil); err != nil {
                        return fmt.Errorf("hard shutdown failed: %v", err)
                }
        }
        return nil
}
Not equivalent. If `stream.Recv` returns an error (the first time), doShutdown will return `nil` and this whole thing will do the wrong thing.

Comments from Reviewable

@tamird tamird merged commit 96de84d into cockroachdb:master Jul 5, 2016
@tamird tamird deleted the quit-proper branch July 5, 2016 21:20
@jseldess jseldess removed the docs-todo label Jul 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants