Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: configure server keepalive #8535

Merged
merged 3 commits into from
Sep 28, 2017
Merged

*: configure server keepalive #8535

merged 3 commits into from
Sep 28, 2017

Conversation

gyuho
Copy link
Contributor

@gyuho gyuho commented Sep 9, 2017

From #8477.
For #8022.

@heyitsanthony
Copy link
Contributor

please rebase with the watch keepalive test when #8545 is merged; this can't be merged without tests

Also, the keepalive test can drop the time.Sleep calls:

+++ b/clientv3/integration/watch_keepalive_test.go
@@ -66,14 +66,12 @@ func TestWatchKeepAlive(t *testing.T) {
        // ep[0] keepalive time-out after DialKeepAliveTime + DialKeepAliveTimeout
        // wait extra for processing network error for endpoint switching
        timeout := ccfg.DialKeepAliveTime + ccfg.DialKeepAliveTimeout + ccfg.DialTimeout
-       time.Sleep(timeout)
-
        if _, err = clus.Client(1).Put(context.TODO(), "foo", "bar"); err != nil {
                t.Fatal(err)
        }
        select {
        case <-wch:
-       case <-time.After(5 * time.Second):
+       case <-time.After(timeout):
                t.Fatal("took too long to receive events")
        }
 
@@ -82,14 +80,12 @@ func TestWatchKeepAlive(t *testing.T) {
        defer clus.Members[1].Unblackhole()
 
        // wait for ep[0] recover, ep[1] fail
-       time.Sleep(timeout)
-
        if _, err = clus.Client(0).Put(context.TODO(), "foo", "bar"); err != nil {
                t.Fatal(err)
        }
        select {
        case <-wch:
-       case <-time.After(5 * time.Second):
+       case <-time.After(timeout):
                t.Fatal("took too long to receive events")
        }
 }

@gyuho gyuho added the WIP label Sep 12, 2017
@gyuho gyuho changed the title *: configure server keepalive *: configure server keepalive, test watch keepalive Sep 18, 2017
@gyuho gyuho force-pushed the keepalive-server branch 4 times, most recently from aeea06b to bfea25b Compare September 19, 2017 02:11
@xiang90
Copy link
Contributor

xiang90 commented Sep 22, 2017

what is the blocked on?

@gyuho
Copy link
Contributor Author

gyuho commented Sep 23, 2017

@xiang90 This is blocked on #8581, failing a bunch of CIs since #8545 (see #8592). Will investigate more next week.

@gyuho gyuho removed the WIP label Sep 27, 2017
@etcd-io etcd-io deleted a comment from codecov-io Sep 27, 2017
@gyuho gyuho force-pushed the keepalive-server branch 2 times, most recently from e41d020 to 47ae89e Compare September 27, 2017 23:21
embed/config.go Outdated
SnapCount: etcdserver.DefaultSnapCount,
MaxTxnOps: DefaultMaxTxnOps,
MaxRequestBytes: DefaultMaxRequestBytes,
GRPCKeepAliveMinTime: 5 * time.Second,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define this as a const?

@@ -143,6 +144,9 @@ func newConfig() *config {
fs.Int64Var(&cfg.QuotaBackendBytes, "quota-backend-bytes", cfg.QuotaBackendBytes, "Raise alarms when backend size exceeds the given quota. 0 means use the default quota.")
fs.UintVar(&cfg.MaxTxnOps, "max-txn-ops", cfg.MaxTxnOps, "Maximum number of operations permitted in a transaction.")
fs.UintVar(&cfg.MaxRequestBytes, "max-request-bytes", cfg.MaxRequestBytes, "Maximum client request size in bytes the server will accept.")
fs.DurationVar(&cfg.GRPCKeepAliveMinTime, "grpc-keepalive-min-time", cfg.Config.GRPCKeepAliveMinTime, "Minimum interval duration that a client should wait before pinging server.")
fs.DurationVar(&cfg.GRPCKeepAliveInterval, "grpc-keepalive-interval", time.Duration(0), "Frequency duration of server-to-client ping to check if a connection is alive.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explain what does the default value do?

@@ -143,6 +144,9 @@ func newConfig() *config {
fs.Int64Var(&cfg.QuotaBackendBytes, "quota-backend-bytes", cfg.QuotaBackendBytes, "Raise alarms when backend size exceeds the given quota. 0 means use the default quota.")
fs.UintVar(&cfg.MaxTxnOps, "max-txn-ops", cfg.MaxTxnOps, "Maximum number of operations permitted in a transaction.")
fs.UintVar(&cfg.MaxRequestBytes, "max-request-bytes", cfg.MaxRequestBytes, "Maximum client request size in bytes the server will accept.")
fs.DurationVar(&cfg.GRPCKeepAliveMinTime, "grpc-keepalive-min-time", cfg.Config.GRPCKeepAliveMinTime, "Minimum interval duration that a client should wait before pinging server.")
fs.DurationVar(&cfg.GRPCKeepAliveInterval, "grpc-keepalive-interval", time.Duration(0), "Frequency duration of server-to-client ping to check if a connection is alive.")
fs.DurationVar(&cfg.GRPCKeepAliveTimeout, "grpc-keepalive-timeout", time.Duration(0), "Additional duration of wait before closing a non-responsive connection.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should find a good default value to enable keepalive related options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined default interval 5s, timeout 3s.

clus.Members[0].Blackhole()

// expects endpoint switch to ep[1]
cli.SetEndpoints(clus.Members[0].GRPCAddr(), clus.Members[1].GRPCAddr())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will setenpoints cause a shuffle directly? or client will still pin to the original one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will trigger gRPC to close ep[0], and client will pin ep[1].


clus := integration.NewClusterV3(t, &integration.ClusterConfig{
Size: 3,
GRPCKeepAliveMinTime: time.Millisecond, // avoid too_many_pings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are testing the client-side keepalive here, why do we need to enable server-side keepalive? we probably can also move this test into another pr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed server parameters for this test, leaving server-to-client ping test as TODO.

@gyuho gyuho changed the title *: configure server keepalive, test watch keepalive *: configure server keepalive, test client watch keepalive Sep 28, 2017
@gyuho gyuho changed the title *: configure server keepalive, test client watch keepalive *: configure server keepalive Sep 28, 2017
@gyuho gyuho removed the WIP label Sep 28, 2017
@gyuho
Copy link
Contributor Author

gyuho commented Sep 28, 2017

@xiang90 Used default values in grpc-go for keepalive server interval and timeout. PTAL.

@@ -70,6 +70,12 @@ member flags:
maximum number of operations permitted in a transaction.
--max-request-bytes '1572864'
maximum client request size in bytes the server will accept.
--grpc-keepalive-min-time '5s'
minimum duration interval that a client should wait before pinging server.
--grpc-keepalive-interval '2h'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2h seems too high. probably 5 minutes or something? is there a recommended vault from gRPC-go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gRPC Go defaults to 2h. Should etcd use 5min?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2h is fine if it is gRPC's default.

@xiang90
Copy link
Contributor

xiang90 commented Sep 28, 2017

lgtm

@gyuho gyuho merged commit 10202a5 into etcd-io:master Sep 28, 2017
@gyuho gyuho deleted the keepalive-server branch September 28, 2017 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants