fix: canceled should always be set to true when cancel a watch request #373
Thanks! Can you provide any information on what issue this is causing so that we can validate the fix?
Sure, the problem I encountered is as follows: I use kine as a replacement for etcd in the https://github.com/apache/apisix project, with postgres as the backend database. Maintenance operations on postgres sometimes require restarting it. If kine happens to receive a watch request during the restart, kine cancels the request because the SQL execution fails: kine/pkg/logstructured/logstructured.go Line 430 in 73df6c7
Although kine cancels this request, it does not set the canceled field.
I have checked the native etcd code and found that when canceling a watch request, they always set canceled to true.
That is interesting. This suggests that the error is not being propagated up from the failed list call, since
this is the log:
I have considered whether to return the cause of the error to the client, such as
I don't know if we need to expose the raw error message from the sql client to the etcd client, but it does seem correct to return AN error.
Understood, then I will return a generic database error message.
Signed-off-by: Nic <[email protected]>
@brandond It now sends an error message to the client. Since this error only occurs when the database fails, it is difficult to write automated tests for it.
lgtm, one nit!
pkg/logstructured/logstructured.go
	wr.CompactRevision = compact
	wr.CurrentRevision = rev
} else {
	errc <- errors.New("failed to execute query in database")
nit: Add ErrWatchCancelled = rpctypes.ErrGRPCWatchCanceled in pkg/server/types.go and just return that here as a generic watch failure message. We try to return the same errors as upstream whenever possible, even if the cause isn't necessarily the same.
If you can think of a better error from https://pkg.go.dev/go.etcd.io/etcd/api/v3/v3rpc/rpctypes to use please feel free.
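A minimal sketch of the pattern suggested in this review comment: when the initial list fails, report a fixed sentinel error on the error channel instead of an ad-hoc string or the raw database error. Everything here is hypothetical and uses only the standard library: `watchResult`, `startWatch`, `errCompacted`, `errWatchFailed`, and `errDatabase` are illustrative stand-ins, not kine's actual types or the real `rpctypes` errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the sentinel errors discussed in the review.
// In the real code base these would alias upstream errors rather than be
// created locally.
var (
	errCompacted   = errors.New("revision compacted")
	errWatchFailed = errors.New("watch failed")
	errDatabase    = errors.New("connection refused") // simulated backend failure
)

// watchResult is an illustrative result type carrying only the fields
// that appear in the diff above, plus an error channel.
type watchResult struct {
	CompactRevision int64
	CurrentRevision int64
	Errorc          <-chan error
}

// startWatch runs the initial list once. A compaction error is reported
// through the revision fields; any other failure is reported to the
// caller as the generic sentinel, so the raw database error never
// reaches the client.
func startWatch(list func() (rev, compact int64, err error)) watchResult {
	errc := make(chan error, 1) // buffered so the send below cannot block
	wr := watchResult{Errorc: errc}

	rev, compact, err := list()
	if err != nil {
		if errors.Is(err, errCompacted) {
			wr.CompactRevision = compact
			wr.CurrentRevision = rev
		} else {
			errc <- errWatchFailed
		}
		close(errc)
	}
	return wr
}

func main() {
	wr := startWatch(func() (int64, int64, error) { return 0, 0, errDatabase })
	fmt.Println("cancel cause:", <-wr.Errorc)
}
```

The buffered channel lets the producer record the failure before any consumer is reading, which matters when the watch is canceled immediately after the failed query.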
Oh, great. I think ErrGRPCUnhealthy better expresses that the failure comes from the system level.
{"result":{"header":{},"watch_id":"1","created":true}}
{"result":{"header":{},"watch_id":"1","canceled":true,"cancel_reason":"rpc error: code = Unavailable desc = etcdserver: unhealthy cluster"}}
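For illustration, a client consuming a newline-delimited JSON stream shaped like the two messages above could detect the cancellation as sketched below. The struct covers only the fields visible in that output; `watchEvent`, `firstCancel`, and `sampleStream` are hypothetical names, and the sketch assumes one JSON object per line.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// watchEvent mirrors only the fields visible in the output quoted above.
type watchEvent struct {
	Result struct {
		WatchID      string `json:"watch_id"`
		Created      bool   `json:"created"`
		Canceled     bool   `json:"canceled"`
		CancelReason string `json:"cancel_reason"`
	} `json:"result"`
}

// sampleStream is the output quoted above, one message per line.
const sampleStream = `{"result":{"header":{},"watch_id":"1","created":true}}
{"result":{"header":{},"watch_id":"1","canceled":true,"cancel_reason":"rpc error: code = Unavailable desc = etcdserver: unhealthy cluster"}}`

// firstCancel scans the stream line by line and returns the cancel
// reason of the first message whose canceled field is true.
func firstCancel(stream string) (reason string, canceled bool, err error) {
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var ev watchEvent
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return "", false, err
		}
		if ev.Result.Canceled {
			return ev.Result.CancelReason, true, nil
		}
	}
	return "", false, sc.Err()
}

func main() {
	reason, canceled, err := firstCancel(sampleStream)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(canceled, reason)
}
```

A client that keys off the canceled field, as here, can re-create the watch even when the reason is empty, which is why the field needs to be set on every cancellation.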
Signed-off-by: Nic <[email protected]>
Anytime the server attempts to cancel a watch request, it should set the canceled field to true, regardless of whether err is empty or not. Let err only affect the reason field.
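The rule in this description can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `watchResponse`, `cancelResponse`, and `errUnhealthy` are hypothetical stand-ins that carry only the fields discussed here.

```go
package main

import (
	"errors"
	"fmt"
)

// watchResponse is a hypothetical stand-in for the response message,
// reduced to the fields relevant to cancellation.
type watchResponse struct {
	WatchID      int64
	Canceled     bool
	CancelReason string
}

// errUnhealthy is an illustrative sentinel, not the upstream error value.
var errUnhealthy = errors.New("etcdserver: unhealthy cluster")

// cancelResponse builds the message sent when the server cancels a watch.
// Canceled is set unconditionally; err only decides whether a reason is
// attached.
func cancelResponse(watchID int64, err error) watchResponse {
	resp := watchResponse{WatchID: watchID, Canceled: true}
	if err != nil {
		resp.CancelReason = err.Error()
	}
	return resp
}

func main() {
	fmt.Printf("%+v\n", cancelResponse(1, nil))
	fmt.Printf("%+v\n", cancelResponse(1, errUnhealthy))
}
```

Keeping the two fields independent means a cancellation without a known cause is still visible to the client as a cancellation.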