Retry logic doesn't match datastax recommendation #128

Open
therapy-lf opened this issue Apr 10, 2019 · 1 comment

Comments

@therapy-lf

As the DataStax Java driver documentation states (https://docs.datastax.com/en/developer/java-driver/3.2/manual/retries/#retries-and-idempotence):

  • retrying in onReadTimeout is always safe, since by definition this error indicates that the query was a read, which didn’t mutate any data;
  • similarly, onUnavailable is safe: the coordinator is telling us that it didn’t find enough replicas, so we know that it didn’t try to apply the query.
  • onWriteTimeout is not safe: some replicas failed to reply to the coordinator in time, but they might still have applied the mutation;
  • onRequestError is not safe either: the query might have been applied before the error occurred. In particular, an OperationTimedOutException could be caused by a network issue that prevented a successful response to come back to the client.
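The four rules above can be expressed as a small decision function. This is a minimal sketch in Python for illustration only, since the driver under discussion is written in Lua. ErrorKind, is_safe_to_retry and should_retry are hypothetical names and are not part of lua-cassandra or the DataStax driver. The sketch also treats an idempotent statement as safe to re-send after a write timeout or request error. That is an assumption drawn from the idempotence discussion on the same DataStax page, not something the quoted bullets state directly.

```python
from enum import Enum, auto


class ErrorKind(Enum):
    """Hypothetical error categories mirroring the four quoted cases."""
    READ_TIMEOUT = auto()
    UNAVAILABLE = auto()
    WRITE_TIMEOUT = auto()
    REQUEST_ERROR = auto()


# Cases where the query is known not to have mutated any data.
ALWAYS_SAFE = {ErrorKind.READ_TIMEOUT, ErrorKind.UNAVAILABLE}


def is_safe_to_retry(kind: ErrorKind, idempotent: bool) -> bool:
    """Return True if re-sending the query cannot apply a mutation twice."""
    if kind in ALWAYS_SAFE:
        return True
    # WRITE_TIMEOUT / REQUEST_ERROR: replicas may already have applied the
    # mutation, so only an idempotent statement can be re-sent safely
    # (assumption, see lead-in).
    return idempotent


def should_retry(kind: ErrorKind, idempotent: bool,
                 retries: int, max_retries: int) -> bool:
    """Hypothetical helper: combine the safety check with a retry budget."""
    return retries < max_retries and is_safe_to_retry(kind, idempotent)


if __name__ == "__main__":
    # For a non-idempotent statement, only READ_TIMEOUT and UNAVAILABLE
    # are reported as safe.
    for kind in ErrorKind:
        print(kind.name, is_safe_to_retry(kind, idempotent=False))
```

Under these rules, a policy that retries on write timeouts but not on Unavailable errors is the reverse of what the documentation recommends. That mismatch is the one this issue reports.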

However, looking at the code, I see that it does not retry onUnavailable, yet it does retry onWriteTimeout, which is not safe:
https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L727-L764
https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/policies/retry/simple.lua#L38-L48

Also, it is not very clear where these timeouts come from: https://github.com/thibaultcha/lua-cassandra/blob/master/lib/resty/cassandra/cluster.lua#L752

So, I have a few questions:

  1. Is it possible that the retry logic is broken, or am I wrong?
  2. Is it safe to retry a request when those unknown timeouts occur, and where might they come from?

For now, we have switched off the retry mechanism by setting retry_on_timeout to false and max_retries to one.

@thibaultcha
Owner

The retry logic and the timeouts all come from the Datastax drivers that existed circa 2015 when this driver was implemented. I won't be updating the logic myself, but thanks for raising the issue!
