-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(cassandra) ensure single coordinator in migrations #2326
Conversation
This ensures we respect the proper 'single coordinator' pattern for migrations when one of them is using the DAO's `find_all()` method. * ensure `find_all()` only uses the migrations coordinator. * ensure we wait for schema consensus before doing so, since the coordinator will have to get responses about the table's content from its peers. * perf: only wait for schema consensus if we ran some migrations at all.
Implement a new `cassandra_schema_consensus_timeout` property to increase the C* `max_schema_consensus_wait` value. Particularly useful for clusters where the inter-nodes communication seems to be slow and the schema changes during migrations can take more than the default of 10s, and make the migration fail unnecessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how this works, as the magic seems to be in the driver.
Would it be possible to add a test for the behaviour?
kong/dao/db/cassandra.lua
Outdated
-- before performing such a DML query | ||
local ok, err = self:wait_for_schema_consensus() | ||
if not ok then | ||
return nil, "could not wait for schema consensus: " .. err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"failed waiting for schema consensus"
kong/dao/factory.lua
Outdated
local ok, err = self.db:wait_for_schema_consensus() | ||
if not ok then | ||
return ret_error_string(self.db.name, nil, | ||
"could not wait for schema consensus: " .. err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"failed waiting for..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both forms are used intermittently in the codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that only impatient people 'cannot wait' 😄
The "do not merge" label is for refused PRs.
Sadly not, or else it would be included. |
Actually the driver has little to do with this all. |
surely we close those? this label at least is more descriptive than a foot note in the original post. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering updating the error messages optional.
It is sometimes more complicated than that.
Sounds just like the definition of a timeout! |
This ensures we update the Nginx time between schema consensus timeout checks and also adds the ability to manually add and remove C* peers. thibaultcha/lua-cassandra@1.1.1...1.2.1
9ec9e79
to
ddeea29
Compare
Updated them in the end 😉 |
Summary
This ensures we respect the proper 'single coordinator' pattern for
migrations when one of them is using the DAO's
find_all()
method.Full changelog
find_all()
only uses the migrations coordinator.coordinator will have to get responses about the table's content from
its peers.
driver.
cassandra_schema_consensus_timeout
property.when testing for a schema consensus timeout.
NOTE: Do NOT merge yet.