fix(cassandra) ensure single coordinator in migrations #2326

Merged
thibaultcha merged 4 commits into master from fix/single-coordinator-migrations
Apr 7, 2017

Conversation

Member

@thibaultcha thibaultcha commented Apr 3, 2017

Summary

This ensures we respect the proper 'single coordinator' pattern for
migrations when one of them is using the DAO's find_all() method.

Full changelog

  • ensure find_all() only uses the migrations coordinator.
  • ensure we wait for schema consensus before doing so, since the
    coordinator will have to get responses about the table's content from
    its peers.
  • perf: only wait for schema consensus if we ran some migrations at all.
  • properly pass C* timeout values set in the Kong configuration to the
    driver.
  • new cassandra_schema_consensus_timeout property.
  • bump lua-cassandra to 1.2.1 which ensures we update the Nginx time
    when testing for a schema consensus timeout.
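
A minimal sketch of the ordering described in this changelog, not the code from this PR: run the migrations, wait for schema consensus only if at least one migration ran, then read through the same connection. The `new_fake_db` stub, the `run_migrations` helper and the `"plugins"` table name are hypothetical and only stand in for the real DAO.

```lua
-- Hypothetical stand-in for the migrations DB object (illustration only).
-- It records the calls it receives so the ordering can be inspected.
local function new_fake_db(consensus_ok)
  local db = { calls = {} }

  function db:wait_for_schema_consensus()
    self.calls[#self.calls + 1] = "wait_for_schema_consensus"
    if not consensus_ok then
      return nil, "timeout"
    end
    return true
  end

  function db:find_all(table_name)
    self.calls[#self.calls + 1] = "find_all:" .. table_name
    return {}
  end

  return db
end

-- Run every migration against one db object (the single coordinator),
-- then read rows back through that same object.
local function run_migrations(db, migrations)
  local n_ran = 0
  for _, migration in ipairs(migrations) do
    migration(db)
    n_ran = n_ran + 1
  end

  -- Only pay for the consensus wait if the schema may have changed.
  if n_ran > 0 then
    local ok, err = db:wait_for_schema_consensus()
    if not ok then
      return nil, "failed waiting for schema consensus: " .. err
    end
  end

  -- The coordinator now asks its peers about a table they all know of.
  return db:find_all("plugins")
end
```

With no migrations to run, the stub only records the `find_all` call; with at least one, the consensus wait is recorded first.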

NOTE: Do NOT merge yet.

Implement a new `cassandra_schema_consensus_timeout` property to
increase the C* `max_schema_consensus_wait` value.

This is particularly useful for clusters where inter-node communication
is slow and schema changes during migrations can take longer than the
default of 10s, which makes the migration fail unnecessarily.
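
As an illustration, the new property could be raised in `kong.conf` as in the fragment below. The value of 30 seconds is arbitrary, and the fragment assumes the property is expressed in milliseconds.

```
# kong.conf (fragment)
# Assumption: value in milliseconds; 30000 is an arbitrary example.
cassandra_schema_consensus_timeout = 30000
```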
@thibaultcha thibaultcha added this to the 0.10.2 milestone Apr 4, 2017
Member

@Tieske Tieske left a comment

I don't know how this works, as the magic seems to be in the driver.

Would it be possible to add a test for the behaviour?

-- before performing such a DML query
local ok, err = self:wait_for_schema_consensus()
if not ok then
  return nil, "could not wait for schema consensus: " .. err
Member

"failed waiting for schema consensus"

local ok, err = self.db:wait_for_schema_consensus()
if not ok then
  return ret_error_string(self.db.name, nil,
                          "could not wait for schema consensus: " .. err)
Member

"failed waiting for..."

Member Author

@thibaultcha thibaultcha Apr 5, 2017

both forms are used interchangeably in the codebase

Member

I'd say that only impatient people 'cannot wait' 😄

Member Author

thibaultcha commented Apr 5, 2017

The "do not merge" label is for refused PRs.

Would it be possible to add a test for the behaviour?

Sadly not, or else it would be included.

@thibaultcha
Member Author

as the magic seems to be in the driver.

Actually the driver has little to do with this all.

Member

Tieske commented Apr 5, 2017

The "do not merge" label is for refused PRs.

surely we close those? At least this label is more descriptive than a footnote in the original post.

Member

@Tieske Tieske left a comment

considering updating the error messages optional.

@thibaultcha
Member Author

surely we close those?

It is sometimes more complicated than that.

I'd say that only impatient people 'cannot wait' 😄

Sounds just like the definition of a timeout!

This ensures we update the Nginx time between schema consensus timeout
checks and also adds the ability to manually add and remove C* peers.

thibaultcha/lua-cassandra@1.1.1...1.2.1
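
Why the time update matters can be sketched as follows. This is not lua-cassandra's code: `wait_until`, its `clock` and `sleep` arguments, and the 0.5s polling interval are all hypothetical. Under OpenResty, `clock.now` and `clock.update` could be backed by `ngx.now()` and `ngx.update_time()`, since `ngx.now()` returns a cached time.

```lua
-- Poll `check` until it returns true or `timeout` seconds have elapsed.
-- `clock` is a hypothetical table exposing now() (cached time, in seconds)
-- and update() (refreshes that cached time).
local function wait_until(check, timeout, clock, sleep)
  clock.update()
  local deadline = clock.now() + timeout

  while true do
    if check() then
      return true
    end

    sleep(0.5) -- arbitrary polling interval for this sketch

    -- Without this refresh, a cached now() would never advance and the
    -- deadline below could never be reached.
    clock.update()
    if clock.now() >= deadline then
      return nil, "timeout"
    end
  end
end
```

Passing the clock and the sleep function as arguments keeps the sketch runnable outside of Nginx, for example with a fake clock in a unit test.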
@thibaultcha thibaultcha force-pushed the fix/single-coordinator-migrations branch from 9ec9e79 to ddeea29 Compare April 6, 2017 22:14
@thibaultcha
Member Author

considering updating the error messages optional.

Updated them in the end 😉

@thibaultcha thibaultcha merged commit 2d3bb54 into master Apr 7, 2017
@thibaultcha thibaultcha deleted the fix/single-coordinator-migrations branch April 7, 2017 18:26