Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

option to disable caching field names to avoid schema mismatches #5572

Merged
merged 1 commit into from
Dec 17, 2019

Conversation

pH14
Copy link
Contributor

@pH14 pH14 commented Dec 13, 2019

We've had several incidents where schema migrations + the query plan cache can combine to return results with the wrong column/field names until the cache is refreshed.

ex:

  1. SELECT * FROM table WHERE id = something is executed and cached with field names (foo, bar, baz)
  2. ptosc or ghost migration completes to drop bar
  3. SELECT * FROM table WHERE id = something is executed again, and the tablet grabs the field names from the cache, which includes bar. However, MySQL only returned data for two columns, (foo, baz)
  4. Client either fails loudly, when the field / row types mismatch, or fails silently when the types can be coerced into the wrong columns (scary!).

From my understanding, this mismatch can happen when:

  • A migration is executed out-of-band via a tool like ghost or ptosc
  • Online DDL is executed against a master, in the brief period of time before the schema-refresh hook after it completes.
  • Online DDL is executed against a master, and then replicates to a replica while that replica is serving reads. The replica can return bad results until its query cache is refreshed.

The simplest & most expedient thing I could think of to circumvent this problem entirely is to just stop caching field names with the query plans, which is what this PR enables. Looking for feedback on the changes, and I'm curious if others have run into these issues as well.

Signed-off-by: Paul Hemberger [email protected]

@pH14 pH14 requested a review from sougou as a code owner December 13, 2019 22:15
Copy link
Contributor

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wanting to make this behavior permanent. Is there a way you can benchmark to see if there's a performance difference with vs without field caching? I think the difference will be negligible enough that we can run without caching always. The win will be the avoidance of yet another flag.

If you don't have the bandwidth, I'll approve this PR.

@pH14
Copy link
Contributor Author

pH14 commented Dec 16, 2019

We don't have formal benchmarks, but we're planning on rolling it out internally on our end this week. We'll update with how the broader before / after numbers look.

@morgo
Copy link
Contributor

morgo commented Dec 16, 2019

If the benchmarks are inconclusive, you could change the default of this option so that caching is disabled. If we don't hear any negative feedback for a release, we can remove the option.

The unfortunate part about this option is that users will not know it is available unless they have a lot of expertise.

@pH14
Copy link
Contributor Author

pH14 commented Dec 16, 2019

Sounds good -- I also realized that this particular flag conflicts with the consolidator code path (if this flag is off, the consolidator is off). We have also disabled the consolidator so that doesn't affect us, but for this PR we'll want to untangle those options.

@demmer
Copy link
Member

demmer commented Dec 16, 2019 via email

Copy link
Contributor

@sougou sougou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @demmer's comment. Although introducing the new flag is more convoluted, it's better to be more cautious about breaking changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants