-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The --exact-rowcount option is buggy on large tables #830
Comments
Very interesting -- thank you for this analysis! I've never seen this scenario happen, and once you point it out, it makes sense. I'm not the owner of this repository at this time, but would you consider a pull request to update the docs? If not, someone else would hopefully do it, and hopefully also fix the code. |
Closes github#830. Switches from using `QueryRow` to `QueryRowContext`, and stores a context.CancelFunc in the migration context, which is called to halt any running row count query before beginning the cut over.
@zhxnlai / @shlomi-noach: I'm curious having support for cancelling the row-count query would solve this issue? Some ideas that come to mind:
For now I think documenting this limitation sounds good, this issue doesn't come up often |
This will not kill the query. Only way as far as I can tell is to |
Ahh yes, the query won't be killed on the server 🤔👍. Sounds like documentation is the best approach |
* Cancel any row count queries before attempting to cut over Closes #830. Switches from using `QueryRow` to `QueryRowContext`, and stores a context.CancelFunc in the migration context, which is called to halt any running row count query before beginning the cut over. * Make it threadsafe * Kill the count query on the database side as well * Explicitly grab a connection to run the count, store its connection id * When the query context is canceled, run a `KILL QUERY ?` on that connection id * Rewrite these to use the threadsafe functions, stop exporting the cancel func * Update logger * Update logger Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: dm-2 <[email protected]>
* Cancel any row count queries before attempting to cut over Closes #830. Switches from using `QueryRow` to `QueryRowContext`, and stores a context.CancelFunc in the migration context, which is called to halt any running row count query before beginning the cut over. * Make it threadsafe * Kill the count query on the database side as well * Explicitly grab a connection to run the count, store its connection id * When the query context is canceled, run a `KILL QUERY ?` on that connection id * Rewrite these to use the threadsafe functions, stop exporting the cancel func * Update logger * Update logger Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: dm-2 <[email protected]>
* Cancel any row count queries before attempting to cut over Closes #830. Switches from using `QueryRow` to `QueryRowContext`, and stores a context.CancelFunc in the migration context, which is called to halt any running row count query before beginning the cut over. * Make it threadsafe * Kill the count query on the database side as well * Explicitly grab a connection to run the count, store its connection id * When the query context is canceled, run a `KILL QUERY ?` on that connection id * Rewrite these to use the threadsafe functions, stop exporting the cancel func * Update logger * Update logger Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: dm-2 <[email protected]>
* Cancel any row count queries before attempting to cut over Closes github#830. Switches from using `QueryRow` to `QueryRowContext`, and stores a context.CancelFunc in the migration context, which is called to halt any running row count query before beginning the cut over. * Make it threadsafe * Kill the count query on the database side as well * Explicitly grab a connection to run the count, store its connection id * When the query context is canceled, run a `KILL QUERY ?` on that connection id * Rewrite these to use the threadsafe functions, stop exporting the cancel func * Update logger * Update logger Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: dm-2 <[email protected]>
* Merge pull request github#1143 from github/codeql-run-on-prs Run CodeQL analysis on PRs * Merge pull request github#1141 from shaohk/fix-two-phase-commit-lost-data_v2 fix(lost data in mysql two-phase commit): lost data in mysql two-phas… * compound pk tests (github#387) * compound pk tests * more details in failure diff * more elaborate test; the pk-ts one consistently fails * Fix merge conflict Co-authored-by: Shlomi Noach <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> * Use `switch` statements for readability, simplify `.NewGoMySQLReader()` (github#1135) * Use `switch` statements for readability * Simplify initBinlogReader() * Add context/timeout to HTTP throttle check (github#1131) * Add context/timeout to HTTP throttle check * Dont run `.GetThrottleHTTPInterval()` on every loop * Update help message * Var rename * 2022 * Add timeout flag * Add unix/tcp server commands, use ParseInt() for string->int64 * Var rename * Re-check http timeout on every loop iteration * Remove stale comment * Make throttle interval idempotent * var rename * Usage grammar * Make http timeout idempotent too * Parse time.Duration once * Move timeout to NewThrottler * Help update * Set User-Agent header * Re-add newline Co-authored-by: dm-2 <[email protected]> * Default to `go1.17.11` (github#1136) * Default to go1.17.11 * `go mod vendor` Co-authored-by: dm-2 <[email protected]> * Cancel any row count queries before attempting to cut over (github#846) * Cancel any row count queries before attempting to cut over Closes github#830. Switches from using `QueryRow` to `QueryRowContext`, and stores a context.CancelFunc in the migration context, which is called to halt any running row count query before beginning the cut over. * Make it threadsafe * Kill the count query on the database side as well * Explicitly grab a connection to run the count, store its connection id * When the query context is canceled, run a `KILL QUERY ?` on that connection id * Rewrite these to use the threadsafe functions, stop exporting the cancel func * Update logger * Update logger Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: dm-2 <[email protected]> Co-authored-by: dm-2 <[email protected]> Co-authored-by: Shlomi Noach <[email protected]> Co-authored-by: Shlomi Noach <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Tim Vaillancourt <[email protected]> Co-authored-by: Andrew Mason <[email protected]>
Context
During a schema change, gho-st copies every row from the original table to a ghost table. This results in a lock on the original table and a long running query similar to the following
It appears that when
--exact-rowcount
is enabled, ghost grabs another lock on the original table and starts an addition query to calculate a more accurate progress ETA. By default, row counting happens in parallel to row copying.Problem
After all the rows are copied to the ghost table, gh-ost attempts to cut-over the original table.
Usually, the row counting query finishes before all the rows are copied to the ghost table. At the time of cut-over, because both queries have finished, there is no lock on the origin table. Hence the schema change is successful.
However, we repeatedly found that on a large table (> 400GB), row counting query takes longer than row copying. At the time of cut-over, the row counting query is still running and holding a lock on the original table.
This causes the schema change to fail with this error: "Error 1205: Lock wait timeout exceeded; try restarting transaction", which was described in #773
Proposals
If you run into this issue, try disabling
--exact-rowcount
.Improvements that we could make
The text was updated successfully, but these errors were encountered: