decouple olap tx timeout from oltp tx timeout #10946

Merged
merged 15 commits into from
Sep 7, 2022

maxenglander
Collaborator

@maxenglander maxenglander commented Aug 5, 2022

Description

Since workload=olap bypasses the query timeouts
(--queryserver-config-query-timeout) and also row limits, the natural
assumption is that it also bypasses the transaction timeout.

This is not the case. For example, on a tablet where
--queryserver-config-transaction-timeout is 10, an OLAP transaction is still killed after 10 seconds.

This PR

  • Adds a new CLI flag and YAML field, --queryserver-config-olap-transaction-timeout, to configure TX
    timeouts for OLAP workloads independently. The default value is 0 seconds, which disables OLAP TX timeouts.
  • Decouples the TX kill interval from the OLTP TX timeout via a new CLI flag and
    YAML field, --queryserver-config-transaction-killer-interval, which defaults to 3 seconds.

One subtlety is that the timeout that is applied to the transaction is based on the value of the workload setting at the beginning of the transaction. If the workload is changed mid-transaction, that may change the timeout applied to queries within the transaction, but it won't change the transaction timeout.

Demo

Using the (new) default values, connected to VTGate.

mysql> set workload=oltp;
Query OK, 0 rows affected (0.00 sec)

mysql> begin ; select 1 from data limit 1; select sleep(35); commit;
Query OK, 0 rows affected (0.00 sec)

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

ERROR 1317 (70100): target: dst.-.primary: vttablet: (errno 2013) due to context deadline exceeded, elapsed time: 30.00055243s, killing query ID 106 (CallerID: maxenglander)
ERROR 1317 (70100): target: dst.-.primary: vttablet: rpc error: code = Aborted desc = transaction 1659665741215176173: ended at 2022-08-05 02:18:29.360 UTC (unlocked closed connection) (CallerID: maxenglander)
mysql> set workload=olap;
Query OK, 0 rows affected (0.01 sec)

mysql> begin ; select 1 from data limit 1; select sleep(35); commit;
Query OK, 0 rows affected (0.00 sec)

+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

+-----------+
| sleep(35) |
+-----------+
|         0 |
+-----------+
1 row in set (35.00 sec)

Query OK, 0 rows affected (0.01 sec)

Breaking changes

Currently OLAP transactions are killed after --queryserver-config-transaction-timeout seconds. With this PR, OLAP transactions are killed after --queryserver-config-olap-transaction-timeout seconds (default value 0 means transactions are not timed out).

Currently OLTP and OLAP transactions are evaluated for killing every --queryserver-config-transaction-timeout seconds divided by 10. With this PR, OLAP and OLTP transactions are evaluated for killing every --queryserver-config-transaction-killer-interval seconds.

Related Issue(s)

#10945

Checklist

  • "Backport me!" label has been added if this change should be backported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@vitess-bot
Contributor

vitess-bot bot commented Aug 5, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.
  • If a new flag is being introduced, review whether it is really needed. The flag names should be clear and intuitive (as far as possible), and the flag's help should be descriptive.
  • If a workflow is added or modified, each item in Jobs should be named in order to mark it as required. If the workflow should be required, the GitHub Admin should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should either include a link to an issue that describes the bug OR an actual description of the bug and how to reproduce, along with a description of the fix.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.

@maxenglander maxenglander force-pushed the maxeng-gh-10946-olaptx branch 13 times, most recently from b6d5c70 to a7801e5 Compare August 8, 2022 03:24
@maxenglander maxenglander marked this pull request as ready for review August 8, 2022 12:01
@deepthi deepthi added the Type: Enhancement, Component: Query Serving, and release notes (needs details) labels Aug 9, 2022
@deepthi
Member

deepthi commented Aug 16, 2022

@maxenglander we'll get this reviewed. Can you add notes to the 15_0_0_summary.md file and resolve the conflicts in the meantime?

go/vt/vttablet/tabletserver/tabletserver.go (outdated, resolved)
go/vt/vttablet/tabletserver/tabletserver.go (outdated, resolved)
go/vt/vttablet/tabletserver/tabletserver.go (outdated, resolved)
go/vt/vttablet/tabletserver/tabletserver.go (outdated, resolved)
go/vt/vttablet/tabletserver/tx_pool.go (resolved)
go/vt/vttablet/tabletserver/tx_pool.go (outdated, resolved)
go/vt/vttablet/tabletserver/tabletenv/config.go (outdated, resolved)
@maxenglander maxenglander force-pushed the maxeng-gh-10946-olaptx branch 2 times, most recently from 27c7708 to b908164 Compare August 19, 2022 05:50
@maxenglander maxenglander requested review from harshit-gangal and removed request for systay, shlomi-noach and frouioui August 19, 2022 11:20
@maxenglander maxenglander force-pushed the maxeng-gh-10946-olaptx branch from 5f9ad61 to 98b3941 Compare August 23, 2022 12:21
Member

@harshit-gangal harshit-gangal left a comment

Other things mostly look good.

I think we should add more tests for an idle transaction getting killed by the transaction killer, and for the expiry time getting updated when we switch workload.
We might have those tests already; at least check.

go/vt/vttablet/tabletserver/stateful_connection.go (outdated, resolved)
go/vt/vttablet/tabletserver/stateful_connection.go (outdated, resolved)
go/vt/vttablet/tabletserver/tabletenv/config.go (outdated, resolved)
go/vt/vttablet/tabletserver/tabletenv/config.go (outdated, resolved)
go/vt/vttablet/tabletserver/tx_pool.go (outdated, resolved)
go/vt/vttablet/tabletserver/tx_pool.go (outdated, resolved)
@@ -141,7 +148,7 @@ func (tp *TxPool) transactionKiller() {
 			if conn.IsInTransaction() {
 				tp.txComplete(conn, tx.TxKill)
 			}
-			conn.Releasef("exceeded timeout: %v", tp.Timeout())
+			conn.Releasef("exceeded timeout: %v", timeout)
Member

nit: can use conn.timeout directly here in the method.

Collaborator Author

Oops sorry I didn't implement this suggestion earlier. I saw it, but I misunderstood it 🤦

…fix comments, set ticks interval once

Signed-off-by: Max Englander <[email protected]>
@maxenglander maxenglander force-pushed the maxeng-gh-10946-olaptx branch 2 times, most recently from f083308 to 4885ab7 Compare August 27, 2022 00:49
Signed-off-by: Max Englander <[email protected]>
Member

@harshit-gangal harshit-gangal left a comment

I think a few previous comments are still pending.
Once they are addressed, this can be merged.

go/cmd/vttestserver/main.go (outdated, resolved)
go/vt/servenv/grpc_auth.go (outdated, resolved)
Signed-off-by: Max Englander <[email protected]>
Signed-off-by: Max Englander <[email protected]>
Signed-off-by: Max Englander <[email protected]>
@maxenglander
Collaborator Author

@harshit-gangal I implemented the one outstanding suggestion I found (timeout => conn.timeout), sorry for missing that until now. Also reverted the unrelated fmt changes (used git commit --no-verify).
