Skip to content

Commit

Permalink
RFC: add transaction and session cancellation
Browse files Browse the repository at this point in the history
  • Loading branch information
israelg99 committed Sep 28, 2017
1 parent 5776deb commit 8dea507
Showing 1 changed file with 327 additions and 0 deletions.
327 changes: 327 additions & 0 deletions docs/RFCS/20170814_cancel_txn_session.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,327 @@
- Feature Name: Transaction and Session cancellation
- Status: draft
- Start Date: 2017-08-14
- Author: Julian Gilyadov
- RFC PR: [#17252](https://github.com/cockroachdb/cockroach/pull/17252)
- Cockroach Issue: None

# Summary

This RFC proposes a set of features which add the ability to cancel
an in-progress transaction and session using the `CANCEL` statements.
Cancellation can be useful in many use cases such as canceling a session
to revoke an unauthorized access to the DB.

The implementation of those proposed features is similar to the implementation
of the query cancellation feature as described in this [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170608_query_cancellation.md)
and [PR #17003](https://github.com/cockroachdb/cockroach/pull/17003).
In summary, query cancellation cancels the query by ID;
it uses the query ID to locate the coordinating node;
and that node looks up and cancels the context related to that query.

Generally, new `CANCEL [TRANSACTION|SESSION] <id>` statements will be added
and new transaction and session identifiers will be implemented to include
the node ID in them. As a result, CockroachDB will know where to route the
request internally without issuing additional requests to other nodes.

# Motivation

Currently, `SHOW SESSIONS` lists active sessions:

```sql
root@:26257/> SHOW SESSIONS;
+---------+----------+----------------+------------------+-------------------------+----------------------------------+----------------------------------+--------+
| node_id | username | client_address | application_name | active_queries | session_start | oldest_query_start | kv_txn |
+---------+----------+----------------+------------------+-------------------------+----------------------------------+----------------------------------+--------+
| 1 | root | [::1]:57557 | | SHOW CLUSTER SESSIONS; | 2017-06-14 19:09:50.609823+00:00 | 2017-06-14 19:09:54.480242+00:00 | NULL |
+---------+----------+----------------+------------------+-------------------------+----------------------------------+----------------------------------+--------+
(1 row)
```

Extending this functionality with transaction and session cancellation
would be a logical next step.

Virtually all current popular DBMS include transaction and/or session cancellation.

Both transaction and session cancellation have different motivations and use cases:

#### Transaction cancellation

Transaction cancellation would involve canceling any active queries(if any) under the transaction,
in addition to rolling it back and cleaning up intents held by the transaction.
Note that currently, query cancellation cancels the underlying transaction too.
However, canceling an idle transaction is impossible using the query cancellation mechanism;
and even if there is a query in progress, canceling it races with the query finishing; so one
can't rely on canceling a query to necessarily do anything such
as canceling the transaction.

#### Session cancellation
Session cancellation would involve canceling the current transaction (if any),
followed by closing the client connection.
A particular useful use case would be revoking unauthorized access to the DB.

_Both MySQL and Postgres support session cancellation._

# Guide-level explanation
New `CANCEL [TRANSACTION|SESSION] <id>` statements will be added to cockroach's SQL dialect.
These statements can be used to cancel running transactions and sessions.

IDs for transactions and sessions can be received through `SHOW SESSIONS`
as 32-character hex strings.

Example use case: Assume there are two concurrently running sessions, session 1 and 2.

Session 1 (running transaction):

```sql
root@:26257/> SHOW TRACE FOR SELECT * FROM generate_series(1,20000000);
```

Session 2 (DB admin wishing to cancel the txn above):

```sql
root@:26257/> SELECT node_id, kv_txn, active_queries FROM [SHOW CLUSTER SESSIONS];
+---------+--------------------------------------+-------------------------------------------------------+
| node_id | kv_txn | active_queries |
+---------+--------------------------------------+-------------------------------------------------------+
| 1 | 14e58aab-9046-1f2d-0000-000000000001 | SHOW TRACE FOR SELECT * FROM generate_series(1,20000) |
| 2 | 14e58aac-50cf-3cfc-0000-000000000002 | SHOW CLUSTER SESSIONS |
+---------+--------------------------------------+-------------------------------------------------------+
(2 rows)

root@:26257/> CANCEL TRANSACTION '14e58aab-9046-1f2d-0000-000000000001';
CANCEL TRANSACTION

root@:26257/> SELECT node_id, kv_txn, active_queries FROM [SHOW CLUSTER SESSIONS];
+---------+--------------------------------------+-------------------------------------------------------+
| node_id | kv_txn | active_queries |
+---------+--------------------------------------+-------------------------------------------------------+
| 1 | NULL | |
| 2 | 14e58ffc-23cf-3f3c-0000-000000000002 | SHOW CLUSTER SESSIONS |
+---------+--------------------------------------+-------------------------------------------------------+
(2 rows)
```

Going back to session 1:

```sql
root@:26257/> SHOW TRACE FOR SELECT * FROM generate_series(1,20000000);
pq: transaction cancelled by user 'root'

root%:26257 ERROR>
```

After a transaction was canceled the session would be in an `Aborted`
state(and thus we will have an `ERROR` status in our prompt). To reset
the session state the client has to send a `ROLLBACK` statement.

Another example use case is session cancellation:

Session 2 (DB admin wishing to cancel session 1)

```sql
root@:26257/> SELECT node_id, session_id FROM [SHOW CLUSTER SESSIONS];
+---------+--------------------------------------+
| node_id | session_id |
+---------+--------------------------------------+
| 1 | 14e58aac-50cf-3cfc-0000-000000000001 |
| 2 | 14e22aab-9046-1f2d-0000-000000000002 |
+---------+--------------------------------------+
(2 rows)

root@:26257/> CANCEL SESSION '14e58aac-50cf-3cfc-0000-000000000001';
CANCEL SESSION

root@:26257/> SELECT node_id, session_id FROM [SHOW CLUSTER SESSIONS];
+---------+--------------------------------------+
| node_id | session_id |
+---------+--------------------------------------+
| 2 | 14e22aab-9046-1f2d-0000-000000000002 |
+---------+--------------------------------------+
(1 rows)
```

Going back to session 1:

```sql
pq: session canceled by user 'root'

user:~$:
```

Common errors include:
1. Invalid ID.
2. Transaction/session not found.
3. Failure to dial to the remote node in the cluster.

# Reference-level explanation

_Will be updated as development progresses._

# Detailed design

### Identifiers

Transactions and sessions will use IDs which will encode the node ID into them,
so when there's a request to cancel, CockroachDB will know where to route the request
without issuing additional requests to other nodes.
The node ID will be decoded from the `CANCEL [TRANSACTION|SESSION]` request, and
an RPC call would be made to that node if it is not the local node to issue the cancel.

Identifiers will be stored as [uint128](https://github.com/cockroachdb/cockroach/tree/master/pkg/util/uint128)
composed of the 96-bit HLC timestamp plus 32 bits of node ID.
Using the full HLC timestamp along with the node ID makes this ID unique across the cluster.

The identifiers will be primarily presented in:
- `SHOW SESSIONS`: session and transaction IDs will be presented
in this table as32-character hex strings.
- in `CANCEL [TRANSACTION/SESSION] <id>` statements as the only argument.

Those identifiers could also be added to other parts of the code, such as to log tags.
However, those changes will be out of scope for this project.

Example output of `SHOW SESSIONS` with transaction IDs (see the `kv_txn` column):

```sql
root@:26257/> SELECT node_id, kv_txn, active_queries FROM [SHOW CLUSTER SESSIONS];
+---------+--------------------------------------+-------------------------------------------------------+
| node_id | kv_txn | active_queries |
+---------+--------------------------------------+-------------------------------------------------------+
| 1 | 14e58aab-9046-1f2d-0000-000000000001 | SHOW TRACE FOR SELECT * FROM generate_series(1,20000) |
| 2 | 14e58aac-50cf-3cfc-0000-000000000002 | SHOW CLUSTER SESSIONS |
+---------+--------------------------------------+-------------------------------------------------------+
(2 rows)
```

### Syntax
New `CANCEL [TRANSACTION|SESSION] <id>` statements will be added to CockroachDB's SQL dialect.

A non-root database user can only issue a cancel request run by that same user.
The root user can cancel any transaction or session on the cluster.

When the statement is executed, the ID will be parsed to determine the node ID,
and an RPC call will be made to that node if it is not the current node.
That node would look up the transaction or session and cancel it.

Each node will maintain a mapping of current (local) transactions and sessions,
indexed by their respective IDs.

The request would return an error if no matching transaction or session was found
for that ID, or if the ID was invalid(InvalidArgument).

## Transaction cancellation
If there are no active queries, the KV transaction will be rolled back and the session will
transition to the `Aborted` state.
If there are active queries, then:
1. No more queries can be started on the session until the transaction is in the Aborted state.
For that, we introduce an `isCancelled` flag in the `txnState`, which is checked in the executor
before a query is about to start executing.
2. Cancel all running queries by iterating all active queries under the
transaction and canceling them, this will also cancel the transaction's context.
3. Wait for the canceled queries to finish and then transition the transaction
into the `Aborted` state if needed. Note that usually, no explicit transition is
needed because the act of canceling all the active queries will have
transitioned the transaction into the desired state already.

#### Tests

Tests for transaction cancellation should be written in `logic_test/run_contorl.go`
and `run_control_test.go`.
Fundamentally, it should be similar to the query cancellation tests.
Multiple types of queries need to be tested:
* When the txn is currently running a distributed query.
* With a query running `crdb_internal.force_retry('10m')` so that it is busy with a retry loop.
* With a query running `show trace for ...` with a long-running statement.

## Session cancellation
The connection with the client is terminated followed by the
cancellation of the current transaction (if any).
This will be achieved by canceling the session's context
and use `newReadTimeoutConn()` to evaluate that Context when
we're blocked on a network socket read.

# Drawbacks
None.

# Rationale and Alternatives

### Why separate transaction cancellation from query cancellation?

Currently, canceling a query aborts the transaction since the
transaction model is very simple but this will not be the case.
In the future, CockroachDB will expose error handling capabilities
in its SQL dialect which will allow one to recover from errors without
toasting a transaction. Thus the semantics of the query cancellation should
not be tied to those of transactions.
Moreover, transactions are not necessarily constantly running queries and
it should be possible to cancel a transaction at a time when there's no query running under it.
And even if there is a query in progress, canceling it races with the query
finishes, so one can't rely on canceling that query to necessarily
do anything such as canceling the transaction.

### Why have separate syntax for query/transaction/session cancellation?

Both MySQL and Postgres support query and sessions cancellation separately,
so it makes sense to separate them in CockroachDB as well.

Needless to say, if we accept that canceling a query or session without disconnecting
a client is useful, we should also accept that canceling a transaction
without disconnecting a client is useful.

Transaction and session cancellation are closely related and the implementation
is quite similar, but it makes sense to have syntax to cancel them separately.

Although there isn't a very clear use case for either query or transaction
separate from session cancellation, the syntax should be separate.

### Not to include node ID in the identifiers

One alternative that was considered for IDs was an in-memory counter combined with the
node ID in one string. This would have led to issues with duplicate IDs if the counter
resets to 0 upon node restart, so it wasn't chosen. `unique_rowid()`s were considered but
were also rejected due to a lack of a strong uniqueness guarantee with it.

Not combining the node ID into the ID was once suggested, but that would have required
the user to supply both the node and appropriate IDs to the `CANCEL` statement. Since the ID
would make little sense out of the context of a node ID, and both IDs will be presented together
in `SHOW QUERIES/SESSIONS`, it makes more sense from a UI perspective to combine them.

Needless to say, there's no global list of sessions. Although, it is possible
to do `SHOW CLUSTER SESSIONS` to get all sessions and linearly search them
for a matching session but it internally dials to each node and asks it for
its list which is too expensive.

# Unresolved questions

### Should we support Postgres wire protocol query cancellation
_This was also discussed in the query cancellation RFC and PR._

Postgres has a built-in function for session cancellation
which has query cancellation as a side-effect.

`pg_cancel_backend()` - takes a session ID as an argument, but it only cancels the current
query in that session (if any).
`pg_terminate_backend()` - is for session cancellation.
Both of these functions take a 32-bit session ID as an argument.
The protocol-level cancellation feature is more or less equivalent
to `pg_cancel_backend()` but it takes two 32-bit arguments:
1. The session id.
2. 32-bit session-level "secret key" (used as a
lightweight authentication mechanism so this type of cancellation can be used without
establishing another fully-authenticated connection to issue the cancellation call).

```
SELECT pg_cancel_backend(<session-id>, <session-secret-key>);
```

Supporting this syntax was considered for compatibility reasons but decided against.
Since functions should ideally not have side-effects, and also because Postgres' IDs are `int32`s
which would be hard to reconcile with our larger IDs.

The small key sizes make these functions difficult
to support (especially when load balancers are involved).
We may want to support these interfaces one day (since they're presumably built into Postgres
client drivers), but for now, we should just support our own syntax with larger IDs and see
how much demand there is for Postgres-compatible cancellation.
Implementing some mapping of 32 or 64-bit IDs to queries
would get its own RFC when/if we decide to do it.

0 comments on commit 8dea507

Please sign in to comment.