Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: implement transaction and session cancellation #17252

Closed
wants to merge 3 commits into from

Conversation

israelg99
Copy link
Contributor

@israelg99 israelg99 commented Jul 27, 2017

RFC for transaction and session cancellation - mentioned in query cancellation RFC and is a follow-up work to #17003.

cc @itsbilal

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor

knz commented Jul 27, 2017

cc @cockroachdb/sql-execution

@knz knz requested a review from andreimatei July 27, 2017 07:58
@knz
Copy link
Contributor

knz commented Jul 27, 2017

Thanks for your contribution.
Some general comments:

  • I am a bit surprised you refer to the RFC on query cancellation. This does not discuss transaction cancellation at all! The approach you have taken here has not, so far I can see, been discussed anywhere.
  • as an unfortunate result of the lack of discussion, I fear there has been a major oversight: there's a race condition between planning and cancellation. For example: in one session the client sends an execute. The planner checks the txn state, sees the txn is open/valid, and starts planning the query. Planning finishes successfully but execution hasn't started yet. Concurrently another session tries to cancel that transaction. Cancellation will not see the query yet, because the query has not started executing yet, so it will not cancel any query and instead mark the txn as aborted. Now in the first session, planning has finished so execution can start. There is no additional check on the txn state between planning and execution, so execution proceeds (albeit with an aborted txn). Now we have a query running, potentially for a long time (e.g. if it uses generate_series it will never see that the kv Txn object has been cancelled), with no way to cancel it any more!

I would strongly suggest that you write up an overview of your proposed approach in a RFC, I would be even OK to have it inside this PR, and gather some input from stakeholders about the approach before we can even check whether the code looks good.


Reviewed 18 of 18 files at r1, 5 of 5 files at r2, 2 of 2 files at r3.
Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


pkg/server/status.go, line 994 at r2 (raw file):

	if !found {
		// Unfortunately it grabs local sessions too.

Clarify what "it" means in this sentence.

Clarify what "grabs" means in this sentence.


pkg/server/status.go, line 1009 at r2 (raw file):

				continue
			}
			status, err := s.dialNode(session.NodeID)

Place the dialNode / CancelTransaction calls outside of the loop below. Rearrange the code so that first it searches for the sessions, fails if the session is not found, and below that cancels the txn if the session+txn was found.


pkg/sql/executor.go, line 388 at r2 (raw file):

// CancelTransaction cancells all active queries under the txn in addition to rolling it back.
func (s *Session) CancelTransaction(txnID string, username string) (bool, error) {
	if s.TxnState.mu.txn == nil {

The mu here means something needs to be locked.


pkg/sql/executor.go, line 448 at r2 (raw file):

				return true, nil
			}
			// If the queries finished executing(weren't cancelled)

space missing between the parenthesis.


pkg/sql/run_control_test.go, line 83 at r3 (raw file):

	defer leaktest.AfterTest(t)()

	const transactionToCancel = "CREATE DATABASE txn; CREATE TABLE txn.cancel AS SELECT * FROM generate_series(1,20000000)"

Make this test use multiple types of queries. You need to test:

  • when the txn is currently running a distributed query. (I understand this may not succeed currently because cancellation of distributed queries is not complete yet, but then your test should expect and validate that the cancellation fails.)
  • with a query running crdb_internal.force_retry('10m') so that it is busy with a retry loop.
  • with a query running show trace for ... with a long-running statement.

pkg/sql/parser/sql.y, line 1257 at r1 (raw file):

    $$.val = &CancelQuery{ID: $3.expr()}
  }
| CANCEL TRANSACTION a_expr

Make a new non-terminal cancel_txn_stmt.


Comments from Reviewable

@knz
Copy link
Contributor

knz commented Jul 27, 2017

My very strongly suggested approach to this would be to have a boolean flag in the session object set by query cancellation, and have the executor responsible for reading this flag at key points and be responsible for transitioning the txn in the aborted state.


Review status: all files reviewed at latest revision, 6 unresolved discussions, some commit checks failed.


Comments from Reviewable

@knz knz self-requested a review July 27, 2017 08:21
@andreimatei
Copy link
Contributor

+1 to Rafa's concerns about races with ActiveQueries.

One question: what happens if there's no active queries (or, only parallel ones that don't block the client) and the session is waiting to read data from the client (from pgwire)? Do we / should we send and error code to the client? I'd look at what we do when we're closing sessions while draining.


Review status: all files reviewed at latest revision, 15 unresolved discussions, some commit checks failed.


pkg/sql/executor.go, line 367 at r2 (raw file):

	if queryMeta, exists := r.store[queryID]; exists {
		return queryMeta.Cancel(queryID, username)

make sure you rebase on top of #17219


pkg/sql/executor.go, line 374 at r2 (raw file):

// Cancel flags the query to be cancelled.
func (queryMeta *queryMeta) Cancel(queryID uint128.Uint128, username string) (bool, error) {

no need to return the bool. It's equal to err == nil.


pkg/sql/executor.go, line 387 at r2 (raw file):

// CancelTransaction cancells all active queries under the txn in addition to rolling it back.
func (s *Session) CancelTransaction(txnID string, username string) (bool, error) {

what's the return value?


pkg/sql/executor.go, line 413 at r2 (raw file):

		// Queries may finish executing regardless.
		for _, query := range s.mu.ActiveQueries {
			// We don't have queryID but apparently it is only needed for logging.

the queryID is the key in that map.
for qid, query := range ...


pkg/sql/executor.go, line 415 at r2 (raw file):

			// We don't have queryID but apparently it is only needed for logging.
			cancelled, err := query.Cancel(uint128.FromInts(0, 0), username)
			// Abort if the query wasn't cancelled or errored out.

this comment is unnecessary. Putting it only invites questions about why you're not continuing to attempt to cancel the other queries.


pkg/sql/executor.go, line 416 at r2 (raw file):

			cancelled, err := query.Cancel(uint128.FromInts(0, 0), username)
			// Abort if the query wasn't cancelled or errored out.
			// We cannot handle these errors(and should not get them) e.g "query not found".

not sure what this comment is telling me. I'd remove it. It's usual to pass errors up the stack; it's common to not know how to handle them.


pkg/sql/executor.go, line 417 at r2 (raw file):

			// Abort if the query wasn't cancelled or errored out.
			// We cannot handle these errors(and should not get them) e.g "query not found".
			if !cancelled || err != nil {

no need to check cancelled here (or return in in the first place)


pkg/sql/executor.go, line 422 at r2 (raw file):

		}
		return nil
	}()

join the next line to reduce the scope of err.

if err := func(){..}(); err != nil{}

pkg/sql/executor.go, line 441 at r2 (raw file):

		default:
			if hasQueries() {
				continue

you're just spinning? Sleep a little, or find a way to trigger events when ActiveQueries changes.


Comments from Reviewable

@israelg99 israelg99 force-pushed the cancel-txn branch 2 times, most recently from f4947a2 to 766f926 Compare July 30, 2017 14:55
@israelg99
Copy link
Contributor Author

israelg99 commented Jul 30, 2017

[WIP] I've updated the branch, added draft RFC and the boolean flag suggested by @knz.
At which key points should we have the executor check for the flag and transition the txn into an aborted state?
I commented out the part where the executor is waiting for the queries to finish cancelling/executing due to the race condition between planning and cancelling, also, the boolean flag approach seems like a better fit.

@israelg99 israelg99 force-pushed the cancel-txn branch 2 times, most recently from 791cfdc to 903e25f Compare August 1, 2017 11:35
@knz
Copy link
Contributor

knz commented Aug 1, 2017

@itsbilal can you look at the code and suggest where to add the appropriate hooks?

@itsbilal
Copy link
Member

itsbilal commented Aug 2, 2017

Review status: 0 of 26 files reviewed at latest revision, 16 unresolved discussions, some commit checks failed.


pkg/sql/session.go, line 364 at r7 (raw file):

		// It will be checked at key points by the executor
		// and if it is true the executor will abort the txn.
		IsTxnCancelled int32

It would be a better idea to have this flag inside session.txnState.mu instead of session.mu. Once you rebase on the latest master, query cancellation should also cancel the transaction context (your CancelTransaction() could do it in case of an idle txn).

The executor could check for this flag (after statement execution errors out) and issue an automatic rollback instead of just marking the transaction and txnState as aborted and waiting for a rollback from the client.


Comments from Reviewable

@israelg99 israelg99 force-pushed the cancel-txn branch 5 times, most recently from 21cc40b to f8fc173 Compare August 5, 2017 14:53
@israelg99 israelg99 requested review from a team August 5, 2017 14:53
@israelg99 israelg99 force-pushed the cancel-txn branch 2 times, most recently from 6e6c1e4 to 29e5ce7 Compare August 5, 2017 15:43

It is similar to the motivation behind query cancellation, currently it is possible to monitor sessions and queries, and cancelling queries.
Extending this functionality with transaction cancellation would be a logical next step.
It would be useful to cancel idle transactions for example.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this useful? I'm late to this party, but I'm having a hard time imagining a scenario where cancelling transactions (rather than queries, sessions) is useful in CRDB.

@itsbilal
Copy link
Member

itsbilal commented Aug 6, 2017 via email

@knz
Copy link
Contributor

knz commented Sep 18, 2017

Using a separate PR for the RFC and code change is an excellent idea. Thanks for this.

I do agree that building on top of the existing RFC is a good choice. Andrei and I however both agree that the new RFC should contain a high-level short summary of what the existing cancellation features already do and how.

@knz
Copy link
Contributor

knz commented Sep 18, 2017

Sorry I wasn't clear in my previous comment.

When I wrote "building on top of the existing RFC" I meant leave the existing file as-is and create a new one "that builds on top". This new file should contain a high-level summary of the previous functionality.

@andreimatei
Copy link
Contributor

+1 to putting just the RFC in this PR. This is the way we generally do things.

For file organization, I think a new file is better - with a summary and a link to the old one. The old one should also have a link to the new one.


Review status: 0 of 1 files reviewed at latest revision, 64 unresolved discussions, some commit checks failed.


docs/RFCS/20170814_cancel_txn_session.md, line 12 at r21 (raw file):

This RFC proposes a set of features which add the ability to cancel an in-progress transaction and session using `CANCEL` statements.  
Cancellation can be useful in many use cases such as cancelling a session to revoke an unauthorized access to the DB.  
The implementation of those proposed features is similar to the implementation of the query cancellation feature as described in this [RFC](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170608_query_cancellation.md) and [PR #17003](https://github.com/cockroachdb/cockroach/pull/17003).  

nit: style guide says wrap lines at 100 characters


docs/RFCS/20170814_cancel_txn_session.md, line 17 at r21 (raw file):

# Motivation

Similarly, `SHOW SESSIONS` lists active sessions along with start timestamps:

Similarly?


docs/RFCS/20170814_cancel_txn_session.md, line 37 at r21 (raw file):

Transaction cancellation would involve cancelling any active queries(if any) under the transaction,
in addition to rolling it back and cleaning up intents held by the transaction.  
Note that cancelling an idle transaction is impossible using the query cancellation mechanism. And even if there is a query in progress, cancelling it races with the query finishes, so one can't rely on cancelling that query to necessarily do anything such as cancelling the transaction.  

s/finishes/finishing


docs/RFCS/20170814_cancel_txn_session.md, line 37 at r21 (raw file):

Transaction cancellation would involve cancelling any active queries(if any) under the transaction,
in addition to rolling it back and cleaning up intents held by the transaction.  
Note that cancelling an idle transaction is impossible using the query cancellation mechanism. And even if there is a query in progress, cancelling it races with the query finishes, so one can't rely on cancelling that query to necessarily do anything such as cancelling the transaction.  

I like this paragraph, but I think it needs a bit more context: please explain that currently a query cancellation does cancel the transaction (modulo the race you describe). Without this piece of information, the rest doesn't make sense to a layman reader.


docs/RFCS/20170814_cancel_txn_session.md, line 86 at r21 (raw file):

```sql
root@:26257/> SHOW TRACE FOR SELECT * FROM generate_series(1,20000000);

it'd be cool if this example made it clear what state the session is in once the client gets the "transaction cancelled" error. I guess it should be in the Aborted state. Our prompt would look like this:

root@:26257 ERROR>

And have the example show that the client has to send a ROLLBACK for the state to be reset.


docs/RFCS/20170814_cancel_txn_session.md, line 156 at r21 (raw file):

| node_id | id                               | query                                                              | 
+---------+----------------------------------+--------------------------------------------------------------------+
| 1       | 2834b4f8a230ce2394e4f12a8c9236c1 | SHOW CLUSTER QUERIES                                               |

the session_ids printed in the previous examples have dashes in them, but these query ids don't. What's the deal?


docs/RFCS/20170814_cancel_txn_session.md, line 174 at r21 (raw file):

## Transaction cancellation
The transaction will transition immediately if there are no active queries.

s/The transaction will transition immediately if there are no active queries./If there are no active queries, the KV transaction will be rolled back and the session will transition to the Aborted state.

The point is to emphasize the two actions - the rollback (of the lower-level KV transaction - i.e. client.Txn) and the state transition of the SQL session (TxnState).


docs/RFCS/20170814_cancel_txn_session.md, line 176 at r21 (raw file):

The transaction will transition immediately if there are no active queries.
If there are active queries, then:
1. No more queries can be started on the session until the transaction is in the Aborted state.

how exactly will we ensure this?


docs/RFCS/20170814_cancel_txn_session.md, line 177 at r21 (raw file):

If there are active queries, then:
1. No more queries can be started on the session until the transaction is in the Aborted state.
2. Cancel all running queries.

I think we should explain here exactly how we're planning on doing that. I guess we'd just cancel the transaction's Context?


docs/RFCS/20170814_cancel_txn_session.md, line 182 at r21 (raw file):

#### Tests

Tests for transaction cancellation should be written as logic and cluster tests in the `run_control` test suite.  

what's the run_control test suite?


docs/RFCS/20170814_cancel_txn_session.md, line 185 at r21 (raw file):

Fundamentally, it should be similar to the query cancellation tests.  
Multiple types of queries need to be tested:
*   When the txn is currently running a distributed query. (I understand this may not succeed currently because cancellation of distributed queries is not complete yet, but then your test should expect and validate that the cancellation fails.)

cancelation of DistSQL queries is implemented. What made you believe that it isn't?


docs/RFCS/20170814_cancel_txn_session.md, line 186 at r21 (raw file):

Multiple types of queries need to be tested:
*   When the txn is currently running a distributed query. (I understand this may not succeed currently because cancellation of distributed queries is not complete yet, but then your test should expect and validate that the cancellation fails.)
*   With a query running `crdb_internal.force_retry('10m')` so that it is busy with a retry loop.

Would you mind going into more detail about what you were thinking here? Or, rather, what do you expect to happen and why?
To be honest, I'm not sure what currently happens when you try to cancel such a query (with query cancelation). I wonder if anybody notices the canceled context... It might be the case that nobody does and the query is allowed to run until the retries are done. In other words, I'm not sure that the Executor ever checks the


docs/RFCS/20170814_cancel_txn_session.md, line 187 at r21 (raw file):

*   When the txn is currently running a distributed query. (I understand this may not succeed currently because cancellation of distributed queries is not complete yet, but then your test should expect and validate that the cancellation fails.)
*   With a query running `crdb_internal.force_retry('10m')` so that it is busy with a retry loop.
*   With a query running `show trace for ...` with a long-running statement.

what are you thinking of? What's the show trace for for? Do you want to assert something on the trace - like a particular log message?


docs/RFCS/20170814_cancel_txn_session.md, line 190 at r21 (raw file):

## Session cancellation
The connection with the client is terminated followed by cancellation of the current transaction (if any).

How exactly will we do this? I guess we'll cancel the Session's Context? We should make a note about our newReadTimeoutConn() which let's us evaluate that Context when we're blocked on a network socket read.


docs/RFCS/20170814_cancel_txn_session.md, line 230 at r21 (raw file):

### Should we support postgres wire protocol query cancellation
_Should we remove this section as the RFC is no longer about query cancellation?_  

well, I believe that pg_cancel_backend() function is technically about canceling a session. What I kept talking about was a protocol-level feature (as opposed to a SQL level feature) that was canceling a query. See the links I provided before.

So pg_cancel_backend does belong in this RFC, unless you're not adding anything new from the previous RFC, in which case I would drop it.

The protocol-level feature was not mentioned in the other RFC, and so I think it's time to mention it. Whether we want to implement it or not I'm not sure yet; still waiting for opinions from the other folks.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Review status: 0 of 1 files reviewed at latest revision, 64 unresolved discussions, all commit checks successful.


docs/RFCS/20170814_cancel_txn_session.md, line 230 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

well, I believe that pg_cancel_backend() function is technically about canceling a session. What I kept talking about was a protocol-level feature (as opposed to a SQL level feature) that was canceling a query. See the links I provided before.

So pg_cancel_backend does belong in this RFC, unless you're not adding anything new from the previous RFC, in which case I would drop it.

The protocol-level feature was not mentioned in the other RFC, and so I think it's time to mention it. Whether we want to implement it or not I'm not sure yet; still waiting for opinions from the other folks.

pg_cancel_backend() takes a session ID as an argument, but it only cancels the current query in that session (if any). pg_terminate_backend() is for session cancellation. Both of these functions take a 32-bit session id as an argument. The protocol-level cancellation feature is more or less equivalent to pg_cancel_backend() but it takes two 32-bit arguments: the session id and a 32-bit session-level "secret key" (used as a lightweight authentication mechanism so this type of cancellation can be used without establishing another fully-authenticated connection to issue the cancellation call)

This was discussed in at least one version of another cancellation RFC. The small key sizes make these functions difficult to support (especially when load balancers are involved). We may want to support these interfaces one day (since they're presumably built into postgres client drivers), but for now we should just support our own syntax with larger IDs and see how much demand there is for postgres-compatible cancellation. Implementing some mapping of 32- or 64-bit ids to queries would get its own RFC when/if we decide to do it.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor

Thanks for doing this. Is there any help needed here?

@israelg99
Copy link
Contributor Author

@andreimatei I updated the RFC to address your comments.
@vivekmenezes the current goal is to discuss and address feedback about the RFC, once we will be done with that, we can dive into the implementation of transaction and session cancellation (I already have some proof-of-concept for both) 👍 .

@vivekmenezes
Copy link
Contributor

Thanks for updating the RFC!

@israelg99
Copy link
Contributor Author

Review status: 0 of 1 files reviewed at latest revision, 64 unresolved discussions.


docs/RFCS/20170814_cancel_txn_session.md, line 12 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: style guide says wrap lines at 100 characters

Done.


docs/RFCS/20170814_cancel_txn_session.md, line 17 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Similarly?

Done.


docs/RFCS/20170814_cancel_txn_session.md, line 37 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/finishes/finishing

Done.


docs/RFCS/20170814_cancel_txn_session.md, line 37 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I like this paragraph, but I think it needs a bit more context: please explain that currently a query cancellation does cancel the transaction (modulo the race you describe). Without this piece of information, the rest doesn't make sense to a layman reader.

Done.


docs/RFCS/20170814_cancel_txn_session.md, line 86 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

it'd be cool if this example made it clear what state the session is in once the client gets the "transaction cancelled" error. I guess it should be in the Aborted state. Our prompt would look like this:

root@:26257 ERROR>

And have the example show that the client has to send a ROLLBACK for the state to be reset.

Done.


docs/RFCS/20170814_cancel_txn_session.md, line 156 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

the session_ids printed in the previous examples have dashes in them, but these query ids don't. What's the deal?

Done.


docs/RFCS/20170814_cancel_txn_session.md, line 174 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/The transaction will transition immediately if there are no active queries./If there are no active queries, the KV transaction will be rolled back and the session will transition to the Aborted state.

The point is to emphasize the two actions - the rollback (of the lower-level KV transaction - i.e. client.Txn) and the state transition of the SQL session (TxnState).

Done.


docs/RFCS/20170814_cancel_txn_session.md, line 176 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

how exactly will we ensure this?

I propose to introduce an isCancelled flag in the txnState, which is checked in the executor
before a query is about to start executing.


docs/RFCS/20170814_cancel_txn_session.md, line 177 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think we should explain here exactly how we're planning on doing that. I guess we'd just cancel the transaction's Context?

I propose to cancel all running queries by iterating all active queries under the
transaction and canceling them, this will also cancel the transaction's context.


docs/RFCS/20170814_cancel_txn_session.md, line 182 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what's the run_control test suite?

I pointed out the files.


docs/RFCS/20170814_cancel_txn_session.md, line 185 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

cancelation of DistSQL queries is implemented. What made you believe that it isn't?

It was outdated. Updated it.. Done.


docs/RFCS/20170814_cancel_txn_session.md, line 186 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Would you mind going into more detail about what you were thinking here? Or, rather, what do you expect to happen and why?
To be honest, I'm not sure what currently happens when you try to cancel such a query (with query cancelation). I wonder if anybody notices the canceled context... It might be the case that nobody does and the query is allowed to run until the retries are done. In other words, I'm not sure that the Executor ever checks the

Initially, I planned to test a transaction with a long-running statement and a distributed one. The show trace for ... and force_retry(10m) were proposed by knz, I honestly do not remember the reasoning behind it.


docs/RFCS/20170814_cancel_txn_session.md, line 187 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what are you thinking of? What's the show trace for for? Do you want to assert something on the trace - like a particular log message?

Please reference my previous comment.


docs/RFCS/20170814_cancel_txn_session.md, line 190 at r21 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

How exactly will we do this? I guess we'll cancel the Session's Context? We should make a note about our newReadTimeoutConn() which let's us evaluate that Context when we're blocked on a network socket read.

I updated it with your suggestion.


docs/RFCS/20170814_cancel_txn_session.md, line 230 at r21 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

pg_cancel_backend() takes a session ID as an argument, but it only cancels the current query in that session (if any). pg_terminate_backend() is for session cancellation. Both of these functions take a 32-bit session id as an argument. The protocol-level cancellation feature is more or less equivalent to pg_cancel_backend() but it takes two 32-bit arguments: the session id and a 32-bit session-level "secret key" (used as a lightweight authentication mechanism so this type of cancellation can be used without establishing another fully-authenticated connection to issue the cancellation call)

This was discussed in at least one version of another cancellation RFC. The small key sizes make these functions difficult to support (especially when load balancers are involved). We may want to support these interfaces one day (since they're presumably built into postgres client drivers), but for now we should just support our own syntax with larger IDs and see how much demand there is for postgres-compatible cancellation. Implementing some mapping of 32- or 64-bit ids to queries would get its own RFC when/if we decide to do it.

I added more details and arguments about if to support pg_cancel_backend(). It is still not clear if we should support it or not, but it looks like this feature better have its own RFC when/if we decide to implement it.


Comments from Reviewable

@israelg99
Copy link
Contributor Author

@knz I referenced the query cancellation RFC(and PR) along with a high-level short summary of what the existing cancellation features already do and how. Let me know if a more in-depth summary should be added.

@andreimatei
Copy link
Contributor

Review status: 0 of 1 files reviewed at latest revision, 60 unresolved discussions, all commit checks successful.


docs/RFCS/20170814_cancel_txn_session.md, line 186 at r21 (raw file):

Previously, israelg99 (Julian Gilyadov) wrote…

Initially, I planned to test a transaction with a long-running statement and a distributed one. The show trace for ... and force_retry(10m) were proposed by knz, I honestly do not remember the reasoning behind it.

So I think testing something that uses force_retry() makes sense - we want to make sure that automatic retries are somehow interrupted.
About SHOW TRACE FOR - if you don't tell me what you need the trace for, it's just a distraction. Just say that you'll test a distributed query, if you want. Although I'm not sure how the distributed nature is relevant for this RPC. That's something that's relevant for query cancelation.


docs/RFCS/20170814_cancel_txn_session.md, line 11 at r24 (raw file):

This RFC proposes a set of features which add the ability to cancel
an in-progress transaction and session using the `CANCEL` statements.  

the ... statement (singular)


docs/RFCS/20170814_cancel_txn_session.md, line 55 at r24 (raw file):

A common use case is to cancel a transaction which uses too much of cluster resources.  
Note that currently, query cancellation cancels the underlying transaction too.  
However, canceling an idle transaction is impossible using the query cancellation mechanism; 

grammar nit, feel free to ignore: your use of semicolons is... peculiar :) They're use is supposed to be close to that of periods, not that of commas.
For example, the one in "mechanism; and" is just misplaced. What you want, I think is to tie the "and" to the previous statement and put a comma behind it to separate out the following subordinate:
"However, canceling an idle transaction is impossible using the query cancellation mechanism and_,_ even if..., canceling it races."

The following one, in front of "so" is also questionable. There, I think you simply want a period to separate out the conclusion from the previous statement:
"even if there is a query in progress, canceling it races with the query finishing_._ So_,_ one
can't rely on canceling a query to necessarily do anything such


docs/RFCS/20170814_cancel_txn_session.md, line 65 at r24 (raw file):

A common use case is to cancel a session which uses too much of cluster resources.  
A particular useful use case would be revoking unauthorized access to the DB.  
Also, the existence of cluster settings that don't take effect until all sessions 

What's this cluster thing about? What are you thinking of? Without more details, this use case seems very questionable to me.


docs/RFCS/20170814_cancel_txn_session.md, line 223 at r24 (raw file):

If there are active queries, then:
1. No more queries can be started on the session until the transaction is in the Aborted state. 
For that, we introduce an `isCancelled` flag in the `txnState`, which is checked in the executor 

Well, if we introduce this new field, what would its semantics be exactly? If a query comes while this field is set (but the state is otherwise Open), do we reject it early on with an error result without synchronizing with the parallel execution queue (i.e. synchronizeParallelStmts())? And if a ROLLBACK comes, do we execute that and reset the state again without synchronizing with the parallel execution queue? Or, when exactly would we synchronize and flush out all the running statements?

I don't know if a new field is the best way to go here. What I'd try to do is:

  1. Cancel the txn's context.
  2. Rollback the kv txn.
  3. Move to state Aborted.
  4. Start synchronizing with the parallel execution queue on every statement executed in the Aborted state - this way ensuring that a ROLLBACK flushes out the parallel statements.

I don't think we need to do anything special to cancel the running queries: canceling the txn context will cancel all the child queries' contexts (currently, the queries just use the transaction's context. In the future they'll be child (derived) contexts, but canceling the parent will still cancel them).


Comments from Reviewable

@vivekmenezes
Copy link
Contributor

@israelg99 are you working on this anytime soon? Thanks

@israelg99
Copy link
Contributor Author

@vivekmenezes probably not -- i am working with a company and unfortunately have no time for this PR..

@dianasaur323
Copy link
Contributor

@vivekmenezes does this mean that we should pull development of this feature back internally in our next release?

@vivekmenezes
Copy link
Contributor

this and transaction timeout are potential intern projects.

@andreimatei
Copy link
Contributor

Closing this since it's been abandoned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants