
Update statement does not always block when a different transaction has an exclusive lock for a row #88995

Closed
beikov opened this issue Sep 29, 2022 · 23 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-kv KV Team X-blathers-triaged blathers was able to find an owner

Comments

@beikov

beikov commented Sep 29, 2022

Describe the problem

Update statement does not always block when a different transaction has an exclusive lock for a row.

To Reproduce

  1. On connection 1, start TX1
  2. TX1: Run select * from tbl1 t where t.id = 1 for update of t to lock a single row
  3. On connection 2, start TX2
  4. TX2: Run SET statement_timeout TO 100
  5. TX2: Run select * from tbl1 t where t.id = 1
  6. TX2: Run update tbl1 t set t.value = 'changed' where t.id = 1

Expected behavior

TX2 should block at step 6, since TX1 holds a lock on the row, and TX2 should then run into its statement timeout. In practice, however, the statement at step 6 "sometimes" succeeds.
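The expected interaction can be illustrated with a minimal Python sketch (a toy model using a thread lock, not CockroachDB code): TX1 holds the row lock, and TX2's UPDATE should wait and then fail with a timeout, modeled here as `acquire(timeout=0.1)` for the 100 ms statement_timeout.

```python
# Toy model of the expected behavior (illustration only, not CockroachDB code).
import threading

row_lock = threading.Lock()

row_lock.acquire()  # TX1: SELECT ... FOR UPDATE takes the row lock

# TX2: the UPDATE waits for the lock; statement_timeout of 100 ms is
# modeled as acquire(timeout=0.1). The expected outcome is a timeout.
tx2_update_succeeded = row_lock.acquire(timeout=0.1)

print("TX2 update acquired lock:", tx2_update_succeeded)  # expected: False
```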

As with most timeout-based behavior, this is hard to reproduce reliably, but I wanted to report it anyway, as I believe a concurrency issue might be lurking in CockroachDB that you might be able to find by inspecting the code with this example. The issue occurred after many other Hibernate test cases had executed, i.e. after many statements such as create table, insert, select, update, delete, and drop table.

Environment:

  • CockroachDB version: 21.2.15
  • Server OS: Linux Docker
  • Client app: JDBC/Hibernate

Here are SQL log excerpts from the run:

    drop table if exists T_LOCK_A cascade
    create table T_LOCK_A (
       id bigint not null,
        a_value varchar(255),
        primary key (id)
    )

... lots of insert, select for update, update, delete statements

    insert 
    into
        T_LOCK_A
        (a_value, id) 
    values
        (?, ?)
[subsystem] TRACE g.hibernate.orm.jdbc.bind JdbcBindingLogging:28 - binding parameter [1] as [VARCHAR] - [it]
[subsystem] TRACE g.hibernate.orm.jdbc.bind JdbcBindingLogging:28 - binding parameter [2] as [BIGINT] - [6]

TX1 starts:

16:30:14,930 DEBUG SQL:128 - 
    select
        a1_0.id,
        a1_0.a_value 
    from
        T_LOCK_A a1_0 for update of a1_0
[subsystem] TRACE ibernate.orm.jdbc.extract JdbcExtractingLogging:28 - extracted value ([1] : [BIGINT]) - [6]
[subsystem] TRACE ibernate.orm.jdbc.extract JdbcExtractingLogging:28 - extracted value ([2] : [VARCHAR]) - [it]

TX2 starts: 

16:30:14,932 DEBUG SQL:128 - 
    select
        a1_0.id,
        a1_0.a_value 
    from
        T_LOCK_A a1_0 
    where
        a1_0.id=?
[subsystem] TRACE g.hibernate.orm.jdbc.bind JdbcBindingLogging:28 - binding parameter [1] as [BIGINT] - [6]
[subsystem] TRACE ibernate.orm.jdbc.extract JdbcExtractingLogging:28 - extracted value ([1] : [BIGINT]) - [6]
[subsystem] TRACE ibernate.orm.jdbc.extract JdbcExtractingLogging:28 - extracted value ([2] : [VARCHAR]) - [it]
16:30:14,936 DEBUG SQL:128 - 
    UPDATE
        T_LOCK_A 
    SET
        a_value = ? 
    where
        id = ?
[subsystem] TRACE g.hibernate.orm.jdbc.bind JdbcBindingLogging:28 - binding parameter [1] as [VARCHAR] - [changed]
[subsystem] TRACE g.hibernate.orm.jdbc.bind JdbcBindingLogging:28 - binding parameter [2] as [BIGINT] - [6]

Jira issue: CRDB-20070

@beikov beikov added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Sep 29, 2022
@blathers-crl

blathers-crl bot commented Sep 29, 2022

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/sql-schema (found keywords: drop table)
  • @cockroachdb/sql-experience (found keywords: orm,Hibernate)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Sep 29, 2022
@beikov
Author

beikov commented Sep 29, 2022

By the way, I'm running Cockroach like this, if that helps:

docker run -d --name=cockroach -p 26257:26257 -p 8080:8080 docker.io/cockroachdb/cockroach:v21.2.15 start-single-node \
    --insecure --store=type=mem,size=0.25 --advertise-addr=localhost
  OUTPUT=
  while [[ $OUTPUT != *"CockroachDB node starting"* ]]; do
        echo "Waiting for CockroachDB to start..."
        sleep 10
        # Note we need to redirect stderr to stdout to capture the logs
        OUTPUT=$(docker logs cockroach 2>&1)
  done
  echo "Enabling experimental box2d operators and some optimized settings for running the tests"
  #settings documented in https://www.cockroachlabs.com/docs/v21.2/local-testing.html#use-a-local-single-node-cluster-with-in-memory-storage
  docker exec -it cockroach bash -c "cat <<EOF | ./cockroach sql --insecure
SET CLUSTER SETTING sql.spatial.experimental_box2d_comparison_operators.enabled = on;
SET CLUSTER SETTING kv.raft_log.disable_synchronization_unsafe = true;
SET CLUSTER SETTING kv.range_merge.queue_interval = '50ms';
SET CLUSTER SETTING jobs.registry.interval.gc = '30s';
SET CLUSTER SETTING jobs.registry.interval.cancel = '180s';
SET CLUSTER SETTING jobs.retention_time = '15s';
SET CLUSTER SETTING schemachanger.backfiller.buffer_increment = '128 KiB';
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false;
SET CLUSTER SETTING kv.range_split.by_load_merge_delay = '5s';
ALTER RANGE default CONFIGURE ZONE USING \"gc.ttlseconds\" = 5;
ALTER DATABASE system CONFIGURE ZONE USING \"gc.ttlseconds\" = 5;
quit
EOF
"

@beikov
Author

beikov commented Sep 29, 2022

Also note that in most of my test runs, the transaction already fails at step 5, because it seems that TX2 blocks when reading a row if TX1 holds an exclusive lock on it.

@ajwerner
Contributor

Cockroach's select for update locks are best-effort. They are an optimization, not a guarantee. The reason is that the implementation uses non-replicated, in-memory locks. If a lease transfer for the underlying data occurs, the lock is lost. There is nothing fundamentally incorrect about not locking for SELECT FOR UPDATE, because other mechanisms still enforce serializability. When the locks fail, it just means that retries may be more common.
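A minimal sketch of how a non-replicated, in-memory lock can fail to block (all names here are hypothetical and for illustration only): if the lock table is dropped, e.g. on a lease transfer, a later writer finds no lock and simply proceeds.

```python
# Toy illustration of a best-effort, in-memory lock table.
# Hypothetical names; not CockroachDB internals.
class InMemoryLockTable:
    def __init__(self):
        self.locks = set()

    def try_lock(self, key):
        if key in self.locks:
            return False  # conflicting lock: the caller would block/queue
        self.locks.add(key)
        return True

    def lease_transfer(self):
        # Non-replicated locks do not survive a lease transfer.
        self.locks.clear()

table = InMemoryLockTable()
assert table.try_lock("row-1")          # TX1: SELECT ... FOR UPDATE
table.lease_transfer()                  # the lock is silently lost
tx2_proceeds = table.try_lock("row-1")  # TX2's UPDATE finds no lock
print(tx2_proceeds)  # True: TX2 does not block
```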

#75456 tracks the issue to make the locks survive lease transfers.

@blathers-crl blathers-crl bot added the T-kv KV Team label Sep 29, 2022
@beikov
Author

beikov commented Sep 29, 2022

Let me ask a yes/no question to make it clear what this means. If I change the scenario to

  1. On connection 1, start TX1
  2. TX1: Run select * from tbl1 t where t.id = 1 for update of t to lock a single row
  3. On connection 2, start TX2
  4. TX2: Run SET statement_timeout TO 100
  5. TX2: Run select * from tbl1 t where t.id = 1
  6. TX2: Run update tbl1 t set t.value = 'changed' where t.id = 1
  7. TX2: Commit

Will CockroachDB always fail at step 7 with a non-serializable transaction error? That is, will TX2 always fail?

@ajwerner
Contributor

In this case, it is possible for both of these transactions to succeed. If, however, TX1 went on to perform some write that causally impacted TX2, one of them would have to fail. As it stands, the system can order TX1 before TX2 and retain serializability.

If TX1 were actually an UPDATE statement which performed a write, then one of the two transactions would have to fail.
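The write-write case can be sketched with a toy versioned store (an illustrative model, not CockroachDB internals): both transactions read the row at the same version, and only the first write against that version can succeed.

```python
# Toy sketch of a write-write conflict under serializability.
# Hypothetical model; not CockroachDB internals.
row = {"value": "original", "version": 1}

def versioned_write(row, read_version, new_value):
    """A write succeeds only if the row hasn't changed since it was read."""
    if row["version"] != read_version:
        return False  # conflict: this transaction must abort/retry
    row["value"] = new_value
    row["version"] += 1
    return True

# TX1 and TX2 both read the row at version 1, then both attempt a write.
tx1_ok = versioned_write(row, read_version=1, new_value="tx1")
tx2_ok = versioned_write(row, read_version=1, new_value="tx2")
print(tx1_ok, tx2_ok)  # True False: exactly one writer succeeds
```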

@beikov
Author

beikov commented Sep 29, 2022

That's good to know. So this means, the for update clause can't be trusted and I probably have to disable all tests that rely on it within Hibernate?

Just out of curiosity, is this still ACID compliant? Consider the following pseudo-program:

tx0 {
    insert("insert into tbl1(id, value) values (1, 'original')")
}
tx1 {
    var value = select("select t.value from tbl1 t where t.id = 1 for update")
    if (value == "original")
        insert("insert into tbl2(id) values (1)")
}
tx2 {
    update("update tbl1 t set t.value = 'changed' where t.id = 1")
}

In every other database that respects locks, the possible end states are

  1. tbl1 = {(1, 'original')}, tbl2 = {(1)} if tx2 runs after tx1 acquired the lock
  2. tbl1 = {(1, 'changed')}, tbl2 = {} if tx2 runs before tx1 acquired the lock

Now you are telling me that CockroachDB also allows the following end state?

  1. tbl1 = {(1, 'changed')}, tbl2 = {(1)}

Doesn't this violate consistency/isolation somehow?

@ajwerner
Contributor

I'm confused as to how your outcome 1 happens. It implies that tx2 does not run at all, right?

Let's assume tx0 occurs before tx1 or tx2. Now, given serializability, either tx1 happens before tx2 or tx2 happens before tx1.

If tx2 happens first, then you'll get outcome 2. If tx1 happens first, then you'll move to tbl1 = {(1, 'original')}, tbl2 = {(1)} and then tx2 will run and you'll get tbl1 = {(1, 'changed')}, tbl2 = {(1)} after tx1 commits and unblocks tx2.

@beikov
Author

beikov commented Sep 29, 2022

Yeah, sorry for the confusing example; I kind of messed it up while constructing it. My expectation is that TX2 runs while TX1 is running. Something like this:

tx0 {
    insert("insert into tbl1(id, value) values (1, 'original')")
}
tx1 {
    parallel(
        var value = select("select t.value from tbl1 t where t.id = 1 for update")
        tx2 {
            update("update tbl1 t set t.value = 'changed' where t.id = 1")
        }
    )
    if (value == "original")
        insert("insert into tbl2(id) values (1)")
}

@beikov
Author

beikov commented Sep 29, 2022

Not sure if I can come up with a good example here, but it's confusing that this last pseudo-program can "finish" with TX2 succeeding in CockroachDB but not in other DBs.

@beikov
Author

beikov commented Sep 29, 2022

Thinking about this a bit more, the current behavior seems to allow for multiple transactions to return the same rows for select for update queries, which surely must be problematic, right? What does the SQL spec say about the effects of for update?

@ajwerner
Contributor

ajwerner commented Sep 29, 2022

You made us curious so we did some digging. The 2008 SQL spec does not mention FOR UPDATE in the select grammar anywhere, so there's nothing to deviate from in terms of the spec itself. The spec also has a fun note about locking and determinism not being something that it specifies.

As it stands, you're absolutely right that this deviates from postgres, and that's not great in terms of user experience.

@michae2 linked #88262 which seems like one relevant gap. I linked #75456, which is another. In a distributed and fault-tolerant system like cockroach, in order to ensure that the locks remained in the face of server failures, we'd need to replicate the read locks, which would have a cost.

In today's cockroach, if you want the desired pessimism, you'll need to incur a real write-write conflict.

@beikov
Author

beikov commented Sep 29, 2022

Not sure what consequences a write-write conflict has, but what I generally want here is that TX2 blocks rather than fails.

Not sure how accurate the following is, but it states that the for update clause is specified for cursors, so maybe you can find the definition in a different chapter:
https://www.jooq.org/doc/latest/manual/sql-building/sql-statements/select-statement/for-update-clause/#for-update-in-sql-server

@ajwerner
Contributor

It is specified for cursors; we looked at that. Technically, we're not talking about cursors here. Just as importantly, that specification doesn't mention blocking, as far as I could find; it only talks about the operations supported on that cursor.

@beikov
Author

beikov commented Sep 30, 2022

Ok, I guess my expectation/intuition that for update allows implementing a locking scheme is wrong. Just because all other databases seem to behave this way doesn't mean it's the only way for update can be implemented. I understand that failing the transactions on a serializability violation is valid as well. It's just unfortunate that we seem to depend on this sort of behavior in the Hibernate test suite.

Regardless, I think that doing a for update should be equivalent to an update statement in the sense that it should cause a write-write conflict. My intuition is that when for update is used, an actual update will happen later. But it seems to me that the nature of the current implementation, where for update locks are only taken on a best-effort basis, could be the source of issues when communicating with external systems.

Think of the following program

tx0 {
    insert("insert into tbl1(id, value) values (1, 'original')")
}
tx1 {
    select("select t.value from tbl1 t where t.id = 1 for update")
    // Assume we have a "lock" on row with id 1
    // Communicate with external system
    update("update tbl1 t set t.value = 'changed' where t.id = 1")
}
tx2 {
    select("select t.value from tbl1 t where t.id = 1 for update")
    // Assume we have a "lock" on row with id 1
    // Communicate with external system
    update("update tbl1 t set t.value = 'changed' where t.id = 1")
}

On other databases, after a for update statement, we can be sure that the current transaction has the "lock" on a row. But AFAIU with CockroachDB it would be possible that while TX1 runs, TX2 successfully finishes and TX1 only aborts at the update statement.

Thinking about this a bit more, the current behavior seems to allow for multiple transactions to return the same rows for select for update queries, which surely must be problematic, right?

I guess this is even more problematic when thinking about the skip locked clause. I mean this kind of pattern is probably used by many job queue implementations on top of SQL, so it seems wrong to me to propagate for update locks only on a best effort basis.

Is it guaranteed that when two transactions execute an update statement for the same row, only one of them will successfully return from that update statement? Or is it possible that both transactions appear to have updated the row, and only one succeeds on commit? If both transactions can get past the update statement, then the locking scheme is not implementable with CockroachDB, not even if we emit actual updates instead of select ... for update.
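The guarantee being asked about could be sketched as follows (a toy model under the assumption that an UPDATE leaves a write intent on the row that makes a second writer wait; not actual CockroachDB internals):

```python
# Toy sketch of write-intent blocking between two updating transactions.
# Assumption for illustration: an UPDATE takes the row's write "intent",
# and a second UPDATE on the same row waits until the first commits.
import threading
import time

row_intent = threading.Lock()
order = []

def txn(name, hold_seconds):
    with row_intent:              # UPDATE: acquire the row's write intent
        order.append(name)
        time.sleep(hold_seconds)  # transaction stays open, then commits

t1 = threading.Thread(target=txn, args=("tx1", 0.2))
t2 = threading.Thread(target=txn, args=("tx2", 0.0))
t1.start()
time.sleep(0.05)                  # ensure tx1 acquires the intent first
t2.start()
t1.join()
t2.join()
print(order)  # ["tx1", "tx2"]: tx2's UPDATE waited for tx1 to finish
```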

@michae2
Collaborator

michae2 commented Sep 30, 2022

Side note: there's also some related discussion in #13546 about pg_advisory_lock.

@ajwerner
Contributor

After some internal debate, @bdarnell has suggested that it could be considered a bug if you can have two SELECT FOR UPDATE calls and they both succeed and then they both commit without an error. That's debatable, but I'll file a tracking issue.

In general, the feeling is that while it's convenient to rely on locking semantics of databases for mutual exclusion to then coordinate with other systems (as you seem to desire), it's not bulletproof there either. For example, your client may become partitioned from the database and not learn that it has failed. Given cockroach is a distributed system and other single-node systems are not, we also have to contend with asynchronous network partitions between other parts of the system. There is a tradeoff between providing the safety property you want and retaining system liveness in the face of such modes of failure. Also, deadlocks can happen such that one query is aborted. Admittedly both of those situations are rare. Another factor here is that cockroach has to contend with the fact that it seeks to keep client sessions available even when nodes crash or go away.

We should, to the extent possible, provide the locking semantics you seek. We don't. It's not wrong by our definitions of correctness, but it is painful for the kinds of use you imagine. We have open issues covering common causes of locking failures (#88262, #75456). Neither of these will prevent the loss of locks in the face of un-graceful loss of nodes.

In general, we're not eager to pay the price of replicating the lock metadata for every SELECT FOR UPDATE, but we could imagine a session variable which allowed you to opt into such behavior (#89111).

As @michae2 points out, while we have not implemented it, ideally we'd focus on making pg_advisory_lock useful for your use case.

@beikov
Author

beikov commented Oct 4, 2022

Since the skip locked clause is very important for most (all?) job queue systems built on top of SQL, and they all rely on the locking semantics of the for update clause, I'd like to urge you to replicate locks by default for for update queries. If you think about it, the skip locked clause implies that for update should do locking, right?

I guess that everyone would understand that transactions which do locking might fail when a network/cluster partition occurs while the transaction is running.

@ajwerner
Contributor

ajwerner commented Oct 4, 2022

In the common case, skip locked and for update will compose correctly (we're just shipping SKIP LOCKED in 22.2, due out in another month or so). The issue is that there are some cases where the lock can be lost. We do want to fix those cases which happen outside of node failure (#88262, #75456).


We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

@beikov
Author

beikov commented Mar 28, 2024

I'm just a Hibernate ORM developer that wanted to report this IMO subtle behavior, so I personally don't care about whether CockroachDB behaves like all the other SQL databases. Feel free to close the issue if you don't plan to fix this.

@michae2
Collaborator

michae2 commented Mar 29, 2024

@beikov this issue was definitely worth reporting. After you filed this, we did some significant work around SELECT FOR UPDATE and lock replication.

In 23.2 and 24.1, under read committed isolation, SELECT FOR UPDATE should now behave as it does in other databases: it should always block another SELECT FOR UPDATE or a mutation on the same row.

In 23.2 and 24.1, under serializable isolation, using the following settings will also get the same behavior:

  • SET enable_durable_locking_for_serializable = on;
  • SET optimizer_use_lock_op_for_serializable = on;

So under serializable isolation, with those settings, this should be fixed now.

@michae2
Collaborator

michae2 commented Apr 1, 2024

I'm going to close this for now, since this should be fixed in 23.2 when using those settings mentioned above.

@michae2 michae2 closed this as completed Apr 1, 2024
@github-project-automation github-project-automation bot moved this to Closed in KV Aug 28, 2024