spanner: Poor handling of high conflict transactional workloads #11017

richardartoul · 2024-10-22T19:54:20Z

Client

Spanner

Environment

Latest Go version, any distribution.

Code and Dependencies

Using the Golang SQL driver which wraps this library: https://github.com/googleapis/go-sql-spanner

Expected behavior

We have a workload that performs a lot of appends to a table that has a primary key with a unique constraint. 99% of the time the row doesn't exist and everything works fine, however, we rely on Spanner maintaining the primary key constraint and to propagate errors back when we violate the constraint. In our application we handle the conflicts by modifying the primary key and retrying (basically we're doing compare-and-swap).

The expected behavior is that when we perform these inserts, the spanner client would immediately propagate the error back to us so the application can handle the retry (since a primary key violation is not a fundamentally retriable error).

Actual behavior

Based on what i'm observing in our logs, tracing, and whats described in this issue: #7903 I believe what is happening is the spanner client first tries to perform the insert using an inline transaction. This fails, and the ReadWriteTransaction function sees an errInlineBeginTransactionFailed error and then immediately retries with a non-inline transaction. This transaction will fail as well, then our application receives the error. This extra retry causes unnecessary conflicts / retries.

I'd expect to get the foreign key constraint violation after the first insert attempt, not the second. I also don't see any way for me to disable this automatic retry behavior. I also don't see any way to disable the inline transaction behavior, or prevent the initial retry.

olavloite · 2024-10-29T07:32:05Z

@richardartoul Just to be sure:

You are using https://github.com/googleapis/go-sql-spanner and not the Go client library directly.
You are executing the inserts using https://pkg.go.dev/database/sql#Conn.ExecContext, i.e. in 'autocommit mode'. (as opposed to https://pkg.go.dev/database/sql#Tx.ExecContext, where you would first explicitly start a transaction, execute the insert, and then call Commit).

Can you confirm the above?

The reason for asking the above is that:

The retry that is executed in the Go client is to ensure that the statement becomes part of the transaction, as it could be that the application wants to continue in the same transaction to execute a different statement. When the first statement that carries an inline-begin fails, the transaction is also not started, meaning that any locks that would be taken by the first statement are also not held.
We would indeed not need this retry if we know that the statement is the only statement that the application wants to execute in the transaction. This would be the case if you are using https://pkg.go.dev/database/sql#Tx.ExecContext.

One additional tip (this would also avoid the problem with the additional retries, as this method is specifically intended for single write operations):

If your application is not restricted to only the generic database/SQL interface (that is; it does not need to support other database systems), then you could also consider using mutations for your append-only use-case. This would probably improve the throughput a bit for your use case.
You can use mutations directly with the Spanner database/SQL driver by calling this method: https://github.com/googleapis/go-sql-spanner/blob/c39f57f06cbccd7cdde1091c844a916fddc3d67b/driver.go#L471
There's an example for how to use it here: https://github.com/googleapis/go-sql-spanner/blob/main/examples/mutations/main.go

epot · 2024-11-04T16:07:57Z

I am following up on this (I am working with Richie):

yes that's correct we are using https://github.com/googleapis/go-sql-spanner
we are using https://pkg.go.dev/database/sql#Stmt.ExecContext which I guess is the same?

epot · 2024-11-06T15:51:21Z

FYI I tried the mutations approach directly, and that seems to help a lot. However, I am seeing the memory of my processing growing over time without stopping, so I am guessing I am doing something wrong. I am still investigating, but here is the code sample if someone sees something obvious:

// Get a connection so that we can get access to the Spanner specific connection interface SpannerConn.
conn, err := s.db.Conn(ctx)
if err != nil {
	return fmt.Errorf("failed to get a connection: %w", err)
}

if err := conn.Raw(func(driverConn interface{}) error {
	spannerConn, ok := driverConn.(spannerdriver.SpannerConn)
	if !ok {
		return fmt.Errorf("unexpected driver connection %v, expected SpannerConn", driverConn)
	}

	_, err = spannerConn.Apply(ctx, []*spanner.Mutation{
		spanner.Insert(
			"mytable",
			[]string{"log_index", "entry"},
			[]interface{}{
				int64(le.Index),
				entry,
			}),
	})

	if err != nil {
		if strings.Contains(err.Error(), "spanner") &&
			strings.Contains(err.Error(), "AlreadyExists") {
			return errors.Join(err, rsm.ErrAlreadyExists)
		}

		return fmt.Errorf("failed to apply spanner mutation: %w", err)
	}
	return nil
}); err != nil {
	return fmt.Errorf("failed to get a raw spanner connection: %w", err)
}

return nil

epot · 2024-11-06T15:54:40Z

Ok I just saw this in the Conn method doc, I think it would make sense to put it in the example for dummies like me :) :

// Every Conn must be returned to the database pool after use by
// calling [Conn.Close].

epot · 2024-11-07T13:31:52Z

We are happy with your mutations workaround, feel free to close!

Some of the examples missed a call to conn.Close(). Updates googleapis/google-cloud-go#11017

olavloite · 2024-11-27T17:05:02Z

Ok I just saw this in the Conn method doc, I think it would make sense to put it in the example for dummies like me :) :
// Every Conn must be returned to the database pool after use by
// calling [Conn.Close].

Thanks for pointing out that :-)

It has been added in googleapis/go-sql-spanner#324

epot · 2024-11-27T17:17:29Z

Great thanks!

richardartoul added the triage me I really want to be triaged. label Oct 22, 2024

product-auto-label bot added the api: spanner Issues related to the Spanner API. label Oct 22, 2024

blunderbuss-gcf bot assigned harshachinta Oct 22, 2024

olavloite assigned olavloite and unassigned harshachinta Oct 29, 2024

olavloite added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Oct 29, 2024

rahul2393 closed this as completed Nov 14, 2024

olavloite added a commit to googleapis/go-sql-spanner that referenced this issue Nov 18, 2024

docs: add missing conn.Close() calls

ae2dd11

Some of the examples missed a call to conn.Close(). Updates googleapis/google-cloud-go#11017

olavloite mentioned this issue Nov 18, 2024

docs: add missing conn.Close() calls googleapis/go-sql-spanner#324

Merged

olavloite added a commit to googleapis/go-sql-spanner that referenced this issue Nov 18, 2024

docs: add missing conn.Close() calls (#324)

b4803a6

Some of the examples missed a call to conn.Close(). Updates googleapis/google-cloud-go#11017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spanner: Poor handling of high conflict transactional workloads #11017

spanner: Poor handling of high conflict transactional workloads #11017

richardartoul commented Oct 22, 2024

olavloite commented Oct 29, 2024

epot commented Nov 4, 2024

epot commented Nov 6, 2024

epot commented Nov 6, 2024

epot commented Nov 7, 2024

olavloite commented Nov 27, 2024

epot commented Nov 27, 2024

spanner: Poor handling of high conflict transactional workloads #11017

spanner: Poor handling of high conflict transactional workloads #11017

Comments

richardartoul commented Oct 22, 2024

Client

Environment

Code and Dependencies

Expected behavior

Actual behavior

olavloite commented Oct 29, 2024

epot commented Nov 4, 2024

epot commented Nov 6, 2024

epot commented Nov 6, 2024

epot commented Nov 7, 2024

olavloite commented Nov 27, 2024

epot commented Nov 27, 2024