Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanner: Poor handling of high conflict transactional workloads #11017

Closed
richardartoul opened this issue Oct 22, 2024 · 7 comments
Closed

spanner: Poor handling of high conflict transactional workloads #11017

richardartoul opened this issue Oct 22, 2024 · 7 comments
Assignees
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@richardartoul
Copy link

Client

Spanner

Environment

Latest Go version, any distribution.

Code and Dependencies

Using the Golang SQL driver which wraps this library: https://github.com/googleapis/go-sql-spanner

Expected behavior

We have a workload that performs a lot of appends to a table that has a primary key with a unique constraint. 99% of the time the row doesn't exist and everything works fine, however, we rely on Spanner maintaining the primary key constraint and to propagate errors back when we violate the constraint. In our application we handle the conflicts by modifying the primary key and retrying (basically we're doing compare-and-swap).

The expected behavior is that when we perform these inserts, the spanner client would immediately propagate the error back to us so the application can handle the retry (since a primary key violation is not a fundamentally retriable error).

Actual behavior

Based on what i'm observing in our logs, tracing, and whats described in this issue: #7903 I believe what is happening is the spanner client first tries to perform the insert using an inline transaction. This fails, and the ReadWriteTransaction function sees an errInlineBeginTransactionFailed error and then immediately retries with a non-inline transaction. This transaction will fail as well, then our application receives the error. This extra retry causes unnecessary conflicts / retries.

I'd expect to get the foreign key constraint violation after the first insert attempt, not the second. I also don't see any way for me to disable this automatic retry behavior. I also don't see any way to disable the inline transaction behavior, or prevent the initial retry.

@richardartoul richardartoul added the triage me I really want to be triaged. label Oct 22, 2024
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Oct 22, 2024
@olavloite olavloite assigned olavloite and unassigned harshachinta Oct 29, 2024
@olavloite
Copy link
Contributor

@richardartoul Just to be sure:

  1. You are using https://github.com/googleapis/go-sql-spanner and not the Go client library directly.
  2. You are executing the inserts using https://pkg.go.dev/database/sql#Conn.ExecContext, i.e. in 'autocommit mode'. (as opposed to https://pkg.go.dev/database/sql#Tx.ExecContext, where you would first explicitly start a transaction, execute the insert, and then call Commit).

Can you confirm the above?

The reason for asking the above is that:

  1. The retry that is executed in the Go client is to ensure that the statement becomes part of the transaction, as it could be that the application wants to continue in the same transaction to execute a different statement. When the first statement that carries an inline-begin fails, the transaction is also not started, meaning that any locks that would be taken by the first statement are also not held.
  2. We would indeed not need this retry if we know that the statement is the only statement that the application wants to execute in the transaction. This would be the case if you are using https://pkg.go.dev/database/sql#Tx.ExecContext.

One additional tip (this would also avoid the problem with the additional retries, as this method is specifically intended for single write operations):

  1. If your application is not restricted to only the generic database/SQL interface (that is; it does not need to support other database systems), then you could also consider using mutations for your append-only use-case. This would probably improve the throughput a bit for your use case.
  2. You can use mutations directly with the Spanner database/SQL driver by calling this method: https://github.com/googleapis/go-sql-spanner/blob/c39f57f06cbccd7cdde1091c844a916fddc3d67b/driver.go#L471
  3. There's an example for how to use it here: https://github.com/googleapis/go-sql-spanner/blob/main/examples/mutations/main.go

@olavloite olavloite added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Oct 29, 2024
@epot
Copy link

epot commented Nov 4, 2024

I am following up on this (I am working with Richie):

  1. yes that's correct we are using https://github.com/googleapis/go-sql-spanner
  2. we are using https://pkg.go.dev/database/sql#Stmt.ExecContext which I guess is the same?

@epot
Copy link

epot commented Nov 6, 2024

FYI I tried the mutations approach directly, and that seems to help a lot. However, I am seeing the memory of my processing growing over time without stopping, so I am guessing I am doing something wrong. I am still investigating, but here is the code sample if someone sees something obvious:

// Get a connection so that we can get access to the Spanner specific connection interface SpannerConn.
conn, err := s.db.Conn(ctx)
if err != nil {
	return fmt.Errorf("failed to get a connection: %w", err)
}

if err := conn.Raw(func(driverConn interface{}) error {
	spannerConn, ok := driverConn.(spannerdriver.SpannerConn)
	if !ok {
		return fmt.Errorf("unexpected driver connection %v, expected SpannerConn", driverConn)
	}

	_, err = spannerConn.Apply(ctx, []*spanner.Mutation{
		spanner.Insert(
			"mytable",
			[]string{"log_index", "entry"},
			[]interface{}{
				int64(le.Index),
				entry,
			}),
	})

	if err != nil {
		if strings.Contains(err.Error(), "spanner") &&
			strings.Contains(err.Error(), "AlreadyExists") {
			return errors.Join(err, rsm.ErrAlreadyExists)
		}

		return fmt.Errorf("failed to apply spanner mutation: %w", err)
	}
	return nil
}); err != nil {
	return fmt.Errorf("failed to get a raw spanner connection: %w", err)
}

return nil

@epot
Copy link

epot commented Nov 6, 2024

Ok I just saw this in the Conn method doc, I think it would make sense to put it in the example for dummies like me :) :

// Every Conn must be returned to the database pool after use by
// calling [Conn.Close].

@epot
Copy link

epot commented Nov 7, 2024

We are happy with your mutations workaround, feel free to close!

olavloite added a commit to googleapis/go-sql-spanner that referenced this issue Nov 18, 2024
Some of the examples missed a call to conn.Close().

Updates googleapis/google-cloud-go#11017
olavloite added a commit to googleapis/go-sql-spanner that referenced this issue Nov 18, 2024
Some of the examples missed a call to conn.Close().

Updates googleapis/google-cloud-go#11017
@olavloite
Copy link
Contributor

Ok I just saw this in the Conn method doc, I think it would make sense to put it in the example for dummies like me :) :

// Every Conn must be returned to the database pool after use by
// calling [Conn.Close].

Thanks for pointing out that :-)

It has been added in googleapis/go-sql-spanner#324

@epot
Copy link

epot commented Nov 27, 2024

Great thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

5 participants