Fix for reserved connection usage with transaction #7646

Merged

Member

@harshit-gangal harshit-gangal commented Mar 9, 2021

Description

The issue occurs when a session is using a reserved connection and that connection times out.
The next transaction on that session then fails with a confusing error about tablet aliases not matching.

The fix has three parts.

  1. If the transaction query fails on a reserved connection, do not update the session, because no transaction ID was generated.
  2. Retry the transaction: since this is a new transaction being created, the reserved connection can be recreated.
  3. If the reserved connection is not found on the tablet, return the expected ABORTED error code. Previously this case returned UNKNOWN.
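The three parts above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `shardSession`, `fakeTablet`, `errAborted`, and `begin` are all hypothetical names, and the real session and tablet types carry far more state.

```go
package main

import (
	"errors"
	"fmt"
)

// errAborted is a hypothetical stand-in for the ABORTED error code that the
// tablet returns when a reserved connection no longer exists.
var errAborted = errors.New("ABORTED: reserved connection not found")

// shardSession is a simplified stand-in for the per-shard session state.
type shardSession struct {
	reservedID    int64
	transactionID int64
}

// fakeTablet tracks which reserved connections are still alive.
type fakeTablet struct {
	nextID   int64
	reserved map[int64]bool
}

// beginExecute starts a transaction on an existing reserved connection.
func (t *fakeTablet) beginExecute(reservedID int64) (int64, error) {
	if !t.reserved[reservedID] {
		return 0, errAborted // part 3: ABORTED rather than UNKNOWN
	}
	t.nextID++
	return t.nextID, nil
}

// reserveBeginExecute creates a fresh reserved connection and a transaction on it.
func (t *fakeTablet) reserveBeginExecute() (reservedID, transactionID int64) {
	t.nextID++
	reservedID = t.nextID
	t.reserved[reservedID] = true
	t.nextID++
	return reservedID, t.nextID
}

// begin starts a transaction for the session, recreating the reserved
// connection if the old one is gone and no transaction is open yet.
func begin(t *fakeTablet, s *shardSession) error {
	txID, err := t.beginExecute(s.reservedID)
	if err == nil {
		s.transactionID = txID
		return nil
	}
	// Part 1: on failure the session is left untouched.
	if !errors.Is(err, errAborted) || s.transactionID != 0 {
		return err
	}
	// Part 2: a new transaction is being created, so it is safe to retry.
	s.reservedID, s.transactionID = t.reserveBeginExecute()
	return nil
}

func main() {
	tablet := &fakeTablet{nextID: 100, reserved: map[int64]bool{}}
	session := &shardSession{reservedID: 7} // connection 7 has timed out
	err := begin(tablet, session)
	fmt.Println(err, session.reservedID, session.transactionID)
}
```

The retry is limited to the ABORTED case with no open transaction, on the assumption that only then can the session be reset without losing work.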

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving

1. Update the shard sessions only if something changed, i.e. the transaction ID or the reserved ID was updated.
2. If the BeginExecute API call fails and the connection is a reserved connection, check whether the shard session can be reset and, if so, execute the ReserveBeginExecute API.

Signed-off-by: Harshit Gangal <[email protected]>
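
As a rough illustration of the first point in the commit message, the guard below only touches the session when an ID actually changed. The `shardSession` type and `applyResult` helper are hypothetical and exist only for this sketch.

```go
package main

import "fmt"

// shardSession is a simplified, hypothetical stand-in for per-shard state.
type shardSession struct {
	transactionID int64
	reservedID    int64
}

// applyResult records the IDs returned by a call only when at least one of
// them differs from what the session already holds. It reports whether the
// session was modified.
func applyResult(s *shardSession, transactionID, reservedID int64) bool {
	if transactionID == s.transactionID && reservedID == s.reservedID {
		return false // nothing new, leave the session as it is
	}
	s.transactionID = transactionID
	s.reservedID = reservedID
	return true
}

func main() {
	s := &shardSession{reservedID: 5}
	fmt.Println(applyResult(s, 0, 5)) // same IDs as before
	fmt.Println(applyResult(s, 9, 5)) // transaction ID changed
}
```

A failed call that produced no new IDs therefore leaves the existing session entry alone instead of overwriting it.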
Member

@deepthi deepthi left a comment

LGTM. It looks like you audited the code and made sure that all paths that return an error do the right thing. Can you confirm that?

exec(t, conn, `insert into allDefaults () values ()`)
exec(t, conn, `commit`)

time.Sleep(6 * time.Second)
Member

I don't suppose this can be made any faster?

Member Author

The transaction timeout is 3 seconds, and the sleep ensures that the transaction killer has run.

@@ -447,7 +447,7 @@ func (sbc *SandboxConn) HandlePanic(err *error) {
//ReserveBeginExecute implements the QueryService interface
func (sbc *SandboxConn) ReserveBeginExecute(ctx context.Context, target *querypb.Target, preQueries []string, sql string, bindVariables map[string]*querypb.BindVariable, options *querypb.ExecuteOptions) (*sqltypes.Result, int64, int64, *topodatapb.TabletAlias, error) {
reservedID := sbc.reserve(ctx, target, preQueries, bindVariables, 0, options)
-result, transactionID, alias, err := sbc.BeginExecute(ctx, target, preQueries, sql, bindVariables, reservedID, options)
+result, transactionID, alias, err := sbc.BeginExecute(ctx, target, nil, sql, bindVariables, reservedID, options)
Member

Why was this change needed?

Member Author

The test setup was wrong, so I fixed it.
The prequeries were being logged twice, which was incorrect.
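
To make the double logging concrete, here is a minimal sketch, assuming the reserve step and the begin step each record the prequeries they receive. `fakeConn` and its methods are hypothetical and much simpler than `SandboxConn`.

```go
package main

import "fmt"

// fakeConn is a hypothetical connection that records every query it sees.
type fakeConn struct{ log []string }

// reserve records the prequeries applied when the connection is reserved.
func (c *fakeConn) reserve(preQueries []string) {
	c.log = append(c.log, preQueries...)
}

// beginExecute records any prequeries it is given, then the query itself.
func (c *fakeConn) beginExecute(preQueries []string, sql string) {
	c.log = append(c.log, preQueries...)
	c.log = append(c.log, sql)
}

// reserveBeginExecute reserves and then begins. With forward set to true it
// passes the prequeries to both steps (the old setup); with false it passes
// nil to beginExecute (the corrected setup).
func reserveBeginExecute(c *fakeConn, preQueries []string, sql string, forward bool) {
	c.reserve(preQueries)
	if forward {
		c.beginExecute(preQueries, sql)
		return
	}
	c.beginExecute(nil, sql)
}

func main() {
	pre := []string{"set sql_mode = ''"}

	old := &fakeConn{}
	reserveBeginExecute(old, pre, "insert into t values (1)", true)
	fmt.Println(len(old.log), old.log)

	fixed := &fakeConn{}
	reserveBeginExecute(fixed, pre, "insert into t values (1)", false)
	fmt.Println(len(fixed.log), fixed.log)
}
```

In the old setup the prequery appears in the log twice, once from each step. Passing `nil` to the begin step leaves a single entry.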

@harshit-gangal harshit-gangal merged commit d88713b into vitessio:master Mar 9, 2021
@harshit-gangal harshit-gangal deleted the fix-reserve-conn-beginexec branch March 9, 2021 17:35
@askdba askdba added this to the v10.0 milestone Mar 10, 2021
@aquarapid
Contributor

For searchability purposes; since there isn't an issue associated with this, here is the type of error that this PR addresses:

execInsertUnsharded: got non-matching aliases (cell:"blahblah" uid:0000000100  vs <nil>) for the same target (keyspace: external, tabletType: MASTER, shard: -)
