Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: lock on jobs table held in column backfill for a long time #83202

Closed
ajwerner opened this issue Jun 22, 2022 · 1 comment
Closed

sql: lock on jobs table held in column backfill for a long time #83202

ajwerner opened this issue Jun 22, 2022 · 1 comment
Assignees
Labels
A-jobs C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Jun 22, 2022

Describe the problem

In

cockroach/pkg/sql/backfill.go

Lines 1356 to 1358 in 5df10fe

if err := sc.job.FractionProgressed(ctx, txn, jobs.FractionUpdater(fractionCompleted)); err != nil {
return jobs.SimplifyInvalidStatusError(err)
}
we issue a transactional write to the jobs table and then proceed to do a very long backfill operation. This leaves a lock on the jobs table for minutes at a time. Instead, we ought to use a separate transaction to update the jobs table.

Additional context

The transaction in use in that block, more generally, is dubious. It's trying to hold a lease on the descriptor during the backfill, but it's not clear what that accomplishes.

Jira issue: CRDB-16918

@ajwerner ajwerner added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jun 22, 2022
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 22, 2022
@ajwerner ajwerner self-assigned this Jun 22, 2022
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jun 23, 2022
Backports will address cockroachdb#83202.

Release note (bug fix): Due to a bug in transaction lifetime management,
a lock could be held for a long period of time when adding a new column
to a table (or altering a column type). This contention could make the
jobs page non-responsive and job adoption slow. This bug has now been
fixed.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jun 23, 2022
Backports will address cockroachdb#83202.

Release note (bug fix): Due to a bug in transaction lifetime management,
a lock could be held for a long period of time when adding a new column
to a table (or altering a column type). This contention could make the
jobs page non-responsive and job adoption slow. This bug has now been
fixed.
craig bot pushed a commit that referenced this issue Jun 24, 2022
83257: sql: update progress in column backfiller non-transactionally r=ajwerner a=ajwerner

Backports will address #83202.

Release note (bug fix): Due to a bug in transaction lifetime management,
a lock could be held for a long period of time when adding a new column
to a table (or altering a column type). This contention could make the
jobs page non-responsive and job adoption slow. This bug has now been
fixed.

83276: sql/schemachanger/scexec/backfiller: fix race r=ajwerner a=ajwerner

We were missing a lock call.

```
==================
WARNING: DATA RACE
Read at 0x00c00b3e4c00 by goroutine 1874:
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller.(*Tracker).collectProgressForCheckpointFlush()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller/tracker.go:402 +0x104
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller.(*Tracker).FlushCheckpoint()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller/tracker.go:232 +0x57
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec.BackfillerProgressFlusher.FlushCheckpoint-fm()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/dependencies.go:322 +0x6b
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller.(*periodicProgressFlusher).StartPeriodicUpdates.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller/periodic_progress_flusher.go:80 +0x26b
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller.(*periodicProgressFlusher).StartPeriodicUpdates.func3()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller/periodic_progress_flusher.go:92 +0xf1
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:74 +0x91

Previous write at 0x00c00b3e4c00 by goroutine 115:
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller.(*Tracker).SetBackfillProgress()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/backfiller/tracker.go:197 +0x2bb
  github.com/cockroachdb/cockroach/pkg/sql.(*IndexBackfillPlanner).BackfillIndexes.func2()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/index_backfiller.go:92 +0x21e
  github.com/cockroachdb/cockroach/pkg/sql.(*MetadataCallbackWriter).AddMeta()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:730 +0x65
  github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLReceiver).pushMeta()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:955 +0xd5
  github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLReceiver).Push()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:1084 +0x4cc
  github.com/cockroachdb/cockroach/pkg/sql/rowflow.(*copyingRowReceiver).Push()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowflow/row_based_flow.go:455 +0x2d5
  github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).Run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:374 +0xad1
  github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:475 +0x429
  github.com/cockroachdb/cockroach/pkg/sql/rowflow.(*rowBasedFlow).Run()
      <autogenerated>:1 +0x73
  github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:596 +0x1234
  github.com/cockroachdb/cockroach/pkg/sql.(*IndexBackfillPlanner).plan.func2()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/index_backfiller.go:204 +0x449
  github.com/cockroachdb/cockroach/pkg/sql.(*IndexBackfillPlanner).BackfillIndexes()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/index_backfiller.go:119 +0x71d
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec.runBackfill()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/exec_backfill.go:381 +0x24b
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec.runBackfiller.func2()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/exec_backfill.go:328 +0x22e
  github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec.forEachProgressConcurrently.func1.1()
      /go/src/github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/exec_backfill.go:257 +0x155
  github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()
      /go/src/github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:169 +0x4c
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:74 +0x91
```

Release note: None

Co-authored-by: Andrew Werner <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Jun 24, 2022
Backports will address #83202.

Release note (bug fix): Due to a bug in transaction lifetime management,
a lock could be held for a long period of time when adding a new column
to a table (or altering a column type). This contention could make the
jobs page non-responsive and job adoption slow. This bug has now been
fixed.
ajwerner added a commit to ajwerner/cockroach that referenced this issue Jun 27, 2022
Backports will address cockroachdb#83202.

Release note (bug fix): Due to a bug in transaction lifetime management,
a lock could be held for a long period of time when adding a new column
to a table (or altering a column type). This contention could make the
jobs page non-responsive and job adoption slow. This bug has now been
fixed.
@jordanlewis
Copy link
Member

🎉

@blathers-crl blathers-crl bot added the T-jobs label Jun 30, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead T-jobs labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-jobs C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

3 participants