Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-21.1: changefeedccl: fail changefeeds when tables go offline #64372

Merged

Conversation

stevendanna
Copy link
Collaborator

Backport 1/1 commits from #64323.

/cc @cockroachdb/release


In 20.2.4, a changefeed would fail if IMPORT INTO was run against one
of its target tables. The failure would look like:

I210428 10:45:57.982012 2015 jobs/registry.go:1131 ⋮ [n1] CHANGEFEED job 653840730248282113: stepping through state failed with error: ‹relation› ‹"test"› is offline: ‹importing›
(1)
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql/catalog.FilterDescriptorState
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descriptor.go:387
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire.func1
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:219
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn.func1
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:758
  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:808
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:757
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:193
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.acquireNodeLease.func1
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:859
  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128
  | runtime.goexit
  |     /usr/local/Cellar/go/1.16.3/libexec/src/runtime/asm_amd64.s:1371
Wraps: (3) ‹relation› ‹"test"› is offline: ‹importing›
Error
types: (1) *catalog.inactiveDescriptorError (2) *withstack.withStack (3) *errutil.leafError

We want changefeed to fail when IMPORT INTO is run because changes via
the AddSSTable mechanism is not currently reflected in the changefeed,
meaning we would fail to emit imported data.

The previous path that raised this failure depended on:

  1. The descriptor being offline at the point we attempted to acquire a
    lease on it:

    if err := tf.leaseMgr.AcquireFreshestFromStore(ctx, desc.ID); err != nil {

  2. The lease acquisition filtering out offline descriptors with an
    error:

    err = errors.Errorf("%s %q is offline: %s", desc.TypeName(), desc.GetName(), desc.GetOfflineReason())

  3. The failure from the lease acquisition in the schemafeed being
    treated as a fatal error.

I believe our behaviour here has changed a few times on both the 20.2
branch and master because of changes in each of these 3 behaviours.

In this change, rather than relying on the lease acquisition, we
specifically check for offline tables in our ValidateTable
function. This function is called for every descriptor version we get
from the ExportRequest on the Descriptor table.

Currently, I believe that checking for the offline descriptors is
correct since it appears that only restore and import put tables into
an offline state.

Release note (enterprise change): CHANGEFEEDs more reliably fail when
IMPORT INTO is run against a targeted table.

Fixes #64276

See also #62585, #43784

In 20.2.4, a changefeed would fail if IMPORT INTO was run against one
of its target tables. The failure would look like:

```
I210428 10:45:57.982012 2015 jobs/registry.go:1131 ⋮ [n1] CHANGEFEED job 653840730248282113: stepping through state failed with error: ‹relation› ‹"test"› is offline: ‹importing›
(1)
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql/catalog.FilterDescriptorState
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/descriptor.go:387
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire.func1
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:219
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn.func1
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:758
  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:808
  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:757
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.storage.acquire
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:193
  | github.com/cockroachdb/cockroach/pkg/sql/catalog/lease.acquireNodeLease.func1
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/sql/catalog/lease/lease.go:859
  | github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight.(*Group).doCall
  |     /Users/ssd/go/src/github.com/cockroachdb/cockroach/pkg/util/syncutil/singleflight/singleflight.go:128
  | runtime.goexit
  |     /usr/local/Cellar/go/1.16.3/libexec/src/runtime/asm_amd64.s:1371
Wraps: (3) ‹relation› ‹"test"› is offline: ‹importing›
Error
types: (1) *catalog.inactiveDescriptorError (2) *withstack.withStack (3) *errutil.leafError
```

We want changefeed to fail when IMPORT INTO is run because changes via
the AddSSTable mechanism is not currently reflected in the changefeed,
meaning we would fail to emit imported data.

The previous path that raised this failure depended on:

1) The descriptor being offline at the point we attempted to acquire a
   lease on it:

   https://github.com/cockroachdb/cockroach/blob/d1962910b58005096ce411bccbaddcd0c1d30cbd/pkg/ccl/changefeedccl/schemafeed/schema_feed.go#L514

2) The lease acquisition filtering out offline descriptors with an
error:

   https://github.com/cockroachdb/cockroach/blob/eda2309728392593162e962a61182eab6ab003ff/pkg/sql/catalog/descriptor.go#L209

3) The failure from the lease acquisition in the schemafeed being
   treated as a fatal error.

I believe our behaviour here has changed a few times on both the 20.2
branch and master because of changes in each of these 3 behaviours.

In this change, rather than relying on the lease acquisition, we
specifically check for offline tables in our ValidateTable
function. This function is called for every descriptor version we get
from the ExportRequest on the Descriptor table.

Currently, I believe that checking for the offline descriptors is
correct since it appears that only restore and import put tables into
an offline state.

Release note (enterprise change): CHANGEFEEDs more reliably fail when
IMPORT INTO is run against a targeted table.

Fixes cockroachdb#64276

See also cockroachdb#62585, cockroachdb#43784
@stevendanna stevendanna requested review from a team, pbardea and miretskiy and removed request for a team April 29, 2021 09:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna merged commit 2fb7b6a into cockroachdb:release-21.1 Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants