Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: fix off by one index in fileSSTSink file extension #98041

Merged
merged 1 commit into from
Mar 7, 2023

Conversation

rhu713
Copy link
Contributor

@rhu713 rhu713 commented Mar 5, 2023

Currently, the logic that extends the last flushed file fileSSTSink does not trigger if there is only one flushed file. This failure to extend the first flushed file can result in file entries in the backup manifest with duplicate start keys. For example, if the first export response written to the sink contains partial entries of a single key a, then the span of the first file will be a-a, and the span of the subsequent file will always be a-<end_key>. The presence of these duplicate start keys breaks the encoding of the external manifest files list SST as the file path + start key combination in the manifest are assumed to be unique.

Fixes #97953

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rhu713 rhu713 marked this pull request as ready for review March 6, 2023 15:43
@rhu713 rhu713 requested a review from a team as a code owner March 6, 2023 15:43
@rhu713 rhu713 requested review from dt and removed request for a team March 6, 2023 15:43
@rhu713 rhu713 force-pushed the sst-sink-extend branch from f8183eb to cda9992 Compare March 6, 2023 15:55
@rhu713 rhu713 requested review from adityamaru and removed request for dt March 6, 2023 15:56
@rhu713 rhu713 force-pushed the sst-sink-extend branch 2 times, most recently from 3c046b5 to 0ab3e4e Compare March 6, 2023 20:05
Currently, the logic that extends the last flushed file fileSSTSink does not
trigger if there is only one flushed file. This failure to extend the first
flushed file can result in file entries in the backup manifest with duplicate
start keys. For example, if the first export response written to the sink
contains partial entries of a single key `a`, then the span of the first file
will be `a-a`, and the span of the subsequent file will always be
`a-<end_key>`. The presence of these duplicate start keys breaks the encoding
of the external manifest files list SST as the file path + start key
combination in the manifest are assumed to be unique.

Release note: None
@rhu713 rhu713 force-pushed the sst-sink-extend branch from 0ab3e4e to 6b39306 Compare March 6, 2023 20:12
@rhu713
Copy link
Contributor Author

rhu713 commented Mar 7, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 7, 2023

Build succeeded:

@craig craig bot merged commit a95ffcd into cockroachdb:master Mar 7, 2023
@@ -190,7 +190,7 @@ func (s *fileSSTSink) write(ctx context.Context, resp exportedSpan) error {
// If this span extended the last span added -- that is, picked up where it
// ended and has the same time-bounds -- then we can simply extend that span
// and add to its entry counts. Otherwise we need to record it separately.
if l := len(s.flushedFiles) - 1; l > 0 && s.flushedFiles[l].Span.EndKey.Equal(span.Key) &&
if l := len(s.flushedFiles) - 1; l >= 0 && s.flushedFiles[l].Span.EndKey.Equal(span.Key) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this logic be expressed in a way that a future person won’t re-introduce an off-by-one?

  • The -1 isn’t great -- how can a reader infer why that’s there?
  • The >= isn’t great either. Most loop bounds should have a clean <.
  • Maybe the several conditions on s.flushedFiles[l] should be inside the loop, with a break or continue.
  • Some well-named local vars would increase clarity.

(Bikeshed, can this be something other than l which looks like 1?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postmortem analysis of the above is kind of pointless unless we can replay the stream of thoughts exactly at the time OBOE was introduced. The choice of the mnemonic to denote the last position may have played some role, but I doubt that was the reason; i.e., l usually stands for length; I would have used last. More likely, the issue is the cognitive dissonance of 0-based and 1-based indexing, both of which are prevalent. E.g., when we talk about nodes in the cluster, we refer to n1 as the first; there are plenty of other examples.

@adityamaru
Copy link
Contributor

@rhu713 we should backport this to all release branches?

@rhu713
Copy link
Contributor Author

rhu713 commented Mar 7, 2023

blathers backport 22.2 22.1

@blathers-crl
Copy link

blathers-crl bot commented Mar 7, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 6b39306 to blathers/backport-release-22.2-98041: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2 failed. See errors above.


error creating merge commit from 6b39306 to blathers/backport-release-22.1-98041: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

backupccl: duplicate Files in the BackupManifest result in a pebble SSTWriter error
5 participants