Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manifest: unref files with range keys when version refcnt is zero #2022

Merged
merged 1 commit into from
Oct 20, 2022

Conversation

nicktrav
Copy link
Contributor

A manifest version uses reference counting to ensure that obsolete SSTables are only removed once it is safe to do so (i.e. no iterators or snapshots are open referencing tables in a older manifest version).

In #1771, a new slice of LevelMetadata was added to support range keys. A slice already existed for point keys (plus range dels). Currently, when a version's ref count reaches zero, only the files present in the point key LevelMetadata slice have their ref counts decremented. If a file also contains range keys, the reference count never becomes zero and the file becomes "zombied", and can never be deleted from disk.

Decrement the ref counts on files present in the range keys LevelMetadata slice, allowing the ref counts to be correctly zeroed.

Add regression test to exercise the bug observed in cockroachdb/cockroach#90149.

Fix cockroachdb/cockroach#90149.

@nicktrav nicktrav requested review from jbowens and a team October 20, 2022 17:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nicktrav
Copy link
Contributor Author

I can probably add some unit tests to the manifest package for this case. Thought I'd put this up for review now, with the higher level data-driven test.

@nicktrav nicktrav requested a review from itsbilal October 20, 2022 17:52
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right here in DB.Close can we error if len(d.mu.versions.zombieTables) > 0?

We've already:

  1. asserted Version.Refs() == 0 for all previous versions and .Refs() == 1 for the current version.
  2. and scheduled cleaning jobs to remove all obsolete files, which will remove files from d.mu.versions.zombieTables.

I don't think it's possible for len(d.mu.versions.zombieTables) to be greater than zero without a ref count leak.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @itsbilal)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert 0 zombie files and memtables when closing a DB in all our tests, including metamorphic tests?

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @itsbilal)

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo the DB.Close assertion

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @itsbilal)

A manifest version uses reference counting to ensure that obsolete
SSTables are only removed once it is safe to do so (i.e. no iterators or
snapshots are open referencing tables in a older manifest version).

In cockroachdb#1771, a new slice of `LevelMetadata` was added to support range
keys. A slice already existed for point keys (plus range dels).
Currently, when a version's ref count reaches zero, only the files
present in the point key `LevelMetadata` slice have their ref counts
decremented. If a file also contains range keys, the reference count
never becomes zero and the file becomes "zombied", and can never be
deleted from disk.

Decrement the ref counts on files present in the range keys
`LevelMetadata` slice, allowing the ref counts to be correctly zeroed.

Add regression test to exercise the bug observed in
cockroachdb/cockroach#90149.

Fix cockroachdb/cockroach#90149.
@nicktrav nicktrav force-pushed the nickt.range-key-ref-counts branch from 205e149 to 2811cec Compare October 20, 2022 21:03
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

Reviewable status: 1 of 4 files reviewed, all discussions resolved (waiting on @itsbilal)

Copy link
Contributor Author

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! Ack on the memtable count - let me follow up once this lands.

in all our tests, including metamorphic tests?

It feels like what we want is some general harness for asserting that a DB and its associated iterators are always closed after tests. Similar to what we have in Cockroach. Let's mull that over on #2032.

Reviewable status: 1 of 4 files reviewed, all discussions resolved (waiting on @itsbilal and @jbowens)

@nicktrav nicktrav merged commit cc0dd37 into cockroachdb:master Oct 20, 2022
@nicktrav nicktrav deleted the nickt.range-key-ref-counts branch October 20, 2022 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: disk space not reclaimed even though live bytes are deleted
4 participants