storage: disk space not reclaimed even though live bytes are deleted #90149
Comments
I took a look at the
There are multiple stores on that one node, each with ~45-50 GB of zombie tables. Unfortunately, the observability story here isn't as good as it could be. The last time we investigated something like this, the goroutine dumps were useful (typically one or more blocked goroutines in Pebble iterator codepaths). Here's an issue tracking observability improvements. |
maybe my request for KV to investigate was premature: there are zero zombie memtables and no open sstable iterators? that’s unexpected… i suppose it’s possible from an iterator leak if there’s been no flushes and the leaked iterator hasn’t been positioned? |
Interestingly, the table cache size is more or less monotonically increasing. In the LSM printout above, we have 38K tables in the cache, yet only 195 + 8603 tables in the LSM (live + zombie). I find this surprising, but it may not be causal. Gist of table cache size over time:

Edit: This was a red herring. The nodes are running with four stores, and the table cache is shared. Adding the |
I took another |
Nothing stood out from the goroutine dumps. Per Jackson's comment, it's also a bit weird that the zombie memtable count and the table iterator counts are both zero. There are also no Pebble snapshots open. It might be good for someone from KV to take a quick look at this from the TTL perspective, just as a sense check. As a next step for Storage, it's probably worth trying to run a patched binary that will give us some more insight into where we're leaking the refcounts on the tables, which (I assume) is preventing the tables from being deleted. |
There's definitely something off with the ztbls. I found this even before running the workload. Fresh, stock cluster, nothing to do with TTL. To repro:

# New cluster.
$ ./bin/roachprod create travers-test -n 8 -c gce --gce-machine-type n1-standard-16
$ ./bin/roachprod stage travers-test release v22.2.0-beta.3
$ ./bin/roachprod start travers-test
# Fetch ztbls from each node.
$ date; for i in $(seq 1 8); do echo $i; curl -s http://travers-test-000$i.roachprod.crdb.io:26258/debug/lsm | grep ztbl; done
Tue Oct 18 21:02:05 PDT 2022
1
ztbl 15 36 K
2
ztbl 220 368 K
3
ztbl 210 357 K
4
ztbl 170 419 K
5
ztbl 170 536 K
6
ztbl 170 516 K
7
ztbl 180 595 K
8
ztbl 170 480 K

I would not expect these to be non-zero for an extended period of time. Here's n2's LSM:
Only 4 tables in the live set, but 165 tables that are already zombied. This is without even having run any DDL or DML. When I then start writing data (I adapted the Python script from the Slack thread into a little Go workload), I start to see the ztbl count increase (albeit very slowly).

$ date; for i in $(seq 1 8); do echo $i; curl -s http://travers-test-000$i.roachprod.crdb.io:26258/debug/lsm | grep ztbl; done
Tue Oct 18 21:07:52 PDT 2022
1
ztbl 35 7.2 M
2
ztbl 220 368 K
3
ztbl 220 2.5 M
4
ztbl 180 11 M
5
ztbl 185 4.4 M
6
ztbl 195 6.5 M
7
ztbl 205 6.2 M
8
ztbl 195 7.5 M

Not saying this is related to the TTL issue, but this feels odd enough to dig into. Getting late here. Will pick this up again tomorrow in the AM. |
Any chance this has to do with the iterator reuse changes in 22.2? Or is it just a plain ol' leak? |
Verified this is not an issue in 22.1.9. I don't think this is particularly surprising (the regression was in 22.2 development). Working on tracking it down. |
22.2 also had substantial development on how the ttljob works, but those changes were all at the SQL level. Just throwing in an idea in case it could help: do we know for sure that the problem is limited to row-level-TTL? Would it help to write a test that has the same setup, but then tries to delete rows with an external script, rather than with row-level-TTL? |
Based on what I'm seeing, there's at least one regression around ref counting of Pebble SSTables. I'm hoping that whatever that issue may be, it is being tickled even more by row-level-TTL / GC, which is resulting in the issue documented here (large, monotonically increasing ztbl counts). Can't rule out row-level-TTL just yet. Best case, we fix that and it fixes the issue as observed with row-level-TTL. I'm going to keep posting my findings on the ztbl leak here for now. Open to spinning up a new issue for that, if that's easier.
My attempts at reproducing this with a similar workload and setup to the one provided in the Slack thread didn't yield anything fruitful. I will revisit that once I find out what's going on with this "non-zero ztbls immediately on startup" line of investigation. |
I was able to use some nightly builds to bisect down to a smaller range of commits to check.
First bad: This translates to the following two Cockroach SHAs: There's about a day's worth of commits in there to pick through:

fc7ae33 Merge #83361

There are a few |
Ok - I'm confident that the issue was introduced in #82799, back in June. Going to pick through that again now for iterator leaks. cc: @erikgrinaker |
Thanks for bisecting this down Nick! That's rather surprising, since that PR only deals with infrastructure for external SST files, and doesn't even really enable any of it in production code. I'll have a closer look tomorrow unless you beat me to it. |
I'll take a stab while you 💤, and post back here when I log off. It could also be the case that we started using some new Pebble API here, and that's where the leak is coming from. |
Last update for the evening ... I wasn't able to make too much more progress. There's something holding onto Pebble read states for long periods of time. I've seen cases where the read state stuck around for 60+ seconds, while the database was under import load, so there would have been plenty of compactions causing updates to the manifest. A read state hanging around would prevent the dead tables from being deleted. In the case of one of these read states, I instrumented Pebble to print the stacks of goroutines that had incremented the ref count on the read state, and then those that decremented the count. I saw two goroutines performing replica consistency checks. These threads took over a minute to complete. I wonder if it is expected behavior for those threads to run for that long. Maybe? This made me suspect that this is a red herring. Digging into the commit I bisected down to probably makes the most sense. I'll leave that to you, Erik. I have a feeling you may be right about it being the re-use / cloning. It's fairly easy to spot this issue - spin up a few nodes with roachprod, wipe, then start. Hitting |
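For reference, here is a minimal sketch of the kind of ref-count instrumentation described above. It is not the actual patch; the `refTracker` type and the way it would be wired into Pebble's read state are hypothetical, but the idea is the same: record a stack per `Ref` so that any reference still outstanding later can be attributed to a code path.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// refTracker is a hypothetical helper: it records the stack of every Ref
// call so that, when a read state (or version) lingers, the outstanding
// references can be attributed to the goroutines that took them.
type refTracker struct {
	mu     sync.Mutex
	refs   int
	next   int
	stacks map[int][]byte // ref id -> stack captured at Ref time
}

func newRefTracker() *refTracker {
	return &refTracker{stacks: make(map[int][]byte)}
}

// Ref increments the count and remembers who took the reference.
func (t *refTracker) Ref() (id int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.refs++
	id = t.next
	t.next++
	t.stacks[id] = debug.Stack()
	return id
}

// Unref drops the reference taken under id.
func (t *refTracker) Unref(id int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.refs--
	delete(t.stacks, id)
}

// Dump prints the stacks of all references that are still outstanding.
func (t *refTracker) Dump() {
	t.mu.Lock()
	defer t.mu.Unlock()
	fmt.Printf("outstanding refs: %d\n", t.refs)
	for id, stack := range t.stacks {
		fmt.Printf("--- ref %d ---\n%s\n", id, stack)
	}
}

func main() {
	t := newRefTracker()
	id := t.Ref()
	t.Dump() // one outstanding reference, with the stack that took it
	t.Unref(id)
	t.Dump() // zero outstanding references
}
```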
Seems to be caused by the following:

diff --git a/pkg/storage/sst_writer.go b/pkg/storage/sst_writer.go
index be4e4c7d28..f920b99bf6 100644
--- a/pkg/storage/sst_writer.go
+++ b/pkg/storage/sst_writer.go
@@ -136,7 +136,7 @@ func (fw *SSTWriter) ClearRawRange(start, end roachpb.Key, pointKeys, rangeKeys
return err
}
}
- if rangeKeys && fw.supportsRangeKeys {
+ if rangeKeys && fw.supportsRangeKeys && false {
fw.DataSize += int64(len(start)) + int64(len(end))
if err := fw.fw.RangeKeyDelete(fw.scratch, endRaw); err != nil {
return err

Or in the original cf162f9:

diff --git a/pkg/storage/sst_writer.go b/pkg/storage/sst_writer.go
index f7c3a16d98..84f2e9e9d8 100644
--- a/pkg/storage/sst_writer.go
+++ b/pkg/storage/sst_writer.go
@@ -206,7 +206,8 @@ func (fw *SSTWriter) ExperimentalClearAllRangeKeys(start roachpb.Key, end roachp
// SST, to avoid dropping unnecessary range tombstones. However, this may not
// be safe, because the caller may want to ingest the SST including the range
// tombstone into an engine that does have range keys that should be cleared.
- return fw.fw.RangeKeyDelete(EncodeMVCCKeyPrefix(start), EncodeMVCCKeyPrefix(end))
+ //return fw.fw.RangeKeyDelete(EncodeMVCCKeyPrefix(start), EncodeMVCCKeyPrefix(end))
+ return nil
} |
Thanks for digging into it! Now the question: why does that tombstone result in zombied tables? |
Posting some more data. Looking at the contents of the tables that aren't being deleted, all of the ones I've looked at have range keys present. For example, in this LSM, there are only three tables in L6 in the live set:

EOF
--- L0 ---
--- L1 ---
--- L2 ---
--- L3 ---
--- L4 ---
--- L5 ---
--- L6 ---
000547:331157<#0-#0>[/Local/RangeID/1/r/RangeGCThreshold/0,0#0,SET-/Table/39/1/"\xc5\xfc,wv\xdfMw\x8a\xccCu\x93\xce\"\xc9"/0/1666278048.039970776,0#0,SET]
000199:2030<#0-#0>[/Table/47/1/""/0/1666278047.240654686,0#0,SET-/Table/47/1/"\xbc"/0/1666278047.240654686,0#0,SET]
000284:1018<#0-#0>[/Table/48/1/0/0/1666278047.069367782,0#0,SET-/Table/48/1/0/0/1666278047.069367782,0#0,SET]

Yet we have 263 tables on disk:

$ find ~/local/travers-test-2/data -name '*.sst' | wc -l
263

Here's the lifecycle of some of the tables that aren't in the working set but are still on disk.

Table

169/4
next-file-num: 12
last-seq-num: 21
added: L6 000004:1446<#16-#16>[/Local/RangeID/33/r""/0,0#16,RANGEKEYDEL-/Local/RangeID/33/s""/0,0#72057594037927935,RANGEDEL] (2022-10-20T15:00:48Z)
added: L0 000009:1169<#17-#17>[/Local/RangeID/33/u""/0,0#17,RANGEDEL-/Local/RangeID/33/v""/0,0#72057594037927935,RANGEDEL] (2022-10-20T15:00:48Z)
added: L0 000005:1527<#18-#18>[/Local/Range/Table/32/0,0#18,RANGEKEYDEL-/Local/Range/Table/33/0,0#72057594037927935,RANGEDEL] (2022-10-20T15:00:48Z)
added: L6 000006:1200<#19-#19>[/Local/Lock/Intent/Local/Range/Table/32/0,0#19,RANGEKEYDEL-/Local/Lock/Intent/Local/Range/Table/33/0,0#72057594037927935,RANGEDEL] (2022-10-20T15:00:48Z)
added: L6 000007:1175<#20-#20>[/Local/Lock/Intent/Table/32/0,0#20,RANGEKEYDEL-/Local/Lock/Intent/Table/33/0,0#72057594037927935,RANGEDEL] (2022-10-20T15:00:48Z)
added: L6 000008:1151<#21-#21>[/Table/32/0,0#21,RANGEKEYDEL-/Table/33/0,0#72057594037927935,RANGEDEL] (2022-10-20T15:00:48Z)

Created in an ingestion into L7:

I221020 15:00:48.123766 464 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 16 [JOB 5] ingesting: sstable created 000004
I221020 15:00:48.123792 464 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 17 [JOB 5] ingesting: sstable created 000009
I221020 15:00:48.123802 464 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 18 [JOB 5] ingesting: sstable created 000005
I221020 15:00:48.123814 464 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 19 [JOB 5] ingesting: sstable created 000006
I221020 15:00:48.123824 464 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 20 [JOB 5] ingesting: sstable created 000007
I221020 15:00:48.123833 464 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 21 [JOB 5] ingesting: sstable created 000008
I221020 15:00:48.131327 464 3@pebble/event.go:665 ⋮ [n2,s2,pebble] 28 [JOB 5] ingested L6:000004 (1.4 K), L0:000009 (1.1 K), L0:000005 (1.5 K), L6:000006 (1.2 K), L6:000007 (1.1 K), L6:000008 (1.1 K)

Removed in an elision-only compaction:

I221020 15:00:48.137726 654 3@pebble/event.go:625 ⋮ [n2,s2,pebble] 32 [JOB 11] compacting(elision-only) L6 [000004] (1.4 K) + L6 [] (0 B)
I221020 15:00:48.137851 654 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 35 [JOB 11] compacting: sstable created 000013
I221020 15:00:48.141138 654 3@pebble/event.go:629 ⋮ [n2,s2,pebble] 36 [JOB 11] compacted(elision-only) L6 [000004] (1.4 K) + L6 [] (0 B) -> L6 [000013] (1.1 K), in 0.0s (0.0s total), output rate 553 K/s

The resulting table was stripped of the range del and the range key dels:

696/6
next-file-num: 14
last-seq-num: 27
deleted: L6 000004
added: L6 000013:1172<#0-#0>[/Local/RangeID/33/r/RangeGCThreshold/0,0#0,SET-/Local/RangeID/33/r/RangeVersion/0,0#0,SET] (2022-10-20T15:00:48Z)

From that initial ingestion, all of the tables that contain

$ find ~/local/travers-test-2/data -name '*.sst' | sort | head -n 10
/home/nickt/local/travers-test-2/data/000004.sst
/home/nickt/local/travers-test-2/data/000005.sst
/home/nickt/local/travers-test-2/data/000006.sst
/home/nickt/local/travers-test-2/data/000007.sst
/home/nickt/local/travers-test-2/data/000008.sst
/home/nickt/local/travers-test-2/data/000014.sst
/home/nickt/local/travers-test-2/data/000015.sst
/home/nickt/local/travers-test-2/data/000016.sst
/home/nickt/local/travers-test-2/data/000017.sst
/home/nickt/local/travers-test-2/data/000018.sst

Table

627/5
next-file-num: 13
last-seq-num: 24
deleted: L0 000005
deleted: L0 000009
deleted: L6 000011
added: L6 000012:1754<#0-#0>[/Local/RangeID/33/u/RaftHardState/0,0#0,SET-/Local/Store"uptm"/0,0#0,SET] (2022-10-20T15:00:48Z)

I221020 15:00:48.131407 650 3@pebble/event.go:625 ⋮ [n2,s2,pebble] 29 [JOB 10] compacting(default) L0 [000009 000005] (2.6 K) + L6 [000011] (1.5 K)
I221020 15:00:48.131530 650 3@pebble/event.go:657 ⋮ [n2,s2,pebble] 30 [JOB 10] compacting: sstable created 000012
I221020 15:00:48.137659 650 3@pebble/event.go:629 ⋮ [n2,s2,pebble] 31 [JOB 10] compacted(default) L0 [000009 000005] (2.6 K) + L6 [000011] (1.5 K) -> L6 [000012] (1.7 K), in 0.0s (0.0s total), output rate 951 K/s
I221020 15:00:48.137797 653 3@pebble/event.go:661 ⋮ [n2,s2,pebble] 33 [JOB 10] sstable deleted 000009
I221020 15:00:48.137854 653 3@pebble/event.go:661 ⋮ [n2,s2,pebble] 34 [JOB 10] sstable deleted 000011

Putting this all together, the only tables that remain in the working set have no range keys. All of the tables that are present on disk that are not in the working set do have range keys.

$ for table in $(find ~/local/travers-test-2/data/ -name '*.sst' | sort); do if [[ $(./cockroach debug pebble sstable scan $table | grep RANGEKEYDEL | wc -l) == 0 ]]; then echo "$table: NO RANGE KEYS"; fi; done
/home/nickt/local/travers-test-2/data/000199.sst: NO RANGE KEYS
/home/nickt/local/travers-test-2/data/000284.sst: NO RANGE KEYS
/home/nickt/local/travers-test-2/data/000547.sst: NO RANGE KEYS |
A manifest version uses reference counting to ensure that obsolete SSTables are only removed once it is safe to do so (i.e. no iterators or snapshots are open referencing tables in an older manifest version).

In cockroachdb#1771, a new slice of `LevelMetadata` was added to support range keys. A slice already existed for point keys (plus range dels). Currently, when a version's ref count reaches zero, only the files present in the point key `LevelMetadata` slice have their ref counts decremented. If a file also contains range keys, the reference count never becomes zero and the file becomes "zombied", and can never be deleted from disk.

Decrement the ref counts on files present in the range keys `LevelMetadata` slice, allowing the ref counts to be correctly zeroed.

Add a regression test to exercise the bug observed in cockroachdb/cockroach#90149.

Fix cockroachdb/cockroach#90149.
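To make the mechanism in the commit message above concrete, here is a simplified toy model (my own sketch, not Pebble's actual `manifest` types): a file that appears in both the point-key and range-key level slices holds one reference per appearance, so a release that only walks the point-key levels leaves the file's count stuck above zero — a zombie table.

```go
package main

import "fmt"

// fileMetadata stands in for an SSTable's metadata; refs counts the
// per-level references held by versions that include the file.
type fileMetadata struct {
	num  int
	refs int
}

// version references files through two sets of per-level metadata: one for
// point keys (and range dels), one for range keys. A file containing both
// kinds of keys appears in both sets.
type version struct {
	pointLevels [7][]*fileMetadata // every file appears here
	rangeLevels [7][]*fileMetadata // only files containing range keys
}

func ref(levels [7][]*fileMetadata) {
	for _, level := range levels {
		for _, f := range level {
			f.refs++
		}
	}
}

// release drops the version's references. Before the fix, only pointLevels
// was walked, so a file that also appeared in rangeLevels never reached
// zero refs and could never be deleted from disk.
func (v *version) release(walkRangeKeyLevels bool) (obsolete []*fileMetadata) {
	unref := func(levels [7][]*fileMetadata) {
		for _, level := range levels {
			for _, f := range level {
				f.refs--
				if f.refs == 0 {
					obsolete = append(obsolete, f)
				}
			}
		}
	}
	unref(v.pointLevels)
	if walkRangeKeyLevels {
		unref(v.rangeLevels) // the fix: also unref files with range keys
	}
	return obsolete
}

func main() {
	f := &fileMetadata{num: 4} // e.g. an ingested SST containing a RANGEKEYDEL
	v := &version{}
	v.pointLevels[6] = append(v.pointLevels[6], f)
	v.rangeLevels[6] = append(v.rangeLevels[6], f)
	ref(v.pointLevels)
	ref(v.rangeLevels)

	v.release(false)                                  // buggy behavior: range-key levels are skipped
	fmt.Println("refs after buggy release:", f.refs)  // 1: the file stays zombie

	f.refs = 2                                        // reset to the same starting state
	obsolete := v.release(true)                       // fixed behavior
	fmt.Println("refs after fixed release:", f.refs, "obsolete:", len(obsolete)) // 0 and 1
}
```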
90406: vendor: bump Pebble to cc0dd372b6d1 r=jbowens a=nicktrav

```
cc0dd372 manifest: unref files with range keys when version refcnt is zero
653147d0 metamorphic: Flush before close if DisableWAL is true
ff933d45 scripts: default to a 30m timeout in non-stress run-crossversion-meta.sh
24f67a4b db: fix FormatPrePebblev1Marked migration
1424c12d metamorphic: Copy data/wal dirs from initial state
5ed983e5 sstable: adjust sizeEstimate logic for forthcoming value blocks
```

Release note: None. Epic: None. Touches #90149.

Co-authored-by: Nick Travers <[email protected]>
```
0d8cae91 manifest: unref files with range keys when version refcnt is zero
8ed3a9e4 metamorphic: Flush before close if DisableWAL is true
4f0ec6bd db: fix FormatPrePebblev1Marked migration
ab1e28f6 metamorphic: Copy data/wal dirs from initial state
```

Release note: None. Release justification: Fixes a high severity issue. Epic: None. Informs cockroachdb#90149.
Fix is now on master and the 22.2 / 22.2.0 release branches. Going to keep this open until we verify that the original problem (observed with the TTL workload) is fixed. |
Can we modify the ClearRange roachtest to also verify that Pebble compacts away the garbage? |
We could - thoughts on what the assertion would be? |
That the used store space is below some reasonable threshold, or that it dropped some share from peak. |
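As a rough sketch of what that check could look like (the `storeBytesUsed` helper, the polling interval, and the 50%-of-peak threshold are all placeholders, not the actual roachtest API):

```go
package main

import (
	"fmt"
	"time"
)

// storeBytesUsed is a placeholder for however the test would sample a
// store's on-disk size (e.g. scraping a metric or running du over the
// store directory); it is not a real roachtest helper.
func storeBytesUsed(node int) (int64, error) {
	return 0, fmt.Errorf("not implemented: sample store size for node %d", node)
}

// waitForSpaceReclamation polls until the store has dropped to at most
// fraction of its peak observed size, failing after the timeout. This is
// the shape of the assertion discussed above; an absolute ceiling on store
// size would work equally well.
func waitForSpaceReclamation(node int, peak int64, fraction float64, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		used, err := storeBytesUsed(node)
		if err != nil {
			return err
		}
		if float64(used) <= float64(peak)*fraction {
			return nil // the garbage was compacted away
		}
		time.Sleep(30 * time.Second)
	}
	return fmt.Errorf("store on n%d did not drop below %.0f%% of peak within %s",
		node, fraction*100, timeout)
}

func main() {
	// Example: require the store to shrink to 50% of peak within an hour.
	if err := waitForSpaceReclamation(1, 100<<30, 0.5, time.Hour); err != nil {
		fmt.Println(err)
	}
}
```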
a relevant tbg TODO:
|
We're satisfied that the issues we were seeing in the original description have been addressed. |
Describe the problem
@glennfawcett ran an experiment with row-level TTL. Even though rows are deleted by the job, and the KV layer seemingly thinks the space is being freed, disk usage continues to increase steadily.
To Reproduce
See instructions in https://cockroachlabs.slack.com/archives/C028YAH6Y3V/p1665959426411949
Expected behavior
Disk space should be freed.
Additional data / screenshots
Environment:
Jira issue: CRDB-20616