
rocksdb: use max_manifest_file_size option #25341

Merged
merged 1 commit into cockroachdb:master on May 11, 2018

Conversation

garvitjuniwal
Contributor

Cockroach uses a single long-running rocksdb instance for the entire
process lifetime, which could be many months. By default, rocksdb tracks
filesystem state changes in a log file called the MANIFEST, which grows
without bound until the instance is re-opened. We should bound the
maximum file size of the rocksdb MANIFEST using the corresponding rocksdb
option to prevent unbounded growth.

The MANIFEST file grew to several GBs in size in a customer bug report,
but that was probably caused by some other bad behavior in rocksdb
state management. We do want to bound the MANIFEST size in such cases as
well.

Release note: None
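
For illustration only, here is a minimal sketch of setting this option through the tecbot/gorocksdb bindings. Cockroach's actual change is in its C++ RocksDB glue, so the binding, the path, and the 128 MB bound here are assumptions for the example:

```go
package main

import (
	"log"

	"github.com/tecbot/gorocksdb"
)

func main() {
	opts := gorocksdb.NewDefaultOptions()
	opts.SetCreateIfMissing(true)
	// Roll the MANIFEST to a new file once it exceeds 128 MB, instead of
	// letting it grow for the lifetime of the process (the default).
	opts.SetMaxManifestFileSize(128 << 20)

	db, err := gorocksdb.OpenDb(opts, "/tmp/manifest-demo")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```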

@garvitjuniwal requested a review from a team May 6, 2018 23:41
@cockroach-teamcity
Member

This change is Reviewable

@garvitjuniwal
Contributor Author

Suggestions on how to validate this change are welcome. I can manually verify that the setting works as advertised by using a smaller bound and reducing the sstable size. What other validations/regression tests should be done?

@bdarnell
Contributor

bdarnell commented May 7, 2018

:lgtm:

Wow, this seems like a really bad default on the rocksdb side. Why would this default to a single manifest file of unbounded size?



@petermattis
Collaborator

:lgtm:

As for validation, your idea of making this configurable (at least within a test) and verifying that the manifest does not exceed this size seems reasonable.
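
A sketch of the assertion half of such a test, assuming the store directory is scanned directly; the helper name, slack allowance, and paths are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkManifestBound returns an error if any MANIFEST-* file under dir is
// larger than maxSize plus slack (the active manifest only rolls after the
// record that pushes it past the configured limit).
func checkManifestBound(dir string, maxSize, slack int64) error {
	matches, err := filepath.Glob(filepath.Join(dir, "MANIFEST-*"))
	if err != nil {
		return err
	}
	for _, m := range matches {
		fi, err := os.Stat(m)
		if err != nil {
			return err
		}
		if fi.Size() > maxSize+slack {
			return fmt.Errorf("%s is %d bytes, above the %d-byte bound", m, fi.Size(), maxSize)
		}
	}
	return nil
}

func main() {
	// Hypothetical store directory with a deliberately small 4 KB bound.
	if err := checkManifestBound("/tmp/manifest-demo", 4<<10, 1<<10); err != nil {
		fmt.Println("bound violated:", err)
		return
	}
	fmt.Println("all MANIFEST files within bound")
}
```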



@garvitjuniwal
Contributor Author

@petermattis Using a small memtable and sstable size (4 KB) and a small value for the max manifest file size (4 KB), I have verified that the setting works as advertised. I could see the MANIFEST file rolling after it reached 4 KB. I also ran manifest_dump on the rolled manifest file and verified that it starts with the previous snapshot (represented as an "AddedFiles" edit). A 4 KB manifest contained about 40 entries. This means that a 128 MB manifest will allow for about 1.2 million file operations before rolling, which could be several TB of data churn with cockroach's default sstable size.
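
(Back of the envelope, using the numbers above: 128 MB ÷ 4 KB ≈ 32,000 manifest-sized chunks, times ~40 entries each, is roughly 1.2–1.3 million edits before the file rolls.)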

I'm good with merging this.

@garvitjuniwal
Contributor Author

Shall I post a cherry-pick on 2.0 as well?

@petermattis
Collaborator

bors r+

craig bot pushed a commit that referenced this pull request May 11, 2018
25341: rocksdb: use max_manifest_file_size option r=petermattis a=garvitjuniwal


Co-authored-by: Garvit Juniwal <[email protected]>
@petermattis
Collaborator

@garvitjuniwal Yes, let's cherry-pick this into 2.0. FYI, https://github.com/benesch/backport eases the backport process. You'd end up running something like `backport 25341 -r 2.0`, which will create a branch with the cherry-pick and open a PR against the 2.0 release branch.

@craig
Contributor

craig bot commented May 11, 2018

Build succeeded

craig bot merged commit bdf2fd0 into cockroachdb:master May 11, 2018
craig bot pushed a commit that referenced this pull request May 14, 2018
25471: backport-2.0: rocksdb: use max_manifest_file_size option r=bdarnell a=tschottdorf

Backport 1/1 commits from #25341.

/cc @cockroachdb/release

---



25474: backport-2.0: storage: fix deadlock in consistency queue r=bdarnell a=tschottdorf

Backport 1/1 commits from #25456.

/cc @cockroachdb/release

---

When `CheckConsistency` returns an error, the queue checks whether the
store is draining to decide whether the error is worth logging.

Unfortunately this check was incorrect and would block until the store
actually started draining.

A toy example of this problem is below (it will deadlock). The two-value
form of channel receive isn't non-blocking -- the second return value only
indicates whether the received value corresponds to a closed channel.

Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	// This receive blocks forever: the two-value receive form is not
	// non-blocking, and ok only reports whether the channel was closed.
	_, ok := <-ch
	fmt.Println(ok)
}
```
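
For contrast, a minimal sketch of the non-blocking check that a `select` gives you; the real fix lives in the consistency queue code, not in this toy program:

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	// A select with a default case does not block: it takes the default
	// branch unless the channel already has a value ready or is closed.
	select {
	case _, ok := <-ch:
		fmt.Println(ok)
	default:
		fmt.Println("not draining")
	}
}
```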

Touches #21824.

Release note (bug fix): Prevent the consistency checker from
deadlocking. This would previously manifest itself as a steady number of
replicas queued for consistency checking on one or more nodes and would
resolve by restarting the affected nodes.


Co-authored-by: Garvit Juniwal <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>