
rocksdb: use max_manifest_file_size option #25341

Merged
merged 1 commit into cockroachdb:master on May 11, 2018

Conversation

garvitjuniwal
Contributor

Cockroach uses a single long-running rocksdb instance for the entire
process lifetime, which could be many months. By default, rocksdb tracks
filesystem state changes in a log file called the MANIFEST, which grows
without bound until the instance is re-opened. We should bound the
maximum file size of the rocksdb MANIFEST using the corresponding rocksdb
option to prevent unbounded growth.

The MANIFEST file grew to several GBs in size in a customer bug report,
but that was probably caused by some other bad behavior in rocksdb
state management. We do want to bound the MANIFEST size in such cases as
well.

Release note: None
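
For illustration only, here is a minimal sketch of setting this option through the tecbot/gorocksdb bindings. Cockroach's actual change is in its C++ RocksDB glue, so the binding, the path, and the 128 MB bound here are assumptions for the example:

```go
package main

import (
	"log"

	"github.com/tecbot/gorocksdb"
)

func main() {
	opts := gorocksdb.NewDefaultOptions()
	opts.SetCreateIfMissing(true)
	// Roll the MANIFEST to a new file once it exceeds 128 MB, instead of
	// letting it grow for the lifetime of the process (the default).
	opts.SetMaxManifestFileSize(128 << 20)

	db, err := gorocksdb.OpenDb(opts, "/tmp/manifest-demo")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```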

@garvitjuniwal requested a review from a team May 6, 2018 23:41
@cockroach-teamcity
Member

This change is Reviewable

@garvitjuniwal
Contributor Author

Suggestions on how to validate this change are welcome. I can manually verify that the setting works as advertised by using a smaller bound and reducing the sstable size. What other validations/regression tests should be done?

@bdarnell
Contributor

bdarnell commented May 7, 2018

:lgtm:

Wow, this seems like a really bad default on the rocksdb side. Why would this default to a single manifest file of unbounded size?



@petermattis
Collaborator

:lgtm:

As for validation, your idea of making this configurable (at least within a test) and verifying that the manifest does not exceed this size seems reasonable.
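
A sketch of the assertion half of such a test, assuming the store directory is scanned directly; the helper name, slack allowance, and paths are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkManifestBound returns an error if any MANIFEST-* file under dir is
// larger than maxSize plus slack (the active manifest only rolls after the
// record that pushes it past the configured limit).
func checkManifestBound(dir string, maxSize, slack int64) error {
	matches, err := filepath.Glob(filepath.Join(dir, "MANIFEST-*"))
	if err != nil {
		return err
	}
	for _, m := range matches {
		fi, err := os.Stat(m)
		if err != nil {
			return err
		}
		if fi.Size() > maxSize+slack {
			return fmt.Errorf("%s is %d bytes, above the %d-byte bound", m, fi.Size(), maxSize)
		}
	}
	return nil
}

func main() {
	// Hypothetical store directory with a deliberately small 4 KB bound.
	if err := checkManifestBound("/tmp/manifest-demo", 4<<10, 1<<10); err != nil {
		fmt.Println("bound violated:", err)
		return
	}
	fmt.Println("all MANIFEST files within bound")
}
```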



@garvitjuniwal
Contributor Author

@petermattis Using a small memtable and sstable size (4 KB) and a small value for the max manifest file size (4 KB), I have verified that the setting works as advertised. I could see the MANIFEST file rolling after it reached 4 KB. I also ran manifest_dump on the rolled manifest file and verified that it starts with the previous snapshot (represented as an "AddedFiles" edit). A 4 KB manifest contained about 40 entries. This means that a 128 MB manifest will allow for about 1.2 million file operations before rolling, which could be several TB of data churn with cockroach's default sstable size.
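
(Back of the envelope, using the numbers above: 128 MB ÷ 4 KB ≈ 32,000 manifest-sized chunks, times ~40 entries each, is roughly 1.2–1.3 million edits before the file rolls.)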

I'm good with merging this.

@garvitjuniwal
Contributor Author

Shall I post a cherry-pick on 2.0 as well?

@petermattis
Collaborator

bors r+

craig bot pushed a commit that referenced this pull request May 11, 2018
25341: rocksdb: use max_manifest_file_size option r=petermattis a=garvitjuniwal


Co-authored-by: Garvit Juniwal <[email protected]>
@petermattis
Collaborator

@garvitjuniwal Yes, let's cherry-pick this into 2.0. FYI, https://github.com/benesch/backport eases the backport process. You'd end up running something like `backport 25341 -r 2.0`, which will create a branch with the cherry-pick and open a PR against the 2.0 release branch.

@craig
Contributor

craig bot commented May 11, 2018

Build succeeded

craig bot merged commit bdf2fd0 into cockroachdb:master May 11, 2018
craig bot pushed a commit that referenced this pull request May 14, 2018
25471: backport-2.0: rocksdb: use max_manifest_file_size option r=bdarnell a=tschottdorf

Backport 1/1 commits from #25341.

/cc @cockroachdb/release

---



25474: backport-2.0: storage: fix deadlock in consistency queue r=bdarnell a=tschottdorf

Backport 1/1 commits from #25456.

/cc @cockroachdb/release

---

When `CheckConsistency` returns an error, the queue checks whether the
store is draining to decide whether the error is worth logging.

Unfortunately this check was incorrect and would block until the store
actually started draining.

A toy example of this problem is below (it will deadlock). The two-value
form of channel receive isn't non-blocking -- the second return value only
indicates whether the received value corresponds to a closed channel.

Switch to a `select` instead.

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	// This receive blocks forever: the two-value receive form is not
	// non-blocking, and ok only reports whether the channel was closed.
	_, ok := <-ch
	fmt.Println(ok)
}
```
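
For contrast, a minimal sketch of the non-blocking check that a `select` gives you; the real fix lives in the consistency queue code, not in this toy program:

```go
package main

import (
	"fmt"
)

func main() {
	ch := make(chan struct{})
	// A select with a default case does not block: it takes the default
	// branch unless the channel already has a value ready or is closed.
	select {
	case _, ok := <-ch:
		fmt.Println(ok)
	default:
		fmt.Println("not draining")
	}
}
```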

Touches #21824.

Release note (bug fix): Prevent the consistency checker from
deadlocking. This would previously manifest itself as a steady number of
replicas queued for consistency checking on one or more nodes and would
resolve by restarting the affected nodes.


Co-authored-by: Garvit Juniwal <[email protected]>
Co-authored-by: Tobias Schottdorf <[email protected]>