WIP - Badger state DB #4739

jozef-slezak

Hello, here is a small pull request that implements a Badger-based state DB. It relates to issue #4681.
Hopefully you will like this idea. If not, feel free to reuse the unit tests for your Bolt state DB.

Reasons for having Badger based state DB:

  1. Badger is maintained. Bolt seems to be archived (see github.com/hashicorp/nomad/helper/boltdd), and Nomad is not even using https://github.com/etcd-io/bbolt.
  2. Badger is crash resilient (WAL); see https://blog.dgraph.io/post/alice/
  3. Badger is pretty fast; see https://blog.dgraph.io/post/badger-lmdb-boltdb/

We are still facing #1367 (and are able to reproduce it), so I am experimenting with how to tweak Nomad to avoid the corrupted state.

@tantra35
Contributor

tantra35 commented Oct 1, 2018

@jozef-slezak good job. Could you clarify one thing: can Badger shrink its DB file, unlike boltdb?

@jozef-slezak
Author

jozef-slezak commented Oct 1, 2018

@tantra35 please, have a look at DB.RunValueLogGC()
https://github.com/dgraph-io/badger#garbage-collection
This link https://blog.dgraph.io/post/badger/ can also help to understand the design and principles.
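
To illustrate the answer, here is a minimal sketch of how value log garbage collection might be driven. It assumes, as the Badger README describes, that each RunValueLogGC call rewrites at most one value log file and returns an error once nothing more can be reclaimed. To keep the example free of the Badger dependency, the valueLogGC interface, errNoRewrite (standing in for badger.ErrNoRewrite), fakeDB, and reclaimValueLog are all hypothetical names introduced here and are not part of this pull request.

```go
package main

import (
	"errors"
	"fmt"
)

// valueLogGC is a hypothetical interface covering the single method this
// sketch needs. Badger's *badger.DB has a method with this signature.
type valueLogGC interface {
	RunValueLogGC(discardRatio float64) error
}

// errNoRewrite is an assumption standing in for badger.ErrNoRewrite, so the
// sketch compiles without importing Badger.
var errNoRewrite = errors.New("value log GC attempt didn't result in any cleanup")

// reclaimValueLog calls RunValueLogGC until the store reports that nothing
// more can be rewritten, an unexpected error occurs, or maxPasses is reached.
// It returns the number of passes that reclaimed space.
func reclaimValueLog(db valueLogGC, discardRatio float64, maxPasses int) (int, error) {
	passes := 0
	for passes < maxPasses {
		err := db.RunValueLogGC(discardRatio)
		if errors.Is(err, errNoRewrite) {
			return passes, nil // nothing left to reclaim
		}
		if err != nil {
			return passes, err
		}
		passes++
	}
	return passes, nil
}

// fakeDB simulates a store with a fixed number of reclaimable value log files.
type fakeDB struct{ reclaimable int }

func (f *fakeDB) RunValueLogGC(discardRatio float64) error {
	if f.reclaimable == 0 {
		return errNoRewrite
	}
	f.reclaimable--
	return nil
}

func main() {
	n, err := reclaimValueLog(&fakeDB{reclaimable: 3}, 0.5, 10)
	fmt.Println(n, err) // prints "3 <nil>"
}
```

In a real client this loop would probably run from a periodic ticker rather than once, and the discard ratio of 0.5 is only an illustrative value.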

@onlyjob
Contributor

onlyjob commented Oct 9, 2018

Badger is improperly vendored from HEAD. @jozef-slezak, please only ever vendor releases by tag or by the corresponding commit ID.
Badger is semantically versioned and we should respect that. Thanks.

@jozef-slezak
Author

Thank you, I will improve the vendoring.

Member

@schmichael schmichael left a comment

Thanks for investigating this @jozef-slezak! Definitely passing this and https://github.com/dgraph-io/badger around internally for further discussion/investigation. Very exciting stuff.

Please accept my apologies for the messy state and rapid iteration of the r-clientv2 (and child) branches. It should be merged into master soon in preparation for a pre-HashiConf preview release.

We moved the v2 paths, but since most of your changes are in vendor/ and the new state database files, it will hopefully be easy to port/rebase.

Let me reassure you the client state corruption bug - #1367 - will be fixed in the 0.9.0 release. We are confident the issue was in race conditions when serializing state and not a bug in the underlying storage engine (boltdb).

That being said, as you pointed out, there are known issues with the unmaintained original version of boltdb we currently use. We've wanted to move to either a newer upstream or a new backend, so your PR is quite welcome!

(A raft backend for Badger would be an exciting experiment as well: https://github.com/hashicorp/raft-boltdb)

Configuring pluggable backends

As for making backends pluggable - #4681 - I would suggest for now checking a key in client.options. state_backend might be a good name for the key.

We can easily promote client.options keys to an official parameter under the client stanza once everyone is happy with the experiment. Here's an example of what reading one of those configuration parameters looks like in code:

// Advertise if this node supports Docker volumes
if d.config.ReadBoolDefault(dockerVolumesConfigOption, dockerVolumesConfigDefault) {
	resp.AddAttribute("driver."+dockerVolumesConfigOption, "1")
}
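
Applied to the backend choice, the same pattern could look roughly like the sketch below. It is self-contained rather than written against Nomad's config package: clientOptions, readDefault, and chooseStateBackend are hypothetical names, and the accepted values "boltdb" and "badger" as well as the "boltdb" default are assumptions. Only the state_backend key name comes from the suggestion above.

```go
package main

import "fmt"

// clientOptions is a hypothetical stand-in for the key/value pairs found
// under client.options in the agent configuration.
type clientOptions map[string]string

// readDefault returns the value stored under key, or def when the key is
// unset or empty.
func (o clientOptions) readDefault(key, def string) string {
	if v, ok := o[key]; ok && v != "" {
		return v
	}
	return def
}

const (
	stateBackendOption  = "state_backend" // key name suggested in the review
	stateBackendDefault = "boltdb"        // assumed default, matching current behavior
)

// chooseStateBackend reads the configured backend name and rejects values
// this sketch does not know about.
func chooseStateBackend(o clientOptions) (string, error) {
	switch name := o.readDefault(stateBackendOption, stateBackendDefault); name {
	case "boltdb", "badger":
		return name, nil
	default:
		return "", fmt.Errorf("unknown %s %q", stateBackendOption, name)
	}
}

func main() {
	examples := []clientOptions{
		{},
		{"state_backend": "badger"},
		{"state_backend": "leveldb"},
	}
	for _, opts := range examples {
		name, err := chooseStateBackend(opts)
		fmt.Printf("backend=%q err=%v\n", name, err)
	}
}
```

Failing on an unknown value, rather than silently falling back to the default, is a design choice of this sketch; it avoids a typo in the option quietly selecting a different state store.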

Next steps

In the run-up to HashiConf/0.9.0 I doubt we'll have time to merge this even as an experimental feature.

However, I'll definitely be monitoring this PR's progress and if all goes well maybe we can get it in as an option post-0.9.0. Feel free to @schmichael with questions, and I'll do my best to respond (although October is going to be busy for us!).

opts := badger.DefaultOptions
opts.Dir = stateDir
opts.ValueDir = stateDir
opts.ValueLogLoadingMode = badgeropt.FileIO //no need for (default) mmap here
Member

I would leave the default as the state files shouldn't be too large, and letting the OS manage paging to/from disk seems optimal.

@schmichael
Member

Quick update after some internal discussion: sounds like the Consul team had already investigated Badger as an alternative Raft backend, but didn't find it a compelling alternative (our raft implementation basically only needs a WAL).

Making Nomad client's state backend pluggable is unlikely to be a long-term goal. The state backend is rarely the bottleneck, and the state bugs we've encountered so far have never been the fault of boltdb.

Please feel free to experiment as Badger may prove better for state storage, but it is not a high priority for us to make any performance-related tweaks to state management.

@preetapan
Contributor

Closing this; more details are in the comment from @schmichael above.

@preetapan preetapan closed this Oct 12, 2018
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 26, 2023

5 participants