Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

physical: add etcd3 backend #2168

Merged
merged 2 commits into from
Jan 3, 2017
Merged

physical: add etcd3 backend #2168

merged 2 commits into from
Jan 3, 2017

Conversation

xiang90
Copy link
Contributor

@xiang90 xiang90 commented Dec 9, 2016

Is there any e2e test I can test against etcd3 backend? Or the unit test is good enough?

I did some simple tests, it seems to work.

@jefferai
Copy link
Member

jefferai commented Dec 9, 2016

@xiang90 I have a couple of high level concerns here:

  1. There is already an etcd backend. It would be far nicer/better if a single backend handled whatever version of etcd it's connected to than have two etcd backends. This especially makes it simpler in the migration case of someone upgrading their etcd.

  2. HA in the existing backend is terribly broken so I'm rather hesitant to merge something new with new, unknown behavior, rather than fixing the old.

@jefferai jefferai added this to the future milestone Dec 9, 2016
@xiang90
Copy link
Contributor Author

xiang90 commented Dec 9, 2016

There is already an etcd backend. It would be far nicer/better if a single backend handled whatever version of etcd it's connected to than have two etcd backends.

I can do this by letting client negotiating the versions.

HA in the existing backend is terribly broken so I'm rather hesitant to merge something new with new, unknown behavior, rather than fixing the old.

This is really unrelated. If we want to fix the old code, we should fix the old code. I, personally, am not familiar with the old code. You probably should ping the original author to fix it. If no one is going to maintain that code, we should depreciate it since it is broken.

It has nothing to do with this new code I think.

@jefferai
Copy link
Member

jefferai commented Dec 9, 2016

This is really unrelated. If we want to fix the old code, we should fix the old code. I, personally, am not familiar with the old code. You probably should ping the original author to fix it. If no one is going to maintain that code, we should depreciate it since it is broken.
It has nothing to do with this new code I think.

From a client perspective, if the two backends are merged, it is rather alarming if HA works sometimes and not other times depending on which version.

You hit the nail on the head though: there is already little enough interest in the etcd backend that nobody has fixed very serious problems in it. That makes me very hesitant to merge yet more etcd code, because it can easily introduce new problems and it feels like it will be immediately unmaintained.

@xiang90
Copy link
Contributor Author

xiang90 commented Dec 9, 2016

You hit the nail on the head though: there is already little enough interest in the etcd backend that nobody has fixed very serious problems in it

Is it possible that there is little enough interest only because the etcd2 code is broken and unmaintained at the first place? We have heard quite a lot of people telling us they cannot use etcd as vault backend because it is broken. As a vault maintainer, shouldnt you maintain it? Did you click the merge button at some point? #1909 Again, if etcd code is broken and you do not want to maintain it and no one want to maintain, you should deprecate it.

Also are you suggesting that I am not going to maintain this 200 lines of code? That is wrong an assumption. I will maintain it.

@philips
Copy link

philips commented Dec 9, 2016

Given the low interest in the current etcd integration please consider just deprecating it and move to the new etcd integration presented here. As Xiang points out the integration in this PR is about half of the original integration because it relies on tested code from the etcd client library and the etcd v3 API improvements.

Seems like the right thing: remove the known non-working code, replace it with maintained library code.

@jefferai
Copy link
Member

jefferai commented Dec 9, 2016

Is it possible that there is little enough interest only because the etcd2 code is broken and unmaintained at the first place?

Absolutely, but nobody has stepped up to break the cycle.

We have heard quite a lot of people telling us they cannot use etcd as vault backend because it is broken.

We've heard a lot of complaints about it as well.

As a vault maintainer, shouldnt you maintain it? Did you click the merge button at some point?

No; it (and most others) are explicitly community-maintained. That said, we definitely get concerned when the community-supplied code doesn't work, which is why we turned off HA support by default.

Also are you suggesting that I am not going to maintain this 200 lines of code? That is wrong an assumption. I will maintain it.

Not knowing you or your interest level in Vault, it is reasonable to wonder whether the code would be maintained. Most contributions are one-and-done itch-scratching types.

Given the low interest in the current etcd integration

I believe that there is decently high interest in the current etcd integration; what I don't know about is HA support in the current integration. I have no idea if users are simply rolling the dice and using it, or using etcd without HA.

As Xiang points out the integration in this PR is about half of the original integration

I didn't see any such assertion made, but I don't really see lines of code as being a particularly relevant metric.

Seems like the right thing: remove the known non-working code, replace it with maintained library code.

The right thing to do is not to remove code that our users are relying on. We will not be removing the current etcd backend. This is why I'd prefer that this code and these improvements be integrated into the current backend, so that users of the etcd backend automatically get the enhancements/fixes if the server/client support this API version.

@jefferai jefferai added accepted and removed accepted labels Dec 9, 2016
@philips
Copy link

philips commented Dec 9, 2016

@jefferai I think the trouble here is that we all admit that the current integration is broken and leads to a bad experience. Part of that reason is it relies on older etcd APIs and client libraries and also has free coded things like locks.

We can try and integrate the two into a single backend but I see three issues:

  1. We already have no concept of what works and doesn't work with the current backend nor any tests to confirm it. Mostly the docs just say "known broken". Further, doing things like HA is a configuration flag in Vault and if we try and integrate we will have to figure out, at runtime, what the users Vault state is and then panic I guess if they are using the old integration and APIs.
  2. Because the current backend is known broken and seems light on tests trying to automatically migrate or update users may break things.
  3. The current integration is not something the etcd team, which @xiang90 is a member of, is comfortable supporting because of the above issues and we would prefer to just start from libraries that we run tests against.

Removing code and breaking users is bad. But, as you admit this code isn't recommended at all and no one is supporting it. So, we want to start from a stable foundation the etcd is comfortable supporting. I see two paths forward:

  1. Remove the current etcd physical because it is known broken and recommend this new integration.
  2. Document the current code as not recommended and recommend this new integration.

Aside: The reason LOCs matter is because the current integration has things like a freely coded Semaphore while the new code uses etcd libraries to handle it.

@xiang90
Copy link
Contributor Author

xiang90 commented Dec 9, 2016

I believe that there is decently high interest in the current etcd integration; what I don't know about is HA support in the current integration. I have no idea if users are simply rolling the dice and using it, or using etcd without HA.

OK. Then let's remove the HA part of the etcd2 integration. If users choose etcd and have both client/server on etcd3, they will enjoy all the new features. I will maintain the code.

Does this solve all your concerns?

I do not think you have answered my original question:

Is there any e2e test I can test against etcd3 backend? Or the unit test is good enough?

@jefferai
Copy link
Member

jefferai commented Dec 9, 2016

We already have no concept of what works and doesn't work with the current backend nor any tests to confirm it.

In case someone is interested in digging in, we know that the backend works in a general context and HA doesn't, and there is a decent theory as to why it doesn't work -- see #1184 (comment)

Doing things like HA is a configuration flag in Vault and if we try and integrate we will have to figure out, at runtime, what the users Vault state is and then panic I guess if they are using the old integration and APIs.

I don't quite understand this -- if you were detecting the version at runtime couldn't you implement against an internal interface and simply switch between them depending on which client API version is supported? Considering that the New method simply returns an object satisfying an interface, it seems like it'd be pretty trivial to return e.g. etcd2Backend vs etcd3Backend that each implement the interface. I'm not sure if configuration for each would need to be significantly different. But I don't get what you're saying above about panics.

Because the current backend is known broken and seems light on tests trying to automatically migrate or update users may break things.

The backend doesn't store configuration information; so long as the data in is equivalent to the data that went out, it shouldn't matter which API version it was written from or is consumed by.

I understand that you don't want to maintain the etcd2 code, but I really don't see what the expected issues are combining them. Configuration is read and creates one of two objects, each of which has totally independent function implementations. What am I missing here?

@jefferai
Copy link
Member

jefferai commented Dec 9, 2016

Just to be clear, what I've been thinking is basically:

  1. NewEtcdBackend is called. Configuration could either require a particular API version or it could be auto-discovered.
  2. NewEtcdBackend function connects and queries the API to figure out the version.
  3. If version does not match the hard-coded version in config, fail. Otherwise, if version 3 is supported, use that.
  4. Depending on version, instantiate etc2Backend or etcd3Backend, which implement physical.Backend. These could live in separate files for easy maintenance.

So long as the interfaces are the same, this would essentially allow seamless upgrades for clients; just point at their new etcd3 API instead of etcd2. But fallback is still possible, and those still on etcd2 see no difference in functionality.

@xiang90
Copy link
Contributor Author

xiang90 commented Dec 9, 2016

@jefferai Sounds good to me.

@jefferai
Copy link
Member

jefferai commented Dec 9, 2016

@xiang90 If you want to do some E2E testing, have a look at the TestCluster function in vault/testing.go; it gives you a three-node Vault cluster with full HA and real listening sockets. Examples of using it can be found in http/forwarding_test.go.

The downside right now is that it hard-codes using the inmem HA backend; but you could easily change it so that if Physical/HAPhysical are passed in it uses those instead.

@jefferai jefferai modified the milestones: 0.6.4, future Dec 9, 2016
@xiang90
Copy link
Contributor Author

xiang90 commented Dec 9, 2016

@jefferai Will try. Thanks for your help.

@jefferai
Copy link
Member

jefferai commented Dec 9, 2016

No problem. Let me know if you run into any blockers and I'll be happy to help out.

@xiang90
Copy link
Contributor Author

xiang90 commented Dec 14, 2016

@jefferai

Hi, can you take a look at etcd.go? I changed the code logic to what you described.

Auto-discovery and version checking is not implemented yet. But these wont affect any existing users, and there is no external facing change. (no etcd3 backend added).

If you are happy with this approach, we can either merge this now. I will submit follow up prs to add version checking first, then auto discovery.

If you are a fan of larger PR, I can try to add version checking in this PR.

Personally, I prefer to do them in separate prs to keep change list small.

If you are interested in the context: one of the reason that we like to push forward this fast is because people might want to run vault on k8s. We have kind of managed etcd3 service on k8s with etcd operator. So people can setup vault with a simple yaml without even taking care about the backend setup/operation/maintenance. Or we have to suggest people to use consul-etcd (which is a proxy in front of etcd3), which is not stable yet.

@vishalnayak vishalnayak self-assigned this Dec 14, 2016
@jefferai jefferai modified the milestones: 0.6.4, 0.6.5 Dec 16, 2016
@xiang90
Copy link
Contributor Author

xiang90 commented Dec 19, 2016

@jefferai @vishalnayak This pr touches vendor, and keep on getting conflicts with other prs. I will redo the vendoring part after you review it.

@xiang90
Copy link
Contributor Author

xiang90 commented Dec 28, 2016

@jefferai @vishalnayak Can you take a look? Thanks!

Copy link
Member

@jefferai jefferai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good!

// EtcdBackend is a physical backend that stores data at specific
// prefix within Etcd. It is used for most production situations as
// it allows Vault to run on multiple machines in a highly-available manner.
type EtcdBackend struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the "official" etcd backend going forward will be the etcd3 one that you are maintaining, I think this should be called Etcd2Backend and the current Etcd3Backend should remain as-is or be the EtcdBackend. This just makes it clearer to viewers of the code that either they're side-by-side or that the 3 one is the "official" one. Minor optics, but those are nice! :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will change.

sync = "yes"
}
switch sync {
case "yes", "true", "y", "1":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ability to use yes something etcd-specific? Otherwise I'd prefer strconv.ParseBool behavior to keep things consistent with most of the rest of the codebase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it is not etcd-specific. this is from the previous code. do we want to keep the compatibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, shucks. That's annoying. I guess my preference would be to have compatibility with the rest of the bools in the config file. This seems like a very minor but worthwhile break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will change.

}
return &Entry{
Key: key,
Value: resp.Kvs[0].Value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the conditions under which etcd allows multiple values at the same path? My guess is that there can be a "file" stored at the same location as a "directory". If this is the case, is there a guarantee that the "file" will always be first in the list? And, can we return nil, nil if the only value that is returned is a "directory"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. etcd supports range query. when you get a key, it either does not exist or return one key. we check the non-exist case at line 148.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK -- I took a look through the API and see what it's doing. It always returns a list even if the range option isn't used. Can you add in a check that returns an error just to be safe/clean (e.g. against future accidental changes to the code)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

return keys, nil
}

func (e *Etcd3Backend) HAEnabled() bool { return e.haEnabled }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style nitpick: we don't do in-line functions...can you put this on the next line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure


func (e *Etcd3Backend) HAEnabled() bool { return e.haEnabled }

// EtcdLock emplements a lock using and etcd backend.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/and/an

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix.

return nil, err
}

p := path.Join(c.path, "/__lock__/", key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key should be the value given in the function plus the path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a requirement? the problem is that our lock implementation does not support lock on a real key. it actually locks on a virtual key.

if you specify a lock with key foo, it will create a queue under prefix foo like:

foo/1
foo/2

So if you have a client read the value of key foo, it wont return the value of foo/1. I am not sure if this is OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From within Vault, we do need to be able to look up the current lock UUID value via creating a lock object with LockWith and calling the lock's Value method to get the current lock value. I think that the actual location of the lock is somewhat unimportant. But, how does adding the __lock__ path actually fixes this problem (if indeed you think it's a problem)...doesn't that just mean you have the same problem at the new path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jefferai

LockWith and LockValue work perfectly now.

The problem here is key conflicts. If we do not add any special prefix, this can happen:

Say you create a lock with lock (A). If one tries to get key (A). they wont see it or see the wrong value.

The etcd2 implementation also has the lock prefix defined as EtcdNodeLockPrefix.

If you think we should not have a special lock prefix, I can remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vault will never write a K/V value to core/lock -- it is always meant for the lock. Whether the lock is virtual or physical is somewhat of an implementation-specific concern -- so long as when getting the lock at (A) they see the correct value, everything should be fine.

If that matches the behavior the code should produce, then let's drop the lock prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

return false, "", nil
}

return true, string(resp.Kvs[0].Value), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here about the length of the Kvs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

answered above.

@jefferai
Copy link
Member

Hi @xiang90 ,

Sorry for the delay -- we had to roll a quick 0.6.4 and we're currently on holidays (https://groups.google.com/d/msg/vault-tool/Yq4Tcx9WXHo/NtOU7hJAEgAJ) but I didn't want to leave you hanging so I just did a review. Looking pretty good overall.

I'm fine with merging in the PR when it's done and having the other items added in subsequent PRs; we'll just not enable the etcd3 backend until those items are finalized.

@xiang90
Copy link
Contributor Author

xiang90 commented Dec 29, 2016

@jefferai All fixed. Note that I changed the lock test a little bit since it tries to lock on a conflicting key created by previous kv operation.

Copy link
Member

@jefferai jefferai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few very minor things!

@@ -280,6 +286,9 @@ func (c *Etcd3Lock) Value() (bool, string, error) {
if len(resp.Kvs) == 0 {
return false, "", nil
}
if len(resp.Kvs) > 1 {
panic("unexpected number of keys from a get request")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -148,13 +149,16 @@ func (c *Etcd3Backend) Get(key string) (*Entry, error) {
if len(resp.Kvs) == 0 {
return nil, nil
}
if len(resp.Kvs) > 1 {
panic("unexpected number of keys from a get request")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't panic. An error here will be sufficient to log an issue and fail the client request.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

@@ -283,7 +283,7 @@ func testBackend_ListPrefix(t *testing.T, b Backend) {

func testHABackend(t *testing.T, b HABackend, b2 HABackend) {
// Get the lock
lock, err := b.LockWith("foo", "bar")
lock, err := b.LockWith("/core/lock/foo", "bar")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change? You said something about the backend not being cleaned up from the previous test, but I didn't quite follow the issue.

By far the nicest thing would be to actually do what you can see the Consul code doing (and some other backend code not in physical) and use dockertest. This will allow all of the etcd code to be tested on every CI run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the backend. the previous test (testBackend_ListPrefix) left some keys. i can change testBackend_ListPrefix to make it cleanup stuff.

I thought we'd better simulate the true behavior. You mentioned that vault will do locking under namespace /core/lock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it doesn't do locking under that namespace; core/lock is the actual lock path :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. i will make the change to let testBackend_ListPrefix clean up its garbage.

@xiang90
Copy link
Contributor Author

xiang90 commented Dec 30, 2016

@jefferai All fixed. Let me know if it looks good. I will resolve merge conflicts and squash commits.

@jefferai
Copy link
Member

LGTM! Resolve the conflicts and we'll 🚢

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@xiang90
Copy link
Contributor Author

xiang90 commented Jan 3, 2017

@jefferai all fixed. PTAL. I will do some manual e2e testing and run the e2e tests in vault in the next few days.

I already watch vault repo, but my github is very noisy. So please cc me if any user reports etcd3 related issue.

@jefferai jefferai merged commit e086a81 into hashicorp:master Jan 3, 2017
@jefferai
Copy link
Member

jefferai commented Jan 3, 2017

Merged!

Will watch for the next changes to come in...

@xiang90 xiang90 deleted the etcd3 branch January 3, 2017 19:46
@xiang90
Copy link
Contributor Author

xiang90 commented Jan 3, 2017

@jefferai

I tried to modify testing.go to use etcd3 backend. It is not easy due to the re-initialization issue. But I managed to make most of the test passes.

A test fails with

--- FAIL: TestCluster_ForwardRequests (2.09s)
	testing.go:730: Vault is not configured for highly-available mode

But I think I pass in the HA etcd. Do we disable etcd HA somewhere in the codebase explicitly?

@jefferai
Copy link
Member

jefferai commented Jan 3, 2017

No, it just has to be explicitly enabled. Are you modifying that simply to test out the current tests using etcd for HA?

@xiang90
Copy link
Contributor Author

xiang90 commented Jan 3, 2017

Are you modifying that simply to test out the current tests using etcd for HA?

Yes. The current e2e tests are using in-mem backend. I want to swap it to etcd3 to make sure everything works. I changed the coreConfig in a few places. Probably I missed out something.

@jefferai
Copy link
Member

jefferai commented Jan 3, 2017

Probably just need to ensure ha_enabled is set in the etcd3 config.

@xiang90
Copy link
Contributor Author

xiang90 commented Jan 3, 2017

@jefferai Ah. Right. I missed that one. I will give it another try tomorrow. How much confidence can we have if etcd3 can pass all e2e tests? Should I do more testing after that?

@jefferai
Copy link
Member

jefferai commented Jan 3, 2017

Pretty confident. When it comes down to it, the issues the etcd2 backend have don't appear in tests because they're combinations of conditions on the servers/clients/whatever. But for base functionality it should be fine!

chrishoffman added a commit to chrishoffman/vault that referenced this pull request Jan 8, 2017
* upstream/master: (104 commits)
  Rename gRPC request forwarding method
  Split Unseal into Unseal and unsealInternal
  Port some updates
  Update Makefile protoc commands
  Clarify text around redirect addr being required
  etcd3: remove wrong keys checking for prefix request (hashicorp#2231)
  When a JWT wrapping token is returned, audit the inner token both for request and response. This makes it far easier to properly check validity elsewhere in Vault because we simply replace the request client token with the inner value.
  Fixed return types
  Add read locks to LookupToken/ValidateWrappingToken (hashicorp#2232)
  Bump deps
  JWT wrapping tokens (hashicorp#2172)
  Fix generate-root help and progress output.
  prevent startup error when user has multiple private IPs configured locally
  physical: add etcd3 backend (hashicorp#2168)
  Removed unused methods
  Add link to vault-client vc written in go (hashicorp#2225)
  changelog++
  Page results from S3. (hashicorp#2224)
  Update go-syslog package (hashicorp#2219)
  Adds a link to the latest releases CHANGELOG on the downloads.html page (hashicorp#2205)
  ...
@jefferai
Copy link
Member

Hi @xiang90 , still waiting on a PR to do the auto-discovery of API version. We are thinking of doing 0.6.5 soon so it'd be nice to get that in so we don't have to disable the etcd3 backend.

@xiang90
Copy link
Contributor Author

xiang90 commented Jan 23, 2017

@jefferai

Shall we release without auto discovery or not enabling it by default even if we have it? Initially user can specify the etcd version. When we feel more confident about etcd3, we can add auto discovery or turn it on by default.

I kind of want user to be aware of that they are using etcd3 backend.

What do you think?

@jefferai
Copy link
Member

My view on this is: you've done a ton of testing with the etcd3 backend (probably more than ever got tested with etcd2), and the HA behavior of the etcd3 backend could hardly be worse than that of the etcd2 backend which is known horrifically broken. So I'd be happy enabling that whenever possible but keeping the ability for the user to specifically select the etcd2 backend.

I guess whether auto discovery is needed comes down to the install base. If most etcd installs out there support v3, it might be okay to simply put a warning in the changelog and release notes instructing people how to maintain v2 if they need it.

@xiang90
Copy link
Contributor Author

xiang90 commented Jan 23, 2017

My view on this is: you've done a ton of testing with the etcd3 backend (probably more than ever got tested with etcd2), and the HA behavior of the etcd3 backend could hardly be worse than that of the etcd2 backend which is known horrifically broken. So I'd be happy enabling that whenever possible but keeping the ability for the user to specifically select the etcd2 backend.

Makes sense.

I guess whether auto discovery is needed comes down to the install base. If most etcd installs out there support v3, it might be okay to simply put a warning in the changelog and release notes instructing people how to maintain v2 if they need it.

I can do a simple auth discovery:

  1. if user explicitly specifies api version, use that api version
  2. if v3 does not exist use v2
  3. if v3 exists and v2 store for vault is empty -> use v3
  4. if v3 exists and v2 store for vault is not empty -> still use v2

What shall we do if etcd server is not up when user starts vault (basically auto discovery fails)? Default to v2 or v3?

@jefferai
Copy link
Member

Generally if the physical backend can't successfully be started Vault will refuse to start. So I'd say default to v3 (unless overridden in the config file), but it maybe should be an error (depends on how you coded it and how resilient it is to reconnecting like that).

@xiang90
Copy link
Contributor Author

xiang90 commented Jan 23, 2017

@jefferai Got it. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants