
Implement multi-node ESX support in tenancy/auth code #1032

Closed
msterin opened this issue Mar 14, 2017 · 7 comments

@msterin

msterin commented Mar 14, 2017

Implement per the following assumption and desired behavior:

Assumption: Cross-ESX multi-tenancy works on shared datastores only.
Approach: a centralized backend DB on shared storage, with symlinks from each host's local /etc/vmware (illustrated below).
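For illustration, a minimal sketch of the intended layout; the datastore name "sharedDS" and the DB file name are taken from paths mentioned later in this thread:

# On each ESX host, the local config path is a symlink into the shared
# datastore, so every host reads and writes the same authorization DB.
ls -l /etc/vmware/vmdkops/auth-db
# lrwxrwxrwx ... auth-db -> /vmfs/volumes/sharedDS/dockvols/vmdkops_config.db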

Behavior changes:

Milestone 1

  • Admin needs to run "config init" on each ESX (a minimal usage sketch follows this list): vmdkops_admin config init --datastore=<DS_NAME>
  • DS_NAME must be a shared datastore to support cross-ESX multi-tenancy. If a local datastore is provided, everything still works, but multi-tenancy is limited to that single ESX host.
  • All ESX drivers will share the same auth configuration, persisted in a single database.
  • If "config init" is not done, there is no authorization - any request from any Docker host is executed without restriction. To preserve backward compatibility, all volumes are still created under the dockvols/_DEFAULT folder.

Milestone 2

  • Admin can run "config init" on any ESX: vmdkops_admin config init --datastore=<DS_NAME>
  • All other ESX hosts that have access to the same shared datastore will discover the configuration automatically.
  • Admin can run "config rm" to switch to a different shared datastore (instead of doing a "config rm" followed by a "config init").
@msterin msterin added this to the 0.13 milestone Mar 14, 2017
@msterin msterin self-assigned this Mar 14, 2017
@msterin msterin changed the title from "Implement cross-ESX support in tenancy/auth code" to "Implement multi-node ESX support in tenancy/auth code" Mar 25, 2017
@tusharnt tusharnt modified the milestones: 0.14, 0.13 Mar 30, 2017
@shuklanirdesh82

Milestone 2
Admin can run "config rm" to switch to a different shared datastore (instead of doing "config rm" followed by a "config init")

@msterin / @shaominchen
Question: shouldn't it be "Admin can run config mv" to switch to a different shared datastore?

@msterin

msterin commented Apr 2, 2017

mv moves the existing data (it will move it once implemented). rm + init switches to using another DB.
We will probably have to rename rm to unlink for the shared DB.
@ashahi1
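For clarity, a sketch of the rm + init switch described above; the flags follow the examples later in this thread, and the datastore name is a placeholder:

# Unlink the current config DB, then point the host at a different shared DS
vmdkops_admin config rm --confirm
vmdkops_admin config init --datastore=newSharedDS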

@tusharnt tusharnt added the P0 label Apr 4, 2017
@vxav

vxav commented Jun 5, 2017

Question:
I started my 2-node lab as a single node with a local DB file.
I later moved auth-db to a shared datastore and created a symlink, as the "move" parameter doesn't seem to be supported yet (equivalent to a 'config init --datastore "sharedDS"'); a sketch of the manual move follows.
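A minimal sketch of that manual move, assuming the paths and DB file name used elsewhere in this thread:

# Move the local DB onto the shared datastore, then link the local path to it
mv /etc/vmware/vmdkops/auth-db /vmfs/volumes/sharedDS/dockvols/vmdkops_config.db
ln -s /vmfs/volumes/sharedDS/dockvols/vmdkops_config.db /etc/vmware/vmdkops/auth-db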

However, the symlink doesn't seem to persist, so every time the host is restarted auth-db isn't found and _DEFAULT is used.

06/05/17 09:43:10 67487 [Thread-2] [INFO   ] Checking DB mode for /etc/vmware/vmdkops/auth-db...
06/05/17 09:43:10 67487 [Thread-2] [INFO   ] Config DB does not exist. mode NotConfigured
06/05/17 09:43:10 67487 [Thread-2] [INFO   ] Auth DB /etc/vmware/vmdkops/auth-db is missing, allowing all access

I tried the following to start fresh:

  • config init --local
  • config rm --local --confirm
  • config init --datastore SHARED-Datastore

But the automatically created symlink still doesn't persist after reboot.

A workaround would be to add 'vmdkops_admin config init --datastore "shared-DS"' to the startup script, but it doesn't seem like a long-term solution.

Any idea on how to make it persistent?

@msterin

msterin commented Jun 5, 2017

@vxav - it's a bug (#1347); we'll release the fix shortly.

Meanwhile, you can refer to KB2043564 and, if you are on ESXi 5.1/5.5/6.x, add the following to /etc/rc.local.d/local.sh before exit 0:

# Recreate the auth-db symlink at boot if it is missing
if [ ! -e /etc/vmware/vmdkops/auth-db ]
then
    ln -s /vmfs/volumes/sharedDS/dockvols/vmdkops_config.db /etc/vmware/vmdkops/auth-db
fi

Please validate that the code works in your case before rebooting :-)

@vxav

vxav commented Jun 6, 2017

It does work fine, thanks for looking into it! :)

Last question:
Is it by design that _DEFAULT is used when no config DB is found?
We can remove the _DEFAULT vmgroup in our config, but if the DB is inaccessible it doesn't matter: _DEFAULT is still used.
It feels a little dangerous and complicates troubleshooting. I don't know about everyone else, but I'd rather have my container creation fail because it can't create a volume than have it use a new blank VMDK and end up with two different sets of data.

For example:

  • Photon1 and Photon2 are in the same swarm.
  • ESX1 has access to the shared DB and runs Photon1 <- Photon1 is in vmgroup "MyVmgroup".
  • ESX2 doesn't, for some reason (in my case it was the symlink persistence, but it could be something else), and runs Photon2 <- so Photon2 ends up in vmgroup "_DEFAULT" (still in the same swarm!).
  • One container runs in the swarm on ESX1 with a volume attached containing a custom file Y.txt (so the VMDK is in the "MyVmgroup" folder).
  • Drain-stop Photon1 (simulating an ESX1 failure) -> container1 restarts on Photon2 (ESX2).
  • Because ESX2 can't access the DB file, Photon2 is in vmgroup _DEFAULT, so container1 is mapped to a VMDK in the "_DEFAULT" folder -> Y.txt isn't there anymore (it's a different VMDK).

My concern with this is that:

  • At first glance it looks like it works: the container restarts on another host with a volume, but not the actual volume.
  • It could lead to data inconsistency if the restarted container runs long enough on the problematic host (important data in both VMDKs).
  • The Photon host ends up with full access to all datastores, despite what was configured in the (inaccessible) DB file.

Cheers,

@msterin

msterin commented Jun 6, 2017

Yes, it is by design - when there is no DB, we fall back to "default everything, no security".

  • It was done to be zero-config for simple cases, and to maintain backward compatibility.
  • We also wanted the process of upgrading from "no config / no vmgroups / no quotas" to a configured setup to be seamless - you init the configuration and everything already created is still available in the same places.

I see your point: a config issue (e.g. the one described above) can lead to unpleasant consequences.
But that was a bug. I'll fix it shortly :-).

So when the config is inited (or removed), we'll auto-update local.sh.
If the shared DS is down during an ESX (re)boot, the symlink will point to a non-existent location and vDVS will go into "brokenLink" mode, with all operations denied (it already works this way).
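For illustration, a shell-level sketch of what that state looks like; the check below is an example, not the actual vDVS detection code:

# A dangling symlink: the link exists but its target does not
if [ -L /etc/vmware/vmdkops/auth-db ] && [ ! -e /etc/vmware/vmdkops/auth-db ]
then
    echo "auth-db link is broken - vDVS denies all operations (brokenLink mode)"
fi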

Does that address the concerns? We can have a quick teleconference to chat if you want to; this is an experimental feature and we are interested in learning how people would prefer to use it...

@vxav

vxav commented Jun 7, 2017

I see, that makes sense - I actually hadn't tried breaking the link.

I tried it just now and, as you said, it works as expected: vmdkops goes into broken-link mode and the Docker container consuming a vSphere volume stays in desired state "Ready".

Nice one, cheers!
