Allow vic-machine configure to set appropriate roles for ops user #7777

jzt · 2018-04-18T19:17:00Z

This change fixes a bug in vic-machine configure that prevented a VCH installed without an ops user from being reconfigured to enable one. With this change, you can run vic-machine configure against an existing VCH to configure the ops user credentials and permissions.

Fixes #7725, #7796

codecov-io · 2018-04-18T19:37:35Z

Codecov Report

Merging #7777 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #7777      +/-   ##
==========================================
+ Coverage   25.85%   25.86%   +0.01%     
==========================================
  Files          35       35              
  Lines        5124     5122       -2     
==========================================
  Hits         1325     1325              
+ Misses       3692     3690       -2     
  Partials      107      107

Impacted Files	Coverage Δ
cmd/vic-machine/common/ops_credentials.go	`63.79% <100%> (ø)`	⬆️
cmd/vicadmin/vicadm.go	`2.82% <0%> (ø)`	⬆️
cmd/vic-machine/create/create.go	`42.51% <0%> (+0.07%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6355f0...8325d30. Read the comment docs.

anchal-agrawal · 2018-04-18T19:49:03Z

cmd/vic-machine/configure/configure.go

@@ -327,7 +328,7 @@ func (c *Configure) Run(clic *cli.Context) (err error) {

 	validator, err := validate.NewValidator(op, c.Data)
 	if err != nil {
-		op.Errorf("Configuring cannot continue - failed to create validator: %s", err)
+		op.Errorf("Configure cannot continue - failed to create validator: %s", err)


anchal-agrawal · 2018-04-18T19:53:18Z

lib/install/management/configure.go

-	err = d.update(conf, settings, isConfigureOp)
-
-	// If successful try to grant permissions to the ops-user
+	// try to grant permissions to the ops-user


Minor: Grant permissions to the ops-user before initializing the appliance

anchal-agrawal · 2018-04-18T19:54:16Z

lib/migration/feature/feature.go

@@ -28,6 +28,8 @@ const (
 	// create time is stored in nanoseconds (previously seconds) in the portlayer.
 	ContainerCreateTimestampVersion

+	VMFolderSupportVersion


Minor: s/VMFolderSupportVersion/VCHFolderSupportVersion perhaps?

A short comment to describe this field for posterity would be nice to have.

Yeah I wondered whether it was a catch-all (VM) or exclusive to the VCH. I'll clarify.

zjs · 2018-04-23T16:38:52Z

lib/install/management/configure.go

@@ -120,6 +126,8 @@ func (d *Dispatcher) Configure(vch *vm.VirtualMachine, conf *config.VirtualConta
 		}
 	}

+	err = d.update(conf, settings, isConfigureOp)


I think the reason that this happens before is that there's no way to rollback opsuser.GrantOpsUserPerms if this fails.

To summarize this issue:

Today, configure (including upgrade) is organized in the following way:

1. A new configuration for the VCH is constructed based on the old configuration
   and requested changes.

2. The new configuration is validated.

3. Updated versions of any ISO files being changed are uploaded.

4. Resource settings are updated, and a defer is registered to undo the change
   if subsequent steps fail.

5. A snapshot is taken.

6. The VCH is updated:
   a. If running, it is powered off.
   b. If volume stores are being added, those are created.
   c. The VCH itself is reconfigured.
   d. The VCH is powered on.
   e. We wait for the VCH to start and the service to be ready.

7. If the update is successful, we update the operations user.

8. If either the update or the operations user update is unsuccessful, we undo:
   a. We roll back to the snapshot we took.
   b. We delete uploaded images.
   c. We delete the snapshot we took.
   d. The defer undoes resource settings updates.

This process has a key characteristic: if a failure occurs, we cleanly return to
the initial state.
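The "cleanly return to the initial state" property can be sketched as an ordered list of steps, each paired with an undo, where a failure unwinds the completed steps in reverse. This is only an illustration of the pattern, not the actual Dispatcher code: the `step` type, `runWithRollback`, and the step names are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// step is a hypothetical unit of work: do applies a change, undo reverts it.
// undo may be nil for a step that has nothing to revert.
type step struct {
	name string
	do   func() error
	undo func()
}

// runWithRollback applies steps in order. On the first failure it calls undo
// for every step that already completed, in reverse order, and returns the error.
func runWithRollback(steps []step, log *[]string) error {
	var done []step
	for _, s := range steps {
		if err := s.do(); err != nil {
			for i := len(done) - 1; i >= 0; i-- {
				if done[i].undo != nil {
					done[i].undo()
					*log = append(*log, "undo "+done[i].name)
				}
			}
			return fmt.Errorf("%s failed: %w", s.name, err)
		}
		*log = append(*log, "do "+s.name)
		done = append(done, s)
	}
	return nil
}

// demo runs two reversible steps followed by a failing one.
func demo() ([]string, error) {
	var log []string
	noop := func() error { return nil }
	steps := []step{
		{name: "resource settings", do: noop, undo: func() {}},
		{name: "snapshot", do: noop, undo: func() {}},
		{name: "update VCH", do: func() error { return errors.New("power on timed out") }},
	}
	err := runWithRollback(steps, &log)
	return log, err
}

func main() {
	log, err := demo()
	fmt.Println(log)
	fmt.Println(err)
}
```

A step with no usable undo, which is where the operations user update currently stands, is exactly what weakens this guarantee if it runs early in the sequence.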

However, this process also has a key limitation: the process will fail at (6)(d)
or (6)(e) if the operations user requires additional privileges for the basic
operation of the VCH following the configuration change or upgrade.

A simple solution to this might be to reverse the order of steps 6 and 7: update
the operations user before updating the VCH. This will address the limitation,
but will weaken the guarantee around cleanly returning to the initial state when
a failure occurs; it's not clear how to undo the changes to the operations user.

The effects of this are twofold:

1. Following the rollback of a change where additional privileges were granted
   to the operations user, those additional privileges are left as "cruft".

2. Following the rollback of a change where privileges were revoked from the
   operations user, the VCH will not start.

Options to improve this might include:

1. Looking into ways to undo changes to the operations user, perhaps following a
   similar pattern to the resource settings. This requires reading the old state
   from the system before making changes, including both the role-privilege map
   and resource-role associations.

2. Moving the operations user logic in between (6)(c) and (6)(d) instead of
   between (5) and (6). This reduces the cases where we will leave cruft behind.

I am not extremely familiar with this code. But I believe that we must determine what permissions the ops-user has before we begin to assemble the perms that are going to be needed in order to function. If we store that we could defer a configure back to those original permissions. @zjs why not make the ops-user changes before creating the volume-stores as well? They are created by vic-machine, but if we fail to assign the ops-user that will leave 1 less set of things to clean up.

In the event of a failure what is the downside to returning the supplied ops-user to its original permissions set?

In the event of a failure what is the downside to returning the supplied ops-user to its original permissions set?

Returning the operations user to the original permissions set is the desired behavior.

But I believe that we must determine what permissions the ops-user has before we begin to assemble the perms that are going to be needed in order to function. If we store that we could defer a configure back to those original permissions.

This is a possible approach, but actually extracting the "before" state is non-trivial. We need to track the changes being made to the privileges for each role as well as the roles being applied to each resource. Because of the way the vSphere APIs work, it's somewhat complicated to read all of that information. Because we share the same set of roles across multiple operations users, safely rolling back in the presence of concurrent upgrade operations may not be straightforward.

why not make the ops-user changes before creating the volume-stores as well? They are created by vic-machine, but if we fail to assign the ops-user that will leave 1 less set of things to clean up.

Currently, the operations user changes are the hardest thing to roll back (now tracked by #7814), so we do those as late in the process as possible.

Added #7814 (comment) to describe an alternate pattern, which is likely to be less work than tracking the old privileges/roles.

zjs

I think #7777 (review) needs to be resolved before merging this.

jzt · 2018-04-23T19:46:57Z

I've moved the logic that grants the ops user permissions into the update function itself, right before the poweron step. This should narrow (possibly close) the window for errors between granting the ops user permissions and powering on the VM, reducing the likelihood of leaving the crufty additional privileges lying around after a rollback. It does not fully eliminate this possibility, however, so a new issue has been created here that @zjs will introduce in his changes for 1.5.

zjs · 2018-04-23T20:37:02Z

cmd/vic-machine/configure/configure.go

@@ -204,6 +204,7 @@ func (c *Configure) copyChangedConf(o *config.VirtualContainerHostConfigSpec, n
 	if c.OpsCredentials.IsSet {
 		o.Username = n.Username
 		o.Token = n.Token
+		o.GrantPermsLevel = n.GrantPermsLevel


I'm not sure whether this is the behavior we want. (I'm also not sure that it isn't.)

By including o.GrantPermsLevel = n.GrantPermsLevel in this check, if an administrator uses vic-machine configure to change the password of an operations user that previously had a GrantPermsLevel of AddPerms, without explicitly including --ops-grant-perms in that command, we'll actually downgrade GrantPermsLevel to "". I think that would be surprising behavior: change the password (e.g., due to expiration) and no longer have permissions be automatically managed.

It may be better to have a check here along the lines of clic.IsSet("ops-grant-perms") so that we only adjust GrantPermsLevel if --ops-grant-perms or --ops-grant-perms=false is included on the command line. (To do this, you'd just need to pass clic *cli.Context into the copyChangedConf method.) In that case, we only change the properties a user has asked us to change.
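A minimal sketch of that suggestion, written against a small interface rather than the real types so it stands alone. `flagContext`, `opsConfig`, `copyOpsConf`, and `fakeContext` are hypothetical names; the only thing assumed about the real code is that `*cli.Context` offers `IsSet(name string) bool`.

```go
package main

import "fmt"

// flagContext is the one method this sketch needs from the CLI context.
// It is assumed here that *cli.Context satisfies it via IsSet.
type flagContext interface {
	IsSet(name string) bool
}

// opsConfig is a hypothetical, trimmed-down stand-in for the ops user
// fields of the VCH config spec.
type opsConfig struct {
	Username        string
	Token           string
	GrantPermsLevel string
}

// copyOpsConf copies credentials when they were supplied, but only touches
// GrantPermsLevel when --ops-grant-perms was explicitly given.
func copyOpsConf(ctx flagContext, credsSet bool, o, n *opsConfig) {
	if !credsSet {
		return
	}
	o.Username = n.Username
	o.Token = n.Token
	if ctx.IsSet("ops-grant-perms") {
		o.GrantPermsLevel = n.GrantPermsLevel
	}
}

// fakeContext records which flags were passed, for demonstration only.
type fakeContext map[string]bool

func (f fakeContext) IsSet(name string) bool { return f[name] }

func main() {
	old := &opsConfig{Username: "ops", Token: "old", GrantPermsLevel: "AddPerms"}
	updated := &opsConfig{Username: "ops", Token: "new"}

	// Password change only: the flag was not passed, so the level is kept.
	copyOpsConf(fakeContext{}, true, old, updated)
	fmt.Printf("%+v\n", *old)
}
```

With this shape, a password-only change leaves GrantPermsLevel at AddPerms, while passing --ops-grant-perms=false would still clear it.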

A downside of both the current implementation and this suggestion (and therefore an upside of the proposed change) is that if you change to a completely new operations user we won't clear the GrantPermsLevel. This could also be surprising: this is a case where the user might expect us to change something even though they haven't asked us to (because they may see --ops-grant-perms as being associated with the user, not the VCH). Unfortunately, the way configure is designed really doesn't give us a lot of options here (and leads to similar issues for other settings).

Good catch. There was no clean way to get the name of the option out of the cli flags, nor did there seem to be much reason to declare a constant for it all on its own, but if you find it irksome, I can pull it out as a constant anyway.

zjs

LGTM. Just minor comments on test code.

zjs · 2018-04-23T22:14:43Z

tests/manual-test-cases/Group5-Functional-Tests/5-25-OPS-User-Grant.robot

-    Log  Govc output: ${output}
-    Should Be Equal As Integers  ${rc}  1
-    Should Contain  ${output}  Permission to perform this operation was denied
+    Attempt To Create Resource Pool


I think this can be Attempt To Disable DRS, can't it?

Could be, but it was basically a merge of your additions and @anchal-agrawal's additions, so I left both Attempt To Create Resource Pool and Attempt To Disable DRS in there for variety. Still trying to get granted ops-user perms work after upgrade to pass, so I'll clean it up as I go forward.

Ah, disabling DRS requires less privilege than creating a resource pool. Usually disabling DRS makes a good sanity check that we're not granting the operations user privileges they don't need.

In the specific case that we're using --affinity-vm-group, we currently have to grant the operations user the privilege that lets them disable DRS (as that's the same privilege that lets them manage affinity constructs), so we check creating a resource pool instead.

zjs · 2018-04-23T22:14:54Z

tests/manual-test-cases/Group5-Functional-Tests/5-25-OPS-User-Grant.robot

+
+    Run Privileged Commands
+
+    Cleanup VIC Appliance On Test Server


Missing newline at end of file.

mdubya66

Approved for cherry pick into 1.4

jzt requested review from cgtexmex and anchal-agrawal April 18, 2018 19:17

vmwclabot added the cla-not-required label Apr 18, 2018

anchal-agrawal approved these changes Apr 18, 2018

jzt force-pushed the ops/7725 branch 4 times, most recently from 1964072 to 3e2b28e Compare April 19, 2018 19:53

cgtexmex approved these changes Apr 23, 2018

zjs reviewed Apr 23, 2018

zjs requested changes Apr 23, 2018

matthewavery approved these changes Apr 23, 2018

jzt force-pushed the ops/7725 branch from 3e2b28e to 5c1bc91 Compare April 23, 2018 19:24

jzt mentioned this pull request Apr 23, 2018

Ops user changes should be rolled back when attempt at VCH reconfigure fails #7814

Open

2 tasks

jzt force-pushed the ops/7725 branch 3 times, most recently from b39ca1e to 3420762 Compare April 23, 2018 20:42

zjs reviewed Apr 23, 2018

jzt force-pushed the ops/7725 branch from 3420762 to 160cc03 Compare April 23, 2018 22:01

zjs approved these changes Apr 23, 2018

This was referenced Apr 23, 2018

Support add and remove VM Group via configure #7797

Merged

Nightly 6.0 - 2018-04-18 - 5-25-OPS-User-Grant: Cannot init ops-user permissions #7796

Closed

Add VCHFolderSupport plugin version

a97b834

jzt force-pushed the ops/7725 branch from 160cc03 to 3faa0ad Compare April 24, 2018 21:19

Allow configure to grant permissions to ops user

8325d30

jzt force-pushed the ops/7725 branch from 3faa0ad to 8325d30 Compare April 24, 2018 21:38

mdubya66 approved these changes Apr 25, 2018

mdubya66 merged commit f243850 into vmware:master Apr 25, 2018

jzt added a commit to jzt/vic that referenced this pull request Apr 25, 2018

Allow vic-machine configure to set appropriate roles for ops user (vmware#7777)

36c1a7d

jzt mentioned this pull request Apr 25, 2018

Allow vic-machine configure to set appropriate roles for ops user #7834

Merged

jzt added a commit that referenced this pull request Apr 25, 2018

Allow vic-machine configure to set appropriate roles for ops user (#7777)

6600412

jzt added a commit to jzt/vic that referenced this pull request Apr 25, 2018

Allow vic-machine configure to set appropriate roles for ops user (vmware#7777)

acca699

jzt deleted the ops/7725 branch April 26, 2018 15:51

stuclem mentioned this pull request Apr 27, 2018

Document vic-machine configure --ops-grant-perms vmware/vic-product#1656

Closed
