-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow vic-machine configure to set appropriate roles for ops user #7777
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7777 +/- ##
==========================================
+ Coverage 25.85% 25.86% +0.01%
==========================================
Files 35 35
Lines 5124 5122 -2
==========================================
Hits 1325 1325
+ Misses 3692 3690 -2
Partials 107 107
Continue to review full report at Codecov.
|
@@ -327,7 +328,7 @@ func (c *Configure) Run(clic *cli.Context) (err error) { | |||
|
|||
validator, err := validate.NewValidator(op, c.Data) | |||
if err != nil { | |||
op.Errorf("Configuring cannot continue - failed to create validator: %s", err) | |||
op.Errorf("Configure cannot continue - failed to create validator: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
lib/install/management/configure.go
Outdated
err = d.update(conf, settings, isConfigureOp) | ||
|
||
// If successful try to grant permissions to the ops-user | ||
// try to grant permissions to the ops-user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Grant permissions to the ops-user before initializing the appliance
lib/migration/feature/feature.go
Outdated
@@ -28,6 +28,8 @@ const ( | |||
// create time is stored in nanoseconds (previously seconds) in the portlayer. | |||
ContainerCreateTimestampVersion | |||
|
|||
VMFolderSupportVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: s/VMFolderSupportVersion/VCHFolderSupportVersion perhaps?
A short comment to describe this field for posterity would be nice to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I wondered whether it was a catch-all (VM) or exclusive to the VCH. I'll clarify.
1964072
to
3e2b28e
Compare
lib/install/management/configure.go
Outdated
@@ -120,6 +126,8 @@ func (d *Dispatcher) Configure(vch *vm.VirtualMachine, conf *config.VirtualConta | |||
} | |||
} | |||
|
|||
err = d.update(conf, settings, isConfigureOp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason that this happens before is that there's no way to rollback opsuser.GrantOpsUserPerms
if this fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize this issue:
Today, configure (including upgrade) is organized in the following way:
- A new configuration for the VCH is constructed based on the old configuration
and requested changes. - The new configuration is validated.
- The updated version of any ISO files being changed are uploaded.
- Resource settings are updated, and a defer is registered to undo the change
if subsequent steps fail. - A snapshot is taken.
- The VCH is updated:
a. If running, it is powered off.
b. If volume stores are being added, those are created.
c. The VCH itself is reconfigured.
d. The VCH is powered on.
e. We wait for the VCH to start and the service to be ready. - If the update is successful, we update the operations user.
- If either the update or the operations user update is unsuccessful, we undo:
a. We rollback to the snapshot we took.
b. We delete uploaded images.
c. We delete the snapshot we took.
d. The defer undoes resource settings updates.
This process has a key characteristic: if a failure occurs, we cleanly return to
the initial state.
However, this process also has a key limitation: the process will fail at (6)(d)
or (6)(e) if the operations user requires additional privileges for the basic
operation of the VCH following the configuration change or upgrade.
A simple solution to this might be to reverse the order of steps 6 and 7: update
the operations user before updating the VCH. This will address the limitation,
but will weaken the guarantee around cleanly returning to the initial state when
a failure occurs; it's not clear how to undo the changes to the operations user.
The effects of this are twofold:
- Following the rollback of a change where additional privileges were granted
to the operations user, those additional privileges are left as "cruft". - Following the rollback of a change where privileges were revoked from the
operations user, the VCH will not start.
Options to improve this might include:
- Looking into ways to undo changes to the operations user, perhaps following a
similar pattern to the resource settings. This requires reading the old state
from the system before making changes, including both the role-privilege map
and resource-role associations. - Moving the operations logic in between (6)(c) and (6)(d) instead of between
(5) and (6). This reduces the cases where we will leave cruft behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not extremely familiar with this code. But I believe that we must determine what permissions the ops-user has before we begin to assemble the perms that are going to be needed in order to function. If we store that we could defere a configure back to those original permissions. @zjs why not make the ops-user changes before creating the volume-stores as well? They are created by vic-machine, but if we fail to assign the ops-user that will leave 1 less set of things to clean up.
In the event of a failure what is the downside to returning the supplied ops-user to it's original permissions set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the event of a failure what is the downside to returning the supplied ops-user to it's original permissions set?
Returning the operations user to the original permissions set is the desired behavior.
But I believe that we must determine what permissions the ops-user has before we begin to assemble the perms that are going to be needed in order to function. If we store that we could defere a configure back to those original permissions.
This is a possible approach, but actually extracting the "before" state is non-trivial. We need to track the changes being made to the privileges for each role as well as the roles being applied to each resource. Because of the way the vSphere APIs work, it's somewhat complicated to read all of that information. Because we use share the same set of roles for multiple operations users, safely rolling back in the presence of concurrent upgrade operations may not be straight-forward.
why not make the ops-user changes before creating the volume-stores as well? They are created by vic-machine, but if we fail to assign the ops-user that will leave 1 less set of things to clean up.
Currently, the operations user changes is the hardest thing to roll back (now tracked by #7814), so we do those as late in the process as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #7814 (comment) to describe an alternate pattern, which is likely to be less work than tracking the old privileges/roles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think #7777 (review) needs to be resolved before merging this.
I've moved the logic that grants the ops user permissions into the |
b39ca1e
to
3420762
Compare
@@ -204,6 +204,7 @@ func (c *Configure) copyChangedConf(o *config.VirtualContainerHostConfigSpec, n | |||
if c.OpsCredentials.IsSet { | |||
o.Username = n.Username | |||
o.Token = n.Token | |||
o.GrantPermsLevel = n.GrantPermsLevel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether this is the behavior we want. (I'm also not sure that it isn't.)
By including o.GrantPermsLevel = n.GrantPermsLevel
in this check, if an administrator uses vic-machine configure
to change an operations user's password which previously had a GrantPermsLevel
of AddPerms
without explicitly including --ops-grant-perms
in that command we'll actually downgrade GrantPermsLevel
to ""
. I think that would be surprising behavior: change the password (e.g., due to expiration) and no longer have permissions be automatically managed.
It may be better to have a check here along the lines of clic.IsSet("ops-grant-perms")
so that we only adjust GrantPermsLevel
if --ops-grant-perms
or --ops-grant-perms=false
is included on the command line. (To do this, you'd just need to pass clic *cli.Context
into the copyChangedConf
method.) In that case, we only change the properties a user has asked us to change.
A downside of both the current implementation and this suggestion (and therefore an upside of the proposed change) is that if you change to a completely new operations user we won't clear the GrantPermsLevel
. This could also be surprising: this is a case where the user might expect us to change something even though they haven't asked us to (because they may see --ops-grant-perms
as being associated with the user, not the VCH). Unfortunately, the way configure
is designed really doesn't give us a lot of options here (and leads to similar issues for other settings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. There was no clean way to get the name of the option out of the cli flags, nor did there seem to be much reason to declare a constant for it all on its own, but if you find it irksome, I can pull it out as a constant anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just minor comments on test code.
Log Govc output: ${output} | ||
Should Be Equal As Integers ${rc} 1 | ||
Should Contain ${output} Permission to perform this operation was denied | ||
Attempt To Create Resource Pool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be Attempt To Disable DRS
, can't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be, but it was basically a merge of your additions and @anchal-agrawal's additions, so I left both Attempt To Create Resource Pool
and Attempt To Disable DRS
in there for variety. Still trying to get granted ops-user perms work after upgrade
to pass, so I'll clean it up as I go forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, disabling DRS requires less privilege than creating a resource pool. Usually disabling DRS makes a good sanity check that we're not granting the operations user privileges they don't need.
In the specific case that we're using --affinity-vm-group
, we currently have to grant the operations user the privilege that lets them disable DRS (as that's the same privilege that lets them manage affinity constructs), so we check creating a resource pool instead.
|
||
Run Privileged Commands | ||
|
||
Cleanup VIC Appliance On Test Server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline at end of file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved for cherry pick into 1.4
This change fixes a bug in
vic-machine configure
that was preventing a VCH installed without an ops user enabled to be reconfigured to do so. With this change, you can now runvic-machine configure
against an existing VCH to configure the ops user credentials and permissions.Fixes #7725, #7796