MCO-1010: Add node disruption policies to MachineConfiguration CRD #1764
Conversation
@yuqi-zhang: This pull request references MCO-1010, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Hello @yuqi-zhang! Some important instructions when contributing to openshift/api:
Skipping CI for Draft Pull Request.
type NodeDisruptionPolicyStatus struct {
	// clusterPolicies is a merge of cluster default and user provided node disruption policies.
	// +optional
	ClusterPolicies []NodeDisruptionPolicyConfig `json:"clusterPolicies"`
status should use a different type because you're likely to grow different fields
done
	Type NodeDisruptionPolicyActionType `json:"type"`
	// reload specifies the service to reload, only valid if type is reload
	// +optional
	Reload *string `json:"reload,omitempty"`
probably want a struct
Should this be a list, so that the user can specify multiple service names together that need to be reloaded for a NodeDisruptionPolicyType?
Done. I opted to keep them as separate entries so a user can theoretically order them as they wish, applying them in sequence with other actions.
	Value string `json:"value"`
	// actions represents the series of commands to be executed on changes to the corresponding type and value
	// +kubebuilder:validation:Required
	Actions []NodeDisruptionPolicyAction `json:"actions"`
this should probably be atomic
added
// nodeDisruptionPolicySpec allows an admin to set granular node disruption actions for
// MachineConfig-based updates, such as drains, service reloads, etc. Specifying this will allow
// for less downtime when doing small configuration updates to the cluster.
We need to make this very very clear that this doesn't apply to cluster version upgrades
added a line
// nodeDisruptionPolicySpec allows an admin to set granular node disruption actions for
// MachineConfig-based updates, such as drains, service reloads, etc. Specifying this will allow
// for less downtime when doing small configuration updates to the cluster.
// +openshift:enable:FeatureSets=TechPreviewNoUpgrade
You'll need CustomNoUpgrade in here too.
done
// nodeDisruptionPolicyStatus status reflects what the latest cluster-validated policies are,
// and will be used by the Machine Config Daemon during future node updates.
// +openshift:enable:FeatureSets=TechPreviewNoUpgrade
CustomNoUpgrade in here too please.
done
// Service represents a NodeDisruption policy that is in effect for changes to a service.
Service NodeDisruptionPolicyType = "service"

// File represents a NodeDisruption policy that is in effect for changes to a kernel argument.
Suggested change:
- // File represents a NodeDisruption policy that is in effect for changes to a kernel argument.
+ // kernelArgument represents a NodeDisruption policy that is in effect for changes to a kernel argument.
Removed kargs as per review on the enhancement.
Service NodeDisruptionPolicyType = "service"

// File represents a NodeDisruption policy that is in effect for changes to a kernel argument.
KernelArgument NodeDisruptionPolicyType = "kernelArgument"
MCO doesn't have a way today to apply kargs without reboot. Are we thinking of adding a way to apply these kargs live? If not, perhaps we should not let the user skip drain/reboot for kargs in the initial implementation.
removed
None NodeDisruptionPolicyActionType = "none"

// Special represents an action that is internal to the MCO, and is not allowed in user defined NodeDisruption policies.
Special NodeDisruptionPolicyActionType = "special"
Wondering if we have something in mind that MCO will utilize in the beginning?
I'm open to discussion on this, I don't intend this to be user specify-able
Since removing API later is hard, maybe we can add this or a similar field later on, depending upon the use case?
The use case right now is that we would like to display the current cluster defaults, so the user can see (and optionally override) the image registry logic. The initial set we decided on is to have this keyword and description (that you can see via oc describe, etc.) to explain what the default is.
Happy to change this if there's a better route though! Mostly just wanted to be able to show the current cluster settings.
This right now is also tech preview only so we should be able to change it before GA too if needed
Thanks for the context. Let's keep it then and we can update it if needed while API will be in TechPreview. Maybe MCOInternal, MCODefault etc would provide more clarity.
// +unionDiscriminator
// +kubebuilder:validation:Required
Type NodeDisruptionPolicyActionType `json:"type"`
// reload specifies the service to reload, only valid if type is reload
Admins may benefit from a service restart as well; I believe sometimes just reloading a service is not enough to apply the config changes.
added, thanks!
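As a hypothetical sketch of what this discussion settled on (field names follow the snippets quoted in this thread and may differ from the merged schema), an actions list that combines a reload with a restart might look like:

```yaml
# Hypothetical actions list: each entry is a union-discriminated action,
# with the reload/restart sub-object required only for its matching type.
actions:
- type: Reload
  reload:
    serviceName: crio.service
- type: Restart
  restart:
    serviceName: chronyd.service
```

Keeping reload and restart as separate list entries (rather than a single list of service names) lets the user order them relative to other actions.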
Force-pushed df6de7b to a76a939
Force-pushed a76a939 to 9b02645
// nodeDisruptionPolicySpec allows an admin to set granular node disruption actions for
// MachineConfig-based updates, such as drains, service reloads, etc. Specifying this will allow
// for less downtime when doing small configuration updates to the cluster. This is NOT intended
I might change the way this is worded; "intended" to me sounds like we don't want you to do that, but it might work. Perhaps:
// This configuration has no effect on cluster upgrades which will still incur node disruption where required.
A node upgrade always reboots right? So maybe we don't even need the where required on that
Will change, a cluster upgrade in practice almost always has an associated OS update, even if it's just a minor package bump. In all of OCP 4 I think I've seen one z-stream bump not come with an associated update, so the overlap is 0
// clusterDefaultPolicies is managed by the Machine Config Operator, and reflects the latest cluster defaults
// +optional
ClusterDefaultPolicies NodeDisruptionPolicyConfig `json:"clusterDefaultPolicies"`
You probably don't want this in spec if it's expected to be set by the operator rather than the user right?
The current design has it such that there is:
- a user spec, for the user to set
- a cluster spec, for the MCO to set, which may change depending on version
- a status which is the validated merge of the two
The expectation being that it is easier for the user to see what the cluster defaults are.
If we remove this from the spec, then the status will hold the cluster defaults plus anything overridden by the user, which... I guess still mostly achieves our goal? If that's more aligned with API conventions, I'm happy to do it that way instead.
// +patchStrategy=merge
// +listType=map
// +listMapKey=type
Conditions []metav1.Condition `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"`
By convention, this should be the first field in the status struct
Related to the discussion we had in the call, I think this makes more sense living in the MachineConfigurationStatus object directly, which currently has StaticPodOperatorStatus that we were thinking of deprecating. Could you help me point to a resource on how that should be done?
}

// NodeDisruptionPolicyConfig is the overall spec definition for files/units/sshkeys
type NodeDisruptionPolicyConfig struct {
For discoverability on this API, we should drop omitempty, that way the API will be written out by the installer:

nodeDisruptionPolicy:
  userPolicy:
    files: []
    units: []
    sshKey: ...
// userPolicies define user-provided node disruption policies
// +optional
UserPolicies NodeDisruptionPolicyConfig `json:"userPolicies"`
If we do not have cluster scoped default policies in here, wondering if we need this extra indentation. What possible future options might live alongside these?
Off the top of my head, any potential future extensions would be with the new image-based workflow. For example, I could specify some additional non-MachineConfig (image-based) content and specify something here to indicate how it can be applied.
But maybe for now we can just collapse this to the higher level policy and add a new field when that comes into play?
// ReloadService allows the user to specify the services to be reloaded
type ReloadService struct {
	// ServiceName is the full name (e.g. crio.service) of the service to be reloaded
Should be lower case at the start of the godoc. Similar questions above about service and the godoc here
// RestartService allows the user to specify the services to be restarted
type RestartService struct {
	// ServiceName is the full name (e.g. crio.service) of the service to be restarted
Ditto for reload
}

// NodeDisruptionPolicyActionType is a string enum used in a NodeDisruptionPolicyAction object. They describe an action to be performed.
// +kubebuilder:validation:Enum:="reboot";"drain";"reload";"restart";"daemon-reload";"none";"special"
PascalCase these values please
Restart NodeDisruptionPolicyActionType = "restart"

// DaemonReload represents an action that TBD
DaemonReload NodeDisruptionPolicyActionType = "daemon-reload"
Should be DaemonReload
// None represents an action that no handling is required by the MCO.
None NodeDisruptionPolicyActionType = "none"

// Special represents an action that is internal to the MCO, and is not allowed in user defined NodeDisruption policies.
Need to make sure the spec for user defined does not allow this
Force-pushed 9b02645 to b59260a
Fixed based on comments and rebased on master.
Path string `json:"path"`
// actions represents the series of commands to be executed on changes to the file at
// corresponding file path. This is an atomic list, which will be validated by
// the MachineConfigOperator, with any conflicts reflecting as an error in the
I'd be interested to see what a conflict would actually look like in this list; could you provide an example?
Sure, this would be more around if the user sets reboot and none, they shouldn't be able to set other options. And they shouldn't be able to set special (but this point isn't really relevant since we won't expose it in spec anymore after removing cluster defaults).
So I guess we could instead have a validation for list items: if reboot or none exists, they need to be the only entry. The other consideration is that if we want to modify rules in the future and expand on the actions, we'll probably need to have some validation in the MCO? Although that's a bit vague.
Discussed in slack, we can use CEL validation to enforce that reboot or none are singletons, something along the lines of the ternary below
+kubebuilder:validation:XValidation:rule="self.exists(x, x == 'Reboot') ? size(self) == 1 : true"
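As a hypothetical illustration of how that CEL ternary behaves (field names follow the snippets quoted in this review and may differ from the merged schema), an actions list like this would be rejected, because Reboot appears alongside another entry and so size(self) == 1 fails:

```yaml
# Invalid per the CEL rule: Reboot must be the only entry in the list.
actions:
- type: Reboot
- type: Drain
```

A list containing only `- type: Reboot`, or any combination without Reboot, would pass the rule.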
Force-pushed d583e3b to e59c247
Pushed some changes based on the discussion above; now only the user spec should exist in spec, and status will reflect cluster defaults. Added some more validation, and also removed the original StaticPodOperatorStatus. Will open a separate PR to tombstone those fields, so this might not pass CI atm.
Force-pushed e59c247 to f455c3e
Force-pushed e726779 to 4751043
Force-pushed 4751043 to 55178eb
@JoelSpeed I believe @djoshy has covered the latest round of reviews (thanks!). There's just the open question about tombstoning the static operator status. Could this PR merge without that, or would that have to go in first? FWIW, I did another brief check and nothing is currently depending on this (the CRD itself was added to the MCO in 4.15, but no objects exist and no other repo is referring to it outside of the API and MCO). (The latest update was to remove a test file which was accidentally added.)
Force-pushed 55178eb to 5574447
/cc @hexfusion
- lastTransitionTime
- message
- reason
- status
Change to required fields; has this CRD been included in the payload yet?
The only place this is currently referenced is by the MCO code, where I think we technically bring in the CRD. (So I think it does exist in clusters 4.15? and up, but no objects)
What is making this generated field required now and is that an issue?
}

type MachineConfigurationStatus struct {
	StaticPodOperatorStatus `json:",inline"`
Should we revert this for now and handle separately? We can't merge this before the handling of removing the operator status in this hybrid state
// NodeDisruptionPolicyFile is a file entry and corresponding actions to take
type NodeDisruptionPolicyFile struct {
	// path is the file path to a file on disk managed through a MachineConfig.
	// Actions specified will be applied when changes to the file at the path
Not resolved
// +kubebuilder:validation:XValidation:rule=`self.matches('\\.(service|socket|device|mount|automount|swap|target|path|timer|snapshot|slice|scope)$')`, message="Invalid ${SERVICETYPE} in service name. Expected format is ${NAME}${SERVICETYPE}, where ${SERVICETYPE} must be one of \".service\", \".socket\", \".device\", \".mount\", \".automount\", \".swap\", \".target\", \".path\", \".timer\",\".snapshot\", \".slice\" or \".scope\"."
// +kubebuilder:validation:XValidation:rule=`self.matches('^[a-zA-Z0-9:._\\\\-]+\\..')`, message="Invalid ${NAME} in service name. Expected format is ${NAME}${SERVICETYPE}, where {NAME} must be atleast 1 character long and can only consist of alphabets, digits, \":\", \"-\", \"_\", \".\", and \"\\\""
// +kubebuilder:validation:Required
// +kubebuilder:validation:MaxLength=255
ServiceName string `json:"serviceName"`
Since this is re-used several times, you could create a type alias for this and assign the validations to that instead, not blocking though. Similar to how you have done the action types
Ack, moved to a new type
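The two CEL `matches` rules above use RE2 semantics, the same engine as Go's regexp package, so they can be sketched as ordinary Go regexps. This is an approximation for illustration only (the `validServiceName` helper is hypothetical, not part of the API), checking the suffix rule, the name rule, and the 255-character limit together:

```go
package main

import (
	"fmt"
	"regexp"
)

// Approximate Go equivalents of the two CEL rules on serviceName.
// CEL matches() performs an unanchored search, as MatchString does here.
var (
	// The name must end in one of the recognized systemd unit suffixes.
	suffixRE = regexp.MustCompile(`\.(service|socket|device|mount|automount|swap|target|path|timer|snapshot|slice|scope)$`)
	// The name portion before a dot must be non-empty and drawn from the allowed characters.
	nameRE = regexp.MustCompile(`^[a-zA-Z0-9:._\\-]+\..`)
)

// validServiceName is a hypothetical helper mirroring the CRD validation.
func validServiceName(s string) bool {
	return len(s) <= 255 && suffixRE.MatchString(s) && nameRE.MatchString(s)
}

func main() {
	for _, s := range []string{"crio.service", "a.b.c.d.e.snapshot", "crio", "kubelet.unit"} {
		fmt.Printf("%s => %v\n", s, validServiceName(s))
	}
}
```

Here "crio.service" and "a.b.c.d.e.snapshot" pass, while "crio" (no suffix) and "kubelet.unit" (unrecognized suffix) fail.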
actions:
- type: DaemonReload
- type: Reload
expectedError: "Reload is required when type is reload, and forbidden otherwise"
This is the wrong way around, think about it from an end user perspective, they see the field as lower r and the type as upper r
Suggested change:
- expectedError: "Reload is required when type is reload, and forbidden otherwise"
+ expectedError: "reload is required when type is Reload, and forbidden otherwise"
- type: Drain
- type: Restart
  restart:
    serviceName: a.b.c.d.e.snapshot
Nit, should include new line char as final character in every file
// +kubebuilder:validation:XValidation:rule=`self.matches('^[a-zA-Z0-9:._\\\\-]+\\..')`, message="Invalid ${NAME} in service name. Expected format is ${NAME}${SERVICETYPE}, where {NAME} must be atleast 1 character long and can only consist of alphabets, digits, \":\", \"-\", \"_\", \".\", and \"\\\""
// +kubebuilder:validation:Required
// +kubebuilder:validation:MaxLength=255
Name string `json:"name"`
NodeDisruptionPolicyServiceName?
// +kubebuilder:validation:XValidation:rule=`self.matches('^[a-zA-Z0-9:._\\\\-]+\\..')`, message="Invalid ${NAME} in service name. Expected format is ${NAME}${SERVICETYPE}, where {NAME} must be atleast 1 character long and can only consist of alphabets, digits, \":\", \"-\", \"_\", \".\", and \"\\\""
// +kubebuilder:validation:Required
// +kubebuilder:validation:MaxLength=255
Name string `json:"name"`
NodeDisruptionPolicyServiceName?
}

// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'Reload' ? has(self.reload) : !has(self.reload)",message="reload is required when type is Reload, and forbidden otherwise"
// +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'Restart' ? has(self.restart) : !has(self.restart)",message="Restart is required when type is restart, and forbidden otherwise"
Suggested change:
- // +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'Restart' ? has(self.restart) : !has(self.restart)",message="Restart is required when type is restart, and forbidden otherwise"
+ // +kubebuilder:validation:XValidation:rule="has(self.type) && self.type == 'Restart' ? has(self.restart) : !has(self.restart)",message="restart is required when type is Restart, and forbidden otherwise"
Force-pushed 6a403bb to 5609243
/lgtm
Will follow up separately on the static pod status removal, build out on this first and transition to metav1.Condition before we ship.
/test verify
Add a new sub-spec/status to the MachineConfiguration operator object, which will allow users to specify actions to take when small MachineConfig updates happen to the cluster. This will be behind a NodeDisruptionPolicy featuregate, and will be managed and consumed by the Machine Config Operator in-cluster.
The new NodeDisruption status objects contain a "special" action type that will only be used by the MCO's controller to indicate some internal actions. They are not part of the NodeDisruptionPolicyConfig object and cannot be set by the user.
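Pulling the PR description together, a hypothetical MachineConfiguration manifest using this API might look like the following. The field names follow the snippets quoted in this review; the exact schema (e.g. the `sshkey` key name, the group/version) may differ in the merged version, so treat this as a sketch rather than the final shape:

```yaml
# Hypothetical example: granular node disruption policies for small
# MachineConfig changes, managed by the Machine Config Operator.
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
spec:
  nodeDisruptionPolicy:
    files:
    - path: /etc/chrony.conf
      actions:
      - type: Restart
        restart:
          serviceName: chronyd.service
    sshkey:
      actions:
      - type: None
```

The MCO would validate this against the cluster defaults and write the merged, effective policies into status, which is what the Machine Config Daemon consumes during node updates.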
- don't remove staticpodoperatorstatus for now
- update godocs to be more clear
- add a type alias for serviceName
Force-pushed 5609243 to 4bdf2e3
Whoops, was failing verify since it wasn't rebased on master.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JoelSpeed, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@yuqi-zhang: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
[ART PR BUILD NOTIFIER] This PR has been included in build ose-cluster-config-api-container-v4.16.0-202403230015.p0.g2252c7a.assembly.stream.el9 for distgit ose-cluster-config-api.
Draft API of openshift/enhancements#1525
This extension of the MachineConfiguration object allows users to specify how their MachineConfig object changes affect node disruption, allowing for non-drain and non-reboot updates to some config files. The MachineConfigController and MachineConfigDaemon will ultimately implement and execute on this object.
Also currently based on #1672 by @djoshy since that should probably merge first
Major questions:
such that the list doesn't have unique keys (it would be type + value, but value isn't required for all types so it cannot be made into a listMapKey). Maybe it would be better to make it more MachineConfig like and have it instead be:
for the actions, we have the same issue, as we'd like to allow users to specify multiple actions with the same key
So there is not a unique key here either.
Currently the proposed setup is: actions are union discriminators (so if you reload, you have to specify the services to reload), type/value pairs are also validated, and otherwise there is no verification on the lists. The MachineConfigController will do the rest of the validation, so it's a split of responsibilities.
Alternatively, we could either:
controller validation and status:
Currently we list out clusterDefaultPolicies in the spec, which will be populated by the MCController every time it updates. Users should not modify that sub-spec, but since the daemon will only apply what's in status (so validated version) the current approach has no additional gating and just has the controller overwrite any changes before applying to status.
Is there a way we would be able to let just the MCController (and not, say, someone using the MCC service account or other admin privileges) write that sub-object? It should be mutable, just not by anything other than the MCO/MCC.