
Add PinnedImageSet crd, controller and prefetch manager #4094

Closed
wants to merge 5 commits into from

Conversation

hexfusion
Contributor

@hexfusion hexfusion commented Jan 3, 2024

This PR implements openshift/enhancements#1481

This PR adds:

  • PinnedImageSet Controller
  • PinnedImageSet CRD
  • Prefetch Manager

The PinnedImageSetController reconciles two desired states:

  1. Defining the CRI-O pinned_images configuration via MachineConfig. This is populated from the PinnedImageSet CRD[2].
  2. Prefetching images: a secondary controller located in the MCD is tasked with ensuring that the images defined by the PinnedImageSet are pulled. Currently the results of this operation are reported via a node annotation; in the future, this will probably be reported via MachineConfigNode status. Once all nodes in the pool targeted by the CR have completed, the PinnedImageSet status is updated to reflect that.

The Prefetch Manager worker pool ensures that:

  • Adequate storage is available for the images before they are pulled.
  • Images that are already available locally are not requested again.
  • A single worker is deployed on control-plane nodes to reduce I/O disruption.
  • Pull failures are retried a maximum of 5 times.
  • Image pull requests are made via the CRI gRPC client, using the same method as the kubelet (see the sketch after this list).
  • Authentication is provided where appropriate for images.
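
A minimal sketch of the pull path described in the list above, assuming the upstream k8s.io/cri-api v1 client and the default CRI-O socket path; the function and constant names are illustrative and not taken from this PR, only the retry count and 1s cool-down mirror the description.

```go
// Hedged sketch, not the PR's code: check for a locally available image and pull
// it over the CRI gRPC API with bounded retries, the way a prefetch worker might.
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

const (
	crioSocket  = "unix:///var/run/crio/crio.sock" // assumed CRI-O socket path
	maxRetries  = 5                                // "retried a maximum of 5 times"
	pullTimeout = 2 * time.Minute                  // illustrative per-pull timeout
)

func prefetchImage(ctx context.Context, client runtimeapi.ImageServiceClient, ref string, auth *runtimeapi.AuthConfig) error {
	// Skip the pull if the image is already present in local storage.
	status, err := client.ImageStatus(ctx, &runtimeapi.ImageStatusRequest{
		Image: &runtimeapi.ImageSpec{Image: ref},
	})
	if err == nil && status.Image != nil {
		return nil
	}

	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		pullCtx, cancel := context.WithTimeout(ctx, pullTimeout)
		_, lastErr = client.PullImage(pullCtx, &runtimeapi.PullImageRequest{
			Image: &runtimeapi.ImageSpec{Image: ref},
			Auth:  auth, // pull secret resolved elsewhere, where appropriate
		})
		cancel()
		if lastErr == nil {
			return nil
		}
		time.Sleep(time.Second) // cool-down between attempts
	}
	return fmt.Errorf("pulling %s failed after %d attempts: %w", ref, maxRetries, lastErr)
}

func main() {
	conn, err := grpc.Dial(crioSocket, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := runtimeapi.NewImageServiceClient(conn)
	ref := "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7aa95f32af51fc7892546a1e028808ec1bab1e507cf671b88d8280d2521e61d6"
	if err := prefetchImage(context.Background(), client, ref, nil); err != nil {
		fmt.Println(err)
	}
}
```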

Additional logic:

  • The postAction of the configuration being written is a CRI-O reload (see the sketch below).
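
A minimal sketch of that reload step, assuming it is done by shelling out to systemctl (CRI-O's unit maps reload to SIGHUP, which re-reads the drop-in config); whether the MCD uses systemctl or D-Bus here is an assumption, not something shown in this excerpt.

```go
// Hedged sketch of a CRI-O reload post-write action; not the PR's implementation.
package postaction

import (
	"fmt"
	"os/exec"
)

// reloadCRIO asks systemd to reload CRI-O so a new pinned_images drop-in takes
// effect without rebooting the node.
func reloadCRIO() error {
	out, err := exec.Command("systemctl", "reload", "crio.service").CombinedOutput()
	if err != nil {
		return fmt.Errorf("reloading CRI-O: %s: %w", string(out), err)
	}
	return nil
}
```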

Considerations

Because we may be pulling a large number of images, there is a concern about how that could affect the control plane. For this reason only a single worker is deployed on a master node, and each image is pulled serially with a 1s cool-down period between pulls. This still results in noticeable I/O; the screenshot below is from a basic idle AWS cluster. While this latency on its own is not an issue, under load it should be a consideration. Current proposed mitigations include exposing knobs for concurrency and the throttle duration.

[screenshot: disk I/O latency observed on an idle AWS cluster during prefetch]
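
A sketch of the proposed mitigation under stated assumptions: hypothetical MaxWorkers and Throttle knobs (neither exists in this PR), with golang.org/x/sync/errgroup bounding worker concurrency.

```go
// Hedged sketch of configurable concurrency/throttle knobs for the prefetch pool.
package prefetch

import (
	"context"
	"time"

	"golang.org/x/sync/errgroup"
)

// PrefetchConfig is a hypothetical knob set; names are illustrative only.
type PrefetchConfig struct {
	MaxWorkers int           // 1 on control-plane nodes to limit I/O pressure
	Throttle   time.Duration // cool-down between pulls (1s in this PR)
}

func prefetchAll(ctx context.Context, cfg PrefetchConfig, images []string, pull func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(cfg.MaxWorkers)
	for _, img := range images {
		img := img
		g.Go(func() error {
			if err := pull(ctx, img); err != nil {
				return err
			}
			time.Sleep(cfg.Throttle) // spread out disk I/O between pulls
			return nil
		})
	}
	return g.Wait()
}
```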

Example CR:

apiVersion: machineconfiguration.openshift.io/v1
kind: PinnedImageSet
metadata:
  name: worker-test
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  pinnedImages:
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7aa95f32af51fc7892546a1e028808ec1bab1e507cf671b88d8280d2521e61d6
    - quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d98ddbe73bda2ffed4d1aeb52be0500b8f8fe870cb465a8bb0cb113f7ed5ade3

ref.
[1] MCO-838 https://issues.redhat.com//browse/MCO-838
[2] openshift/api#1713

Blocked by
https://issues.redhat.com/browse/OCPNODE-1986

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 3, 2024
Contributor

openshift-ci bot commented Jan 3, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@hexfusion
Contributor Author

/test all

1 similar comment
@hexfusion
Contributor Author

/test all

@hexfusion
Contributor Author

/test all

@hexfusion hexfusion changed the base branch from master to release-4.16 January 10, 2024 23:27
@hexfusion hexfusion changed the base branch from release-4.16 to master January 10, 2024 23:28
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 17, 2024
Contributor

@cdoern cdoern left a comment

this looks pretty clean, do you think the controller will need any new RBAC?

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 24, 2024
@hexfusion hexfusion force-pushed the hack/pinned-set branch 5 times, most recently from 9964213 to cd48caa Compare January 30, 2024 05:55
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2024
@hexfusion hexfusion marked this pull request as ready for review January 30, 2024 06:05
@hexfusion hexfusion changed the title from "[wip]: PinnedImageSet" to "[MCO-838] Add PinnedImageSet crd, controller and prefetch manager" Jan 31, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 31, 2024
Contributor

@cdoern cdoern left a comment

looks good! Left a few comments about API calls. Will give this another pass soon.

if isNotFound {
    _, err = ctrl.mcfgClient.MachineconfigurationV1().MachineConfigs().Create(context.TODO(), mc, metav1.CreateOptions{})
} else {
    _, err = ctrl.mcfgClient.MachineconfigurationV1().MachineConfigs().Update(context.TODO(), mc, metav1.UpdateOptions{})
Contributor

the preferred mechanism is patch, I believe. You can look around at how to make a jsonmergepatch.CreateThreeWayJSONMergePatch(curJSON, modJSON, curJSON) and then pass this output to .Patch rather than .Update.

There are some scenarios where you want to use update but I am forgetting if this falls into those.
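
For illustration, a minimal sketch of the patch flow being suggested, assuming the openshift/client-go machineconfiguration clientset import path (the PR may use the in-repo generated clientset); the function and variable names are hypothetical.

```go
// Hedged sketch: build a three-way JSON merge patch between the current and
// desired MachineConfig and apply it with Patch instead of Update.
package pinnedimageset

import (
	"context"
	"encoding/json"

	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
	mcfgclientset "github.com/openshift/client-go/machineconfiguration/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/jsonmergepatch"
)

func patchMachineConfig(ctx context.Context, client mcfgclientset.Interface, curMC, desiredMC *mcfgv1.MachineConfig) error {
	curJSON, err := json.Marshal(curMC)
	if err != nil {
		return err
	}
	modJSON, err := json.Marshal(desiredMC)
	if err != nil {
		return err
	}
	// original == current here, mirroring the call shape referenced in the review comment.
	patch, err := jsonmergepatch.CreateThreeWayJSONMergePatch(curJSON, modJSON, curJSON)
	if err != nil {
		return err
	}
	_, err = client.MachineconfigurationV1().MachineConfigs().Patch(ctx, desiredMC.Name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```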

Contributor Author

@hexfusion hexfusion Jan 31, 2024

Sounds good, I will dig into it to ensure correctness; this is copy-pasta from an existing controller in this repo.

return nil
}

_, err = ctrl.mcfgClient.MachineconfigurationV1().PinnedImageSets().UpdateStatus(context.TODO(), newImageSet, metav1.UpdateOptions{})
Contributor

UpdateStatus is right here, as opposed to patch. Though you might need the RBAC for pinnedimagesets/status specifically? I have run into this before where it does not allow me to update status unless I have this role.

Contributor Author

OK, the MCC RBAC I believe is inclusive, but I will double-check.

- apiGroups: ["machineconfiguration.openshift.io"]
  resources: ["*"]
  verbs: ["*"]

pkg/daemon/daemon.go (outdated review thread, resolved)
pkg/daemon/update.go (outdated review thread, resolved)

// minFreeStorageAfterPrefetch is the minimum amount of storage in bytes available on the root filesystem
// after prefetching images.
minFreeStorageAfterPrefetch int64 = 32 * 1024 * 1024 * 1024 // 32GB
Contributor Author

thoughts?

Contributor

What would use cases for this feature look like environment-wise? Is the expectation that if they are resource-limited, they really shouldn't be pre-pulling images?

It's good to have a safeguard, I think, but maybe 32 is a bit high?

Contributor

Not sure if we already collect any metrics around available free space for the clusters this is targeted at. If they exist, that would help us pick this value better. Free space is relative to what kind of application is running on a cluster; a storage-hungry application can run out of space sooner.
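
A minimal sketch of the kind of storage guard being discussed, assuming a statfs check on the root filesystem; only the 32 GiB constant comes from the PR, while the function name, mount point, and required-bytes parameter are illustrative.

```go
// Hedged sketch: refuse to prefetch when the root filesystem would drop below a
// minimum free-space threshold after pulling the requested images.
package prefetch

import (
	"fmt"

	"golang.org/x/sys/unix"
)

const minFreeStorageAfterPrefetch uint64 = 32 * 1024 * 1024 * 1024 // 32 GiB, as in the PR

func ensureFreeStorage(rootPath string, requiredBytes uint64) error {
	var stat unix.Statfs_t
	if err := unix.Statfs(rootPath, &stat); err != nil {
		return fmt.Errorf("statfs %s: %w", rootPath, err)
	}
	available := stat.Bavail * uint64(stat.Bsize)
	if available < requiredBytes+minFreeStorageAfterPrefetch {
		return fmt.Errorf("insufficient storage on %s: %d bytes available, need %d plus %d reserve",
			rootPath, available, requiredBytes, minFreeStorageAfterPrefetch)
	}
	return nil
}
```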

@hexfusion
Copy link
Contributor Author

/assign

Contributor

@yuqi-zhang yuqi-zhang left a comment

the overall controller/daemon logic seems sound so far, some initial questions inline

Haven't dug into the details of how the prefetch manager actually works, but I assume it's relatively disruption-proof?

Since it somewhat runs independently, I'm just curious what happens if e.g. a machineconfig update comes in midway through image pulls and stops the daemon/reboots the node. I assume the aborted pull will just retry from the start?



}

func (p *PrefetchManager) sync(key string) error {
klog.Infof("Syncing PinnedImageSet %q", key)
Contributor

reminder to remove before merge or change verbosity


for _, node := range nodes {
    if !ctrl.isPrefetchCompleteForNode(node, imageSet) {
        // If prefetch is not complete fail fast and requeue the PinnedImageSet
Contributor

Could you help me understand this a bit? In the controller logic, you first ensurePinnedImageSet, which deploys the MachineConfig, then immediately after that you sync this.

I assume the expectation is that the image pulls will take a while to complete, so is the expectation that the controller will be in an error state until the daemons are done?

Contributor Author

Good point, the expectation is that it should be in a Progressing state, as the error is expected. So we should adjust that.

Contributor

+1. We may want to show an InProgress state to indicate that the image prefetch is progressing, instead of surfacing an error.
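
A hedged sketch of what surfacing such a condition could look like, assuming the PinnedImageSet status carries standard metav1.Conditions; the condition type, reason, and helper name are assumptions, not this PR's API.

```go
// Hedged sketch: set a Progressing condition while daemons are still prefetching,
// instead of returning an error from the controller sync.
package pinnedimageset

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const conditionProgressing = "Progressing" // assumed condition type

func setProgressing(conditions *[]metav1.Condition, pendingNodes int) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    conditionProgressing,
		Status:  metav1.ConditionTrue,
		Reason:  "ImagePrefetchInProgress",
		Message: fmt.Sprintf("%d node(s) still prefetching pinned images", pendingNodes),
	})
}
```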

p.taskManager.add(imageSet, cancel)
defer p.taskManager.cancel(imageSet)

err = p.startWorkerPool(ctx, prefetchImages)
Contributor

So, if I understand this correctly, the daemons are reacting to the pinnedimageset objects directly, and self-determining whether they should be pulling an image. Thus there are two processes happening in parallel:

  1. the MCC rendering the new machineconfig and the MCD main node sync reacting to that
  2. the MCD reading newly added pinnedimagesets and pulling images

Is there a strict dependency on that ordering? Is there any existing guard (sorry if I missed it) for that? And is there any additional pinnedimageset correctness needed to be processed by the controller before the daemon starts?

I guess the thought experiment is a large cluster with hundreds of nodes. While you only reload the crio daemon, each node is still sequentially processing the update, with some built-in delay, of the machineconfig that enables pinnedimagesets (the crio toml file). This can take hours on large enough clusters, but I assume each daemon process running this would start the image pull already and potentially finish by the time the crio config updates.

}

// getMachineConfigKey returns the managed key for the machine config
func getMachineConfigKey(pool *mcfgv1.MachineConfigPool, client mcfgclientset.Interface, imageSetOrig *mcfgv1.PinnedImageSet) (string, error) {
Contributor

I think this will work like the crio/kubelet configuration rendering, meaning that custom pool config > worker pool config now (but if you don't define a pinned image set for, say, your infra node, it will inherit worker configs and still try to pull as if it was a worker).

That's probably the expected behaviour but wanted to check explicitly

Contributor Author

Right, since configs today are deployed at the pool level, I don't feel it makes sense for this controller to act in a different way. My understanding is that you can create a custom pool dedicated to a certain purpose, e.g. "infra"? In that case the config could be deployed to only those nodes which are pool members.

go.mod (review thread, resolved)
pkg/daemon/update.go (outdated review thread, resolved)
Contributor

@sinnykumari sinnykumari left a comment

Overall this looks great. A few overall questions, as I may have missed things while briefly skimming through the code:

  1. What happens to prefetched pinnedImages which are no longer referenced when the user removes some from the PinnedImageSet CR?
  2. Do we want to add some sort of validation check to ensure that all images referenced in the PinnedImageSet are by hash and not tag?
  3. This can happen in a separate PR, but how about adding an e2e test for this feature? It could go in the existing e2e-gcp-op. If it adds a considerable amount of time to the test, we can do it as a separate e2e test.

@hexfusion
Contributor Author

hexfusion commented Feb 26, 2024

1.) What happens to prefetched pinnedImages which are no longer referenced when the user removes some from the PinnedImageSet CR?

Unpinned images are subject to future pruning/wipe. The scope of this feature does not include a pruning mechanism.

2.) Do we want to add some sort of validation check to ensure that all images referenced in the PinnedImageSet are by hash and not tag?

This is built into the API-level validation pattern.

// +kubebuilder:validation:Pattern:=`@sha256:[a-fA-F0-9]{64}$`
type PinnedImageRef string
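
For illustration, the same digest-only constraint expressed as a client-side Go check mirroring the kubebuilder pattern above; this helper is hypothetical and not part of the PR.

```go
// Hedged sketch: reject any image reference that is not pinned by sha256 digest.
package pinnedimageset

import "regexp"

var pinnedImageRefPattern = regexp.MustCompile(`@sha256:[a-fA-F0-9]{64}$`)

// isPinnedByDigest reports whether ref is pinned by digest rather than by tag.
func isPinnedByDigest(ref string) bool {
	return pinnedImageRefPattern.MatchString(ref)
}
```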

3.) This can happen in a separate PR, but how about adding an e2e test for this feature? It could go in the existing e2e-gcp-op. If it adds a considerable amount of time to the test, we can do it as a separate e2e test.

Sounds good

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 28, 2024
Contributor

openshift-ci bot commented Feb 28, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hexfusion
Once this PR has been reviewed and has the lgtm label, please assign sinnykumari for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented Feb 28, 2024

@hexfusion: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/unit | e177445 | link | true | /test unit |
| ci/prow/verify | e177445 | link | true | /test verify |
| ci/prow/okd-scos-images | e177445 | link | true | /test okd-scos-images |
| ci/prow/e2e-aws-ovn-upgrade | e177445 | link | true | /test e2e-aws-ovn-upgrade |
| ci/prow/e2e-aws-ovn | e177445 | link | true | /test e2e-aws-ovn |
| ci/prow/okd-images | e177445 | link | false | /test okd-images |
| ci/prow/e2e-hypershift | e177445 | link | true | /test e2e-hypershift |
| ci/prow/images | e177445 | link | true | /test images |
| ci/prow/e2e-gcp-op-single-node | e177445 | link | true | /test e2e-gcp-op-single-node |
| ci/prow/e2e-gcp-op | e177445 | link | true | /test e2e-gcp-op |
| ci/prow/e2e-azure-ovn-upgrade-out-of-change | e177445 | link | false | /test e2e-azure-ovn-upgrade-out-of-change |
| ci/prow/okd-scos-e2e-aws-ovn | e177445 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-upgrade-out-of-change | e177445 | link | false | /test e2e-aws-ovn-upgrade-out-of-change |
| ci/prow/e2e-gcp-op-techpreview | e177445 | link | false | /test e2e-gcp-op-techpreview |
| ci/prow/bootstrap-unit | e177445 | link | false | /test bootstrap-unit |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hexfusion
Contributor Author

updating api deps

@rioliu-rh

rioliu-rh commented Mar 12, 2024

FYI, when the code is ready for testing, let us know (@sergiordlr @rioliu-rh @ptalgulk01) and hold this PR. THX

@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 12, 2024
@hexfusion
Contributor Author

This PR was a WIP test. A new PR with updated apis and intent will follow shortly

/close

@openshift-ci openshift-ci bot closed this Mar 13, 2024
Contributor

openshift-ci bot commented Mar 13, 2024

@hexfusion: Closed this PR.

In response to this:

This PR was a WIP test. A new PR with updated apis and intent will follow shortly

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion hexfusion changed the title from "MCO-1017: MCO-1018 MCO-1019: MCO-1020: MCO-1021 Add PinnedImageSet crd, controller and prefetch manager" to "Add PinnedImageSet crd, controller and prefetch manager" Apr 15, 2024
@openshift-ci-robot openshift-ci-robot removed the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 15, 2024
@openshift-ci-robot
Contributor

@hexfusion: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
