Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teach installer to specify OS version through oscontainer image URLs #281

Closed
jlebon opened this issue Sep 19, 2018 · 31 comments
Closed

Teach installer to specify OS version through oscontainer image URLs #281

jlebon opened this issue Sep 19, 2018 · 31 comments

Comments

@jlebon
Copy link
Member

jlebon commented Sep 19, 2018

As mentioned in #267, the latest public RHCOS AMI will likely not be the latest RHCOS we want. Right now, MachineConfigs hardcode ://dummy as the OSImageURL: https://github.com/openshift/machine-config-operator/blob/a91ab5279755f87f0953f294e9add7584761a489/pkg/controller/template/render.go#L208. This should instead be the URL to the oscontainer image the installer would like to have installed. In more concrete terms: we could e.g. hardcode it for now as is currently done in ami.go for AMIs until we're ready to just always pick the latest, but also have something analogous to ec2AMIOverride, e.g. osImageURLOverride?

The MCD will then immediately update the node to the desired oscontainer upon startup.

@wking
Copy link
Member

wking commented Sep 19, 2018

As mentioned in #267, the latest public RHCOS AMI will likely not be the latest RHCOS we want

Is this AMI-specific? Why would the AMI lag, but the libvirt image be current?

This should instead be the URL to the oscontainer image the installer would like to have installed.

This would effectively give users two knobs for the same thing:

  • The AMI (or libvirt image, etc.?)
  • A URL for the target OSTree (assuming an OSTree-capabale AMI)

Do we need both of those knobs? If AMI lag isn't too great, it seems like we could stick with just the AMI as the knob, and let post-install tooling like the machine API handle subsequent modifications. If the AMI is just an OSTree-capable stub (i.e. without some packages needed to run the installer/cluster), then can we just remove the AMI/libvirt-image knob completely and automatically pull an OSTree-capable stub without giving the user a choice (they could still choose their target OSTree)?

How does this all fit in with release tracks (tested, etc.) and automatic upgrades? Can the installer continue to ignore those (and leave them to other operators to handle or not)?

@jlebon
Copy link
Member Author

jlebon commented Sep 19, 2018

Is this AMI-specific? Why would the AMI lag, but the libvirt image be current?

I think the long term plan is that only some (regularly updated, but "outdated") snapshots of RHCOS AMIs will be made public. So the installer needs to be able to work from a public RHCOS AMI and bring it up to date. Libvirt images are available internally just for development. (And I don't doubt we'll keep pushing out up to date private RHCOS AMIs as well).

Do we need both of those knobs? If AMI lag isn't too great, it seems like we could stick with just the AMI as the knob, and let post-install tooling like the machine API handle subsequent modifications.

Yeah, the AMI should be fully capable of joining the cluster seeing as it's just an older version of RHCOS. I guess the way I was thinking about it is that we should care less about the AMI used to bootstrap the cluster and more about the oscontainer we want to install since that's actually what will run the cluster. It seems like that setting should belong in examples/aws.yaml, right?

/cc @ashcrow to keep me honest

@jlebon
Copy link
Member Author

jlebon commented Sep 19, 2018

Related issue on the OS side: openshift/os#307

@cgwalters
Copy link
Member

I think big picture we want to always pivot early on in the installation.

I am working on redoing the os build process to use pivot.

@wking
Copy link
Member

wking commented Sep 20, 2018

I think big picture we want to always pivot early on in the installation.

Ok. So are folks comfortable removing the AMI knob (and the libvirt image knob?) and replacing it (them) with oscontainer image URLs? The installer would just pick something sane for the initial image, and callers wanting a specific eventual OS would need to use override the default oscontainer image URL.

I guess you could point that at a local (caching) registry if you wanted to efficiently launch multiple nodes. And once the cluster comes up, maybe the machine-config operator could point new nodes at image streams in the cluster's own registry.

@cgwalters
Copy link
Member

Ok. So are folks comfortable removing the AMI knob (and the libvirt image knob?)

Yes, though we will have changes that require updating those (in fact the recent partitioning switch is one) but those should become infrequent - let's say we limit ourselves to only respinning the AMI once every 6 months or so.

@ashcrow
Copy link
Member

ashcrow commented Sep 20, 2018

/cc @mrguitar

@jlebon
Copy link
Member Author

jlebon commented Sep 20, 2018

So are folks comfortable removing the AMI knob (and the libvirt image knob?) and replacing it (them) with oscontainer image URLs?

Yeah, that sounds great!

So just to be explicit here, do we agree that the installer should take care of the initial pivot? It seems like leaving it up to the MCO would be more disruptive to the cluster than doing it upfront when not much is installed yet?

@ashcrow
Copy link
Member

ashcrow commented Sep 20, 2018

I think doing it upfront would be ideal. I wouldn't make it a hard requirement though.

@wking
Copy link
Member

wking commented Sep 20, 2018

So just to be explicit here, do we agree that the installer should take care of the initial pivot?

Yes, although since #119 we are no longer handling workers in the installer, so this would just be for masters (and bootstrap?).

It seems like leaving it up to the MCO would be more disruptive to the cluster than doing it upfront when not much is installed yet?

I'm also concerned about agility without an AMI knob. A really old AMI may not be capable of running a modern master. If the installer pivots masters early, we don't have to worry about that.

@jlebon
Copy link
Member Author

jlebon commented Sep 20, 2018

Yes, although since #119 we are no longer handling workers in the installer, so this would just be for masters (and bootstrap?).

Hmm, I'm not very familiar with the k8s cluster API, though are there hooks somewhere where we could pivot the workers before going any further?

@dustymabe
Copy link
Member

my personal take on how we should implement "pivoting" a slightly out of date starting image is by doing the pivot in ignition in the initramfs before the system boots completely

That being said I started a recent mail list thread about the priority of doing such work. I don't think it is something we have prioritized for our first deliverable of this new platform. Please respond to the mail so we can discuss further.

@cgwalters
Copy link
Member

my personal take on how we should implement "pivoting" a slightly out of date starting image is by doing the pivot in ignition in the initramfs before the system boots completely

Hmm. I understand that we want to enable e.g. Ignition disks so that people can customize the FS layout etc, and that will require a lot more of libostree in the initramfs. But today there's only a tiny ostree-prepare-root binary in the initramfs that actually doesn't link to libostree even.

Pulling in the full pivot code would require the whole container runtime plus rpm-ostree which....is a lot more.

@dustymabe
Copy link
Member

Agree it would add more complexity to the initramfs but would solve some problems (i.e. ignition run twice). I still think I'd rather see it there since if it's in the base it will definitely benefit anyone else wanting similar functionality.

Of course this is a future goal, so subject to change based on what needs pop up.

@ashcrow
Copy link
Member

ashcrow commented Oct 30, 2018

I think doing it upfront would be ideal. I wouldn't make it a hard requirement though.

I said the above. Though at this point I don't think we should focus on an early pivot as the cluster can provide the needed information for updating already. With that said, this should stay on the radar as it is a good feature to look at adding in the future.

@cgwalters
Copy link
Member

Since we last discussed this, pipeline v2 landed and the installer uses that, and installer releases are likely to pin to a RHCOS version.

(Will the installer also pin to release payloads?)

This ://dummy thing confuses me for some reason; having the magic value be the empty string would help me, not sure I can explain why.

@jlebon
Copy link
Member Author

jlebon commented Nov 20, 2018

(Will the installer also pin to release payloads?)

That was my understanding. Then the container image URL from the release payload would be used as the osImageURL of the base MC.

@wking
Copy link
Member

wking commented Nov 29, 2018

We have #732 in the merge queue to pin release images now. What do we need to do to get that wired up to the pivoting code? If we're not handilng pivots in Ignition, what should the installer be doing to get these pivots to happen? Note that we no longer touch worker nodes at all, we just push in machine-set configs like this and the cluster-API providers (like this) create the workers for us. We just use a pointer Ignition there, the real ignition is served by the machine-config operator. It looks like the daemon has a dummy:// workaround at the moment, but openshift/machine-config-operator#186 is in flight to remove that. Is there anything else we need to do, @cgwalters? Are we supposed to be pushing a config that sets OSImageURL? Is the cluster-version operator supposed to push that... to each node? I'm missing something...

@dustymabe
Copy link
Member

I think you're looking for @cgwalters

@cgwalters
Copy link
Member

BTW this issue duplicates openshift/machine-config-operator#183

Let's try to standardize on some terminology to reduce confusion - the "image" term is ambiguous between VM and container images. "release images" correspondingly could refer to the CVO release payload or to the RHCOS releases.

My proposal is:

  • bootimage (RHCOS build)
  • release payload (CVO)

Then an important bridge between the two is the "oscontainer". A RHCOS build also contains an oscontainer with the same content.

I think the release payload should define the oscontainer.

@wking
Copy link
Member

wking commented Nov 29, 2018

I think the release payload should define the oscontainer.

Does it not do this now? Maybe the oscontainer repo needs an images CI job? @smarterclayton?

@wking
Copy link
Member

wking commented Nov 29, 2018

And once we get it, how does that info get from the release payload into the machine-control configs (or wherever it needs to go to make pivots happen)?

@cgwalters
Copy link
Member

cgwalters commented Nov 29, 2018

Does it not do this now? Maybe the oscontainer repo needs an images CI job? @smarterclayton?

It doesn't for a few reasons - for RHCOS today we have a "unified" build process that does all of the bootimages (openstack/libvirt/AMI) and generates the oscontainer. The way the oscontainer is constructed today needs privileges - and generating the bootimages also hard requires KVM. To simplify the model we now support only needing /dev/kvm.

My thought is that we have a push to the release payload from the RHCOS pipeline.

@cgwalters
Copy link
Member

And once we get it, how does that info get from the release payload into the machine-control configs (or wherever it needs to go to make pivots happen)?

The strawman in this issue was that it's a ConfigMap that gets merged into the configs.

@jlebon
Copy link
Member Author

jlebon commented Nov 29, 2018

Just to elaborate, the current plan is:

  • add CI to the RHCOS pipeline so that we test new builds against the latest release.
  • on pass, we update the release payload by updating a ConfigMap
  • this ConfigMap is read by the MCC and subsequently the osImageURL makes it to the MCD which reacts as needed.

BTW this issue duplicates openshift/machine-config-operator#183

Yeah, might be better at this point to close this issue and keep discussions there. Eating my words from this morning, I'm not sure if the installer actually needs to do anything to get this wired up since it's going through the release payload -> CVO -> MCC. (Well, there might be changes needed to the installer once we add some RHCOS build tagging so it only selects the latest passed builds, but that's not what this ticket was originally opened for).

@wking
Copy link
Member

wking commented Nov 29, 2018

My thought is that we have a push to the release payload from the RHCOS pipeline.

Or a payload CI just fetches the latest oscontainer digest from a well-known location (like the installer currently does for AMIs) and bakes it into the release payload when the tests pass?

@cgwalters
Copy link
Member

Or a payload CI just fetches the latest oscontainer digest from a well-known location (like the installer currently does for AMIs) and bakes it into the release payload when the tests pass?

That's a better phrasing yes - when I said "push" I was more thinking "submit a PR" which gets e2e tested like any other PR.

@wking
Copy link
Member

wking commented Nov 29, 2018

So should the installer be doing this? Or is there an oscontainer repo hooked up to Prow that can do it? If you want the installer to do it, are you pushing these anywhere besides registry.svc.ci.openshift.org, pr do you want us pinning to that (I don't think it's retention period is very long)? @smarterclayton, can we hook the installer up to the operator image ranslation somehow? Maybe this should go in machine-config if there's no Prow-wired oscontainer repo?

@wking
Copy link
Member

wking commented Nov 29, 2018

/close

In favor of openshift/machine-config-operator#183

@openshift-ci-robot
Copy link
Contributor

@wking: Closing this issue.

In response to this:

/close

In favor of openshift/machine-config-operator#183

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@smarterclayton
Copy link
Contributor

smarterclayton commented Nov 30, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants