Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add --coreos-url to oc adm release new #21998

Closed
wants to merge 1 commit into from

Conversation

cgwalters
Copy link
Member

For RHCOS we have two things:

  • The "bootimage" (AMI, qcow2, PXE env)
  • The "oscontainer", now represented as machine-os-content in the payload

For initial OpenShift releases (e.g. of the installer) ideally
these are the same (i.e. we don't upgrade OS on boot).

This PR aims to support injecting both data into the release payload.

More information on the "bootimage" and its consumption by the
installer as well as the Machine API Operator:
openshift/installer#987

More information on machine-os-content:
openshift/machine-config-operator#183

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 8, 2019
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cgwalters
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: enj

If they are not already assigned, you can assign the PR to them by writing /assign @enj in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@smarterclayton
Copy link
Contributor

As an alternative point - can we include the AMI info as content in the machine-os-content container when we build it? If we put that data in /manifests and that's a build side thing, we don't need to change it here.

I.e. have machine-os-content build lay down:

/manifests/0000_30_machine-os-content_configmap.yaml
kind: ConfigMap
...
data:
  metadata.json: |
      contents_of

? That's slightly increase in complexity in build time of os container. Or alternatively, embedded as label metadata.

I really don't like the external calls here and in the installer in general because it means we have two types of content, not one.

As a clarifying question, when would we change metadata.json but not the content we are shipping?

@cgwalters
Copy link
Member Author

can we include the AMI info as content in the machine-os-content container when we build it?

The current RHCOS build system has some really funky dependencies between building the oscontainer and the bootimage. Because: the bootimage needs to know the oscontainer it came from - if you boot and run e.g.:

[root@osiris-master-0 ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/rhcos/maipo@sha256:2b7ff5707a67234e6d3187a6b30f5382aa74cd98327228ec6a463dcfe8e02908
              CustomOrigin: Provisioned from oscontainer
                   Version: 47.307 (2019-02-07T16:07:32Z)

That's how the MCD knows whether or not it needs to update.

Another further wrinkle here is that we want to actually test the OS before we upload an oscontainer - which requires a bootimage (VM image).

So the build system today goes:

  • RPMs ➡️ ostree
  • ostree ➡️ bootimage
  • Test bootimage (kola sanity checks; no failing systemd units etc.)
  • ostree ➡️ oscontainer
  • push oscontainer to api.ci and get final digest (because docker images, compression...)
  • "scripts/prepivot" which injects the pull spec into the bootimage
  • Upload to AWS
  • Upload final build to S3

Now...I am not sure offhand how we could easily inject the AMIs into our oscontainer.

It might be easiest to have this be a separate container?

@cgwalters
Copy link
Member Author

I really don't like the external calls here and in the installer in general because it means we have two types of content, not one.

Yep agreed - really the goal of this PR is to enable the magic to move from the installer to the release payload. If this lands and the installer implemented openshift/installer#987 then it could drop the code talking to the API - the source of truth for "what the installer will install" becomes just the release payload.

But bigger picture I don't think we're going to get away from the two fundamental types of content (bootimages/VM images, containers). A lot of work was put into the RHCOS build system to have a unified build scheme that encapsulates both - which is required if we want to have the "don't pivot on initial install" model.

In the end certainly from the user point of view we're streamlining this - they shouldn't have to care about how it's implemented. The coreos-assembler build scheme/metadata is fairly hidden here and we could definitely change how it works later.

Well...and we almost certainly will change it. It does seem likely at this point that we'll end up encapsulating a bootimage inside a container too. That isn't designed today though.

As a clarifying question, when would we change metadata.json but not the content we are shipping?

I assume here metadata.json == RHCOS build? I can't think of a case where we would change only that. But the goal here is to get the data in the release payload.

@jlebon
Copy link
Member

jlebon commented Feb 8, 2019

"scripts/prepivot" which injects the pull spec into the bootimage

It's not ideal, but if breaking the oscontainer <-> bootimage coupling makes things much easier, we could pretty easily teach pivot/MCD to also consider the OSTree checksum rather than just the pullspec. Then we can add whatever metadata we want to the oscontainer about the built images.

@cgwalters
Copy link
Member Author

cgwalters commented Feb 8, 2019

pretty easily teach pivot/MCD to also consider the OSTree checksum rather than just the pullspec.

Yeah...or another idea here is to derive a layer from the oscontainer with the build metadata. Both would end up in the release payload but the "buildmeta" layer would be tiny.

EDIT: But the thing is, doing that will require nontrivial changes in the installer and I don't want to block on that. At least, not unless the installer folks think it will be easy to switch to extracting the AMI data from the release payload.

@cgwalters
Copy link
Member Author

cgwalters commented Feb 8, 2019

As a clarifying question, when would we change metadata.json but not the content we are shipping?

Actually, a possible scenario for this is "add EC2 region to existing release". But...we can probably ignore that for now.

Anyways we have a few choices. We could skip trying to get the bootimage data into the release payload (i.e. installer code stays unchanged), and simply add a cron job that pulls the latest rhcos oscontainer and pushes it into the machine-os-content IS which would unblock me for landing work to do OS updates in openshift/machine-config-operator#363

When the installer folks go to do a release they'd need to manually backreference from the machine-os-content to get the bootimage data (which BTW we should hardcode in the installer rather than being an API fetch). We could also in parallel try to implement machine-os-bootimage-meta or so which would just have metadata for now (all we need for AWS).

However, this would mean that in CI both would continue to float independently - leading to a situation where sometimes in CI one would boot a slightly older release and we'd upgrade during a run. The reason I was going for this model is it makes crystal clear that the source of truth is the release payload (which just happens to gather bootimage data using an external API at the moment at build time).

@cgwalters
Copy link
Member Author

OK fixed various bugs; main thing is we now piggy back on o.Mappings which avoids touching a lot of code. This works for me:

$ ./_output/local/bin/linux/amd64/oc adm release new --coreos-url https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo --from-image-stream=origin-v4.0 -n openshift --to-image quay.io/cgwalters/oste
st:release
info: Found 111 images in image stream                                                                                                                                                                            
Using CoreOS build 47.310   
...
$ podman pull quay.io/cgwalters/ostest:release   # need a way to build to local image store
$ podman inspect quay.io/cgwalters/ostest:release | grep coreos | head -1
                "io.openshift.release.coreos-boot-image": "{\"amis\":[{\"hvm\":\"ami-02d655625b6c691e9\",\"name\":\"ap-northeast-1\"},{\"hvm\":\"ami-08f5ccc1a8b0f1fd4\",\"name\":\"ap-northeast-2\"},{\"hvm\":\"am
i-0147d72611333957b\",\"name\":\"ap-south-1\"},{\"hvm\":\"ami-089509f5194db0e3f\",\"name\":\"ap-southeast-1\"},{\"hvm\":\"ami-0737f6cd734a7d222\",\"name\":\"ap-southeast-2\"},{\"hvm\":\"ami-0b719049951b1fd21\",\
"name\":\"ca-central-1\"},{\"hvm\":\"ami-076f809a745f83d98\",\"name\":\"eu-central-1\"},{\"hvm\":\"ami-08d87f63698b5fd3d\",\"name\":\"eu-west-1\"},{\"hvm\":\"ami-0698a3aa0e4b52c8f\",\"name\":\"eu-west-2\"},{\"hv
m\":\"ami-04b893b85e5defb6d\",\"name\":\"eu-west-3\"},{\"hvm\":\"ami-021c3815a9866feb2\",\"name\":\"sa-east-1\"},{\"hvm\":\"ami-0fe81b1f471177095\",\"name\":\"us-east-1\"},{\"hvm\":\"ami-0013fbee46f77ca37\",\"na
me\":\"us-east-2\"},{\"hvm\":\"ami-06681b6266a5ded7c\",\"name\":\"us-west-1\"},{\"hvm\":\"ami-0e7126056bc16b40f\",\"name\":\"us-west-2\"}],\"buildid\":\"47.310\",\"images\":{\"qemu\":{\"path\":\"redhat-coreos-ma
ipo-47.310-qemu.qcow2.gz\",\"sha256\":\"0a6e060043713bd23617db3422e0863744f4163627eb8ff1237fc143d3eca6a5\"}},\"ostree-version\":\"47.310\",\"oscontainer\":{\"digest\":\"sha256:def5c8ef775f634d99e3f603da0db7591a0
113ad8307bc8d8545a170787a2825\",\"image\":\"registry.svc.ci.openshift.org/rhcos/maipo\"}}"           
$ podman run --rm -ti quay.io/cgwalters/ostest:release image machine-os-content
registry.svc.ci.openshift.org/rhcos/maipo@sha256:def5c8ef775f634d99e3f603da0db7591a0113ad8307bc8d8545a170787a2825

And notably machine-os-content is the oscontainer corresponding to 47.310.

For RHCOS we have two things:

 - The "bootimage" (AMI, qcow2, PXE env)
 - The "oscontainer", now represented as `machine-os-content` in the payload

For initial OpenShift releases (e.g. of the installer) ideally
these are the same (i.e. we don't upgrade OS on boot).

This PR aims to support injecting both data into the release payload.

More information on the "bootimage" and its consumption by the
installer as well as the Machine API Operator:
openshift/installer#987

More information on `machine-os-content`:
openshift/machine-config-operator#183
@cgwalters
Copy link
Member Author

Let me summarize. This PR is attempting to move the "RHCOS API consumption" from the installer to the release payload. It also helps ensure that the bootimage is the same as the oscontainer in the release payload which is something we want to support long term. I am not aware of a different approach to solve that.

If we want to wait on this we could just do manual updates of machine-os-content for now.

return builds.Builds[0], nil
}

// GetLatest returns the CoreOS build with target version. If version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetCoreOSBuild

Name string `json:"name"`
} `json:"amis"`
BuildID string `json:"buildid"`
Images struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just drop this and include only the keys we need for now. Not that we're planning to break the schema, though there's no point in deserializing it if we're not using it.

Speaking of, I think we discussed adding schema versioning to the JSON output. Probably would be good to do this before teaching the origin about it?

fmt.Fprintf(o.Out, "Using CoreOS build %s\n", coreosBuild.BuildID)
digestedImage := fmt.Sprintf("%s@%s", coreosBuild.OSContainer.Image, coreosBuild.OSContainer.Digest)
if digestedImage == "@" {
return fmt.Errorf("No oscontainer in CoreOS build")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do an explicit e.g. if coreosBuild.OSContainer.Image == "" || coreosBuild.OSContainer.Digest == "" { ... }? (This also catches the case where just one of the two is somehow missing.)

return err
}
// This is written as a label in the final image
is.Annotations[coreOSBootImageLabel] = string(serializedBuild)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh OK, I see now why you wanted to also deserialize the qcow2s. Though... isn't there a way to have this data be a proper configmap instead of JSON-in-a-label? We were discussing this in openshift/machine-config-operator#183 right? (And including the pkglist, etc...).

Anyway, totally fine doing it this way for now too!

@cgwalters
Copy link
Member Author

Clayton and I had a quick call on this. We agreed to for now just push a :maipo tag and get wire that up in CI to get into the payload as machine-os-content.

@ashcrow
Copy link
Member

ashcrow commented Feb 12, 2019

/retest

@ashcrow
Copy link
Member

ashcrow commented Feb 12, 2019

Looks like the new switches must be added to completions for ci/prow/verify

@cgwalters
Copy link
Member Author

/hold

We are deferring this for now.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 12, 2019
@openshift-ci-robot
Copy link

@cgwalters: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/verify 38fc2fe link /test verify
ci/prow/integration 38fc2fe link /test integration
ci/prow/e2e-aws 38fc2fe link /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link

@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2019
@abhinavdahiya
Copy link
Contributor

can we include the AMI info as content in the machine-os-content container when we build it?

The current RHCOS build system has some really funky dependencies between building the oscontainer and the bootimage. Because: the bootimage needs to know the oscontainer it came from - if you boot and run e.g.:

[root@osiris-master-0 ~]# rpm-ostree status
State: idle
AutomaticUpdates: disabled
Deployments:
● pivot://registry.svc.ci.openshift.org/rhcos/maipo@sha256:2b7ff5707a67234e6d3187a6b30f5382aa74cd98327228ec6a463dcfe8e02908
              CustomOrigin: Provisioned from oscontainer
                   Version: 47.307 (2019-02-07T16:07:32Z)

That's how the MCD knows whether or not it needs to update.

Another further wrinkle here is that we want to actually test the OS before we upload an oscontainer - which requires a bootimage (VM image).

So the build system today goes:

* RPMs arrow_right ostree

* ostree arrow_right bootimage

* Test bootimage (kola sanity checks; no failing systemd units etc.)

* ostree arrow_right oscontainer

* _push_ oscontainer to api.ci and get final digest (because docker images, compression...)

* "scripts/prepivot" which injects the pull spec into the bootimage

* Upload to AWS

* Upload final build to S3

Now...I am not sure offhand how we could easily inject the AMIs into our oscontainer.

It might be easiest to have this be a separate container?

what if we use machine-os-content as easy way for MC* to deal with ostree image.
and machine-os-config to store the ConfigMap that @smarterclayton suggested.

so the RHCOS build pipeline pushed 2 images and the release image creation process remains fairly same.

keeping the metadata.json in a configmap gives operators like machine-api-operator easy way to sync metadata.json contents as a consumer.

@cgwalters
Copy link
Member Author

and machine-os-config to store the ConfigMap that @smarterclayton suggested. So the RHCOS build pipeline pushed 2 images and the release image creation process remains fairly same.

Yep, I think that should be fairly straightforward. An interesting wrinkle here though is that I think while we want to support drift, we'd also like the ability to "atomically" tag both of them together into the release payload. But eh, we can just deal with that in the release controller?

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 7, 2019
@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 7, 2019
@openshift-bot
Copy link
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Copy link

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters
Copy link
Member Author

cgwalters commented Aug 29, 2019

Been thinking about this again since bootimage updates would greatly help MCO kernel args.

We really have total flexibility in how we implement this. So here's a totally different proposal:
Create a new Prow job in api.ci which polls the RHCOS build endpoint for updated bootimages (one job per branch/stream), and generates a container which has a manifest that installs a configmap that has the meta.json. This would be machine-bootimage-metadata or something.

(And from there, either the MCO could update the machinesets, or the cluster API could read it directly, whichever)

Opinions?

@cgwalters
Copy link
Member Author

Or perhaps to start, we don't poll the RHCOS build endpoint, we just inject the installer-pinned bootimage. Yeah...that way we avoid having to think about two separate bootimage versions per version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants