Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rhcos: implement image discovery for new pipeline #554

Merged
merged 1 commit into from
Oct 31, 2018
Merged

Conversation

crawford
Copy link
Contributor

RHCOS has moved over to the 2.0 pipeline which has a nicer discovery
mechanism for images. This uses that new pipeline to fetch the latest
AMIs and QCOW images for the latest build. Unfortunately, there is no
"latest" alias anymore, so the installer has to fetch resources from the
Internet before it can prompt the user. This will go away once the
installer starts pinning to a specific RHCOS release.

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 26, 2018
@crawford
Copy link
Contributor Author

2018/10/26 23:43:40 error: unable to signal to artifacts container to terminate in pod release-latest, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")
2018/10/26 23:43:40 error: unable to retrieve artifacts from pod release-latest: could not read gzipped artifacts: unable to upgrade connection: container not found ("artifacts")
2018/10/26 23:43:46 Ran for 1m48s
error: could not run steps: failed to wait for release pod to complete: could not wait for pod completion: pod release-latest was already deleted

I'm not sure what happened there...

/retest

@crawford
Copy link
Contributor Author

/cc @cgwalters

@@ -288,13 +289,24 @@ func (a *platform) libvirtPlatform() (*types.LibvirtPlatform, error) {
return nil, err
}

// TODO: Ideally, this would live inside of a closure which is passed to
// asset.GenerateUserProvidedAsset and only called if the environment
// variable isn't present. As this exists, it ruins the abstraction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of problem goes away with #556, where all dependencies and defaults are only rendered as needed 🎉

pkg/rhcos/ami.go Outdated
// AMI calculates a Red Hat CoreOS AMI.
// TODO: Use the new RHCOS build pipeline to discover AMIs. The pipeline isn't
// currently exposed publicly and CI isn't on the VPN, so this cannot be
// updated without breaking the tests.
Copy link
Member

@wking wking Oct 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only workable if we have the same internal and external release cadence. See discussion in #281 and openshift/os#353.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I will look at making more of the pipeline public. There are some prerequisites for this.

How hard would it be to put some of this data in the pull secret though or so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How hard would it be to put some of this data in the pull secret though or so?

What data? A URI with an initial entry-point for this discovery?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What data? A URI with an initial entry-point for this discovery?

I'd say all the release metadata - including the meta.json fetched by this PR, which includes AMI ids, along with link to qcow2 etc.

Hmm. Or maybe the pull secret grants access to a server running on api.ci or wherever that hands out the metadata.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, one thing this circles back to is...we could stick all of the relevant metadata in a container, which the pull secret already grants access to of course.

Would having this be in a container be ergonomic for the installer? It'd depend on oc adm release extract?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all sounds far too complex for what we are after. A few notes on the discussion:

  • We shouldn't depend on the URL remaining unknown in order to protect the RHCOS images. Hiding this in a container behind a pull-secret doesn't buy much.
  • The RHCOS image reference should come from the CVO manifest payload (just like all of the other resources). We need to wait until MAO can create masters before we can make this jump since I don't want the installer parsing through the CVO manifest payload.

A lot of this is temporary scaffolding, so I'd rather we focus the discussion on the longer-term vision rather than this code which we know we are going to throw away in a few weeks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RHCOS image reference should come from the CVO manifest payload (just like all of the other resources). We need to wait until MAO can create masters before we can make this jump since I don't want the installer parsing through the CVO manifest payload.

What approach would you use for the always-pre-MAO bootstrap (which needs an AMI too ;)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't depend on the URL remaining unknown in order to protect the RHCOS images. Hiding this in a container behind a pull-secret doesn't buy much.

True long term.

The RHCOS image reference should come from the CVO manifest payload

Agree. What's fuzzy for me is the dependencies required for that. It also forces some deep issues like...when we branch OKD, are we going to try to maintain a separate OS stream for that? Or do we just keep the same OS and ship updated kubelets too? Or a mix where we ship the same "base OS", and just pull different kubelets? We've talked about this in the past, and what hurts my head is that the complexity jumps up when one notes we have two major version streams for RHCOS today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you guys confirm, CVO manifest payload = origin-v4.0:release image?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed.

@wking
Copy link
Member

wking commented Oct 27, 2018

I think the new pipeline also lets us replace this hack with Content-Encoding (discussed in 7abe94d, #280).

pkg/rhcos/qemu.go Outdated Show resolved Hide resolved
pkg/rhcos/builds.go Outdated Show resolved Hide resolved
@ashcrow
Copy link
Member

ashcrow commented Oct 30, 2018

Needs a rebase

RHCOS has moved over to the 2.0 pipeline which has a nicer discovery
mechanism for images. This uses that new pipeline to fetch the latest
QCOW images for the latest build. Unfortunately, there is no "latest"
alias anymore, so the installer has to fetch resources from the Internet
before it can prompt the user. This will go away once the installer
starts pinning to a specific RHCOS release.
if nextImage.ImageId == nil || nextImage.CreationDate == nil {
continue
for _, ami := range meta.AMIs {
if ami.Name == region {
Copy link
Member

@wking wking Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to land this before the AMIs are being pushed to other regions?

$ BUILD=$(curl -sL https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/builds.json | jq -r '.builds[0]')
$ echo "${BUILD}"
47.38
$ curl -sL "https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/${BUILD}/meta.json" | jq .amis
[
  {
    "name": "us-east-1",
    "hvm": "ami-03afbcc8b72560ac8"
  }
]

I'd rather stick to scraping AWS until we get that, because scraping AWS has a chance at working in other regions if folks can copy the AMI over to the target region. That would allow us to use the other regions we're trying to use for openshift-dev.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add an environment variable for the AMI. We already have the override for the QEMU image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add an environment variable for the AMI. We already have the override for the QEMU image.

I'm ok with that, although see this comment about not having AMI/image knobs and assuming that there's a default AMI/image that is (1) available and (2) healthy enough to pivot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're close to landing the multi-region again although this is going to slow our pipeline notably. There are two things we need to do; first on the RHCOS side, implement tags, and only upload to all the regions for tags (e.g. once a day at most?).

Second though we need to do openshift/os#289 or possibly teach the installer to take a set of override RPMs and apply them via Ignition? Then I think we can slow the cadence of "promoted" builds, and anyone who wants to test bleeding edge OS work can use libvirt.

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit b4fe4d2 into openshift:master Oct 31, 2018
@crawford crawford deleted the os branch October 31, 2018 21:49
@wking
Copy link
Member

wking commented Oct 31, 2018

Do we want to land an environment variable override? Or are we ok re-breaking @ironcladlou and others who work in other regions until we get the multi-region pipeline going?

@cgwalters
Copy link
Member

An env override makes sense to me, but it'd be even more powerful to have an override for the URL; then one could use it to override every format. The only slight downside to this would be that one would have to hand-craft a meta.json but that's not hard.

wking added a commit to wking/openshift-installer that referenced this pull request Nov 7, 2018
Keep the environment variable (with a warning about using it), but
drop the interactive prompt.  The default is solid, and users
manipulating it are more likely to break something (e.g. by continuing
to use the old v1 pipeline), while the installer can update its
default to track the RHCOS folks (e.g. like 1117821, rhcos: implement
image discovery for new pipeline, 2018-10-26, openshift#554).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants