enhancement proposal for Equinix Metal IPI #436
Conversation
/assign @mfojtik
/cc @crawford
Thank you for writing up an enhancement!
I've left a number of comments. Feel free to take some of them and reflect them in an "open questions" section. We don't need to block merging the provisional enhancement on answering every question.
enhancements/installer/packet-ipi.md
Outdated
- <https://www.packet.com/developers/changelog/custom-image-handling/>
- Packet does not offer a managed DNS solution. Route53, CloudFlare, or RFC 2136
  (DNS Update), and NS record based zone forwarding are considerations. Existing
  IPI drivers like BareMetal have similar limitations.
For baremetal IPI, we do some automation of required DNS records for use within the cluster itself. I think that should be reusable here if multicast is allowed on Packet's network. Has this been considered?
Multicast is not fully supported, but it may work well enough for our needs. Are you suggesting mDNS?
Yeah. Take a look at the "internal DNS" section of https://github.com/openshift/enhancements/blob/master/enhancements/network/baremetal-networking.md
mDNS is used internally for IPI, both for node resolution and for pointing to the api-int entry. It has nothing to do with external consumption.
In assisted-installer (where we also rely on the BM IPI ignition configs) we provide optional Route53-based DNS to support external api and *.apps resolution.
Interesting. I've been looking for something like https://github.com/openshift/mdns-publisher
This is the relevant repo that provides networking-related manifests as part of the IPI BareMetal platform - you'll find mdns, keepalived, and coredns there.
IPI drivers like BareMetal have similar limitations.
- LoadBalancing can be achieved with the Packet CCM, which creates and maintains
  a MetalLB deployment within the cluster.
- Both Public and Private IPv4 and IPv6 addresses are made available for each node.
We should discuss how network configuration is to be handled. IIRC, the IPv4 address is handed out via DHCP, so that should work fine without any further changes. There is no DHCPv6 for IPv6, so OpenShift configured for IPv6 will not work without some additional changes. I believe we need the hosts to pull their networking config from a packet API so they can self-configure the IPv6 address?
Packet devices do not use DHCP. Device addresses are configured on the filesystem early in the boot phase.
We can look at how this is implemented to some degree with the open-source Tinkerbell project:
- https://github.com/tinkerbell/osie/blob/master/ci/network-test-files/want/centos-x1.small.x86-supermicro/etc/sysconfig/network-scripts/ifcfg-bond0
- https://github.com/tinkerbell/osie/blob/96d01cf72d64dc0b0d32e74631a467af64d4c4ad/docker/scripts/osie.sh#L551
- https://github.com/packethost/packet-networking/tree/master/packetnetworking/distros/redhat
- https://github.com/packethost/packet-images
(The images are based on and distributed as Docker images. A different approach is taken for images that require raw partition/filesystem layouts, as RHCOS does. This will require some Red Hat <-> Packet coordination.)
All of this is to say that the Packet approach to providing images and preparing those images on boot-up could be explored.
I believe we need the hosts to pull their networking config from a packet API so they can self-configure the IPv6 address?
There is also a metadata service available. https://www.packet.com/developers/docs/servers/key-features/metadata/
OSIE configures cloud-init to use EC2 data sources with the metadata.packet.net/metadata URL. Would it benefit this effort to seek Packet addition to https://github.com/canonical/cloud-init/tree/master/cloudinit/sources, or is there a different project where this would need to be added (Ignition)?
It may also be possible to boot a raw, unmodified RHCOS image, but I'm not sure about the process. If this direction were pursued, I assume we would need a Packet presence in the version of cloud-init/Ignition shipped by RHCOS.
Thanks for the clarifications!
cloud-init is not used here. Ignition is used, instead.
I do think we should aim for booting an unmodified RHCOS image and allowing it to get initialized at boot time. By default we'll be assuming DHCP is available. If not, it does sound like we'll need to provide some additional ignition configuration to configure networking on first boot.
You may find this enhancement interesting, which describes the levers we have available for configuring networking on the host: #394
At some point I'd like to see a section in this document that describes how we'll handle host network configuration for this platform. While the enhancement is provisional, it can be captured as an open question while we work through the details.
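To make the kind of first-boot configuration being discussed concrete, here is a minimal sketch (not part of the proposal) of injecting a static NetworkManager profile via a Butane/FCC config; the interface name, addresses, and FCC version are placeholders, and this only covers the real root, not any initramfs-stage networking needed to fetch the config itself:

```yaml
# Sketch only: placeholder interface name and documentation-range addresses.
variant: fcos
version: 1.1.0
storage:
  files:
    - path: /etc/NetworkManager/system-connections/static-eno1.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=static-eno1
          type=ethernet
          interface-name=eno1

          [ipv4]
          method=manual
          addresses=192.0.2.10/31
          gateway=192.0.2.11
          dns=192.0.2.1
```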
actually, how does host network configuration work for the UPI installation method already implemented?
A bastion node is created to serve the RHCOS image and act as a DHCP server for other nodes that will be added.
The control plane nodes added after the bastion node get DHCP/ipxe instructions from the bastion node. Their DHCP boot info includes ipxe instructions to boot from the bastion node's hosted copy of the RHCOS image.
That RHCOS image is pulled at provision time from a static address.
Certain kargs are automatically whitelisted. `console` isn't one of them, so you'd need to use `--append-karg`.
@bgilbert is `append-karg` available in the kernel command line?
For this openshift-installer use case, I think it is fair to have a very opinionated (generated) Ignition file.
For my local testing, and for the general use case of launching RHCOS (and related OSes, say via terraform or packet-cli), it would be very helpful (a requirement?) not to need to merge values into the data stashed at the Ignition URL. I suspect the format of this content could be unpredictable (JSON vs. YAML, other?). (I'm also thinking of cloud-init, where the value could be a shell script, a cloud-config, a gzipped multi-part MIME document, or other.)
is append-karg available in the kernel command line?
Nope.
On 4.6, with Ignition spec 3, you might take a look at FCCT, particularly `storage.trees`. You can create an FCC that points to a local directory tree, then run FCCT against it to produce an Ignition config which embeds 1) the contents of that tree, and 2) the Ignition config generated by openshift-installer.
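A rough sketch of that workflow, assuming an FCC version with `storage.trees` support; the directory name and the URL for the installer-generated config are placeholders (depending on the FCCT release, that config could also be embedded rather than referenced):

```yaml
variant: fcos
version: 1.1.0
ignition:
  config:
    merge:
      # Placeholder URL for the Ignition config produced by openshift-installer
      - source: https://example.com/ignition/worker.ign
storage:
  trees:
    # Everything under ./site-files is embedded into the rendered Ignition config
    - local: site-files
      path: /
```

Running something like `fcct --files-dir . config.fcc > config.ign` would then yield a single Ignition config carrying both pieces.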
@bgilbert is FCCT the preferred (or only) method to append kargs? MachineConfigs looked like a possible alternative, but I wasn't certain if the kargs whitelist applies here.
@liveaverage FCCT can't append kargs right now; that paragraph was replying to the last paragraph of #436 (comment). The mechanism you linked is separate from the coreos-installer one, and the whitelist doesn't apply there. AIUI the MachineConfig karg mechanism doesn't apply to the very first boot.
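For reference, the MachineConfig-based mechanism mentioned above looks roughly like the sketch below (the role and console device are just examples); as noted, it is applied by the machine-config-operator on subsequent boots, not on the very first boot:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-serial-console
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
    - console=ttyS1,115200n8   # example serial console karg
```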
enhancements/installer/packet-ipi.md
Outdated
### Risks and Mitigations

The official Packet ClusterAPI, CCM, and CSI drivers are all relatively new and
Links to each of these components would be helpful.
It would also be nice to expand on the integration plans for each of these items.
ClusterAPI -- This is alluded to in the task list earlier in the doc, but I expect we'd have to fork the CAPI repo into OpenShift and then adapt it to the OpenShift machine API. Changes will be needed to the openshift/machine-api-operator to understand how to run this new machine controller.
CCM -- This is going to hit a limitation we have in OpenShift. We don't yet have integration for running external cloud controller managers. This enhancement is related: #423. That enhancement discusses using the out-of-tree OpenStack CCM. As a prerequisite, we need a new operator to manage it. That operator can be re-used for other out-of-tree CCMs, including this one.
CSI -- I don't know how CSI drivers are integrated. Has this been investigated yet? Are there notes we can include on the integration process and any expected challenges?
Links to each of these components would be helpful.
- https://github.com/kubernetes-sigs/cluster-api-provider-packet
- https://github.com/packethost/packet-ccm
- https://github.com/packethost/csi-packet
It would also be nice to expand on the integration plans for each of these items.
Will do. I'm not sure how far out we'll need to plan for these integrations. I suspect, based on your earlier comments, that I may need another enhancement request for each of the components unless we can take advantage of existing features (especially for the CSI and CCM).
I expect we'd have to fork the CAPI repo into OpenShift and then adapt it to the OpenShift machine API.
That sounds reasonable. We could try to host it at the existing repository, but a `v1beta1` tag will conflict with an eventual kubernetes-sigs spec and the likely capi-packet release tag.
This is the v1alpha1 based MachineProvider spec:
https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/v0.1.0/pkg/apis/packetprovider/v1alpha1/packetmachineproviderspec_types.go
I imagine the OpenShift v1beta1 capi-packet will build on this, the existing v1alpha3 enhancements, and changes necessary to adopt O/S capi v1beta1.
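Purely as a strawman (every field name below is hypothetical, not the actual upstream or eventual OpenShift spec), a trimmed MachineSet providerSpec might end up looking something like:

```yaml
# Hypothetical sketch; selector/labels omitted and field names are illustrative.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-worker
  namespace: openshift-machine-api
spec:
  replicas: 2
  template:
    spec:
      providerSpec:
        value:
          apiVersion: packetprovider.openshift.io/v1beta1  # hypothetical group/version
          kind: PacketMachineProviderSpec
          plan: c3.small.x86        # device plan
          facility: ewr1            # Packet facility
          operatingSystem: rhcos    # hypothetical OS reference
          billingCycle: hourly
```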
Changes will be needed to the openshift/machine-api-operator to understand how to run this new machine controller.
I started a branch for this, https://github.com/openshift/machine-api-operator/compare/master...displague:packet?expand=1. We'll need some coordination to get those quay image tags into existence.
We don't yet have integration for running the external cloud controller managers.
The value of the Packet CCM is in tracking the Packet device IDs so the state of k8s nodes can reflect the state of Packet devices (a deleted device is a removed node, for example). The other value is in configuring the MetalLB-based load balancer. Is there an alternative (in-tree) bare-metal CCM available that could provide LoadBalancer services, with the limitations that come from not being Packet API aware?
I don't know how CSI drivers are integrated. Has this been investigated yet?
Is that to say only flex volumes are supported?
What Openshift Baremetal storage solutions are available that don't rely on cloud managed storage?
There is no other bare-metal CCM.
I'm OK if this enhancement doesn't answer all of the integration questions around these components, but I'd like them to be captured as open questions either to be addressed by follow-up PRs to this doc, or future separate enhancements if the topic is deep enough to deserve its own doc.
IPI drivers like BareMetal have similar limitations.
- LoadBalancing can be achieved with the Packet CCM, which creates and maintains
  a MetalLB deployment within the cluster.
- Both Public and Private IPv4 and IPv6 addresses are made available for each node.
In the old CoreOS Container Linux images for Packet, Afterburn (then called coreos-metadata) ran in the initramfs, queried the Packet metadata service for IPv6 and private IPv4 addresses and NIC bonding configuration, and wrote out configuration files for networkd. That code still exists in RHCOS but would need to be updated for NetworkManager: coreos/fedora-coreos-tracker#111
enhancements/installer/packet-ipi.md
Outdated
### Implementation Details/Notes/Constraints

- RHCOS must be made available and maintained.
  Images available for provisioning may be custom or official.
Packet's official image and custom image mechanisms are both based on filesystem tarballs rather than disk images, and RHCOS is delivered as a disk image.
If we want to get an official RHCOS image into Packet, we should work with the Packet folks to set up a custom deployment flow using coreos-installer. CoreOS Container Linux used a similar arrangement.
Otherwise, we can use Packet's custom iPXE support to implement the same thing ourselves. We'd just boot the RHCOS live PXE image and perform an ordinary bare-metal install, telling the installer to override the platform ID to `packet`.
I've captured most of this in the latest push. Thanks for the input and context, @bgilbert!
Custom Images can be hosted on git, as described here:
- <https://www.packet.com/developers/docs/servers/operating-systems/custom-images/>
- <https://www.packet.com/developers/changelog/custom-image-handling/>
- It may be possible to boot using the [Packet Custom iPXE
may be possible
kola, the CoreOS integration testing tool, already has working code to iPXE boot the live image on Packet, perform an install to disk, and boot into it.
https://github.com/coreos/mantle#kola - very interesting
That's the old Container Linux one. https://github.com/coreos/coreos-assembler/tree/master/mantle#kola now.
It seems `ore` also has Packet awareness. I don't know if that is at all relevant.
https://github.com/coreos/coreos-assembler/tree/master/mantle#ore
([mirror](https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.5/?extIdCarryOver=true&sc_cid=701f2000001OH7iAAG)).

A generic RHCOS Ignition URL could instruct nodes booted on Packet to
configure their network using the EC2-like [metadata
Ignition wouldn't be involved in that process. Afterburn would automatically run in the initramfs (`ConditionKernelCommandLine=ignition.platform.id=packet`) and configure the network.
@liveaverage points out that I should include a note about the size constraints of userdata over Packet metadata. I'll dig up the limit.
Can Ignition work with base64 encoded, gzipped, multipart mime (like cloud-init can)?
No, the userdata has to be an Ignition config without any special encoding. However, Ignition configs have a mechanism (`ignition.config.merge`) for merging an external Ignition config into the one fetched from metadata. If necessary, the one in userdata can bootstrap the real config.
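Given the userdata size constraint raised above, the bootstrapping trick could be as small as the following (shown in Butane/FCC form for readability; the compiled Ignition JSON is what would actually go into Packet userdata, and the URL is a placeholder):

```yaml
variant: fcos
version: 1.1.0
ignition:
  config:
    merge:
      # Tiny pointer config: fetch and merge the full Ignition config at boot
      - source: https://config.example.com/master.ign
```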
enhancements/installer/packet-ipi.md
Outdated
- Can the sig-cluster-lifecycle ClusterAPI and OpenShift MachineAPI converge to
  prevent maintaining two similar but different ClusterAPI implementations?
- How will the RHCOS image be served?
- How will Ignition be started and configure nodes appropriately? (restating - DHCP is not available but a metadata service is)
There's no bootstrapping problem here. The system comes up, DHCPs for its public IPv4 address, Afterburn configures other addresses by querying the metadata service, then Ignition runs.
- Device plan (required, defaulting to minimum required)
- Facility (required)
- Virtual Network (optional, defaulted to a new virtual network)
- Control plane variables rely on bootstrap variables and add:
Does "rely on bootstrap variables" mean that it will inherit the same settings, enforcing common settings for the whole cluster? Or is this a shorthand way of saying "all of that stuff plus some more" ?
I was thinking inheritance but I left my wording open in case that is not an option or not the best choice.
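As a purely hypothetical illustration of how those variables and the inheritance question might surface in install-config.yaml (the platform key and every field name are placeholders, not a proposed schema):

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: pkt-demo
platform:
  packet:                      # hypothetical platform key
    facility: ewr1             # required
    projectID: 00000000-0000-0000-0000-000000000000
    plan: c3.small.x86         # default/bootstrap device plan
    vlan: ""                   # optional; empty could mean "create a new virtual network"
controlPlane:
  name: master
  replicas: 3
  platform:
    packet:
      plan: c3.medium.x86      # inherits the bootstrap settings, overrides only the plan
```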
Thanks again for working on this documentation. I left a few more comments. Once we feel comfortable that we have at least documented all of our known open questions, I am OK merging this and iterating on it with follow-up PRs.
enhancements/installer/packet-ipi.md
Outdated
- Virtual Network (optional, defaulted to a new virtual network)
- Control plane variables rely on bootstrap variables and add:
  - Device plan (defaulting to provide a good experience)
- Additional clusters created through the ClusterAPI will require fields
OpenShift does not support creating clusters through ClusterAPI. It only uses a subset (Machine, MachineSet) to abstract machine provisioning within a cluster, typically to scale out and add worker nodes.
enhancements/installer/packet-ipi.md
Outdated
- Is the Packet CCM necessary?
- If so, who is responsible for supporting the MetalLB installation the Packet
  CCM provides and requires for LoadBalancer definitions?
I think it's fair to say that the use of MetalLB with OpenShift on Packet would follow whatever more general MetalLB deployment management we come up with. See #356 or any follow-ups to that same enhancement.
What may still make sense is Packet specific code to automatically configure it with the BGP settings that are appropriate for Packet. Automating that would be a cool feature in Packet IPI support, but is blocked on having MetalLB supported in the first place.
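To illustrate the kind of configuration such Packet-aware automation might generate (this assumes MetalLB's ConfigMap-based configuration, and the ASNs and addresses are placeholders rather than real Packet values):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config
  namespace: metallb-system
data:
  config: |
    peers:
    - peer-address: 169.254.255.1   # placeholder; per-node BGP peer from the Packet API
      peer-asn: 65530               # placeholder peer ASN
      my-asn: 65000                 # placeholder local ASN
    address-pools:
    - name: packet-elastic-ips
      protocol: bgp
      addresses:
      - 198.51.100.10/32            # placeholder elastic IP assigned via the Packet API
```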
At this point I'm OK merging this and just continuing to follow up with additional PRs focused on different open questions. I'll hold off, since @JoelSpeed just commented, to let those questions get resolved first.
Exploring initial aspirations and decision points for a Packet (Equinix) bare metal IPI provider. Signed-off-by: Marques Johansson <[email protected]>
Happy to merge now if you are @russellb
/approve Let's please follow up on this with additional PRs as appropriate. Thanks again for starting this document!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: displague, russellb. The full list of commands accepted by this bot can be found here. The pull request process is described here.
This initial enhancement proposal recommends the creation of a Packet provider in the IPI installer.
As with any IPI installer, this requires several pull requests throughout the OpenShift ecosystem, documentation, testing, QA, and other resources.
Unique to the Packet IPI, this enhancement points out some of the features and limitations that will need to be expressed in the solutions presented in subsequent openshift/enhancements and openshift/installer project pull requests.
Work has begun for this IPI installer in openshift/installer#3914 (very much a work in progress).