-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/asset: Add asset for Worker machinesets #468
pkg/asset: Add asset for Worker machinesets #468
Conversation
This allows combining values for machine configuration from various sources like, 1. Defaults decided by installer 2. Platform level defaults in InstallConfig 3. Per pool options.
The workers did not come up because: Seeing this error when creating machines on aws
The machine object:
It looks like this code https://github.com/openshift/cluster-api-provider-aws/blob/master/cloud/aws/actuators/machine/actuator.go#L337-L345 is not de-deduplicating the tags. /cc @bison |
pkg/asset/machines/worker.go
Outdated
|
||
tags := map[string]string{ | ||
"tectonicClusterID": ic.ClusterID, | ||
fmt.Sprintf("kubernetes.io/cluster/%s", ic.ObjectMeta.Name): "owned", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what is more correct here, the actuator would need to ensure the tag is there, you shouldn't need to set this here, and it would be ideal if the actuator caught the duplicate. I'd just drop here and file a request for cloud team to check for dupes.
c6ca08a
to
4c0a98f
Compare
opened openshift/cluster-api-provider-aws#73 to track #468 (comment) |
4c0a98f
to
16e86c9
Compare
apiVersion: cluster.k8s.io/v1alpha1 | ||
kind: MachineSet | ||
metadata: | ||
name: {{.ClusterName}}-worker-0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up the -0
suffix on these? This will be copied into generateName
by the controller, so the nodes will end up with names like openshift-worker-0-abdef
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up the -0 suffix on these
We there are going to have multiple machinesets (in aws per az).
so the nodes will end up with names like openshift-worker-0-abdef
I think the node names is not dictated by the machine object name. it will be whatever kubelet decides. on aws it is usually internal name, on libvirt host name.
From the ci:
NAME STATUS ROLES AGE VERSION
ip-10-0-13-26.ec2.internal Ready master 10m v1.11.0+d4cacc0
ip-10-0-14-47.ec2.internal Ready bootstrap 10m v1.11.0+d4cacc0
ip-10-0-151-203.ec2.internal Ready worker 5m v1.11.0+d4cacc0
ip-10-0-152-41.ec2.internal Ready worker 5m v1.11.0+d4cacc0
ip-10-0-155-96.ec2.internal Ready worker 4m v1.11.0+d4cacc0
ip-10-0-16-202.ec2.internal Ready master 10m v1.11.0+d4cacc0
ip-10-0-39-160.ec2.internal Ready master 10m v1.11.0+d4cacc0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, forgot about the multiple AZ thing.
So, yeah, this is kind of confusing. The node object's name is determined by what the kubelet registers. The machine object's name is this string copied into generateName
, so basically this with some extra characters.
I think @trawler is working on the duplicate tags issue. |
From 16e86c941:
I still have a pretty fuzzy grasp of how all these operators fit together :p. We could push most of these without waiting on the cluster-API server, right? We just can't push the Also, do we want to address "worker" vs. "compute" for new-to-this-PR names? |
The api for all these objects is provided by the aggregated apiserver installed by MAO.
We still call them workers. |
pkg/asset/machines/aws/aws.go
Outdated
filters: | ||
- name: "name" | ||
values: | ||
- rhcos* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want all of this filtering, do we? I think we want to calculate the AMI ID in Go and inject it directly, like we're currently doing over here. That will also make it easier to allow callers to override the AMI ID if/when we restore support for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstallConfig
does not have AMI override. So we don't support that for workers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InstallConfig
does not have AMI override. So we don't support that for workers.
But we do support AMI lookup in Go. And we use that lookup to set the master/bootstrap AMI ID here. I think we want to use the same AMI for workers, which means we should pull that lookup out to a higher level so we don't have to perform it twice. InstallConfig
seems like a reasonable place to put it, but I'm ok if you want to stick it somewhere else for the time being. I don't think we need to supply a user prompt for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to use the same AMI for workers.
That's not true. New workers created by scaling the machineset should not be stuck with old amis.
Bootstrap and master is separate because we create them.
But in future we want masters to be adopted by the something similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we want to use the same AMI for workers.
That's not true. New workers created by scaling the machineset should not be stuck with old amis.
Bootstrap and master is separate because we create them.
And with this PR, we're effectively creating the workers as well.
But in future we want masters to be adopted by the something similar
Bootstrap doesn't matter; since it's a throw-away node. I expect we'll want a machine-set for masters too (or is there a different approach to scaling masters?). And however we handle rolling updates for workers, I expect we'll want the same handling for masters. Isn't that what the machine-config operator is going to handle? Can't it update the machine-sets when it decides to bump to a new AMI? For now, I thought we'd want to punt on all of this and just pin to the creation-time AMI for both masters and workers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what the machine-config operator is going to handle? Can't it update the machine-sets when it decides to bump to a new AMI?
it doesn't control machine characteristics. So no.
I thought we'd want to punt on all of this and just pin to the creation-time AMI for both masters and workers.
I would still want installer to not choose the ami for worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what the machine-config operator is going to handle? Can't it update the machine-sets when it decides to bump to a new AMI?
it doesn't control machine characteristics. So no.
Who is (or will be?) in charge of rolling AMI updates out across masters/workers? Or maybe with #281 in the pipe, nobody will care about AMIs at all?
I thought we'd want to punt on all of this and just pin to the creation-time AMI for both masters and workers.
I would still want installer to not choose the ami for worker.
But you're fine having it chose the AMI for the masters? They feel like the same thing to me. Would you rather we drop the AMI lookup from Go and just perform the lookup via tags? See also openshift/os#353 about long-term issues with the tag-based approach and release channels. It's not clear to me how the approach you have here will transition to the eventual non-tag AMI lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who is (or will be?) in charge of rolling AMI updates out across masters/workers?
That is TBD.
But you're fine having it chose the AMI for the masters?
I don't want to do that for masters too. but we cannot do that now as we cannot run the cluster-api stack during bootstrap, because of various reasons, to give us masters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to specific AMI to keep consistency with current MAO implementation.
pods: | ||
cidrBlocks: | ||
- {{.PodCIDR}} | ||
serviceDomain: unused |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've had this since openshift/machine-api-operator@04253cb9, but I'm not clear on why we aren't using our base domain or some such. @enxebre, do you remember why you chose this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the cluster object fields itself is used. However it's currently coupled in the actuator interface upstream hence we need it to exist for now https://github.com/kubernetes-sigs/cluster-api/blob/master/pkg/controller/machine/actuator.go#L25
kind: LibvirtMachineProviderConfig | ||
domainMemory: 2048 | ||
domainVcpu: 2 | ||
ignKey: /var/lib/libvirt/images/worker.ign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this just worker.ign
? More on why in openshift/machine-api-operator#70, in case you want any fodder for the explanatory commit message (although we probably don't have time to explain everything in these big "drop in a bunch of stuff which was worked up somewhere else" commits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with changing this to worker.ign
.
Long term i want consistency between libvirt and aws, as AWS uses a secret to specify the useradata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking this doesn't work
The actuator doesn't support this:
locally
Coud not create libvirt machine: error creating domain: error creating libvirt domain: virError(Code=1, Domain=10, Message='internal error: process exited while connecting to monitor: 2018-10-16T17:59:15.773168Z qemu-system-x86_64: -fw_cfg name=opt/com.coreos/config,file=worker.ign: can't load worker.ign')
a4e70fb
to
b9ef603
Compare
Add assets that can create the machinesets for workers. This depends on the cluster-apiserver to be deployed by the machine-api-operator, so the tectonic.sh will block on creating these machinesets objects until then. Using specific AMI for worker machinesets to keep consistency with current MAO implementation.
b9ef603
to
e2dc955
Compare
|
||
ctx, cancel := context.WithTimeout(context.TODO(), 60*time.Second) | ||
defer cancel() | ||
ami, err := rhcos.AMI(ctx, rhcos.DefaultChannel, config.Region) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicates the later lookup for master/bootstrap AMIs. Are we ok with the redundant request (and possible race coming up with two different AMIs), or do we want to pull this lookup back to a separate location where it can be shared by both consumers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not different than what we have today, in MAO here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also when we move masters to machinesets, it will end up in same location.
I'll follow up on the /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: abhinavdahiya, wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Adds ClusterK8sIO from e2dc955 (pkg/asset: add ClusterK8sIO, machines.Worker assets, 2018-10-15, openshift#468) and Master from 586ad45 (pkg/asset: Add asset for Master machines, 2018-10-18, openshift#491). Removes KubeCoreOperator from c9b0e2f (manifests: stop using kube core operator, 2018-10-08, openshift#420). Generated with: $ openshift-install graph | dot -Tsvg >docs/design/resource_dep.svg using: $ dot -V dot - graphviz version 2.30.1 (20170916.1124)
We'd added this in 4d636d3 (asset/manifests: bootstrap manifest generation, 2018-08-31, openshift#286) to support the machine-API operator which had been generating worker machine-sets. But since e2dc955 (pkg/asset: add ClusterK8sIO, machines.Worker assets, 2018-10-15, openshift#468), we've been creating those machine-sets ourselves. And the machine-API operator dropped their consuming code in openshift/machine-api-operator@c59151f6 (delete machine/cluster object loops, 2018-10-22, openshift/machine-api-operator#286). Dropping this dependency fixes bootstrap Ignition config edits during a multi-step deploy. For example, with: $ openshift-install --dir=wking create ignition-configs $ tree wking wking ├── bootstrap.ign ├── master-0.ign └── worker.ign before this commit, any edits to bootstrap.ign were clobbered in a subsequent 'create cluster' call, because: 1. The bootstrap Ignition config depends on the manifests. 2. The manifests depended on the worker Ignition config. 3. The worker Ignition config is on disk, so it gets marked dirty.
Workers have not had public IPs since (at least) we moved them to cluster-API creation in e2dc955 (pkg/asset: add ClusterK8sIO, machines.Worker assets, 2018-10-15, openshift#468). But it turns out a number of e2e tests assume SSH access to workers (e.g. [1]), and we don't have time to fix those tests now. We'll remove this once the tests have been fixed. [1]: https://github.com/kubernetes/kubernetes/blob/v1.13.1/test/e2e/node/ssh.go#L43
/cc @wking