Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: Create ControlPlaneMachineSet CRDs #6172

Merged

Conversation

rna-afk
Copy link
Contributor

@rna-afk rna-afk commented Jul 26, 2022

Creating an extra CRD asset for the ControlPlaneMachineSet which
is required for the machine api operator for more control over
the control-plane nodes that come up on AWS.

@openshift-ci openshift-ci bot requested review from jstuever and r4f4 July 26, 2022 19:19
@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch from 1763221 to f458ad9 Compare July 27, 2022 12:35
@patrickdillon
Copy link
Contributor

/cc @JoelSpeed

@openshift-ci openshift-ci bot requested a review from JoelSpeed July 27, 2022 23:32
Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ControlPlaneMachineSet is different to the MachineSet. There should be a single ControlPlaneMachineSet, which has to be called cluster in the openshift-machine-api namespace.

Then the FailureDomains should contain information for all of the zones, not just a single zone. It is up to the ControlPlaneMachineSet operator to handle spreading the master machines across the failure domains

pkg/asset/machines/aws/machines.go Outdated Show resolved Hide resolved
pkg/asset/machines/aws/machines.go Outdated Show resolved Hide resolved
pkg/asset/machines/aws/machines.go Outdated Show resolved Hide resolved
pkg/asset/machines/aws/machines.go Outdated Show resolved Hide resolved
pkg/asset/machines/aws/machines.go Outdated Show resolved Hide resolved
pkg/asset/machines/master.go Outdated Show resolved Hide resolved
@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch 2 times, most recently from b627699 to 052f591 Compare July 28, 2022 14:03
@rna-afk rna-afk requested a review from JoelSpeed July 28, 2022 17:06
@JoelSpeed
Copy link
Contributor

Looks good, can you please attach as a code block an example of a generated ControlPlaneMachineSet based on this PR? Then we can run that through our validation webhooks and double check that it is valid, looks like it should be good though

@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch from 052f591 to bc33dc4 Compare August 1, 2022 13:36
@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 1, 2022

Fixed some issues and generated the CRD.

apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: null
  labels:
    machine.openshift.io/cluster-api-cluster: anarayan-h5gwf
  name: cluster
  namespace: openshift-machine-api
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: anarayan-h5gwf
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  strategy: {}
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        aws:
        - placement:
            availabilityZone: us-east-1a
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-h5gwf-private-us-east-1a
            id: ""
            type: ID
        - placement:
            availabilityZone: us-east-1b
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-h5gwf-private-us-east-1b
            id: ""
            type: ID
        - placement:
            availabilityZone: us-east-1c
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-h5gwf-private-us-east-1c
            id: ""
            type: ID
        - placement:
            availabilityZone: us-east-1d
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-h5gwf-private-us-east-1d
            id: ""
            type: ID
        - placement:
            availabilityZone: us-east-1f
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-h5gwf-private-us-east-1f
            id: ""
            type: ID
        platform: AWS
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: anarayan-h5gwf
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            ami:
              id: ami-0b9c602d07818c309
            apiVersion: machine.openshift.io/v1beta1
            blockDevices:
            - ebs:
                encrypted: true
                iops: 0
                kmsKey:
                  arn: ""
                volumeSize: 120
                volumeType: gp3
            credentialsSecret:
              name: aws-cloud-credentials
            deviceIndex: 0
            iamInstanceProfile:
              id: anarayan-h5gwf-master-profile
            instanceType: m6i.xlarge
            kind: AWSMachineProviderConfig
            loadBalancers:
            - name: anarayan-h5gwf-int
              type: network
            - name: anarayan-h5gwf-ext
              type: network
            metadata:
              creationTimestamp: null
            metadataServiceOptions: {}
            placement:
              region: us-east-1
            securityGroups:
            - filters:
              - name: tag:Name
                values:
                - anarayan-h5gwf-master-sg
            subnet: {}
            tags:
            - name: kubernetes.io/cluster/anarayan-h5gwf
              value: owned
            userDataSecret:
              name: master-user-data
status: {}

@JoelSpeed
Copy link
Contributor

- placement:
            availabilityZone: us-east-1a
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-h5gwf-private-us-east-1a
            id: ""
            type: ID

This is invalid, you can't have filters as a non nil entry when type is ID, you need to override the value to Filters for type if you are using the type, you must also nil the id field

@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch from bc33dc4 to 812a25b Compare August 1, 2022 13:47
@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 1, 2022

Fixed it. PTAL new CRD.

apiVersion: machine.openshift.io/v1
kind: ControlPlaneMachineSet
metadata:
  creationTimestamp: null
  labels:
    machine.openshift.io/cluster-api-cluster: anarayan-fkc85
  name: cluster
  namespace: openshift-machine-api
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: anarayan-fkc85
      machine.openshift.io/cluster-api-machine-role: master
      machine.openshift.io/cluster-api-machine-type: master
  strategy: {}
  template:
    machineType: machines_v1beta1_machine_openshift_io
    machines_v1beta1_machine_openshift_io:
      failureDomains:
        aws:
        - placement:
            availabilityZone: us-east-1a
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-fkc85-private-us-east-1a
            type: Filters
        - placement:
            availabilityZone: us-east-1b
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-fkc85-private-us-east-1b
            type: Filters
        - placement:
            availabilityZone: us-east-1c
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-fkc85-private-us-east-1c
            type: Filters
        - placement:
            availabilityZone: us-east-1d
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-fkc85-private-us-east-1d
            type: Filters
        - placement:
            availabilityZone: us-east-1f
          subnet:
            filters:
            - name: tag:Name
              values:
              - anarayan-fkc85-private-us-east-1f
            type: Filters
        platform: AWS
      metadata:
        labels:
          machine.openshift.io/cluster-api-cluster: anarayan-fkc85
          machine.openshift.io/cluster-api-machine-role: master
          machine.openshift.io/cluster-api-machine-type: master
      spec:
        lifecycleHooks: {}
        metadata: {}
        providerSpec:
          value:
            ami:
              id: ami-0b9c602d07818c309
            apiVersion: machine.openshift.io/v1beta1
            blockDevices:
            - ebs:
                encrypted: true
                iops: 0
                kmsKey:
                  arn: ""
                volumeSize: 120
                volumeType: gp3
            credentialsSecret:
              name: aws-cloud-credentials
            deviceIndex: 0
            iamInstanceProfile:
              id: anarayan-fkc85-master-profile
            instanceType: m6i.xlarge
            kind: AWSMachineProviderConfig
            loadBalancers:
            - name: anarayan-fkc85-int
              type: network
            - name: anarayan-fkc85-ext
              type: network
            metadata:
              creationTimestamp: null
            metadataServiceOptions: {}
            placement:
              region: us-east-1
            securityGroups:
            - filters:
              - name: tag:Name
                values:
                - anarayan-fkc85-master-sg
            subnet: {}
            tags:
            - name: kubernetes.io/cluster/anarayan-fkc85
              value: owned
            userDataSecret:
              name: master-user-data
status: {}

@JoelSpeed
Copy link
Contributor

I think this looks ok but I'd like to quickly run this through our webhook validations to double check it passes, I'll have to create a cluster to do that so it may take some time

@jstuever
Copy link
Contributor

jstuever commented Aug 1, 2022

/retest

@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch from 812a25b to d4bd08b Compare August 2, 2022 13:05
@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 2, 2022

/retest

@JoelSpeed
Copy link
Contributor

So I gave this a test and hit highlighted an issue with one of the assumptions we made while developing the CPMS. This will work for any region with 3 failure domains but if there are more this won't work currently. I've created openshift/cluster-control-plane-machine-set-operator#73 to fix this, so we can merge this once this new PR merges

@rna-afk rna-afk requested a review from JoelSpeed August 8, 2022 17:37
@JoelSpeed
Copy link
Contributor

I'd like to hold off merging this until openshift/cluster-control-plane-machine-set-operator#80 merges, as we will then have the operator deployed in CI and be able to verify that this change doesn't break installations.

If we merge the other way round we may need to patch things up here later

@JoelSpeed
Copy link
Contributor

/test e2e-aws
/test e2e-aws-imdsv2
/test e2e-aws-proxy
/test e2e-aws-single-node

I think everything I wanted to see merged before we merge this is done now so let's try a selection of AWS presubmits and see how badly they break 😉

@JoelSpeed
Copy link
Contributor

@rna-afk I don't really understand how but all the AWS installs seem to be broken with some terraform issue right now, is that a result of this PR do you think?

@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch from d4bd08b to 4282e74 Compare August 10, 2022 14:28
@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 10, 2022

/retest

@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 10, 2022

/test e2e-aws
/test e2e-aws-imdsv2
/test e2e-aws-proxy
/test e2e-aws-single-node

@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch from 4282e74 to de96ff4 Compare August 11, 2022 16:23
@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 11, 2022

Should work now. Was just a rogue pointer reference that I made.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2022
@JoelSpeed
Copy link
Contributor

/test e2e-aws
/test e2e-aws-proxy
/test e2e-aws-imdsv2

@JoelSpeed
Copy link
Contributor

Something went wrong with the CI system there
/test e2e-aws
/test e2e-aws-proxy
/test e2e-aws-imdsv2

@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 24, 2022

/test yaml-lint

@patrickdillon
Copy link
Contributor

/approve
/skip

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2022
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2022
Creating an extra CRD asset for the ControlPlaneMachineSet which
is required for the machine api operator for more control over
the control-plane nodes that come up on AWS.
@rna-afk rna-afk force-pushed the aws_control_plane_nodes branch from 14de63d to 779e0ed Compare August 25, 2022 12:51
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Aug 25, 2022
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 25, 2022
@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 25, 2022

/test gofmt
/test images
/test shellcheck
/test verify-vendor
/test tf-fmt
/test tf-lint
/test yaml-lint

@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 25, 2022

/test shellcheck
/test tf-fmt
/test yaml-lint

Copy link
Contributor

@r4f4 r4f4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 25, 2022
@r4f4 r4f4 removed their assignment Aug 25, 2022
@r4f4
Copy link
Contributor

r4f4 commented Aug 25, 2022

/test okd-images

@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 26, 2022

/test aro-unit
/test e2e-azure
/test e2e-gcp

@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 26, 2022

/skip

@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 26, 2022

/test e2e-azure
/test e2e-gcp

@rna-afk
Copy link
Contributor Author

rna-afk commented Aug 29, 2022

/test okd-unit

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 29, 2022

@rna-afk: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-azurestack 14de63d815f2b9fb0d1dbf94f0e262f810daaeab link false /test e2e-azurestack
ci/prow/okd-e2e-aws-ovn 779e0ed link false /test okd-e2e-aws-ovn
ci/prow/e2e-metal-ipi 779e0ed link false /test e2e-metal-ipi

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 2 against base HEAD 0792ab3 and 8 for PR HEAD 779e0ed in total

@JoelSpeed
Copy link
Contributor

Everything is in payload now so now reason from my side to hold this up anymore, let's try get this merged asap

@rna-afk
Copy link
Contributor Author

rna-afk commented Sep 1, 2022

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants