
Cluster controller and types for v1alpha2 #1177

Merged 2 commits into kubernetes-sigs:master on Jul 25, 2019

Conversation

@vincepri (Member) commented Jul 19, 2019

Signed-off-by: Vince Prignano [email protected]

What this PR does / why we need it:
This PR is an implementation following the proposal outlined in #1137

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #833

Special notes for your reviewer:

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


@k8s-ci-robot added the do-not-merge/work-in-progress (PR should not merge because it is a work in progress), cncf-cla: yes (the PR's author has signed the CNCF CLA), and size/XXL (changes 1000+ lines, ignoring generated files) labels on Jul 19, 2019
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (PR has been approved by an approver from all required OWNERS files) on Jul 19, 2019
@k8s-ci-robot requested review from justinsb and ncdc on Jul 19, 2019
@vincepri force-pushed the cluster-v1a2 branch 2 times, most recently from b7cbb5d to 10ed36f, on Jul 19, 2019
// or if any resources linked to this cluster aren't yet deleted.
func (r *ReconcileCluster) isDeleteReady(ctx context.Context, cluster *v1alpha2.Cluster) error {
// TODO(vincepri): List and delete MachineDeployments, MachineSets, Machines, and
// block deletion until they're all deleted.
Member:

We probably do not need to actively delete (just yet) and can likely just start with blocking deletion while any Machine* objects linked to this Cluster exist.

vincepri (Member Author):

Would a user need to delete these before deleting the Cluster?

Member:

Yes, but technically that is no different than the state of the world today 😂
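
A minimal sketch of the blocking approach suggested above (the label key, list options, and error handling here are assumptions, not the merged implementation):

// isDeleteReady blocks Cluster deletion while any linked Machines exist.
func (r *ReconcileCluster) isDeleteReady(ctx context.Context, cluster *v1alpha2.Cluster) error {
  machines := &v1alpha2.MachineList{}
  if err := r.Client.List(ctx, machines,
    client.InNamespace(cluster.Namespace),
    client.MatchingLabels{"cluster.k8s.io/cluster-name": cluster.Name}, // assumed label key
  ); err != nil {
    return errors.Wrapf(err, "failed to list Machines for Cluster %s/%s", cluster.Namespace, cluster.Name)
  }
  if len(machines.Items) > 0 {
    // Returning an error requeues the request, retrying until all
    // linked Machines are gone.
    return errors.Errorf("Cluster %s/%s still has %d linked Machine(s)", cluster.Namespace, cluster.Name, len(machines.Items))
  }
  return nil
}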

Contributor:

@detiber we'll need to forward-port #1180

@k8s-ci-robot added the needs-rebase label (PR cannot be merged because it has merge conflicts with HEAD) on Jul 19, 2019
@k8s-ci-robot added the size/XL label (changes 500-999 lines, ignoring generated files) and removed the needs-rebase and size/XXL labels on Jul 19, 2019
@vincepri changed the title from "WIP: Cluster controller and types for v1alpha2" to "Cluster controller and types for v1alpha2" on Jul 19, 2019
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Jul 19, 2019
pkg/apis/cluster/v1alpha2/cluster_types.go (outdated review thread, resolved)
// field, akin to component config.
// +optional
Value *runtime.RawExtension `json:"value,omitempty"`
InfrastructureRef *corev1.ObjectReference `json:"infrastructureRef,omitempty"`
Contributor:

It looks like this field is finally considered optional, rather than mandatory and set at creation time as was discussed in the data model change proposal. I personally like this more.

The collaboration model between the cluster controller and the infrastructure controller was simplified and adapted to that assumption.

The initial proposal, however, considered it optional and possibly set after the cluster object was created.

For the sake of documentation for infrastructure provider implementers, should we amend the proposal to reflect that we consider this field optional, and describe how the Pending cluster phase handles this case?

vincepri (Member Author):

The Cluster is going to be in the Pending phase if the infrastructure reference doesn't exist. I think that's fine for providers/users that don't want to use the Cluster for infrastructure.

Contributor:

This is one of the reasons I was thinking we didn't strictly need a cluster-wide phase at this time, given that we only have one boolean (infra ready, or not).

vincepri (Member Author):

I don't mind either way; if we want to remove it, we can do so later, or I can make the changes in this PR if needed.

Member:

I think it's a good idea to keep it for consistency and it gives end users a better experience.

Contributor:

As I said in the data model proposal, I don't have a strong opinion about either option, even though I see value in having a consistent approach across controllers (machine, cluster). If we are keeping it, I would like to update the data model accordingly.
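
To make the behavior under discussion concrete, here is a sketch of how the phase could be derived (SetTypedPhase and the phase constants are assumed names, not necessarily the merged API):

func (r *ReconcileCluster) reconcilePhase(cluster *v1alpha2.Cluster) {
  // No infrastructure reference and no ready infrastructure: stay Pending.
  if cluster.Status.Phase == "" {
    cluster.Status.SetTypedPhase(v1alpha2.ClusterPhasePending)
  }
  if cluster.Spec.InfrastructureRef != nil && !cluster.Status.InfrastructureReady {
    cluster.Status.SetTypedPhase(v1alpha2.ClusterPhaseProvisioning)
  }
  if cluster.Status.InfrastructureReady {
    cluster.Status.SetTypedPhase(v1alpha2.ClusterPhaseProvisioned)
  }
  if !cluster.DeletionTimestamp.IsZero() {
    cluster.Status.SetTypedPhase(v1alpha2.ClusterPhaseDeleting)
  }
}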

pkg/apis/cluster/v1alpha2/cluster_types.go (outdated review thread, resolved)
pkg/controller/cluster/cluster_controller.go (outdated review thread, resolved)
pkg/controller/cluster/cluster_controller_phases.go (outdated review thread, resolved)
pkg/controller/cluster/cluster_controller_phases.go (outdated review thread, resolved)
@pablochacin (Contributor) commented Jul 20, 2019

@vincepri as we are removing the actuator interface, we should also remove the test actuator and move this logic into a test infrastructure controller that sets the infrastructure status. The tests must be changed accordingly (I think we don't need events anymore), otherwise the integration tests won't pass.

Edit: I volunteer to help with the tests, if needed.
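
One possible shape for that test infrastructure controller (a sketch; the testInfraGVK, the status.ready field, and the client wiring are assumptions):

// Reconcile marks any test infrastructure object as ready, so the cluster
// controller can observe InfrastructureReady without a real provider.
func (r *testInfraReconciler) Reconcile(req reconcile.Request) (reconcile.Result, error) {
  ctx := context.Background()
  obj := &unstructured.Unstructured{}
  obj.SetGroupVersionKind(testInfraGVK) // hypothetical GVK registered for the test CRD
  if err := r.client.Get(ctx, req.NamespacedName, obj); err != nil {
    return reconcile.Result{}, client.IgnoreNotFound(err)
  }
  // Flip status.ready so the cluster controller's external reconciliation
  // sees the infrastructure as provisioned.
  if err := unstructured.SetNestedField(obj.Object, true, "status", "ready"); err != nil {
    return reconcile.Result{}, err
  }
  return reconcile.Result{}, r.client.Status().Update(ctx, obj)
}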

@vincepri (Member Author) commented Jul 22, 2019

/test pull-cluster-api-integration

@k8s-ci-robot added the size/XXL label and removed the size/XL label on Jul 22, 2019
@vincepri (Member Author):

This should be ready for an in-depth review and can serve as a starting point, @detiber @pablochacin

@vincepri (Member Author):

/assign @ncdc

@vincepri force-pushed the cluster-v1a2 branch 2 times, most recently from bdde5f6 to 7180218, on Jul 23, 2019
pkg/apis/cluster/common/pointers.go (outdated review thread, resolved)
pkg/apis/cluster/v1alpha2/cluster_types.go (outdated review thread, resolved)
pkg/apis/cluster/v1alpha2/cluster_types.go (outdated review thread, resolved)
pkg/apis/cluster/v1alpha2/cluster_types.go (review thread, resolved)
pkg/controller/cluster/cluster_controller.go (outdated review thread, resolved)
return err
}

if cluster.Status.InfrastructureReady || !infraConfig.GetDeletionTimestamp().IsZero() {
Contributor:

Should we split this and move the infra ready check up above the call to r.reconcileExternal?

vincepri (Member Author):

This looks fine to me; if we return early above, then we don't reconcile the errors. I think the Machine reconcileBootstrap method needs to be fixed as well.

Contributor:

What flow are you thinking about? Cluster infraReady=true, but something possibly has changed to cause reconcileExternal to fail?

pkg/controller/cluster/cluster_controller_phases.go (outdated review thread, resolved)
pkg/apis/cluster/v1alpha2/cluster_phase_types.go (outdated review thread, resolved)

if cluster.Spec.InfrastructureRef != nil {
_, err := external.Get(r.Client, cluster.Spec.InfrastructureRef, cluster.Namespace)
if err != nil && !apierrors.IsNotFound(err) {
Contributor:

If you wanted, this might look a little cleaner:

switch {
case apierrors.IsNotFound(err):
  // This is what we want - no need to requeue
case err != nil:
  return errors.Wrapf(...)
default: // or case err == nil
  return requeue
}
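
Filled in, that suggestion might read something like this (the error messages and the requeue approach are assumptions, not the merged code):

if cluster.Spec.InfrastructureRef != nil {
  _, err := external.Get(r.Client, cluster.Spec.InfrastructureRef, cluster.Namespace)
  switch {
  case apierrors.IsNotFound(err):
    // The infrastructure object is already gone - nothing to block on.
  case err != nil:
    return errors.Wrapf(err, "failed to get %s %q for Cluster %s/%s",
      cluster.Spec.InfrastructureRef.Kind, cluster.Spec.InfrastructureRef.Name,
      cluster.Namespace, cluster.Name)
  default:
    // The object still exists: return an error so the request is requeued
    // and deletion is retried until the object is gone.
    return errors.Errorf("infrastructure object %q still exists", cluster.Spec.InfrastructureRef.Name)
  }
}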

gvk := cluster.GroupVersionKind()
if err := r.Client.Patch(ctx, cluster, patchCluster); err != nil {
klog.Errorf("Error Patching Cluster %q in namespace %q: %v", cluster.Name, cluster.Namespace, err)
if reterr != nil {
Contributor:

I don't think reterr can ever be non-nil here?

vincepri (Member Author):

This is flipped, it should have been == nil.
proceeds to drink more coffee

pkg/controller/cluster/cluster_controller.go (review thread, resolved)
cluster.SetGroupVersionKind(gvk)
if err := r.Client.Status().Patch(ctx, cluster, patchCluster); err != nil {
klog.Errorf("Error Patching Cluster status %q in namespace %q: %v", cluster.Name, cluster.Namespace, err)
if reterr != nil {
Contributor:

If you return early if the main patch failed, then if we made it this far, reterr should still be nil, right?
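
Putting both review points together, a corrected version of the defer might look like this (a sketch; the helpers come from the snippets above, the control flow is assumed):

defer func() {
  gvk := cluster.GroupVersionKind()
  if err := r.Client.Patch(ctx, cluster, patchCluster); err != nil {
    klog.Errorf("Error patching Cluster %q in namespace %q: %v", cluster.Name, cluster.Namespace, err)
    if reterr == nil { // the draft had this check flipped; keep the first error
      reterr = err
    }
    return // skip the status patch if the main patch failed
  }
  // Restore the GVK, which the patch call clears, before patching status.
  cluster.SetGroupVersionKind(gvk)
  if err := r.Client.Status().Patch(ctx, cluster, patchCluster); err != nil {
    klog.Errorf("Error patching Cluster status %q in namespace %q: %v", cluster.Name, cluster.Namespace, err)
    // reterr is still nil here because of the early return above.
    reterr = err
  }
}()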


@@ -106,29 +90,52 @@ type NetworkRanges struct {
type ClusterStatus struct {
// APIEndpoint represents the endpoint to communicate with the IP.
// +optional
APIEndpoints []APIEndpoint `json:"apiEndpoints,omitempty"`
APIEndpoint APIEndpoint `json:"apiEndpoint,omitempty"`
Contributor:

Why move from multiple to single endpoints?

For providers that don't have native load-balancers this would be the ideal place to retrieve a list of endpoints to load balance across (via client-side load balancing for example).
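
For reference, the single-endpoint shape implied by the diff above would be roughly the following (field names are assumed, not quoted from this PR):

// APIEndpoint represents a single host/port on which the API server is reachable.
type APIEndpoint struct {
  // Host is the hostname or IP address on which the API server is serving.
  Host string `json:"host"`
  // Port is the port on which the API server is serving.
  Port int `json:"port"`
}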

vincepri (Member Author):

This has been discussed in #1137 and the ultimate goal was to reduce friction.

/cc @detiber

Contributor:

I don't think a change in the data model was in scope for #1137 (outside of externalizing it), nor was it highlighted in the PR summary or proposal body. This should have been a separate PR that elicited more discussion, or been highlighted as a change in the proposal body.

vincepri (Member Author):

Can you open an issue to follow up, or a PR against the current proposal? The change is small enough to be a follow-up to this PR.

@vincepri force-pushed the cluster-v1a2 branch 5 times, most recently from e8948d5 to 4af32da, on Jul 24, 2019
@k8s-ci-robot requested a review from detiber on Jul 24, 2019
@vincepri (Member Author):

@ncdc Good to merge?

@ncdc (Contributor) commented Jul 24, 2019

@vincepri I likely won't have any more review time today. Feel free to ask others, or I can come back to it tomorrow.

@ncdc (Contributor) commented Jul 25, 2019

/lgtm

Can do additional fixes in follow-ups as needed.

@k8s-ci-robot added the lgtm label ("Looks good to me", PR is ready to be merged) on Jul 25, 2019
@vincepri (Member Author):

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label on Jul 25, 2019
@k8s-ci-robot merged commit 835ee87 into kubernetes-sigs:master on Jul 25, 2019
Labels: approved, cncf-cla: yes, lgtm, size/XXL

Successfully merging this pull request may close these issues: Externalize provider specific specs and status in separated CRDs

6 participants