This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

operator API v1alpha2: revise Status #611

Closed
pohly opened this issue Apr 22, 2020 · 5 comments · Fixed by #749
@pohly
Contributor

pohly commented Apr 22, 2020

We should align with kubernetes/enhancements#1624.

@pohly pohly added the future needs to be fixed in some future release label May 18, 2020
@pohly pohly added 0.8 needs to be fixed in 0.8.x and removed future needs to be fixed in some future release labels Jun 18, 2020
@pohly
Contributor Author

pohly commented Sep 21, 2020

@avalluri please describe what you intend to do regarding the status field in the CRD.

@avalluri
Contributor

@pohly, here is a rough draft of how the new status type would look:

// DeploymentConditionType type for representing a deployment status condition
type DeploymentConditionType string

const (
	// CertsVerified means the provided deployment secrets are verified and valid for usage
	CertsVerified DeploymentConditionType = "CertsVerified"
	// CertsReady means the secrets/certificates required for running the PMEM-CSI driver
	// are ready and the deployment can progress further
	CertsReady DeploymentConditionType = "CertsReady"
	// DriverDeployed means that the node driver daemonset got created
	DriverDeployed DeploymentConditionType = "DriverDeployed"
)

// +k8s:deepcopy-gen=true
type DeploymentCondition struct {
	// Type of condition.
	Type DeploymentConditionType `json:"type"`
	// Status of the condition, one of True, False, Unknown.
	Status corev1.ConditionStatus `json:"status"`
	// Message human-readable text that explains why this condition is in this state
	// +optional
	Message string `json:"message,omitempty"`
	// Last time the condition was probed.
	// +optional
	LastUpdateTime metav1.Time `json:"lastUpdateTime,omitempty"`
}

type DriverType string

const (
	ControllerDriver DriverType = "ControllerDriver"
	NodeDriver       DriverType = "NodeDriver"
)

// +k8s:deepcopy-gen=true
type DriverStatus struct {
	// Type represents type of the driver: controller or node
	Type DriverType `json:"type"`
	// Status represents the driver status : Ready, NotReady
	Status string `json:"status"`
	// Reason represents the human-readable text that explains why the
	// driver is in this state.
	Reason string `json:"reason"`
}

// +k8s:deepcopy-gen=true
// DeploymentStatus defines the observed state of Deployment
type DeploymentStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make operator-generate-k8s" to regenerate code after modifying this file

	// Phase indicates the state of the deployment
	Phase DeploymentPhase `json:"phase,omitempty"`
	// Conditions lists the current deployment conditions
	Conditions []DeploymentCondition `json:"conditions,omitempty"`
	Components []DriverStatus        `json:"components,omitempty"`
	// LastUpdated time of the deployment status
	LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
}

I initially thought of including all sub-resources (roles, rolebindings, services, etc.) created for the deployment, but I have decided to limit that to just the node and controller driver status.
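As a usage sketch, the condition list in the draft above could be maintained with a small helper that refreshes the timestamp only when a condition actually changes. The types here are minimal local stand-ins (plain `string`/`time.Time` instead of `corev1.ConditionStatus`/`metav1.Time`) so the example is self-contained, and `SetCondition` is a hypothetical helper, not part of the draft:

```go
package main

import (
	"fmt"
	"time"
)

// Minimal local mirrors of the draft types; the real CRD would use
// corev1.ConditionStatus and metav1.Time instead.
type DeploymentConditionType string

type DeploymentCondition struct {
	Type           DeploymentConditionType
	Status         string
	Message        string
	LastUpdateTime time.Time
}

// SetCondition updates an existing condition of the same type in place,
// or appends a new one. LastUpdateTime is refreshed only on an actual
// status/message change, so repeated reconciles do not churn the status.
func SetCondition(conds []DeploymentCondition, c DeploymentCondition) []DeploymentCondition {
	for i, old := range conds {
		if old.Type == c.Type {
			if old.Status == c.Status && old.Message == c.Message {
				return conds // unchanged: keep the old timestamp
			}
			c.LastUpdateTime = time.Now()
			conds[i] = c
			return conds
		}
	}
	c.LastUpdateTime = time.Now()
	return append(conds, c)
}

func main() {
	var conds []DeploymentCondition
	conds = SetCondition(conds, DeploymentCondition{Type: "CertsReady", Status: "True"})
	// Setting the identical condition again is a no-op.
	conds = SetCondition(conds, DeploymentCondition{Type: "CertsReady", Status: "True"})
	fmt.Println(len(conds), conds[0].Status) // prints: 1 True
}
```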

@pohly
Contributor Author

pohly commented Sep 21, 2020

// Type represents type of the driver: controller or node
Type DriverType json:"type"

Is the plan to have more than one entry with type "node"?

If yes, how does the user know which node the status belongs to? If no, does it make sense to have Components []DriverStatus (an array) when there are always just two entries?

How does the operator determine whether the "controller" or "node" is "ready"? You need to explain the semantic in your API, not just list the values.

@avalluri
Contributor

avalluri commented Sep 21, 2020

// Type represents type of the driver: controller or node
Type DriverType json:"type"

Is the plan to have more than one entry with type "node"?

No, it is the combined state of all the nodes/the daemonset. The 'Reason' field is supposed to hold the details of how many nodes are in the running state out of the total number of nodes.

If no, does it make sense to have Components []DriverStatus (an array) when there are always just two entries?

The array was the result of the initial design of holding the state of all sub-resources. It was left like that, assuming it could be extended back to the original idea if needed. Otherwise, we can split it into two entries: ControllerDriver and NodeDriver.

How does the operator determine whether the "controller" or "node" is "ready"?

'Ready' here means that the pods underneath the node DaemonSet and the StatefulSet are in the 'Running' state.
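That readiness summary can be sketched as a small function over DaemonSet-style counts (`DesiredNumberScheduled`/`NumberReady` in the apps/v1 DaemonSet status). `driverStatus` is an illustrative helper under those assumptions, not the actual operator code:

```go
package main

import "fmt"

// driverStatus derives the node driver's DriverStatus fields from
// DaemonSet-style counts: the driver counts as Ready only when the pod
// is running on every node where it is supposed to run, and the Reason
// string carries the "how many out of how many" summary.
func driverStatus(desired, ready int32) (status, reason string) {
	if desired > 0 && ready == desired {
		status = "Ready"
	} else {
		status = "NotReady"
	}
	reason = fmt.Sprintf("Running on %d out of %d nodes", ready, desired)
	return
}

func main() {
	s, r := driverStatus(3, 2)
	fmt.Println(s, "-", r) // prints: NotReady - Running on 2 out of 3 nodes
}
```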

avalluri added a commit to avalluri/pmem-CSI that referenced this issue Sep 27, 2020
Revised the Deployment.Status to accommodate the deployment state
conditions and driver state. Currently, Deployment has 3 conditions named
CertsVerified, CertsReady, and DriverDeployed. It also records the summary
of the controller and node driver state, i.e., the number of nodes the
driver is running on.

In order to record the real-time status of the driver, the current
reconcile loop had to be rewritten. The existing reconcile loop reacted
only to deployment CR changes and redeployed *only* the sub-objects that
required redeployment. Instead, the new reconcile logic *refreshes* all
the objects and the CR status to keep the state consistent. The refresh
uses merge patching of the objects to avoid unnecessary updates.

There are two reconcile entry points:
- CR reconcile loop: refreshes all the sub-objects and the CR status
- sub-object event handler: redeploys only the deleted/changed resource
and updates the CR status if required.

This also includes other code cleanups that came up along the way.

TODOs:
 - E2E tests for validating that the operator restores the state of a
 broken deployment, i.e., recovering deleted/modified sub-objects.

FIXES: intel#611
@pohly pohly mentioned this issue Sep 30, 2020
5 tasks
@pohly pohly added 0.9 and removed 0.8 needs to be fixed in 0.8.x labels Oct 2, 2020
@pohly
Contributor Author

pohly commented Oct 2, 2020

Didn't make it into 0.8. Let's move it to 0.9.

avalluri added a commit to avalluri/pmem-CSI that referenced this issue Oct 6, 2020
(same commit message as above)
avalluri added a commit to avalluri/pmem-CSI that referenced this issue Oct 8, 2020
(same commit message as above)