This repository has been archived by the owner on Oct 22, 2024. It is now read-only.

operator API v1alpha2: revise Status #611

Closed
pohly opened this issue Apr 22, 2020 · 5 comments · Fixed by #749
@pohly
Contributor

pohly commented Apr 22, 2020

We should align with kubernetes/enhancements#1624.

@pohly pohly added the future needs to be fixed in some future release label May 18, 2020
@pohly pohly added 0.8 needs to be fixed in 0.8.x and removed future needs to be fixed in some future release labels Jun 18, 2020
@pohly
Contributor Author

pohly commented Sep 21, 2020

@avalluri please describe what you intend to do regarding the status field in the CRD.

@avalluri
Contributor

@pohly, here is a rough draft of how the new status type would look:

// DeploymentConditionType type for representing a deployment status condition
type DeploymentConditionType string

const (
	// CertsVerified means the provided deployment secrets are verified and valid for usage
	CertsVerified DeploymentConditionType = "CertsVerified"
	// CertsReady means the secrets/certificates required for running the PMEM-CSI driver
	// are ready and the deployment can progress further
	CertsReady DeploymentConditionType = "CertsReady"
	// DriverDeployed means that the node driver daemonset got created
	DriverDeployed DeploymentConditionType = "DriverDeployed"
)

// +k8s:deepcopy-gen=true
type DeploymentCondition struct {
	// Type of condition.
	Type DeploymentConditionType `json:"type"`
	// Status of the condition, one of True, False, Unknown.
	Status corev1.ConditionStatus `json:"status"`
	// Message human-readable text that explains why this condition is in this state
	// +optional
	Message string `json:"message,omitempty"`
	// Last time the condition was probed.
	// +optional
	LastUpdateTime metav1.Time `json:"lastUpdateTime,omitempty"`
}

type DriverType string

const (
	ControllerDriver DriverType = "ControllerDriver"
	NodeDriver       DriverType = "NodeDriver"
)

// +k8s:deepcopy-gen=true
type DriverStatus struct {
	// Type represents type of the driver: controller or node
	Type DriverType `json:"type"`
	// Status represents the driver status : Ready, NotReady
	Status string `json:"status"`
	// Reason represents the human-readable text that explains why the
	// driver is in this state.
	Reason string `json:"reason"`
}

// +k8s:deepcopy-gen=true
// DeploymentStatus defines the observed state of Deployment
type DeploymentStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make operator-generate-k8s" to regenerate code after modifying this file

	// Phase indicates the state of the deployment
	Phase DeploymentPhase `json:"phase,omitempty"`
	// Conditions lists the current deployment conditions
	Conditions []DeploymentCondition `json:"conditions,omitempty"`
	Components []DriverStatus        `json:"components,omitempty"`
	// LastUpdated time of the deployment status
	LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
}

I initially thought of including all sub-resources (roles, rolebindings, services, etc.) created for the deployment, but I have decided to limit that to just the node and controller driver status.
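As a usage sketch, the condition list in the draft above could be maintained with a small helper that refreshes the timestamp only when a condition actually changes. The types here are minimal local stand-ins (plain `string`/`time.Time` instead of `corev1.ConditionStatus`/`metav1.Time`) so the example is self-contained, and `SetCondition` is a hypothetical helper, not part of the draft:

```go
package main

import (
	"fmt"
	"time"
)

// Minimal local mirrors of the draft types; the real CRD would use
// corev1.ConditionStatus and metav1.Time instead.
type DeploymentConditionType string

type DeploymentCondition struct {
	Type           DeploymentConditionType
	Status         string
	Message        string
	LastUpdateTime time.Time
}

// SetCondition updates an existing condition of the same type in place,
// or appends a new one. LastUpdateTime is refreshed only on an actual
// status/message change, so repeated reconciles do not churn the status.
func SetCondition(conds []DeploymentCondition, c DeploymentCondition) []DeploymentCondition {
	for i, old := range conds {
		if old.Type == c.Type {
			if old.Status == c.Status && old.Message == c.Message {
				return conds // unchanged: keep the old timestamp
			}
			c.LastUpdateTime = time.Now()
			conds[i] = c
			return conds
		}
	}
	c.LastUpdateTime = time.Now()
	return append(conds, c)
}

func main() {
	var conds []DeploymentCondition
	conds = SetCondition(conds, DeploymentCondition{Type: "CertsReady", Status: "True"})
	// Setting the identical condition again is a no-op.
	conds = SetCondition(conds, DeploymentCondition{Type: "CertsReady", Status: "True"})
	fmt.Println(len(conds), conds[0].Status) // prints: 1 True
}
```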

@pohly
Contributor Author

pohly commented Sep 21, 2020

// Type represents type of the driver: controller or node
Type DriverType json:"type"

Is the plan to have more than one entry with type "node"?

If yes, how does the user know which node the status belongs to? If no, does it make sense to have Components []DriverStatus (an array) when there are always just two entries?

How does the operator determine whether the "controller" or "node" is "ready"? You need to explain the semantic in your API, not just list the values.

@avalluri
Contributor

avalluri commented Sep 21, 2020

// Type represents type of the driver: controller or node
Type DriverType json:"type"

Is the plan to have more than one entry with type "node"?

No, it is the combined state of all the nodes/the daemonset. The 'Reason' field is supposed to hold the details of how many nodes are in the running state out of the total number of nodes.

If no, does it make sense to have Components []DriverStatus (an array) when there are always just two entries?

The array was the result of the initial design of holding the state of all sub-resources. It was left like that, assuming it could be extended back to the original idea if needed. Otherwise, we can split it into two entries: ControllerDriver and NodeDriver.

How does the operator determine whether the "controller" or "node" is "ready"?

'Ready' here means that the pods underneath the node DaemonSet and the StatefulSet are in the 'Running' state.
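That readiness summary can be sketched as a small function over DaemonSet-style counts (`DesiredNumberScheduled`/`NumberReady` in the apps/v1 DaemonSet status). `driverStatus` is an illustrative helper under those assumptions, not the actual operator code:

```go
package main

import "fmt"

// driverStatus derives the node driver's DriverStatus fields from
// DaemonSet-style counts: the driver counts as Ready only when the pod
// is running on every node where it is supposed to run, and the Reason
// string carries the "how many out of how many" summary.
func driverStatus(desired, ready int32) (status, reason string) {
	if desired > 0 && ready == desired {
		status = "Ready"
	} else {
		status = "NotReady"
	}
	reason = fmt.Sprintf("Running on %d out of %d nodes", ready, desired)
	return
}

func main() {
	s, r := driverStatus(3, 2)
	fmt.Println(s, "-", r) // prints: NotReady - Running on 2 out of 3 nodes
}
```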

avalluri added a commit to avalluri/pmem-CSI that referenced this issue Sep 27, 2020
Revised the Deployment.Status to accommodate the deployment state
conditions and driver state. Currently, Deployment has 3 conditions named
CertsVerified, CertsReady, and DriverDeployed. It also records the summary
of the controller and node driver state, i.e., the number of nodes the
driver is running on.

In order to record the real-time status of the driver, the current
reconcile loop had to be rewritten. The existing reconcile loop reacted
only to deployment CR changes and redeployed *only* the sub-objects that
required redeployment. Instead, the new reconcile logic *refreshes* all
the objects and the CR status to keep the state consistent. The refresh
uses merge patching of the objects to avoid unnecessary updates.

There are two reconcile entry points:
- CR reconcile loop: refreshes all the sub-objects and the CR status
- sub-object event handler: redeploys only the deleted/changed resource
and updates the CR status if required.

This also includes other code cleanups that came up along the way.

TODOs:
 - E2E tests for validating that the operator restores the state of a
 broken deployment, i.e., recovering deleted/modified sub-objects.

FIXES: intel#611
@pohly pohly mentioned this issue Sep 30, 2020
5 tasks
@pohly pohly added 0.9 and removed 0.8 needs to be fixed in 0.8.x labels Oct 2, 2020
@pohly
Contributor Author

pohly commented Oct 2, 2020

Didn't make it into 0.8. Let's move it to 0.9.

avalluri added a commit to avalluri/pmem-CSI that referenced this issue Oct 6, 2020
(same commit message as above)
avalluri added a commit to avalluri/pmem-CSI that referenced this issue Oct 8, 2020
(same commit message as above)