operator: revise deployment status API and operator reconcile loop
Revised Deployment.Status to accommodate the deployment state
conditions and driver state. Currently, Deployment has 3 conditions named
CertsVerified, CertsReady, and DriverDeployed. It also records a summary
of the controller and node driver state, i.e., the number of nodes on
which the driver is running.

To record the real-time status of the driver, the current reconcile loop
had to be rewritten. The existing reconcile loop watched for deployment
CR changes and redeployed *only* the sub-objects that required
redeployment. Instead, the new reconcile logic *refreshes* all the objects
and the CR status to keep the state consistent. The refresh uses merge
patches on the objects to avoid unnecessary updates.
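
The merge-patch idea can be illustrated with a minimal, standard-library-only sketch. It is not the operator's code: the `mergePatch` helper and the `driver:v1`/`driver:v2` values are hypothetical, and the diff only follows the spirit of JSON Merge Patch (RFC 7386). The point it shows is that an empty patch means the existing object already matches the desired one, so the update can be skipped.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// mergePatch returns a merge-patch-style diff that turns original into
// modified. An empty result means there is nothing to update.
func mergePatch(original, modified map[string]interface{}) map[string]interface{} {
	patch := map[string]interface{}{}
	for key, newVal := range modified {
		oldVal, exists := original[key]
		if !exists {
			patch[key] = newVal
			continue
		}
		oldMap, oldIsMap := oldVal.(map[string]interface{})
		newMap, newIsMap := newVal.(map[string]interface{})
		if oldIsMap && newIsMap {
			// Recurse so that only changed leaves end up in the patch.
			if sub := mergePatch(oldMap, newMap); len(sub) > 0 {
				patch[key] = sub
			}
			continue
		}
		if !reflect.DeepEqual(oldVal, newVal) {
			patch[key] = newVal
		}
	}
	for key := range original {
		if _, exists := modified[key]; !exists {
			patch[key] = nil // null removes the key in a merge patch
		}
	}
	return patch
}

func main() {
	existing := map[string]interface{}{
		"spec": map[string]interface{}{"replicas": 1.0, "image": "driver:v1"},
	}
	desired := map[string]interface{}{
		"spec": map[string]interface{}{"replicas": 1.0, "image": "driver:v2"},
	}
	patch := mergePatch(existing, desired)
	if len(patch) == 0 {
		fmt.Println("no changes, skipping update")
		return
	}
	data, err := json.Marshal(patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // prints {"spec":{"image":"driver:v2"}}
}
```

Sending only the changed fields keeps a full refresh cheap when nothing has drifted, which is what makes refreshing every sub-object on each reconcile practical.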

There are two reconcile entry points:
- CR reconcile loop: refreshes all the sub-objects and the CR status
- sub-object event handler: redeploys only the deleted/changed resource
and updates the CR status if required.

This also includes other code cleanups made along the way.

FIXES: intel#611
avalluri committed Oct 8, 2020
1 parent 3d2367e commit 3245084
Showing 6 changed files with 1,553 additions and 690 deletions.
100 changes: 74 additions & 26 deletions docs/install.md
@@ -270,36 +270,73 @@ pmem-csi.intel.com 50s

$ kubectl describe deployment.pmem-csi.intel.com/pmem-csi.intel.com
Name: pmem-csi.intel.com
Namespace: default
Namespace:
Labels: <none>
Annotations: <none>
API Version: pmem-csi.intel.com/v1alpha1
Kind: Deployment
Metadata:
Creation Timestamp: 2020-01-23T13:40:32Z
Creation Timestamp: 2020-10-07T07:31:58Z
Generation: 1
Resource Version: 3596387
Self Link: /apis/pmem-csi.intel.com/v1alpha1/deployments/pmem-csi.intel.com
UID: 454b5961-5aa2-41c3-b774-29fe932ae236
Managed Fields:
API Version: pmem-csi.intel.com/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:spec:
.:
f:deviceMode:
f:nodeSelector:
.:
f:storage:
Manager: kubectl-create
Operation: Update
Time: 2020-10-07T07:31:58Z
API Version: pmem-csi.intel.com/v1alpha1
Fields Type: FieldsV1
fieldsV1:
f:status:
.:
f:conditions:
f:driverComponents:
f:lastUpdated:
f:phase:
Manager: pmem-csi-operator
Operation: Update
Time: 2020-10-07T07:32:22Z
Resource Version: 1235740
Self Link: /apis/pmem-csi.intel.com/v1alpha1/deployments/pmem-csi.intel.com
UID: d8635490-53fa-4eec-970d-cd4c76f53b23
Spec:
Controller Resources:
Requests:
Cpu: 200m
Memory: 100Mi
Device Mode: lvm
Image: localhost/pmem-csi-driver:canary
Node Resources:
Requests:
Cpu: 200m
Memory: 100Mi
Node Selector:
Storage: pmem
Status:
Phase: Running
Conditions:
Last Update Time: 2020-10-07T07:32:00Z
Reason: Driver certificates are available.
Status: True
Type: CertsReady
Last Update Time: 2020-10-07T07:32:02Z
Reason: Driver deployed successfully.
Status: True
Type: DriverDeployed
Driver Components:
Component: Controller
Last Updated: 2020-10-08T07:45:13Z
Reason: 1 instance(s) of controller driver is running successfully
Status: Ready
Component: Node
Last Updated: 2020-10-08T07:45:11Z
Reason: All 3 node driver pod(s) running successfully
Status: Ready
Last Updated: 2020-10-07T07:32:21Z
Phase: Running
Reason: All driver components are deployed successfully
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NewDeployment 34s pmem-csi-operator Processing new driver deployment
Normal Running 2s (x10 over 26s) pmem-csi-operator Driver deployment successful

Type Reason Age From Message
---- ------ ---- ---- -------
Normal NewDeployment 58s pmem-csi-operator Processing new driver deployment
Normal Running 39s pmem-csi-operator Driver deployment successful

$ kubectl get po
NAME READY STATUS RESTARTS AGE
@@ -1176,21 +1213,32 @@ active volumes.

#### DeploymentStatus

A PMEM-CSI Deployment's `status` field is a `DeploymentStatus` object, which has
a `phase` field. The phase of a Deployment is high-level summary of where the
Deployment is in it's lifecycle.
A PMEM-CSI Deployment's `status` field is a `DeploymentStatus` object, which
carries the detailed state of the driver deployment. It comprises deployment
conditions, driver component status, and a `phase` field. The phase of a
Deployment is a high-level summary of where the Deployment is in its lifecycle.

The possible `phase` values and their meaning are as below:

| Value | Meaning |
|---|---|
| empty string | A new deployment. |
| Initializing | All the direct sub-resources of the `Deployment` are created, but some indirect ones (like pods controlled by a daemon set) may still be missing. |
| Running | The operator has determined that the driver is usable<sup>1</sup>. |
| Failed | For some reason the state of the `Deployment` failed and cannot be progressed<sup>2</sup>. |
| Failed | For some reason the state of the `Deployment` failed and cannot be progressed. |

<sup>1</sup> This check has not been implemented yet. Instead, the deployment goes straight to `Running` after creating sub-resources.
<sup>2</sup> Failure reason is supposed to be carried by one of additional `DeploymentStatus` field, but not implemented yet.

#### Deployment Conditions

PMEM-CSI `DeploymentStatus` has an array of `conditions` that the
PMEM-CSI Deployment has or has not passed. Below are the possible condition
types and their meanings:

| Condition type | Meaning |
|---|---|
| CertsReady | Driver certificates/secrets are available. |
| CertsVerified | Verified that the provided certificates are valid. |
| DriverDeployed | All the components required for the PMEM-CSI deployment have been deployed. |
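
As a rough illustration of how a client might consume these conditions, the sketch below decodes a status document and checks whether a given condition is `True`. The JSON field names (`type`, `status`, `reason`, `phase`, `conditions`) follow the API types in this commit; the local `condition` and `status` structs and the `conditionIsTrue` helper are simplified, hypothetical stand-ins and not part of the PMEM-CSI API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition mirrors the JSON shape of a deployment condition (simplified).
type condition struct {
	Type   string `json:"type"`
	Status string `json:"status"`
	Reason string `json:"reason,omitempty"`
}

// status mirrors the parts of the deployment status used here (simplified).
type status struct {
	Phase      string      `json:"phase,omitempty"`
	Conditions []condition `json:"conditions,omitempty"`
}

// conditionIsTrue reports whether the named condition exists with status "True".
// A condition that was never recorded counts as not true.
func conditionIsTrue(s status, condType string) bool {
	for _, c := range s.Conditions {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	raw := `{"phase":"Running","conditions":[
		{"type":"CertsReady","status":"True","reason":"Driver certificates are available."},
		{"type":"DriverDeployed","status":"True","reason":"Driver deployed successfully."}]}`

	var s status
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		panic(err)
	}
	fmt.Println(conditionIsTrue(s, "DriverDeployed")) // true
	fmt.Println(conditionIsTrue(s, "CertsVerified"))  // false: not recorded
}
```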

#### Deployment Events

194 changes: 191 additions & 3 deletions pkg/apis/pmemcsi/v1alpha1/deployment_types.go
@@ -55,6 +55,7 @@ const (
// Related issue : https://github.com/kubernetes-sigs/controller-tools/issues/478
// Fails setting min/max for integers: https://github.com/helm/helm/issues/5806

// +k8s:deepcopy-gen=true
// DeploymentSpec defines the desired state of Deployment
type DeploymentSpec struct {
// Important: Run "make operator-generate-k8s" to regenerate code after modifying this file
@@ -109,13 +110,77 @@ type DeploymentSpec struct {
KubeletDir string `json:"kubeletDir,omitempty"`
}

// DeploymentConditionType type for representing a deployment status condition
type DeploymentConditionType string

const (
// CertsVerified means the provided deployment secrets are verified and valid for usage
CertsVerified DeploymentConditionType = "CertsVerified"
// CertsReady means secrets/certificates required for running the PMEM-CSI driver
// are ready and the deployment could progress further
CertsReady DeploymentConditionType = "CertsReady"
// DriverDeployed means that all the sub-resources required for the deployment CR
// got created
DriverDeployed DeploymentConditionType = "DriverDeployed"
)

// +k8s:deepcopy-gen=true
type DeploymentCondition struct {
// Type of condition.
Type DeploymentConditionType `json:"type"`
// Status of the condition, one of True, False, Unknown.
Status corev1.ConditionStatus `json:"status"`
// Reason is human-readable text that explains why this condition is in this state
// +optional
Reason string `json:"reason,omitempty"`
// Last time the condition was probed.
// +optional
LastUpdateTime metav1.Time `json:"lastUpdateTime,omitempty"`
}

type DriverType int

const (
ControllerDriver DriverType = iota
NodeDriver
)

func (t DriverType) String() string {
switch t {
case ControllerDriver:
return "Controller"
case NodeDriver:
return "Node"
}
return ""
}

// +k8s:deepcopy-gen=true
type DriverStatus struct {
// DriverComponent represents the type of the driver: controller or node
DriverComponent string `json:"component"`
// Status represents the driver status : Ready, NotReady
Status string `json:"status"`
// Reason represents the human readable text that explains why the
// driver is in this state.
Reason string `json:"reason"`
// LastUpdated time of the driver status
LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
}

// +k8s:deepcopy-gen=true

// DeploymentStatus defines the observed state of Deployment
type DeploymentStatus struct {
// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make operator-generate-k8s" to regenerate code after modifying this file

// Phase indicates the state of the deployment
Phase DeploymentPhase `json:"phase,omitempty"`
Phase DeploymentPhase `json:"phase,omitempty"`
Reason string `json:"reason,omitempty"`
// Conditions
Conditions []DeploymentCondition `json:"conditions,omitempty"`
Components []DriverStatus `json:"driverComponents,omitempty"`
// LastUpdated time of the deployment status
LastUpdated metav1.Time `json:"lastUpdated,omitempty"`
}
@@ -206,8 +271,6 @@ type DeploymentPhase string
const (
// DeploymentPhaseNew indicates a new deployment
DeploymentPhaseNew DeploymentPhase = ""
// DeploymentPhaseInitializing indicates deployment initialization is in progress
DeploymentPhaseInitializing DeploymentPhase = "Initializing"
// DeploymentPhaseRunning indicates that the deployment was successful
DeploymentPhaseRunning DeploymentPhase = "Running"
// DeploymentPhaseFailed indicates that the deployment failed
@@ -259,6 +322,35 @@ func (c DeploymentChange) String() string {
}[c]
}

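// SetCondition sets the status and reason of the condition of type t,
// appending a new condition if none of that type exists yet.
func (d *Deployment) SetCondition(t DeploymentConditionType, state corev1.ConditionStatus, reason string) {
for i := range d.Status.Conditions {
// Take a pointer into the slice: a range value variable would be a copy.
c := &d.Status.Conditions[i]
if c.Type == t {
c.Status = state
c.Reason = reason
c.LastUpdateTime = metav1.Now()
return
}
}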
d.Status.Conditions = append(d.Status.Conditions, DeploymentCondition{
Type: t,
Status: state,
Reason: reason,
LastUpdateTime: metav1.Now(),
})
}
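
One Go detail matters whenever a method like `SetCondition` updates an existing slice entry: the value variable in `for _, c := range slice` is a copy of the element, so writes to it never reach the slice. The standalone sketch below contrasts the two loop forms. It uses a simplified, hypothetical `condition` type and helper names instead of the real API types.

```go
package main

import "fmt"

// condition is a simplified stand-in for a deployment condition.
type condition struct {
	Type, Status, Reason string
}

// setByValue shows the pitfall: c is a copy, so the slice stays unchanged.
func setByValue(conds []condition, t, status string) {
	for _, c := range conds {
		if c.Type == t {
			c.Status = status // modifies the copy only
		}
	}
}

// setByIndex updates the matching element in place, or appends a new one.
func setByIndex(conds []condition, t, status, reason string) []condition {
	for i := range conds {
		if conds[i].Type == t {
			conds[i].Status = status
			conds[i].Reason = reason
			return conds
		}
	}
	return append(conds, condition{Type: t, Status: status, Reason: reason})
}

func main() {
	conds := []condition{{Type: "CertsReady", Status: "False"}}

	setByValue(conds, "CertsReady", "True")
	fmt.Println(conds[0].Status) // False: the update was lost

	conds = setByIndex(conds, "CertsReady", "True", "certificates available")
	conds = setByIndex(conds, "DriverDeployed", "True", "deployed")
	fmt.Println(conds[0].Status, len(conds)) // True 2
}
```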

func (d *Deployment) SetDriverStatus(t DriverType, status, reason string) {
if d.Status.Components == nil {
d.Status.Components = make([]DriverStatus, 2)
}
d.Status.Components[t] = DriverStatus{
DriverComponent: t.String(),
Status: status,
Reason: reason,
LastUpdated: metav1.Now(),
}
}

// EnsureDefaults make sure that the deployment object has all defaults set properly
func (d *Deployment) EnsureDefaults(operatorImage string) error {
if d.Spec.Image == "" {
@@ -408,6 +500,78 @@ func (d *Deployment) GetHyphenedName() string {
return strings.ReplaceAll(d.GetName(), ".", "-")
}

// RegistrySecretName returns the name of the registry
// Secret object used by the deployment
func (d *Deployment) RegistrySecretName() string {
return d.GetHyphenedName() + "-registry-secrets"
}

// NodeSecretName returns the name of the node-controller
// Secret object used by the deployment
func (d *Deployment) NodeSecretName() string {
return d.GetHyphenedName() + "-node-secrets"
}

// CSIDriverName returns the name of the CSIDriver
// object name for the deployment
func (d *Deployment) CSIDriverName() string {
return d.GetName()
}

// ControllerServiceName returns the name of the controller
// Service object used by the deployment
func (d *Deployment) ControllerServiceName() string {
return d.GetHyphenedName() + "-controller"
}

// MetricsServiceName returns the name of the controller metrics
// Service object used by the deployment
func (d *Deployment) MetricsServiceName() string {
return d.GetHyphenedName() + "-metrics"
}

// ServiceAccountName returns the name of the ServiceAccount
// object used by the deployment
func (d *Deployment) ServiceAccountName() string {
return d.GetHyphenedName() + "-controller"
}

// ProvisionerRoleName returns the name of the provisioner's
// RBAC Role object name used by the deployment
func (d *Deployment) ProvisionerRoleName() string {
return d.GetHyphenedName() + "-external-provisioner-cfg"
}

// ProvisionerRoleBindingName returns the name of the provisioner's
// RoleBinding object name used by the deployment
func (d *Deployment) ProvisionerRoleBindingName() string {
return d.GetHyphenedName() + "-csi-provisioner-role-cfg"
}

// ProvisionerClusterRoleName returns the name of the
// provisioner's ClusterRole object name used by the deployment
func (d *Deployment) ProvisionerClusterRoleName() string {
return d.GetHyphenedName() + "-external-provisioner-runner"
}

// ProvisionerClusterRoleBindingName returns the name of the
// provisioner ClusterRoleBinding object name used by the deployment
func (d *Deployment) ProvisionerClusterRoleBindingName() string {
return d.GetHyphenedName() + "-csi-provisioner-role"
}

// NodeDriverName returns the name of the driver
// DaemonSet object name used by the deployment
func (d *Deployment) NodeDriverName() string {
return d.GetHyphenedName() + "-node"
}

// ControllerDriverName returns the name of the controller
// StatefulSet object name used by the deployment
func (d *Deployment) ControllerDriverName() string {
return d.GetHyphenedName() + "-controller"
}
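
To make the naming scheme concrete, here is a small standalone sketch that applies the same rules to the deployment name `pmem-csi.intel.com` from the install documentation. The free functions `hyphenedName` and `objectName` are hypothetical helpers written for this example; only the dot-to-hyphen replacement and the suffixes come from the methods above.

```go
package main

import (
	"fmt"
	"strings"
)

// hyphenedName mirrors GetHyphenedName: dots in the name become hyphens.
func hyphenedName(name string) string {
	return strings.ReplaceAll(name, ".", "-")
}

// objectName appends a sub-object suffix to the hyphenated deployment name.
func objectName(deployment, suffix string) string {
	return hyphenedName(deployment) + suffix
}

func main() {
	const deployment = "pmem-csi.intel.com"
	suffixes := []string{
		"-registry-secrets", // registry Secret
		"-node-secrets",     // node-controller Secret
		"-controller",       // controller Service, ServiceAccount, StatefulSet
		"-metrics",          // metrics Service
		"-node",             // node DaemonSet
	}
	for _, suffix := range suffixes {
		fmt.Println(objectName(deployment, suffix))
	}
	// The first line printed is pmem-csi-intel-com-registry-secrets.
}
```

Note that `ControllerServiceName`, `ServiceAccountName`, and `ControllerDriverName` all produce the same `-controller` name. Kubernetes object names only need to be unique per kind within a namespace, so the Service, ServiceAccount, and StatefulSet can share it.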

// GetOwnerReference returns an owner reference to this deployment that other
// objects can add to their owner reference lists.
func (d *Deployment) GetOwnerReference() metav1.OwnerReference {
@@ -423,6 +587,30 @@ func (d *Deployment) GetOwnerReference() metav1.OwnerReference {
}
}

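// HaveCertificatesConfigured checks whether the deployment's certificate
// fields are completely configured. It returns true if they are, false
// with a nil error if neither a CA nor any certificate is configured, and
// false with an error if the configuration is incomplete.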
func (d *Deployment) HaveCertificatesConfigured() (bool, error) {
// Encoded private keys and certificates
caCert := d.Spec.CACert
registryPrKey := d.Spec.RegistryPrivateKey
ncPrKey := d.Spec.NodeControllerPrivateKey
registryCert := d.Spec.RegistryCert
ncCert := d.Spec.NodeControllerCert

// sanity check
if caCert == nil {
if registryCert != nil || ncCert != nil {
return false, fmt.Errorf("incomplete deployment configuration: missing root CA certificate by which the provided certificates are signed")
}
return false, nil
} else if registryCert == nil || registryPrKey == nil || ncCert == nil || ncPrKey == nil {
return false, fmt.Errorf("incomplete deployment configuration: certificates and corresponding private keys must be provided")
}

return true, nil
}
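
The outcomes of this check can be summarized with a standalone sketch. It reproduces the same branching on a simplified, hypothetical `certSpec` struct; the assumption that the fields are byte slices, and the shortened error messages, are for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// certSpec is a simplified stand-in for the certificate fields of the spec.
type certSpec struct {
	CACert, RegistryCert, RegistryKey, NodeControllerCert, NodeControllerKey []byte
}

// haveCertificates follows the same branching as HaveCertificatesConfigured.
func haveCertificates(s certSpec) (bool, error) {
	if s.CACert == nil {
		if s.RegistryCert != nil || s.NodeControllerCert != nil {
			return false, errors.New("missing root CA certificate")
		}
		// Nothing configured: not an error, the caller decides what to do.
		return false, nil
	}
	if s.RegistryCert == nil || s.RegistryKey == nil ||
		s.NodeControllerCert == nil || s.NodeControllerKey == nil {
		return false, errors.New("certificates and private keys must be provided")
	}
	return true, nil
}

func main() {
	pem := []byte("placeholder")
	cases := []struct {
		name string
		spec certSpec
	}{
		{"nothing configured", certSpec{}},
		{"cert without CA", certSpec{RegistryCert: pem}},
		{"CA only", certSpec{CACert: pem}},
		{"everything set", certSpec{pem, pem, pem, pem, pem}},
	}
	for _, c := range cases {
		ok, err := haveCertificates(c.spec)
		fmt.Printf("%-20s ok=%v err=%v\n", c.name, ok, err)
	}
}
```

One consequence of the branching, as written, is that private keys supplied without a CA and without any certificate are not reported as an error; the function returns false with a nil error in that case.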

func GetDeploymentCRDSchema() *apiextensions.JSONSchemaProps {
One := float64(1)
Hundred := float64(100)