CSI: cephfs and rbd daemonset upgrade strategy
When CSI nodeplugins are upgraded, existing mounts become
stale or unreachable (usually with connection timeout errors),
specifically when cephfs FUSE or rbd-nbd is used as the
mounter.

This is due to losing the mount processes running within the
CSI nodeplugin pods.

This PR sets the DaemonSet update strategy
based on an ENV variable, which takes care of the above issue
with some manual steps.

More info: ceph/ceph-csi#703

Resolves: rook#4248

Signed-off-by: Madhu Rajanna <[email protected]>
Madhu-1 authored and binoue committed Apr 10, 2020
1 parent 90b8801 commit f546994
Showing 11 changed files with 142 additions and 63 deletions.
5 changes: 5 additions & 0 deletions Documentation/ceph-block.md
Expand Up @@ -27,6 +27,11 @@ Before Rook can provision storage, a [`StorageClass`](https://kubernetes.io/docs
Each OSD must be located on a different node, because the [`failureDomain`](ceph-pool-crd.md#spec) is set to `host` and the `replicated.size` is set to `3`.

> **IMPORTANT**: If you are using rbd-nbd as the mounter in the StorageClass, you will hit a ceph-csi
> [bug](https://github.com/ceph/ceph-csi/issues/703) during upgrade. You need to follow the
> [upgrade steps](ceph-upgrade.md#1.-Update-the-Rook-Operator), which require
> node draining.
> **NOTE**: This example uses the CSI driver, which is the preferred driver going forward for K8s 1.13 and newer. Examples are found in the [CSI RBD](https://github.com/rook/rook/tree/{{ branchName }}/cluster/examples/kubernetes/ceph/csi/rbd) directory. For an example of a storage class using the flex driver (required for K8s 1.12 or earlier), see the [Flex Driver](#flex-driver) section below, which has examples in the [flex](https://github.com/rook/rook/tree/{{ branchName }}/cluster/examples/kubernetes/ceph/flex) directory.
Save this `StorageClass` definition as `storageclass.yaml`:
6 changes: 6 additions & 0 deletions Documentation/ceph-filesystem.md
Expand Up @@ -79,6 +79,12 @@ $ ceph status
Before Rook can start provisioning storage, a StorageClass needs to be created based on the filesystem. This is needed for Kubernetes to interoperate
with the CSI driver to create persistent volumes.

> **IMPORTANT**: Do not use the CephFS CSI driver if the kernel does not
> support Ceph quotas (kernel version < 4.17). In that case the
> ceph-fuse client is used as the default mounter, and during upgrade you will hit a ceph-csi
> [bug](https://github.com/ceph/ceph-csi/issues/703). You need to follow the
> [upgrade steps](ceph-upgrade.md#1.-Update-the-Rook-Operator), which require node draining.

> **NOTE**: This example uses the CSI driver, which is the preferred driver going forward for K8s 1.13 and newer. Examples are found in the [CSI CephFS](https://github.com/rook/rook/tree/{{ branchName }}/cluster/examples/kubernetes/ceph/csi/cephfs) directory. For an example of a volume using the flex driver (required for K8s 1.12 and earlier), see the [Flex Driver](#flex-driver) section below.
Save this storage class definition as `storageclass.yaml`:
14 changes: 14 additions & 0 deletions Documentation/ceph-upgrade.md
Expand Up @@ -239,6 +239,20 @@ kubectl apply -f upgrade-from-v1.1-apply.yaml
The largest portion of the upgrade is triggered when the operator's image is updated to `v1.2.x`.
When the operator is updated, it will proceed to update all of the Ceph daemons.

If you are using the ceph-fuse or rbd-nbd mounter, upgrading will cause
existing mounts to become stale or unreachable. Please add the following to the operator
`env` variables by editing the operator deployment.

```yaml
env:
# Add CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY if you are using the ceph-fuse mounter in the StorageClass or the kernel does not support Ceph quotas (< 4.17)
- name: CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY
  value: "OnDelete"
# Add CSI_RBD_PLUGIN_UPDATE_STRATEGY if you are using the rbd-nbd mounter
- name: CSI_RBD_PLUGIN_UPDATE_STRATEGY
  value: "OnDelete"
```
```sh
kubectl -n $ROOK_SYSTEM_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.2.0
```
100 changes: 51 additions & 49 deletions Documentation/helm-operator.md

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions cluster/charts/rook-ceph/templates/deployment.yaml
Expand Up @@ -108,6 +108,14 @@ spec:
value: {{ .Values.csi.enableCephfsDriver | quote }}
- name: CSI_ENABLE_SNAPSHOTTER
value: {{ .Values.csi.enableSnapshotter | quote }}
{{- if .Values.csi.cephFSPluginUpdateStrategy }}
- name: CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY
value: {{ .Values.csi.cephFSPluginUpdateStrategy | quote }}
{{- end }}
{{- if .Values.csi.rbdPluginUpdateStrategy }}
- name: CSI_RBD_PLUGIN_UPDATE_STRATEGY
value: {{ .Values.csi.rbdPluginUpdateStrategy | quote }}
{{- end }}
{{- if .Values.csi.kubeletDirPath }}
- name: ROOK_CSI_KUBELET_DIR_PATH
value: {{ .Values.csi.kubeletDirPath | quote }}
7 changes: 7 additions & 0 deletions cluster/charts/rook-ceph/values.yaml
Expand Up @@ -58,6 +58,13 @@ csi:
enableCephfsDriver: true
enableGrpcMetrics: true
enableSnapshotter: true
# CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
#cephFSPluginUpdateStrategy: OnDelete
# CSI RBD plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
#rbdPluginUpdateStrategy: OnDelete

# Set provisionerTolerations and provisionerNodeAffinity for provisioner pod.
# The CSI provisioner would be best to start on the same nodes as other ceph daemons.
# provisionerTolerations:
Expand Up @@ -7,6 +7,8 @@ spec:
selector:
matchLabels:
app: csi-cephfsplugin
updateStrategy:
type: {{ .CephFSPluginUpdateStrategy }}
template:
metadata:
labels:
Expand Up @@ -7,6 +7,8 @@ spec:
selector:
matchLabels:
app: csi-rbdplugin
updateStrategy:
type: {{ .RBDPluginUpdateStrategy }}
template:
metadata:
labels:
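
The `updateStrategy` placeholder added to both DaemonSet templates can be illustrated with a minimal sketch. It assumes the templates are rendered with Go's `text/template`; the trimmed-down `templateParam` struct, the `daemonSetFragment` constant, and the `renderFragment` helper are hypothetical names used only for this example and are not part of the commit.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// templateParam is a hypothetical, trimmed-down stand-in for the operator's
// template parameters; only the field used by the fragment is included.
type templateParam struct {
	RBDPluginUpdateStrategy string
}

// daemonSetFragment mirrors the two lines added to the rbd plugin template.
const daemonSetFragment = `updateStrategy:
  type: {{ .RBDPluginUpdateStrategy }}
`

// renderFragment substitutes the strategy into the fragment and returns
// the resulting YAML text.
func renderFragment(tp templateParam) (string, error) {
	tmpl, err := template.New("fragment").Parse(daemonSetFragment)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, tp); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderFragment(templateParam{RBDPluginUpdateStrategy: "OnDelete"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

With `OnDelete` rendered into the spec, the DaemonSet controller replaces a plugin pod only after that pod has been deleted manually, which is why the upgrade needs manual steps.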
8 changes: 8 additions & 0 deletions cluster/examples/kubernetes/ceph/operator-openshift.yaml
Expand Up @@ -240,6 +240,14 @@ spec:
# See the upgrade guide: https://rook.io/docs/rook/v1.2/ceph-upgrade.html
- name: CSI_FORCE_CEPHFS_KERNEL_CLIENT
value: "true"
# CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
#- name: CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY
# value: "OnDelete"
# CSI Rbd plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
#- name: CSI_RBD_PLUGIN_UPDATE_STRATEGY
# value: "OnDelete"
# kubelet directory path, if kubelet configured to use other than /var/lib/kubelet path.
#- name: ROOK_CSI_KUBELET_DIR_PATH
# value: "/var/lib/kubelet"
8 changes: 8 additions & 0 deletions cluster/examples/kubernetes/ceph/operator.yaml
Expand Up @@ -190,6 +190,14 @@ spec:
# See the upgrade guide: https://rook.io/docs/rook/v1.2/ceph-upgrade.html
- name: CSI_FORCE_CEPHFS_KERNEL_CLIENT
value: "true"
# CSI CephFS plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
#- name: CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY
# value: "OnDelete"
# CSI Rbd plugin daemonset update strategy, supported values are OnDelete and RollingUpdate.
# Default value is RollingUpdate.
#- name: CSI_RBD_PLUGIN_UPDATE_STRATEGY
# value: "OnDelete"
# The default version of CSI supported by Rook will be started. To change the version
# of the CSI driver to something other than what is officially supported, change
# these images to the desired release of the CSI driver.
45 changes: 31 additions & 14 deletions pkg/operator/ceph/csi/spec.go
Expand Up @@ -35,20 +35,22 @@ import (
)

type Param struct {
-	CSIPluginImage            string
-	RegistrarImage            string
-	ProvisionerImage          string
-	AttacherImage             string
-	SnapshotterImage          string
-	DriverNamePrefix          string
-	EnableSnapshotter         string
-	EnableCSIGRPCMetrics      string
-	KubeletDirPath            string
-	ForceCephFSKernelClient   string
-	CephFSGRPCMetricsPort     uint16
-	CephFSLivenessMetricsPort uint16
-	RBDGRPCMetricsPort        uint16
-	RBDLivenessMetricsPort    uint16
+	CSIPluginImage             string
+	RegistrarImage             string
+	ProvisionerImage           string
+	AttacherImage              string
+	SnapshotterImage           string
+	DriverNamePrefix           string
+	EnableSnapshotter          string
+	EnableCSIGRPCMetrics       string
+	KubeletDirPath             string
+	ForceCephFSKernelClient    string
+	CephFSPluginUpdateStrategy string
+	RBDPluginUpdateStrategy    string
+	CephFSGRPCMetricsPort      uint16
+	CephFSLivenessMetricsPort  uint16
+	RBDGRPCMetricsPort         uint16
+	RBDLivenessMetricsPort     uint16
}

type templateParam struct {
Expand Up @@ -222,6 +224,21 @@ func StartCSIDrivers(namespace string, clientset kubernetes.Interface, ver *vers
	if !strings.EqualFold(enableSnap, "false") {
		tp.EnableSnapshotter = "true"
	}

	updateStrategy := os.Getenv("CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY")
	if strings.EqualFold(updateStrategy, "ondelete") {
		tp.CephFSPluginUpdateStrategy = "OnDelete"
	} else {
		tp.CephFSPluginUpdateStrategy = "RollingUpdate"
	}

	updateStrategy = os.Getenv("CSI_RBD_PLUGIN_UPDATE_STRATEGY")
	if strings.EqualFold(updateStrategy, "ondelete") {
		tp.RBDPluginUpdateStrategy = "OnDelete"
	} else {
		tp.RBDPluginUpdateStrategy = "RollingUpdate"
	}

	if ver.Major > KubeMinMajor || (ver.Major == KubeMinMajor && ver.Minor < provDeploymentSuppVersion) {
		deployProvSTS = true
	}
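
The two environment lookups in this hunk apply the same rule: a case-insensitive match on `ondelete` selects `OnDelete`, and anything else (including an unset variable) falls back to `RollingUpdate`. A minimal standalone sketch of that rule follows; `normalizeUpdateStrategy` is a hypothetical helper name chosen for illustration and does not exist in the commit.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// normalizeUpdateStrategy is a hypothetical helper (not part of the commit)
// that maps a raw environment value to a DaemonSet update strategy.
// Only a case-insensitive "ondelete" yields "OnDelete"; every other value,
// including the empty string, yields the default "RollingUpdate".
func normalizeUpdateStrategy(value string) string {
	if strings.EqualFold(value, "ondelete") {
		return "OnDelete"
	}
	return "RollingUpdate"
}

func main() {
	// The variable names are the ones introduced by the commit.
	names := []string{
		"CSI_CEPHFS_PLUGIN_UPDATE_STRATEGY",
		"CSI_RBD_PLUGIN_UPDATE_STRATEGY",
	}
	for _, name := range names {
		fmt.Printf("%s -> %s\n", name, normalizeUpdateStrategy(os.Getenv(name)))
	}
}
```

One consequence of this rule is that a misspelled value is not rejected; it silently results in `RollingUpdate`.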
