
CORTX-30684: change retain policy to Delete for local-path storage #260

Merged
keithpine merged 2 commits into Seagate:integration from rancher-reclaim-delete on May 24, 2022

Conversation

@keithpine (Contributor) commented May 23, 2022

Description

Change the reclaimPolicy of the local-path StorageClass from Retain to Delete. Note that Delete is the default policy for a dynamically provisioned StorageClass that doesn't specify one, and it is also the default for the Local Path Provisioner.
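For illustration, here is a minimal sketch of a local-path StorageClass with the new policy; the metadata name and binding mode are assumptions, and the actual manifest rendered by the deployment scripts may differ:

    # Sketch only: the name and volumeBindingMode are assumptions, not the chart's actual values.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-path
    provisioner: rancher.io/local-path
    reclaimPolicy: Delete            # changed from Retain
    volumeBindingMode: WaitForFirstConsumer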

Using the Retain policy means the cluster administrator must manually clean up the backing data for Kubernetes volumes after a user deletes their PVCs and PVs. In development use, the local node partitions can easily run out of space without active management, causing deployments to fail. With the Delete policy, by contrast, the provisioner automatically removes the backing data on the nodes when the PVCs are deleted (and the PVs become unbound). This is the behavior that makes sense as a default for users of the deployment scripts.

It is no longer necessary to manually remove PVs, since the provisioner does this itself; in fact, removing them manually at the same time prevents the provisioner from removing the node data.

Breaking change

It's useful for users to be aware of this change to the default behavior. Running the destroy script will now remove all data from the cluster nodes (although that is really the intent).

The destroy-cortx-cloud.sh script no longer manually removes third-party PVs. This means that if the script is run against a cluster where the PV reclaim policy is Retain, those PVs will not be removed; users should manually delete the PVs and the data on the cluster nodes, as in the example below.
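For example, a manual cleanup on a Retain-policy cluster might look like the following sketch; the PV name is hypothetical, and the data path assumes the Local Path Provisioner's default directory (the actual path is set in the deployment configuration):

    # Delete the released PV object (PV name is hypothetical)
    kubectl delete pv pvc-00000000-0000-0000-0000-000000000000

    # On the node that hosted the volume, remove the backing data directory
    # (assumes the provisioner's default /opt/local-path-provisioner path)
    sudo rm -rf /opt/local-path-provisioner/pvc-00000000-0000-0000-0000-000000000000_*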

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds new functionality)
  • Breaking change (bug fix or new feature that breaks existing functionality)
  • Third-party dependency update
  • Documentation additions or improvements
  • Code quality improvements to existing code or test additions/updates

Applicable issues

  • This change addresses an issue: CORTX-30684
  • This change is related to an issue: CORTX-31536

While debugging CORTX-31536, I wanted to ensure that no leftover Pod data was influencing repeat runs. Switching to the Delete reclaim policy ensures the leftover data is cleaned up automatically.

How was this tested?

Deploy and destroy. I see all node data removed after the destroy.

Saved files into an S3 bucket and confirmed that the data is still accessible after stopping and starting the cluster (using the stop/start scripts).

Additional information

If users prefer to keep the Retain policy, PVs can be modified after they are created. https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/
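For instance, per the linked documentation, an individual PV can be patched after it is created (replace <pv-name> with the actual PersistentVolume name):

    kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'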

Note that future plans include allowing the use of any StorageClass, not just our specific local-path provisioner. In that case, users are free to define the StorageClass however they prefer and to clean up PVs and data as required.

Checklist

  • The change is tested and works locally.
  • New or changed settings in the solution YAML are documented clearly in the README.md file.
  • All commits are signed off and are in agreement with the CORTX Community DCO and CLA policy.

If this change addresses a CORTX Jira issue:

  • The title of the PR starts with the issue ID (e.g. CORTX-XXXXX:)

@cla-bot cla-bot bot added the cla-signed label May 23, 2022
@keithpine keithpine marked this pull request as ready for review May 23, 2022 21:22
@keithpine keithpine requested a review from a team as a code owner May 23, 2022 21:22
@keithpine (Contributor, Author) commented:

I did a loop of deploy/destroy. I've noticed that some of the data directories are still left over. In the local-path-provisioner log, I see some messages like:

E0523 20:42:58.647690       1 controller.go:991] volume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" in work queue no longer exists

pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a was the PV which corresponds to a left-over data directory.

In the destroy script, we manually delete the PVs:

Removing pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a
persistentvolume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" deleted

My guess is that deleting the volume manually is confusing the provisioner, and it doesn't handle a volume being removed while it's in its work queue.

osowski previously approved these changes May 23, 2022

@osowski (Contributor) left a comment:

based on everything else we've seen and have in plan, this lgtm.

walterlopatka
walterlopatka previously approved these changes May 23, 2022
@keithpine keithpine marked this pull request as draft May 23, 2022 22:45
@keithpine keithpine marked this pull request as ready for review May 23, 2022 23:54
@keithpine keithpine dismissed stale reviews from walterlopatka and osowski May 23, 2022 23:55 with the reason "Needs re-review"

@osowski (Contributor) commented May 24, 2022

> I did a loop of deploy/destroy. I've noticed that some of the data directories are still left over. In the local-path-provisioner log, I see some messages like:
>
> E0523 20:42:58.647690       1 controller.go:991] volume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" in work queue no longer exists
>
> pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a was the PV which corresponds to a left-over data directory.
>
> In the destroy script, we manually delete the PVs:
>
> Removing pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a
> persistentvolume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" deleted
>
> My guess is that deleting the volume manually is confusing the provisioner, and it doesn't handle a volume being removed while it's in its work queue.

As we're moving towards handling as much as possible through native Kubernetes functionality, it makes sense to me to remove the manual deletion of PVs now that the underlying StorageClass and dynamic provisioner take care of that. It will also help with separation of roles, as long as we're still able to run atomic deploy/destroy loops.

@keithpine keithpine changed the title Change retain policy to Delete for local-path storage CORTX-30684: change retain policy to Delete for local-path storage May 24, 2022
@keithpine keithpine merged commit f649707 into Seagate:integration May 24, 2022
@keithpine keithpine deleted the rancher-reclaim-delete branch May 24, 2022 17:18
rohit-k-dwivedi pushed a commit to rohit-k-dwivedi/cortx-k8s that referenced this pull request May 24, 2022
walterlopatka pushed a commit that referenced this pull request Jun 3, 2022
@keithpine keithpine added this to the v0.7.0 milestone Jun 4, 2022