
CORTX-30684: change retain policy to Delete for local-path storage #260

Merged
keithpine merged 2 commits into Seagate:integration from rancher-reclaim-delete on May 24, 2022

Conversation

@keithpine (Contributor) commented May 23, 2022

Description

Change the reclaimPolicy of the local-path StorageClass from Retain to Delete. Note that Delete is the default policy for a dynamically provisioned StorageClass that doesn't specify one, and it is also the default for the Local Path Provisioner.
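For illustration, here is a minimal sketch of a local-path StorageClass with the new policy; the metadata name and binding mode are assumptions, and the actual manifest rendered by the deployment scripts may differ:

    # Sketch only: the name and volumeBindingMode are assumptions, not the chart's actual values.
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: local-path
    provisioner: rancher.io/local-path
    reclaimPolicy: Delete            # changed from Retain
    volumeBindingMode: WaitForFirstConsumer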

Using the Retain policy means the cluster administrator must manually clean up the backing data for Kubernetes volumes after a user deletes their PVCs and PVs. In development use, the local node partitions can easily run out of space without active management, causing deployments to fail. With the Delete policy, by contrast, the provisioner automatically removes the backing data on the nodes when the PVCs are deleted (and the PVs become unbound). This is the behavior that makes sense as a default for users of the deployment scripts.

It is no longer necessary to manually remove PVs, since the provisioner does this itself; in fact, removing them manually at the same time prevents the provisioner from removing the node data.

Breaking change

It's useful for users to be aware of this change to the default behavior. Running the destroy script will now remove all data from the cluster nodes (although that is really the intent).

The destroy-cortx-cloud.sh script no longer manually removes third-party PVs. This means that if the script is run against a cluster where the PV reclaim policy is Retain, those PVs will not be removed; users should manually delete the PVs and the data on the cluster nodes, as in the example below.
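For example, a manual cleanup on a Retain-policy cluster might look like the following sketch; the PV name is hypothetical, and the data path assumes the Local Path Provisioner's default directory (the actual path is set in the deployment configuration):

    # Delete the released PV object (PV name is hypothetical)
    kubectl delete pv pvc-00000000-0000-0000-0000-000000000000

    # On the node that hosted the volume, remove the backing data directory
    # (assumes the provisioner's default /opt/local-path-provisioner path)
    sudo rm -rf /opt/local-path-provisioner/pvc-00000000-0000-0000-0000-000000000000_*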

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds new functionality)
  • Breaking change (bug fix or new feature that breaks existing functionality)
  • Third-party dependency update
  • Documentation additions or improvements
  • Code quality improvements to existing code or test additions/updates

Applicable issues

  • This change addresses an issue: CORTX-30684
  • This change is related to an issue: CORTX-31536

While debugging CORTX-31536, I wanted to ensure that no leftover Pod data was influencing repeat runs. Switching to the Delete reclaim policy ensures the leftover data is cleaned up automatically.

How was this tested?

Deploy and destroy. I see all node data removed after the destroy.

Saved files into an S3 bucket and confirmed that the data is still accessible after stopping and starting the cluster (using the stop/start scripts).

Additional information

If users prefer to keep the Retain policy, PVs can be modified after they are created. https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/
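For instance, per the linked documentation, an individual PV can be patched after it is created (replace <pv-name> with the actual PersistentVolume name):

    kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'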

Note that future plans include allowing the use of any StorageClass, not just our specific local-path provisioner. In that case, users are free to define the StorageClass however they prefer and to clean up PVs and data as required.

Checklist

  • The change is tested and works locally.
  • New or changed settings in the solution YAML are documented clearly in the README.md file.
  • All commits are signed off and are in agreement with the CORTX Community DCO and CLA policy.

If this change addresses a CORTX Jira issue:

  • The title of the PR starts with the issue ID (e.g. CORTX-XXXXX:)

@cla-bot cla-bot bot added the cla-signed label May 23, 2022
@keithpine keithpine marked this pull request as ready for review May 23, 2022 21:22
@keithpine keithpine requested a review from a team as a code owner May 23, 2022 21:22
@keithpine (Contributor, Author) commented:

I did a loop of deploy/destroy. I've noticed that some of the data directories are still left over. In the local-path-provisioner log, I see some messages like:

E0523 20:42:58.647690       1 controller.go:991] volume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" in work queue no longer exists

pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a was the PV which corresponds to a left-over data directory.

In the destroy script, we manually delete the PVs:

Removing pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a
persistentvolume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" deleted

My guess is that deleting the volume manually is confusing the provisioner, and it doesn't handle a volume being removed while it's in its work queue.

osowski previously approved these changes May 23, 2022

@osowski (Contributor) left a comment:

based on everything else we've seen and have in plan, this lgtm.

walterlopatka
walterlopatka previously approved these changes May 23, 2022
@keithpine keithpine marked this pull request as draft May 23, 2022 22:45
@keithpine keithpine marked this pull request as ready for review May 23, 2022 23:54
@keithpine keithpine dismissed stale reviews from walterlopatka and osowski May 23, 2022 23:55 with the reason "Needs re-review"

@osowski (Contributor) commented May 24, 2022

> I did a loop of deploy/destroy. I've noticed that some of the data directories are still left over. In the local-path-provisioner log, I see some messages like:
>
> E0523 20:42:58.647690       1 controller.go:991] volume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" in work queue no longer exists
>
> pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a was the PV which corresponds to a left-over data directory.
>
> In the destroy script, we manually delete the PVs:
>
> Removing pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a
> persistentvolume "pvc-94f74a9f-f308-473a-bd88-705ca7f64e5a" deleted
>
> My guess is that deleting the volume manually is confusing the provisioner, and it doesn't handle a volume being removed while it's in its work queue.

As we're moving towards handling as much as possible through native Kubernetes functionality, it makes sense to me to remove the manual deletion of PVs now that the underlying StorageClass and dynamic provisioner take care of that. It will also help with separation of roles, as long as we're still able to run atomic deploy/destroy loops.

@keithpine keithpine changed the title Change retain policy to Delete for local-path storage CORTX-30684: change retain policy to Delete for local-path storage May 24, 2022
@keithpine keithpine merged commit f649707 into Seagate:integration May 24, 2022
@keithpine keithpine deleted the rancher-reclaim-delete branch May 24, 2022 17:18
rohit-k-dwivedi pushed a commit to rohit-k-dwivedi/cortx-k8s that referenced this pull request May 24, 2022
walterlopatka pushed a commit that referenced this pull request Jun 3, 2022
@keithpine keithpine added this to the v0.7.0 milestone Jun 4, 2022