
Enable Simulation of automatically provisioned ReadWriteMany PVs #1487

Open

joshatcaper opened this issue Apr 17, 2020 · 33 comments
Labels
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/documentation: Categorizes issue or PR as related to documentation.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.

Comments

@joshatcaper

What would you like to be added: A method to provide automatically provisioned ReadWriteMany PVs that are available on all workers.

Why is this needed: The current volume provisioner only supports creating ReadWriteOnce volumes. This is because kind uses the rancher local-path-provisioner, which hard-codes its provisioner to disallow any PVCs with an access mode other than ReadWriteOnce. Many managed Kubernetes providers supply some type of distributed file system. I'm currently using Azure Storage File (which is SMB/CIFS under the hood) for this use case in production. Google's Kubernetes Engine offers ReadOnlyMany out of the box.

Possible solutions: Could we have the control plane node start up an NFS container backed by a ReadWriteOnce?

Thanks for your time!

@joshatcaper joshatcaper added the kind/feature label Apr 17, 2020
@BenTheElder
Member

NFS from an overlayfs requires a 4.15+ kernel IIRC.
Currently kind imposes no additional requirements on kernel version beyond what kubernetes does upstream.

I don't think we want to start imposing any kernel requirement yet, or the overhead of running & managing NFS by default.

kind of course supports installing additional drivers, preferably with CSI.

IMHO it makes more sense to run this as an addon. cc @msau42 @pohly.

We can discuss other Read* modes upstream in the rancher project.

@BenTheElder
Member

BenTheElder commented Apr 17, 2020

Ah, I hadn't had a need for RWM. https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

even ReadOnlyMany is going to require some kind of network attached storage or something, since the "many" is nodes not pods (my mistake)

I don't think rancher / local storage is going to do read across nodes 😅

probably the best solution here is to document some yaml to apply for getting an NFS provisioner installed on top of a standard kind cluster.
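
Roughly, the documented yaml would boil down to a StorageClass for whatever NFS provisioner gets installed plus RWX claims against it. A minimal sketch (the provisioner string is a placeholder; the real value depends on which NFS provisioner you deploy):

# Hypothetical StorageClass for an NFS provisioner installed on top of kind.
# "example.com/nfs" is a placeholder; use the name your provisioner registers.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
provisioner: example.com/nfs
---
# A ReadWriteMany claim against that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi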

@joshatcaper
Author

@BenTheElder ah, didn't know NFS required a newer kernel in this instance. Would it be possible to do something similar with docker volumes instead? The following docker-compose example should back the containers with a shared volume that is consistent-ish:

version: "2.3"
services:
  control-plane0:
    image: k8s.gcr.io/pause
    volumes:
      - rwmpvc:/rwmpvc
  worker0:
    image: k8s.gcr.io/pause
    volumes:
      - rwmpvc:/rwmpvc
  worker1:
    image: k8s.gcr.io/pause
    volumes:
      - rwmpvc:/rwmpvc

volumes:
  rwmpvc:

The output of docker-compose up && docker container inspect <container> will show:

        ...
        "Mounts": [
            {
                "Type": "volume",
                "Name": "test_rwmpvc",
                "Source": "/var/lib/docker/volumes/test_rwmpvc/_data",
                "Destination": "/rwmpvc",
                "Driver": "local",
                "Mode": "rw",
                "RW": true,
                "Propagation": ""
            }
        ],
        ...

Using an approach like this would not require any NFS server to be run internally in the containers. The PV provisioner just needs to consistently derive the host path in the same way, e.g. /rwmpvc/<uuid>, on each host.
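
For illustration, the "same directory on every node" part can already be expressed with kind's extraMounts by pointing each node at one host directory. This is only a sketch of the mount wiring (the host path is arbitrary, and a provisioner that hands out subdirectories of /rwmpvc would still be needed):

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /tmp/kind-rwm    # assumption: any shared directory on the host
    containerPath: /rwmpvc
- role: worker
  extraMounts:
  - hostPath: /tmp/kind-rwm
    containerPath: /rwmpvc
- role: worker
  extraMounts:
  - hostPath: /tmp/kind-rwm
    containerPath: /rwmpvc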

@BenTheElder
Member

This will work in backends where the nodes are all on a single machine (which we may not guarantee in the future) IF we write a custom provisioner.

IMHO it's better to just provide an opt-in NFS solution you can deploy and document it.

It should just be a kubectl apply away from installing an NFS provisioner as long as you have an updated kernel.

@msau42

msau42 commented Apr 21, 2020

Agree, I think an opt-in NFS tutorial would be the best option here for users that need it.

We don't have any great options from the sig-storage perspective; most solutions already assume you have an NFS server set up somewhere.

  • nfs external provisioner: this repo is deprecated and in the process of being migrated to its own repo. This uses ganesha to provision nfs servers, but still requires some sort of stable disk to back it.
  • nfs-client external provisioner: this repo is deprecated and in the process of being migrated to its own repo. This takes an existing nfs share and carves out subdirectories from it as PVs.
  • nfs csi driver: currently does not support dynamic provisioning, but there are plans to add in an nfs-client-like provisioner in the near future. can potentially add snapshots support in the future too.
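
For reference, once an NFS server exists somewhere, consuming an export needs nothing kind-specific. A minimal sketch using the in-tree nfs volume type, assuming the nodes can mount NFS; the server address and path below are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: nfs-example
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: shared
      mountPath: /data
  volumes:
  - name: shared
    nfs:
      server: 10.0.0.10   # placeholder: address of your NFS server
      path: /exports      # placeholder: exported path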

@joshatcaper
Author

I don't know if this is possible but would there be some way to abstract this from the end user using some method of packaging and enabling "addons" similar to minikube? I don't know about the long term goals of kind but from an outsider's perspective it seems like a wonderful way to deploy an ephemeral copy of software in a CI stage. I was investigating it as a method to run some end-to-end integration testing on my company's software. I'd really like it if the configurations I end up applying to the created cluster very closely match what I'd push to a real cluster; otherwise I'd be worried about running into the same issues you hit when you build a "dev" and "production" version of a binary and only test against your "dev" builds, never your production build.

I don't know if addons are a clean way of accomplishing this goal but I think the utility of kind for the in-CI-deployment workflow would greatly be helped by something that completely hides that this isn't a real managed kube cluster from the end user. Obviously, though, having some way to do this is better than having no way of doing this.

Interested in your thoughts.

@BenTheElder
Member

I don't know if this is possible but would there be some way to abstract this from the end user using some method of packaging and enabling "addons" similar to minikube?

Hi, regarding addons: we're not bundling addons at this time.

That approach tends to be problematic for users as it couples the lifecycle of the addons to the version of the cluster tool.

SIG Cluster Lifecycle seems to agree, and the future of addon work there seems to be the cluster addons project, which involves a generic system on top of any cluster. We're tracking that work and are happy to integrate when it's ready (#253).

In the meantime addons tend to not be any different from any other cluster workload, they can be managed with kubectl, helm, kustomize, kpt, etc.

For an example of a more involved "addon" that isn't actually bundled with kind config dependencies see https://kind.sigs.k8s.io/docs/user/ingress/

I don't know about the long term goals of kind but from an outsider's perspective it seems like a wonderful way to deploy an ephemeral copy of software in a CI stage.

This gives a rough idea where our priorities are at, which do include supporting this more or less
https://kind.sigs.k8s.io/docs/contributing/project-scope/

I was investigating it as a method to run some end-to-end integration testing on my company's software. I'd really like it if the configurations I end up applying to the created cluster very closely match what I'd push to a real cluster; otherwise I'd be worried about running into the same issues you hit when you build a "dev" and "production" version of a binary and only test against your "dev" builds, never your production build.

We have a KubeCon talk about this: https://kind.sigs.k8s.io/docs/user/resources/#testing-your-k8s-apps-with-kind--benjamin-elder--james-munnelly

I don't know if addons are a clean way of accomplishing this goal but I think the utility of kind for the in-CI-deployment workflow would greatly be helped by something that completely hides that this isn't a real managed kube cluster from the end user. Obviously, though, having some way to do this is better than having no way of doing this.

Clusters have a standard API in KUBECONFIG and the API endpoint.
Unfortunately, for portability reasons we can't quite hide that this isn't the same as your real cluster; a lot of extension points break down here, including but not limited to:

  • ingress
  • loadbalancer
  • storage classes (nonstandard ones in your prod environment, k8s really only has default as something of a standard)

For these you'll want to provide your own wrapper of some sort to ensure that the kind cluster matches your prod more closely (e.g. mimicking the custom storage classes from your prod cluster, trying to run a similar or the same ingress..)
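
For example, one low-effort way to mimic a prod storage class in kind is to create a class with your prod class name backed by kind's built-in local-path provisioner, so the same PVC manifests apply unchanged. This is a sketch (it still only gives ReadWriteOnce semantics, and "managed-premium" is just an assumed example name):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium   # assumption: whatever class name your prod cluster uses
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer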

@BenTheElder BenTheElder added the priority/backlog label Apr 25, 2020
@BenTheElder
Member

nfs-common will be installed on the nodes going forward, which should enable NFS volumes. You still need to run an NFS server somehow.

@BenTheElder
Member

(also confirmed that it works, the kubernetes NFS e2e tests pass)


@danquah

danquah commented Jun 16, 2020

Just did a verification of this feature.

I first made sure kubernetes was cloned to ${GOPATH}/src/k8s.io/kubernetes as described in https://kind.sigs.k8s.io/docs/user/working-offline/#prepare-kubernetes-source-code

I then built my own node-image using the latest base-image with nfs-common via the following (takes a while!)

kind build node-image --image kindest/node:master --base-image kindest/base:v20200610-99eb0617 --kube-root "${GOPATH}/src/k8s.io/kubernetes"

Next I created a cluster using the new node image via

kind create cluster --config kind-config.yaml

Using the following kind-config.yaml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:master

I then pulled and loaded the nfs-provisioner image to prepare for installation

docker pull quay.io/kubernetes_incubator/nfs-provisioner
kind load docker-image quay.io/kubernetes_incubator/nfs-provisioner

The provisioner could then be installed via Helm (Helm was installed separately).

helm repo add stable https://kubernetes-charts.storage.googleapis.com/
helm install nfs-provisioner stable/nfs-server-provisioner 

And I was then finally able to provision an NFS volume via the following PVC

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-dynamic-volume-claim
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi

Everything worked like a charm - looking forward to the next Kind release :)
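
A quick way to sanity-check the claim is to mount it from a pod and write to it; with multiple workers, two such pods on different nodes should see the same files. A minimal sketch against the PVC above:

apiVersion: v1
kind: Pod
metadata:
  name: rwx-check
spec:
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo hello from $(hostname) >> /mnt/shared/hello.txt && sleep 3600"]
    volumeMounts:
    - name: shared
      mountPath: /mnt/shared
  volumes:
  - name: shared
    persistentVolumeClaim:
      claimName: test-dynamic-volume-claim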

@ayoubfaouzi

Nice! I am currently looking for this. When will this be released?

@BenTheElder
Member

BenTheElder commented Jul 25, 2020 via email

@danquah

danquah commented Aug 5, 2020

@BenTheElder any updates on the new target date? Trying to determine whether to base some internal setup on our own build of kind or whether there will be a release in the near future we can use instead.

@BenTheElder
Member

Sorry I missed this comment (sweeping issues now). v0.9.0 was re-scheduled to match k8s v1.19, but some last-minute fixes are still pending, so we didn't cut the release today (k8s did). I expect to have those merged by tomorrow.

@koxu1996

This is a side note, but it might be useful for someone. When I updated the node image from 1.18.8 to 1.19.1, the NFS Helm chart stopped working properly: memory fills up in a few seconds. I investigated the problem and it seems rpc.statd from the nfs-utils package is outdated and is leaking memory.

@BenTheElder
Member

that's unfortunate. we're shipping the latest available in the distro at the moment (ubuntu 20.10); if it's fixed in ubuntu we'll pick it up in a future kind image.

@koxu1996

koxu1996 commented Sep 18, 2020

@BenTheElder Now I think it might be something different. That's how I reproduce the issue:

$ kind create cluster --image [NODE_IMAGE]
$ helm install stable/nfs-server-provisioner --generate-name
# wait 30s until 100% memory is filled up

The issue is present when I use the most recent node images:

  • kindest/node:v1.19.1 (98cf52888646) (originally posted as v1.19.0, corrected below)
  • kindest/node:v1.18.8 (f4bcc97a0ad6)

List of node images that work without problems:

  • kindest/node:v1.18.8 (I don't know the digest, but it was a version more than 4 days old)
  • kindest/node:v1.18.6

Note: I tried building the latest node image from kind v0.9.0 sources and it works fine 😕

@BenTheElder
Member

1.19.0 isn't the latest image (please see the kind release notes as usual), and all of the images that are current were built with the same version; there were no changes to the base image or node image build process between those builds and tagging the release.

@koxu1996

@BenTheElder Sorry, I pasted the correct digest 98cf52888646 but the wrong (lower) version; it should be the latest, v1.19.1:

$ docker pull kindest/node:v1.19.1
v1.19.1: Pulling from kindest/node
Digest: sha256:98cf5288864662e37115e362b23e4369c8c4a408f99cbc06e58ac30ddc721600
Status: Image is up to date for kindest/node:v1.19.1
docker.io/kindest/node:v1.19.1

So the issue is present for the latest node image. I am trying to track down what changed in the latest node image update.

@aojea
Contributor

aojea commented Sep 19, 2020

I'm almost sure it is because of this
#1799

but I keep thinking that it is an NFS bug 😄
#760 (comment)

@koxu1996 you should limit the file descriptors at the OS level

@koxu1996

koxu1996 commented Sep 19, 2020

@aojea Indeed, I bisected KinD commits and this is the culprit: 2f17d25.

I use Arch BTW 😆 and the kernel limit on file descriptors is really high:

$ sudo sysctl -a | grep "fs.nr_open"
fs.nr_open = 1073841816

To work around the NFS issue you can change kernel-level limits, e.g.

sudo sysctl -w fs.nr_open=1048576

or you could use a custom node image.

Edit:

I asked the nfs-utils maintainer about this bug and got the following reply:

This was fixed by the following libtirpc commit:

commit e7c34df8f57331063b9d795812c62cec3ddfbc17 (tag: libtirpc-1-2-7-rc3)
Author: Jaime Caamano Ruiz [email protected]
Date: Tue Jun 16 13:00:52 2020 -0400

libtirpc: replace array with list for per-fd locks

Which is in the latest RC release libtirpc-1-2-7-rc4

@BenTheElder
Member

looks like the fixed libtirpc is not packaged yet. I'm not sure how we want to proceed here

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Dec 24, 2020
@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jan 23, 2021
@BenTheElder BenTheElder removed the lifecycle/rotten label Feb 6, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Feb 6, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Feb 6, 2021
@BenTheElder
Member

I think we should try to make sure libtirpc is updated and document how to set up an NFS provisioner. I'm not sure if this is in scope to have in the default setup, but it's certainly in scope to put a guide on the site.

@BenTheElder BenTheElder added the lifecycle/frozen label Feb 6, 2021
@aojea aojea added the kind/documentation and help wanted labels Feb 6, 2021
@backtrackshubham

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-dynamic-volume-claim
spec:
  storageClassName: "nfs"
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi

This fails with an error saying:

Finished building Kubernetes
Building node image ...
Building in kind-build-1623416466-881282865
Image build Failed! Failed to pull Images: command "docker exec --privileged kind-build-1623416466-881282865 cat /etc/containerd/config.toml" failed with error: exit status 1
ERROR: error building node image: command "docker exec --privileged kind-build-1623416466-881282865 cat /etc/containerd/config.toml" failed with error: exit status 1
Command Output: cat: /etc/containerd/config.toml: No such file or directory

@BenTheElder
Member

well, first of all you should not need to build new node images; we've had multiple releases since #1487 (comment), and they already contain all of the changes.

... and the reason that's failing is that the base image specified in the command in that comment is very outdated versus current kind. You can skip all the image building steps; NFS should just work now, and we run NFS tests in CI. There are no changes to kind needed, just the cluster objects installed at runtime for your NFS service / PVs.

@backtrackshubham

Hey @BenTheElder, thanks for the comment. When I tried using the storage class nfs, the PVC went into the Pending state; describing the PVC showed that it doesn't have the storage class "nfs". I understand that you have suggested running an NFS server somewhere, but my question is: in the current version of kind, can we create (after setting up my NFS server) PVCs with access mode ReadWriteMany? I went through the issues to find something on this but was not able to. Any help or suggestions would be wonderful.

@BenTheElder
Member

When I tried using the storage class nfs, the PVC went into the Pending state; describing the PVC showed that it doesn't have the storage class "nfs"

yes, we don't have the storage class because that has to refer to a specific NFS setup, and that's something you can choose and install at runtime

I understand that you have suggested running an NFS server somewhere,

yes, #1487 (comment) starting from "I then pulled and loaded the nfs-provisioner image to prepare for installation" is still relevant as one approach. The part before that with the custom image is not.

but my question is: in the current version of kind, can we create (after setting up my NFS server) PVCs with access mode ReadWriteMany?

Yes, in any version NFS has ReadWriteMany; it's just that NFS could not work in a nested container environment when the project was started (issues in the linux kernel actually, not in kind itself). It can now. (see also: #1806)

I don't specifically work with this, but NFS in kind is not special (versus another cluster tool) anymore.

We just need someone to document doing this.

@backtrackshubham

Thanks, I am still in a phase of understanding and learning Kubernetes. Many thanks to the devs and contributors of kind; I will see how to do it. Thanks! 😃

@backtrackshubham

Thanks, I am still in a phase of understanding and learning Kubernetes. Many thanks to the devs and contributors of kind; I will see how to do it. Thanks! 😃

Hi @BenTheElder, thanks for all the guidance and ideas. I was successfully able to deploy an NFS server with mode RWM using the steps that you and other devs indicated on a Linux system, but now that I am trying to move the same setup to a Mac (Docker Desktop), I can see that the pod for the nfs-provisioner is failing with (upon describing):

Warning  FailedScheduling  33m (x2 over 33m)   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.

And then it eventually gets into a crash loop. I found this answer suggesting some change, but I would like to understand what exactly has changed between the two systems. Could it be because of the resources? On the Linux system the kind cluster was flying with 24 GB RAM, but here on the Mac it's 6 CPUs, 4 GB memory, 2 GB swap and a 200 GB HDD.

Thanks

@BenTheElder
Member

BenTheElder commented Jul 7, 2021

You should also consider running fewer nodes; kind tries to be as light as possible, but kubeadm recommends something like 2GB per node for a more typical cluster IIRC 😅

Kubernetes does not yet use swap effectively, and actually officially requires it to be off, though we set an option to make it run anyhow.

node.kubernetes.io/not-ready is not a taint you should have to remove, and kind in general should never require you to manually remove taints; this means the nodes are not healthy (which is a very general symptom).

EDIT: If you need more help with that please file a different issue for your case since it's not related to RWM PVs, so folks monitoring this can avoid being notified spuriously, and so we can track your issue directly. We can cross link them for reference. The new issue template also requests useful information for debugging.

@meln5674

On the off chance anyone is still watching this: local-path-provisioner has supported RWX volumes for a few releases now, and as of v0.0.27 it supports multiple storage classes with a single deployment.

Unless I've overlooked something, I think it should be reasonable to automatically create a RWX storage class for single-node clusters. To support multi-node clusters, that could be accomplished by mounting the same host volume to the same location in each node container, and that could be provided by a new field in the configuration. This would even support future multi-host setups if the user is made responsible for mounting network storage at that location on each host out-of-band.

I would be happy to start work on a PR for this if the idea isn't rejected out of hand.

@mosesdd

mosesdd commented Jul 5, 2024

@meln5674 I created a workaround in my environment for this:

kubectl -n local-path-storage patch configmap local-path-config -p '{"data": {"config.json": "{\n\"sharedFileSystemPath\": \"/var/local-path-provisioner\"\n}"}}'

As long as you use a single-node configuration, this works totally fine.

See https://github.com/rancher/local-path-provisioner?tab=readme-ov-file#definition for details
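
With that patch applied (and once the provisioner has picked up the new config, which may require waiting a bit or restarting its pod), a ReadWriteMany claim against kind's default standard class along these lines should bind on a single-node cluster. This is only a sketch, not something verified in this thread:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-local-path
spec:
  storageClassName: standard   # kind's default local-path-backed class
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi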
