Nodeserver shutdown makes mount unavailable until business pod restart #14917
@LuQQiu can you take a look?
@Binyang2014 @maobaolong any ideas on this issue? How hard would it be, and how long might it take, to support that feature?
Hi @Haoning-Sun, I have some questions about your case. Is your business pod on the same node as the fuse service?
@Binyang2014 The fuse service is created in the nodeserver by triggering the mount when the business pod starts; it does not require manual deployment.
I'm confused about this. I believe the fuse daemon and the business pod are located on the same node, so if the node goes down, both the fuse daemon and your business pod will stop. Why is the fuse daemon down while your business pod is still alive?
I mean the csi-nodeserver, not the k8s node.
The csi-nodeserver is a long-running service and should be stable. Do you know why the csi-nodeserver went down? You can file a bug if the csi-nodeserver goes down unexpectedly. If you don't want to restart the business pod, you need to add logic yourself to handle storage failures. From the system's point of view, we don't know whether this error is critical: some apps cannot tolerate such an error, while others can recover after the volume is remounted.
We are just preparing to use CSI. We found this problem when we deliberately shut down the nodeserver during testing.
In the current Alluxio CSI design, the issue is not avoidable, because the alluxio fuse process runs inside the daemonset nodeserver POD and has the same lifecycle as the daemonset POD. This not only introduces the problem that the business POD hangs if the nodeserver POD is restarted, but also makes daemonset upgrades difficult. There is an alternative approach in the k8s community for CSI storage: when the nodeserver starts the alluxio fuse daemon process, instead of starting a process in the nodeserver's POD, it starts a new container in the same POD and runs the alluxio fuse daemon process inside the new container. This decouples the lifecycle management of the nodeserver and the fuse daemon process. Of course, it introduces complexity in managing those alluxio fuse containers and their lifecycles.
We are now using k8s sidecar mode. As you said, it starts a new container in the same pod and runs the alluxio fuse daemon process inside the new container.
It is good enough, except that your sidecar container must be a privileged container that can directly access the /dev/fuse device, which is usually a security concern.
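To make the security concern concrete, here is a minimal sketch of such a fuse sidecar container, expressed with client-go's corev1 types (a pod-spec YAML would carry the same fields). The image, command, and mount path are illustrative assumptions, not taken from the Alluxio chart; the essential parts are `Privileged: true` (needed to open /dev/fuse) and bidirectional mount propagation:

```go
package fusepod

import (
	corev1 "k8s.io/api/core/v1"
)

// FuseSidecar builds a privileged sidecar container that runs
// alluxio-fuse next to the business container. Image, command, and
// mount path are hypothetical, for illustration only.
func FuseSidecar() corev1.Container {
	privileged := true
	bidirectional := corev1.MountPropagationBidirectional
	return corev1.Container{
		Name:    "alluxio-fuse",
		Image:   "alluxio/alluxio:latest", // hypothetical image tag
		Command: []string{"/opt/alluxio/integration/fuse/bin/alluxio-fuse", "mount", "/mnt/alluxio"},
		SecurityContext: &corev1.SecurityContext{
			// Privileged is what lets the container open /dev/fuse,
			// which is the security concern raised above.
			Privileged: &privileged,
		},
		VolumeMounts: []corev1.VolumeMount{{
			Name:      "alluxio-mnt",
			MountPath: "/mnt/alluxio",
			// Bidirectional propagation makes the fuse mount created in
			// this container visible to the business container via the host.
			MountPropagation: &bidirectional,
		}},
	}
}
```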
@kevincai I'm interested in the approach you mentioned: do you mean the sidecar mode, or that we can assign a container to the job POD dynamically?
@Binyang2014 It is not sidecar mode. In the Alluxio CSI nodeserver, instead of starting a subprocess to run the alluxio fuse process, it calls the k8s API to start a new container; that container is responsible for running the alluxio fuse process and mounting to the local filesystem via fuse. @Haoning-Sun is using the sidecar mode, which is outside the scope of k8s CSI.
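For illustration, a minimal sketch in Go (using client-go, which CSI drivers are typically written in) of how a nodeserver could ask the API server for a dedicated fuse pod rather than forking a local process. The pod name, namespace, image, and node pinning are assumptions for this example, not the actual Alluxio or juicefs implementation:

```go
package fusepod

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// LaunchFusePod asks the API server to run the fuse daemon in its own
// pod, pinned to the node that needs the mount, so its lifecycle is
// decoupled from the nodeserver's. All names here are illustrative.
func LaunchFusePod(nodeName, volumeID string) error {
	cfg, err := rest.InClusterConfig() // the nodeserver runs in-cluster
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("alluxio-fuse-%s", volumeID), // hypothetical naming scheme
			Namespace: "alluxio",                                // hypothetical namespace
		},
		Spec: corev1.PodSpec{
			NodeName:      nodeName, // schedule on the same node as the workload
			RestartPolicy: corev1.RestartPolicyAlways,
			Containers: []corev1.Container{{
				Name:  "alluxio-fuse",
				Image: "alluxio/alluxio:latest", // hypothetical; would carry the privileged settings from the sidecar sketch above
			}},
		},
	}
	_, err = client.CoreV1().Pods(pod.Namespace).Create(context.TODO(), pod, metav1.CreateOptions{})
	return err
}
```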
@kevincai Is there any open source project using this approach? It seems we could refer to it and then figure out the correct way to integrate alluxio-fuse into Kubernetes.
@Binyang2014 As far as I know, the juicedata project has already implemented this approach; their major work is done in this PR: juicedata/juicefs-csi-driver#100
@madanadit @jiacheliu3 FYI.
Reference from the k8s community: kubernetes/kubernetes#70013
And it seems there is an example fix: https://github.com/kubernetes-sigs/blob-csi-driver/pull/117/files. This requires k8s version 1.19+. Even after adopting this fix, restarting the csi-nodeserver still breaks all fuse mount points.
### What changes are proposed in this pull request?
Unmount the corrupted folder in case the csi-nodeserver is restarted. Related to issue #14917. This change only mitigates the issue: if the nodeserver is restarted, it will try to remount the folder, but user jobs still cannot access data during the restart period. Note that there is another solution for this issue, #15165. I believe that solution is more robust and better for critical jobs, but it needs more effort to be mature for production workloads. Maybe at this stage we can keep both.
**Notice: this change requires a k8s version above 1.18**
This change is not tested yet; we need to kill the nodeserver and, after it restarts, check whether the mount path is remounted.
pr-link: #15191
change-id: cid-fa4311a52288a0349084c08f6d202cf3e4069da4
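The mitigation described above boils down to detecting a corrupted mount point and cleaning it up before mounting again. A minimal sketch of that check in Go, modeled on the blob-csi-driver fix linked earlier (function and variable names are mine, not the actual PR's code):

```go
package fusepod

import (
	"os"

	"k8s.io/utils/mount"
)

// RemountIfCorrupted reports whether targetPath needs a fresh mount.
// When the fuse daemon behind a mount dies, stat on the mount point
// fails with errors like ENOTCONN ("transport endpoint is not
// connected"); mount.IsCorruptedMnt recognizes those cases.
func RemountIfCorrupted(mounter mount.Interface, targetPath string) (bool, error) {
	if _, err := os.Stat(targetPath); err != nil {
		if !mount.IsCorruptedMnt(err) {
			return false, err
		}
		// Stale fuse mount left behind by a restarted nodeserver:
		// unmount it so NodePublishVolume can mount again.
		if err := mounter.Unmount(targetPath); err != nil {
			return false, err
		}
		return true, nil
	}
	notMnt, err := mounter.IsLikelyNotMountPoint(targetPath)
	if err != nil {
		return false, err
	}
	return notMnt, nil
}
```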
### What changes are proposed in this pull request?
Make CSI launch a separate pod running the AlluxioFuse process, instead of launching the AlluxioFuse process in the CSI nodeserver container.
### Why are the changes needed?
If the nodeserver container or node-plugin pod is down for any reason, we lose the Alluxio Fuse process and it is very cumbersome to bring it back. With a separate Fuse pod, the CSI pod won't affect the Fuse process. Solves #14917.
### Does this PR introduce any user facing changes?
1. Removed `javaOptions` from the csi section in `values.yaml`. Alluxio properties in the helm chart should be organized in one place, not split between `properties` and `csi`.
2. Added property `mountInPod` in the csi section. If set to `true`, the Fuse process is launched in a separate pod.
pr-link: #15221
change-id: cid-b6897172e11f80618decbfdc0758423e71aa387e
Is your feature request related to a problem? Please describe.
When using Alluxio CSI, I found that if the nodeserver shuts down, the fuse service inside shuts down with it, and an automatic nodeserver restart does not restore the mount. So I need to restart the business pod to remount.
Describe the solution you'd like
After the nodeserver is shut down, either the fuse service is transferred to another nodeserver, or the nodeserver remounts automatically after restarting, so there is no need to restart the business pod.
Describe alternatives you've considered
Can each fuse process be put into a mount pod separately?
Urgency
Business pods may be stateful and restarting pods may be costly.
Additional context