Nodeserver shutdown makes mount unavailable until business pod restart #14917

Closed
Haoning-Sun opened this issue Jan 26, 2022 · 18 comments
Labels
area-fuse (Alluxio fuse integration) · area-k8s (Alluxio Kubernetes Integration) · priority-high · type-feature (This issue is a feature request)

Comments

@Haoning-Sun
Contributor

Is your feature request related to a problem? Please describe.
When I use Alluxio CSI, I found that if the nodeserver shuts down, the fuse service inside shuts down with it, and the mount is not restored automatically when the nodeserver restarts. So I need to restart the business pod to remount.

Describe the solution you'd like
After the nodeserver shuts down, either the fuse service is transferred to another nodeserver, or the mount is restored automatically once the nodeserver restarts, so there is no need to restart the business pod.

Describe alternatives you've considered
Can each fuse process be put into a mount pod separately?

Urgency
Business pods may be stateful and restarting pods may be costly.

Additional context
(screenshot attached in the original issue)

@Haoning-Sun added the type-feature label Jan 26, 2022
@HelloHorizon
Contributor

@LuQQiu can you take a look?

@LuQQiu
Contributor

LuQQiu commented Jan 27, 2022

@Binyang2014 @maobaolong any ideas for this issue? How hard would it be, and how long might it take, to support this feature?

@Binyang2014
Contributor

Hi @Haoning-Sun, I have some questions about your case. Is your business pod on the same node as the fuse service?
If so, when the node shuts down, your service will shut down as well, and when the node restarts, your business pod will stay in pending status until fuse is mounted.
If not, how do you deploy the fuse service and the business pod?

@Haoning-Sun
Contributor Author

@Binyang2014 The fuse service is created in the nodeserver, triggered by the mount when the business pod starts, and does not require me to deploy it manually.

@Binyang2014
Contributor

I'm confused about this. I believe the fuse daemon and the business pod are located on the same node. If the node goes down, both the fuse daemon and your business pod will stop. Why is the fuse daemon down while your business pod is still alive?

@Haoning-Sun
Contributor Author

I mean the csi-nodeserver, not the k8s node.

@HelloHorizon added the area-fuse, area-k8s, and priority-medium labels Jan 31, 2022
@Binyang2014
Contributor

The csi-nodeserver is a long-running service and should be stable. Do you know why the csi-nodeserver went down? You can file a bug if the csi-nodeserver goes down unexpectedly.

If you don't want to restart the business pod, you need to add logic yourself to handle storage failures. From the system's point of view, we don't know whether this error is critical: some apps cannot tolerate such an error, while others can recover after remounting the volume.
So the app developer should handle the error, or simply let the app crash.

@Haoning-Sun
Contributor Author

We are just preparing to use CSI. We found this problem when we deliberately shut down the nodeserver during testing.

@kevincai
Contributor

kevincai commented Feb 8, 2022

In the current alluxio csi design, the issue is unavoidable, because the alluxio fuse process runs inside the daemonset nodeserver pod and shares the lifecycle of that daemonset pod.

This not only causes the business pod to hang if the nodeserver pod is restarted, but also makes upgrading the daemonset difficult.

There is an alternative approach in the k8s community for csi storage: when the nodeserver starts the alluxio fuse daemon process, instead of starting a process in the nodeserver's pod, it starts a new container in the same pod and runs the alluxio fuse daemon process inside that new container. This decouples the lifecycle management of the nodeserver and the fuse daemon process. Of course it introduces complexity in managing those alluxio fuse containers and their lifecycles.
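
For illustration only, here is a rough Go sketch of that idea using client-go: the nodeserver asks the Kubernetes API to run the FUSE daemon in a pod it manages, instead of forking a local subprocess. The pod name, namespace, image, mount path, and fuse command below are placeholders, not Alluxio's actual implementation.

```go
package csi

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// launchFusePod asks the Kubernetes API to run the Alluxio FUSE daemon in its
// own pod, pinned to the node this nodeserver serves, instead of running it as
// a subprocess inside the nodeserver container.
func launchFusePod(ctx context.Context, nodeName string) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	privileged := true
	propagation := corev1.MountPropagationBidirectional
	hostPathType := corev1.HostPathDirectoryOrCreate

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "alluxio-fuse-" + nodeName, // hypothetical naming scheme
			Namespace: "alluxio",                  // placeholder namespace
		},
		Spec: corev1.PodSpec{
			NodeName: nodeName, // keep the FUSE pod on the same node as the workload
			Containers: []corev1.Container{{
				Name:    "alluxio-fuse",
				Image:   "alluxio/alluxio:2.8.0", // placeholder image
				Command: []string{"/opt/alluxio/integration/fuse/bin/alluxio-fuse", "mount", "/mnt/alluxio-fuse"},
				SecurityContext: &corev1.SecurityContext{
					Privileged: &privileged, // required to open /dev/fuse
				},
				VolumeMounts: []corev1.VolumeMount{{
					Name:             "fuse-mount",
					MountPath:        "/mnt/alluxio-fuse",
					MountPropagation: &propagation, // propagate the FUSE mount back to the host
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "fuse-mount",
				VolumeSource: corev1.VolumeSource{
					HostPath: &corev1.HostPathVolumeSource{
						Path: "/mnt/alluxio-fuse",
						Type: &hostPathType,
					},
				},
			}},
		},
	}

	_, err = client.CoreV1().Pods(pod.Namespace).Create(ctx, pod, metav1.CreateOptions{})
	return err
}
```

The nodeserver would also have to watch and clean up these FUSE pods, which is the extra lifecycle-management complexity mentioned above.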

@Haoning-Sun
Contributor Author

We are now using the k8s sidecar mode. As you said, it starts a new container in the same pod and runs the alluxio fuse daemon process inside the new container.

@kevincai
Contributor

kevincai commented Feb 8, 2022

It is good enough, except that your sidecar container must be a privileged container that can directly access the /dev/fuse device, which is usually a security concern.
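
For reference, a minimal sketch of the security-relevant part of such a sidecar (the container name, image, command, and mount path are placeholders, not the actual chart):

```go
package csi

import corev1 "k8s.io/api/core/v1"

// fuseSidecarContainer sketches a FUSE sidecar: it has to run privileged so it
// can open /dev/fuse and perform the mount, which is the concern noted above.
func fuseSidecarContainer() corev1.Container {
	privileged := true
	return corev1.Container{
		Name:    "alluxio-fuse-sidecar",  // placeholder name
		Image:   "alluxio/alluxio:2.8.0", // placeholder image
		Command: []string{"/opt/alluxio/integration/fuse/bin/alluxio-fuse", "mount", "/mnt/alluxio-fuse"},
		SecurityContext: &corev1.SecurityContext{
			// Privileged gives the container access to host devices such as
			// /dev/fuse, plus broad capabilities -- hence the security concern.
			Privileged: &privileged,
		},
	}
}
```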

@Binyang2014
Contributor

Binyang2014 commented Feb 8, 2022

@kevincai I'm interested in the approach you mentioned:

> when the nodeserver starts the alluxio fuse daemon process, instead of starting a process in the nodeserver's pod, it starts a new container in the same pod and runs the alluxio fuse daemon process inside that new container.

Do you mean the sidecar mode, or can we assign a container to the job pod dynamically?
For sidecar mode, we need to declare the sidecar container when creating the pod, so it cannot leverage the CSI interface. I'm not sure whether there is a way to combine the CSI interface with the sidecar mode.

@kevincai
Contributor

kevincai commented Feb 9, 2022

@Binyang2014 It is not the sidecar mode. In the alluxio csi nodeserver, instead of starting the alluxio fuse process as a subprocess, it calls the k8s API to start a new container; that container is responsible for running the alluxio fuse process and mounting it to the local filesystem via fuse.

@Haoning-Sun is using the sidecar mode, which is outside the scope of k8s CSI.

@Binyang2014
Contributor

@kevincai Is there any open source project using this approach? It seems we could refer to it and then figure out the correct way to integrate alluxio-fuse into Kubernetes.

@kevincai
Contributor

kevincai commented Feb 9, 2022

@Binyang2014 As far as I know, the juicedata project has already implemented this approach; their major work is done in this PR: juicedata/juicefs-csi-driver#100

@ssz1997
Contributor

ssz1997 commented Feb 10, 2022

@madanadit @jiacheliu3 FYI.
Alluxio CSI uses a nodeserver to launch the fuse process. Haoning-Sun is worried that if the nodeserver goes down, the fuse process goes down with it, and the application pods can no longer access Alluxio data and then crash.
One solution is what kevincai was describing: instead of launching the Fuse process inside the nodeserver container, the nodeserver creates another container in the same pod, through the kubernetes API, that launches the Fuse process. Then, if the nodeserver crashes, it won't affect the fuse process, which lives in another container. This approach is used by JuiceFS.

@Binyang2014
Contributor

Reference from the k8s community: kubernetes/kubernetes#70013

@Binyang2014
Contributor

Also, it seems there is an example fix: https://github.com/kubernetes-sigs/blob-csi-driver/pull/117/files. This requires k8s version 1.19+. After adopting this fix, restarting the csi-nodeserver still breaks all fuse mount points, but after the csi-nodeserver restarts it will remount the broken mount points automatically, which can partially mitigate this issue.
@LuQQiu @ssz1997 PTAL at this approach.
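
For context, a rough sketch of the idea in that fix, assuming the k8s.io/mount-utils package and a hypothetical helper name (this is not the exact blob-csi-driver or Alluxio code): when the restarted nodeserver handles the next mount request and finds a corrupted FUSE mount point, it unmounts it first so the volume can be mounted again.

```go
package csi

import (
	"os"

	mount "k8s.io/mount-utils"
)

// cleanupCorruptedMount reports whether target was a corrupted mount point
// (typically "transport endpoint is not connected" after the FUSE daemon died)
// and, if so, unmounts it so the next NodePublishVolume call can remount it.
func cleanupCorruptedMount(mounter mount.Interface, target string) (bool, error) {
	if _, err := os.Stat(target); err != nil {
		if mount.IsCorruptedMnt(err) {
			// Stale FUSE mount left behind by the old nodeserver: unmount it
			// so the volume can be remounted on the next publish call.
			if uerr := mounter.Unmount(target); uerr != nil {
				return false, uerr
			}
			return true, nil
		}
		return false, err
	}
	return false, nil // path is healthy, nothing to do
}
```

The business pod still loses access while the nodeserver is down, but the mount comes back after the remount instead of requiring a pod restart.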

alluxio-bot pushed a commit that referenced this issue Mar 29, 2022
### What changes are proposed in this pull request?
Unmount the corrupted folder in case the csi-nodeserver restarted

Related to issue #14917

This change just mitigates the issue. If the node-server restarted, it will
try to remount the folder. But the user job still cannot access data during
the restart period.

Just noticed there is another solution for this issue:
#15165. I believe that solution is
more robust and better for critical jobs. But that solution needs more
effort to be mature for production workloads. Maybe at this stage, we
can keep both.

**Notice: this change requires k8s version above 1.18**
This change has not been tested yet; we need to kill the nodeserver and, after it
restarts, check whether the mount path is remounted.

pr-link: #15191
change-id: cid-fa4311a52288a0349084c08f6d202cf3e4069da4
alluxio-bot pushed a commit that referenced this issue Apr 12, 2022
### What changes are proposed in this pull request?
Make CSI launch a separate pod running the AlluxioFuse process, instead of
launching the AlluxioFuse process in the CSI nodeserver container

### Why are the changes needed?
If the nodeserver container or node-plugin pod goes down for any reason, we
lose the Alluxio Fuse process and it's very cumbersome to bring it back.
With a separate Fuse pod, the CSI pod won't affect the Fuse process.

Solves #14917

### Does this PR introduce any user facing changes?
1. Removed `javaOptions` from the csi section in `values.yaml`. Alluxio
properties in the helm chart should be organized in one place, not split
between `properties` and `csi`.
2. Added the property `mountInPod` to the csi section. If set to `true`, the Fuse
process is launched in a separate pod.

pr-link: #15221
change-id: cid-b6897172e11f80618decbfdc0758423e71aa387e
flaming-archer pushed a commit to flaming-archer/alluxio that referenced this issue Sep 1, 2022