Nodeserver shutdown makes mount unavailable until business pod restart #14917

Closed
Haoning-Sun opened this issue Jan 26, 2022 · 18 comments
Labels
area-fuse (Alluxio fuse integration) · area-k8s (Alluxio Kubernetes Integration) · priority-high · type-feature (This issue is a feature request)

Comments

@Haoning-Sun
Contributor

Is your feature request related to a problem? Please describe.
When I use Alluxio CSI, I found that if the nodeserver shuts down, the fuse service inside shuts down with it, and the mount is not restored automatically when the nodeserver restarts. So I need to restart the business pod to remount.

Describe the solution you'd like
After the nodeserver shuts down, either the fuse service is transferred to another nodeserver, or the mount is restored automatically once the nodeserver restarts, so there is no need to restart the business pod.

Describe alternatives you've considered
Can each fuse process be put into a mount pod separately?

Urgency
Business pods may be stateful and restarting pods may be costly.

Additional context
(screenshot attached in the original issue)

@Haoning-Sun added the type-feature label Jan 26, 2022
@HelloHorizon
Contributor

@LuQQiu can you take a look?

@LuQQiu
Contributor

LuQQiu commented Jan 27, 2022

@Binyang2014 @maobaolong any ideas for this issue? How hard would it be, and how long might it take, to support this feature?

@Binyang2014
Contributor

Hi @Haoning-Sun, I have some questions about your case. Is your business pod on the same node as the fuse service?
If so, when the node shuts down, your service will shut down as well, and when the node restarts, your business pod will stay in pending status until fuse is mounted.
If not, how do you deploy the fuse service and the business pod?

@Haoning-Sun
Contributor Author

@Binyang2014 The fuse service is created in the nodeserver, triggered by the mount when the business pod starts, and does not require me to deploy it manually.

@Binyang2014
Contributor

I'm confused about this. I believe the fuse daemon and the business pod are located on the same node. If the node goes down, both the fuse daemon and your business pod will stop. Why is the fuse daemon down while your business pod is still alive?

@Haoning-Sun
Contributor Author

I mean the csi-nodeserver, not the k8s node.

@HelloHorizon added the area-fuse, area-k8s, and priority-medium labels Jan 31, 2022
@Binyang2014
Contributor

The csi-nodeserver is a long-running service and should be stable. Do you know why the csi-nodeserver went down? You can file a bug if the csi-nodeserver goes down unexpectedly.

If you don't want to restart the business pod, you need to add logic yourself to handle storage failures. From the system's point of view, we don't know whether this error is critical: some apps cannot tolerate such an error, while others can recover after remounting the volume.
So the app developer should handle the error, or simply let the app crash.

@Haoning-Sun
Contributor Author

We are just preparing to use CSI. We found this problem when we deliberately shut down the nodeserver during testing.

@kevincai
Contributor

kevincai commented Feb 8, 2022

In the current alluxio csi design, the issue is unavoidable, because the alluxio fuse process runs inside the daemonset nodeserver pod and shares the lifecycle of that daemonset pod.

This not only causes the business pod to hang if the nodeserver pod is restarted, but also makes upgrading the daemonset difficult.

There is an alternative approach in the k8s community for csi storage: when the nodeserver starts the alluxio fuse daemon process, instead of starting a process in the nodeserver's pod, it starts a new container in the same pod and runs the alluxio fuse daemon process inside that new container. This decouples the lifecycle management of the nodeserver and the fuse daemon process. Of course it introduces complexity in managing those alluxio fuse containers and their lifecycles.
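
For illustration only, here is a rough Go sketch of that idea using client-go: the nodeserver asks the Kubernetes API to run the FUSE daemon in a pod it manages, instead of forking a local subprocess. The pod name, namespace, image, mount path, and fuse command below are placeholders, not Alluxio's actual implementation.

```go
package csi

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// launchFusePod asks the Kubernetes API to run the Alluxio FUSE daemon in its
// own pod, pinned to the node this nodeserver serves, instead of running it as
// a subprocess inside the nodeserver container.
func launchFusePod(ctx context.Context, nodeName string) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	privileged := true
	propagation := corev1.MountPropagationBidirectional
	hostPathType := corev1.HostPathDirectoryOrCreate

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "alluxio-fuse-" + nodeName, // hypothetical naming scheme
			Namespace: "alluxio",                  // placeholder namespace
		},
		Spec: corev1.PodSpec{
			NodeName: nodeName, // keep the FUSE pod on the same node as the workload
			Containers: []corev1.Container{{
				Name:    "alluxio-fuse",
				Image:   "alluxio/alluxio:2.8.0", // placeholder image
				Command: []string{"/opt/alluxio/integration/fuse/bin/alluxio-fuse", "mount", "/mnt/alluxio-fuse"},
				SecurityContext: &corev1.SecurityContext{
					Privileged: &privileged, // required to open /dev/fuse
				},
				VolumeMounts: []corev1.VolumeMount{{
					Name:             "fuse-mount",
					MountPath:        "/mnt/alluxio-fuse",
					MountPropagation: &propagation, // propagate the FUSE mount back to the host
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "fuse-mount",
				VolumeSource: corev1.VolumeSource{
					HostPath: &corev1.HostPathVolumeSource{
						Path: "/mnt/alluxio-fuse",
						Type: &hostPathType,
					},
				},
			}},
		},
	}

	_, err = client.CoreV1().Pods(pod.Namespace).Create(ctx, pod, metav1.CreateOptions{})
	return err
}
```

The nodeserver would also have to watch and clean up these FUSE pods, which is the extra lifecycle-management complexity mentioned above.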

@Haoning-Sun
Contributor Author

We are now using the k8s sidecar mode. As you said, it starts a new container in the same pod and runs the alluxio fuse daemon process inside the new container.

@kevincai
Contributor

kevincai commented Feb 8, 2022

It is good enough, except that your sidecar container must be a privileged container that can directly access the /dev/fuse device, which is usually a security concern.
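
For reference, a minimal sketch of the security-relevant part of such a sidecar (the container name, image, command, and mount path are placeholders, not the actual chart):

```go
package csi

import corev1 "k8s.io/api/core/v1"

// fuseSidecarContainer sketches a FUSE sidecar: it has to run privileged so it
// can open /dev/fuse and perform the mount, which is the concern noted above.
func fuseSidecarContainer() corev1.Container {
	privileged := true
	return corev1.Container{
		Name:    "alluxio-fuse-sidecar",  // placeholder name
		Image:   "alluxio/alluxio:2.8.0", // placeholder image
		Command: []string{"/opt/alluxio/integration/fuse/bin/alluxio-fuse", "mount", "/mnt/alluxio-fuse"},
		SecurityContext: &corev1.SecurityContext{
			// Privileged gives the container access to host devices such as
			// /dev/fuse, plus broad capabilities -- hence the security concern.
			Privileged: &privileged,
		},
	}
}
```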

@Binyang2014
Contributor

Binyang2014 commented Feb 8, 2022

@kevincai I'm interested in the approach you mentioned:

> when the nodeserver starts the alluxio fuse daemon process, instead of starting a process in the nodeserver's pod, it starts a new container in the same pod and runs the alluxio fuse daemon process inside that new container.

Do you mean the sidecar mode, or can we assign a container to the job pod dynamically?
For sidecar mode, we need to declare the sidecar container when creating the pod, so it cannot leverage the CSI interface. I'm not sure whether there is a way to combine the CSI interface with the sidecar mode.

@kevincai
Contributor

kevincai commented Feb 9, 2022

@Binyang2014 It is not the sidecar mode. In the alluxio csi nodeserver, instead of starting the alluxio fuse process as a subprocess, it calls the k8s API to start a new container; that container is responsible for running the alluxio fuse process and mounting it to the local filesystem via fuse.

@Haoning-Sun is using the sidecar mode, which is outside the scope of k8s CSI.

@Binyang2014
Contributor

@kevincai Is there any open source project using this approach? It seems we could refer to it and then figure out the correct way to integrate alluxio-fuse into Kubernetes.

@kevincai
Contributor

kevincai commented Feb 9, 2022

@Binyang2014 As far as I know, the juicedata project has already implemented this approach; their major work is done in this PR: juicedata/juicefs-csi-driver#100

@ssz1997
Contributor

ssz1997 commented Feb 10, 2022

@madanadit @jiacheliu3 FYI.
Alluxio CSI uses a nodeserver to launch the fuse process. Haoning-Sun is worried that if the nodeserver goes down, the fuse process goes down with it, and the application pods can no longer access Alluxio data and then crash.
One solution is what kevincai was describing: instead of launching the Fuse process inside the nodeserver container, the nodeserver creates another container in the same pod, through the kubernetes API, that launches the Fuse process. Then, if the nodeserver crashes, it won't affect the fuse process, which lives in another container. This approach is used by JuiceFS.

@Binyang2014
Contributor

Reference from the k8s community: kubernetes/kubernetes#70013

@Binyang2014
Contributor

Also, it seems there is an example fix: https://github.com/kubernetes-sigs/blob-csi-driver/pull/117/files. This requires k8s version 1.19+. After adopting this fix, restarting the csi-nodeserver still breaks all fuse mount points, but after the csi-nodeserver restarts it will remount the broken mount points automatically, which can partially mitigate this issue.
@LuQQiu @ssz1997 PTAL at this approach.
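
For context, a rough sketch of the idea in that fix, assuming the k8s.io/mount-utils package and a hypothetical helper name (this is not the exact blob-csi-driver or Alluxio code): when the restarted nodeserver handles the next mount request and finds a corrupted FUSE mount point, it unmounts it first so the volume can be mounted again.

```go
package csi

import (
	"os"

	mount "k8s.io/mount-utils"
)

// cleanupCorruptedMount reports whether target was a corrupted mount point
// (typically "transport endpoint is not connected" after the FUSE daemon died)
// and, if so, unmounts it so the next NodePublishVolume call can remount it.
func cleanupCorruptedMount(mounter mount.Interface, target string) (bool, error) {
	if _, err := os.Stat(target); err != nil {
		if mount.IsCorruptedMnt(err) {
			// Stale FUSE mount left behind by the old nodeserver: unmount it
			// so the volume can be remounted on the next publish call.
			if uerr := mounter.Unmount(target); uerr != nil {
				return false, uerr
			}
			return true, nil
		}
		return false, err
	}
	return false, nil // path is healthy, nothing to do
}
```

The business pod still loses access while the nodeserver is down, but the mount comes back after the remount instead of requiring a pod restart.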

alluxio-bot pushed a commit that referenced this issue Mar 29, 2022
### What changes are proposed in this pull request?
Unmount the corrupted folder in case the csi-nodeserver restarted

Related to issue #14917

This change just mitigates the issue. If the node-server restarted, it will
try to remount the folder. But the user job still cannot access data during
the restart period.

Just noticed there is another solution for this issue:
#15165. I believe that solution is
more robust and better for critical jobs. But that solution needs more
effort to be mature for production workloads. Maybe at this stage, we
can keep both.

**Notice: this change requires k8s version above 1.18**
This change has not been tested yet; we need to kill the nodeserver and, after it
restarts, check whether the mount path is remounted.

pr-link: #15191
change-id: cid-fa4311a52288a0349084c08f6d202cf3e4069da4
alluxio-bot pushed a commit that referenced this issue Apr 12, 2022
### What changes are proposed in this pull request?
Make CSI launch a separate pod running the AlluxioFuse process, instead of
launching the AlluxioFuse process in the CSI nodeserver container

### Why are the changes needed?
If the nodeserver container or node-plugin pod goes down for any reason, we
lose the Alluxio Fuse process and it's very cumbersome to bring it back.
With a separate Fuse pod, the CSI pod won't affect the Fuse process.

Solves #14917

### Does this PR introduce any user facing changes?
1. Removed `javaOptions` from the csi section in `values.yaml`. Alluxio
properties in the helm chart should be organized in one place, not split
between `properties` and `csi`.
2. Added the property `mountInPod` to the csi section. If set to `true`, the Fuse
process is launched in a separate pod.

pr-link: #15221
change-id: cid-b6897172e11f80618decbfdc0758423e71aa387e
flaming-archer pushed a commit to flaming-archer/alluxio that referenced this issue Sep 1, 2022