-
Notifications
You must be signed in to change notification settings - Fork 20
packet: Mounts inside /mnt/ can't be used by pods #73
Comments
The rkt pod is exposed with /mnt/ volume[1] and in this PR: https://github.com/kinvolk/lokomotive-kubernetes/pull/48/files#diff-b6c7caf796cd86bdcdd936319b1793a1R152 the location where the RAID 0 array is mounted was changed from `/mnt` to `/mnt/<some-dir>`. This breaks using local volumes in pods, as it seems the mount is not visible for the rkt container or the kubelet running in rkt. The investigation of the root cause of this issue, to understand why mounts inside mnt (like in `/mnt/node-local-storage`) can't be used by pods currently and what needs to be changed for that, is left as a for future patches and issue #73 was created to investigate this. This patch just makes the mount location back to `/mnt` for all the cases (`setup_raid`, `setup_raid_*`) so it works on all cases. However, this imposes one limitation: `setup_raid_hdd` and `setup_raid_ssd` are mutually exclusive now. This limitation is not limiting something that was working in master, in fact `setup_raid_hdd` and `setup_raid_ssd` (when `setup_raid_ssd_fs` is set) were completely broken since they were merged and pods never wrote to those volumes, for the very same reason issue #73 states: mounts insde `/mnt` are not visible to pods. Therefore, this patches fixes node local storage while making those options exclusive (only one can be set), and this is not a big problem as those options never really worked. Some alternative fixes were considered, like changing the path that is exposed to rkt container to be /mnt/node-local-storage or /mnt/node-hdd-local-storage, according to what option was used (setup_raid, setup_raid_hdd, and exposing both if both are set), but that was messy without any good reason (it is better to tackle #73 before doing something that ofuscated). So, we decided for this, more simpler, approach. This patch is just a minimal fix, "a revert" to mount in `/mnt/` again, to make this work again on master. As a side effect of this PR, too, another issue to reconsider if we need so many `setup_raid_*` flags was created (#74) and to even consider a totally different approach than the current bash script before it gets out of control: #75 [1]: https://github.com/kinvolk/lokomotive-kubernetes/blob/d59d071a451f45ac61c2524b94a146a6cde60401/packet/flatcar-linux/kubernetes/workers/cl/worker.yaml.tmpl#L65-L66
The second approach, i.e. the one that if possible I'd like to avoid, that exposes and mounts all the paths to the rkt container is implemented here: https://github.com/kinvolk/lokomotive-kubernetes/tree/rata/node-local-storage-inside-mnt. In particular, here: ba80265 |
The rkt pod is exposed with /mnt/ volume[1] and in this PR: https://github.com/kinvolk/lokomotive-kubernetes/pull/48/files#diff-b6c7caf796cd86bdcdd936319b1793a1R152 the location where the RAID 0 array is mounted was changed from `/mnt` to `/mnt/<some-dir>`. This breaks using local volumes in pods, as it seems the mount is not visible for the rkt container or the kubelet running in rkt. The investigation of the root cause of this issue, to understand why mounts inside mnt (like in `/mnt/node-local-storage`) can't be used by pods currently and what needs to be changed for that, is left as a for future patches and issue #73 was created to investigate this. This patch just makes the mount location back to `/mnt` for all the cases (`setup_raid`, `setup_raid_*`) so it works on all cases. However, this imposes one limitation: `setup_raid_hdd` and `setup_raid_ssd` are mutually exclusive now. This limitation is not limiting something that was working in master, in fact `setup_raid_hdd` and `setup_raid_ssd` (when `setup_raid_ssd_fs` is set) were completely broken since they were merged and pods never wrote to those volumes, for the very same reason issue #73 states: mounts insde `/mnt` are not visible to pods. Therefore, this patches fixes node local storage while making those options exclusive (only one can be set), and this is not a big problem as those options never really worked. Some alternative fixes were considered, like changing the path that is exposed to rkt container to be /mnt/node-local-storage or /mnt/node-hdd-local-storage, according to what option was used (setup_raid, setup_raid_hdd, and exposing both if both are set), but that was messy without any good reason (it is better to tackle #73 before doing something that ofuscated). So, we decided for this, more simpler, approach. This patch is just a minimal fix, "a revert" to mount in `/mnt/` again, to make this work again on master. As a side effect of this PR, too, another issue to reconsider if we need so many `setup_raid_*` flags was created (#74) and to even consider a totally different approach than the current bash script before it gets out of control: #75 [1]: https://github.com/kinvolk/lokomotive-kubernetes/blob/d59d071a451f45ac61c2524b94a146a6cde60401/packet/flatcar-linux/kubernetes/workers/cl/worker.yaml.tmpl#L65-L66
From https://coreos.com/rkt/docs/latest/running-fly-stage1.html
Therefore, the behaviour is indeed deliberate and we should double check docs or consider the patch I posted in the previous comment (#73 (comment)) |
Node local storage was working when it was mounted to
/mnt
. However, we later changed that to support mounting hdd and ssd disks on/mnt/node-local-sdd-storage
and/mnt/node-local-hdd-storage
.However, when doing so (#72) the mounts were not exposed to the pod and the pod, therefore, was just using the installation OS. This PR that fixed the issue has more details: #72
The quick fix was to mount on
/mnt
again, but we should investigate to make sure we understand why mounts under that (eg. in/mnt/node-local-storage
) are not exposed to the pods and can't use them.The goal is to either understand why mounts don't propagate and is not possible to change, or do the needed changes for the mounts to be propagated.
However, if we find out mount's can't be propagated, we can consider a potentially very nasty approach (only if we really have to. We should see #74 before doing this too) that is the following:
We can consider expose in the rkt container, like now is being down with
/mnt/
,/mnt/node-local-storage
,/mnt/node-ssd-local-storage/
,/mnt/node-hdd-local-storage
, etc.However, we can only use as volumes for rkt directories that do exist[1], so we should either select in the template based on the different
setup_*
flags (this is very messy), or we should always create all directories and rkt just exposes all (this is more elegant, as that can all be part of the systemd unit as with the rest of theExecStartPre=/bin/mkdir ..
we are already doing).[1]: I tried replacing /mnt by /mnt/node-local-storage in the kubelet systemd unit, while that directory didn't exist, and the systemd unit failed to start with:
Sep 27 11:03:25 rata-test-backend-worker-0 kubelet-wrapper[5459]: run: stat of host path /mnt/node-local-storage: stat /mnt/node-local-storage: no such file or directory
The text was updated successfully, but these errors were encountered: