Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zdtm/static/mnt_root_ext FAIL at CRIU restore #1941

Closed
avagin opened this issue Jul 23, 2022 · 7 comments · Fixed by #1945
Closed

zdtm/static/mnt_root_ext FAIL at CRIU restore #1941

avagin opened this issue Jul 23, 2022 · 7 comments · Fixed by #1945
Assignees

Comments

@avagin
Copy link
Member

avagin commented Jul 23, 2022

2022-07-21T15:16:35.5432953Z ===================== Run zdtm/static/mnt_root_ext in uns ======================
2022-07-21T15:16:35.5433190Z Start test
2022-07-21T15:16:35.5433364Z Test is SUID
2022-07-21T15:16:35.5433746Z ./mnt_root_ext --pidfile=mnt_root_ext.pid --outfile=mnt_root_ext.out --dirname=mnt_root_ext.test
2022-07-21T15:16:35.5434145Z Running zdtm/static/mnt_root_ext.hook(--post-start)
2022-07-21T15:16:35.5434489Z Running zdtm/static/mnt_root_ext.hook(--pre-dump)
2022-07-21T15:16:35.5434719Z Run criu dump
2022-07-21T15:16:35.5435013Z Running zdtm/static/mnt_root_ext.hook(--pre-restore)
2022-07-21T15:16:35.5435243Z Run criu restore
2022-07-21T15:16:35.5435486Z =[log]=> dump/zdtm/static/mnt_root_ext/154/1/restore.log
2022-07-21T15:16:35.5435848Z ------------------------ grep Error ------------------------
2022-07-21T15:16:35.5436320Z b'(00.051031)      1: mnt-v2: Create plain mountpoint /tmp/.criu.mntns.HONjGc/mnt-0000000994 for 994'
2022-07-21T15:16:35.5436744Z b'(00.051039)      1: mnt-v2: \tMounting unsupported @994 (0)'
2022-07-21T15:16:35.5437076Z b'(00.051046)      1: uns: calling mount_root (8, 0)'
2022-07-21T15:16:35.5437412Z b'(00.051080) uns: daemon calls 0x5616bb179060 (180, 10, 0)'
2022-07-21T15:16:35.5437813Z b'(00.051148) Error (criu/mount.c:2241): mnt: Unable to mount /proc/self/fd/10: Invalid argument'
2022-07-21T15:16:35.5438311Z b'(00.051176)      1: Error (criu/mount-v2.c:457): mnt-v2: Unable to mount /tmp/.criu.mntns.HONjGc/mnt-0000000994'
2022-07-21T15:16:35.5438694Z b'(00.051697) uns: calling exit_usernsd (-1, 1)'
2022-07-21T15:16:35.5439033Z b'(00.051730) uns: daemon calls 0x5616bb182a40 (178, -1, 1)'
2022-07-21T15:16:35.5439338Z b'(00.051738) uns: `- daemon exits w/ 0'
2022-07-21T15:16:35.5439606Z b'(00.052284) uns: daemon stopped'
2022-07-21T15:16:35.5439937Z b'(00.052292) Error (criu/cr-restore.c:2545): Restoring FAILED.'
2022-07-21T15:16:35.5440344Z ------------------------ ERROR OVER ------------------------
2022-07-21T15:16:35.5440658Z ############## Test zdtm/static/mnt_root_ext FAIL at CRIU restore ##############

https://github.com/checkpoint-restore/criu/runs/7452327660?check_suite_focus=true

@avagin
Copy link
Member Author

avagin commented Jul 23, 2022

2022-07-21T15:14:33.8797192Z Linux 363034b0c4e9 5.15.0-1014-azure #17~20.04.1-Ubuntu SMP Thu Jun 23 20:01:51 UTC 2022 x86_64 GNU/Linux

@Snorch
Copy link
Member

Snorch commented Jul 25, 2022

Error is from do_mount_root_v2, where we do mount+MS_BIND. One idea I have is that as we don't have (intentionally as we don't want unexpected mounts to appear inside container) MS_REC, bindmount may fail due to children mounts over container root source on host.

I will try to reproduce it to check if it's the case.

@minhbq-99
Copy link
Member

I can reproduce it on my machine using alpine docker on Ubuntu 20.04. Your idea seems correct, the error is because we don't have MS_REC and there is locked children mount
https://github.com/torvalds/linux/blob/master/fs/namespace.c#L2402

When I only run uns test, the error disappears. I don't have much understanding about mounting, I think maybe we use MNT_DETACH to lazy umount (in test/umount2.c) and when bind mount happens, the mountpoint is not lazily umounted yet.

@Snorch
Copy link
Member

Snorch commented Jul 26, 2022

Strange, but the test mnt_root_ext passes for me on fresh Ubuntu 20.04 install (vm) running like make -C scripts/ci alpine GCC=1.

Snorch added a commit to Snorch/criu that referenced this issue Jul 26, 2022
If root mount in criu mntns is slave, it would be slave of host mount
where criu is stored, so if someone mounts something in subdir of
{criu-dir}/test/ on host while tests are running this mount can
influence the test as it appears on top of root mount in criu mntns.

1) With mount-compat this mount can get into restored test mntns, which
means wrong restore, as this mount was not there on dump.
2) With mount-v2 this mount would just fail container restore.

Fixes: checkpoint-restore#1941

Signed-off-by: Pavel Tikhomirov <[email protected]>
Snorch added a commit to Snorch/criu that referenced this issue Jul 26, 2022
If root mount in criu mntns is slave, it would be slave of host mount
where criu is stored, so if someone mounts something in subdir of
{criu-dir}/test/ on host while tests are running this mount can
influence the test as it appears on top of root mount in criu mntns.

1) With mount-compat this mount can get into restored test mntns, which
means wrong restore, as this mount was not there on dump.
2) With mount-v2 this mount would just fail container restore, as root
container mount is mounted non-recursively to protect from unexpected
mounts appear after restore.

Fixes: checkpoint-restore#1941

Signed-off-by: Pavel Tikhomirov <[email protected]>
@Snorch
Copy link
Member

Snorch commented Jul 26, 2022

That's how I manage to "reproduce" it on Fedora:

# term 1
./test/zdtm.py run -t zdtm/static/mnt_root_ext -f uns --ignore-taint --sbs
# term 2
mount --bind /home/snorch/devel/ms/criu/test/debug_child_mount /home/snorch/devel/ms/criu/test/debug_child_mount
# term 1
"Press Enter to continue"
...
b'(00.070940) Error (criu/mount.c:2241): mnt: Unable to mount /proc/self/fd/10: Invalid argument'
b'(00.071044)      1: Error (criu/mount-v2.c:457): mnt-v2: Unable to mount /tmp/.criu.mntns.27ZfYn/mnt-0000001810'

It requires someone to create "self" bind-mount in subdir of /home/snorch/devel/ms/criu/test/ at the time of test is running.

I'd prefer to fix it by just making /tmp/criu-root-*/root in zdtm_ct to be private #1945.

Not sure if it is the original problem, if it is - not sure why someone creates mounts in our directories (other test?)...

@minhbq-99
Copy link
Member

I mean when I run without specific flavor, zdtm run ns test first, uns later, I get the error. It's not 100% but the chance is high. When I run only uns test, I can't reproduce. So I guess lazy umount may be the cause.

Snorch added a commit to Snorch/criu that referenced this issue Jul 27, 2022
This test specifically wants to create external bind-mount of "/" from
criu mntns to test mntns, and it wants "/" in criu mntns to be a shared
mount so that "external" mount in the test mntns is it's slave. This is
to triger specific dirname() resolution which happens only when sharing
restore is involved for external mounts, and only if rootfs is involved.

But initially I missed that when we create external mount in test's
temporary mntns it creates a propagation in criu mntns on top of root
mount. This mount may influence other tests restore as child mount in
root mount converts to locked child mount in criu service mntns (for uns
flavour) and when criu would restore root container mount it would fail
with EINVAL on non recursive bind with locked children.

To fix this mess we just need to prohibit propagating from tests
temporary mntns to criu mntns by making mounts slave.

Fixes: checkpoint-restore#1941

Signed-off-by: Pavel Tikhomirov <[email protected]>
@Snorch
Copy link
Member

Snorch commented Jul 27, 2022

@minhbq-99 I managed to reproduce, exactly what you were talking about: manual alpine-criu docker container on ubuntu-22.04, thanks!

And graph-trace of the reproduce is:

__x64_sys_mount() {
...
  path_mount() {
...
    __do_loopback() {
      is_subdir() {
        rcu_read_unlock_strict();
      } /* is_subdir */
    } /* __do_loopback */
    unlock_mount() {
    }
...
  } /* path_mount */
      /* path_mount__return: (__x64_sys_mount+0x108/0x140 <- path_mount) arg1=0xffffffea */
...
} /* __x64_sys_mount */

Meaning that we fail here, you were right:

__do_loopback()
{
        has_locked_children()
        {
                struct mount *child;

                list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
                        if (!is_subdir(child->mnt_mountpoint, dentry))
                                continue;

                        if (child->mnt.mnt_flags & MNT_LOCKED)
        >                        return true;
                }
                return false;
        }
}

Let's see on mountinfo at the time of failure:

root@ptikh-ubuntu-22:~# cat /proc/1116735/mountinfo | grep "/tmp/criu-root"
3227 3212 253:2 /rootfs/criu/test /tmp/criu-root-yd72becd/root rw,relatime master:787 - ext4 /dev/mapper/docker-8:3-2372371-ad7eee0466d6b141cf50764a051045827e17e073735898724dbda108ad23f05b rw,stripe=16
3228 3227 253:2 /rootfs/.zdtm_root_ext.tmp /tmp/criu-root-yd71becd/root/mnt_root_ext.test rw,relatime master:787 - ext4 /dev/mapper/docker-8:3-2372371-ad7eee0466d6b141cf50764a051045827e17e073735898724dbda108ad23f05b rw,stripe=16

There is actually a child mount, which we know becomes locked on restore from graph-trace... The problem is in mnt_root_ext test as this test creates this mnt_root_ext.test mount via propagation by accident. I've updated PR #1945 with a new fix (old fix fixes other possible problem but I believe we still need it).

avagin pushed a commit that referenced this issue Aug 4, 2022
This test specifically wants to create external bind-mount of "/" from
criu mntns to test mntns, and it wants "/" in criu mntns to be a shared
mount so that "external" mount in the test mntns is it's slave. This is
to triger specific dirname() resolution which happens only when sharing
restore is involved for external mounts, and only if rootfs is involved.

But initially I missed that when we create external mount in test's
temporary mntns it creates a propagation in criu mntns on top of root
mount. This mount may influence other tests restore as child mount in
root mount converts to locked child mount in criu service mntns (for uns
flavour) and when criu would restore root container mount it would fail
with EINVAL on non recursive bind with locked children.

To fix this mess we just need to prohibit propagating from tests
temporary mntns to criu mntns by making mounts slave.

Fixes: #1941

Signed-off-by: Pavel Tikhomirov <[email protected]>
avagin pushed a commit to avagin/criu that referenced this issue Mar 13, 2023
This test specifically wants to create external bind-mount of "/" from
criu mntns to test mntns, and it wants "/" in criu mntns to be a shared
mount so that "external" mount in the test mntns is it's slave. This is
to triger specific dirname() resolution which happens only when sharing
restore is involved for external mounts, and only if rootfs is involved.

But initially I missed that when we create external mount in test's
temporary mntns it creates a propagation in criu mntns on top of root
mount. This mount may influence other tests restore as child mount in
root mount converts to locked child mount in criu service mntns (for uns
flavour) and when criu would restore root container mount it would fail
with EINVAL on non recursive bind with locked children.

To fix this mess we just need to prohibit propagating from tests
temporary mntns to criu mntns by making mounts slave.

Fixes: checkpoint-restore#1941

Signed-off-by: Pavel Tikhomirov <[email protected]>
avagin pushed a commit that referenced this issue Apr 16, 2023
This test specifically wants to create external bind-mount of "/" from
criu mntns to test mntns, and it wants "/" in criu mntns to be a shared
mount so that "external" mount in the test mntns is it's slave. This is
to triger specific dirname() resolution which happens only when sharing
restore is involved for external mounts, and only if rootfs is involved.

But initially I missed that when we create external mount in test's
temporary mntns it creates a propagation in criu mntns on top of root
mount. This mount may influence other tests restore as child mount in
root mount converts to locked child mount in criu service mntns (for uns
flavour) and when criu would restore root container mount it would fail
with EINVAL on non recursive bind with locked children.

To fix this mess we just need to prohibit propagating from tests
temporary mntns to criu mntns by making mounts slave.

Fixes: #1941

Signed-off-by: Pavel Tikhomirov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants