Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

glibc-2.34-38.fc35 breaks criu: SEGV in restore #1935

Closed
edsantiago opened this issue Jul 7, 2022 · 29 comments
Closed

glibc-2.34-38.fc35 breaks criu: SEGV in restore #1935

edsantiago opened this issue Jul 7, 2022 · 29 comments
Assignees

Comments

@edsantiago
Copy link

edsantiago commented Jul 7, 2022

Latest Fedora 35:

# bin/podman run -d quay.io/libpod/testimage:20220615 top
685009ff5e33a06ffa1209ece8f5c229ce10537f2cdfc6b48ce522e4dc209904
# bin/podman container checkpoint 685
685009ff5e33a06ffa1209ece8f5c229ce10537f2cdfc6b48ce522e4dc209904
# bin/podman container restore 685
Error: OCI runtime error: crun: CRIU restoring failed -52.  Please check CRIU logfile /var/lib/containers/storage/overlay-containers/685009ff5e33a06ffa1209ece8f5c229ce10537f2cdfc6b48ce522e4dc209904/userdata/restore.log

criu-restore.log. Last few lines:

(00.030324)      1: Thread    0 stack  0x1b080 rt_sigframe  0x23080
(00.030354)      1: Going to chroot into /proc/self/fd/8
(00.030360)      1: Restoring umask to 22
(00.030379)      1: task_args: 0x25000
task_args->pid: 1
task_args->nr_threads: 1
task_args->clone_restore_fn: 0x11e70
task_args->thread_args: 0x25580(00.030385) pie: 1: Switched to the restorer 1
(00.030595) pie: 1: Mapping native vDSO at 0x2d000
(00.030622) pie: 1: vdso: Using gettimeofday() on vdso at 0x31790
(00.030634) pie: 1:     mmap(0x55bf5b61f000 -> 0x55bf5b62b000, 0x3 0x12 3)
(00.030641) pie: 1:     mmap(0x55bf5b62b000 -> 0x55bf5b6c6000, 0x7 0x12 3)
(00.030645) pie: 1:     mmap(0x55bf5b6c6000 -> 0x55bf5b6e8000, 0x3 0x12 3)
(00.030648) pie: 1:     mmap(0x55bf5b6e8000 -> 0x55bf5b6ec000, 0x3 0x12 3)
(00.030652) pie: 1:     mmap(0x55bf5b6ec000 -> 0x55bf5b6ed000, 0x3 0x12 3)
(00.030688) Error (criu/cr-restore.c:1510): 33055 stopped by signal 11: Segmentation fault
(00.030823) mnt: Switching to new ns to clean ghosts
(00.030958) Error (criu/cr-restore.c:2536): Restoring FAILED.

criu-3.17.1-1.fc35.x86_64, podman v4.0.0-rc2-1599-g700f1faf6 (built from source, because podman 4.0 isn't packaged for f35), kernel 5.18.5-100.fc35. [UPDATE: crun-1.4.5-1.fc35.x86_64]

Ubuntu is showing similar crashes in our (podman) CI, but I have no actual access to Ubuntu systems.

@edsantiago
Copy link
Author

dnf downgraded to criu-3.16-2.fc35 and crun-1.1-1.fc35; same symptom.

@edsantiago
Copy link
Author

Same with 5.18.9-100.fc35.x86_64

@adrianreber
Copy link
Member

@mihalicyn can this be rseq() related? If I remember it correctly Fedora 35 glibc doesn't turn on rseq().

@edsantiago
Copy link
Author

edsantiago commented Jul 7, 2022

Oooh, good call! The culprit is glibc-2.34-38.fc35.x86_64. With 2.34-7.fc35, checkpoint/restore work fine.

[UPDATE: podman uses f35 as part of its regular CI, including with checkpoint tests. Checkpointing on f35 has been working fine until yesterday, and it looks like the glibc update is the cause.]

[UPDATE 2: I dnf-upgraded criu and crun back to 3.17.1-1 and 1.4.5-1 respectively, just to be sure, and yes, they work fine with the old glibc.]

@adrianreber
Copy link
Member

@fweimer-rh any ideas what might have happened with the glibc update that it breaks criu?

@edsantiago edsantiago changed the title 3.17.1-1 broken in fedora 35: SEGV in restore glibc-2.34-38.fc35 breaks criu: SEGV in restore Jul 7, 2022
@mihalicyn
Copy link
Member

mihalicyn commented Jul 7, 2022

I believe that problem comes from this change:

2022-06-08 - Florian Weimer <[email protected]> - 2.34-37
- Enable rseq by default and add GLIBC_2.35 rseq symbols (#2085529)

right now I'm trying to find the git for fedora glibc and understand what's happening.
I believe that issue connected with the rseq un-registration.

I'm sure that there is some issue with RSEQ_SIG compile-time constant. It's possibly absent on Fedora 35.
It's really important because we define prep_libc_rseq_info depending on the presence of this constant.

https://src.fedoraproject.org/rpms/glibc/blob/f35/f/glibc-rh2085529-1.patch#_153

@mihalicyn
Copy link
Member

@edsantiago Ed, couldn't you show me the contents of /usr/include/bits/rseq.h file from your system?

@edsantiago
Copy link
Author

rseq.h.f35.txt

/usr/include/linux, not /bits, but there it is

@mihalicyn
Copy link
Member

mihalicyn commented Jul 7, 2022

Yep, that's the reason why it fails. Normally, system should have several rseq headers:

$ find /usr/include -type f -name 'rseq.h'
/usr/include/sys/rseq.h
/usr/include/linux/rseq.h
/usr/include/bits/rseq.h
$ rpm -qf /usr/include/bits/rseq.h
glibc-headers-x86-2.35-5.fc36.noarch

It's the results from my Fedora 36.

@edsantiago
Copy link
Author

Followup: There is no /usr/include/bits/rseq.h in glibc-headers-x86-2.34-7.fc35.noarch

@mihalicyn mihalicyn self-assigned this Jul 7, 2022
@edsantiago
Copy link
Author

...and, upgrading back to glibc-2.34-38.fc35, there is a bits/rseq.h with:

#  grep SIG /usr/include/bits/rseq.h
/* RSEQ_SIG is a signature required before each abort handler code.
   RSEQ_SIG is used with the following reserved undefined instructions, which
#define RSEQ_SIG        0x53053053

HTH

@mihalicyn
Copy link
Member

mihalicyn commented Jul 7, 2022

And another strange thing. Let's take a look on this patch:
https://src.fedoraproject.org/rpms/glibc/c/13adf97e7bf6e7796f584a52c9b6449efcc12697?branch=f35

* Fri Jan 14 2022 Florian Weimer <[email protected]> - 2.34-19
- Optionally accelerate sched_getcpu using rseq (#2024347)

it brings /bits/rseq.h files as it required.

This patch is present in fc35 branch:
https://src.fedoraproject.org/rpms/glibc/blob/f35/f/glibc-rh2024347-7.patch

Upd: not actual. So, okay, with the new glibc you have desired header file.

@mihalicyn
Copy link
Member

@edsantiago just to be sure, have you tried to recompile the CRIU itself or you just using binary packages of CRIU?

@mihalicyn
Copy link
Member

mihalicyn commented Jul 7, 2022

Hehe, I've got it.

Last criu build for Fedora 35 is criu-3.17.1 logs:
https://kojipkgs.fedoraproject.org//packages/criu/3.17.1/1.fc35/data/logs/x86_64/root.log

DEBUG util.py:445:   glibc-devel                x86_64    2.34-35.fc35               build     38 k
DEBUG util.py:445:   glibc-headers-x86          noarch    2.34-35.fc35               build    435 k

but the change with rseq comes with:

2022-06-08 - Florian Weimer <[email protected]> - 2.34-37
- Enable rseq by default and add GLIBC_2.35 rseq symbols (#2085529)

It means that we just need to rebuild the CRIU binary package for Fedora 35 against fresh glibc (at least 2.34-37). I believe that Radostin (@rst0git) help us with that.

@edsantiago
Copy link
Author

I have not recompiled criu. All I've been doing is using dnf to downgrade then upgrade back.

  • with glibc -38, restore fails
  • with glibc -7, restore works

Just for grins I saved bits/rseq.h, downgraded to glibc -7 (which works), put that saved bits/rseq.h into /usr/include, and restore still works. (Which, to me, implies that the problem is in glibc itself, not in the presence/absence of bits/rseq.h)

@mihalicyn
Copy link
Member

I have not recompiled criu. All I've been doing is using dnf to downgrade then upgrade back.

* with glibc `-38`, restore fails

* with glibc `-7`, restore works

Just for grins I saved bits/rseq.h, downgraded to glibc -7 (which works), put that saved bits/rseq.h into /usr/include, and restore still works. (Which, to me, implies that the problem is in glibc itself, not in the presence/absence of bits/rseq.h)

presence/absence of this header affects only on the CRIU build. If you working with the CRIU binary it doesn't matter which headers you have in your system. I'm sorry that I've confused you. I was sure (by default) that you build CRIU from the source for some reason :)

@fweimer-rh
Copy link
Contributor

We enabled rseq by default in Fedora 35. Sorry, I assumed CRIU had already been fixed there, and that criu-3.17.1-1.fc35 includes rseq support. Based on the upstream version, it should, and I think the upstream approach does not actually depend on glibc rseq headers to enable rseq support.

Is this issue similar to the previous rseq-related issues?

@adrianreber
Copy link
Member

We enabled rseq by default in Fedora 35. Sorry, I assumed CRIU had already been fixed there, and that criu-3.17.1-1.fc35 includes rseq support. Based on the upstream version, it should, and I think the upstream approach does not actually depend on glibc rseq headers to enable rseq support.

Is this issue similar to the previous rseq-related issues?

Could it be that this fails because CRIU was compiled against a different version of glibc and that the rseq definition changed? I think we saw similar errors in F36 and a recompilation of CRIU was enough to fix it.

@mihalicyn
Copy link
Member

Hello, Florian!

Based on the upstream version, it should, and I think the upstream approach does not actually depend on glibc rseq headers to enable rseq support.

rseq support is not seriously depend on the glibc headers, but we have special handling for the case when CRIU itself runs with the fresh glibc:
f70ddab

What we trying to do there, is to unregister the restartable sequence provided by Glibc during clone/inherited from the CRIU root task.

Most important place here is:

#if defined(__GLIBC__) && defined(RSEQ_SIG)
static void prep_libc_rseq_info(struct rst_rseq_param *rseq)
{
	if (!kdat.has_rseq) {
		rseq->rseq_abi_pointer = 0;
		return;
	}

So, we are trying to determine if Glibc supports rseq by checking RSEQ_SIG presence which is provided by bits/rseq.h, which is provided by sys/rseq.h.

As far as I can see from Fedora glibc sources is that starting from the version 2.34-19 we have rseq in Glibc, but sys/rseq.h header is not public here. So, during the CRIU compilation against such glibc-headers we got defined(RSEQ_SIG) as false.
Starting from the version 2.34-37 we have sys/rseq.h as a public header and all should work perfectly if we recompile CRIU.

@mihalicyn
Copy link
Member

mihalicyn commented Jul 8, 2022

Hi, Adrian!

Could it be that this fails because CRIU was compiled against a different version of glibc and that the rseq definition changed?

previous issue which I can remember was connected with breaking ABI change in Glibc when __rseq_offset symbol size was changed.

Ref:
https://sourceware.org/pipermail/libc-alpha/2022-February/136024.html

@fweimer-rh
Copy link
Contributor

@mihalicyn Yeah, that's it. I recall discussions about getting this data using ptrace instead of (g)libc interfaces, but that's still a TODO:

        /*
         * TODO: handle built-in rseq on other libc'ies like musl
         * We can do that using get_rseq_conf kernel feature.
         *
         * For now we just assume that other libc libraries are
         * not registering rseq by default.
         */

The way we made the change in Fedora 35, a simple rebuild should be enough to activate rseq restore support in CRIU. The issue is slightly different from the late ABI change, we narrowly avoided that in the downstream backport.

@adrianreber I think we also need a rebuild downstream.

@adrianreber
Copy link
Member

I started a rebuild of CRIU on all Fedora branches.

@mihalicyn
Copy link
Member

@fweimer-rh yep, I'll implement this. But anyway, calling ptrace() for each process on restore is not an optimal thing. It's a way of "last hope" :)

@fweimer-rh
Copy link
Contributor

@fweimer-rh yep, I'll implement this.

Thanks (it wasn't meant as a criticism).

But anyway, calling ptrace() for each process on restore is not an optimal thing. It's a way of "last hope" :)

If there is no execve between the call prep_libc_rseq_info and the actual use of that data, it cannot possibly change. So you could fork once, peek at that subprocess using ptrace, and use that data for all subprocesses. (If there's an execve between, would you even be guaranteed to end up with the same libc?)

@mihalicyn
Copy link
Member

mihalicyn commented Jul 8, 2022

If there is no execve between the call prep_libc_rseq_info and the actual use of that data, it cannot possibly change. So you could fork once, peek at that subprocess using ptrace, and use that data for all subprocesses. (If there's an execve between, would you even be guaranteed to end up with the same libc?)

awesome idea! You mean that if we have a particular fixed Glibc version then __rseq_offset and thread_pointer() will be constant.

Speaking about exec* syscalls. We are not using this syscalls during the users processes restoration, because we use a restorer blob execution and it calls sys_prctl(PR_SET_MM, PR_SET_MM_MAP, ...) to restore /proc/<pid>/exe link and fill process struct mm_struct with the corresponding data:

prctl_map = (struct prctl_mm_map){

@fweimer-rh
Copy link
Contributor

If there is no execve between the call prep_libc_rseq_info and the actual use of that data, it cannot possibly change. So you could fork once, peek at that subprocess using ptrace, and use that data for all subprocesses. (If there's an execve between, would you even be guaranteed to end up with the same libc?)

awesome idea! You mean that if we have a particular fixed Glibc version then __rseq_offset and thread_pointer() will be constant.

More precisely, the value will not change during the life-time of the the process (for __rseq_offset) or thread (for thread_pointer()), and fork does not change them, either. In the future, it is possible that the value of __rseq_offset will depend on glibc and kernel version, but the value will still remain unchanged for the entire life-time of the process.

@mihalicyn
Copy link
Member

If there is no execve between the call prep_libc_rseq_info and the actual use of that data, it cannot possibly change. So you could fork once, peek at that subprocess using ptrace, and use that data for all subprocesses. (If there's an execve between, would you even be guaranteed to end up with the same libc?)

awesome idea! You mean that if we have a particular fixed Glibc version then __rseq_offset and thread_pointer() will be constant.

More precisely, the value will not change during the life-time of the the process (for __rseq_offset) or thread (for thread_pointer()), and fork does not change them, either. In the future, it is possible that the value of __rseq_offset will depend on glibc and kernel version, but the value will still remain unchanged for the entire life-time of the process.

Thanks a lot, Florian. I'll pick this in near future and implement universal approach to this problem basing on your idea. ;)

@edsantiago
Copy link
Author

The rebuilt criu is now in f35 stable. Thank you everyone for your diagnosis and fix.

mihalicyn added a commit to mihalicyn/criu that referenced this issue Jul 20, 2022
Before this patch we assumed that CRIU is compiled against
the same GLibc as it runs with. But as we see from real
world examples like checkpoint-restore#1935 it's not always true.

The idea of this patch is to detect rseq configuration
for the main CRIU process and use it to unregister
rseq for all further child processes. It's correct,
because we restore pstree using clone*() syscalls,
don't use exec*() (!) syscalls, so rseq gets inherited
in the kernel and rseq configuration remains the same
for all children processes.

This will prevent issues like this:
checkpoint-restore#1935

Suggested-by: Florian Weimer <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
@mihalicyn
Copy link
Member

I've prepared PR with fixed which should prevent problems like this in the future.
#1937

avagin pushed a commit that referenced this issue Aug 8, 2022
Before this patch we assumed that CRIU is compiled against
the same GLibc as it runs with. But as we see from real
world examples like #1935 it's not always true.

The idea of this patch is to detect rseq configuration
for the main CRIU process and use it to unregister
rseq for all further child processes. It's correct,
because we restore pstree using clone*() syscalls,
don't use exec*() (!) syscalls, so rseq gets inherited
in the kernel and rseq configuration remains the same
for all children processes.

This will prevent issues like this:
#1935

Suggested-by: Florian Weimer <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
avagin pushed a commit to avagin/criu that referenced this issue Mar 13, 2023
Before this patch we assumed that CRIU is compiled against
the same GLibc as it runs with. But as we see from real
world examples like checkpoint-restore#1935 it's not always true.

The idea of this patch is to detect rseq configuration
for the main CRIU process and use it to unregister
rseq for all further child processes. It's correct,
because we restore pstree using clone*() syscalls,
don't use exec*() (!) syscalls, so rseq gets inherited
in the kernel and rseq configuration remains the same
for all children processes.

This will prevent issues like this:
checkpoint-restore#1935

Suggested-by: Florian Weimer <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
avagin pushed a commit that referenced this issue Apr 16, 2023
Before this patch we assumed that CRIU is compiled against
the same GLibc as it runs with. But as we see from real
world examples like #1935 it's not always true.

The idea of this patch is to detect rseq configuration
for the main CRIU process and use it to unregister
rseq for all further child processes. It's correct,
because we restore pstree using clone*() syscalls,
don't use exec*() (!) syscalls, so rseq gets inherited
in the kernel and rseq configuration remains the same
for all children processes.

This will prevent issues like this:
#1935

Suggested-by: Florian Weimer <[email protected]>
Signed-off-by: Alexander Mikhalitsyn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants