glibc-2.34-38.fc35 breaks criu: SEGV in restore #1935
Comments
|
Same with 5.18.9-100.fc35.x86_64 |
@mihalicyn can this be |
Oooh, good call! The culprit is glibc-2.34-38.fc35.x86_64. With 2.34-7.fc35, checkpoint/restore work fine. [UPDATE: podman uses f35 as part of its regular CI, including with checkpoint tests. Checkpointing on f35 has been working fine until yesterday, and it looks like the glibc update is the cause.] [UPDATE 2: I dnf-upgraded criu and crun back to 3.17.1-1 and 1.4.5-1 respectively, just to be sure, and yes, they work fine with the old glibc.] |
@fweimer-rh any ideas what might have happened with the glibc update that it breaks criu? |
I believe that problem comes from this change:
Right now I'm trying to find the git repository for Fedora's glibc to understand what's happening. I'm sure that there is some issue with https://src.fedoraproject.org/rpms/glibc/blob/f35/f/glibc-rh2085529-1.patch#_153 |
@edsantiago Ed, could you show me the contents of |
/usr/include/linux, not /bits, but there it is |
Yep, that's the reason why it fails. Normally, a system should have several rseq headers:
These are the results from my Fedora 36. |
Followup: There is no |
...and, upgrading back to
# grep SIG /usr/include/bits/rseq.h
/* RSEQ_SIG is a signature required before each abort handler code.
RSEQ_SIG is used with the following reserved undefined instructions, which
#define RSEQ_SIG 0x53053053
HTH |
And another strange thing. Let's take a look at this patch:
It brings in the /bits/rseq.h files as required. This patch is present in the fc35 branch: Upd: no longer relevant. So, okay, with the new glibc you have the desired header file. |
@edsantiago just to be sure, have you tried recompiling CRIU itself, or are you just using the binary packages of CRIU? |
Hehe, I've got it. The last criu build for Fedora 35 is
but the change with rseq comes with:
It means that we just need to rebuild the CRIU binary package for Fedora 35 against a fresh glibc (at least 2.34-37). I believe that Radostin (@rst0git) can help us with that. |
I have not recompiled criu. All I've been doing is using
Just for grins I saved |
The presence or absence of this header affects only the CRIU build. If you are working with the CRIU binary, it doesn't matter which headers you have on your system. I'm sorry that I confused you. I assumed (by default) that you were building CRIU from source for some reason :) |
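To illustrate why the installed headers matter only at build time: a check made with the preprocessor is decided once, when the binary is compiled, so a package built before glibc shipped its rseq headers keeps the "no rseq" answer even after glibc is upgraded underneath it. The following is a minimal sketch in C under stated assumptions: it probes `<sys/rseq.h>` and `RSEQ_SIG`, and the names `BUILT_WITH_GLIBC_RSEQ` and `built_with_glibc_rseq` are hypothetical. It is not necessarily the check that CRIU performs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Build-time probe (illustrative assumption, not CRIU's actual check).
 * The header is pulled in only if it exists on the build machine.
 */
#if defined(__has_include)
#if __has_include(<sys/rseq.h>)
#include <sys/rseq.h>
#endif
#endif

/*
 * RSEQ_SIG is the macro shown in the grep output in this thread.
 * Whether it is visible here is fixed when this file is compiled.
 */
#ifdef RSEQ_SIG
#define BUILT_WITH_GLIBC_RSEQ 1
#else
#define BUILT_WITH_GLIBC_RSEQ 0
#endif

/* The preprocessor result must be one of the two values set above. */
static_assert(BUILT_WITH_GLIBC_RSEQ == 0 || BUILT_WITH_GLIBC_RSEQ == 1,
	      "build-time probe must be a boolean constant");

/* Reports what the build machine's headers offered, not what the
 * glibc loaded at run time actually does. */
static bool built_with_glibc_rseq(void)
{
	return BUILT_WITH_GLIBC_RSEQ != 0;
}
```

A binary compiled this way answers the same before and after a glibc upgrade, which is consistent with a plain rebuild being enough to change its behaviour.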
We enabled rseq by default in Fedora 35. Sorry, I assumed CRIU had already been fixed there, and that Is this issue similar to the previous rseq-related issues? |
Could it be that this fails because CRIU was compiled against a different version of glibc and that the rseq definition changed? I think we saw similar errors in F36 and a recompilation of CRIU was enough to fix it. |
Hello, Florian!
rseq support does not depend heavily on the glibc headers, but we have special handling for the case when CRIU itself runs with a fresh glibc: What we are trying to do there is unregister the restartable sequence provided by glibc during clone or inherited from the CRIU root task. The most important place here is:
So, we are trying to determine if Glibc supports rseq by checking As far as I can see from Fedora glibc sources is that starting from the version |
Hi, Adrian!
The previous issue that I can remember was connected with a breaking ABI change in glibc when Ref: |
@mihalicyn Yeah, that's it. I recall discussions about getting this data using
The way we made the change in Fedora 35, a simple rebuild should be enough to activate rseq restore support in CRIU. The issue is slightly different from the late ABI change; we narrowly avoided that in the downstream backport. @adrianreber I think we also need a rebuild downstream. |
I started a rebuild of CRIU on all Fedora branches. |
@fweimer-rh yep, I'll implement this. But anyway, calling |
Thanks (it wasn't meant as a criticism).
If there is no |
awesome idea! You mean that if we have a particular fixed Glibc version then Speaking about Line 1688 in f70ddab
|
More precisely, the value will not change during the lifetime of the process (for |
Thanks a lot, Florian. I'll pick this up in the near future and implement a universal approach to this problem based on your idea. ;) |
The rebuilt criu is now in f35 stable. Thank you everyone for your diagnosis and fix. |
Before this patch we assumed that CRIU is compiled against the same GLibc as it runs with. But as we see from real world examples like checkpoint-restore#1935 it's not always true. The idea of this patch is to detect rseq configuration for the main CRIU process and use it to unregister rseq for all further child processes. It's correct, because we restore pstree using clone*() syscalls, don't use exec*() (!) syscalls, so rseq gets inherited in the kernel and rseq configuration remains the same for all children processes. This will prevent issues like this: checkpoint-restore#1935 Suggested-by: Florian Weimer <[email protected]> Signed-off-by: Alexander Mikhalitsyn <[email protected]>
I've prepared a PR with a fix that should prevent problems like this in the future. |
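The inheritance argument in the commit message (rseq state carries over to children created without exec) can be checked with a small run-time probe. The sketch below is an assumption for illustration and not the mechanism used by the PR: it asks the kernel to register a scratch rseq area, which the kernel refuses when the thread already has one registered, and then repeats the probe in a forked child. All names (`probe_rseq_state`, `probe_rseq_state_in_child`, `scratch_rseq`, the `PROBE_*` macros) are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signature value quoted from bits/rseq.h earlier in this thread. */
#define PROBE_RSEQ_SIG 0x53053053U
/* Kernel UAPI value of RSEQ_FLAG_UNREGISTER, repeated here so the sketch
 * does not depend on which rseq headers are installed. */
#define PROBE_RSEQ_FLAG_UNREGISTER 1

enum rseq_state {
	RSEQ_STATE_UNKNOWN = 0, /* syscall missing, filtered, or odd errno */
	RSEQ_STATE_UNREGISTERED,
	RSEQ_STATE_REGISTERED,
};

/* Scratch area with the original 32-byte size and alignment of struct rseq. */
struct scratch_rseq {
	uint8_t bytes[32];
} __attribute__((aligned(32)));

static_assert(sizeof(struct scratch_rseq) == 32, "scratch area must be 32 bytes");

/* Probe the calling thread without leaving any registration behind. */
static enum rseq_state probe_rseq_state(void)
{
#ifdef __NR_rseq
	struct scratch_rseq scratch = { { 0 } };

	if (syscall(__NR_rseq, &scratch, sizeof(scratch), 0, PROBE_RSEQ_SIG) == 0) {
		/* Nothing was registered before; undo our registration at once,
		 * because the scratch area lives on this stack frame. */
		if (syscall(__NR_rseq, &scratch, sizeof(scratch),
			    PROBE_RSEQ_FLAG_UNREGISTER, PROBE_RSEQ_SIG) != 0)
			return RSEQ_STATE_UNKNOWN;
		return RSEQ_STATE_UNREGISTERED;
	}
	/* A thread that already has an area gets its second one rejected. */
	if (errno == EINVAL || errno == EBUSY)
		return RSEQ_STATE_REGISTERED;
#endif
	return RSEQ_STATE_UNKNOWN;
}

/* Run the same probe in a fork()ed child (no exec in between). */
static enum rseq_state probe_rseq_state_in_child(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return RSEQ_STATE_UNKNOWN;
	if (pid == 0)
		_exit(probe_rseq_state());
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return RSEQ_STATE_UNKNOWN;
	return (enum rseq_state)WEXITSTATUS(status);
}
```

On a kernel with rseq support, a process started under an rseq-enabled glibc would be expected to report `RSEQ_STATE_REGISTERED` from both functions, which matches the reasoning that one detection in the main process can stand in for its children.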
Latest Fedora 35:
criu-restore.log. Last few lines:
criu-3.17.1-1.fc35.x86_64, podman v4.0.0-rc2-1599-g700f1faf6 (built from source, because podman 4.0 isn't packaged for f35), kernel 5.18.5-100.fc35. [UPDATE: crun-1.4.5-1.fc35.x86_64]
Ubuntu is showing similar crashes in our (podman) CI, but I have no actual access to Ubuntu systems.