-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow CRIU to be used as non-root with CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN #1155
Conversation
test/zdtm.py
Outdated
@@ -2604,6 +2620,9 @@ def clean_stuff(opts): | |||
help="Use splice or read mode of pre-dumping", | |||
choices=['splice', 'read'], | |||
default='splice') | |||
rp.add_argument("--non-root", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have the --user option. How is this one different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at all. I was not aware of --user
. I have updated this PR to use --user
instead of --non-root
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename --user to --non-root? user could mean anything. (criU, user namespaces). Not at all obvious of what it would do. --unprivileged is pretty excplicit too. I'd argue that Adrian might have seen the option if the naming of the option was unambiguous. Our users might miss it too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the history behind --user
? I see now that is has been used in one of the jenkins jobs which I am not familiar with.
Is --user
already testing exactly the same I wanted to test with this PR? Running CRIU with CAP_CHECKPOINT_RESTORE/CAP_SYS_ADMIN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the history behind --user?
It was Pavel's attempt to implement criu dump that can't be executed by non-root users to dump their processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I suggest --rootless
? It is an established term and the meaning is very clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first patch series was using --non-root
, now I am currently using --user
. --rootless
is indeed a good idea, but I do not really care which name it is.
Should we re-use --user
or give it a new name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of the --user option is to allow non-root users dumping processes even when criu doesn't have any capabilities. We want to save this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed my changes to --rootless
.
val = getenv("ZDTM_GROUPS"); | ||
if (val) { | ||
char *tok = NULL; | ||
unsigned int size = 0, groups[NGROUPS_MAX]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: NGROUPS_MAX
is 65536
Isn't it will be better to allocate memory from heap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just me moving code behind an if
if running as non-root. I have not changed the actual code. If this should be changed it should happen in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just me moving code behind an
if
if running as non-root. I have not changed the actual code. If this should be changed it should happen in another PR.
no problem. It's just comment.
…cm/linux/kernel/git/brauner/linux Pull checkpoint-restore updates from Christian Brauner: "This enables unprivileged checkpoint/restore of processes. Given that this work has been going on for quite some time the first sentence in this summary is hopefully more exciting than the actual final code changes required. Unprivileged checkpoint/restore has seen a frequent increase in interest over the last two years and has thus been one of the main topics for the combined containers & checkpoint/restore microconference since at least 2018 (cf. [1]). Here are just the three most frequent use-cases that were brought forward: - The JVM developers are integrating checkpoint/restore into a Java VM to significantly decrease the startup time. - In high-performance computing environment a resource manager will typically be distributing jobs where users are always running as non-root. Long-running and "large" processes with significant startup times are supposed to be checkpointed and restored with CRIU. - Container migration as a non-root user. In all of these scenarios it is either desirable or required to run without CAP_SYS_ADMIN. The userspace implementation of checkpoint/restore CRIU already has the pull request for supporting unprivileged checkpoint/restore up (cf. [2]). To enable unprivileged checkpoint/restore a new dedicated capability CAP_CHECKPOINT_RESTORE is introduced. This solution has last been discussed in 2019 in a talk by Google at Linux Plumbers (cf. [1] "Update on Task Migration at Google Using CRIU") with Adrian and Nicolas providing the implementation now over the last months. In essence, this allows the CRIU binary to be installed with the CAP_CHECKPOINT_RESTORE vfs capability set thereby enabling unprivileged users to restore processes. To make this possible the following permissions are altered: - Selecting a specific PID via clone3() set_tid relaxed from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE. - Selecting a specific PID via /proc/sys/kernel/ns_last_pid relaxed from userns CAP_SYS_ADMIN to CAP_CHECKPOINT_RESTORE. - Accessing /proc/pid/map_files relaxed from init userns CAP_SYS_ADMIN to init userns CAP_CHECKPOINT_RESTORE. - Changing /proc/self/exe from userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE. Of these four changes the /proc/self/exe change deserves a few words because the reasoning behind even restricting /proc/self/exe changes in the first place is just full of historical quirks and tracking this down was a questionable version of fun that I'd like to spare others. In short, it is trivial to change /proc/self/exe as an unprivileged user, i.e. without userns CAP_SYS_ADMIN right now. Either via ptrace() or by simply intercepting the elf loader in userspace during exec. Nicolas was nice enough to even provide a POC for the latter (cf. [3]) to illustrate this fact. The original patchset which introduced PR_SET_MM_MAP had no permissions around changing the exe link. They too argued that it is trivial to spoof the exe link already which is true. The argument brought up against this was that the Tomoyo LSM uses the exe link in tomoyo_manager() to detect whether the calling process is a policy manager. This caused changing the exe links to be guarded by userns CAP_SYS_ADMIN. All in all this rather seems like a "better guard it with something rather than nothing" argument which imho doesn't qualify as a great security policy. Again, because spoofing the exe link is possible for the calling process so even if this were security relevant it was broken back then and would be broken today. So technically, dropping all permissions around changing the exe link would probably be possible and would send a clearer message to any userspace that relies on /proc/self/exe for security reasons that they should stop doing this but for now we're only relaxing the exe link permissions from userns CAP_SYS_ADMIN to userns CAP_CHECKPOINT_RESTORE. There's a final uapi change in here. Changing the exe link used to accidently return EINVAL when the caller lacked the necessary permissions instead of the more correct EPERM. This pr contains a commit fixing this. I assume that userspace won't notice or care and if they do I will revert this commit. But since we are changing the permissions anyway it seems like a good opportunity to try this fix. With these changes merged unprivileged checkpoint/restore will be possible and has already been tested by various users" [1] LPC 2018 1. "Task Migration at Google Using CRIU" https://www.youtube.com/watch?v=yI_1cuhoDgA&t=12095 2. "Securely Migrating Untrusted Workloads with CRIU" https://www.youtube.com/watch?v=yI_1cuhoDgA&t=14400 LPC 2019 1. "CRIU and the PID dance" https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=2m48s 2. "Update on Task Migration at Google Using CRIU" https://www.youtube.com/watch?v=LN2CUgp8deo&list=PLVsQ_xZBEyN30ZA3Pc9MZMFzdjwyz26dO&index=9&t=1h2m8s [2] checkpoint-restore/criu#1155 [3] https://github.com/nviennot/run_as_exe * tag 'cap-checkpoint-restore-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests: add clone3() CAP_CHECKPOINT_RESTORE test prctl: exe link permission error changed from -EINVAL to -EPERM prctl: Allow local CAP_CHECKPOINT_RESTORE to change /proc/self/exe proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid pid: use checkpoint_restore_ns_capable() for set_tid capabilities: Introduce CAP_CHECKPOINT_RESTORE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
We need solid testing without CAP_SYS_ADMIN, where we run with just with CAP_CHECKPOINT_RESTORE and CAP_PTRACE, as advertised. Otherwise, we can't develop properly. This may require us to use a custom kernel (could be compiled and put as a binary blob in some repo we own) that we run with qemu.
-
CLI options: should we require users to manually disable features (like network), or automatically disable features when we see we don't have the right capability. Another way to look at this is to be tolerant as much as we can to EPERM errors.
-
CRIU should not look at uid. Rather, it should look at its effective capabilities (and perhaps whether it is running in a user namespace, as it makes all the
capable(CAP_*)
tests fail in the kernel, which matters for things likeSO_SNDBUFFORCE
). -
At some point, we should motivate the docker/kubernetes people to add CAP_CHECKPOINT_RESTORE in the set of bounded capabilities (so users don't have to add it manually).
I can help
criu/cr-check.c
Outdated
* CAP_CHECKPOINT_RESTORE and CAP_PTRACE. | ||
*/ | ||
|
||
fd = open(criu, O_RDONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to check for the caps of the CRIU binary. Even if the binary has some setcap magic bits on it, we might not have the effective capabilities we want, because the bounding capabilities may be restricted. (For example, in kubernetes, one must specify the capabilities that can be activated within the container in the yaml file, regardless of the setcap attrs of the binaries in the image)
We should look at /proc/self/status
and see what effective capabilities we have. Using the capget()
syscall is another option. Depending if we have CAP_SYS_ADMIN
, CAP_CHECKPOINT_RESTORE
, or CAP_NET_ADMIN
, we may want to do a variety of things.
Parsing /proc/self/status should be fairly easy, as there is already one parse_pid_status(), which parses the capabilities.
Replacing opts.uid
with opts.caps
would be helpful: instead of doing tests like if (opts.uid)
, we could do tests like if (opts.caps & CAP_NET_ADMIN)
XXX We might want to test if we are in a user namespace, and not in the root namespace. If that's the case, then a bunch of tests won't work though. If we are in a user namespace, all the kernel checks that are if (capable(CAP_*))
will return false. I'm not sure how to test if we are in a user_ns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
criu already should have a helper to test whether or not it is running a user namespace. If not, what should work is sm like:
inline static bool in_userns(void)
{
__do_fclose FILE *f = NULL;
uid_t user, host, count;
int ret;
/* Now: are we in a user namespace? Because then we're also
* unprivileged.
*/
f = fopen("/proc/self/uid_map", "re");
if (!f)
return false;
ret = fscanf(f, "%u %u %u", &user, &host, &count);
if (ret != 3)
return false;
return user != 0 || host != 0 || count != UINT32_MAX;
}
criu/cr-check.c
Outdated
out_close: | ||
close(fd); | ||
if (exit_code) { | ||
pr_msg(" CRIU needs to be run as root or needs to have the CAP_SYS_ADMIN or\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One call instead of three, and pr_msg -> pr_error. Also I wouldn't go with the two space of indent.
so:
pr_error(
"CRIU...\n"
"2nd line...\n"
"3rd line...\n");
* capabilities available. Now we have to make the process dumpable | ||
* so that /proc/self is not owned by root. | ||
*/ | ||
if (pr_set_dumpable(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With capabilities checks, we could either do set_dumpable() in all cases. If we wanted to put it behind an if, we could check if we have CAP_DAC_OVERRIDE
criu/fdstore.c
Outdated
@@ -46,8 +49,16 @@ int fdstore_init(void) | |||
return -1; | |||
} | |||
|
|||
if (setsockopt(sk, SOL_SOCKET, SO_SNDBUFFORCE, &buf[0], sizeof(buf[0])) < 0 || | |||
setsockopt(sk, SOL_SOCKET, SO_RCVBUFFORCE, &buf[1], sizeof(buf[1])) < 0) { | |||
if (opts.uid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could check CAP_NET_ADMIN
for this one but I'm not sure how this work in practice. The kernel checks if we have CAP_NET_ADMIN
in the root namespace. This test can also succeed if we are not in a user namespace. I'm not sure how to test that we are in a user name space
Maybe it's better to try to use SO_SNDBUFFORCE
, and if we can't due to -EPERM
, we fallback to SO_SNDBUF
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we need to check CAP_NET_ADMIN here. uid can be 0 but setsockopt(SO_RCVBUFFORCE) will fail without CAP_NET_ADMIN.
criu/files.c
Outdated
@@ -1323,6 +1323,8 @@ int prepare_fds(struct pstree_item *me) | |||
|
|||
static int fchroot(int fd) | |||
{ | |||
if (opts.uid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe test CAP_SYS_CHROOT
(here the kernel tests CAP_SYS_CHROOT
in the current user namespace).
@@ -187,7 +187,7 @@ static int lsm_set_label(char *label, char *type, int procfd) | |||
} | |||
|
|||
static int restore_creds(struct thread_creds_args *args, int procfd, | |||
int lsm_type) | |||
int lsm_type, uid_t uid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of things going in there that can fail if we have limited creds: . sys_setresuid() and friends, sys_prctl(PR_CAPBSET_DROP, etc.) / sys_capset(). SELinux things.
I think we need testing without CAP_SYS_ADMIN, and just CAP_CHECKPOINT_RESTORE. Can we boot a kernel that has the new cap on travis ci via qemu? Otherwise, I'm afraid we are not going to be effective in writing this non-root functionality.
Thanks to @brauner the CAP_CHECKPOINT_RESTORE changes have been merged and should be available with RC1 (torvalds/linux@74858ab). As we are already using Vagrant to test the latest Fedora release we can easily add another Vagrant based test and install the latest RC1 kernel via https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories which provides all RC releases: https://repos.fedorapeople.org/repos/thl/kernel-vanilla-mainline/fedora-32/x86_64/
Any reason to not automatically disable features if we already detect that it will not work?
Sounds good. Like you mentioned above looking at
Thanks. In your review you mentioned a lot of different capabilities we should test and this sounds like a good approach. I would start with reading |
Sometimes, failing can be preferable:
So I'd argue that we want a deterministic behavior given a set of CLI options. I see two approaches: b) is a superset of a) and would be easy to use for users. Most importantly, the behavior of CRIU is deterministic given a list of CLI options. |
criu/cr-check.c
Outdated
* CAP_CHECKPOINT_RESTORE and CAP_PTRACE. | ||
*/ | ||
|
||
fd = open(criu, O_RDONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
criu already should have a helper to test whether or not it is running a user namespace. If not, what should work is sm like:
inline static bool in_userns(void)
{
__do_fclose FILE *f = NULL;
uid_t user, host, count;
int ret;
/* Now: are we in a user namespace? Because then we're also
* unprivileged.
*/
f = fopen("/proc/self/uid_map", "re");
if (!f)
return false;
ret = fscanf(f, "%u %u %u", &user, &host, &count);
if (ret != 3)
return false;
return user != 0 || host != 0 || count != UINT32_MAX;
}
I created a vagrant based test using a 5.8.0 kernel for now. Once the Fedora vanilla kernel mainline repository picks up 5.9.0-rc1 the test should work. Right now I added the test more as a proof of concept. To see how it could look like. https://travis-ci.org/github/checkpoint-restore/criu/jobs/715943265 |
criu/kerndat.c
Outdated
@@ -829,6 +829,9 @@ static int kerndat_try_load_cache(void) | |||
{ | |||
int fd, ret; | |||
|
|||
if (opts.uid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a comment why we can't load cache in this case.
a5eda57
to
4cd15dd
Compare
Test is now running on Fedora with 5.9.0-rc0 kernel and CAP_CHECKPOINT_RESTORE |
4cd15dd
to
6a6f93e
Compare
I rebased this and pushed a new version in which I tried to switch to using /proc/pid/status. |
6a6f93e
to
e791f7f
Compare
Pushed again to fix a CI error with usernsd. |
@nviennot Any comments from your side to the current patches? |
@@ -1334,6 +1334,9 @@ void rlimit_unlimit_nofile(void) | |||
{ | |||
struct rlimit new; | |||
|
|||
if (opts.uid) | |||
return; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will cause pre-dump
to fail for non-root users with the following error for processes with larger (e.g., 1GB) memory state.
Error (criu/page-pipe.c:120): page-pipe: Can't make pipe for page-pipe: Too many open files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrianreber you marked it as resolved, but for me it is unclear how it was resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's about pre-dump which is out of scope of this PR.
@adrianreber running
|
@rst0git What does happen if you run pre-dump as non-root before my patches? I am aware that my patches are only touching a small part of what needs to be done to have the complete code base ready for CAP_CHECKPOINT_RESTORE. But it at least provides a working checkpoint and restore implementation as non-root. I would prefer an incremental approach on the |
@adrianreber I can confirm that the same error occurs before applying the patches in this PR and I agree with you that incremental approach on the |
So, here's the enhanced version of the first try. Changes are: 1. The wrapper name is criu-ns instead of crns.py 2. The CLI is absolutely the same as for criu, since the script re-execl-s criu binary. E.g. scripts/criu-ns dump -t 1234 ... just works 3. Caller doesn't need to care about substituting CLI options, instead, the scripts analyzes the command line and a) replaces -t|--tree argument with virtual pid __if__ the target task lives in another pidns b) keeps the current cwd (and root) __if__ switches to another mntns. A limitation applies here -- cwd path should be the same in target ns, no "smart path mapping" is performed. So this script is for now only useful for mntns clones (which is our main goal at the moment). Signed-off-by: Pavel Emelyanov <[email protected]> Looks-good-to: Andrey Vagin <[email protected]>
This adds the function check_caps() which checks if CRIU is running with at least CAP_CHECKPOINT_RESTORE. That is the minimum capability CRIU needs to do a minimal checkpoint and restore from it. In addition helper functions are added to easily query for other capability for enhanced checkpoint/restore support. Signed-off-by: Adrian Reber <[email protected]>
This commit enables checkpointing and restoring of applications as non-root. First goal was to enable checkpoint and restore of the env00 and pthread00 test case. This uses the information from opts.uid and opts.cap_eff to skip certain code paths which do not work as non-root. The kerndat file is saved in $HOME/.criu.kdat. As $HOME is usually not on a tmpfs there is now a check for nodename and release to see if the system has been rebooted since last writing the kerndat cache file and mark it as stale. Signed-off-by: Adrian Reber <[email protected]>
This adds the non-root section and information about the parameter --unprivileged to the man page. Signed-off-by: Adrian Reber <[email protected]>
This are the minimal changes to make zdtm.py successfully run the env00 and pthread test case as non-root using the '--rootless' zdtm option. Signed-off-by: Adrian Reber <[email protected]>
Run env00 and pthread00 test as non-root as initial proof of concept. Signed-off-by: Adrian Reber <[email protected]>
Rebased to solve a merge conflict. So what is still missing to merge this? |
@@ -732,6 +732,9 @@ int dump_task_cgroup(struct pstree_item *item, u32 *cg_id, struct parasite_dump_ | |||
unsigned int n_ctls = 0; | |||
struct cg_set *cs; | |||
|
|||
if (opts.uid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be "if (opts.unprivileged)" ?
@@ -989,6 +992,9 @@ int dump_cgroups(void) | |||
CgroupEntry cg = CGROUP_ENTRY__INIT; | |||
int ret = -1; | |||
|
|||
if (opts.uid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opts.unprivileged and we need to print a warning that cgroups are not dumped.
@@ -211,7 +211,8 @@ int prepare_inventory(InventoryEntry *he) | |||
if (get_task_ids(&crt.i)) | |||
return -1; | |||
|
|||
he->has_root_cg_set = true; | |||
if (!opts.uid) | |||
he->has_root_cg_set = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very very suspicious. We set has_root_cg_set only if uid is zero, but we always call dump_task_cgroup. If here is any reasons for this, we need to write a comment here.
When I suggested to add the --unprivileged options, I mean that we will check opts.unprivileged instead of opts.uid.
* allow to write to KDAT_RUNDIR which usually is only writable by root. | ||
* Let's write .criu.kdat file to the home directory for non-root cases. | ||
*/ | ||
if (opts.uid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opts.unprivileged. opts.uid doesn't guarantee that criu has been executed with a full set of capabilities.
@@ -847,24 +880,48 @@ static int kerndat_try_load_cache(void) | |||
|
|||
close(fd); | |||
|
|||
/* | |||
* For the non-root use case we use uname(2) to decide | |||
* if a reboot happened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can uname help up to detect a reboot? Do you mean a kernel update?
Hello @avagin, I think it would make more sense if you take over this PR. You have a much clearer understanding and vision about what needs to be done than myself. Just take what already exist and make the necessary changes. You can close this PR or modify it. Whatever works best for you. |
A friendly reminder that this PR had no activity for 30 days. |
The changes in this PR have been used to enable rootless checkpoint/restore support in OpenLiberty/open-liberty#20683 |
With the almost finished discussion about CAP_CHECKPOINT_RESTORE I reworked my test code to be ready for review.
This contains the changes to run CRIU as non-root with either CAP_SYS_ADMIN or CAP_CHECKPOINT_RESTORE. As Travis does not yet support CAP_CHECKPOINT_RESTORE I am using CAP_SYS_ADMIN during Travis testing, but locally it works just the same with CAP_CHECKPOINT_RESTORE.
This PR checks if the euid is non-zero and if it is non-zero it checks if CRIU has either CAP_SYS_ADMIN or CAP_CHECKPOINT_RESTORE. If that is true CRIU remembers the euid and decides at different places in the code to not do certain things which do not work as non-root.
This is a minimal start of non-root CRIU and it can be used with zdtm to test with env00. As seen in the patch.
This is based on @nviennot and my test branches for the lkml CAP_CHECKPOINT_RESTORE discussions.