Allow CRIU to support restoring into an existing PID namespace #1056
Nice work!
Sorry about the late review, sitting under lockdown affects my productivity a bit :-/
syntax = "proto2";

message pidns_entry {
optional string ext_key = 1;
Should it be `required`?
An empty file without entries looks a bit strange, but I don't insist.
I was also thinking about this. Not sure. I was looking at the external network namespace support implementation and that uses `optional`. With PID namespace support it is, however, the only entry in the protobuf definition. Having the only field as `optional` is kind of strange.
So, yes, `required` would probably be a good idea right now, as the file will only exist if `ext_key` is set. If we ever get another entry here, then it would be better for it to be `optional`.
Interestingly, I just read that the `proto3` format has dropped `required` and `optional` completely.
I will change it to `required` if we think that this is better. I am unsure myself what to do with it.
Thanks for the reviews everyone. I will update this PR. @avagin Can you create a new magic for me?
I was able to address most things which came up during the review:
If it should be called I am also unsure if the protobuf field should be I also need someone to create a magic value which fits the existing scheme.
003e532 to 82b2687
Tests are happy right now, but sometimes I see errors like this (centos):
or on alpine
Not sure why this happens. Is this somehow racy? But it is during
Maybe the difference is in how you run your root process; a setsid wrapper might help.
Could you give some more details on what you mean? I am already using setsid and I also tried it with a setsid before the
It means that your root task with pid [1216, 7] (1216 in the criu pidns and 7 in its own pidns) has sid [smth, 0], which means that the session leader is from a lower pid namespace, likely from the criu one or even lower. Your task [1216, 7] should do setsid before exec and have sid [1216, 7], so that there will be no problem anymore. You are obviously using setsid somehow wrong, as your root task is not a session leader.
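One way to check the condition described above is to compare a task's PID with its session ID as reported by procfs. The following is only a sketch: the helper names `session_from_stat` and `is_session_leader` are made up for illustration, and it assumes a Linux `/proc/<pid>/stat` layout in which the session ID is the sixth field. The values are those seen from the PID namespace of the procfs mount being read, so a session ID of 0 would match the "session leader is from a lower pid namespace" case.

```shell
# Hypothetical helper: print the session ID found in one /proc/<pid>/stat line.
session_from_stat() {
    local line=$1 rest sid
    # Field 2 (comm) may contain spaces or parentheses, so drop everything
    # up to and including the last ") ".
    rest=${line##*) }
    # rest now starts at field 3: state ppid pgrp session ...
    read -r _ _ _ sid _ <<< "$rest"
    echo "$sid"
}

# Hypothetical helper: succeed if the given PID is its own session leader.
is_session_leader() {
    local pid=$1 sid
    sid=$(session_from_stat "$(cat "/proc/$pid/stat")") || return 1
    [ "$sid" = "$pid" ]
}
```

Running `is_session_leader` on the root task from inside its own PID namespace should succeed if the task called setsid before exec.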
TODO: create correct magic Signed-off-by: Adrian Reber <[email protected]>
This loads and stores the key for an external PID namespace if specified by the user using:

    --external pid[<inode>]:<label>

Preparation for restoring into existing PID namespaces.

Signed-off-by: Adrian Reber <[email protected]>
This allows CRIU to restore a process into an existing PID namespace.

During checkpointing the PID namespace can be marked as external using:

    --external pid[<inode>]:<label>

This can be the host PID namespace or any other PID namespace.

During restore a process can be restored into an existing PID namespace using:

    --inherit-fd fd[<FD>]:<label>

The <label> has to be the same for checkpoint and restore. CRIU uses the <label> to know which resource the user means. A process can start in the host PID namespace and can be moved to another namespace or the other way around. Any PID namespace can be used.

This is necessary to checkpoint containers in a POD which share certain namespaces and this code to support external PID namespaces is the first step towards checkpointing and restoring containers which belong to a POD.

This is not using the --join-ns functionality as it is not possible to move existing process to a PID namespace using setns(). Only child process will be created in the PID namespace and that is why setns() needs to be called before clone().

Signed-off-by: Adrian Reber <[email protected]>
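The two options named in the commit message can be put together as in the sketch below. The helper function names, the label `test_ns`, the file descriptor number 33 and the variables `$pid` and `$target` are illustrative assumptions; only the `--external pid[<inode>]:<label>` and `--inherit-fd fd[<FD>]:<label>` forms come from the commit message. The actual `criu` invocations are left as comments because they need root and a real process to dump.

```shell
# Hypothetical helper: build the dump-side option for an external PID namespace.
external_pidns_arg() {
    local inode=$1 label=$2
    printf '%s pid[%s]:%s\n' --external "$inode" "$label"
}

# Hypothetical helper: build the restore-side option for an inherited namespace fd.
inherit_fd_arg() {
    local fd=$1 label=$2
    printf '%s fd[%s]:%s\n' --inherit-fd "$fd" "$label"
}

# Possible usage (assumes root, criu in PATH, $pid and $target set by the caller):
#   ino=$(ls -iL /proc/$pid/ns/pid | awk '{ print $1 }')
#   criu dump -t "$pid" -D images $(external_pidns_arg "$ino" test_ns)
#   exec 33< /proc/$target/ns/pid
#   criu restore -D images $(inherit_fd_arg 33 test_ns)
```

The label is the only thing that ties the dump-side inode to the restore-side file descriptor, which is why it has to be identical in both calls.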
Signed-off-by: Adrian Reber <[email protected]>
Adapt netns_ext tests to also work with pid namespaces and move it from test/others/netns_ext/ to test/others/ns_ext/. Also enable ns_ext tests in Travis runs. Signed-off-by: Adrian Reber <[email protected]>
Are there any further comments regarding
Another ping. The only open point so far is
ino2=$(ls -iL $MNT2 | awk '{ print $1 }')
exec 33< $MNT1
exec 34< $MNT2
$CRIU dump -v4 -t $pid -o dump.log -D images --external $NS[$ino]:test_ns --external $NS[$ino2]:test_ns2
In the case of pidns, will ino and ino2 be equal to each other?
And I think you don't dump pid2, do you? It would be nice to have some comments in this script.
I hope you are aware that you wrote that script initially before I extended it to also test restoring into an existing PID namespace 😉
Very often, I write fewer comments than I have to;).
elif [[ "$NS" == "pid" ]]; then
RND1=$RANDOM
RND2=$RANDOM
setsid unshare -p -f setsid bash -c "setsid sh _run.sh pidfile2 $RND2 & . _run.sh pidfile $RND1" < /dev/zero &> output &
It would be nice to have a comment which explains these three setsid-s
I don't know. Only with a `setsid` at all three places does it succeed on Ubuntu, CentOS and Alpine. I do not understand this part. It seems that especially the Alpine busybox behaves differently.
I pushed the wrong branch to the pidns PR checkpoint-restore#1056 which resulted in the wrong patches getting merged. This is the actual result from the review. Signed-off-by: Adrian Reber <[email protected]>
Trying to checkpoint one container of a pod in CRI-O fails with
because, as far as I understand it, pods can share namespaces. It seems the PID namespace is controlled by the initial container in that pod, which runs a `pause` process.
To enable checkpointing containers out of a pod, this PR enables CRIU to restore a process out of one PID namespace and into another. I can restore a process from the host PID namespace into another PID namespace, and out of a PID namespace into the host PID namespace. All directions are working.
Restoring into an existing PID namespace comes with the possibility of PID collisions.
Using `clone3()` the whole thing is pretty simple: `setns(); clone3();`
Not using `clone3()` makes it much more complicated, as it is necessary to write to `ns_last_pid` of the destination PID namespace, and thus a helper is required: `setns()`, `fork()`, `open('ns_last_pid')`, `write()`, `close()`, `waitpid()`.
This seems to work so far, all tests are happy. After implementing it I am not sure it makes sense to checkpoint single containers out of a pod or if it would make more sense to checkpoint the complete pod. Not sure.
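The `ns_last_pid` step mentioned above can be illustrated in shell. This is only a sketch of the idea, not the code from this PR: the function name `request_next_pid` is made up, and it relies on the kernel handing out `ns_last_pid + 1` as the next PID when that PID is free. The default path is the usual sysctl location; the optional second argument exists only so the function can be tried against an ordinary file.

```shell
# Hypothetical helper: ask the kernel to hand out PID $1 next by writing
# $1 - 1 to ns_last_pid. The write has to come from a process that is
# already inside the destination PID namespace and needs sufficient privileges.
request_next_pid() {
    local want=$1 path=${2:-/proc/sys/kernel/ns_last_pid}
    # Reject anything that is not a positive integer.
    [ "$want" -ge 1 ] 2>/dev/null || return 1
    echo $((want - 1)) > "$path"
}

# Possible usage inside the destination PID namespace (as root):
#   request_next_pid 1234 && some_command &
```

Nothing in this sketch guards against another process forking between the write and the following fork, and the requested PID may already be taken, which is the collision risk mentioned in the description.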