[Docker] shows connected TCP socket after adding tcp established in conf #589

Closed
harishanand95 opened this issue Jan 1, 2019 · 21 comments

@harishanand95

harishanand95 commented Jan 1, 2019

I am getting the following error from docker checkpoint:

(00.685912) Error (criu/sk-inet.c:188): inet: Connected TCP socket, consider using --tcp-established option.
(00.685936) Error (criu/cr-dump.c:1347): Dump files (pid: 11246) failed with -1
(00.698151) Error (criu/cr-dump.c:1732): Dumping FAILED.

Full log:
https://termbin.com/dx4c

The CRIU config file (CRIU version 3.11) contains the following:
cat /etc/criu/default.conf
tcp-established

sudo lsof -p 11246

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
rosmaster 11246 root cwd DIR 0,180 4096 2316176 /root/.ros
rosmaster 11246 root rtd DIR 0,180 4096 1039113 /
rosmaster 11246 root txt REG 0,180 3492656 2052389 /usr/bin/python2.7
rosmaster 11246 root mem REG 8,1 2052389 /usr/bin/python2.7 (path inode=58649)
rosmaster 11246 root mem REG 8,1 1796641 /lib/x86_64-linux-gnu/libnss_files-2.23.so (path inode=2016)
rosmaster 11246 root mem REG 8,1 2053990 /usr/lib/python2.7/lib-dynload/_multiprocessing.x86_64-linux-gnu.so (path inode=58864)
rosmaster 11246 root mem REG 8,1 2056623 /usr/lib/x86_64-linux-gnu/libyaml-0.so.2.0.4 (path inode=25996)
rosmaster 11246 root mem REG 8,1 2824414 /usr/lib/python2.7/dist-packages/_yaml.so (stat: No such file or directory)
rosmaster 11246 root mem REG 8,1 2053984 /usr/lib/python2.7/lib-dynload/_elementtree.x86_64-linux-gnu.so (path inode=58870)
rosmaster 11246 root mem REG 8,1 2052260 /lib/x86_64-linux-gnu/libexpat.so.1.6.0 (path inode=1995)
rosmaster 11246 root mem REG 8,1 2054005 /usr/lib/python2.7/lib-dynload/pyexpat.x86_64-linux-gnu.so (path inode=58875)
rosmaster 11246 root mem REG 8,1 2053985 /usr/lib/python2.7/lib-dynload/_hashlib.x86_64-linux-gnu.so (path inode=58863)
rosmaster 11246 root mem REG 8,1 2308289 /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (path inode=84513)
rosmaster 11246 root mem REG 8,1 2308292 /lib/x86_64-linux-gnu/libssl.so.1.0.0 (path inode=84385)
rosmaster 11246 root mem REG 8,1 2053992 /usr/lib/python2.7/lib-dynload/_ssl.x86_64-linux-gnu.so (path inode=58883)
rosmaster 11246 root mem REG 8,1 1796624 /lib/x86_64-linux-gnu/libm-2.23.so (path inode=1999)
rosmaster 11246 root mem REG 8,1 1796691 /lib/x86_64-linux-gnu/libz.so.1.2.8 (path inode=2139)
rosmaster 11246 root mem REG 8,1 1796686 /lib/x86_64-linux-gnu/libutil-2.23.so (path inode=2010)
rosmaster 11246 root mem REG 8,1 1796605 /lib/x86_64-linux-gnu/libdl-2.23.so (path inode=2005)
rosmaster 11246 root mem REG 8,1 1796592 /lib/x86_64-linux-gnu/libc-2.23.so (path inode=2003)
rosmaster 11246 root mem REG 8,1 1796660 /lib/x86_64-linux-gnu/libpthread-2.23.so (path inode=2002)
rosmaster 11246 root mem REG 8,1 1796572 /lib/x86_64-linux-gnu/ld-2.23.so (path inode=2001)
rosmaster 11246 root 0r CHR 1,3 0t0 6 /dev/null
rosmaster 11246 root 1w CHR 1,3 0t0 6 /dev/null
rosmaster 11246 root 2w CHR 1,3 0t0 6 /dev/null
rosmaster 11246 root 3w REG 0,180 116751 1039253 /root/.ros/log/fcfe0390-0d7d-11e9-9a33-0242ac1c0005/master.log
rosmaster 11246 root 4u sock 0,9 0t0 1616381 protocol: TCP
rosmaster 11246 root 5u sock 0,9 0t0 1639058 protocol: TCP
rosmaster 11246 root 6u sock 0,9 0t0 1646342 protocol: TCP
rosmaster 11246 root 7u sock 0,9 0t0 1636693 protocol: TCP
rosmaster 11246 root 8u sock 0,9 0t0 1641593 protocol: TCP
rosmaster 11246 root 15r CHR 1,9 0t0 11 /dev/urandom

Where should I set tcp-established when using Docker?

From what I have read, I think I should pass the inodes of the TCP sockets to CRIU. How do I pass them through docker/criu?

@rst0git
Member

rst0git commented Jan 1, 2019

Adrian's PR (opencontainers/runc#1933), which adds support for CRIU's configuration file, has not been merged into runc yet. Could you try installing runc from his fork?

go get github.com/adrianreber/runc/
cd $GOPATH/src/github.com/adrianreber/runc
make
sudo make install PREFIX=/usr

@harishanand95
Author

Ok, thanks

@rst0git
Member

rst0git commented Jan 1, 2019

Then you should be able to add the tcp-established option with:

# echo tcp-established > /etc/criu/runc.conf

Note that options set in /etc/criu/default.conf will be overridden by runc, and options in /etc/criu/runc.conf will override those set by runc (e.g. runc sets tcp-established to false by default).
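
The precedence described above can be pictured as three layers applied in order, with later layers winning. The following is only a minimal sketch of that idea in Python, assuming a plain key/value model; the function name and the sample values are hypothetical and are not part of CRIU or runc.

```python
def effective_options(default_conf, runc_options, runc_conf):
    """Merge three option layers; a later layer overrides an earlier one.

    Order assumed from the comment above:
    /etc/criu/default.conf < options set by runc < /etc/criu/runc.conf
    """
    merged = {}
    for layer in (default_conf, runc_options, runc_conf):
        merged.update(layer)
    return merged


# Hypothetical values, for illustration only.
default_conf = {"tcp-established": True}   # from /etc/criu/default.conf
runc_options = {"tcp-established": False}  # what runc sets by default
runc_conf = {"tcp-established": True}      # from /etc/criu/runc.conf

# Only default.conf is present: runc's value wins.
print(effective_options(default_conf, runc_options, {}))
# runc.conf is present as well: it has the last word.
print(effective_options(default_conf, runc_options, runc_conf))
```

In this model, an option that runc does not set at all would pass through unchanged from /etc/criu/default.conf.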

@harishanand95
Author

harishanand95 commented Jan 1, 2019

@rst0git Sorry to bother you, but I am getting failures when I try to fetch Adrian's runc repo:


hanand4@instance-1:~/go$ go get github.com/adrianreber/runc/
# github.com/adrianreber/runc
src/github.com/adrianreber/runc/signals.go:136:28: cannot use ws (type "github.com/adrianreber/runc/vendor/golang.org/x/sys/unix".WaitStatus) as type "github.com/opencontainers/runc/vendor/golang.org/x/sys/unix".WaitStatus in argument to utils.ExitStatus
src/github.com/adrianreber/runc/utils_linux.go:238:7: cannot use spec (type *"github.com/adrianreber/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec) as type *"github.com/opencontainers/runc/vendor/github.com/opencontainers/runtime-spec/specs-go".Spec in field value

hanand4@instance-1:~/go$ go version
go version go1.10.4 linux/amd64

hanand4@instance-1:~/go$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/hanand4/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/hanand4/go"
GORACE=""
GOROOT="/usr/lib/go-1.10"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.10/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build409874469=/tmp/go-build -gno-record-gcc-switches"

Also, go get github.com/opencontainers/runc worked fine

@harishanand95
Author

@rst0git Got it working. I cloned Adrian's repo.

@harishanand95
Author

harishanand95 commented Jan 1, 2019

@rst0git Great, both the checkpoint and the restore worked fine. I'm now facing the following error:

OCI runtime exec failed: exec failed: container_linux.go:344: starting container process caused "apparmor: config provided but apparmor not supported": unknown

whenever I run docker exec -it 78e19c8b939d bash

Googling the error led me to docker/for-linux#503.
I am not sure what to do here; any help is much appreciated.

@rst0git
Member

rst0git commented Jan 1, 2019

@harishanand95 I think that you need to build runc with AppArmor support:

make BUILDTAGS='seccomp apparmor'
sudo make install PREFIX=/usr

@harishanand95
Author

ah! makes sense. Let me try that

@harishanand95
Author

Great it worked. Thanks @rst0git

@harishanand95
Author

I was able to checkpoint correctly on one machine, but I get the following error on another machine.

sudo cat /home/hanand/checkpoints/c54ab0a3b78e779fc347514addeecf93574bd7780f9c49d5aee06561fb52bd83/checkpoints/chk3/criu.work/dump.log | grep Err
(00.838667) Error (criu/files-reg.c:814): Can't dump ghost file /tmp/gz_model-acef-b238-85f6-c9d5.tar of 1413120 size, increase limit
(00.838714) Error (criu/cr-dump.c:1347): Dump files (pid: 9529) failed with -1
(00.850147) Error (criu/cr-dump.c:1732): Dumping FAILED.

The runc CRIU config:

~$ cat /etc/criu/runc.conf 
ghost-limit 1000000000000
tcp-established

@rst0git I tried giving ghost-limit a very high value and I still got the following error:
http://paste.ubuntu.com/p/B3c4YzW448/
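
When picking a ghost-limit value, it can help to see how large the offending files actually are. The following is a minimal Python sketch that pulls the path and size out of ghost-file errors in a dump.log, assuming the message format shown in the log excerpt above; the function name is hypothetical and the regular expression is a guess based on that single line.

```python
import re

# Pattern modelled on the one error line quoted above; other CRIU versions
# may word the message differently.
GHOST_RE = re.compile(r"Can't dump ghost file (.+?) of (\d+) size")


def ghost_file_sizes(log_text):
    """Return a list of (path, size_in_bytes) for each ghost-file error."""
    return [(m.group(1), int(m.group(2))) for m in GHOST_RE.finditer(log_text)]


sample = (
    "(00.838667) Error (criu/files-reg.c:814): Can't dump ghost file "
    "/tmp/gz_model-acef-b238-85f6-c9d5.tar of 1413120 size, increase limit"
)

for path, size in ghost_file_sizes(sample):
    print(path, size, "bytes")
```

The limit has to be at least as large as the biggest size reported, and it only takes effect if it is set in a config file that the installed runc and CRIU actually read.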

harishanand95 reopened this Jan 3, 2019
@avagin
Member

avagin commented Jan 3, 2019

Cc: @adrianreber

@adrianreber
Member

@harishanand95 Are you also using my runc version on the other machine? Can you also share the dump.log from the system where it works?

@rst0git
Member

rst0git commented Jan 3, 2019

@harishanand95 Can you try with ghost-limit 100M?

@harishanand95
Author

I tried adding ghost-limit to /etc/criu/runc.conf, but it still showed the same error. The checkpoint worked once I added ghost-limit to /etc/criu/default.conf.

@adrianreber
Member

/etc/criu/runc.conf only works if you take runc from opencontainers/runc#1933. Upstream runc does not yet support /etc/criu/runc.conf.

@harishanand95
Author


runc -v
runc version 1.0.0-rc6+dev
commit: 403986c5dd078a9d528794aee0b38dc742ee072b
spec: 1.0.1-dev

@harishanand95
Author

I'm confused. Is it the upstream or the version from your repo?

@avagin
Member

avagin commented Jan 3, 2019

It is the version from @adrianreber's PR:
opencontainers/runc@403986c

@harishanand95
Author

harishanand95 commented Jan 4, 2019

@avagin @rst0git I am facing another issue. bash and a few other processes (including gazebo, which I'm working on) store the hostname in memory.

Let's create a test container:

$ docker create --name=test centos:latest /bin/sh -c "while true; do echo hello world; sleep 1; done"
5deec115a79c485c65d186cb9a92f327cbadb80864645034e551221233eeae34

$ docker start test
test

$ docker ps
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                   NAMES
5deec115a79c        centos:latest         "/bin/sh -c 'while..."   39 seconds ago      Up 4 seconds                                                test

$ docker exec -it test bash
[root@5deec115a79c /]# exit

Now do a checkpoint and restore it into a new container, test1:

$ docker checkpoint create --checkpoint-dir=/home/hanand/checkpoints test chk2
chk2

$ docker commit test new_test
sha256:0d7fa6fa1bb3dbd4d6b3781bd4922b5aa62a5ac514b75d39d67f1ccf02d7ea6c

$ docker create --name=test1 new_test /bin/sh -c "while true; do echo hello world; sleep 1; done"
7a97b905b117b6674c33efbc734b9fc39299483d2266035bad8b859efc4cbd05

$ docker start --checkpoint-dir /home/hanand/checkpoints/5deec115a79c485c65d186cb9a92f327cbadb80864645034e551221233eeae34/checkpoints/ --checkpoint=chk2 test1

$ docker exec -it test1 bash
[root@5deec115a79c /]# cat /etc/hostname
7a97b905b117
[root@5deec115a79c /]# cat /etc/hosts
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.17.0.2	7a97b905b117

Bash seems to keep the old hostname 5deec115a79c in the new container clone as well. Any ideas on how to fix this?

@adrianreber
Member

Processes storing the hostname is one of the problems with process migration that cannot really be solved. CRIU does not know where a process stores which information (including the hostname) and cannot change it. So the process you migrate has to be smart enough to re-read the hostname once it has been migrated. If you have control over the process, you can make it re-read the hostname when it receives a certain signal, or something like that.

@harishanand95
Author

Yeah, that makes sense.
