Question: How to unlock multiple NVENC video encoding sessions in nvidia-docker? #131

Closed
vBLFTePebWNi6c opened this issue Jul 10, 2019 · 13 comments

@vBLFTePebWNi6c
Contributor

Hello! Thanks for the patch!

I'm trying to run a program that launches multiple ffmpeg encoding processes in parallel inside nvidia-docker, but when I run more than 2 processes it crashes with the following errors:

[h264_nvenc @ 0x42f2780] OpenEncodeSessionEx failed: out of memory (10)
[h264_nvenc @ 0x462f780] No NVENC capable devices found 

The host machine was patched and can run more than 2 sessions at once. Inside the Docker container, I can run 2 or fewer sessions without problems.

I'd appreciate any suggestions that would help solve this problem.
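
One way to reproduce the session limit in a controlled manner is to start several copies of the same encoding command at once and count how many of them fail. The following is a minimal POSIX shell sketch; the function name `count_parallel_failures` is hypothetical and is not part of nvidia-patch or ffmpeg.

```shell
#!/bin/sh
# Hypothetical helper: run N copies of a command in parallel and
# print how many of them exited with a non-zero status.
count_parallel_failures() {
    n="$1"
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Start one copy in the background and remember its PID.
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that background job.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For example, `count_parallel_failures 3 ffmpeg -f lavfi -i testsrc=duration=10:size=1280x720:rate=30 -c:v h264_nvenc -f null -` would be expected to print a non-zero count when only 2 sessions are allowed, assuming an ffmpeg build with NVENC and lavfi support.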

@Snawoot Snawoot self-assigned this Jul 10, 2019
@Snawoot
Collaborator

Snawoot commented Jul 10, 2019

Hello!

My pleasure! This patch modifies userland libraries, so I guess the environment inside Docker also has to be patched.

The easiest and most reproducible way to achieve this is to write your own Dockerfile that applies the patch on top of an nvidia-docker image such as nvidia/cuda:9.0-base.

I don't actually use nvidia-docker, but I think the Dockerfile should look something like this:

FROM nvidia/cuda:9.0-base

# Download the raw script; the /blob/ URL would return GitHub's HTML page instead.
RUN curl -fsSL -o /var/tmp/patch.sh 'https://raw.githubusercontent.com/keylase/nvidia-patch/master/patch.sh' && \
    chmod +x /var/tmp/patch.sh && \
    /var/tmp/patch.sh && \
    rm -f /var/tmp/patch.sh

Build this image with the following command:

docker build -t nvidia-patched .

Then use the nvidia-patched tag everywhere instead of the original container image (nvidia/cuda:9.0-base in this example).

You may also find some useful information about using nvidia-patch in Docker in this issue: #43

@vBLFTePebWNi6c
Contributor Author

vBLFTePebWNi6c commented Jul 10, 2019

Thanks for your response!

Yes, I've already tried running patch.sh during the build or inside a running Docker container, but got the same error as mentioned here.

@Snawoot
Collaborator

Snawoot commented Jul 10, 2019

@vBLFTePebWNi6c

Users in that thread reported that patching the host system helped them, but now I see that's not the case here.

It seems nvidia-docker runs the container with a special runtime that mounts the actual utilities and libraries into the running container as a read-only volume. This set of binaries is probably not the same as the host machine's driver files. It might be worth peeking into the docker/runc arguments or pinpointing the location of another set of libraries on your host system.

Could you please show the output of the following command on the host system:

find / -type f -name libnvcuvid.so.\*

@vBLFTePebWNi6c
Contributor Author

Searching from root takes a long time on my machine. I ran find on /usr and /opt and got this:

/usr/lib32/nvidia-418/libnvcuvid.so.418.67
/usr/lib/nvidia-418/libnvcuvid.so.418.67
/opt/nvidia/libnvidia-encode-backup/libnvcuvid.so.418.67

@Snawoot
Collaborator

Snawoot commented Jul 10, 2019

What is the SHA-1 sum of each of these files?

sha1sum /usr/lib/nvidia-418/libnvcuvid.so.418.67 /opt/nvidia/libnvidia-encode-backup/libnvcuvid.so.418.67

The output of the mount command from inside the container would also be useful.

@vBLFTePebWNi6c
Contributor Author

sha1 sum:

9a0f86ff52826cf345fa7d9ab09e1f8f6eab26c8  /usr/lib/nvidia-418/libnvcuvid.so.418.67
292b0a094c6442c9cf08c75b46ddc4841768b31b  /opt/nvidia/libnvidia-encode-backup/libnvcuvid.so.418.67
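
The two checksums above differ, so the two files do not have the same contents. For repeated checks of this kind, a small comparison helper can save some reading of hex strings. This is only a sketch, assuming `sha1sum` from coreutils is available; the function names `sha1_of` and `same_sha1` are hypothetical.

```shell
#!/bin/sh
# Print only the SHA-1 digest of a file (empty output if it cannot be read).
sha1_of() {
    sha1sum "$1" 2>/dev/null | cut -d ' ' -f 1
}

# Succeed only if both files are readable and have the same SHA-1 digest.
same_sha1() {
    a="$(sha1_of "$1")"
    b="$(sha1_of "$2")"
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage could look like `same_sha1 /usr/lib/nvidia-418/libnvcuvid.so.418.67 /opt/nvidia/libnvidia-encode-backup/libnvcuvid.so.418.67 && echo identical || echo different`.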

mount output:

overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/I4ECS4AEAYX6VEABCLSWIDEWVX:/var/lib/docker/overlay2/l/AKEXLJBPFUCELAINYFZ2M3XGUF:/var/lib/docker/overlay2/l/LZI4FGP4DGQY2OLSEMTBUZL3QN:/var/lib/docker/overlay2/l/5J75ABPEMCPZUTOC7TNFWAELQT:/var/lib/docker/overlay2/l/FAYIFKNN5SNESUVM4XEZCFR6VA:/var/lib/docker/overlay2/l/HMK75QKMG4IQ5SPJ25W4IK3EIQ:/var/lib/docker/overlay2/l/XLJY4UEP45VZCW3Z53MGK4JTR6:/var/lib/docker/overlay2/l/ETCJTX566XHDN3XGLGJXYBQRSN:/var/lib/docker/overlay2/l/VDMEY4M2QZPTRJO2AKD74OQ7CF:/var/lib/docker/overlay2/l/YAGJ2Q4LL7AMJPUTP2UZL4KGCK:/var/lib/docker/overlay2/l/A4DXRODCOUK6JWPQBK7I7KBPAO:/var/lib/docker/overlay2/l/TCDA7EX7JT6YKZH634WJZOHADG:/var/lib/docker/overlay2/l/7R4BRYKYHQ37YKWNYHISQCGVDF:/var/lib/docker/overlay2/l/NFID6UPGUJGJHD5ISSK2E34PU3:/var/lib/docker/overlay2/l/UHEL6C3XMLFRWCLECFALCGECUA:/var/lib/docker/overlay2/l/BCJ5C4CCWSIBY5MZIHFFWF5E32:/var/lib/docker/overlay2/l/TNPQQOTDL2ZJI2ZCKP5AYR4MUW:/var/lib/docker/overlay2/l/QK4GFO5HQEBFWYH4VJLVSIQ3Z3:/var/lib/docker/overlay2/l/DKVMOP5SJWUZ7WWRWXZ5RSIGBW:/var/lib/docker/overlay2/l/2MHEH7AVKZLV2TG6PXBLOOXH2E:/var/lib/docker/overlay2/l/3SNBHXSO6TBJ2EYTVAJ66M4F2H:/var/lib/docker/overlay2/l/FYH7VZC7DGGRMLOKOCMI42RGK4:/var/lib/docker/overlay2/l/E4ZFT54ORW3S3M7LKEZVZVQ7HB:/var/lib/docker/overlay2/l/OES6ECA7Z3GRZLIS4SSJPPGHCC:/var/lib/docker/overlay2/l/RJHHEI2VEZPSAECSVKKKR4N6Y4:/var/lib/docker/overlay2/l/GBCF7OOJKZMQBSS7QY2V3F6GZ5:/var/lib/docker/overlay2/l/Y4G5PTFQW67X76ZGRJE5C7A3G4:/var/lib/docker/overlay2/l/QZ2CZOPFYUJ4NNOIJPVZICVMFZ:/var/lib/docker/overlay2/l/LT66UOCAOYBZLDZWUYELUKAYPH:/var/lib/docker/overlay2/l/EEKPJ67X2BPJAN3CAQ2SSHN5B4:/var/lib/docker/overlay2/l/7QGMFMRLXA4WYZR4434CTLBHKV:/var/lib/docker/overlay2/l/JDKXXM24VGID7HJ7IGUNDJUFLK:/var/lib/docker/overlay2/l/7SK3MHDSLQWOQX3YHKFZOHMTOZ,upperdir=/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/diff,workdir=/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2
ee8a7fd41c79ee242e06aa5fbb81c52/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
/dev/mapper/vg0-root on /out type ext4 (rw,relatime,errors=remount-ro,data=ordered)
udev on /out/dev type devtmpfs (rw,nosuid,relatime,size=65928580k,nr_inodes=16482145,mode=755)
devpts on /out/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /out/dev/shm type tmpfs (rw,nosuid,nodev)
hugetlbfs on /out/dev/hugepages type hugetlbfs (rw,relatime)
mqueue on /out/dev/mqueue type mqueue (rw,relatime)
tmpfs on /out/run type tmpfs (rw,nosuid,noexec,relatime,size=13190856k,mode=755)
tmpfs on /out/run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /out/run/user/1022 type tmpfs (rw,nosuid,nodev,relatime,size=13190856k,mode=700,uid=1022,gid=1008)
tmpfs on /out/run/user/1016 type tmpfs (rw,nosuid,nodev,relatime,size=13190856k,mode=700,uid=1016,gid=1016)
nsfs on /out/run/docker/netns/default type nsfs (rw)
sysfs on /out/sys type sysfs (rw,nosuid,nodev,noexec,relatime)
securityfs on /out/sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /out/sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /out/sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /out/sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /out/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /out/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /out/sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /out/sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /out/sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /out/sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /out/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /out/sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /out/sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
pstore on /out/sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
debugfs on /out/sys/kernel/debug type debugfs (rw,relatime)
tracefs on /out/sys/kernel/debug/tracing type tracefs (rw,relatime)
fusectl on /out/sys/fs/fuse/connections type fusectl (rw,relatime)
proc on /out/proc type proc (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /out/proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=24,pgrp=0,timeout=0,minproto=5,maxproto=5,direct)
/dev/mapper/vg0-tmp on /out/tmp type ext4 (rw,nosuid,nodev,relatime,data=ordered)
/dev/mapper/vg0-var on /out/var type ext4 (rw,relatime,data=ordered)
/dev/mapper/vg0-var_log on /out/var/log type ext4 (rw,nosuid,nodev,relatime,data=ordered)
overlay on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/I4ECS4AEAYX6VEABCLSWIDEWVX:/var/lib/docker/overlay2/l/AKEXLJBPFUCELAINYFZ2M3XGUF:/var/lib/docker/overlay2/l/LZI4FGP4DGQY2OLSEMTBUZL3QN:/var/lib/docker/overlay2/l/5J75ABPEMCPZUTOC7TNFWAELQT:/var/lib/docker/overlay2/l/FAYIFKNN5SNESUVM4XEZCFR6VA:/var/lib/docker/overlay2/l/HMK75QKMG4IQ5SPJ25W4IK3EIQ:/var/lib/docker/overlay2/l/XLJY4UEP45VZCW3Z53MGK4JTR6:/var/lib/docker/overlay2/l/ETCJTX566XHDN3XGLGJXYBQRSN:/var/lib/docker/overlay2/l/VDMEY4M2QZPTRJO2AKD74OQ7CF:/var/lib/docker/overlay2/l/YAGJ2Q4LL7AMJPUTP2UZL4KGCK:/var/lib/docker/overlay2/l/A4DXRODCOUK6JWPQBK7I7KBPAO:/var/lib/docker/overlay2/l/TCDA7EX7JT6YKZH634WJZOHADG:/var/lib/docker/overlay2/l/7R4BRYKYHQ37YKWNYHISQCGVDF:/var/lib/docker/overlay2/l/NFID6UPGUJGJHD5ISSK2E34PU3:/var/lib/docker/overlay2/l/UHEL6C3XMLFRWCLECFALCGECUA:/var/lib/docker/overlay2/l/BCJ5C4CCWSIBY5MZIHFFWF5E32:/var/lib/docker/overlay2/l/TNPQQOTDL2ZJI2ZCKP5AYR4MUW:/var/lib/docker/overlay2/l/QK4GFO5HQEBFWYH4VJLVSIQ3Z3:/var/lib/docker/overlay2/l/DKVMOP5SJWUZ7WWRWXZ5RSIGBW:/var/lib/docker/overlay2/l/2MHEH7AVKZLV2TG6PXBLOOXH2E:/var/lib/docker/overlay2/l/3SNBHXSO6TBJ2EYTVAJ66M4F2H:/var/lib/docker/overlay2/l/FYH7VZC7DGGRMLOKOCMI42RGK4:/var/lib/docker/overlay2/l/E4ZFT54ORW3S3M7LKEZVZVQ7HB:/var/lib/docker/overlay2/l/OES6ECA7Z3GRZLIS4SSJPPGHCC:/var/lib/docker/overlay2/l/RJHHEI2VEZPSAECSVKKKR4N6Y4:/var/lib/docker/overlay2/l/GBCF7OOJKZMQBSS7QY2V3F6GZ5:/var/lib/docker/overlay2/l/Y4G5PTFQW67X76ZGRJE5C7A3G4:/var/lib/docker/overlay2/l/QZ2CZOPFYUJ4NNOIJPVZICVMFZ:/var/lib/docker/overlay2/l/LT66UOCAOYBZLDZWUYELUKAYPH:/var/lib/docker/overlay2/l/EEKPJ67X2BPJAN3CAQ2SSHN5B4:/var/lib/docker/overlay2/l/7QGMFMRLXA4WYZR4434CTLBHKV:/var/lib/docker/overlay2/l/JDKXXM24VGID7HJ7IGUNDJUFLK:/var/lib/docker/overlay2/l/7SK3MHDSLQWOQX3YHKFZOHMTOZ,upperdir=/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a
7fd41c79ee242e06aa5fbb81c52/diff,workdir=/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/work)
overlay on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/I4ECS4AEAYX6VEABCLSWIDEWVX:/var/lib/docker/overlay2/l/AKEXLJBPFUCELAINYFZ2M3XGUF:/var/lib/docker/overlay2/l/LZI4FGP4DGQY2OLSEMTBUZL3QN:/var/lib/docker/overlay2/l/5J75ABPEMCPZUTOC7TNFWAELQT:/var/lib/docker/overlay2/l/FAYIFKNN5SNESUVM4XEZCFR6VA:/var/lib/docker/overlay2/l/HMK75QKMG4IQ5SPJ25W4IK3EIQ:/var/lib/docker/overlay2/l/XLJY4UEP45VZCW3Z53MGK4JTR6:/var/lib/docker/overlay2/l/ETCJTX566XHDN3XGLGJXYBQRSN:/var/lib/docker/overlay2/l/VDMEY4M2QZPTRJO2AKD74OQ7CF:/var/lib/docker/overlay2/l/YAGJ2Q4LL7AMJPUTP2UZL4KGCK:/var/lib/docker/overlay2/l/A4DXRODCOUK6JWPQBK7I7KBPAO:/var/lib/docker/overlay2/l/TCDA7EX7JT6YKZH634WJZOHADG:/var/lib/docker/overlay2/l/7R4BRYKYHQ37YKWNYHISQCGVDF:/var/lib/docker/overlay2/l/NFID6UPGUJGJHD5ISSK2E34PU3:/var/lib/docker/overlay2/l/UHEL6C3XMLFRWCLECFALCGECUA:/var/lib/docker/overlay2/l/BCJ5C4CCWSIBY5MZIHFFWF5E32:/var/lib/docker/overlay2/l/TNPQQOTDL2ZJI2ZCKP5AYR4MUW:/var/lib/docker/overlay2/l/QK4GFO5HQEBFWYH4VJLVSIQ3Z3:/var/lib/docker/overlay2/l/DKVMOP5SJWUZ7WWRWXZ5RSIGBW:/var/lib/docker/overlay2/l/2MHEH7AVKZLV2TG6PXBLOOXH2E:/var/lib/docker/overlay2/l/3SNBHXSO6TBJ2EYTVAJ66M4F2H:/var/lib/docker/overlay2/l/FYH7VZC7DGGRMLOKOCMI42RGK4:/var/lib/docker/overlay2/l/E4ZFT54ORW3S3M7LKEZVZVQ7HB:/var/lib/docker/overlay2/l/OES6ECA7Z3GRZLIS4SSJPPGHCC:/var/lib/docker/overlay2/l/RJHHEI2VEZPSAECSVKKKR4N6Y4:/var/lib/docker/overlay2/l/GBCF7OOJKZMQBSS7QY2V3F6GZ5:/var/lib/docker/overlay2/l/Y4G5PTFQW67X76ZGRJE5C7A3G4:/var/lib/docker/overlay2/l/QZ2CZOPFYUJ4NNOIJPVZICVMFZ:/var/lib/docker/overlay2/l/LT66UOCAOYBZLDZWUYELUKAYPH:/var/lib/docker/overlay2/l/EEKPJ67X2BPJAN3CAQ2SSHN5B4:/var/lib/docker/overlay2/l/7QGMFMRLXA4WYZR4434CTLBHKV:/var/lib/docker/overlay2/l/JDKXXM24VGID7HJ7IGUNDJUFLK:/var/lib/docker/overlay2/l/7SK3MHDSLQWOQX3YHKFZOHMTOZ,upperdir=/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a
7fd41c79ee242e06aa5fbb81c52/diff,workdir=/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/work)
proc on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
mqueue on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
sysfs on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys type sysfs (ro,nosuid,nodev,noexec,relatime)
tmpfs on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/systemd type cgroup (ro,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/cpu,cpuacct type cgroup (ro,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/cpuset type cgroup (ro,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/freezer type cgroup (ro,nosuid,nodev,noexec,relatime,freezer)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/hugetlb type cgroup (ro,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/memory type cgroup (ro,nosuid,nodev,noexec,relatime,memory)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/pids type cgroup (ro,nosuid,nodev,noexec,relatime,pids)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/blkio type cgroup (ro,nosuid,nodev,noexec,relatime,blkio)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/devices type cgroup (ro,nosuid,nodev,noexec,relatime,devices)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/net_cls,net_prio type cgroup (ro,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /out/var/lib/docker/overlay2/269a3035fb301706affc99918363cb6b2ee8a7fd41c79ee242e06aa5fbb81c52/merged/sys/fs/cgroup/perf_event type cgroup (ro,nosuid,nodev,noexec,relatime,perf_event)
shm on /out/var/lib/docker/containers/a652ddfcd744591c3d144be64b88b078a3b3fd980fb9b683a6687860fb0f9af1/mounts/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
/dev/md127 on /out/home type ext4 (rw,relatime,stripe=256,data=ordered)
/dev/mapper/vg0-var on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)
/dev/mapper/vg0-var on /etc/hostname type ext4 (rw,relatime,data=ordered)
/dev/mapper/vg0-var on /etc/hosts type ext4 (rw,relatime,data=ordered)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
tmpfs on /proc/driver/nvidia type tmpfs (rw,nosuid,nodev,noexec,relatime,mode=555)
/dev/mapper/vg0-root on /usr/bin/nvidia-smi type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/bin/nvidia-debugdump type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/bin/nvidia-persistenced type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/bin/nvidia-cuda-mps-control type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/bin/nvidia-cuda-mps-server type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libcuda.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libvdpau_nvidia.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
/dev/mapper/vg0-root on /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.67 type ext4 (ro,nosuid,nodev,relatime,errors=remount-ro,data=ordered)
tmpfs on /run/nvidia-persistenced/socket type tmpfs (rw,nosuid,nodev,noexec,relatime,size=13190856k,mode=755)
udev on /dev/nvidiactl type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
udev on /dev/nvidia-uvm type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
udev on /dev/nvidia-uvm-tools type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
udev on /dev/nvidia0 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:04:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
udev on /dev/nvidia1 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:05:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
udev on /dev/nvidia2 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:08:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
udev on /dev/nvidia3 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:09:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
udev on /dev/nvidia4 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:83:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
udev on /dev/nvidia5 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:84:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
udev on /dev/nvidia6 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:87:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
udev on /dev/nvidia7 type devtmpfs (ro,nosuid,noexec,relatime,size=65928580k,nr_inodes=16482145,mode=755)
proc on /proc/driver/nvidia/gpus/0000:88:00.0 type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/bus type proc (ro,relatime)
proc on /proc/fs type proc (ro,relatime)
proc on /proc/irq type proc (ro,relatime)
proc on /proc/sys type proc (ro,relatime)
proc on /proc/sysrq-trigger type proc (ro,relatime)
tmpfs on /proc/asound type tmpfs (ro,relatime)
tmpfs on /proc/acpi type tmpfs (ro,relatime)
tmpfs on /proc/kcore type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/keys type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/timer_list type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/timer_stats type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/sched_debug type tmpfs (rw,nosuid,size=65536k,mode=755)
tmpfs on /proc/scsi type tmpfs (ro,relatime)
tmpfs on /sys/firmware type tmpfs (ro,relatime)

@Snawoot
Collaborator

Snawoot commented Jul 10, 2019

Thanks. Now I see: the NVIDIA files inside the container are definitely bind-mounted read-only from the host system. Can you please check the sha1sum of the libnvcuvid.so file mounted inside the container? The command should look like this:

docker run --rm nvidia/cuda:9.0-base sha1sum /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.67
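
In a long mount listing, the read-only library mounts are easy to miss. A filter along the following lines could pick them out. It is a sketch that assumes the `DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)` layout printed by `mount` and mount points without spaces; the function name `list_ro_mounted_libs` is hypothetical.

```shell
#!/bin/sh
# Read `mount` output on stdin and print the mount points of
# versioned shared libraries (*.so.*) that are mounted read-only.
list_ro_mounted_libs() {
    awk '$2 == "on" && $3 ~ /\.so\./ && $NF ~ /^\(ro[,)]/ { print $3 }'
}
```

Inside the container it could be used as `mount | list_ro_mounted_libs`.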

@vBLFTePebWNi6c
Contributor Author

Sure! Here's sha1sum:
292b0a094c6442c9cf08c75b46ddc4841768b31b /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.67

@Snawoot
Collaborator

Snawoot commented Jul 10, 2019

@vBLFTePebWNi6c The SHA-1 sum of the file in the container matches the sum of the original, unpatched file, which was moved out of the way as a backup. So you still have another copy of the original file somewhere on your system. You should probably also check /var, /srv, /lib and /bin. If none of these locations contains libnvcuvid.so.*, it may indicate that the real file lives in some kind of Docker volume, inside the guts of the nvidia runtime. If that's the case, it is worth looking closely at how the runtime mounts these files. I don't have a suitable system to test on at the moment.
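
Searching a handful of directories instead of the whole filesystem can be wrapped in a small helper. This is a sketch only; the function name `find_lib_copies` is hypothetical, and directories that do not exist are skipped silently.

```shell
#!/bin/sh
# Print every regular file matching a name pattern under the given directories.
find_lib_copies() {
    pattern="$1"
    shift
    for dir in "$@"; do
        # Skip directories that are absent on this system.
        [ -d "$dir" ] || continue
        find "$dir" -type f -name "$pattern" 2>/dev/null
    done
    return 0
}
```

For the directories named in this thread, the call would be `find_lib_copies 'libnvcuvid.so.*' /usr /opt /var /srv /lib /bin`.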

@Snawoot
Collaborator

Snawoot commented Jul 10, 2019

On the other hand, you could build a container that sorts this out automatically. All you have to do is make sure the container entrypoint bind-mounts the library to a writable location and patches it on start. Like this:

https://gist.github.com/Snawoot/59ddc6ae9d3654da50b3653d98428543

Here the entrypoint does the patching and then runs the original command with all of its parameters. This way the Docker image works regardless of whether the host is patched.

@Snawoot
Collaborator

Snawoot commented Jul 10, 2019

Gist updated: I forgot to invoke the patch itself. Please let me know if it works for you, and I'll add this workaround to the main patch file.

@Snawoot Snawoot mentioned this issue Jul 11, 2019
@Snawoot
Collaborator

Snawoot commented Jul 11, 2019

The mount approach won't work: it requires the CAP_SYS_ADMIN privilege, which makes containers almost pointless.
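
Whether a process holds CAP_SYS_ADMIN can be read from the effective capability mask in `/proc/<pid>/status`. The sketch below tests bit 21 of that mask, which is the number assigned to CAP_SYS_ADMIN in the Linux capability list; the function name `has_cap_sys_admin` is hypothetical.

```shell
#!/bin/sh
# $1: hexadecimal capability mask, as shown on the CapEff line
#     of /proc/<pid>/status (without a leading 0x).
# Succeeds if bit 21 (CAP_SYS_ADMIN) is set.
has_cap_sys_admin() {
    [ $(( (0x$1 >> 21) & 1 )) -eq 1 ]
}
```

Inside a container, the check could be run as `has_cap_sys_admin "$(awk '/^CapEff:/ {print $2}' /proc/self/status)" && echo "mount is permitted" || echo "no CAP_SYS_ADMIN"`.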

But I found another solution that uses the dynamic linker to apply the patch (via #132), so Docker support with an example Dockerfile is now available on the master branch. See the instructions on this topic on the main page.
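
The implementation from #132 is not shown in this thread. As an illustration of the general idea only, the sketch below copies read-only libraries into a writable directory and prints a library search path that lists that directory first, so the dynamic linker would pick up the copies. It assumes versioned file names such as libfoo.so.418.67 and a soname ending in `.so.1`; the function name `stage_libs` and the directory `/patched-lib` are hypothetical, and the patching step itself is left out.

```shell
#!/bin/sh
# Copy the given libraries into a writable directory and print a
# search path with that directory placed before the existing one.
stage_libs() {
    dest="$1"
    shift
    mkdir -p "$dest" || return 1
    for lib in "$@"; do
        [ -f "$lib" ] || continue
        file="$(basename "$lib")"
        cp "$lib" "$dest/$file" || return 1
        # libfoo.so.418.67 -> libfoo.so.1 (assumed soname), so that
        # lookups by soname resolve to the writable copy.
        ln -sf "$file" "$dest/${file%%.so.*}.so.1"
    done
    # The copies in $dest could be patched here (omitted in this sketch).
    printf '%s\n' "$dest${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

An entrypoint could then run something like `LD_LIBRARY_PATH="$(stage_libs /patched-lib /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.67)"`, followed by `export LD_LIBRARY_PATH` and `exec "$@"`, which needs no mount privileges.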

@Snawoot Snawoot closed this as completed Jul 11, 2019
@vBLFTePebWNi6c
Contributor Author

Tried your solution today and it worked like a charm! Thanks, man! I really appreciate your help :)
