fuse-overlayfs: different performance tweaks #88

Merged: 49 commits, Jul 23, 2019

giuseppe
Member

This PR contains several performance improvements. Most notably, there is some initial support for threading. Most of the code is still protected by a lock, so it effectively runs single-threaded, but operations like read/write/setattr/fsync can be dispatched on a different thread.

It is now possible to disable fsync, which seems to slow apt down quite significantly. In general, it is probably safer to run containers without access to fsync.

The new options can be configured in Podman through ~/.config/containers/storage.conf. I've got the best results with this combination:

mountopt = "threaded=0,fsync=0"

@giuseppe
Member Author

this needs a fix in SELinux:

type=AVC msg=audit(1563197807.799:377): avc:  denied  { setattr } for  pid=7950 comm="fuse-overlayfs" name="183" dev="proc" ino=160867 scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=lnk_file permissive=0

@giuseppe
Member Author

type=AVC msg=audit(1563197807.799:377): avc: denied { setattr } for pid=7950 comm="fuse-overlayfs" name="183" dev="proc" ino=160867 scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=lnk_file permissive=0

@rhatdan is it possible to enable it only for /proc/self?

@rhatdan
Member

rhatdan commented Jul 15, 2019

Was this actually blocking? Since this is an unconfined domain it should be allowed. But it looks like a bogus AVC?

A process is attempting to change the attributes of something in /proc/self? The link file is fuse-overlayfs? Is this more about the setattr should be on the thing the link points at?

@giuseppe
Member Author

the process inside the rootless container was blocked. I'm using /proc/self/FD to change the ownership of a symlink. It seems like the only possible way when dealing with symlinks and fds.
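
A minimal sketch of how a symlink's ownership can be changed through /proc/self/fd, assuming Linux with /proc mounted. The helper name `chown_symlink_by_fd` is hypothetical and this is not the code from the PR; it only illustrates the general technique of going through the fd's /proc entry:

```c
#define _GNU_SOURCE /* callers need it for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not taken from fuse-overlayfs): change the owner of
   the symlink that FD refers to.  FD is assumed to have been opened with
   O_PATH | O_NOFOLLOW, so that it names the symlink itself.  */
static int
chown_symlink_by_fd (int fd, uid_t uid, gid_t gid)
{
  char proc_path[64];
  int n;

  assert (fd >= 0);
  n = snprintf (proc_path, sizeof (proc_path), "/proc/self/fd/%d", fd);
  if (n < 0 || (size_t) n >= sizeof (proc_path))
    return -1;

  /* chown(2) follows the /proc magic link, which resolves directly to the
     inode behind FD (the symlink), not to the symlink's target.  */
  return chown (proc_path, uid, gid);
}
```

A caller would open the link with `open (path, O_PATH | O_NOFOLLOW)` and pass the resulting descriptor. Because the call never resolves the link's target, it also works on dangling symlinks.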

@rhatdan
Member

rhatdan commented Jul 15, 2019

If this is a container process trying to make this change then it is running with the wrong label. unconfined_t versus container_t.

@rhatdan
Member

rhatdan commented Jul 15, 2019

type=AVC msg=audit(1563197807.799:377): avc: denied { setattr } for pid=7950 comm="fuse-overlayfs" name="183" dev="proc" ino=160867 scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=lnk_file permissive=0

I.e., is this supposed to be
container_t writing to unconfined_t lnk_file or
container_t writing to container_t lnk_file or
the process running podman writing to the lnk_file, in which case it would be unconfined_t writing to unconfined_t lnk_file.

@rhatdan
Member

rhatdan commented Jul 15, 2019

No problem allowing
container_t setattr to container_t lnk_file or
unconfined_t setattr to unconfined_t lnk_file

Problem with
container_t setattr on unconfined_t lnk_file

@giuseppe
Member Author

I see. fuse-overlayfs doesn't run in the container, only in its user+mount namespace created by podman

@giuseppe force-pushed the perf-improvements branch 2 times, most recently from a90a5d7 to 7fbe3df (July 16, 2019 10:46)

@giuseppe
Member Author

@rhatdan could we get the unconfined_t setattr to unconfined_t lnk_file fix for selinux?

With this PR fuse-overlayfs performs significantly better.

I've added some new options, such as disabling xattrs. This helps with SELinux as we won't hit the FUSE file system with a getxattr for every operation. Also, disabling fsync seems a sane default for containers, at least for the rootfs where fuse-overlayfs is used. With this in place fuse-overlayfs performs better than native overlay in most cases.

Using mountopt = "noxattrs=1,fsync=0,threaded=0" with rootless podman I get:

$ /usr/bin/time -v podman  run --rm -ti ubuntu:18.04 sh -c 'apt update && apt install -y wine-development'
[.....]
	User time (seconds): 0.15
	System time (seconds): 0.31
	Percent of CPU this job got: 1%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.64
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 43416
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 4940
	Voluntary context switches: 4978
	Involuntary context switches: 84
	Swaps: 0
	File system inputs: 16
	File system outputs: 608
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

with root and native overlay:

# /usr/bin/time -v podman  run --rm -ti ubuntu:18.04 sh -c 'apt update && apt install -y wine-development'
[.....]
	User time (seconds): 0.22
	System time (seconds): 0.36
	Percent of CPU this job got: 1%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.17
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 43808
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 14456
	Voluntary context switches: 7312
	Involuntary context switches: 186
	Swaps: 0
	File system inputs: 264
	File system outputs: 624
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

@giuseppe changed the title from "[WIP] fuse-overlayfs: different performance tweaks" to "fuse-overlayfs: different performance tweaks" on Jul 16, 2019

@rhatdan
Member

rhatdan commented Jul 16, 2019

@wrabcak Could you update selinux-policy for Fedora and RHEL7/RHEL8 with
allow unconfined_domain self:lnk_file setattr;

wrabcak added a commit to fedora-selinux/selinux-policy that referenced this pull request Jul 16, 2019
wrabcak added a commit to fedora-selinux/selinux-policy that referenced this pull request Jul 16, 2019
wrabcak added a commit to fedora-selinux/selinux-policy that referenced this pull request Jul 16, 2019
@wrabcak
Member

wrabcak commented Jul 16, 2019

Added to selinux-policy for F29, F30 and Rawhide:
fedora-selinux/selinux-policy@444993d

main.c Outdated

- if (asprintf (&whiteout_opq_path, "%s/" OPAQUE_WHITEOUT, path) < 0)
- return -1;
+ sprintf (whiteout_opq_path, "%s/" OPAQUE_WHITEOUT, path);

Should still use snprintf?

main.c Outdated
- ret = asprintf (&path, "%s/%s", parent->path, name);
- if (ret < 0)
- return ret;
+ sprintf (path, "%s/%s", parent->path, name);

snprintf

main.c Outdated
- ret = asprintf (&whiteout_path, "%s/%s", parent->path, name);
- if (ret < 0)
- return ret;
+ sprintf (whiteout_path, "%s/%s", parent->path, name);

snprintf

main.c Outdated
- ret = asprintf (&whiteout_wh_path, "%s/.wh.%s", parent->path, name);
- if (ret < 0)
- return ret;
+ sprintf (whiteout_wh_path, "%s/.wh.%s", parent->path, name);

snprintf

main.c Outdated
- ret = asprintf (&whiteout_path, "%s/%s", parent->path, name);
- if (ret < 0)
- return ret;
+ sprintf (whiteout_path, "%s/%s", parent->path, name);

snprintf

main.c Outdated
- ret = asprintf (&whiteout_path, ".wh.%s", name);
- if (ret < 0)
- return ret;
+ sprintf (whiteout_path, ".wh.%s", name);

snprintf

main.c Outdated
- ret = asprintf (&whiteout_path, "%s/.wh.%s", parent->path, name);
- if (ret < 0)
- return ret;
+ sprintf (whiteout_path, "%s/.wh.%s", parent->path, name);

snprintf

And continue using it for all sprintf calls...

main.c Outdated
- ret = asprintf (&whiteout_path, ".wh.%s", dent->d_name);
- if (ret < 0)
- return NULL;
+ sprintf (whiteout_path, ".wh.%s", dent->d_name);

snprintf

main.c Outdated
- ret = asprintf (&path, "%s/%s", pnode->path, name);
- if (ret < 0)
- return NULL;
+ sprintf (path, "%s/%s", pnode->path, name);

snprintf
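
The review suggestion above can be sketched as follows. This is only an illustration of the bounded-formatting pattern, assuming a caller-supplied buffer of known size; `format_whiteout_path` is a hypothetical name and does not appear in the PR:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not from the PR: format "<parent>/.wh.<name>" into a
   caller-supplied buffer.  Returns the length of the formatted string on
   success, or -1 if it would not fit.  snprintf reports the length it
   wanted to write, which is how truncation is detected.  */
static int
format_whiteout_path (char *buf, size_t size, const char *parent,
                      const char *name)
{
  int n;

  assert (buf != NULL && size > 0);
  n = snprintf (buf, size, "%s/.wh.%s", parent, name);
  if (n < 0 || (size_t) n >= size)
    return -1; /* encoding error or truncated output */
  return n;
}
```

Compared with plain sprintf, an over-long parent path or name produces an error return instead of a write past the end of the buffer, while still avoiding the per-call allocation that asprintf performs.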

@rhatdan
Member

rhatdan commented Jul 22, 2019

This would have been a lot easier to review if many of these had been submitted as separate reviews, with notes on how each one improves performance.

giuseppe added 5 commits July 22, 2019 13:07
Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe force-pushed the perf-improvements branch from 56ad386 to 59cd4c0 (July 22, 2019 15:31)

@mbargull

Happy to see work on fuse-overlayfs' performance.
I can't really comment on/review these changes, but was curious enough to compile and test this branch against a slow-running case I know of. Hopefully the following can be of some use.

This records the time taken to create a couple thousand folders (via seq $count | xargs mkdir) and remove them (via find . -delete) for the different storage drivers for podman:

/usr/bin/fuse-overlayfs --version
/tmp/fuse-overlayfs-gh-88 --version
for storage_option in \
    --storage-driver=vfs \
    --storage-driver=overlay \
    --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs \
    --storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88 \
    ; do
  printf \\n\\n
  for count in 1000 2000 20000 40000 80000 ; do
    podman --root="$(pwd)/containers" ${storage_option} run --rm -qit \
      --workdir=/test --env=count="${count}" --env=TIME="${storage_option} ${count} %S %U %e %C" \
      busybox sh -ec 'seq "${count}" | \time xargs mkdir && \time find . -delete ; printf \\n'
  done
  rm -rf ./containers
done | column -tLN storage,count,sys,user,wall,command -R storage,count,sys,user,wall
fuse-overlayfs: version 0.4.2
FUSE library version 3.6.1
using FUSE kernel interface version 7.29
fusermount3 version: 3.6.1
fuse-overlayfs: version 0.4.2
FUSE library version 3.6.2
using FUSE kernel interface version 7.31
fusermount3 version: 3.6.1
                                                      storage  count    sys  user    wall  command             
                                                                                                               
                                                                                                               
                                         --storage-driver=vfs   1000   0.00  0.00    0.00  xargs    mkdir
                                         --storage-driver=vfs   1000   0.00  0.00    0.00  find     .          -delete
                                                                                                               
                                         --storage-driver=vfs   2000   0.00  0.00    0.00  xargs    mkdir
                                         --storage-driver=vfs   2000   0.01  0.00    0.01  find     .          -delete
                                                                                                               
                                         --storage-driver=vfs  20000   0.03  0.00    0.04  xargs    mkdir
                                         --storage-driver=vfs  20000   0.11  0.02    0.14  find     .          -delete
                                                                                                               
                                         --storage-driver=vfs  40000   0.07  0.01    0.09  xargs    mkdir
                                         --storage-driver=vfs  40000   0.16  0.10    0.28  find     .          -delete
                                                                                                               
                                         --storage-driver=vfs  80000   0.17  0.02    0.19  xargs    mkdir
                                         --storage-driver=vfs  80000   0.36  0.19    0.58  find     .          -delete
                                                                                                               
                                                                                                               
                                                                                                               
                                     --storage-driver=overlay   1000   0.00  0.00    0.00  xargs    mkdir
                                     --storage-driver=overlay   1000   0.00  0.00    0.00  find     .          -delete
                                                                                                               
                                     --storage-driver=overlay   2000   0.00  0.00    0.01  xargs    mkdir
                                     --storage-driver=overlay   2000   0.00  0.00    0.02  find     .          -delete
                                                                                                               
                                     --storage-driver=overlay  20000   0.08  0.00    0.09  xargs    mkdir
                                     --storage-driver=overlay  20000   0.14  0.04    0.19  find     .          -delete
                                                                                                               
                                     --storage-driver=overlay  40000   0.17  0.01    0.19  xargs    mkdir
                                     --storage-driver=overlay  40000   0.26  0.11    0.40  find     .          -delete
                                                                                                               
                                     --storage-driver=overlay  80000   0.36  0.02    0.39  xargs    mkdir
                                     --storage-driver=overlay  80000   0.49  0.26    0.80  find     .          -delete
                                                                                                               
                                                                                                               
                                                                                                               
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs   1000   0.01  0.00    0.07  xargs    mkdir
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs   1000   0.01  0.00    0.12  find     .          -delete
                                                                                                               
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs   2000   0.03  0.00    0.22  xargs    mkdir
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs   2000   0.05  0.00    0.38  find     .          -delete
                                                                                                               
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs  20000   1.77  0.17   15.23  xargs    mkdir
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs  20000   3.05  0.62   31.32  find     .          -delete
                                                                                                               
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs  40000   4.72  0.22   73.52  xargs    mkdir
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs  40000   7.53  1.78  149.06  find     .          -delete
                                                                                                               
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs  80000   9.03  0.61  342.64  xargs    mkdir
  --storage-opt=overlay.mount_program=/usr/bin/fuse-overlayfs  80000  13.89  3.20  673.05  find     .          -delete
                                                                                                               
                                                                                                               
                                                                                                               
  storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88   1000   0.01  0.00    0.05  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88   1000   0.01  0.00    0.09  find     .          -delete
                                                                                                               
  storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88   2000   0.02  0.00    0.15  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88   2000   0.03  0.01    0.24  find     .          -delete
                                                                                                               
  storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88  20000   0.89  0.04    7.10  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88  20000   1.47  0.42   13.53  find     .          -delete
                                                                                                               
  storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88  40000   2.76  0.31   34.43  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88  40000   5.36  1.07   67.84  find     .          -delete
                                                                                                               
  storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88  80000   6.97  0.37  202.25  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88  80000  11.73  3.18  401.59  find     .          -delete
                                                                                                               

/tmp/fuse-overlayfs-gh-88 is created via the Dockerfile.static from this repository (changed to target this branch) with the state before the last push/GitHub outage.
Overall, speed seems to more or less double compared to the release version, which is nice. The superlinearly increasing time is still observable, though.

Regarding the new options: I have no idea (didn't think about/research it) whether setting fsync=0/fsync=1 should/would affect this test case, but it seems only threaded=0 vs threaded=1 shows a difference:

|> podman --root="$(pwd)/containers" \
|>   --storage-opt='"overlay.mountopt=threaded=0,fsync=0","overlay.mount_program=/tmp/fuse-overlayfs-gh-88"' \
|>   run --rm -qit --workdir=/test --env=TIME='%S %U %e %C' \
|>   busybox sh -ec 'seq 20000 | \time xargs mkdir && \time find . -delete'
0.90 0.07 7.13 xargs mkdir
1.66 0.25 13.43 find . -delete
|> podman --root="$(pwd)/containers" \
|>   --storage-opt='"overlay.mountopt=threaded=0,fsync=1","overlay.mount_program=/tmp/fuse-overlayfs-gh-88"' \
|>   run --rm -qit --workdir=/test --env=TIME='%S %U %e %C' \
|>   busybox sh -ec 'seq 20000 | \time xargs mkdir && \time find . -delete'
0.94 0.04 7.18 xargs mkdir
1.62 0.35 13.52 find . -delete
|> podman --root="$(pwd)/containers" \
|>   --storage-opt='"overlay.mountopt=threaded=1,fsync=0","overlay.mount_program=/tmp/fuse-overlayfs-gh-88"' \
|>   run --rm -qit --workdir=/test --env=TIME='%S %U %e %C' \
|>   busybox sh -ec 'seq 20000 | \time xargs mkdir && \time find . -delete'
1.19 0.03 10.17 xargs mkdir
1.61 0.55 17.83 find . -delete
|> podman --root="$(pwd)/containers" \
|>   --storage-opt='"overlay.mountopt=threaded=1,fsync=1","overlay.mount_program=/tmp/fuse-overlayfs-gh-88"' \
|>   run --rm -qit --workdir=/test --env=TIME='%S %U %e %C' \
|>   busybox sh -ec 'seq 20000 | \time xargs mkdir && \time find . -delete'
1.10 0.11 10.30 xargs mkdir
1.76 0.51 18.03 find . -delete

@mbargull

Re-ran the above with the latest changes from 59cd4c0 :

--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0   1000  0.01  0.00  0.04  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0   1000  0.01  0.00  0.06  find     .          -delete
                                                                                                                    
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0   2000  0.02  0.00  0.09  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0   2000  0.04  0.00  0.13  find     .          -delete
                                                                                                                    
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0  20000  0.20  0.00  0.93  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0  20000  0.35  0.09  1.29  find     .          -delete
                                                                                                                    
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0  40000  0.38  0.02  1.87  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0  40000  0.71  0.19  2.59  find     .          -delete
                                                                                                                    
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0  80000  0.78  0.05  3.76  xargs    mkdir
--storage-opt=overlay.mount_program=/tmp/fuse-overlayfs-gh-88-59cd4c0  80000  1.42  0.42  5.21  find     .          -delete

Some of those changes fixed the non-linear slowness -- massive improvement!
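
To put a number on the non-linear behaviour, here is a small sketch that computes the average wall-clock cost per created directory from the `xargs mkdir` rows in the tables above. The struct and function names are made up for this illustration:

```c
#include <assert.h>

/* One benchmark row: number of directories and wall-clock seconds. */
struct sample
{
  long count;
  double wall_seconds;
};

/* Average microseconds spent per created directory. */
static double
usec_per_item (struct sample s)
{
  assert (s.count > 0);
  return s.wall_seconds * 1e6 / (double) s.count;
}

/* Growth of the per-item cost between two runs.  A value close to 1.0
   means the total time scales linearly with the item count.  */
static double
per_item_growth (struct sample small, struct sample large)
{
  return usec_per_item (large) / usec_per_item (small);
}
```

Going from 20000 to 80000 directories, release 0.4.2 (15.23 s to 342.64 s) gives a per-item growth of about 5.6, the first gh-88 build (7.10 s to 202.25 s) about 7.1, and 59cd4c0 (0.93 s to 3.76 s) about 1.01, which is essentially linear.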

@giuseppe force-pushed the perf-improvements branch from 59cd4c0 to 9e20d96 (July 22, 2019 21:17)

@rhatdan
Member

rhatdan commented Jul 23, 2019

LGTM
We need to get this into Rawhide first and make sure it does not break anything... Also put it up for fedora updates but make sure everyone on the team installs it. Especially QE. Need to quickly get it into CI/CD tests for podman.

@giuseppe
Member Author

@mbargull thanks a lot for your tests, I've opened a new PR that should improve the test case you've reported above: #89

For your test case, the new options should not make any difference. In particular, fsync=0 is useful when there are too many fsync(2) calls happening on the file system; it seems to make a huge difference with apt and dpkg.

Could you re-run the test with #89? In general I am curious about the previous released version of fuse-overlayfs vs the version in #89.

@giuseppe
Member Author

We need to get this into Rawhide first and make sure it does not break anything... Also put it up for fedora updates but make sure everyone on the team installs it. Especially QE. Need to quickly get it into CI/CD tests for podman.

I'll do that once https://bodhi.fedoraproject.org/updates/FEDORA-2019-b156bd756a is finally moved to stable and #89 is also merged

rhatdan added a commit that referenced this pull request Jul 24, 2019
fuse-overlayfs: different performance tweaks continuation of #88
@mbargull

@mbargull thanks a lot for your tests, I've opened a new PR that should improve the test case you've reported above: #89

With pleasure! Thanks for your work on this. I'll report in gh-91, comparing release 0.4.2, gh-88, gh-89, and gh-91.

For your test case, the new options should not make any difference, in particular fsync=0 is useful when there are too many fsync(2) happening on the file system

Right, makes sense, the mkdir/rm test case wouldn't necessarily sync, thanks.

@giuseppe
Member Author

Sorry I have just opened a new PR. Could you just use that? The intermediate ones don't make much sense to be tested separately

@mbargull

Ah, didn't see comment. Will do, gimme a couple of minutes.
