Sharing memory between containers #996
I am trying to make progress on my side on this topic. I am now quite sure the limitation does not come from the kernel or the backing filesystem: a reflink copy (provided by some filesystems like XFS) has no chance of working with mmap, because the two files keep distinct inodes and the page cache therefore holds separate copies of their pages. So I have turned my hopes to overlayfs, which should be geared toward this kind of optimization, and I am digging into it by testing with podman and reading the code of containers/storage. Back to my use case:
The FS diff containing the output.dat file, sha256:62b02674a316aa00ca3e17fe18af907dffc03a4d082b749e15c159418be1ed8f, is shared between two layers. This is confirmed by looking in podman's graphroot.
Let's confirm this:
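For instance (PIDs are placeholders for the two container processes, and the graphroot path assumes a rootless podman setup):

```sh
# The copy of output.dat sitting in the layer's diff directory:
stat -c 'dev=%d ino=%i %n' \
  ~/.local/share/containers/storage/overlay/*/diff/output.dat

# The same file as seen through each container's overlay mount:
stat -c 'dev=%d ino=%i %n' \
  /proc/<pid1>/root/output.dat \
  /proc/<pid2>/root/output.dat
```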
We can see the inode numbers match, which is what I expected from overlay. But here we have another problem: the device IDs are not the same, probably because of the two containers' different mount points (tell me if I am wrong). Maybe we could devise a solution so that the layers share the FS diff, by adding some indirection at the layer level. What are your views on this?
In order to make progress, I was thinking of the following small change. We could add a […]. Then, in each layer composing an image, the directory […]. In the case of the writable top layer of a container, the […]. With this change, all the shareable filesystem diffs (basically, all the read-only layers composing an image) could be reused by completely different images, which could reduce the memory footprint of containers running on the same host. Do you see any drawbacks to this kind of modification?
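As a rough sketch of the layout I have in mind (every name here is hypothetical, not an actual containers/storage path):

```sh
# Hypothetical: one content-addressed store for all shareable diffs...
#   $graphroot/overlay/diffs/<diff-digest>/      the actual diff content
#
# ...and, for each read-only layer, an indirection instead of a private copy:
#   $graphroot/overlay/<layer-id>/diff -> ../diffs/<diff-digest>
#
# The writable top layer of a container keeps a regular, private diff
# directory, since its content is not shareable anyway.
```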
@giuseppe Thanks so much for your answer. Indeed, #995 looks promising. Locally, in my podman graphroot directory, I turned one output.dat into a hard link to the output.dat of another layer, and the memory is, as expected, shared between the containers. I saw that your PR comes from a branch named "zstd-chunked-hard-links-dedup"; is it related to the compression algorithm of the layers?
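Concretely, the manual change was along these lines (layer IDs are placeholders):

```sh
cd ~/.local/share/containers/storage/overlay
# Replace one copy with a hard link to the other; both layers then
# expose the very same inode to their overlay mounts:
ln -f <layerA>/diff/output.dat <layerB>/diff/output.dat
stat -c 'ino=%i links=%h %n' \
  <layerA>/diff/output.dat <layerB>/diff/output.dat
```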
@giuseppe Actually, the reason I proposed a separate directory containing all the filesystem diffs is that the Docker documentation (https://docs.docker.com/storage/storagedriver/overlayfs-driver/#how-the-overlay-driver-works) says:
But to be honest, I don't really understand the "excessive use of inodes" when using hard links. If anything, what I proposed should reduce the number of inodes used.
That feature is related to the zstd:chunked work I was doing. I don't think there is an "excessive use of inodes" problem when using hard links; there are other issues, though, so IMO hard links must be a last resort. What timezone are you in? Do you think it would be helpful to have a call to discuss the issue you are having?
Thanks for your answer.
@giuseppe as discussed this morning, I manually did what I described in my proposal directly in the podman graphroot.
I moved one of the diffs into a shared directory.
Now the two layers look like this:
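Roughly like this (layer IDs shortened; the shared directory name is one I made up for the experiment):

```sh
ls -l ~/.local/share/containers/storage/overlay/bde06f8a7e*/diff \
      ~/.local/share/containers/storage/overlay/de3ea9ea6*/diff
# Both diffs now resolve to the same shared directory, e.g.:
#   .../bde06f8a7e.../diff -> ../shared/62b02674a316.../
#   .../de3ea9ea6.../diff  -> ../shared/62b02674a316.../
```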
I spawned two containers from the two different images:
As we can see here, now the memory is shared:
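The check itself is of this kind (<pid1> and <pid2> are placeholders for the two container processes; the Pss values of the two mappings now add up to a single copy of the file):

```sh
# Pss for the output.dat mapping, read straight from smaps:
for pid in <pid1> <pid2>; do
  grep -A5 output.dat "/proc/$pid/smaps" | grep -E 'output.dat|^Pss:'
done
```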
So with no change in podman itself, this change in the storage layout fixes my case.
Sorry for the delay. In your example, bde06f8a7e and de3ea9ea6 will both point to the same data. How do you pick which layers can be deduplicated with this mechanism?
I think this problem is fixed upstream with the hard link deduplication feature we have with […]
Hello,
I am trying to figure out whether it is possible to share loaded shared libraries between containers. The rationale is that if I build an image twice (without using the cache, i.e. with --no-cache), and provided I take care to create reproducible layer FS diffs, the instantiated containers should not use twice the amount of memory.

To make it more explicit, and without talking about shared libraries, let's have a look at the following image:
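Something along these lines (a sketch rather than the exact image; dd from /dev/zero keeps the file content deterministic across builds):

```sh
podman build -t test:1 -f - . <<'EOF'
FROM alpine
# A large file with deterministic content, so that rebuilding
# produces the same layer diff:
RUN dd if=/dev/zero of=/output.dat bs=1M count=512
# Placeholder: the real image runs a process that mmaps /output.dat.
CMD ["sleep", "infinity"]
EOF
```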
Running a container from the described image, I can see the output.dat file is mapped once in memory. The Proportional Set Size matches the size of the mapped file (see the Pss column).

Now let's run the same image a second time:
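The measurement is of this kind (container names are placeholders; pmap -X prints a Pss column per mapping, assuming the container's process mmaps the file):

```sh
podman run -d --name one test:1
pmap -X "$(podman inspect -f '{{.State.Pid}}' one)" | grep output.dat
# -> Pss roughly equals the file size

podman run -d --name two test:1
pmap -X "$(podman inspect -f '{{.State.Pid}}' one)" | grep output.dat
# -> Pss drops to about half once the second container faults in
#    the same pages
```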
The PSS is now divided by two, meaning the file is mapped once and shared by the two containers. This is expected because I use the "overlay" storage driver (it would not have worked with "vfs" backed by "ext4", for instance, because of the lack of reflink support).
Now let's build another image without using the cache. The way output.dat is created results in the same layer diff, as we can see below:
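For example, comparing the layer digests of the two images (image names are placeholders):

```sh
podman image inspect --format '{{.RootFS.Layers}}' test:1
podman image inspect --format '{{.RootFS.Layers}}' test:2
# The last digest is identical in both lists:
#   sha256:62b02674a316aa00ca3e17fe18af907dffc03a4d082b749e15c159418be1ed8f
```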
The last layer, sha256:62b02674a316aa00ca3e17fe18af907dffc03a4d082b749e15c159418be1ed8f, is the same.

If I run a container from this latest image, the PSS of output.dat for the first container I ran does not decrease, because the file does not have the same device ID/inode. This makes sense to me when the backing filesystem does not support reflink, like ext4. So I tried on XFS with reflink enabled, but I got the same result. I would have expected an improvement here, because the mmap'ed file is in fact the same data when one is a reflink of the other.
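For reference, this is roughly how I set up and sanity-checked the XFS side (device and file names are placeholders):

```sh
# Reflink support must be enabled at mkfs time (it is the default
# with recent xfsprogs):
mkfs.xfs -m reflink=1 /dev/<device>
mount /dev/<device> /var/lib/containers/storage

# Sanity check: a reflink copy only succeeds if the FS supports it...
cd /var/lib/containers/storage
dd if=/dev/zero of=a.dat bs=1M count=16
cp --reflink=always a.dat b.dat
# ...but the copy still gets its own inode, so mmap'ed pages of
# a.dat and b.dat are cached separately:
stat -c 'ino=%i %n' a.dat b.dat
```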
So my questions:
System information: