Rsync hard-links to save space #102
Sounds like a good idea to me, but I'm not familiar with this backend.
Thanks for this @art-w! With @talex5's help I wrote the rsync backend (mainly for its convenience and also for the macOS port of obuilder).
Yep, that is a correct assumption (or at least should be!). Anything in
So, from a Linux perspective, I think this change is good. It does come at the added cost of doing a copy rather than a rename, but I think the potential disk space saving is worth it. The rsync backend isn't supposed to be fast. I'll follow up again soon after I rebase and try this with the experimental macOS port (see #87).
Thanks, your explanations do match my intuition! Yes the original (Out of curiosity, I'm going to run some tests without
Thanks for the PR @art-w. I've tried this with quite a few examples on Linux and also on the macOS PR (see this branch which has that work rebased on this PR) and everything seems to be working just fine.
I did a small, not particularly scientific, test on the performance impact using this obuilder spec file:
((from ocaml/opam@sha256:dca2be63c27c3860560bd70001f94c39c32e8a22fdc270e5e77d297b665c871f)
(run (shell "echo 1"))
(run (shell "echo 2"))
(run (shell "echo 3")))
I ran the spec file once with only the from section to get the image untarred into the build cache, and then compared running the three echoes with the proposed changes and with master. On the machine I was using, your PR took 4m20.703s and master took 1m36.558s.
There's a tradeoff here (as per usual) between space and time and I was thinking maybe it would be better to let the user decide whether or not to use hardlinks? I think an rsync store that has sometimes used hardlinks and other times hasn't would still be a working store although I'm not sure (I'm thinking here if you forgot to add the --use-hardlinks flag or whatever way we could expose that in the CLI). What do you think?
Thanks for testing it out on another platform! I like the idea of letting the user choose the tradeoff so I added the corresponding CLI flag: by default it keeps the original On your example, I get 7.9G in 3m25s for
Thanks @art-w, I'm happy with these changes and it's a nice addition I think :))
CHANGES:
- Add --fuse-path to allow selection of the path redirected by FUSE (@mtelvers ocurrent/obuilder#128, reviewed by @MisterDA)
- Pre-requisites for Windows support using docker for Windows (@MisterDA ocurrent/obuilder#116, reviewed by @tmcgilchrist)
- Additional tests and prerequisites for Windows support (@MisterDA ocurrent/obuilder#130, reviewed by @tmcgilchrist)
- Add support for Docker/Windows spec (@MisterDA ocurrent/obuilder#117, reviewed by @tmcgilchrist)
- Depend on Lwt.5.6.1 for bugfixes (@MisterDA ocurrent/obuilder#108, reviewed by @tmcgilchrist)
- Add macOS support (@patricoferris ocurrent/obuilder#87, reviewed by @tmcgilchrist @talex5 @kit-ty-kate)
- Enable macOS tests only on macOS (@MisterDA ocurrent/obuilder#126, reviewed by @tmcgilchrist)
- Dune 3.0 generates empty intf for executables (@MisterDA ocurrent/obuilder#111, reviewed by @talex5)
- Fix warnings and CI failure (@MisterDA ocurrent/obuilder#110, reviewed by @talex5)
- Expose store root and cmdliner term with non-required store (@MisterDA ocurrent/obuilder#119, reviewed by @tmcgilchrist)
- Expose Rsync_store module (@MisterDA ocurrent/obuilder#114, reviewed by @talex5)
- Rsync hard-links to save space (@art-w ocurrent/obuilder#102, reviewed by @patricoferris)
Please note that I have NO idea what I'm doing: I'm working under the assumption that files are never modified in place in the store but always copied elsewhere first. If that ain't the case, well, please ignore and close this PR harshly!
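To make that assumption concrete, here is a minimal shell sketch (not taken from obuilder; the file names are made up for the demo) of why hard-linked files are only safe when nothing writes to them in place. A write through one name shows up under the other, whereas copy-then-rename leaves the other name untouched.

```shell
#!/bin/sh
# Illustration only: "original" and "linked" are hypothetical file names.
set -eu
work=$(mktemp -d)

echo "v1" > "$work/original"            # stands in for a file in an earlier snapshot
ln "$work/original" "$work/linked"      # hard link: a second name for the same inode

# An in-place write (append) through one name is visible through the other.
echo "v2" >> "$work/linked"
in_place=$(cat "$work/original")

# Copy-then-rename gives "linked" a fresh inode, so "original" is left alone.
cp "$work/linked" "$work/linked.tmp"
echo "v3" >> "$work/linked.tmp"
mv "$work/linked.tmp" "$work/linked"
after_rename=$(cat "$work/original")

echo "original after in-place write through the link:"
echo "$in_place"
echo "original after copy-then-rename of the link:"
echo "$after_rename"
```

If the store only ever replaces files the second way, sharing inodes between snapshots does not let one snapshot corrupt another.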
I was hoping to save a bit of disk space in the obuilder store when using rsync:
Here ce0813 was created from 343861 by running sudo ln -f /usr/bin/opam-2.0 /usr/bin/opam. As a full build involves a dozen steps, the copy-everything is eating my disk alive... But by asking nicely, rsync could observe that files from ce0813 are identical to those in 343861 and create hard links to the originals rather than a real copy. This is obviously wrong if either can be updated in place later!
Regarding the rsync arguments:
- --link-dest= is the path in which the original files will be discovered (and hard-linked to). When this argument is a relative path, it is interpreted as relative to the dst directory (which would be plain wrong here!)... Hopefully the paths are always absolute, hence the cmdliner quick fix to enforce it. I tried relative paths for the rsync store to see if I was breaking existing functionality, but it was already crashing runc because it led to the wrong store path.
  => I'm not sure if the btrfs backend has the same limitation? (the doc seems to imply so)
- --checksum may not be 100% required, but otherwise rsync might hard-link files even though they could be different as it only checks the filename, file size, modification dates, ... and not the actual content.
Anyway, the result makes me sad. Hard-links are correctly created for files, but not directories: (because "stuff tends to break when your fs is not a tree")
"Everything is a file, but some files are more files than others."
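For readers unfamiliar with the two rsync options discussed above, the following is a small sketch of how --link-dest and --checksum behave on a throwaway directory. The names parent, scratch and child are hypothetical and do not reflect obuilder's real store layout, and the -a flag is an assumption of this sketch rather than something stated in the PR. Absolute paths are used throughout, since a relative --link-dest is resolved against the destination directory.

```shell
#!/bin/sh
# Sketch only: "parent", "scratch" and "child" are made-up snapshot names.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed, skipping"; exit 0; }

store=$(mktemp -d)                       # mktemp -d returns an absolute path
mkdir -p "$store/parent/bin"
echo "unchanged" > "$store/parent/bin/tool"

# Working copy of the parent (attributes preserved), then one new file on top.
rsync -a "$store/parent/" "$store/scratch/"
echo "added by this step" > "$store/scratch/bin/new"

# Save the result: files identical to the parent become hard links into it,
# everything else is copied for real. --checksum compares file contents
# instead of relying only on size and modification time.
rsync -a --checksum --link-dest="$store/parent/" "$store/scratch/" "$store/child/"

if [ "$store/child/bin/tool" -ef "$store/parent/bin/tool" ]; then
  echo "bin/tool shares an inode with the parent snapshot"
fi
if [ ! "$store/child/bin/new" -ef "$store/scratch/bin/new" ]; then
  echo "bin/new was copied as a separate file"
fi
```

Directories are always recreated rather than linked, which matches the observation above that only regular files end up shared.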