Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macOS support #87

Closed
wants to merge 10 commits into from
Closed

Conversation

patricoferris
Copy link
Contributor

@patricoferris patricoferris commented Nov 9, 2021

This PR is a draft for initial macOS support using users as sandboxes. This PR contains:

  • A new macos_sandbox.ml implementation which generates a new user and executes commands as the user, note at this time obuilder only expects a single user to be run at a time.
  • A new rsync storage backend -- this is for easier testing (especially on macOS where ZFS is only available using kernel extensions)

This PR is a re-branding of this branch that was proving to difficult to rebase and also the commit history was a bit wonky. To get in this "working" state I had a lot of help from @kit-ty-kate and @talex5, thanks!

Questions

  1. The rsync backend is almost a complete copy of the btrfs backend which has a result and result-tmp directory and can quickly copy from one to the other with snapshotting. ZFS (due to limitations) builds directly in the result directory. I'm wondering if rsync should do the same which should make it significantly faster.
  2. We should probably sandbox-exec everything, not sure how nicely that plays when nested (when opam invokes it).
  3. Originally, we were going to choose the sandboxing environment at build time by choosing an implementation based on the OS but given the Docker backend could in theory run on Linux, Windows and macOS (with a bit of tweaking) I'm not sure if we still want to go with this approach ?
  4. The User_temp.ml could probably just be removed now, everything can use the "Docker fetcher" or would it be good to support a file-path based fetch (not limited to /Users) just some tar:/path/to/tar, it can be useful on macOS for testing but not strictly necessary ?

TODOs

  • Test this newer branch more thoroughly by submitting jobs to it.
    • Currently being tested as mac-cat-2 in opam-repo-ci and running a opam-health-check with the latest version of all packages too.
  • Work out the docker image story.
  • Add the todos from Add macOS support opam-repo-ci#116
    • static home directory
    • Check caching works properly
  • Investigate ZFS issues on Linux and macOS -- recently the Github actions seem a bit flaky (see Temporarily pin tar-unix #85). I noticed problems on macOS to and resulted to forcefully unmounting datasets in the original branch, but it would be good to actually fix the problem rather than do that. (Fix ZFS tests in CI #91)

@kit-ty-kate
Copy link
Contributor

* Investigate ZFS issues on Linux and macOS

Do we really need to investigate if the rsync backend works fine or is zfs still used even with rsync?

@patricoferris
Copy link
Contributor Author

patricoferris commented Nov 9, 2021

Do we really need to investigate if the rsync backend works fine or is zfs still used even with rsync?

I was going to try and show using the Github runner that it is quite a bit slower but looks like it ran out of memory or something and I can't cancel the job :( Anyway, that's the main problem, rsync is slow. I used WSL to check the linux version worked and the stress test there says:

Ran 100 jobs (max 10 at once). 0 failures. Took 227.5 s (0.4 jobs/s)

I can't imagine that's all WSL/virtualisation's fault :/

When I get a chance I'll see if I can port the stress test to macOS and then do a head to head with ZFS maybe.

EDIT: Building directly in the directory gives

Ran 100 jobs (max 10 at once). 0 failures. Took 143.8 s (0.7 jobs/s)

Copy link
Contributor

@talex5 talex5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable. Might be worth separating the rsync stuff out into a separate PR to make reviewing easier, but since that and the macos support is experimental it's not a big problem.

lib/macos_sandbox.mli Outdated Show resolved Hide resolved
lib/os.ml Outdated Show resolved Hide resolved
lib/os.ml Outdated
let pp s ppf = Fmt.pf ppf "[ %s ]" s in
if String.length pid = 0 then (Log.warn (fun f -> f "Empty PID"); Lwt.return ())
else begin
let delete = ["kill"; "-9"; pid ] in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Linux, kill says:

Negative PID values may be used to choose whole process groups; see the PGID column in ps command output.

Is that useful on macos too?

There's also setsid(1). I don't know much about this stuff, but I think it's the old POSIX way of managing groups of processes from before Linux had containers.

lib/os.ml Outdated Show resolved Hide resolved
lib/rsync_store.ml Outdated Show resolved Hide resolved
lib/rsync_store.ml Outdated Show resolved Hide resolved
let gen = cache.gen in
Rsync.copy_children ~src:snapshot ~dst:tmp >>= fun () ->
let { Obuilder_spec.uid; gid } = user in
Os.sudo ["chown"; Printf.sprintf "%d:%d" uid gid; tmp] >>= fun () ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably be faster to do rsync --chown above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change went into #88 but I've just realised that macOS ships with a much older version of rsync that doesn't support --chown. There are two options:

  • Go back to this implementation so Obuilder will work with the default rsync macOS comes with.
  • Force people to download a newer version -- note this is not necessarily super straightforward because brew install rsync will put it in /usr/local and won't be available when obuilderfs is active, so the user will probably have to build from source and put the binary and libs elsewhere.

Any thoughts ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess we can revert that then. I'm not used to things being older than debian/stable! (and, indeed, older than debian/oldstable too!!)

Might be a performance hit, though. I remember that running sudo chown in Dockerfiles is massively slower than setting the ownership in the COPY.

lib/rsync_store.ml Outdated Show resolved Hide resolved
@MisterDA
Copy link
Contributor

MisterDA commented Nov 10, 2021

Originally, we were going to choose the sandboxing environment at build time by choosing an implementation based on the OS but given the Docker backend could in theory run on Linux, Windows and macOS (with a bit of tweaking) I'm not sure if we still want to go with this approach ?

This I have "solved" by having a Builder module using the usual Store and Sandbox modules, and a custom Docker_builder module. Two mutually-exclusive flags, --store=STORE and --docker-backend=PATH select respectively the Builder module or the Docker_builder module.

@patricoferris
Copy link
Contributor Author

Thanks for the review @talex5 :)) I will investigate a cleaner kill solution based on your suggestions :)

@kit-ty-kate
Copy link
Contributor

We should probably sandbox-exec everything, not sure how nicely that plays when nested (when opam invokes it).

Nested sandbox-exec is not permitted. When tried it returns:

sandbox-exec: sandbox_apply: Operation not permitted

It could be a good idea to enable sandboxing by default and to have some kind of parameter to disable it, e.g.

(run (sandbox disabled) (shell "opam install pkg")) # disabled for opam (allows us to test the opam sandbox)
(run (shell "dune build -p pkg")) # enabled by default for all other programs

@patricoferris
Copy link
Contributor Author

ocurrent/ocluster#152 is now in opam-repo-ci as mac-cat-2 to start testing this PR more thoroughly (note we're still using the User_temp fetcher and not Docker for the moment).

@@ -125,7 +125,9 @@ let cache ~user t name =
(* Create writeable clone. *)
let gen = cache.gen in
let { Obuilder_spec.uid; gid } = user in
Rsync.copy_children ~chown:(Printf.sprintf "%d:%d" uid gid) ~src:snapshot ~dst:tmp () >>= fun () ->
(* rsync --chown not supported by the rsync that macOS ships with *)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t rsync be using -a (which includes -o and -g, aka. preserve owner/group) instead of plain rsync + chown? It should be more efficient.

Or are there some issues with owner/group needing changing between 2 snapshots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default all rsync commands have -aHq

let rsync = [ "rsync"; "-aHq" ]

The extra chown iiuc is for differences between uids/gids when restoring caches. It is not used in macOS where we ignore the config.user because we're using users to build, unlike in runc where it could be anyone

let user =
let { Obuilder_spec.uid; gid } = user in
`Assoc [
"uid", `Int uid;
"gid", `Int gid;
]

@patricoferris
Copy link
Contributor Author

Just adding an update here that this PR now uses "a static home directory", this is needed to avoid relocation errors (the OCaml compiler is not relocatable, nor are quite a few packages). This comes at the expense of 2 additional copies from and to the static home directory. A typical build step is now:

  • Store clone: /<store>/result/<hash> -> /<store>/result-tmp/<new-hash>
  • Delete contents of /Users/<static-home-dir-unique-to-uid>
  • Rsync: /<store>/result-tmp/<new-hash>/ -> /Users/<static-home-dir-unique-to-uid>
  • Build using /Users/<static-home-dir-unique-to-uid>
  • Rsync: /Users/<static-home-dir-unique-to-uid>/ -> /<store>/result-tmp/<new-hash>
  • Store snapshot

The extra copies do slow down each build step. An alternative solution that could be explored would be to bindfs the static home directory to the store. Obuilderfs is largely based on bindfs, except it also has the added cost of checking the calling uid and redirecting based on that via the "scoreboard", which we don't need for this particular redirection. It would add yet another fuse filesystem though, so it's a tradeoff between the speed of copying vs. the speed of having all FS actions go through userspace.

@tmcgilchrist tmcgilchrist mentioned this pull request Oct 10, 2022
@patricoferris
Copy link
Contributor Author

Closing in favour of #122

tmcgilchrist added a commit to tmcgilchrist/opam-repository that referenced this pull request Nov 7, 2022
CHANGES:

- Add --fuse-path to allow selection of the path redirected by FUSE (@mtelvers ocurrent/obuilder#128, reviewed by @MisterDA )
- Pre-requisites for Windows support using docker for Windows (@MisterDA ocurrent/obuilder#116, reviewed by @tmcgilchrist)
- Additional tests and prerequistes for Windows support (@MisterDA ocurrent/obuilder#130, reviewed by @tmcgilchrist)
- Add support for Docker/Windows spec (@MisterDA ocurrent/obuilder#117, reviewed by @tmcgilchrist)
- Depend on Lwt.5.6.1 for bugfixes (@MisterDA ocurrent/obuilder#108, reviewed by @tmcgilchrist)

- Add macOS support (@patricoferris ocurrent/obuilder#87, reviewed by @tmcgilchrist @talex5 @kit-ty-kate)
- Enable macOS tests only on macOS (@MisterDA ocurrent/obuilder#126, reviewed by @tmcgilchrist)
- Dune 3.0 generates empty intf for executables (@MisterDA ocurrent/obuilder#111, reviewed by @talex5)
- Fix warnings and CI failure (@MisterDA ocurrent/obuilder#110, reviewed by @talex5)

- Expose store root and cmdliner term with non-required store (@MisterDA ocurrent/obuilder#119, reviewed by @tmcgilchrist)
- Expose Rsync_store module (@MisterDA ocurrent/obuilder#114, reviewed by @talex5)
- Rsync hard-links to save space (@art-w ocurrent/obuilder#102, reviewed by @patricoferris)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants