-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose File.stat and improve Path.load #339
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this does need improving.
I'm not sure size
is the right method, though. Perhaps we need a more general stat
operation?
The Linux statx
system call takes a mask saying what information you want. Maybe we could do something like that?
Let me know what you think, I've added two versions of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem with using read_exact is that if the file shrinks mid-read we just throw an EOF
OK, that seems like a good reason to use the Buf_read
!
the second issue is that due to the integer overflow in luv, read_exact would end up always giving an EOF.
If the low-level read operation doesn't work correctly, then neither will Buf_read built on top of it. Eio_luv
should just limit the buffer size to something it can handle. Callers need to handle short reads anyway (and I doubt to OS is going to write more than 2GB in one go anyway).
I've fixed the Uring fstat to use the Large version as well.
Thanks!
If you don't want to expose size as you mentioned before, we have to decide what to put and how to export stat, since we would need to see what Luv offers in windows and so on.
Yes, it needs a bit of thinking about. But I'd prefer to avoid adding a new method that everyone has to implement, when we know we're going to get rid of it again soon.
e2f38d3
to
746db28
Compare
This is ready for review, let me know what you think. |
746db28
to
1b6754e
Compare
lib_eio/path.ml
Outdated
Buf_read.take_all buf | ||
let load ((t:#Fs.dir), path) = | ||
let st = t#stat path in | ||
let mymax = min Sys.max_string_length Sys.max_array_length in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why max_array_length
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we have to store it in a Cstruct.t before the converting it to a string, and max_array_length < max_string_length, at least on amd64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, that doesn't apply to bigarrays like cstruct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it occurs to me that, since we're opening the file anyway, it would be simpler to use File.stat
here, not Path.stat
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check and bigarray is constrained by the that limit, but I might have been mistaken I'll check again.
I moved it to File
, problem is I can't call it a unix domain socket, I've added the method to File.ro
, to have it in socket it would have to be deeper, I'm not sure what is preferred here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check and bigarray is constrained by the that limit, but I might have been mistaken I'll check again.
Could you post a link to the code. It seems very surprising to me that a "big array" is the same size as a normal one.
In particular, https://v2.ocaml.org/api/Bigarray.html says:
Bigarrays are not limited in size, unlike OCaml arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did check and bigarray is constrained by the that limit, but I might have been mistaken I'll check again.
Could you post a link to the code. It seems very surprising to me that a "big array" is the same size as a normal one.
In particular, https://v2.ocaml.org/api/Bigarray.html says:
Bigarrays are not limited in size, unlike OCaml arrays.
You're absolutely correct, I think my confusion came from the fact that Bigarray was using Array underneath and I assumed the same, but it's not:
caml_make_vect
(Array.create) checks for Max_wosize
where Bigarray's caml_ba_create
does not. I've removed the test.
@@ -495,3 +499,25 @@ Unconfined: | |||
+rename <fs:../foo> to <fs:foo> -> ok | |||
- : unit = () | |||
``` | |||
|
|||
# Stat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With make test_luv
, I get:
+|Exception: Eio.Fs.Not_found ("stat_link", _)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oopsy, will fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh, File.realpath
resolves the symlink and returns ENOENT, fix is not trivial, but since we don't do it in u-ring and therefore behaves differently, this is a bug. We're not using realpath(3) in Uring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should just do the realpath
on the parent, then? It's fine to lstat a symlink even if we don't have access to its target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's smart, I hadn't considered it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't address the realpath
on the parent here, it's outside of the scope imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now Path.stat
is gone - we can worry about that later :-) PR title needs updating too...
This is a bit in WIP, I think I've addressed all points but I'm not too sure about File.stat |
(** Tranditional Unix permissions. *) | ||
module Unix_perm = struct | ||
type t = int | ||
(** This is the same as {!Unix.file_perm}, but avoids a dependency on [Unix]. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not so keen on this in the portable eio
core, as we may just have very simple filesystem backends. It would be nice to not have to dummy out Stat.t
with fake values for uid/gid, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this was in Path
, Path
is already part of the portable core, or did I misunderstand something ?
I'm unsure where to put the Stat, I was thinking on adding a new effect on Eio_unix.Private
:
| Stat : <unix_fd> -> Stat.t Effect.t
Does that make sense, technically all we need is a fd, or should I add to Eio_unix.unix_fd
directly ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no problem with having the concept of "Unix permission" itself in the core - even non-Unix systems will often want to access Unix filesystems, tarballs, etc.
A more interesting question is whether we should add type parameters to dir
to cover different types of file-system. e.g. our dir
could become Unix_perm.t dir
. However, that could be annoying, and any FS that can be mounted on Linux already has some suitable mapping.
Linux's statx(2)
already allows you to do a stat that might or might not return some values (e.g. you can say you don't care about the UID and it is free to leave that field blank). So ideally we want a way to indicate which fields are provided even there.
57fc678
to
f487afd
Compare
Had to step back a bit on this diff, I've rebased and force-pushed, from the new commit message:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apologize for wasting everyone's time and radically changing it, but the whole luv bits were too insane for my taste, see details below.
Sounds good. Do we really need to add it to the socket types, though? It doesn't seem useful for anything.
Frankly, can't see much uses, and can only remember one time I relied on it, I've added it more for the sake of correctness/completeness, and to use it as an excuse to learn some ocaml OO. I can gut it if it's the case. |
Yes please. It will be a pain for e.g. Mirage having to support fstat. |
Ack, leme know if this is ok and then I'll squash and merge. |
lib_eio/path.ml
Outdated
let load ((t:#Fs.dir), path) = | ||
with_open_in (t, path) @@ fun flow -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let load ((t:#Fs.dir), path) = | |
with_open_in (t, path) @@ fun flow -> | |
let load t = | |
with_open_in t @@ fun flow -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't resolved. Keeping it as one will also fix the problem that the string in the error loses the dir tag, since you can then use Path.pp
.
(we might need a bit of variable renaming in this module; lots of functions here are using t
for the directory and path
for a full path, which is now a bit backwards)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused, we don't really include the dir in the error string for any of the Eio.Fs exceptions:
let open_in ~sw ((t:#Fs.dir), path) = t#open_in ~sw path
method open_in ~sw path =
let fd = File.open_ ~sw (self#resolve path) [`NOFOLLOW; `RDONLY] |> or_raise_path path in
(flow fd :> <Eio.File.ro; Eio.Flow.close>) ^^^^
utop # let _ = Eio_luv.run (fun env -> let cwd = Eio.Stdenv.cwd env in Eio.Switch.run @@ fun sw -> Eio.Path.load Eio.Path.(cwd / "not_there"));;
Exception: Eio__Fs.Not_found ("not_there", Eio_luv.Luv_error 877502405).
let _ = Eio_linux.run (fun env -> let cwd = Eio.Stdenv.cwd env in Eio.Switch.run @@ fun sw -> Eio.Path.load Eio.Path.(cwd / "not_there"));;
Exception:
Eio__Fs.Not_found ("not_there", Unix.Unix_error (Unix.ENOENT, "openat2", "")).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's sort that out later then! It just seemed a shame to be splitting the path up only for the purpose of producing a slightly less informative error message.
@@ -495,3 +499,25 @@ Unconfined: | |||
+rename <fs:../foo> to <fs:foo> -> ok | |||
- : unit = () | |||
``` | |||
|
|||
# Stat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we should just do the realpath
on the parent, then? It's fine to lstat a symlink even if we don't have access to its target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good now! (Would be nice to get the dir tag in the error message though). Can be merged once rebased / squashed.
@@ -495,3 +499,25 @@ Unconfined: | |||
+rename <fs:../foo> to <fs:foo> -> ok | |||
- : unit = () | |||
``` | |||
|
|||
# Stat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, now Path.stat
is gone - we can worry about that later :-) PR title needs updating too...
Some notes: - Use Unix.Largestat instead of Unix.stat - Add a portable stat for the user - Moved Path.Unix_perm into File.Unix_perm to avoid cyclical ref A previous attempt of this diff tried to make proper use of Luv.File.Stat but there were multiple issues: `mode` was not properly exported so I had to add a discovery.ml; it only works for a subset of file types and we had to manually convert timestamps to float; all doable but an insuferable amount of code to end up with less functionality than just calling into Unix.Largestat.stat. There are still some kinks: - Luv will always follow a link, so we can't really stat on a symbolic link, this is because we rely on realpath(3) which resolves the link. - We can't open named pipes in with luv atm, so we also can't stat it (ocaml-multicore#357). - I did test all filemodes, but didn't add all to the test since they are impractical (opening /dev/tty for Character_special...) -- Path.load was doubling the buffer and trying to fit enough until it saw an EOF. This would cause many unecessary large allocations and my poor 16GB ram laptop couldn't even load a 5GB file without getting OOMed. This diff makes load create an initial buffer of the size of the file and load exactly that amount of bytes in it.
74280aa
to
b7f87f1
Compare
CHANGES: API changes: - Unify IO errors as `Eio.Io` (@talex5 ocaml-multicore/eio#378). This makes it easy to catch and log all IO errors if desired. The exception payload gives the type and can be used for matching specific errors. It also allows attaching extra information to exceptions, and various functions were updated to do this. - Add `Time.Mono` for monotonic clocks (@bikallem @talex5 ocaml-multicore/eio#338). Using the system clock for timeouts, etc can fail if the system time is changed during the wait. - Allow datagram sockets to be created without a source address (@bikallem @haesbaert ocaml-multicore/eio#360). The kernel will allocate an address in this case. You can also now control the `reuse_addr` and `reuse_port` options. - Add `File.stat` and improve `Path.load` (@haesbaert @talex5 ocaml-multicore/eio#339). `Path.load` now uses the file size as the initial buffer size. - Add `Eio_unix.pipe` (@patricoferris ocaml-multicore/eio#350). This replaces `Eio_linux.pipe`. - Avoid short reads from `getrandom(2)` (@haesbaert ocaml-multicore/eio#344). Guards against buggy user code that might not handle this correctly. - Rename `Flow.read` to `Flow.single_read` (@talex5 ocaml-multicore/eio#353). This is a low-level function and it is easy to use it incorrectly by ignoring the possibility of short reads. Bug fixes: - Eio_luv: Fix non-tail-recursive continue (@talex5 ocaml-multicore/eio#378). Affects the `Socket_of_fd` and `Socketpair` effects. - Eio_linux: UDP sockets were not created close-on-exec (@talex5 ocaml-multicore/eio#360). - Eio_linux: work around io_uring non-blocking bug (@haesbaert ocaml-multicore/eio#327 ocaml-multicore/eio#355). The proper fix should be in Linux 6.1. - `Eio_mock.Backend`: preserve backtraces from `main` (@talex5 ocaml-multicore/eio#349). - Don't lose backtrace in `Switch.run_internal` (@talex5 ocaml-multicore/eio#369). Documentation: - Use a proper HTTP response in the README example (@talex5 ocaml-multicore/eio#377). - Document that read_dir excludes "." and ".." (@talex5 ocaml-multicore/eio#379). - Warn about both operations succeeding in `Fiber.first` (@talex5 ocaml-multicore/eio#358, reported by @iitalics). - Update README for OCaml 5.0.0~beta2 (@talex5 ocaml-multicore/eio#375). Backend-specific changes: - Eio_luv: add low-level process support (@patricoferris ocaml-multicore/eio#359). A future release will add Eio_linux support and a cross-platform API for this. - Expose `Eio_luv.Low_level.Stream.write` (@patricoferris ocaml-multicore/eio#359). - Expose `Eio_luv.Low_level.get_loop` (@talex5 ocaml-multicore/eio#371). This is needed if you want to create resources directly and then use them with Eio_luv. - `Eio_linux.Low_level.openfile` is gone (@talex5 ocaml-multicore/eio#378). It was just left-over test code.
Path.load was doubling the buffer and trying to fit enough until it saw an EOF. This would cause many unecessary large allocations and my poor 16GB ram laptop couldn't even load a 5GB file without getting OOMed.
This diff makes load create an initial buffer of the size of the file and load exactly that amount of bytes in it.
Also, using take_all here isn't a good idea since it will resize even if initial_size (capacity) is adequate, that's because it does:
ensure t.len(0) (0+1), which ends up allocating another bigarray.