Implement FileDescriptor.Pipe()
#58
@lorentey I haven't looked into pipes too deeply, but this seems like the right direction. There are some minor details I'm not sure about, such as an initializer backed by a syscall vs a static create method (just haven't thought it through), etc. WDYT?
This would be a very welcome addition! 👍
I think I'd prefer if we went with a simple pipe() method, unless there is a good reason to have the Pipe type that I'm not seeing.
I've addressed some of @lorentey's feedback. I still think a separate structure is warranted for the reasons listed here: #58 (comment), but let me know if you disagree.
I think the "fileDescriptor" part of the name is superfluous. What if we went further with the pipe terminology and called them "inlet" and "outlet"? I.e., the descriptor used to push data into the pipe and the descriptor where data emerges from?
The more I think about this PR, the more I like having a nominal type (inside I really like @karwa's suggestion (
I changed it to inlet/outlet, great suggestion @karwa!
Naming them
let (outlet, inlet) = try FileDescriptor.pipe()
defer {
  try! outlet.close() // Closing a valid pipe never errors out.
  try! inlet.close()
}
try launchChildProcess(input: .standardInput, output: inlet) // ???
try launchChildProcess(input: outlet, output: .standardOutput) // ???
Repeating #58 (comment): The terms
let (input, output) = try FileDescriptor.pipe()
defer {
  try! input.close() // Closing a valid pipe never errors out.
  try! output.close()
}
try launchChildProcess(input: .standardInput, output: output)
try launchChildProcess(input: input, output: .standardOutput)
It's not so bad if you use them from the pipe object/tuple:
let pipe = try FileDescriptor.Pipe()
defer {
  try! pipe.close() // Closing a valid pipe never errors out.
}
try launchChildProcess(input: .standardInput, output: pipe.inlet) // Oh, ok - outputting to the pipe
try launchChildProcess(input: pipe.outlet, output: .standardOutput) // Oh, ok - reading from the pipe
But yeah, if you immediately destructure the result into two file descriptors in the calling function, those names might not make sense and you can choose to bind them to other names in your destructuring.
I don't know, Going from
To further improve on @karwa's example, you can just pass the pipe:
And Even without that,
What error? 😊 My entire point is that passing
extension FileDescriptor {
  public static func pipe() throws -> (input: Self, output: Self)
}
let pipe = try FileDescriptor.pipe()
defer {
  try! pipe.input.close()
  try! pipe.output.close()
}
// This looks obviously correct, and is actually correct. It's also eminently consistent.
try spawnChildProcess(input: .standardInput, output: pipe.output)
try spawnChildProcess(input: pipe.input, output: .standardOutput)
// This looks incorrect, but is actually correct. It's clever, but tiresome. I'll get it wrong every single time.
try spawnChildProcess(input: .standardInput, output: pipe.inlet)
try spawnChildProcess(input: pipe.outlet, output: .standardOutput)
This also rhymes with how
Yep. However, does this trivial little syscall deserve an ABI-stable named type plus a never-ending list of overloads? I am not convinced it does. Let's suppose we wanted to implement the super simple
public func spawnChildProcess(
  executable: FilePath,
  arguments: [String],
  input: FileDescriptor = .standardInput,
  output: FileDescriptor = .standardOutput,
  error: FileDescriptor = .standardError,
  environment: ProcessEnvironment? = nil
) throws -> ProcessID
(I don't think we'd want this as the primary frontend to process creation -- first and foremost, System ought to expose the full power of
How exactly would we allow people to supply
It seems we have two options:
These seem both terrible. My position is that
We don't need to overthink this.
I realize it gets a little worse, as we'd need
Practically, I don't think
let pipeline: Pipe = [
  Command("cat", "swift-system/README.md"),
  Command("tr", "[:space:]", "\n"),
  Command("sort"),
  Command("uniq", "-c"),
  Command("sort", "-nr"),
  Command("head", "-10"),
]
try await pipeline.run()
// Prints to stdout:
// 96
// 9 the
// 7 for
// 6 to
// 6 and
// 6 a
// 5 is
// 5 System
// 5 =
// 5 ##
I believe such a facility would be very desirable, but (in my view) it belongs in a package on top of System, rather than embedded in it. Whether
On the other hand, having a named
func startProcess() throws -> (pid: ProcessID, results: FileDescriptor) {
  let file = try FileDescriptor.open("/tmp/foo.txt", .readOnly)
  defer { try! file.close() }
  let pipe = try FileDescriptor.pipe()
  defer { try! pipe.output.close() } // Not both!
  var actions: Spawn.Actions = []
  actions.setStandardInput(to: file)
  // Expands to:
  actions.close(.standardInput)
  actions.duplicate(file, as: .standardInput)
  actions.close(file)
  actions.setStandardOutput(to: pipe)
  // Expands to:
  actions.close(pipe.input)
  actions.close(.standardOutput)
  actions.duplicate(pipe.output, as: .standardOutput) // Note how the terminology lines up again!
  actions.close(pipe.output)
  do {
    let pid = try ProcessID.spawn(
      path: "/usr/bin/sort",
      actions: actions,
      arguments: ["/usr/bin/sort", "-r"])
    return (pid, pipe.input)
  } catch {
    try! pipe.input.close()
    throw error
  }
}
I don't think that System must provide such action shortcuts (they may be a little too magical) -- but if y'all think this is something worth spending a named type on, I will not continue arguing against it, either. At the end of the day, it is a relatively minor thing. 😅
The dup2 example action above underscores the importance of consistent naming though -- I think
(On the other hand, I do realize I'm being a bit underhanded about naming the second component of
I'm swayed by @lorentey's argument. I have replaced the
This naming is backwards from the pipe's perspective and is confusing. I get the stdin/stdout analogy, but I do not believe it applies in this case (or, rather, it can just as easily apply in reverse).
If a "thing" (i.e. an encapsulation) has a member named input or output, it's the thing's input/output. It's not the input/output of whatever is holding the thing. Input and output are terms relative to some frame of reference and the concepts flip when viewed from the inside vs the outside.
Stdin/stdout are the input and output of the currently executing process, that's why they're globals. That is, globals are "local" members inside of the current process. Inside the process, we read from the input and write to the output, hence the names. Outside that process, we communicate with a process by writing to its input and/or reading from its output.
Even if it's just a typealias, giving it a name like Pipe is part of the source stability story. Giving it a name (even if via typealias) reifies it into a "thing" that has members.
If we're going with just a tuple, then the labels are less salient and can be ignored by someone familiar with the ordering. Otherwise they provide good documentation and I would prefer a clearer name. "Input/output" is relative to some frame of reference. man 2 pipe terminology of "read end" and "write end" is absolute, as is plumbing terminology of "inlet" and "outlet". (readEnd: FD, writeEnd: FD), (outlet: FD, inlet: FD), etc., read better to me than (input: FD, output: FD) while lacking a frame of reference.
Trouble with Tuples
As for tuples, I've usually regretted any time that I've settled on a typealias for a tuple rather than a real nominal type. It's unfortunate that real nominal types are such a pain to write in Swift.
An API pattern I've been playing around with elsewhere for such nominalized types is to have a var destructure: (fieldA: A, fieldB: B, ...), which gives you the tuple version if you really want/need it (e.g. switch expressions). This probably isn't the time to formalize this pattern in System.
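A rough sketch of what that destructure pattern could look like for a pipe. Everything here is hypothetical: Descriptor stands in for FileDescriptor so the snippet compiles on its own, and this Pipe struct is an illustration, not a type that System provides.

```swift
// Hypothetical stand-in for System's FileDescriptor, so the sketch is self-contained.
struct Descriptor: Equatable {
  var rawValue: Int32
}

// Hypothetical nominal pipe type (an assumption for illustration only).
struct Pipe {
  var readEnd: Descriptor
  var writeEnd: Descriptor

  // Hands back the tuple form for callers that want to destructure or switch.
  var destructure: (readEnd: Descriptor, writeEnd: Descriptor) {
    (readEnd, writeEnd)
  }
}

// Made-up raw values; a real pipe would get these from the pipe(2) syscall.
let examplePipe = Pipe(
  readEnd: Descriptor(rawValue: 3),
  writeEnd: Descriptor(rawValue: 4))

// Callers remain free to bind the two ends to whatever names suit them.
let (source, sink) = examplePipe.destructure
```

The nominal type keeps room for methods and protocol conformances later, while destructure preserves the convenience of a tuple at the call site.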
I want to have a more comprehensive treatment of the file descriptor type hierarchy at some point and I don't mean to balloon this PR into that. The OS concept of file descriptor has been ad-hoc overloaded and extended over the decades (always easier to awkwardly extend an existing construct than invent a new one), but we want to provide stronger/better types when we can deliver extra value in doing so. I don't think it's too crazy to make a RawRepresentable struct for pipes and a FileDescriptorProtocol or some such family of related protocols. We currently have (albeit in sketches/draft PRs) stronger types such as SocketDescriptor and KernelQueue. These have different subsets of operations that make sense on them and each has additional operations available.
IIUC, pipe.write(...) and pipe.read(...) have an obvious behavior of forwarding to the appropriate end. Closing a pipe is weird though because (IIUC) you're likely to close the write end, but the read end is still open for reading anything buffered until it gets a zero (meaning EOF). A nominal type also lets us say FileDescriptor.Pipe.open() instead of just FileDescriptor.pipe(), which helps drive some of the intuition that the result should be closed (just like FileDescriptor.open()).
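To make the closing behavior concrete, here is a small sketch that calls the C pipe, write, read, and close functions directly (not System's wrappers) on a POSIX platform. The function name demonstrateEndOfFile is made up for illustration.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Hypothetical helper: writes five bytes, closes the write end, then reads twice.
func demonstrateEndOfFile() -> (firstRead: Int, secondRead: Int) {
  var fds: [Int32] = [-1, -1]
  precondition(pipe(&fds) == 0, "pipe(2) failed")
  let readEnd = fds[0]
  let writeEnd = fds[1]

  let message = Array("hello".utf8)
  let written = write(writeEnd, message, message.count)
  precondition(written == message.count, "short write")

  // Close only the write end; the buffered bytes stay readable.
  _ = close(writeEnd)

  let capacity = 16
  var buffer = [UInt8](repeating: 0, count: capacity)
  // First read drains the buffered bytes.
  let first = read(readEnd, &buffer, capacity)
  // With every write end closed and nothing buffered, read reports 0 (EOF).
  let second = read(readEnd, &buffer, capacity)

  _ = close(readEnd)
  return (first, second)
}

let endOfFileResult = demonstrateEndOfFile()
```

This is why a single close() on a pipe value is awkward: the two ends usually have different lifetimes.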
For this PR
I'm inclined to drop the type alias for now. That at least saves the name FileDescriptor.Pipe for future nominalization. Or, we could just nominalize it now...
Co-authored-by: Michael Ilseman <[email protected]>
@GeorgeLyon sorry for the long back-and-forth. It's frustrating that such a simple API sparks so much debate and deferred API design tradeoffs. I think @lorentey and I have settled on:
func testAdHocPipe() throws {
  // Ad-hoc test testing `Pipe` functionality.
  // We cannot test `Pipe` using `MockTestCase` because it calls `pipe` with a pointer to an array local to the `Pipe`, the address of which we do not know prior to invoking `Pipe`.
We probably do (not gonna hold up this PR for it though) want to allow for mock testing of such stack local pointers. We'd probably have a variant that we'd pass in an array and it would compare the contents.
Co-authored-by: Michael Ilseman <[email protected]>
No worries! This matters, so it is design, not bike-shedding, and design is important :) I like the "readEnd", "writeEnd" nomenclature. For now I just blindly accepted the review suggestions, but I can do a second pass to make sure everything builds, etc. I'm not sure how you all handle formatting/CI etc., so let me know if there is something else I can do (or if it is simpler, feel free to just commandeer this PR).
Sources/System/FileOperations.swift
return valueOrErrno(retryOnInterrupt: false) {
  system_pipe(fds.baseAddress!)
}.map { _ in (FileDescriptor(rawValue: fds[0]), FileDescriptor(rawValue: fds[1])) }
}
You might be able to say var fds: Array<CInt> = [-1, -1] and use the implicit array-to-pointer conversion by passing &fds to the syscall (though I don't recall if it still works or has limitations). Otherwise, you can probably use withUnsafeMutablePointer to avoid the extra bind-memory step.
@lorentey suggested I use the tuple form for a stronger guarantee that those values would end up on the stack. I haven't checked but I believe withUnsafeMutablePointer will still require rebinding the type (I think this was a signedness issue), but would also require manually passing the count (1), whereas withUnsafeMutableBytes takes this information from the buffer pointer.
Please do make sure that this builds on Windows - I'm almost certain that this will break the Windows builds. Also note that Windows should transact in HANDLEs, aka void *, so this is going to truncate at least. See CreatePipe.
I believe it's fine to simply surround these additions with #if !os(Windows) for now.
The pointer to a homogeneous tuple is also bound to its element type, so you can use assumingMemoryBound.
(I hope. I use this quite a lot)
As written, this case indeed calls for assumingMemoryBound.
The memory is already bound to Int32, so bindMemory doesn't add information.
This being said, we unfortunately don't have assumingMemoryBound on the RawBufferPointer types at this point. We also gain nothing from using a Buffer in this case: we must rely on the C call to be well behaved instead of having any sort of bounds checking. Given that, I suggest this for lines 389-393:
return withUnsafeMutablePointer(to: &fds) { pointer in
  valueOrErrno(retryOnInterrupt: false) {
    system_pipe(UnsafeMutableRawPointer(pointer).assumingMemoryBound(to: Int32.self))
  }
}.map { _ in (FileDescriptor(rawValue: fds.0), FileDescriptor(rawValue: fds.1)) }
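For anyone who wants to try the shape of this suggestion outside the package, here is a self-contained sketch for POSIX platforms. It calls the C pipe function directly instead of System's internal system_pipe and valueOrErrno helpers, and Descriptor, PipeError, and makePipe are hypothetical names used only for illustration.

```swift
#if canImport(Darwin)
import Darwin
#elseif canImport(Glibc)
import Glibc
#endif

// Hypothetical stand-in for System's FileDescriptor.
struct Descriptor: Equatable {
  var rawValue: Int32
}

// Hypothetical error type carrying the errno value of a failed call.
struct PipeError: Error {
  var code: Int32
}

// Hypothetical free function mirroring the (readEnd:, writeEnd:) tuple shape.
func makePipe() throws -> (readEnd: Descriptor, writeEnd: Descriptor) {
  // A homogeneous tuple holds both descriptors in one fixed-size value.
  var fds: (Int32, Int32) = (-1, -1)
  let result = withUnsafeMutablePointer(to: &fds) { pointer -> Int32 in
    // The tuple's storage is already bound to Int32, so assumingMemoryBound
    // only restates type information; it does not rebind anything.
    pipe(UnsafeMutableRawPointer(pointer).assumingMemoryBound(to: Int32.self))
  }
  guard result == 0 else { throw PipeError(code: errno) }
  return (Descriptor(rawValue: fds.0), Descriptor(rawValue: fds.1))
}

let createdPipe = try makePipe()
```

Both descriptors must still be closed by the caller once they are no longer needed.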
Is there a document detailing what "binding memory" actually does? I had thought this was just managing compile-time information so I'm not 100% clear on the difference between "assuming" memory bound and actually binding memory.
There is some scattered documentation, including the doc-comments, but the largest chunk is in the RawPointer proposal: https://github.com/apple/swift-evolution/blob/main/proposals/0107-unsaferawpointer.md
Experience has shown that those sources are insufficient.
@NevinBR (I think) came up with a nice description of binding for humans a few weeks ago: https://forums.swift.org/t/pitch-implicit-pointer-conversion-for-c-interoperability/51129/36.
In general, if you're reminding the compiler of type information it should already have known, but had been obscured for some reason, then assumingMemoryBound is the thing to reach for (as we did here). If you are telling the compiler new information, then reach for bindMemory or withMemoryRebound.
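A small sketch contrasting the two cases, using only standard library pointer APIs. The function names are made up for illustration, and the second case is my own example of "new information", not something taken from this PR.

```swift
// Case 1: reminding the compiler of a type it should already know.
// The storage of a homogeneous (Int32, Int32) tuple is already bound to Int32.
func sumViaAssumingMemoryBound() -> Int32 {
  var pair: (Int32, Int32) = (3, 4)
  return withUnsafeMutablePointer(to: &pair) { tuplePointer -> Int32 in
    let elements = UnsafeMutableRawPointer(tuplePointer)
      .assumingMemoryBound(to: Int32.self)
    return elements[0] + elements[1]
  }
}

// Case 2: telling the compiler something new.
// The storage is bound to UInt32; withMemoryRebound temporarily views it as
// Int32 and restores the original binding when the closure returns.
func reinterpretViaWithMemoryRebound() -> Int32 {
  var value: UInt32 = 0xFFFF_FFFF
  return withUnsafeMutablePointer(to: &value) { pointer -> Int32 in
    pointer.withMemoryRebound(to: Int32.self, capacity: 1) { rebound in
      rebound.pointee
    }
  }
}

let assumedSum = sumViaAssumingMemoryBound()          // 3 + 4
let reinterpreted = reinterpretViaWithMemoryRebound() // all bits set, read as signed
```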
@swift-ci please test
@swift-ci please test
@swift-ci please test
I'm ready to merge as soon as the pointer binding thing is figured out.
Co-authored-by: Guillaume Lessard <[email protected]>
@swift-ci please test
@swift-ci please test
@GeorgeLyon do you have a Twitter handle I can use for a shoutout?
I do not, but thank you for the thought :) Happy to have this landed!
Late to the party; curious whether this is optimised for Darwin's Mach port implementation?
I'm sorry, I don't understand this question. This PR adds a lightweight Swift wrapper around the classic UNIX The PR doesn't change how
I believe that
Thanks for the information.
See #57 for discussion.