Names for communication ABCs #1208
Strongly seconded. In an ideal world we would rework I'd go with |
Well, in an ideal world we would just have Trio to be the default framework of choice for async in all languages. The reality is different though: asyncio.Stream is designed the way it is designed to preserve backwards compatibility, and so unfortunately there's not a lot of room left to change it. Not including it in 3.8 wouldn't change any of that. That said, if there's something we can do to make it easier for people to write adapters from Trio streams to asyncio Streams (if that's even a thing) we'll gladly consider that! |
Can this be made part of the same abstract interface? What This would be more problematic for |
Agree with @1st1 |
Umm, yeah. My take would be to make the whole thing modular. Details: Add a Most of these filters would even be framework agnostic; they would work with trio, asyncio, or whateverio. (As long as nobody needs sub-tasks or timeouts, of course.) You don't even need to write the NB: IMHO: Whoever invented a |
Oh, hm, that was me. We're still debating about that though. One of the solutions is to add |
Yes, please. Or call it Arguments against this idea: (a) this prevents automated testing (with mypy et al.) whether somebody forgot an |
I actually like all of this. BUT |
How about Trio renames We can maybe work around the weird nature of And lastly we'll have a problem with |
Well, sometimes you really do need to kill off a connection Right Now, no matter what. I'd rename it to The nice, everybody-should-use-it async version might be named |
I've discussed this with @ambv and here's the plan I like for 3.8:
I'm fine with adding The key question I have now is this: If we do the above, would that be better for Trio in any way? |
Well, I'm not @njsmith but from my PoV, yes sure. Splitting up The other problem that we still have is the one this topic is supposed to be about, i.e. naming things. If |
Sure, we'd accept help. Splitting up sounds interesting, as I'm a big fan of composable component-oriented APIs such as Twitter's Finagle. cc @asvetlov |
I'm all for standardization across the board, especially if it ends up allowing me to reduce the code base of AnyIO. Because the semantics of network functionality of the various I/O libraries were so wildly different, I had to invent my own. I would jump at the opportunity to throw out that code and piggyback on the individual network layers of each framework, provided that the semantics were in harmony with each other. But, it seems like there will be a long way to go until that happens. I have no real preferences about the naming of the ABCs, so long as they're descriptive and everybody agrees on them. @1st1 Are you still planning to make that asyncio-next library? |
Hey man, I know this was humorous exaggeration, but let's criticize tech not people, and skip the violent imagery. I edited your post to remove that bit.
Trio currently calls the method The different names also have a potential advantage here: they mean the same object could potentially expose both a backwards-compatible set of operations for legacy asyncio code, and also implement this hypothetical ABC we're talking about with the improved semantics. (And the new methods could skip straight to the desired semantics while the old ones are working their way through their deprecation periods.) There is one big conflict though: iteration. Trio's byte stream interface allows async iteration and it returns arbitrary byte chunks. We don't have to sort out all the details right now, but maybe for 3.8 you might consider replacing
In Trio this is actually handled as part of the |
I had a similar reaction to @sorcio's about this bit:
It seems like there's a generalization of BytesTube = ChunkTube[Byte] or
|
Capturing some comments from chat:
My response: Surely they'd at least call it UppercasingBytesStream, emphasis on the "s" :-). Though it would probably make more sense to call it
My response: Huh, that's a great analogy. We should steal that for the docs if nothing else. Something like:
...well that leaves out the part where streams are bidirectional, not sure how to work that in. But this is obviously one of those things that's just intrinsically confusing the first time you encounter it, so we'll want to take the time and explain it 3 different ways and hope that at least one of them works. And this is a great addition to our quiver. |
A few of these are interesting.
Plus,
|
For what it's worth, I don't love the idea of using |
(In |
@glyph The thing we're talking about here is what

    class TransformedFlow(Flow[OuterT]):
        def __init__(self, inner: Flow[InnerT]):
            ...

    outer_flow = TransformedFlow(inner_flow)

(There are unidirectional versions too of course, I'm just illustrating the general point.) It sort of reminds me of that famous line, "have you considered calling a function with arguments?". Async functions/methods enable a lot of simple patterns that didn't use to be possible. Help us out though :-). I know you'd love to see the Python ecosystem develop standard, composable abstractions for this stuff that become popular and ubiquitous, one way or another. The goals here are identical to |
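As a rough illustration of the wrapping pattern in the snippet above, here is a minimal sketch of one concrete transformer. Everything in it is an assumption made for the example: the `Flow` base class with `send`/`receive` methods, the in-memory `MemoryFlow`, and the JSON-encoding `JSONFlow` are hypothetical names, not an API from Trio, asyncio, or tubes.

```python
import asyncio
import json
from collections import deque
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class Flow(Generic[T]):
    """Hypothetical bidirectional flow of values of type T."""

    async def send(self, value: T) -> None:
        raise NotImplementedError

    async def receive(self) -> T:
        raise NotImplementedError


class MemoryFlow(Flow[bytes]):
    """In-memory stand-in for a transport that carries whole bytes messages."""

    def __init__(self) -> None:
        self.items = deque()

    async def send(self, value: bytes) -> None:
        self.items.append(value)

    async def receive(self) -> bytes:
        return self.items.popleft()


class JSONFlow(Flow[Any]):
    """Wraps a Flow[bytes] and exposes a flow of JSON-serializable objects."""

    def __init__(self, inner: Flow[bytes]) -> None:
        self.inner = inner

    async def send(self, value: Any) -> None:
        await self.inner.send(json.dumps(value).encode("utf-8"))

    async def receive(self) -> Any:
        return json.loads((await self.inner.receive()).decode("utf-8"))


async def demo():
    inner = MemoryFlow()
    outer = JSONFlow(inner)
    await outer.send({"a": 1})
    wire = inner.items[0]          # what actually travels over the inner flow
    value = await outer.receive()  # decoded again on the way out
    return wire, value


wire, value = asyncio.run(demo())
```

The transformer is just a class that takes the inner flow as a constructor argument, which is the "calling a function with arguments" point made above.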
That's a bit too much work for us -- changing the semantics of .write()/.close() is already bad enough, as it requires users of asyncio subprocess API to update their code. Asking them to deal with yet another significant API change is not an option. :( I have a counter proposal though -- why doesn't Trio implement the default |
@1st1 I'm only talking about the new
Because we don't have line-splitting functionality at all in these types – adding it would be a major change because it requires adding a buffer and an end-of-line scanning algorithm, and both of those are quite tricky because the naive algorithm is O(N**2). And the whole idea here is to separate those kinds of algorithms off into a single robust implementation instead of duplicating them in every class.
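To make the buffering concern concrete, here is a minimal sketch of a line splitter that avoids one common source of quadratic behavior: searching the whole buffer again after every chunk. That reading of "the naive algorithm" is an assumption on my part, and `LineBuffer` and `feed` are hypothetical names, not part of Trio or asyncio.

```python
from typing import List


class LineBuffer:
    """Accumulates byte chunks and splits out complete lines (hypothetical helper)."""

    def __init__(self) -> None:
        self._buf = bytearray()
        # Number of leading bytes already searched without finding b"\n".
        self._scanned = 0

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk; return the complete lines now available, newline removed."""
        self._buf += chunk
        lines = []
        start = 0                    # start of the current unfinished line
        search_from = self._scanned  # skip bytes searched by earlier calls
        while True:
            idx = self._buf.find(b"\n", search_from)
            if idx == -1:
                break
            lines.append(bytes(self._buf[start:idx]))
            start = search_from = idx + 1
        # Drop all consumed lines in one operation; everything that remains
        # has been searched, so the next call resumes after it.
        del self._buf[:start]
        self._scanned = len(self._buf)
        return lines


buf = LineBuffer()
first = buf.feed(b"ab")          # no newline yet
second = buf.feed(b"c\nde\nf")   # completes "abc" and "de"
third = buf.feed(b"\n")          # completes "f"
```

A real implementation would also need a cap on the buffer size so a peer that never sends a newline cannot exhaust memory; that is left out here.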
That's technically feasible, and how it used to work in Trio, but it's unattractive because the

    # What people used to have to write every time they used a Trio stream
    while True:
        chunk = await stream.receive_some()
        if not chunk:
            # EOF
            break
        # ... handle chunk ...

    # What they write now
    async for chunk in stream:
        # ... handle chunk ...

It's not just the extra typing; it's that this particular boilerplate is especially error-prone and confusing to new users. It used to be that basically every time someone posted their first program in the Trio chat to ask for feedback, they had a bug in this loop. Also all our tutorial examples got substantially simpler. |
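For readers wondering how the two loops relate, here is a minimal sketch of layering async iteration on top of a `receive_some`-style method that returns an empty bytes object at EOF. It is not Trio's actual implementation; `ChunkIterMixin` and `FakeStream` are hypothetical names, and it runs under asyncio only to stay self-contained.

```python
import asyncio


class ChunkIterMixin:
    """Adds `async for` support to any class with an async receive_some()."""

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self.receive_some()
        if not chunk:
            # Empty chunk means EOF: end the iteration instead of returning it.
            raise StopAsyncIteration
        return chunk


class FakeStream(ChunkIterMixin):
    """Test double that replays a fixed list of chunks, then reports EOF."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self):
        return self._chunks.pop(0) if self._chunks else b""


async def collect(stream):
    return [chunk async for chunk in stream]


received = asyncio.run(collect(FakeStream([b"he", b"llo"])))
```

The EOF check lives in one place, so user code cannot forget it.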
I guess I'm too old for this -- I don't understand why "stream of objects" would be more fundamental than "stream of bytes", and I thought that the original terms Stream and Channel[T] were exactly right. |
Hi Guido! When I say "more fundamental", I mean it's a more general concept – "stream of bytes" is a special case of "stream of things". In an idealized theoretical world, what trio currently calls This is a super difficult set of concepts to pin down, and there are at least 3 ways of splitting them up that are natural in some situations, and run into problems in others. (@glyph's But even if we kept |
In other words, possibly-different-enough to get a separate name, and mayyybe different accessors (
Seconded, for what that's worth, and with the above naming caveat.
Exactly. If asyncio keeps this |
I could be convinced to use a separate name. My big problem with |
The new That's why in the new stream design we carefully reproduce the existing old API methods. |
@njsmith Well, that's easy. A channel transports distinct sequential entities. Ships, in the real world. @asvetlov These properties could equally well be |
+1 on the naturalness of Stream and Channel. Naming ain’t easy, but I could
not imagine swapping these two. I don’t think that saying a Stream is a
Channel[byte] helps with understanding. Unless you are trying to prove
something purely mathematical and abstract. But that’s not common in
programming.
--
--Guido (mobile)
|
In any case, after taking a day off to hack on the code, I've got an initial split-up of it all up and running. Rough code, heaps of compatibility hooks to get the tests working, no docs yet, but tests pass – and Trio's SSL code works with next to no changes. Available at github.com/smurfix/cpython, "streams" branch. Will try to work on the code some more later this week. |
There's definitely a strong tradition of using both "stream" and "channel" to refer to all kinds of incremental point-to-point communication. In fact I just checked the original CSP paper, and Hoare actually uses both "channel" and "stream" about equally often, and with no explanation for either – basically he uses "channel" when he's talking about a connection between two processes, and "stream" when he's talking about the data flowing over that connection. Here's some examples – in both these cases he says "stream", but it's taken for granted that the stream is represented as a CSP channel: In a system like traditional asyncio, a generic name like "stream" is a good choice! There's only one thing it could refer to, so there's no confusion. Ditto for using "channel" in CSP or Go. But, when we put two different kinds of point-to-point communication in the same system, and try to distinguish them using two words that both refer generically to point-to-point communication, then that's where my problem starts, because now we have to change the meaning: we're redefining "stream" to mean "NOT objects", and "channel" to mean "NOT bytes" [1]. As far as I can tell, Trio is the only system that's ever done this – it's novel. And it contradicts a lot of common usage. In Trio right now, Hoare's clear, ordinary-looking language is just... wrong. When we talk about SSH channels, we have to be careful to clarify that these channels aren't channels. And we can't say things like "WebSocket enables streams of messages on top of TCP", or "TCP is the protocol that guarantees we can have a reliable communication channel over an unreliable network" – Trio's use of stream/channel turns these ordinary sentences into something completely misleading, and this is already one of the places where new users get the most confused. 
[1] This is a classic thing in human language – word meaning is all about contrasts, and when you add new words, the existing words shift their meanings around to make room.

A third way?

That said, I can also see that no-one's convinced by my idea of distinguishing
That last option is seen in nodejs, and in @glyph's current The major challenge is that framed and unframed streams have really different behavior, so you need some way to keep track of that distinction, communicate it to users, make sure you don't accidentally pass an unframed @glyph's way of handling this is to define two different subtypes of bytes, to represent framed and unframed data – so in his notation, an unframed stream is a This doesn't seem super appealing to me for two reasons. First, you don't get any runtime checking. Beginners are the ones most likely to mix these up, and also the least likely to have fancy stuff like mypy set up. And second, even if you do have mypy running, I think to make it work you'd have to write explicit casts everywhere whenever you wrote any networking code, which seems annoying. Here's another idea, that's not fully baked but I think might be promising. What if we define a generic base class, and define two empty sub-interfaces to use as markers for the two confusable variants:

    from typing import Generic, TypeVar

    T = TypeVar("T")

    class Stream(Generic[T]):
        ...

    class FramedByteStream(Stream[bytes]):
        pass

    class UnframedByteStream(Stream[bytes]):
        pass

We can bikeshed the names of course. These ones are kind of wordy... but that seems OK, because these are abstract types that only show up in documentation, type signatures, and

    class LengthPrefixFramer(FramedByteStream):
        "Converts an `UnframedByteStream` into a `FramedByteStream`."  # <-- docs

        def __init__(self, transport: UnframedByteStream):  # <-- static type checks
            if not isinstance(transport, UnframedByteStream):  # <-- runtime type checks
                raise TypeError
            # ...

The rest of the time you get to use both types with regular What do y'all think?

Other stray thoughts

An oddity is that according to the Liskov substitution principle, In this approach, I also thought about having |
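To see the marker-class idea end to end, here is a minimal runnable sketch that fills in the elided parts with assumptions: the `send` method name, the 4-byte big-endian length prefix, and the in-memory `MemoryByteStream` transport are all invented for the example and are not taken from Trio.

```python
import asyncio
from typing import Generic, TypeVar

T = TypeVar("T")


class Stream(Generic[T]):
    async def send(self, value: T) -> None:  # hypothetical method name
        raise NotImplementedError


class FramedByteStream(Stream[bytes]):
    """Marker: each send() is delivered as one intact message."""


class UnframedByteStream(Stream[bytes]):
    """Marker: bytes may be split or merged arbitrarily in transit."""


class MemoryByteStream(UnframedByteStream):
    """In-memory transport that records everything sent through it."""

    def __init__(self) -> None:
        self.sent = bytearray()

    async def send(self, value: bytes) -> None:
        self.sent += value


class LengthPrefixFramer(FramedByteStream):
    """Converts an UnframedByteStream into a FramedByteStream."""

    def __init__(self, transport: UnframedByteStream) -> None:
        if not isinstance(transport, UnframedByteStream):
            raise TypeError("transport must be an UnframedByteStream")
        self.transport = transport

    async def send(self, value: bytes) -> None:
        # Assumed wire format: 4-byte big-endian length, then the payload.
        await self.transport.send(len(value).to_bytes(4, "big") + value)


transport = MemoryByteStream()
framer = LengthPrefixFramer(transport)
asyncio.run(framer.send(b"hi"))

try:
    LengthPrefixFramer(framer)  # framing an already-framed stream
    double_framing_rejected = False
except TypeError:
    double_framing_rejected = True
```

The runtime check catches the mix-up even for users who never run a static type checker, which is the first objection raised against the subtype-of-bytes approach above.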
Hmm, here's another limitation of the I guess the cause is that in this design, unframed streams are the odd-one-out – they don't quite behave like generic streams. One way to patch over this case would be to drop Another idea: we could push the framed/unframed distinction up into the generic layer, so instead of just What if we made it |
You could also have a [Unicode]CharacterStream … which is even wordier. Or an IntegerStream or a FloatStream, which means that we're right back where we started. Well … except for the fact that Also, Let's face it, there do not seem to be any choices that are (a) intuitively obvious without explanation (b) used the same way in some sort of majority of other languages (c) short enough to not be too verbose. Thus I move that we go with two simple words, i.e. we follow Guido's recommendation and use Thus my hypothetical audio processor would have a bunch of Frankly, I'd rather not discuss the names of these concepts for another week or two. (Which is not to say that the discussion so far has been unhelpful. Quite the opposite. But there are diminishing returns …) I'd rather start writing a bunch of helper classes to translate between different types of streams and/or channels which happen to work with both Trio and asyncio. Before getting there, we also have another naming problem, which is |
… no takers? Everybody afraid to step on somebody's toes / strong opinions WRT these names? Alternately we could simply agree to disagree and use a translator. As an example, the code that adapts Trio's SSL stream to an asyncio Stream which I used as a proof-of-concept in my
… and while that's basically it – you now have an encrypted |
No, not really. Since we're reverting the new streams API from 3.8, we'll start a new discussion on what the new API should look like for asyncio 3.9. Future compatibility with Trio is desired, so I suggest not settling on any design before we have a chance to discuss it with asyncio devs. I'll start a thread sometime next week; right now we're busy releasing RC1. Not saying this to discourage the discussion here, quite the contrary, it should continue! Just a heads up explaining why Andrew and I are quiet for now. |
A bunch of discussion about asyncio.Stream happened over here: https://bugs.python.org/issue38242
That's the problem though. Everything you're saying seemed plausible a priori, but then we actually did the experiment, and it didn't become intuitive for me. I'm still willing to hear arguments for this approach if folks have them, but it's a very inconvenient fact, and I'd appreciate it if folks arguing for That said: Having slept on it for a few weeks, I'm still very happy with the So the advantages of
The one downside is that it's a little more verbose than other options, but given that these are abstract types that you only need to spell out in full in contexts where you want the extra precision, I think we can live with that. All the options we've come up with require some compromises, and I'm much more comfortable with that compromise than the alternatives. If folks are comfortable moving forward with that, then I think the next steps will be:
|
I really don't want to be the decider here, and I don't have any evidence beyond my own feelings. I also don't recall the "experiment" you have already done -- if it was so compelling, would you mind providing a reference? (I do recall reading about it before, but I don't recall where, and I haven't kept track of this discussion since the last time it flared up.) |
@gvanrossum No worries! We all value your thoughts, but there's definitely no obligation to make a pronouncement or anything. We'll figure it out :-). And I was speaking a bit loosely with the "experiment". I don't mean we herded a bunch of undergrads into cubicles and made them answer questionnaires, but rather that we've actually been shipping the |
So it's your word against mine, really. :-)
--
--Guido (mobile)
|
I don't think it's about anyone's word against anyone. I'm not saying you find those names difficult to work with :-). I'm just saying that I do. And then if folks claim that the names will quickly become intuitive for all the users we care about, AFAICT that implies that either I'm not a user we care about, or that I'm wrong about my own experience, and either way it's not a great feeling. That doesn't mean they're necessarily the wrong choice on net; I'm just hoping to see less of that particular argument. I do think the meaning of Linux
MSDN on named pipes:
So that seems to match my intuition – when it comes to IPC mechanisms, Linux and Windows don't agree on much, but apparently they do agree that the natural way to explain this distinction is to use the words "bytes" and "messages". I'm super curious if anyone has seen documentation that said something like "one of the main differences between TCP and UDP is that TCP is a stream and UDP is a channel". I don't think I've ever seen anything like that, but I've been wrong before... |
Nathaniel, I'm really struggling with how to respond to this. I respect you tremendously. All I meant to say was that my intuition and yours are opposite. I was not trying to deny your experience. However, I would like you to acknowledge mine too. I don't know what to conclude from this about how the different choices of terminology will appear to others. The rest of your message seems to be rationalizing your intuition. It doesn't convince me. I still feel you're making a mistake by choosing long composite terms for such fundamental concepts. I want no power here other than that of persuasion, and I acknowledge that I am failing at persuading you. So let's just agree to disagree. |
Thanks Guido, that helps a lot, and I really appreciate the clarification – I feel like this is one of those topics where the limited bandwidth of text gets especially tricky, between the subtle intuitions involved in naming, and trying to collaborate between two projects that haven't had the chance to build up much shared context. And my frustration there was definitely not at you in particular (or anyone else in particular, really). I also think we're probably closer than it sounds. Tentatively, I think there's good objective evidence for all of these (i.e. hopefully everyone can agree on these?):
The tricky subjective part is figuring out how to weight these things against each other. (Or finding some other solution entirely, but it feels like we've already wrung the English language pretty hard here and there's not much more to squeeze out.) It's not like one of these is a slam-dunk issue that everyone would agree takes precedence. I hear you saying that for you, looking at that list, the verbosity issue is the one that feels the most pressing. And, I also respect you tremendously, so I'm going to think hard about that! But I also hear you saying that you don't want to be in the hot seat on this one, and I am 100% sympathetic – I honestly don't know how you managed it so long. Your retirement is spectacularly well-earned :-). So like... this sounds weird, but... I hope it's a reassuring thing, that I also respect you too much to just rubber-stamp what you're saying? But I hear you, and will definitely think hard about it, and appreciate your efforts to think and communicate about the issue. I'm also thinking: maybe I should go spend some time working out what Trio API changes we'd want to do if we did go down the |
WRT talking to beginners: While I am not one, the distinction between framed and unframed protocols is often lost even if there are distinct names for them, no matter what they are – I remember teaching people about the need for explicit framing in quite a few programming languages. Thus if they're treated differently, to the tune that (a) type checkers complain and (b) even the methods are different so that you simply can't plug a ByteStream into something that wants a MessageStream[bytes] (otherwise it would work when testing …), I'd consider that a strong plus. So, another question for @njsmith's list is: Are Another question: given the fact that there'll also be a In other words, if you want to name the I do tend towards saying that they are fundamentally-enough-different and therefore should have different names. I do admit that my post-fact rationalization for the |
I'm still lagging behind on this one. Will try to find some time this week. A quick note: if you want asyncio/Python to adopt Trio's terminology you shouldn't finalize it before we have a chance to discuss. I for one like Stream/Channel way more than ByteStream/MessageStream (even though I follow Nathaniel's line of reasoning, I'm leaning towards simpler/shorter names). But I guess interface-level API compatibility is more important than using same names. |
I will try to explain my reasoning one more time. For me, the concepts I like to refer to as stream and channel are quite different. A stream buffers bytes or characters (across some connection) and the primary APIs are to append a string of bytes/characters, and to receive a string of the same, with the explicit caveat that the API does not guarantee that write boundaries correspond to read boundaries. You seldom if ever read or write a single byte/character, as that would be highly inefficient. There is typically a single reader and a single writer (IOW, a single producer and a single consumer), as the "meaning" of the bytes or characters in the stream is typically unrelated to the read/write boundaries. A reader will typically have some kind of parser that can be fed data in arbitrary chunks (but which is optimized for being fed sizeable chunks, which is the common case). OTOH, a channel to me looks more like a queue, whose primary API is to put in or get out a single object of arbitrary complexity. In some cases it's reasonable to have multiple producers and multiple consumers. I guess UDP isn't quite a channel since IIRC it doesn't guarantee ordering or delivery, while for a channel I would like both of those. Hence, channels should typically be implemented as a layer on top of a stream of bytes like TCP or pipes, not on a datagram protocol like UDP. So, while it's true that one of the distinctions is that streams use bytes and channels use messages, the more important difference to me is that the primary stream APIs send and receive strings of bytes, and don't preserve boundaries, while it's the opposite for channel APIs, which send and receive objects or messages. If a queue-like API dealt in instances of a user-defined class I could still call it a channel, even though I might not think of those instances as messages. 
If the queue reaches across a network it could be built on top of a channel of messages (objects encoded as bytes -- or strings, as in JSON), which in turn might be built on top of a stream of characters/bytes, using some explicit framing mechanism to preserve messages/object boundaries. But despite the layering possible, I don't see streams and channels as similar, because I think of them as having quite different APIs. |
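The boundary distinction described above can be shown with two tiny in-memory stand-ins. This is only a sketch: `ByteBuffer` and `ObjectQueue` are hypothetical, synchronous toys, not classes from asyncio or Trio.

```python
from collections import deque


class ByteBuffer:
    """Stream-like: reads return whatever bytes are available, up to a limit."""

    def __init__(self) -> None:
        self._data = bytearray()

    def write(self, data: bytes) -> None:
        self._data += data

    def read(self, max_bytes: int) -> bytes:
        chunk = bytes(self._data[:max_bytes])
        del self._data[:max_bytes]
        return chunk


class ObjectQueue:
    """Channel-like: each get() returns exactly one object that was put()."""

    def __init__(self) -> None:
        self._items = deque()

    def put(self, item) -> None:
        self._items.append(item)

    def get(self):
        return self._items.popleft()


stream = ByteBuffer()
stream.write(b"hello ")
stream.write(b"world")
first_read = stream.read(4)     # cuts through the first write
second_read = stream.read(100)  # spans the rest of both writes

channel = ObjectQueue()
channel.put(b"hello ")
channel.put(b"world")
first_get = channel.get()       # exactly the first object put in
second_get = channel.get()
```

Two writes came back as two reads in both cases, but only the queue returned the same two pieces that went in.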
I will accept ByteStream and MessageStream, but I'd rather strongly prefer Stream and Channel. @njsmith has provided an ironclad existence proof that Stream and Channel do not intuitively map to "bytes" and "messages" for everyone who sees them. :-) ByteStream and MessageStream appear to me to point more obviously in about the right direction. But in exchange for greater intuitive appeal, ByteStream/MessageStream require us to pay the cost of more unwieldy compound names. My personal opinion is that this latter cost is greater in the long run. If you don't understand what the names mean, you look it up in the docs, maybe a few times until it sinks in. If you understand the names just fine but they're annoying to type, you keep getting annoyed whenever you use them. There's also a sense in which having names that are too "intuitive" can create confusion about the ways that the user's model doesn't line up with the reality. For example, if you've been working with MessageStreams for a while, and using multiple concurrent writer tasks with them, you might expect you could do the same with the similarly-named ByteStream. Nope. Or at least use similarly-named methods? Nope. Certainly, if we choose to emphasize the differences (Stream vs Channel), we lose the chance for our users to learn more quickly by making analogies about the similarities... but the flip side of that, I think, is that if we choose to emphasize the similarities, our users can get confused by the differences. It's not clear to me that one of these failure modes is better than the other. Possibly relevant to this discussion: https://malcolmocean.com/2016/02/sparkly-pink-purple-ball-thing/ |
Ahh, right, I see more where you're coming from now! This is actually where we started out too – I don't know if hearing how my thinking has developed will change your opinion or anything, but maybe you'll find it interesting. So: originally Trio had a Since the whole point of an async library is to make lots of tasks, the After a bunch of discussion, we hit on the idea of splitting the Then we realized a neat side-effect: splitting the Of course, these are all the same reasons that sockets are structured the way they are, with separate objects for the two endpoints, etc. The underlying insight is: queues and sockets actually have a lot in common! They send/receive different types of data, but aside from that they're both fundamentally ways to manage a communication channel, and that leads to a lot of common structure – basically everything except the actual type signature on the send/receive operations. We also had an unrelated todo item. A nice feature in Twisted is that if you want to build a new protocol parser, you can build on standard implementations of basic protocol framing like line-based protocols, length-prefixed framing, netstrings, etc. We knew we'd eventually want some similar building blocks. But wait, we realized, that's not an unrelated todo item at all; the changes to our "queue" API turned it into exactly the API you want to represent a byte stream + added framing. And in this context, the alignment between the stream+queue APIs is super handy. For example, consider a cut down version of a line framer:

    from dataclasses import dataclass

    @dataclass
    class LineSender(MessageStream[bytes]):
        transport: ByteStream

        async def send_message(self, message):
            # A newline inside a message would break the framing.
            if b"\n" in message:
                raise ValueError
            await self.transport.send_bytes(message + b"\n")

        async def aclose(self):
            await self.transport.aclose()

There are actually a ton of subtleties you can't see here. Around this time we were also iterating on sans-io libraries for various protocols, and found ourselves converging on a general approach to designing these libraries as converters between a stream of bytes and a stream of low-level protocol-level events... i.e., they also act like a typed "queue" on top of a byte stream. So this means that not only do byte streams and message streams have a lot of similarities in terms of lifecycle and close handling, they're also deeply intertwined in pretty much any non-trivial use case. For example, consider websocket-over-HTTP/2 (as supported by e.g. hypercorn). Your protocol stack looks like:
So these days I think of byte streams and message streams as deeply connected. I don't think they should be actually the same type, because sending/receiving an arbitrary chunk of bytes is pretty different from sending/receiving a single coherent message. But all the machinery and conventions around those core operations can be the same, and I think that there are a lot of benefits in making those connections clear to users. (Bibliography: #497, #586, #719, #620, #796, python-hyper/wsproto#91, probably others...)
Hmm, I dunno... there are lots of message streams that don't support multiple concurrent writer tasks (like |
OK, and now I think I see where you're coming from. You're looking at this from the perspective of what should happen for edge cases when one of the communicating parties closes or wants to close the communication channel. As you describe there are lots of cases and they are mostly the same regardless of whether the endpoints deal in messages or bytes. Whereas I've been focusing on the APIs that are sensitive to framing. I also note that the TCP vs. UDP (or SOCK_STREAM vs. SOCK_DGRAM) distinction doesn't capture the difference adequately, because UDP may lose messages and even deliver out of order -- I assume we don't want that for our Channel/MessageStream abstraction, or else the Queue analogy fails badly. (Though what do you propose for UDP then? A MessageStream/Channel with an additional flag that says it may lose messages and may deliver them out of order? I suppose that could work. But I don't think we should force those semantics on everyone, so maybe it should be an UnreliableMessageStream/UnreliableChannel, which should be a superclass (!) of Channel/MessageStream.) I'm not sure where to take it from here. How much of the API that a typical producer or consumer uses has to do with the closing edge cases, and how much with the continued production or consumption of bytes or messages? |
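The superclass suggestion in the parenthesis above may look backwards at first, so here is a minimal sketch of why it type-checks. The class names follow the comment, but the method names and the two in-memory implementations are assumptions made for the example: the reliable type adds a promise rather than methods, so code written for the unreliable interface accepts either kind, while code that needs ordering can require the narrower type.

```python
import asyncio
from abc import ABC, abstractmethod
from collections import deque


class UnreliableMessageStream(ABC):
    """Messages keep their boundaries but may be dropped or reordered."""

    @abstractmethod
    async def send_message(self, message): ...

    @abstractmethod
    async def receive_message(self): ...


class MessageStream(UnreliableMessageStream):
    """Same methods, stronger promise: ordered, reliable delivery."""


class InMemoryMessageStream(MessageStream):
    def __init__(self) -> None:
        self.queue = deque()

    async def send_message(self, message):
        self.queue.append(message)

    async def receive_message(self):
        return self.queue.popleft()


class DroppingMessageStream(UnreliableMessageStream):
    """Drops every second message to simulate a lossy transport."""

    def __init__(self) -> None:
        self.queue = deque()
        self._count = 0

    async def send_message(self, message):
        self._count += 1
        if self._count % 2:
            self.queue.append(message)

    async def receive_message(self):
        return self.queue.popleft()


async def send_all(stream: UnreliableMessageStream, messages) -> None:
    # Written against the weaker interface, so both kinds are acceptable.
    for message in messages:
        await stream.send_message(message)


async def demo():
    reliable, lossy = InMemoryMessageStream(), DroppingMessageStream()
    await send_all(reliable, [1, 2, 3])
    await send_all(lossy, [1, 2, 3])
    return list(reliable.queue), list(lossy.queue)


reliable_got, lossy_got = asyncio.run(demo())
```

Whether a separate ABC is worth it, as opposed to a flag or plain documentation, is exactly the open design question in the comment.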
Umm, well, they are, but only as far as all the interesting machinery of closing them / concurrent writers / etc. is concerned. So, yeah, there's a common base here. We even have an ABC for that: But that's as far as it goes. I tend to think that the conceptual differences are more important. Yes we got there by a very interesting route that showed us, yes you can send objects through a queue, and yes you can send random chunks of bytes through a queue, and so you can actually convert one to the other by a simple building block. That doesn't mean bounded single objects and randomly-chunked collections of small not-quite-objects are the same. Not-completely-inappropriate(-I-hope) analogy: you can pour water down a pipe, and you can pour marbles that way (as parents tend to figure out); if the pipe doesn't have an odor trap this actually works. You can even mix them. This however is not a good argument to show that water and marbles are somehow fundamentally the same. Nor are water pumps and Marble Madness conveyor belts.
Well, that's given – you did all the work to develop the concepts, from the inside. That's super, in fact without Trio I for one would code a lot less Python these days, but mayyybe optimizing the experience of somebody who just uses / combines these building blocks (and I'm not just talking about the people who need more patience than I ever had before they understand why one write() at end A does not necessarily correspond to one read() at end B for streams – but it does for channels) requires a slightly different PoV. IMHO, and all that. |
Yeah, Trio doesn't actually use the channel/message-stream abstraction for UDP currently, though I guess we could model a UDP socket as a There's a larger point here though – there's actually a huge range of different semantics you could have for a (byte or object) stream:
I'm pretty sure that objects with all these variations are going to exist in the ecosystem. And these are all properties that have substantial effects on how these objects can be used; they're not just quirky little edge cases. In fact the objects in my HTTP/2-websockets illustrate a bunch of them. Looking at just the last two levels, the main difference between the wsproto-based frame stream and the final websocket So to me the design problem isn't just "how do we represent TCP sockets and Go-style channels". Those are just two examples from this much larger menagerie, and the design problem is to figure out how to help folks wrangle that whole menagerie effectively. It's tricky to strike a balance between providing enough structure to help folks navigate the space and write generic code, without going off into the weeds of trying to nail down every detail and then define separate ABCs for every possible combination of features. I mention this because I think it's important context: in my mind, the
Hmm, that's tricky to say! Anyone implementing the ABCs definitely has to deal with all these edge cases, they're where a lot of the complexity is, and we expect there to be dozens of concrete implementations, so that adds up quickly. (The For producers and consumers... well, the happy path is mostly just send and receive, yeah :-). But if you're trying to build a robust distributed system, you probably spend a lot more time thinking about the error cases than about the happy path – a lot of what makes networking hard is that your peers might suddenly disappear on you and you need to be prepared to cope with that. And the first step is getting notified that it happened. Which means you really want the libraries you're using to have thought about this before you need it. But since this is "edge case" stuff, it's easy for one of your dependencies to not think about it, and then you're stuck – especially if you're depending on a whole stack of protocols, like in my examples, since all it takes is one lazy implementation to break everything. But if there are consistent, simple conventions that are strongly adhered to across the ecosystem, then it's more likely that everyone in the stack will do their part. Or at least, that's how I see it. |
I think we're in violent agreement about most parts! But I still like to think that there's a class of these connections that are special, and that's the category of byte streams. This captures TCP, pipes (named or not), UNIX domain and localhost sockets with type SOCK_STREAM, and probably others (TLS?). They all have reliable ordered delivery of bytes without preserving message boundaries. And none of them are particularly amenable to multiple readers or writers (but this is probably related to the lack of message boundaries). Interestingly, these are both abstractions (that could be) built on top of unreliable chunked protocols -- e.g. TCP streams are implemented on top of IP packets. But they are usually used as the lowest layer in a stack where at the next layer you have messages or objects (e.g. websockets). Now, when it comes to static typing (PEP 484), I don't mind seeing all the different message/object protocols being distinguished only by the type, so we could have various things of type I don't expect that such a variety of APIs would be useful for the message-based connections -- I'd expect producers and consumers there to be happy with sending/receiving one message/object at a time (in part because I assume the overhead per object intrinsic to the implementation to be much larger than the cost of a function call). |
(Recent trio user here, ie: somebody who will be affected by this.) While I think renaming things to IMHO, the main purpose of the Feedback welcome! PS: This library rocks by the way! 😁 |
I'm in the process of refactoring AnyIO's ABCs in preparation for subprocess support and I am going to use the opportunity to make them more versatile. As a part of this effort, I would like to make my ABCs conform to whatever consensus we achieve here. So here are my thoughts on the matter: The difference between As for reliability, can we assume the same guarantees from both? Meaning, they either deliver the bytes, or raise As for UDP: Since UDP packets are indeed framed, could we make use of the hypothetical unreliable message stream superclass here? |
One of the major blockers for stabilizing Trio is that we need to stabilize our ABCs for channels/streams/pipes/whatever-they're-called. We've been iterating for a while, and I feel like we're pretty close on the core semantics, and I really want this to be done, but the fact is that I'm just not happy with the names, and those are kind of crucial. So here's an issue to try to sort it out once and for all.
Along the way, I've realized that there isn't really anything Trio-specific about these ABCs, and there are a bunch of advantages to making these ABCs something more widely used across the async Python ecosystem. So also CC'ing for feedback: @asvetlov @1st1 @agronholm @glyph (and feel free to add others). And I'll briefly review what we're trying to do for context before digging into the naming issue.
What problem are we trying to solve?
There are two main concepts that I think we need ABCs for:
Byte-wise communication (trio.abc.Stream)
Object-wise communication (trio.abc.Channel[T])
There are many, many concrete implementations of each. Trio currently ships with ~7 different implementations of its byte-oriented communication ABC (sockets, TLS, Unix pipes, Windows named pipes, ...) and we expect to add more; it's also an interface that's commonly exposed and consumed by third-party libraries (think of SSH channels, HTTP streaming response bodies, QUIC connections, ...).
Object-wise communication is maybe a bit less familiar because most frameworks don't call it out as a single category, but it's also something you see all over the place. Some examples:
... bytes-object-oriented channel
... Union[bytes, str] objects
Having standard ABCs for these has a lot of benefits:
... read or recv or receive or get
... JSONChannel could be wrapped around any Channel[bytes] implementer to convert it into a Channel[JSONObject]. There's more discussion in "Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks" (#796).
If this is such a generic problem, then why should Trio be the one to solve it?
Well, no-one else seems to be working on it, and we need it :-). And of course we'd be happy if our work is useful to more people.
Asyncio will be adding an asyncio.Stream class in 3.8, but it doesn't seem to be intended as a generic abstraction with multiple implementations. It's a specific concrete class that's tightly tied to asyncio's transport/protocols layer, and it exposes a rich public interface that includes buffering, line-splitting, fixed-length reads, and TLS.
This means that whenever someone needs a new kind of Stream object, they have two options. They can define a new type that quacks like a Stream, which means they have to re-implement all this functionality from scratch. Or else they have to implement their new functionality using the transport/protocols layer and then wrap an asyncio.Stream around it; but using the asyncio transport/protocol layer is awkward, adds overhead, and makes it very difficult to support other async libraries. Neither option is very appealing. IMO what we need is a minimal core interface, so that it's easy to implement, and then higher-level tools like buffering, line-splitting, TLS, etc. can be written once and re-used on any object that implements the byte-wise ABC.
And asyncio doesn't have an object-based communication abstraction at all, which IMO is a major missed opportunity.
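To make the "written once and re-used" point concrete, here is a minimal sketch of a line splitter that only depends on a one-method byte-wise interface. The ReceiveBytes protocol, split_lines, and CannedSource are hypothetical names invented for this illustration (the method name is loosely modeled on Trio's receive_some); this is not the actual Trio or asyncio API.

```python
import asyncio
from typing import AsyncIterator, Protocol


class ReceiveBytes(Protocol):
    """Hypothetical minimal byte-wise interface: returns b"" at end-of-stream."""

    async def receive_some(self, max_bytes: int = 65536) -> bytes: ...


async def split_lines(source: ReceiveBytes) -> AsyncIterator[bytes]:
    """Yield newline-terminated lines from any object with receive_some()."""
    buffer = bytearray()
    while True:
        chunk = await source.receive_some()
        if not chunk:  # end-of-stream
            break
        buffer += chunk
        # Emit every complete line currently sitting in the buffer.
        while (idx := buffer.find(b"\n")) >= 0:
            line = bytes(buffer[: idx + 1])
            del buffer[: idx + 1]
            yield line
    if buffer:  # trailing data without a final newline
        yield bytes(buffer)


class CannedSource:
    """Toy implementation that replays fixed chunks (stands in for a socket, pipe, ...)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes: int = 65536) -> bytes:
        return self._chunks.pop(0) if self._chunks else b""


async def collect_lines(chunks):
    return [line async for line in split_lines(CannedSource(chunks))]


# Chunk boundaries are arbitrary; the splitter recovers the lines anyway.
lines = asyncio.run(collect_lines([b"hel", b"lo\nwor", b"ld\n", b"tail"]))
```

Because split_lines never touches anything beyond receive_some(), the same helper would work unchanged over any other object exposing that method, which is the property the minimal-core-interface argument relies on.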
Twisted OTOH does have standard abstractions for all these things, but they were designed 15+ years ago, long before async/await existed, and I think we can do better now. Heck, even the replacement @glyph's been working on for half a decade now predates async/await.
So the field seems wide open here.
What's already working?
We've been iterating on our APIs for these for a year+ now, and I think we're converging on a pretty solid design. You can see the current version in our docs: https://trio.readthedocs.io/en/stable/reference-io.html#abstract-base-classes
There are some details we're still sorting out – if you're curious see #1125, #823 / #1181, #371, #636 – but I don't want to get into the details too much because they're not really on-topic for this issue and I don't think they'll affect the overall adoption of the ABCs.
OK so what's this issue about then?
Like I said above, Trio currently uses trio.abc.Stream for the byte-wise interface, and trio.abc.Channel[T] for the object-wise interface. These names have two major problems:
They're both completely generic. In regular English, "stream" and "channel" mean basically the same thing. This means that the names don't tell you anything about which is which, or how they're similar, or how they're different. And this is unfortunate, because while these concepts are fairly simple and fundamental, experience says that it really takes some work for new users to wrap their heads around them. Anything we can do to make that easier will help a lot.
Also it personally took me like 6 months to stop mixing up the names and saying "stream" where I meant "channel" or vice-versa, which I just can't ignore. If I can't keep them straight, then how can I expect anyone else to keep them straight?
There actually is an important conceptual relationship between them that IMO we should emphasize. Conceptually, byte-streams are basically object-streams where the object type is "a single byte". But if you try to use an interface designed for sending/receiving objects to send/receive individual bytes, then it'll be ridiculously inefficient, so instead you need a vectorized interface that works on whole bytestrings at once. A nice thing about framing things this way is that it emphasizes one of the things that always trips people up, which is that byte-streams don't preserve framing – because they're really a stream of individual bytes.
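The framing difference can be shown with two toy in-memory classes. This is only an illustrative sketch: ToyObjectChannel and ToyByteStream are hypothetical names, and the methods are synchronous to keep the example short, whereas the real ABCs are async.

```python
from collections import deque


class ToyObjectChannel:
    """Object-wise: each send() is matched by exactly one receive()."""

    def __init__(self):
        self._items = deque()

    def send(self, obj):
        self._items.append(obj)

    def receive(self):
        return self._items.popleft()


class ToyByteStream:
    """Byte-wise: conceptually a channel of single bytes with a vectorized API."""

    def __init__(self):
        self._buffer = bytearray()

    def send_all(self, data):
        # Bytes from separate calls are appended to one undifferentiated buffer.
        self._buffer += data

    def receive_some(self, max_bytes):
        chunk = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return chunk


channel = ToyObjectChannel()
channel.send(b"hello")
channel.send(b"world")
received_objects = [channel.receive(), channel.receive()]

stream = ToyByteStream()
stream.send_all(b"hello")
stream.send_all(b"world")
received_chunks = [stream.receive_some(3), stream.receive_some(100)]
```

With the object channel, the two bytestrings come back exactly as they were sent. With the byte stream, the receiver gets b"hel" and then b"loworld": the same ten bytes in the same order, but the boundaries between the two sends are gone.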
So I think the right way to do it is to present the object-wise interface as the more fundamental one, and the byte-wise interface as a specialized variant. And we can communicate that right in the names, by calling the object-wise interface X[T], and the byte-wise interface a ByteX.
And then when someone asks what the difference is between a ByteX and an X[bytes], we can say: This is an X[bytes]. This is a ByteX. (click the images to see the animations)
But what should X be?
Can we steal from another system?
As mentioned above, asyncio has a concrete class Stream for byte-wise communication and a concrete class Queue for object-wise communication, but no relevant ABCs.
Go has a concrete type chan for communicating objects within a single process, and abstract Reader and Writer interfaces for byte-wise communication.
Node.js has an abstract Stream interface that it uses for both byte-wise and object-wise communication. There's a mode argument you can set when creating the stream that determines whether bytestrings can be rechunked or not. (It's pretty similar to some of the approaches we considered and rejected in #959.) The byte-wise mode is the default, and the object-wise mode seems like an afterthought (e.g. the docs say that Node.js itself never uses the object-wise mode).
Rust's Tokio module has an abstract Stream interface, which is basically equivalent to an async iterator in Python, or what Trio calls a trio.abc.ReceiveChannel[T]. And they also have an abstract Sink interface, which is equivalent to what Trio currently calls a trio.abc.SendChannel[T]. For bytes, they have abstract interfaces called AsyncRead and AsyncWrite. And they have generic framing tools to convert an AsyncRead into a Stream, or convert an AsyncWrite into a Sink.
Java's java.nio library uses Channel to mean, basically, anything with a close method (like Trio's AsyncResource). And then it has sub-interfaces like ByteChannel for byte-oriented communication. I don't think there's any abstract interface for framed/object-wise communication. Java's java.io library uses Stream to refer to byte-wise communication, and Reader and Writer for character-wise communication.
Swift NIO has an abstract Channel interface for byte-wise communication, and I don't see any interfaces for framed/object-wise communication.
My takeaways:
There's no general consensus on terminology. The words "stream" and "channel" show up a lot, but the meanings aren't consistent. "Read" and "write" are also popular, and are used consistently for byte-oriented interfaces.
There isn't any consensus on the basic concepts either! A major part of our ABC design is the insight that object-wise and byte-wise communication are both fundamental concepts, and that it's valuable to have standard interfaces to express both of them, and how they relate. But most of these frameworks only think seriously about byte-wise communication, and treat object-wise communication as an unrelated problem if they consider it at all. Tokio is the main exception, but their terminology is either ad hoc or motivated by other Rust-specific stuff. So we can't just steal an existing solution.
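As a rough illustration of why a standard object-wise interface is worth having, here is a sketch of the JSONChannel idea mentioned earlier: an adapter that wraps anything satisfying a bytes channel interface and exposes a channel of JSON-compatible objects. The Channel protocol and the QueueChannel helper are assumptions made up for this example (backed by asyncio.Queue purely for convenience); they are not Trio's actual ABCs.

```python
import asyncio
import json
from typing import Any, Protocol, TypeVar

T = TypeVar("T")


class Channel(Protocol[T]):
    """Hypothetical object-wise interface: one send() pairs with one receive()."""

    async def send(self, value: T) -> None: ...

    async def receive(self) -> T: ...


class QueueChannel:
    """Toy in-memory channel of bytes objects, backed by asyncio.Queue."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def send(self, value: bytes) -> None:
        await self._queue.put(value)

    async def receive(self) -> bytes:
        return await self._queue.get()


class JSONChannel:
    """Adapter: wraps any Channel[bytes] and carries JSON-compatible objects."""

    def __init__(self, transport: Channel[bytes]) -> None:
        self._transport = transport

    async def send(self, value: Any) -> None:
        # One object in, exactly one bytes message out: framing is preserved.
        await self._transport.send(json.dumps(value).encode("utf-8"))

    async def receive(self) -> Any:
        return json.loads(await self._transport.receive())


async def round_trip(value: Any) -> Any:
    channel = JSONChannel(QueueChannel())
    await channel.send(value)
    return await channel.receive()


result = asyncio.run(round_trip({"op": "ping", "seq": 1}))
```

The adapter relies on the underlying channel preserving message boundaries, so it would not be safe to wrap it directly around a byte-wise stream without a framing layer in between.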
OK so what are our options?
I tried to brainstorm all the potential names I could think of, assuming we go with the X + ByteX pattern described above:
Channel[T] + ByteChannel, e.g. you could use a MemoryChannel to pass objects between tasks, and a SocketByteChannel to represent a TCP connection
Stream[T] + ByteStream, e.g. MemoryStream, SocketByteStream
Tube[T] + ByteTube, e.g. MemoryTube, SocketByteTube
Flow[T] + ByteFlow, e.g. MemoryFlow, SocketByteFlow
Transport[T] + ByteTransport, e.g. MemoryTransport, SocketByteTransport
Hose[T] + ByteHose, e.g. MemoryHose, SocketByteHose
Ferry[T] + ByteFerry, e.g. MemoryFerry, SocketByteFerry
Duct[T] + ByteDuct, e.g. MemoryDuct, SocketByteDuct
Vent[T] + ByteVent, e.g. MemoryVent, SocketByteVent
Pipe[T] + BytePipe, e.g. MemoryPipe, SocketBytePipe
Port[T] + BytePort, e.g. MemoryPort, SocketBytePort
Criteria: I think ideally our core name should be a common, concrete English word that's short to say and to type, because those make the best names for fundamental concepts. It shouldn't be too "weird" or controversial, because people dislike weird names and will refuse to adopt them even if everything else is good. And of course we want it to be unambiguous, so we need to avoid name clashes.
Unfortunately it's really hard to get all of these at once, which is why I got stuck :-)
Stream is really solid on the first two criteria: it's one syllable, common, uncontroversial. But! It clashes with asyncio.Stream. That's not a problem for adoption in the Trio ecosystem. But it might be a major problem if we want this to get uptake across Python more broadly.
The other obvious option is Channel, but I'm really hesitant because it's more cumbersome: 2 syllables on their own aren't too bad, but by the time you start talking about a SocketByteChannel it feels extremely Java, not Python.
For some reason Transport doesn't bother me as much, even though it's two syllables as well, but then you have the clash with the whole protocols/transport terminology, and that also seems like it could be pretty confusing. And we're generally trying to move away from protocols/transports, but I don't want to have to tell beginners "we recommend you use Transports instead of protocols/transports". That's just going to make them more confused.
Among the rest, Tube is tempting as a simple, short, concrete word that doesn't conflict with anything in Trio or asyncio, and fits very nicely with illustrations like the ones I linked above, where there's a literal tube with objects moving through it. But... @glyph has been trying to make his tubes a thing for half a decade now. I think the proposal here is totally compatible with the goals and vision behind his version of tubes, and much more likely to get wider traction going forward. But OTOH the details are very different. So I can't tell whether using the name here would be the sincerest form of flattery, or super-rude, or both at once.
What do y'all think?