Names for communication ABCs #1208

Open
njsmith opened this issue Sep 13, 2019 · 64 comments

@njsmith
Member

njsmith commented Sep 13, 2019

One of the major blockers for stabilizing Trio is that we need to stabilize our ABCs for channels/streams/pipes/whatever-they're-called. We've been iterating for a while, and I feel like we're pretty close on the core semantics, and I really want this to be done, but the fact is that I'm just not happy with the names, and those are kind of crucial. So here's an issue to try to sort it out once and for all.

Along the way, I've realized that there isn't really anything Trio-specific about these ABCs, and there are a bunch of advantages to making these ABCs something more widely used across the async Python ecosystem. So also CC'ing for feedback: @asvetlov @1st1 @agronholm @glyph (and feel free to add others). And I'll briefly review what we're trying to do for context before digging into the naming issue.

What problem are we trying to solve?

There are two main concepts that I think we need ABCs for:

  • communication channels used to transmit/receive bytes (right now Trio calls this a trio.abc.Stream)
  • communication channels used to transmit/receive objects of a given type (right now Trio calls this a trio.abc.Channel[T])

There are many many concrete implementations of each. Trio currently ships with ~7 different implementations of its byte-oriented communication ABC (sockets, TLS, Unix pipes, Windows named pipes, ...) and we expect to add more; it's also an interface that's commonly exposed and consumed by third-party libraries (think of SSH channels, HTTP streaming response bodies, QUIC connections, ...).

Object-wise communication is maybe a bit less familiar because most frameworks don't call it out as a single category, but it's also something you see all over the place. Some examples:

  • asyncio Queues or Golang channels
  • fundamental framing protocols like length-prefixing or newline-termination are strategies for converting an individual-byte-oriented channel into a bytes-object-oriented channel
  • a lot of sans-io libraries like h11, wsproto, h2 are essentially designed to convert a stream-of-individual-bytes into a stream-of-event-objects
  • Websockets are a channel for sending/receiving Union[bytes, str] objects
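To make the framing example concrete, here is a minimal sketch of length-prefixing. It assumes a byte-stream object with `send_all(data)` and `receive_some(max_bytes)` methods (names borrowed from Trio's current byte-wise interface); `MemoryByteStream` and `LengthPrefixedChannel` are hypothetical names invented for this illustration, not part of any library.

```python
import asyncio
import struct


class MemoryByteStream:
    """Hypothetical in-memory byte stream; deliberately returns tiny chunks."""

    def __init__(self, chunk_size: int = 3):
        self._buffer = bytearray()
        self._chunk_size = chunk_size

    async def send_all(self, data: bytes) -> None:
        self._buffer += data

    async def receive_some(self, max_bytes: int = 65536) -> bytes:
        # Return at most chunk_size bytes, ignoring how the data was sent.
        n = min(max_bytes, self._chunk_size)
        chunk = bytes(self._buffer[:n])
        del self._buffer[:n]
        return chunk  # b"" means end of stream


class LengthPrefixedChannel:
    """Turns a byte stream into a channel of whole bytes objects."""

    HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix

    def __init__(self, stream):
        self._stream = stream
        self._pending = bytearray()

    async def send(self, message: bytes) -> None:
        await self._stream.send_all(self.HEADER.pack(len(message)) + message)

    async def _read_exactly(self, n: int) -> bytes:
        # Keep pulling arbitrary-sized chunks until n bytes are available.
        while len(self._pending) < n:
            chunk = await self._stream.receive_some()
            if not chunk:
                raise EOFError("stream ended before a complete frame arrived")
            self._pending += chunk
        data = bytes(self._pending[:n])
        del self._pending[:n]
        return data

    async def receive(self) -> bytes:
        (length,) = self.HEADER.unpack(await self._read_exactly(self.HEADER.size))
        return await self._read_exactly(length)
```

Even though the toy stream hands data back three bytes at a time, each `receive()` returns exactly one message as it was passed to `send()`; the length header is what restores the boundaries.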

Having standard ABCs for these has a lot of benefits:

  • Most obviously, it provides "one obvious way to do it", so developers can focus on the interesting parts instead of looking up whether they're supposed to call read or recv or receive or get
  • It lets us write generic algorithms that work on arbitrary implementations. Example: Trio's TLS implementation can be composed with any compliant byte-stream implementation. A generic JSONChannel could be wrapped around any Channel[bytes] implementer to convert it into a Channel[JSONObject]. There's more discussion in Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks #796
  • Something that surprised me, but that's becoming a major issue: having a standard convention across async libraries makes life much easier for packages that want to support multiple async libraries.
  • Something that surprised me even more: aside from the asyncio/trio split, having standard abstractions here is the best way to write generic, composable sans-io protocols, because sans-io is basically another kind of I/O system, and Python's async/await is generic enough to be repurposed for this. For more details see Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks #796 (comment)
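The "generic algorithms" point can be sketched as follows. This is only an illustration under assumptions: the `Channel` ABC below is a stripped-down, hypothetical stand-in with just `send` and `receive`, and `MemoryChannel` and this particular `JSONChannel` are invented for the example rather than taken from Trio.

```python
import asyncio
import json
from abc import ABC, abstractmethod
from collections import deque
from typing import Any, Deque, Generic, TypeVar

T = TypeVar("T")


class Channel(ABC, Generic[T]):
    """Hypothetical object-wise ABC: one send() matches one receive()."""

    @abstractmethod
    async def send(self, value: T) -> None: ...

    @abstractmethod
    async def receive(self) -> T: ...


class MemoryChannel(Channel[T]):
    """Toy unbounded in-memory channel, just enough for the demo."""

    def __init__(self) -> None:
        self._items: Deque[T] = deque()

    async def send(self, value: T) -> None:
        self._items.append(value)

    async def receive(self) -> T:
        return self._items.popleft()  # raises IndexError if empty


class JSONChannel(Channel[Any]):
    """Wraps any Channel[bytes] and exposes a channel of JSON values."""

    def __init__(self, inner: Channel[bytes]) -> None:
        self._inner = inner

    async def send(self, value: Any) -> None:
        await self._inner.send(json.dumps(value).encode("utf-8"))

    async def receive(self) -> Any:
        return json.loads((await self._inner.receive()).decode("utf-8"))
```

`JSONChannel` never needs to know what is underneath it; any object that satisfies the `Channel[bytes]` contract would do, which is the composability argument in miniature.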

If this is such a generic problem, then why should Trio be the one to solve it?

Well, no-one else seems to be working on it, and we need it :-). And of course we'd be happy if our work is useful to more people.

Asyncio will be adding an asyncio.Stream class in 3.8, but it doesn't seem to be intended as a generic abstraction with multiple implementations. It's a specific concrete class that's tightly tied to asyncio's transport/protocols layer, and it exposes a rich public interface that includes buffering, line-splitting, fixed-length reads, and TLS.

This means that whenever someone needs a new kind of Stream object, they have two options: they can either define a new type that quacks like a Stream, which means they have to re-implement all this functionality from scratch. Or else, they have to implement their new functionality using the transport/protocols layer and then wrap an asyncio.Stream around it; but using the asyncio transport/protocol layer is awkward, adds overhead, and makes it very difficult to support other async libraries. Neither option is very appealing. IMO what we need is a minimal core interface, so that it's easy to implement, and then higher-level tools like buffering, line-splitting, TLS, etc. can be written once and re-used on any object that implements the byte-wise ABC.
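As a rough sketch of what "minimal core plus reusable tools" could look like: the three-method `ByteStream` ABC below is hypothetical (method names follow Trio's current `send_all` / `receive_some` / `aclose` spelling), and `receive_exactly` and `LoopbackStream` are invented here purely to show a helper that is written once and works with any implementation.

```python
import asyncio
from abc import ABC, abstractmethod


class ByteStream(ABC):
    """Hypothetical minimal byte-wise interface: three methods, nothing else."""

    @abstractmethod
    async def send_all(self, data: bytes) -> None: ...

    @abstractmethod
    async def receive_some(self, max_bytes: int) -> bytes: ...

    @abstractmethod
    async def aclose(self) -> None: ...


async def receive_exactly(stream: ByteStream, n: int) -> bytes:
    """Generic helper written once; works with any ByteStream implementation."""
    parts = []
    remaining = n
    while remaining:
        chunk = await stream.receive_some(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


class LoopbackStream(ByteStream):
    """Toy implementation: bytes sent come back out, two at a time."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    async def send_all(self, data: bytes) -> None:
        self._buffer += data

    async def receive_some(self, max_bytes: int) -> bytes:
        n = min(max_bytes, 2)
        chunk = bytes(self._buffer[:n])
        del self._buffer[:n]
        return chunk  # b"" means end of stream

    async def aclose(self) -> None:
        self._buffer.clear()
```

A new transport only has to supply the three abstract methods; fixed-length reads (and, by the same pattern, buffering or line-splitting) come from helpers that live outside the class.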

And asyncio doesn't have an object-based communication abstraction at all, which IMO is a major missed opportunity.

Twisted OTOH does have standard abstractions for all these things, but they were designed 15+ years ago, long before async/await existed, and I think we can do better now. Heck, even the replacement @glyph's been working on for half a decade now predates async/await.

So the field seems wide open here.

What's already working?

We've been iterating on our APIs for these for a year+ now, and I think we're converging on a pretty solid design. You can see the current version in our docs: https://trio.readthedocs.io/en/stable/reference-io.html#abstract-base-classes

There are some details we're still sorting out – if you're curious see #1125, #823 / #1181, #371, #636 – but I don't want to get into the details too much because they're not really on-topic for this issue and I don't think they'll affect the overall adoption of the ABCs.

OK so what's this issue about then?

Like I said above, Trio currently uses trio.abc.Stream for the byte-wise interface, and trio.abc.Channel[T] for the object-wise interface. These names have two major problems:

  • They're both completely generic. In regular English, "stream" and "channel" mean basically the same thing. This means that the names don't tell you anything about which is which, or how they're similar, or how they're different. And this is unfortunate, because while these concepts are fairly simple and fundamental, experience says that it really takes some work for new users to wrap their heads around them. Anything we can do to make that easier will help a lot.

    Also it personally took me like 6 months to stop mixing up the names and saying "stream" where I meant "channel" or vice-versa, which I just can't ignore. If I can't keep them straight, then how can I expect anyone else to keep them straight?

  • There actually is an important conceptual relationship between them that IMO we should emphasize. Conceptually, byte-streams are basically object-streams where the object type is "a single byte". But if you try to use an interface designed for sending/receiving objects to send/receive individual bytes, then it'll be ridiculously inefficient, so instead you need a vectorized interface that works on whole bytestrings at once. A nice thing about framing things this way is that it emphasizes one of the things that always trips people up, which is that byte-streams don't preserve framing – because they're really a stream of individual bytes.

So I think the right way to do it is to present the object-wise interface as the more fundamental one, and the byte-wise interface as a specialized variant. And we can communicate that right in the names, by calling the object-wise interface X[T], and the byte-wise interface a ByteX.

And then when someone asks what the difference is between a ByteX and an X[bytes], we can say: This is an X[bytes]. This is a ByteX. (click the images to see the animations)
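The same difference can be sketched in code. Both classes below are hypothetical toys standing in for X[bytes] and ByteX; they are not Trio classes and only model the framing behaviour.

```python
import asyncio
from collections import deque


class ToyChannel:
    """Stand-in for X[bytes]: each send() is delivered as one receive()."""

    def __init__(self):
        self._messages = deque()

    async def send(self, message: bytes) -> None:
        self._messages.append(message)

    async def receive(self) -> bytes:
        return self._messages.popleft()


class ToyByteStream:
    """Stand-in for ByteX: only the order of individual bytes is kept."""

    def __init__(self):
        self._buffer = bytearray()

    async def send_all(self, data: bytes) -> None:
        self._buffer += data

    async def receive_some(self, max_bytes: int) -> bytes:
        chunk = bytes(self._buffer[:max_bytes])
        del self._buffer[:max_bytes]
        return chunk


async def demo():
    channel, stream = ToyChannel(), ToyByteStream()
    for message in (b"abc", b"def"):
        await channel.send(message)
        await stream.send_all(message)
    framed = [await channel.receive(), await channel.receive()]
    unframed = [await stream.receive_some(4), await stream.receive_some(4)]
    return framed, unframed
```

Running `asyncio.run(demo())` gives `[b"abc", b"def"]` for the channel and `[b"abcd", b"ef"]` for the byte stream: the same six bytes arrive in the same order, but the boundaries between the two sends are gone.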

But what should X be?

Can we steal from another system?

As mentioned above, asyncio has a concrete class Stream for byte-wise communication and a concrete class Queue for object-wise communication, but no relevant ABCs.

Go has a concrete type chan for communicating objects within a single process, and abstract Reader and Writer interfaces for byte-wise communication.

Nodejs has an abstract Stream interface, that they use for both byte-wise and object-wise communication. There's a mode argument you can set when creating the stream, that determines whether bytestrings can be rechunked or not. (It's pretty similar to some of the approaches we considered and rejected in #959.) The byte-wise mode is the default, and the object-wise mode seems like an afterthought (e.g. the docs say that nodejs itself never uses the object-wise mode).

Rust's Tokio module has an abstract Stream interface, which is basically equivalent to an async iterator in Python, or what Trio calls a trio.abc.ReceiveChannel[T]. And they also have an abstract Sink interface, which is equivalent to what Trio currently calls a trio.abc.SendChannel[T]. For bytes, they have abstract interfaces called AsyncRead and AsyncWrite. And they have generic framing tools to convert an AsyncRead into a Stream, or convert an AsyncWrite into a Sink.

Java's java.nio library uses Channel to mean, basically, anything with a close method (like Trio's AsyncResource). And then it has sub-interfaces like ByteChannel for byte-oriented communication. I don't think there's any abstract interface for framed/object-wise communication. Java's java.io library uses Stream to refer to byte-wise communication, and Reader and Writer for character-wise communication.

Swift NIO has an abstract Channel interface for byte-wise communication, and I don't see any interfaces for framed/object-wise communication.

My takeaways:

There's no general consensus on terminology. The words "stream" and "channel" show up a lot, but the meanings aren't consistent. "Read" and "write" are also popular, and are used consistently for byte-oriented interfaces.

There isn't any consensus on the basic concepts either! A major part of our ABC design is the insight that object-wise and byte-wise communication are both fundamental concepts, and that it's valuable to have standard interfaces to express both of them, and how they relate. But most of these frameworks only think seriously about byte-wise communication, and treat object-wise communication as an unrelated problem if they consider it at all. Tokio is the main exception, but their terminology is either ad hoc or motivated by other Rust-specific stuff. So we can't just steal an existing solution.

OK so what are our options?

I tried to brainstorm all the potential names I could think of, assuming we go with the X + ByteX pattern described above:

  • Channel[T] + ByteChannel, e.g. you could use a MemoryChannel to pass objects between tasks, and a SocketByteChannel to represent a TCP connection
  • Stream[T] + ByteStream, e.g. MemoryStream, SocketByteStream
  • Tube[T] + ByteTube, e.g. MemoryTube, SocketByteTube
  • Flow[T] + ByteFlow, e.g. MemoryFlow, SocketByteFlow
  • Transport[T] + ByteTransport, e.g. MemoryTransport, SocketByteTransport
  • Hose[T] + ByteHose, e.g. MemoryHose, SocketByteHose
  • Ferry[T] + ByteFerry, e.g. MemoryFerry, SocketByteFerry
  • Duct[T] + ByteDuct, e.g. MemoryDuct, SocketByteDuct
  • Vent[T] + ByteVent, e.g. MemoryVent, SocketByteVent
  • Pipe[T] + BytePipe, e.g. MemoryPipe, SocketBytePipe
  • Port[T] + BytePort, e.g. MemoryPort, SocketBytePort

Criteria: I think ideally our core name should be a common, concrete English word that's short to say and to type, because those make the best names for fundamental concepts. It shouldn't be too "weird" or controversial, because people dislike weird names and will refuse to adopt them even if everything else is good. And of course we want it to be unambiguous, so we need to avoid name clashes.

Unfortunately it's really hard to get all of these at once, which is why I got stuck :-)

Stream is really solid on the first two criteria: it's one syllable, common, uncontroversial. But! It clashes with asyncio.Stream. That's not a problem for adoption in the Trio ecosystem. But it might be a major problem if we want this to get uptake across Python more broadly.

The other obvious option is Channel, but I'm really hesitant because it's more cumbersome: 2 syllables on their own aren't too bad, but by the time you start talking about a SocketByteChannel it feels extremely Java, not Python.

For some reason Transport doesn't bother me as much, even though it's two syllables as well, but then you have the clash with the whole protocols/transport terminology, and that also seems like it could be pretty confusing. And we're generally trying to move away from protocols/transports, but I don't want to have to tell beginners "we recommend you use Transports instead of protocols/transports". That's just going to make them more confused.

Among the rest, Tube is tempting as a simple, short, concrete word that doesn't conflict with anything in Trio or asyncio, and fits very nicely with illustrations like the ones I linked above, where there's a literal tube with objects moving through it. But... @glyph has been trying to make his tubes a thing for half a decade now. I think the proposal here is totally compatible with the goals and vision behind his version of tubes, and much more likely to get wider traction going forward. But OTOH the details are very different. So I can't tell whether using the name here would be the sincerest form of flattery, or super-rude, or both at once.

What do y'all think?

@smurfix
Contributor

smurfix commented Sep 13, 2019

IMO what we need is a minimal core interface, so that it's easy to implement, and then higher-level tools like buffering, line-splitting, TLS, etc. can be written once and re-used on any object that implements the byte-wise ABC

Strongly seconded. In an ideal world we would rework asyncio.Stream to do exactly this, instead of releasing it in 3.8 with all these bells and whistles … but I'm afraid it's too late for that.

I'd go with Flow.

@1st1

1st1 commented Sep 13, 2019

In an ideal world we would rework asyncio.Stream to do exactly this, instead of releasing it in 3.8 with all these bells and whistles …

Well, in an ideal world we would just have Trio to be the default framework of choice for async in all languages. The reality is different though: asyncio.Stream is designed the way it is designed to preserve backwards compatibility, and so unfortunately there's not a lot of room left to change it. Not including it in 3.8 wouldn't change any of that.

That said, if there's something we can do to make it easier for people to write adapters from Trio streams to asyncio Streams (if that's even a thing) we'll gladly consider that!

@sorcio
Contributor

sorcio commented Sep 13, 2019

  • There actually is an important conceptual relationship between them that IMO we should emphasize. Conceptually, byte-streams are basically object-streams where the object type is "a single byte". But if you try to use an interface designed for sending/receiving objects to send/receive individual bytes, then it'll be ridiculously inefficient, so instead you need a vectorized interface that works on whole bytestrings at once.

Can this be made part of the same abstract interface? What receive_some() is saying is "I want some sequential items of type T, if you're able to provide them" and returns a Sequence[T]ish. The way the underlying buffer is managed is left to the individual implementation. A default implementation can just basically return [await self.receive()]. Something backed by a file object might translate it into a read(). On the other hand, I believe that a bytestream kind of object can always have a pretty inefficient receive() method that takes a single byte off the underlying buffer, e.g. read(1). I'm not sure this would ever be useful for anything other than bytes, but maybe the mismatch is smaller if we think of stream operations as just "arbitrary sequential chunks of the basic type"?

This would be more problematic for send_all() because it cannot guarantee in general that the sequence is kept intact, i.e. if you translate it into a series of send() you might get interleaved sends from other tasks. But we already have the same problem in Trio Stream right now, and maybe it becomes more obvious.
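A minimal sketch of this suggestion, under the assumption that the unified interface has a per-item `receive()` plus a batch `receive_some()` with a default implementation. `BatchReceiver`, `BufferedBytes`, and `Countdown` are hypothetical names made up for the illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")


class BatchReceiver(ABC, Generic[T]):
    """Hypothetical unified interface, following the suggestion above."""

    @abstractmethod
    async def receive(self) -> T: ...

    async def receive_some(self) -> Sequence[T]:
        # Default: a one-item batch. Implementations with a real buffer
        # can override this to hand back many items per call.
        return [await self.receive()]


class BufferedBytes(BatchReceiver[int]):
    """Byte source: items are single bytes (ints), batches are bytes objects."""

    def __init__(self, data: bytes) -> None:
        self._buffer = bytearray(data)

    async def receive(self) -> int:
        # Works, but costs one awaited call per byte.
        return self._buffer.pop(0)

    async def receive_some(self) -> bytes:
        # Vectorized path: everything currently buffered in one call.
        chunk = bytes(self._buffer)
        self._buffer.clear()
        return chunk


class Countdown(BatchReceiver[int]):
    """Object source that relies on the default receive_some()."""

    def __init__(self, start: int) -> None:
        self._next = start

    async def receive(self) -> int:
        self._next -= 1
        return self._next + 1
```

Since `bytes` is already a sequence of ints, the byte source fits the `Sequence[T]` return type without special-casing; whether the single-byte `receive()` is ever worth exposing is the open question raised above.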

@asvetlov

Agree with @1st1
asyncio.Stream was not designed from scratch; it is just a merge of the already existing StreamReader + StreamWriter APIs plus some backward-compatible improvements.
Sorry, in asyncio we have to apply a very conservative backward-compatibility policy to any change.
Sometimes it prevents us from making code as beautiful as possible, but this is the CPython game rule.

@smurfix
Contributor

smurfix commented Sep 13, 2019

That said, if there's something we can do to make it easier for people to write adapters from Trio streams to asyncio Streams (if that's even a thing) we'll gladly consider that!

Umm, yeah. My take would be to make the whole thing modular.

Details: Add a BaseStream abstract interface that requires just the three basic methods (read_some, write_all, close). Create a TransportStream class that implements this interface in terms of protocol+transport compatibility, ideally with the aim of someday deprecating protocols and transports altogether. (Sync callback hell.)
Let Stream be a shallow class that implements all the complicated read and write operations in terms of calling read and write of an underlying BaseStream object.
Create an SSLFilter class that implements SSL on top of that. Stream.start_tls can then simply push an SSLFilter below itself. Implement a BufferFilter that buffers data for you.

Most of these filters would even be framework agnostic; they would work with trio, asyncio, or whateverio. (As long as nobody needs sub-tasks or timeouts, of course.) You don't even need to write the SSLFilter, as Trio already has one. Steal it. Everything else seems reasonably trivial at first glance.
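A rough sketch of how such a filter stack might be shaped. Everything here is an assumption layered on the outline above: the three core methods are taken to be coroutines, `RecordingStream` is a toy bottom layer, and this `BufferFilter` interprets "buffers data for you" as coalescing small writes, which is only one possible reading.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseStream(ABC):
    """The three-method core from the proposal (all assumed to be coroutines)."""

    @abstractmethod
    async def read_some(self, max_bytes: int = 65536) -> bytes: ...

    @abstractmethod
    async def write_all(self, data: bytes) -> None: ...

    @abstractmethod
    async def close(self) -> None: ...


class RecordingStream(BaseStream):
    """Toy bottom layer that records each write it receives."""

    def __init__(self) -> None:
        self.writes = []

    async def read_some(self, max_bytes: int = 65536) -> bytes:
        return b""  # nothing to read in this toy

    async def write_all(self, data: bytes) -> None:
        self.writes.append(bytes(data))

    async def close(self) -> None:
        pass


class BufferFilter(BaseStream):
    """Filter that is itself a BaseStream and coalesces small writes."""

    def __init__(self, inner: BaseStream, limit: int = 8) -> None:
        self._inner = inner
        self._limit = limit
        self._pending = bytearray()

    async def read_some(self, max_bytes: int = 65536) -> bytes:
        return await self._inner.read_some(max_bytes)

    async def write_all(self, data: bytes) -> None:
        self._pending += data
        if len(self._pending) >= self._limit:
            await self.flush()

    async def flush(self) -> None:
        if self._pending:
            data = bytes(self._pending)
            self._pending.clear()
            await self._inner.write_all(data)

    async def close(self) -> None:
        await self.flush()
        await self._inner.close()
```

Because the filter implements the same interface it consumes, filters can be stacked in any order, and nothing in it refers to a particular event loop; the only framework-specific part would be the bottom layer.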

NB: IMHO: Whoever invented a write method which one may or may not await deserves [removed]. Ugh.

@1st1

1st1 commented Sep 13, 2019

NB: IMHO: Whoever invented a write method which one may or may not await deserves [removed]. Ugh.

Oh, hm, that was me. We're still debating about that though. One of the solutions is to add await Stream.send() and keep Stream.write() as is (i.e. not returning a Future). Do you think that would be a better idea? (Because we still can change it!)

@smurfix
Contributor

smurfix commented Sep 13, 2019

Yes, please. Or call it write_all like in Trio.

Arguments against a write() that may or may not be awaited: (a) it prevents automated checks (with mypy et al.) for a forgotten await, (b) it prevents automatic warnings when one of these awaitables is garbage collected without having been awaited (these are very helpful IME), (c) it requires everybody who implements this interface to add an unlimited send buffer and to copy/reimplement the special dance you'd need to return such an awaitable.

@1st1

1st1 commented Sep 13, 2019

Details: Add a BaseStream abstract interface that requires just the three basic methods (read_some, write_all, close). Create a TransportStream class that implements this interface in terms of protocol+transport compatibility, ideally with the aim of someday deprecating protocols and transports altogether. (Sync callback hell.)
Let Stream be a shallow class that implements all the complicated read and write operations in terms of calling read and write of an underlying BaseStream object.
Create an SSLFilter class that implements SSL on top of that. Stream.start_tls can then simply push an SSLFilter below itself. Implement a BufferFilter that buffers data for you.

I actually like all of this.

BUT asyncio.Stream.write() is this weird method that currently can either be awaited or not.
asyncio.Stream.close() isn't awaitable and is just broken in my opinion. Our goal is to make the API nicer and less confusing, and returning an optional Future did look like an acceptable solution.

@1st1

1st1 commented Sep 13, 2019

How about Trio renames read_some() to read(), as it makes it easier for us? I'm against having read_some() as a simple alias for read().

We can maybe work around the weird nature of write(). Options: revert the change, add send() or writeall(); deprecate write() and drain().

And lastly we'll have a problem with asyncio.Stream.close(), which should be awaitable but isn't quite.

@smurfix
Contributor

smurfix commented Sep 13, 2019

asyncio.Stream.close() isn't awaitable and just broken

Well, sometimes you really do need to kill off a connection Right Now, no matter what. I'd rename it to abort and deprecate close, though.

The nice, everybody-should-use-it async version might be named aclose

@1st1

1st1 commented Sep 13, 2019

I've discussed this with @ambv and here's the plan I like for 3.8:

  • asyncio.Stream.write() will start throwing a DeprecationWarning asking people to add an await if they didn't;

  • asyncio.Stream.close() will start throwing a DeprecationWarning asking people to add an await if they didn't;

  • asyncio.Stream.drain() & asyncio.Stream.wait_closed() will start throwing a DeprecationWarning telling about a scheduled removal (in Python 3.12) when used on Process.std* streams;

  • asyncio.Stream.drain() & asyncio.Stream.wait_closed() will not work at all on Streams created via new 3.8 APIs: connect() & StreamServer.


Well, sometimes you really do need to kill off a connection Right Now, no matter what. I'd rename it to abort and deprecate close, though.

I'm fine with adding Stream.abort(), not super excited about your ideas about the close()/aclose() method.

The key question I have now is this: If we do the above, would that be better for Trio in any way?

@smurfix
Contributor

smurfix commented Sep 13, 2019

The key question I have now is this: If we do the above, would that be better for Trio in any way?

Well, I'm not @njsmith but from my PoV, yes sure. Splitting up Stream would also help a lot, not just for Trio interoperability; I should be able to help out if there's a chance to get that into 3.8 even though my free time is annoyingly limited.

The other problem that we still have is the one this topic is supposed to be about, i.e. naming things. If asyncio is set on keeping plain Stream for what Nathaniel tried to name ByteStream, which I personally can live with, that requires us to come up with a reasonable name for pipes/flows/whatever that happen to not consist of chunks of bytes.

@1st1

1st1 commented Sep 13, 2019

I should be able to help out if there's a chance to get that into 3.8 even though my free time is annoyingly limited.

Sure, we'd accept help. Splitting up sounds interesting, as I'm a big fan of composable component-oriented APIs such as Twitter's Finagle.

cc @asvetlov

@agronholm
Contributor

I'm all for standardization across the board, especially if it ends up allowing me to reduce the code base of AnyIO. Because the semantics of network functionality of the various I/O libraries were so wildly different, I had to invent my own. I would jump at the opportunity to throw out that code and piggyback on the individual network layers of each framework, provided that the semantics were in harmony with each other. But, it seems like there will be a long way to go until that happens.

I have no real preferences about the naming of the ABCs, so long as they're descriptive and everybody agrees on them.

@1st1 Are you still planning to make that asyncio-next library?

@njsmith
Member Author

njsmith commented Sep 13, 2019

@smurfix

Whoever invented a write method which one may or may not await deserves

Hey man, I know this was humorous exaggeration, but let's criticize tech not people, and skip the violent imagery. I edited your post to remove that bit.

@1st1

How about Trio renames read_some() to read(), as it makes it easier for us? I'm against having read_some() as a simple alias for read().

Trio currently calls the method receive_some, and it has different semantics from your read: with no arguments, receive_some returns an arbitrary sized chunk of data, while read with no arguments consumes the whole stream until EOF. This kind of issue is why we've avoided using read as our verb. We also have a dedicated issue for method names with more discussion: #1125

The different names also have a potential advantage here: they mean the same object could potentially expose both a backwards-compatible set of operations for legacy asyncio code, and also implement this hypothetical ABC we're talking about with the improved semantics. (And the new methods could skip straight to the desired semantics while the old ones are working their way through their deprecation periods.)

There is one big conflict though: iteration. Trio's byte stream interface allows async iteration and it returns arbitrary byte chunks. asyncio.Stream also supports async iteration, but it returns lines. IMO the Trio version is substantially more useful – we only added this recently and it made a lot of code simpler. But you can't support both on the same object at the same time.

We don't have to sort out all the details right now, but maybe for 3.8 you might consider replacing asyncio.Stream.__aiter__ with a generator method like asyncio.Stream.iterlines(), to keep your options open in the future?

@smurfix

Well, sometimes you really do need to kill off a connection Right Now, no matter what. I'd rename it to abort and deprecate close, though.

In Trio this is actually handled as part of the trio.abc.AsyncResource interface. It doesn't just tell you how to spell the close method name (aclose) but also specifies that it should normally do a graceful close, but if cancelled it must do a forceful close. This is necessary for general correctness, but it also means we don't need a separate method for abort – instead we have a helper trio.aclose_forcefully that can do an abort on any kind of AsyncResource, by calling aclose and immediately cancelling it. The downside compared to your proposal is that it means forceful close is an async-colored operation, not sync-colored... but it turns out this actually has benefits, because for e.g. disk files, the OS doesn't provide any way to close it without blocking, so we actually need forceful close to be async-colored.
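The "call aclose and immediately cancel it" pattern can be approximated with plain asyncio tasks. This is only a rough stand-in, not Trio's implementation: `SlowResource` and `force_close` are hypothetical names, and Trio itself would use a cancel scope rather than a task.

```python
import asyncio


class SlowResource:
    """Hypothetical resource whose graceful shutdown takes a while."""

    def __init__(self) -> None:
        self.closed = False
        self.graceful = False

    async def aclose(self) -> None:
        try:
            await asyncio.sleep(3600)  # stand-in for e.g. a goodbye handshake
            self.graceful = True
        finally:
            # Runs on normal completion *and* on cancellation, so the
            # resource is always released.
            self.closed = True


async def force_close(resource) -> None:
    """Start aclose(), then cancel it right away to force the fast path."""
    task = asyncio.ensure_future(resource.aclose())
    await asyncio.sleep(0)  # let aclose() start so its finally block is armed
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: we cancelled the graceful part on purpose
```

The resource ends up closed without waiting for the graceful path, and the implementer only had to write one `aclose` method whose cancellation behaviour doubles as the abort.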

@jtrakk
Contributor

jtrakk commented Sep 14, 2019

I had a similar reaction to @sorcio's about this bit:

Conceptually, byte-streams are basically object-streams where the object type is "a single byte". But if you try to use an interface designed for sending/receiving objects to send/receive individual bytes, then it'll be ridiculously inefficient, so instead you need a vectorized interface that works on whole bytestrings at once.

It seems like there's a generalization of BytesTube, which is

BytesTube = ChunkTube[Byte]

or

Bytes = Chunk[Byte]
BytesTube = Tube[Bytes]

@njsmith
Member Author

njsmith commented Sep 14, 2019

Capturing some comments from chat:

@dhirschfeld

Maybe Channel[T] is fine and Stream just needs to be renamed to ByteStream to prevent any confusion. It breaks the naming symmetry you have in the post but that does highlight that they're different things so ¯\_(ツ)_/¯?

@oakkitten

if there's a Stream[bytes] and ByteStream, that's even more confusing
one day someone will want to subclass Stream[bytes] that makes bytes uppercase and they will name it UppercasingByteStream

My response: Surely they'd at least call it UppercasingBytesStream, emphasis on the "s" :-). Though it would probably make more sense to call it UppercasingStream?

@Fuyukai

I mean the stdlib isn't any better
List[bytes] and bytearray comes to mind

My response: Huh, that's a great analogy. We should steal that for the docs if nothing else. Something like:

  • a bytes object is a sequence of bytes, like b"abcdef"
  • a List[bytes] object is a list of bytes objects, like [b"abc", b"def"]
  • a ByteStream (or whatever we call it) is an incremental stream of bytes, like b"abcdef..."
  • a Stream[bytes] (or whatever we call it) is an incremental stream of bytes objects, like [b"abc", b"def", ...]

...well that leaves out the part where streams are bidirectional, not sure how to work that in. But this is obviously one of those things that's just intrinsically confusing the first time you encounter it, so we'll want to take the time and explain it 3 different ways and hope that at least one of them works. And this is a great addition to our quiver.

@jtrakk
Contributor

jtrakk commented Sep 14, 2019

A few of these are interesting.

  • Flume
  • Canal
  • Chute
  • Conduit
  • Intake
  • Siphon
  • Orifice

Plus,

  • Hall
  • Hallway
  • Lane
  • Track
  • Wire

@glyph

glyph commented Sep 15, 2019

For what it's worth, I don't love the idea of using Tube here, since it's hard enough to deal with the names of those abstractions without bumping into name conflicts all over the place :-).

@glyph

glyph commented Sep 15, 2019

(In tubes, for example, when Tube gets its mypy-ification, it'll be Tube[InT, OutT] because tubes do transformation in that context; the abstraction in tubes that looks like what I think Tube[T] here does is Fount[T].)

@njsmith
Member Author

njsmith commented Sep 15, 2019

@glyph The thing we're talking about here is what tubes calls a Flow. Though, in this architecture, there's no distinction between Flow and Tube – if you want to layer an OuterT flow on top of an InnerT flow, you just do:

class TransformedFlow(Flow[OuterT]):
    def __init__(self, inner: Flow[InnerT]):
        ...

outer_flow = TransformedFlow(inner_flow)

(There are unidirectional versions too of course, I'm just illustrating the general point.) It sort of reminds me of that famous line, "have you considered calling a function with arguments?". Async functions/methods enable a lot of simple patterns that didn't use to be possible.

Help us out though :-). I know you'd love to see the Python ecosystem develop standard, composable abstractions for this stuff that become popular and ubiquitous, one way or another. The goals here are identical to tubes's goals. And even if you prefer the tubes approach, hopefully we can agree that it's possible that this approach is the one that will catch on, and we all want whatever does catch on to have some nice friendly naming. So what would you use?

@1st1

1st1 commented Sep 15, 2019

We don't have to sort out all the details right now, but maybe for 3.8 you might consider replacing asyncio.Stream.__aiter__ with a generator method like asyncio.Stream.iterlines(), to keep your options open in the future?

That's a bit too much work for us -- changing the semantics of .write()/.close() is already bad enough, as it requires users of asyncio subprocess API to update their code. Asking them to deal with yet another significant API change is not an option. :(

I have a counter proposal though -- why doesn't Trio implement the default __aiter__ to iterate over lines, and have a .iterbytes() or .iterchunks() to iterate over byte chunks? Or, perhaps, have no __aiter__ at all, and have .iterlines() and .iterchunks() -- that we can do in asyncio too.

@njsmith
Member Author

njsmith commented Sep 15, 2019

@1st1 I'm only talking about the new asyncio.Stream type. People already have to update their code to use it :-). (For that matter, can you fix write/close immediately on Stream?)

why doesn't Trio implement the default __aiter__ to iterate over lines

Because we don't have line-splitting functionality at all in these types – adding it would be a major change because it requires adding a buffer and an end-of-line scanning algorithm, and both of those are quite tricky because the naive algorithm is O(N**2). And the whole idea here is to separate those kinds of algorithms off into a single robust implementation instead of duplicating them in every class.
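To make the quadratic-scan concern concrete, here is a minimal sketch of a standalone line splitter layered over a chunk-oriented stream. It is not Trio's or asyncio's implementation: `LineReader`, `FakeByteStream`, and the `_scanned` offset are hypothetical names, and the only thing assumed about the stream is a `receive_some()` method that returns `b""` at EOF. The idea is that the reader remembers how much of its buffer it has already searched, so no byte is scanned for a newline more than once.

```python
import asyncio


class FakeByteStream:
    """Hypothetical stand-in for a byte stream: hands out preset chunks, then b"" at EOF."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes=4096):
        return self._chunks.pop(0) if self._chunks else b""


class LineReader:
    """Splits an unframed byte stream into lines, kept separate from the stream class."""

    def __init__(self, stream):
        self._stream = stream
        self._buf = bytearray()
        self._scanned = 0  # prefix of _buf already known to contain no b"\n"

    async def readline(self):
        while True:
            # Resume the search where the previous one stopped, instead of
            # rescanning the whole buffer after every new chunk.
            idx = self._buf.find(b"\n", self._scanned)
            if idx >= 0:
                line = bytes(self._buf[: idx + 1])
                del self._buf[: idx + 1]
                self._scanned = 0
                return line
            self._scanned = len(self._buf)
            chunk = await self._stream.receive_some()
            if not chunk:
                # EOF: return whatever is left (possibly b"").
                line = bytes(self._buf)
                self._buf.clear()
                self._scanned = 0
                return line
            self._buf += chunk


async def read_four_lines(chunks):
    reader = LineReader(FakeByteStream(chunks))
    return [await reader.readline() for _ in range(4)]
```

A naive version would call `self._buf.find(b"\n")` from offset 0 each time a chunk arrives, which is where the quadratic behavior on long lines comes from.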

Or, perhaps, have no __aiter__ at all, and have .iterlines() and .iterchunks() -- that we can do in asyncio too.

That's technically feasible, and how it used to work in Trio, but it's unattractive because the __aiter__ saves so much boilerplate:

# what people used to have to write every time they used a Trio stream
while True:
    chunk = await stream.receive_some()
    if not chunk:
        # EOF
        break
    # ... handle chunk ...

# What they write now
async for chunk in stream:
    # ... handle chunk ...

It's not just the extra typing; it's that this particular boilerplate is especially error-prone and confusing to new users. It used to be that basically every time someone posted their first program in the Trio chat to ask for feedback, they had a bug in this loop. Also all our tutorial examples got substantially simpler.
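For readers wondering how little machinery the `async for` form needs, here is a rough sketch of a default `__aiter__`/`__anext__` written purely in terms of `receive_some()`. The mixin and the fake stream are hypothetical illustrations, not a copy of Trio's source; the only assumption is that `receive_some()` returns an empty bytes object at EOF.

```python
import asyncio


class ChunkIterationMixin:
    """Hypothetical mixin: async iteration built on top of receive_some()."""

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self.receive_some()
        if not chunk:
            # The empty chunk that signals EOF becomes the end of iteration,
            # so callers never have to write the `if not chunk: break` check.
            raise StopAsyncIteration
        return chunk


class FakeStream(ChunkIterationMixin):
    """Hypothetical in-memory stream used only to exercise the mixin."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes=4096):
        return self._chunks.pop(0) if self._chunks else b""


async def collect(stream):
    return [chunk async for chunk in stream]
```

Because the EOF check lives in one place, the loop body in user code only ever sees non-empty chunks.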

@gvanrossum

I guess I'm too old for this -- I don't understand why "stream of objects" would be more fundamental than "stream of bytes", and I thought that the original terms Stream and Channel[T] were exactly right.

@njsmith
Member Author

njsmith commented Sep 16, 2019

Hi Guido!

When I say "more fundamental", I mean it's a more general concept – "stream of bytes" is a special case of "stream of things". In an idealized theoretical world, what trio currently calls Stream wouldn't exist, and instead we'd represent a TCP connection as a Channel[Byte]. But doing a separate method call for every byte is way too inefficient, plus Python doesn't even have a Byte type, so Channel[Byte] is totally impractical. Stream exists as an optimized, special-case type to fill the gap left by Channel[Byte]. But it would still be nice if people could take their knowledge of "stream of things", and re-use that knowledge to understand "stream of bytes". I'm not 100% happy with the ByteChannel idea, but it feels better to be able to tell people "ByteChannel is the same as Channel[Byte], just split off to be more efficient" versus giving them totally unrelated names.

This is a super difficult set of concepts to pin down, and there are at least 3 ways of splitting them up that are natural in some situations, and run into problems in others. (@glyph's tubes actually uses a different one again from what either of us is saying...) I've been struggling with this for the last few years, and I think the approach I'm advocating is the one that leads to the least awkwardness overall, but I also totally see why it's not obvious which one is best!

But even if we kept Stream as the name for abstract byte-streams, we'd still have a problem, since the details of the asyncio.Stream interface aren't a good fit for a generic abstract interface, and it would be awkward if asyncio.Stream wasn't a Stream! I wasn't worrying about this when I came up with the original terms, but it turns out to be more important than I thought.

@smurfix
Contributor

smurfix commented Sep 16, 2019

but it feels better to be able to tell people "ByteChannel is the same as Channel[Byte], just split off to be more efficient"

In other words, possibly-different-enough to get a separate name, and mayyybe different accessors (read_some for random-chunks-of-byte streams vs. get for discrete-object channels?). We already debated this in different issues which I'm not going to link to again, without arriving at a consensus, but the problem now comes to a head, with the 3.8 release and its accompanying new asyncio interfaces looming quite large (wait, what, we're at beta 4 already, when did that happen?? :-/ ).

I think the approach I'm advocating is the one that leads to the least awkwardness overall, but I also totally see why it's not obvious which one is best!

Seconded, for what that's worth, and with the above naming caveat.

But even if we kept Stream as the name for abstract byte-streams, we'd still have a problem, since the details of the asyncio.Stream interface aren't a good fit for a generic abstract interface

Exactly. If asyncio keeps this Stream class with all its non-basic-concept methods (SSL, line split, …) we need to name our "Stream" concept something else. And frankly, while I used BaseStream in my split-asyncio.Stream-up draft, that name is kindof too awkward for a fundamental concept.

@njsmith
Member Author

njsmith commented Sep 16, 2019

In other words, possibly-different-enough to get a separate name

I could be convinced to use a separate name. My big problem with Stream and Channel is that we could have just as easily named them Channel and Stream. If you have two related-but-different concepts, then the names should give you a clue about which is which!

@asvetlov

The new asyncio.Stream is used in old subprocesses API.
await asyncio.create_subprocess_shell() and await asyncio.create_subprocess_exec() both return a Process object that has stdin, stdout and stderr properties.
These properties are asyncio.Stream instances.

That's why in the new stream design we carefully reproduce the existing old API methods.

@smurfix
Contributor

smurfix commented Sep 16, 2019

@njsmith Well, that's easy. A channel transports distinct sequential entities. Ships, in the real world.
A stream (or maybe a flow) of water, on the other hand, doesn't have hard boundaries, you cut it off wherever convenient (e.g. when the bucket is full).

@asvetlov These properties could equally well be asyncio.Stream (or even asyncio.abc.Stream) subclasses. We could call the actual class of these attributes ProcessStream or whatever. There's no reason the method I outlined in #1208 (comment) won't work (or at least I don't see any at first+second glance).

@gvanrossum

gvanrossum commented Sep 16, 2019 via email

@smurfix
Contributor

smurfix commented Sep 17, 2019

In any case, after taking a day off to hack on the code, I've got an initial split-up of it all up and running. Rough code, heaps of compatibility hooks to get the tests working, no docs yet, but tests pass – and Trio's SSL code works with next to no changes.

Available at github.com/smurfix/cpython, "streams" branch. Will try to work on the code some more later this week.

@njsmith
Member Author

njsmith commented Sep 18, 2019

IOW my recommendation is to give in to tradition (even if it's not unanimous), and use Stream and Channel[T]. Stream is coming from a long tradition (e.g. the libc manual uses the word without explanation, instead focusing on the confusion between streams and files inherent in the naming choices of the early C stdio library). Channel may be even older, originating in CSP (according to Wikipedia).

There's definitely a strong tradition of using both "stream" and "channel" to refer to all kinds of incremental point-to-point communication. In fact I just checked the original CSP paper, and Hoare actually uses both "channel" and "stream" about equally often, and with no explanation for either – basically he uses "channel" when he's talking about a connection between two processes, and "stream" when he's talking about the data flowing over that connection. Here's some examples – in both these cases he says "stream", but it's taken for granted that the stream is represented as a CSP channel:

[image: excerpts from the CSP paper]

In a system like traditional asyncio, a generic name like "stream" is a good choice! There's only one thing it could refer to, so there's no confusion. Ditto for using "channel" in CSP or Go.

But, when we put two different kinds of point-to-point communication in the same system, and try to distinguish them using two words that both refer generically to point-to-point communication, then that's where my problem starts, because now we have to change the meaning: we're redefining "stream" to mean "NOT objects", and "channel" to mean "NOT bytes" [1]. As far as I can tell, Trio is the only system that's ever done this – it's novel. And it contradicts a lot of common usage.

In Trio right now, Hoare's clear, ordinary-looking language is just... wrong. When we talk about SSH channels, we have to be careful to clarify that these channels aren't channels. And we can't say things like "WebSocket enables streams of messages on top of TCP", or "TCP is the protocol that guarantees we can have a reliable communication channel over an unreliable network" – Trio's use of stream/channel turns these ordinary sentences into something completely misleading, and this is already one of the places where new users get the most confused.

[1] This is a classic thing in human language – word meaning is all about contrasts, and when you add new words, the existing words shift their meanings around to make room.

A third way?

That said, I can also see that no-one's convinced by my idea of distinguishing ByteStream and Stream[bytes] :-). So let's step back. AFAIK there are basically three options for how to break up these concepts:

  • treat them as two totally separate types (Stream and Channel[bytes], like current Trio). I don't like this because I can't figure out how to name them without adding more confusion.
  • treat them as two related-but-different types (ByteStream and Stream[bytes], like I proposed at the beginning of the thread). Y'all aren't convinced, and tbh I'm not super happy with it either.
  • treat them as the same type, so you use Stream[bytes] for both framed and unframed data. We haven't talked about this one yet.

That last option is seen in nodejs, and in @glyph's current tubes work; in trio-land we had a whole thread on it in #959. We rejected it back then, but maybe it's time to revisit.

The major challenge is that framed and unframed streams have really different behavior, so you need some way to keep track of that distinction, communicate it to users, make sure you don't accidentally pass an unframed Stream[bytes] into code that expects a framed Stream[bytes], etc.

@glyph's way of handling this is to define two different subtypes of bytes, to represent framed and unframed data – so in his notation, an unframed stream is a Flow[Segment], and a framed stream is a Flow[Frame]. Currently this is accomplished via some kind of zope.interface-based magic, but in the future he's planning to switch to using PEP 484 NewType to make Segment and Frame into static-only subtypes of bytes.

This doesn't seem super appealing to me for two reasons. First, you don't get any runtime checking. Beginners are the ones most likely to mix these up, and also the least likely to have fancy stuff like mypy set up. And second, even if you do have mypy running, I think to make it work you'd have to write explicit casts everywhere whenever you wrote any networking code, which seems annoying.
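As a rough illustration of both concerns, here is what the NewType variant could look like. `Segment` and `Frame` are borrowed from the description above; `handle_frame` is a hypothetical consumer, and this is a sketch of the general NewType pattern, not of tubes itself.

```python
from typing import NewType

# Static-only subtypes of bytes: they exist for the type checker alone.
Segment = NewType("Segment", bytes)  # unframed data
Frame = NewType("Frame", bytes)      # framed data


def handle_frame(frame: Frame) -> int:
    """Hypothetical consumer that expects one complete frame."""
    return len(frame)


raw = b"abc"

# Concern 2: plain bytes have to be wrapped explicitly before a type
# checker will accept them where a Frame is expected.
frame = Frame(raw)

# Concern 1: at runtime NewType is an identity function, so the value is
# still an ordinary bytes object and nothing stops a mix-up.
runtime_type = type(frame)

# A type checker would reject this call, but it runs without complaint.
mixed_up = handle_frame(Segment(b"xy"))  # type: ignore[arg-type]
```

Since `Frame(raw)` returns `raw` unchanged, an `isinstance` check cannot tell the two kinds of data apart.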

Here's another idea, that's not fully baked but I think might be promising. What if we define a generic base class, and define two empty sub-interfaces to use as markers for the two confusable variants:

class Stream(Generic[T]):
    # ...

class FramedByteStream(Stream[bytes]):
    pass

class UnframedByteStream(Stream[bytes]):
    pass

We can bikeshed the names of course. These ones are kind of wordy... but that seems OK, because these are abstract types that only show up in documentation, type signatures, and isinstance checks – i.e., they're optional, and you only use them when you want to go out of your way to be explicit:

class LengthPrefixFramer(FramedByteStream):
    "Converts an `UnframedByteStream` into a `FramedByteStream`."  # <-- docs
    def __init__(self, transport: UnframedByteStream):  # <-- static type checks
        if not isinstance(transport, UnframedByteStream):  # <-- runtime type checks
            raise TypeError
        # ...

The rest of the time you get to use both types with regular bytes objects (or bytearray, etc.) and no extra ceremony or runtime cost. And asyncio streams get to stay as streams without creating any terminology conflicts. And it's even convenient for @glyph if he wants to hook up his tubes infrastructure to the new ABCs.
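To show the kind of conversion such a class would perform, here is a self-contained sketch of length-prefix framing over an unframed transport. Everything in it is an assumption made for illustration: the 4-byte big-endian length header, the `send`/`receive` method names on the framed side, and the `FakeTransport` class, which deliberately returns data in small arbitrary chunks the way a TCP connection might.

```python
import asyncio
import struct


class FakeTransport:
    """Hypothetical unframed transport: bytes come back in arbitrary small chunks."""

    def __init__(self, chunk_size=3):
        self._data = bytearray()
        self._chunk_size = chunk_size

    async def send_all(self, data):
        self._data += data

    async def receive_some(self, max_bytes=4096):
        n = min(max_bytes, self._chunk_size)
        chunk = bytes(self._data[:n])
        del self._data[:n]
        return chunk  # b"" once everything has been consumed


class LengthPrefixFramerSketch:
    """Turns an unframed byte transport into a stream of whole messages."""

    _HEADER = struct.Struct(">I")  # assumed wire format: 4-byte big-endian length

    def __init__(self, transport):
        self._transport = transport
        self._buf = bytearray()

    async def send(self, message):
        await self._transport.send_all(self._HEADER.pack(len(message)) + message)

    async def _read_exactly(self, n):
        # Keep pulling chunks until n bytes are buffered; chunk boundaries
        # from the transport carry no meaning.
        while len(self._buf) < n:
            chunk = await self._transport.receive_some()
            if not chunk:
                raise EOFError("transport closed in the middle of a frame")
            self._buf += chunk
        data = bytes(self._buf[:n])
        del self._buf[:n]
        return data

    async def receive(self):
        (length,) = self._HEADER.unpack(await self._read_exactly(self._HEADER.size))
        return await self._read_exactly(length)


async def roundtrip(messages):
    framer = LengthPrefixFramerSketch(FakeTransport())
    for message in messages:
        await framer.send(message)
    return [await framer.receive() for _ in messages]
```

The messages come back with their original boundaries even though the transport split them at unrelated offsets, which is the guarantee that separates the framed side from the unframed side.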

What do y'all think?

Other stray thoughts

An oddity is that according to the Liskov substitution principle, FramedByteStream ought to be a subtype of UnframedByteStream, because framed byte streams make strictly stronger guarantees than unframed ones. In practice, I don't think we need to care about this – if someone really wants to do something weird like run TLS on top of a Websocket, they can always force it through some casting, and the rest of the time treating them as separate types will help avoid silly mistakes.

In this approach, trio.StapledStream, which staples together two unidirectional streams to make a bidirectional one, automatically works for both framed and unframed streams, which is nice – otherwise we'd need some boilerplate code to define StapledStream and StapledChannel separately. But, there is some question of how to track the framed/unframed distinction – a StapledStream object is a FramedByteStream if both of the underlying streams are FramedByteStreams, and an UnframedByteStream if both of the underlying streams are UnframedByteStreams. I guess we could make isinstance work by adding a special case to __subclasscheck__, or playing some games with StapledStream.__new__. Propagating this statically isn't really doable with Python's current static types, AFAIK.
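One way the `__new__` games could go is sketched below. This is a hypothetical illustration, not a proposal for Trio's actual `StapledStream`: the marker classes are the empty ones from earlier in this comment, and `_variant_for` is an invented helper that builds (and caches) a subclass combining `StapledStream` with the matching marker. It only addresses the runtime `isinstance` question; it does nothing for static checking.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Stream(Generic[T]):
    pass


class FramedByteStream(Stream[bytes]):
    pass


class UnframedByteStream(Stream[bytes]):
    pass


_VARIANTS = {}  # marker class -> dynamically created StapledStream subclass


def _variant_for(marker):
    """Hypothetical helper: build the marked subclass once and reuse it."""
    if marker not in _VARIANTS:
        _VARIANTS[marker] = type(
            "Stapled" + marker.__name__, (StapledStream, marker), {}
        )
    return _VARIANTS[marker]


class StapledStream(Stream[T]):
    def __new__(cls, send_stream, receive_stream):
        if cls is StapledStream:
            # Adopt a marker only when both halves agree on it.
            for marker in (FramedByteStream, UnframedByteStream):
                if isinstance(send_stream, marker) and isinstance(receive_stream, marker):
                    cls = _variant_for(marker)
                    break
        return super().__new__(cls)

    def __init__(self, send_stream, receive_stream):
        self.send_stream = send_stream
        self.receive_stream = receive_stream


framed = StapledStream(FramedByteStream(), FramedByteStream())
unframed = StapledStream(UnframedByteStream(), UnframedByteStream())
mixed = StapledStream(FramedByteStream(), UnframedByteStream())
```

Here `framed` passes `isinstance(framed, FramedByteStream)`, while `mixed` stays a plain `StapledStream` with neither marker.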

I also thought about having a .framed attribute on Streams, that's only meaningful for byte-streams. But that doesn't let us express the framed/unframed distinction statically, and it feels like we want to be able to express it in type signatures.

@njsmith
Member Author

njsmith commented Sep 18, 2019

Hmm, here's another limitation of the Stream[T]/FramedByteStream/UnframedByteStream idea: it makes it difficult to express a generic message transport. E.g. for passing messages between tasks within a single process, Trio has a MemoryChannel[T]. Since it's within-process, you can pass objects directly, so T can be any Python type. And that means that you can have MemoryChannel[bytes]. Obviously this should count as a framed transport. But in the Stream[T]/FramedByteStream/UnframedByteStream approach, it doesn't, and I don't know how to fix that in a way that would make mypy happy. (Well, except for writing a plugin to special-case this, but that's not very satisfying.)

I guess the cause is that in this design, unframed streams are the odd-one-out – they don't quite behave like generic streams. One way to patch over this case would be to drop FramedByteStream, so we just have Stream[T] + UnframedByteStream. And at runtime, we could still catch the same errors, by replacing checks like if not isinstance(obj, FramedByteStream): raise TypeError with if isinstance(obj, UnframedByteStream): raise TypeError. But... there's no way for a type signature to say "I don't want an UnframedByteStream" – you can write somearg: Stream[bytes], but that allows UnframedByteStream through. And that's exactly the main mistake that we want to prevent, so we want mypy to be able to catch it, so this patch doesn't seem very attractive.

Another idea: we could push the framed/unframed distinction up into the generic layer, so instead of just Stream[T], there's UnframedStream[T] and FramedStream[T]. Then we could make MemoryChannel[T] a FramedStream[T]. But this is pretty weird, because in practice there is no generic UnframedStream[T], there's only UnframedStream[bytes], and now we've looped back around to the ByteStream/Stream[T] design that we were trying to find an alternative to in the first place.

What if we made it ByteStream and MessageStream[T]? MessageStream[bytes] makes it a lot more explicit that this is a framed transport than Stream[bytes], which I think addresses some of the concerns that folks have raised about my original proposal. And the names are a bit wordy, sure, but now we can say "stream" when it's obvious from context which kind of stream is meant, without risking giving the wrong idea, and you only need to get wordy when you want to be explicit about which kind of stream you mean. And Python even has a tradition for making protocol names a bit wordy – think of Mapping[K, V] versus dict.
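A minimal sketch of what that pair of ABCs might look like, with an in-memory transport that is a `MessageStream[T]` for any `T`, including `bytes`. The method names (`send_all`/`receive_some` and `send`/`receive`) are placeholders taken from the current Trio spellings, since the verbs are still an open question, and `MemoryMessageStream` is a hypothetical stand-in for something like `MemoryChannel[T]`.

```python
import asyncio
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, Generic, TypeVar

T = TypeVar("T")


class ByteStream(ABC):
    """Unframed: chunk boundaries are arbitrary and carry no meaning."""

    @abstractmethod
    async def send_all(self, data: bytes) -> None: ...

    @abstractmethod
    async def receive_some(self, max_bytes: int = 4096) -> bytes: ...


class MessageStream(ABC, Generic[T]):
    """Framed: every receive() returns exactly one object passed to send()."""

    @abstractmethod
    async def send(self, message: T) -> None: ...

    @abstractmethod
    async def receive(self) -> T: ...


class MemoryMessageStream(MessageStream[T]):
    """Hypothetical in-process transport; T can be any type, bytes included."""

    def __init__(self) -> None:
        self._queue: Deque[T] = deque()

    async def send(self, message: T) -> None:
        self._queue.append(message)

    async def receive(self) -> T:
        return self._queue.popleft()


async def demo():
    stream: MemoryMessageStream[bytes] = MemoryMessageStream()
    await stream.send(b"ab")
    await stream.send(b"cd")
    # The two messages stay separate: this is a MessageStream[bytes],
    # not a ByteStream, even though the payload type is bytes.
    return [await stream.receive(), await stream.receive()]
```

Because the two ABCs are unrelated types with different methods, a `MessageStream[bytes]` parameter annotation does not admit a `ByteStream`, which is the mix-up the earlier variants could not rule out.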

@smurfix
Contributor

smurfix commented Sep 19, 2019

What if we made it ByteStream and MessageStream[T]?

You could also have a [Unicode]CharacterStream … which is even wordier. Or an IntegerStream or a FloatStream, which means that we're right back where we started. Well … except for the fact that ByteStream is special – because our bytes is really a Tuple[bytes].

Also, Mapping might be a long word, but at least it's just one word. Simple (or not-so-simple-but-somewhat-ubiquitous) concepts should be designated with simple (or at least single) words. When I read code I want names that can be recognized at one glance, not three. (Writing is easier: the editor has macros for that. :-P )

Let's face it, there do not seem to be any choices that are (a) intuitively obvious without explanation (b) used the same way in some sort of majority of other languages (c) short enough to not be too verbose. Thus I move that we go with two simple words, i.e. we follow Guido's recommendation and use Channel (single objects) vs Stream (multiple instances of a primitive like byte, without framing). At least for these we can come up with a mnemonic or two, to help people who are new to this, to distinguish them. They'll quickly become intuitive anyway.

Thus my hypothetical audio processor would have a bunch of Stream[float] instances for the filters to talk to each other. One or two of these streams would get encoded into a Channel[OpusFrame], which ends up in a Stream[byte] (or a ByteStream because, well, bytes are somewhat special) that's sent to the client.

Frankly, I'd rather not discuss the names of these concepts for another week or two. (Which is not to say that the discussion so far has been unhelpful. Quite the opposite. But there's diminishing returns …) I'd rather start writing a bunch of helper classes to translate between different types of streams and/or channels which happen to work with both Trio and asyncio.

Before getting there, we also have another naming problem, which is read/write/close vs. receive_some/send_all/aclose (for streams) and possibly read/write vs. receive/send vs. get/put (for channels; also, is there a close, or do I send None to signal The End?). While wrapper classes to translate one to the other are somewhat trivial to write, having to deal with these wrappers at all, indefinitely, is really really annoying IMHO.

@smurfix
Contributor

smurfix commented Sep 28, 2019

… no takers? Everybody afraid to step on somebody's toes / strong opinions WRT these names?
(I would be, if I were to be the one to decide on them … but somebody will have to.)

Alternately we could simply agree to disagree and use a translator.

As an example, the code that adapts Trio's SSL stream to an asyncio Stream which I used as a proof-of-concept in my asyncio.stream split-up exercise basically looks like this:

# common code in asyncio.streams or wherever
class _TrioWrap(AbstractStreamModifier):
    """a wrapper to translate trio calls to asyncio"""
    def __init__(self, wrapped): self._wrapped = wrapped
    async def send_all(self, data): await self._wrapped.write(data)
    async def receive_some(self, n=4096): return await self._wrapped.read(n)
    async def aclose(self): await self._wrapped.close()

class _TrioUnwrap:
    """A mixin to translate asyncio calls to trio names"""
    async def write(self, data): await self.send_all(data)
    async def read(self, n=None): return await self.receive_some(n)
    async def close(self): await self.aclose()

def wrap_for_asyncio(TrioStream):
    class WrappedStream(_TrioUnwrap, AbstractStreamModifier, TrioStream):
        """This class wraps a Trio SSLStream to run on asyncio"""

        def __init__(self, transport_stream, *a, **kw):
            super().__init__(lower_stream=transport_stream)
            TrioStream.__init__(self, _TrioWrap(transport_stream), *a, **kw)
    WrappedStream.__name__ = TrioStream.__name__ + "_asyncio"
    return WrappedStream

# asyncio application / library code
AsyncioSSLStream = wrap_for_asyncio(Trio_SSLStream)

async def communicate(…):
    async with connect(…) as my_stream:
        my_stream = AsyncioSSLStream(my_stream, SSLContext(…), …)
        await my_stream.write(b'Hello!\r\n')

… and while that's basically it – you now have an encrypted my_stream, you do not need to add any SSL-related options to each and every Streamish class out there – the overhead doesn't strike me as particularly attractive. Nor does the semantic mismatch for read(no_argument) (and there probably are others), which this code doesn't yet(?) try to paper over.

@1st1

1st1 commented Sep 28, 2019

… no takers? Everybody afraid to step on somebody's toes / strong opinions WRT these names?

No, not really. Since we're reverting the new streams API from 3.8 we'll start a new discussion on what the new API should look like for asyncio 3.9.

Future compatibility with Trio is desired so I suggest not to settle on any design before we have a chance to discuss it with asyncio devs.

I'll start a thread sometime next week, right now we're busy releasing RC1. Not saying this to discourage the discussion here, quite the contrary it should continue! Just a heads up explaining why Andrew and I are quiet for now.

@njsmith
Member Author

njsmith commented Oct 6, 2019

A bunch of discussion about asyncio.Stream happened over here: https://bugs.python.org/issue38242
To summarize the parts that are relevant to this thread:

  • In principle, the asyncio devs are enthusiastic about finding a shared stream API and once the details are more settled Yury and I will likely co-author a PEP to standardize them
  • There were some tricky parts to reconciling the asyncio.Stream API proposed for 3.8 with the Trio-inspired ABC approach, but having worked through the details it all seemed manageable. (See the thread there for the full details.)
  • But, the extra scrutiny on asyncio.Stream revealed some other potential issues, and Yury ultimately made the tough call to push the stream rework back to 3.9.

@smurfix

Thus I move that we go with two simple words, i.e. we follow Guido's recommendation and use Channel (single objects) vs Stream (multiple instances of a primitive like byte, without framing). At least for these we can come up with a mnemonic or two, to help people who are new to this, to distinguish them. They'll quickly become intuitive anyway.

That's the problem though. Everything you're saying seemed plausible a priori, but then we actually did the experiment, and it didn't become intuitive for me. I'm still willing to hear arguments for this approach if folks have them, but it's a very inconvenient fact, and I'd appreciate it if folks arguing for Stream/Channel could engage with that somehow, instead of acting like it never happened.

That said: Having slept on it for a few weeks, I'm still very happy with the ByteStream/MessageStream[T] idea; much more so than my original proposal that I made at the start of the thread. Thanks to the discussion here, I feel like I finally managed to articulate why I have so much trouble with the Stream/Channel approach: the way that when someone says "stream" speaking casually, it becomes confusing whether they mean "specifically bytes, not objects" or "any kind of stream". And that also showed that my ByteStream/Stream[T] doesn't fully solve the problem, because with that when someone says "stream" they might mean "specifically objects, not bytes", and you can't tell.

So the advantages of ByteStream/MessageStream[T] are:

  • It lets us keep using "stream" to refer to all kinds of streaming data, so we don't have to fight the English language, while still giving us a way to be explicit when we want to.
  • There's minimal "weirdness quotient"
  • It's compatible with how asyncio uses "stream"
  • It's compatible with how trio currently uses "stream", e.g. we don't necessarily need to change the name of open_tcp_stream (though we could).
  • It leaves the door open for other types of stream in the future, like TextStream

The one downside is that it's a little more verbose than other options, but given that these are abstract types that you only need to spell out in full in contexts where you want the extra precision, I think we can live with that. All the options we've come up with require some compromises, and I'm much more comfortable with that compromise than the alternatives.

If folks are comfortable moving forward with that, then I think the next steps will be:

  • Discussing the verbs: there's already some discussion in Should we replace send/receive with some other verbs? #1125, but I wanted this part to be settled first in case it influences our decision there
  • Figuring out the details of Trio's transition plan
  • Starting to prototype a streams-based sansio library?

@gvanrossum

I really don't want to be the decider here, and I don't have any evidence beyond my own feelings. I also don't recall the "experiment" you have already done -- if it was so compelling, would you mind providing a reference? (I do recall reading about it before, but I don't recall where, and I haven't kept track of this discussion since the last time it flared up.)

@njsmith
Member Author

njsmith commented Oct 7, 2019

@gvanrossum No worries! We all value your thoughts, but there's definitely no obligation to make a pronouncement or anything. We'll figure it out :-).

And I was speaking a bit loosely with the "experiment". I don't mean we herded a bunch of undergrads into cubicles and made them answer questionnaires, but rather that we've actually been shipping the Stream/Channel names in Trio for a full year now. We went with it based on arguments like the ones we've seen here, and I've tried to train myself to use the terms consistently, but the fact is that even now I still find them awkward and unintuitive. So when someone argues "don't worry, it'll quickly become intuitive", well, if it was going to happen quickly it should have happened by now! So I just can't find that argument convincing, no matter how much I want to believe.

@gvanrossum

gvanrossum commented Oct 7, 2019 via email

@njsmith
Member Author

njsmith commented Oct 7, 2019

I don't think it's about anyone's word against anyone. I'm not saying you find those names difficult to work with :-). I'm just saying that I do. And then if folks claim that the names will quickly become intuitive for all the users we care about, AFAICT that implies that either I'm not a user we care about, or that I'm wrong about my own experience, and either way it's not a great feeling. That doesn't mean they're necessarily the wrong choice on net; I'm just hoping to see less of that particular argument.

I do think the meaning of MessageStream is substantially more obvious "at a glance" than Channel though... is it just me? I just realized one way I could get some objective data was to look at how folks talk about TCP-style primitives vs UDP-style primitives, since this is one of the few existing places where a single piece of documentation has to contrast the two styles.

Linux socket(2) says:

  SOCK_STREAM     Provides sequenced, reliable, two-way, connection-based
                  byte streams.  An out-of-band data transmission
                  mechanism may be supported.

  SOCK_DGRAM      Supports datagrams (connectionless, unreliable messages
                  of a fixed maximum length).

MSDN on named pipes:

The type mode of a pipe determines how data is written to a named pipe. Data can be transmitted through a named pipe as either a stream of bytes or as a stream of messages. [...]

To create a byte-type pipe, specify PIPE_TYPE_BYTE or use the default value. The data is written to the pipe as a stream of bytes, and the system does not differentiate between the bytes written in different write operations.

To create a message-type pipe, specify PIPE_TYPE_MESSAGE. The system treats the bytes written in each write operation to the pipe as a message unit.

So that seems to match my intuition – when it comes to IPC mechanisms, Linux and Windows don't agree on much, but apparently they do agree that the natural way to explain this distinction is to use the words "bytes" and "messages".
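The bytes-versus-messages contrast is easy to observe directly with the standard library. The sketch below assumes a Unix-like system where `socket.socketpair` supports `AF_UNIX` with both socket types; `receive_chunks` is a hypothetical helper written for this illustration.

```python
import socket


def receive_chunks(sock_type, payloads):
    """Send each payload separately, then report what each recv() call returned."""
    a, b = socket.socketpair(socket.AF_UNIX, sock_type)
    with a, b:
        for payload in payloads:
            a.send(payload)  # tiny payloads, so a single send() is enough here
        expected = sum(len(p) for p in payloads)
        chunks = []
        received = 0
        while received < expected:
            chunk = b.recv(4096)
            chunks.append(chunk)
            received += len(chunk)
        return chunks


# SOCK_DGRAM: each recv() returns exactly one message, boundaries preserved.
datagrams = receive_chunks(socket.SOCK_DGRAM, [b"ab", b"cd"])

# SOCK_STREAM: only the byte sequence is guaranteed; the two writes may come
# back as one chunk or several, so only the concatenation is meaningful.
stream_bytes = b"".join(receive_chunks(socket.SOCK_STREAM, [b"ab", b"cd"]))
```

With the datagram socket the receiver sees two separate two-byte messages, whereas with the stream socket the only safe assumption is that the concatenated bytes equal `b"abcd"`.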

I'm super curious if anyone has seen documentation that said something like "one of the main differences between TCP and UDP is that TCP is a stream and UDP is a channel". I don't think I've ever seen anything like that, but I've been wrong before...

@gvanrossum

Nathaniel,

I'm really struggling with how to respond to this. I respect you tremendously.

All I meant to say was that my intuition and yours are opposite. I was not trying to deny your experience. However, I would like you to acknowledge mine too. I don't know what to conclude from this about how the different choices of terminology will appear to others.

The rest of your message seems to be rationalizing your intuition. It doesn't convince me. I still feel you're making a mistake by choosing long composite terms for such fundamental concepts.

I want no power here other than that of persuasion, and I acknowledge that I am failing at persuading you. So let's just agree to disagree.

@njsmith
Member Author

njsmith commented Oct 7, 2019

Thanks Guido, that helps a lot, and I really appreciate the clarification – I feel like this is one of those topics where the limited bandwidth of text gets especially tricky, between the subtle intuitions involved in naming, and trying to collaborate between two projects that haven't had the chance to build up much shared context. And my frustration there was definitely not at you in particular (or anyone else in particular, really).

I also think we're probably closer than it sounds. Tentatively, I think there's good objective evidence for all of these (i.e. hopefully everyone can agree on these?):

  • There exist at least some cases where using Stream/Channel can make things awkward. For a concrete example: I often find myself talking to beginners trying to help them untangle some fundamental misunderstandings, and it's been awkward to have to carefully double-check what I'm typing to make sure I don't accidentally say things like "if you want to stream data you want a websocket".

  • There's a lot of precedent for using words like "bytes" and "messages" to distinguish between framed and unframed protocols, and not a lot of precedent for using "stream" and "channel" to make this distinction. That's what I've been trying to double-check with my tables and quotes and stuff.

  • Compared to Stream/Channel, ByteStream and MessageStream have 77% more characters and 100% more words, i.e. they're more verbose and complex. Not what you want for fundamental concepts.

The tricky subjective part is figuring out how to weight these things against each other. (Or finding some other solution entirely, but it feels like we've already wrung the English language pretty hard here and there's not much more to squeeze out.) It's not like one of these is a slam-dunk issue that everyone would agree takes precedence.

I hear you saying that for you, looking at that list, the verbosity issue is the one that feels the most pressing. And, I also respect you tremendously, so I'm going to think hard about that! But I also hear you saying that you don't want to be in the hot seat on this one, and I am 100% sympathetic – I honestly don't know how you managed it so long. Your retirement is spectacularly well-earned :-). So like... this sounds weird, but... I hope it's a reassuring thing, that I also respect you too much to just rubber-stamp what you're saying? But I hear you, and will definitely think hard about it, and appreciate your efforts to think and communicate about the issue.

I'm also thinking: maybe I should go spend some time working out what Trio API changes we'd want to do if we did go down the ByteStream/MessageStream route, to get a more concrete idea of how that extra verbosity would play out in real life. And anyway I want to wait a few days at least to give others a chance to chime in before we finalize anything.

@smurfix
Contributor

smurfix commented Oct 7, 2019

WRT talking to beginners: While I am not one, the distinction between framed and unframed protocols is often lost even if there are distinct names for them, no matter what they are – I remember teaching people about the need for explicit framing in quite a few programming languages. Thus if they're treated differently, to the tune that (a) type checkers complain and (b) even the methods are different so that you simply can't plug a ByteStream into something that wants a MessageStream[bytes] (otherwise it would work when testing …), I'd consider that a strong plus.

So, another question for @njsmith's list is: Are ByteStream/Stream and MessageStream/Channel things conceptually fundamentally different enough so that they should get distinct fundamental names (and methods), instead of "merely" qualifying the difference with a prefix?

Another question: given the fact that there'll also be a UnicodeStream/StringStream thing we need to think about, we suddenly have no name for the ByteStream concept except, well, Stream, which a MessageStream/Channel is not a subtype of – regardless of how we decide the previous question. This may be confusing to people.

In other words, if you want to name the MessageStream[JSONableObject] type because it's too long to type repeatedly, the obvious choice is JSONStream. So you'd take a ByteStream, stack a codec [which also needs some name …] on top that produces a UnicodeStream, and then add a [yet another name-d] class that converts this to a JSONStream. Presto, the distinction we'd hope to achieve between these types by naming them reasonably … is lost.

I do tend towards saying that they are fundamentally-enough-different and therefore should have different names.

I do admit that my post-fact rationalization for the Stream/Channel name choice is kindof arbitrary, but then streams/channels are not like nurseries. That name works because it's a new concept, for which a new name is appropriate. Streams/channels aren't, so we might as well pick [distinct] words that are used somewhere roughly the way we'd want to use them.

@1st1

1st1 commented Oct 7, 2019

And anyway I want to wait a few days at least to give others a chance to chime in before we finalize anything.

I'm still lagging behind on this one. Will try to find some time this week. A quick note: if you want asyncio/Python to adopt Trio's terminology you shouldn't finalize it before we have a chance to discuss. I for one like Stream/Channel way more than ByteStream/MessageStream (even though I follow Nathaniel's line of reasoning, I'm leaning towards simpler/shorter names). But I guess interface-level API compatibility is more important than using the same names.

@gvanrossum

I will try to explain my reasoning one more time.

For me, the concepts I like to refer to as stream and channel are quite different.

A stream buffers bytes or characters (across some connection) and the primary APIs are to append a string of bytes/characters, and to receive a string of the same, with the explicit caveat that the API does not guarantee that write boundaries correspond to read boundaries. You seldom if ever read or write a single byte/character, as that would be highly inefficient.

There is typically a single reader and a single writer (IOW, a single producer and a single consumer), as the "meaning" of the bytes or characters in the stream is typically unrelated to the read/write boundaries. A reader will typically have some kind of parser that can be fed data in arbitrary chunks (but which is optimized for being fed sizeable chunks, which is the common case).

OTOH, a channel to me looks more like a queue, whose primary API is to put in or get out a single object of arbitrary complexity. In some cases it's reasonable to have multiple producers and multiple consumers. I guess UDP isn't quite a channel since IIRC it doesn't guarantee ordering or delivery, while for a channel I would like both of those. Hence, channels should typically be implemented as a layer on top of a stream of bytes like TCP or pipes, not on a datagram protocol like UDP.
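
To make the queue-like picture concrete, here is a minimal sketch using the stdlib `asyncio.Queue` as the "channel". The `producer`/`consumer` helpers and the message layout are made up for illustration; the point is only that each `put()`/`get()` moves one whole object, and that several producers can share one queue without their objects being split or merged.

```python
import asyncio


async def producer(queue, name, count):
    for i in range(count):
        # Each put() hands over one whole object; it is never split or merged.
        await queue.put({"from": name, "seq": i})


async def consumer(queue, total):
    received = []
    for _ in range(total):
        # Each get() returns exactly one object that some producer put in.
        received.append(await queue.get())
    return received


async def main():
    queue = asyncio.Queue(maxsize=2)  # small bound, so producers get back-pressure
    consumer_task = asyncio.create_task(consumer(queue, total=6))
    await asyncio.gather(producer(queue, "a", 3), producer(queue, "b", 3))
    return await consumer_task


messages = asyncio.run(main())
```

Objects from the two producers may interleave, but each producer's own objects arrive in the order they were sent. Note that the consumer here has to be told up front how many objects to expect, because a plain queue has no way to say "no more data".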

So, while it's true that one of the distinctions is that streams use bytes and channels use messages, the more important difference to me is that the primary stream APIs send and receive strings of bytes, and don't preserve boundaries, while it's the opposite for channel APIs, which send and receive objects or messages. If a queue-like API dealt in instances of a user-defined class I could still call it a channel, even though I might not think of those instances as messages. If the queue reaches across a network it could be built on top of a channel of messages (objects encoded as bytes -- or strings, as in JSON), which in turn might be built on top of a stream of characters/bytes, using some explicit framing mechanism to preserve message/object boundaries.
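
The layering described here (objects encoded as JSON, carried over a byte stream with explicit framing) can be sketched roughly as follows. The 4-byte big-endian length prefix and the names `encode_frame` and `FrameDecoder` are assumptions chosen for the example, not something proposed in this thread; any framing scheme would do.

```python
import json
import struct

# Assumed wire format: 4-byte big-endian length, then a UTF-8 JSON payload.
HEADER = struct.Struct("!I")


def encode_frame(obj) -> bytes:
    payload = json.dumps(obj).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Accepts arbitrary byte chunks and returns the objects completed so far."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes):
        self._buffer += chunk
        objects = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # the rest of this frame has not arrived yet
            objects.append(json.loads(self._buffer[HEADER.size:end].decode("utf-8")))
            del self._buffer[:end]
        return objects


# Two objects written back to back onto the "stream"...
wire = encode_frame({"op": "put", "key": "a"}) + encode_frame([1, 2, 3])

# ...and read back in 5-byte chunks that ignore the write boundaries.
decoder = FrameDecoder()
decoded = []
for i in range(0, len(wire), 5):
    decoded.extend(decoder.feed(wire[i:i + 5]))
```

The reader recovers the two original objects even though none of its chunks line up with the writes, which is exactly the job the framing layer takes on.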

But despite the layering possible, I don't see streams and channels as similar, because I think of them as having quite different APIs.

@oremanj
Member

oremanj commented Oct 7, 2019

I will accept ByteStream and MessageStream, but I'd rather strongly prefer Stream and Channel.

@njsmith has provided an ironclad existence proof that Stream and Channel do not intuitively map to "bytes" and "messages" for everyone who sees them. :-) ByteStream and MessageStream appear to me to point more obviously in about the right direction. But in exchange for greater intuitive appeal, ByteStream/MessageStream require us to pay the cost of more unwieldy compound names. My personal opinion is that this latter cost is greater in the long run. If you don't understand what the names mean, you look it up in the docs, maybe a few times until it sinks in. If you understand the names just fine but they're annoying to type, you keep getting annoyed whenever you use them.

There's also a sense in which having names that are too "intuitive" can create confusion about the ways that the user's model doesn't line up with the reality. For example, if you've been working with MessageStreams for a while, and using multiple concurrent writer tasks with them, you might expect you could do the same with the similarly-named ByteStream. Nope. Or at least use similarly-named methods? Nope. Certainly, if we choose to emphasize the differences (Stream vs Channel), we lose the chance for our users to learn more quickly by making analogies about the similarities... but the flip side of that, I think, is that if we choose to emphasize the similarities, our users can get confused by the differences. It's not clear to me that one of these failure modes is better than the other.

Possibly relevant to this discussion: https://malcolmocean.com/2016/02/sparkly-pink-purple-ball-thing/

@njsmith
Member Author

njsmith commented Oct 10, 2019

@gvanrossum

But despite the layering possible, I don't see streams and channels as similar, because I think of them as having quite different APIs.

Ahh, right, I see more where you're coming from now! This is actually where we started out too – I don't know if hearing how my thinking has developed will change your opinion or anything, but maybe you'll find it interesting.

So: originally Trio had a Stream ABC for byte streams, and a Queue type for message-passing between tasks within a process. Exactly like asyncio, actually! Like asyncio.Queue, our Queue's API was copied directly from stdlib queue.Queue.

Since the whole point of an async library is to make lots of tasks, the Queue class got heavy usage. And we discovered that one of the most common stumbling blocks for new Trio users was figuring out how to manage their Queue lifetimes. In particular, we really needed a way for producers to signal to consumers that they were done sending data, so the consumers could exit. (Basically we had an iteration protocol, but no StopIteration.) And secondarily, we wanted a way for consumers to tell their producers when they had gone away, so the producers wouldn't try to keep sending data into the void forever.

After a bunch of discussion, we hit on the idea of splitting the Queue class into two separate objects to represent the two ends, so they could each have their own close and __aenter__/__aexit__ methods. This raised a bunch of subtle questions, like what send/receive should do if the local end and/or the remote end is closed, so that's 8 different cases, plus the closure could happen either before the send/receive call or concurrently with it. But, we'd already figured out how to handle all those cases for our Stream API, so we copied it wholesale.
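
For readers who haven't seen the split design, here is a rough stdlib-only sketch of the idea. It is not Trio's implementation: `open_channel`, `SendEnd` and `ReceiveEnd` are hypothetical names, the buffer is unbounded, and only one of the close cases is modelled (the sender closing while the receiver is still iterating).

```python
import asyncio
from collections import deque


class EndOfChannel(Exception):
    """Raised by receive() once the send end is closed and the buffer is empty."""


class _State:
    def __init__(self):
        self.items = deque()
        self.send_closed = False
        self.wakeup = asyncio.Condition()


class SendEnd:
    def __init__(self, state):
        self._state = state

    async def send(self, item):
        async with self._state.wakeup:
            if self._state.send_closed:
                raise RuntimeError("send end already closed")
            self._state.items.append(item)
            self._state.wakeup.notify_all()

    async def aclose(self):
        async with self._state.wakeup:
            self._state.send_closed = True
            self._state.wakeup.notify_all()  # wake receivers so they can finish

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        await self.aclose()


class ReceiveEnd:
    def __init__(self, state):
        self._state = state

    async def receive(self):
        async with self._state.wakeup:
            while not self._state.items:
                if self._state.send_closed:
                    raise EndOfChannel
                await self._state.wakeup.wait()
            return self._state.items.popleft()

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return await self.receive()
        except EndOfChannel:
            raise StopAsyncIteration from None


def open_channel():
    state = _State()
    return SendEnd(state), ReceiveEnd(state)


async def main():
    send_end, receive_end = open_channel()

    async def produce():
        async with send_end:  # leaving the block closes the send end
            for i in range(3):
                await send_end.send(i)

    producer_task = asyncio.create_task(produce())
    received = [item async for item in receive_end]  # ends when the sender closes
    await producer_task
    return received


result = asyncio.run(main())
```

The consumer's loop terminates on its own once the producer leaves its `async with` block, which is the "StopIteration for queues" that was missing from the original `Queue` design.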

Then we realized a neat side-effect: splitting the Queue into objects makes it possible to move one end of the Queue into another process, or another machine, while keeping the same basic interface. And as another bonus, it frees up namespace to support bidirectional "queues", which is often handy.

Of course, these are all the same reasons that sockets are structured the way they are, with separate objects for the two endpoints, etc. The underlying insight is: queues and sockets actually have a lot in common! They send/receive different types of data, but aside from that they're both fundamentally ways to manage a communication channel, and that leads to a lot of common structure – basically everything except the actual type signature on the send/receive operations.

We also had an unrelated todo item. A nice feature in Twisted is that if you want to build a new protocol parser, you can build on standard implementations of basic protocol framing like line-based protocols, length-prefixed framing, netstrings, etc. We knew we'd eventually want some similar building blocks. But wait, we realized, that's not an unrelated todo item at all; the changes to our "queue" API turned it into exactly the API you want to represent a byte stream + added framing. And in this context, the alignment between the stream+queue APIs is super handy. For example, consider a cut down version of a line framer:

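from dataclasses import dataclass

@dataclass
class LineSender(MessageStream[bytes]):
    transport: ByteStream

    async def send_message(self, message):
        # A newline inside the payload would be read as a frame boundary.
        if b"\n" in message:
            raise ValueError("message must not contain a newline")
        await self.transport.send_bytes(message + b"\n")

    async def aclose(self):
        # aclose() is a coroutine, so it has to be awaited to actually run.
        await self.transport.aclose()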

There are actually a ton of subtleties you can't see here. aclose has to do the right thing if cancelled; send_message has to raise the correct exception if aclose has already been called, etc.; there are actually 3 different exceptions you can get in different cases. But since we use the same conventions for both "queues" and byte streams, our LineSender inherits all this from the underlying byte stream.
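
The receiving half of such a line framer might look roughly like the sketch below. Everything in it is an assumption made for illustration: the method names `receive_some` and `receive_message`, the `EndOfStream` exception, and the in-memory `FakeByteTransport` that stands in for a real byte stream. It also ignores the cancellation and close-ordering subtleties mentioned above.

```python
import asyncio


class EndOfStream(Exception):
    """Hypothetical: the peer closed and no further complete line will arrive."""


class FakeByteTransport:
    """In-memory stand-in for a byte stream that hands out arbitrary chunks."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self):
        # Returns b"" at end-of-stream, like a socket's recv().
        return self._chunks.pop(0) if self._chunks else b""


class LineReceiver:
    def __init__(self, transport):
        self.transport = transport
        self._buffer = bytearray()

    async def receive_message(self):
        # Keep pulling chunks until the buffer holds at least one full line.
        while b"\n" not in self._buffer:
            chunk = await self.transport.receive_some()
            if not chunk:
                raise EndOfStream
            self._buffer += chunk
        line, _, rest = bytes(self._buffer).partition(b"\n")
        self._buffer = bytearray(rest)
        return line


async def main():
    # The chunk boundaries deliberately do not match the line boundaries.
    receiver = LineReceiver(FakeByteTransport([b"PI", b"NG\nPO", b"NG\n"]))
    lines = []
    try:
        while True:
            lines.append(await receiver.receive_message())
    except EndOfStream:
        pass
    return lines


lines = asyncio.run(main())
```

Here the end-of-stream condition on the byte transport is simply translated into the message-level equivalent, which is the kind of convention sharing the paragraph above is describing.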

Around this time we were also iterating on sans-io libraries for various protocols, and found ourselves converging on a general approach to designing these libraries as converters between a stream of bytes and a stream of low-level protocol-level events... i.e., they also act like a typed "queue" on top of a byte stream.

So this means that not only do byte streams and message streams have a lot of similarities in terms of lifecycle and close handling, they're also deeply intertwined in pretty much any non-trivial use case. For example, consider websocket-over-HTTP/2 (as supported by e.g. hypercorn). Your protocol stack looks like:

  • A TCP socket (byte stream)
  • A TLS wrapper (converts byte stream → byte stream)
  • A h2-based wrapper for HTTP/2 parsing/encoding (converts byte stream → stream of HTTP/2 frames)
  • A high-level HTTP/2 library, which handles multiplexing HTTP requests/responses and streams. In particular, it exposes... yet another byte stream object (but this one built on top of HTTP/2 frames).
  • A wsproto-based wrapper for websocket parsing/encoding (converts the new byte stream → stream of websocket frames)
  • A high-level websocket library, which builds on top of the websocket frame stream to give the user a stream of Union[str, bytes] messages.

So these days I think of byte streams and message streams as deeply connected. I don't think they should be actually the same type, because sending/receiving an arbitrary chunk of bytes is pretty different from sending/receiving a single coherent message. But all the machinery and conventions around those core operations can be the same, and I think that there are a lot of benefits in making those connections clear to users.

(Bibliography: #497, #586, #719, #620, #796, python-hyper/wsproto#91, probably others...)

@oremanj

For example, if you've been working with MessageStreams for a while, and using multiple concurrent writer tasks with them, you might expect you could do the same with the similarly-named ByteStream. Nope.

Hmm, I dunno... there are lots of message streams that don't support multiple concurrent writer tasks (like LineSender above), and I wouldn't be surprised if asyncio's byte streams do end up supporting multiple concurrent writer tasks (since they use a model where write synchronously appends data to a buffer). It is true that Trio's byte streams don't support concurrent writers. But that's because it often makes sense to freely mix together messages from different tasks, but almost never makes sense to freely mix together bytes from different tasks – i.e., to the extent there's a difference, it comes from the difference between messages and bytes...

@gvanrossum

OK, and now I think I see where you're coming from. You're looking at this from the perspective of what should happen for edge cases when one of the communicating parties closes or wants to close the communication channel. As you describe there are lots of cases and they are mostly the same regardless of whether the endpoints deal in messages or bytes. Whereas I've been focusing on the APIs that are sensitive to framing.

I also note that the TCP vs. UDP (or SOCK_STREAM vs. SOCK_DGRAM) distinction doesn't capture the difference adequately, because UDP may lose messages and even deliver out of order -- I assume we don't want that for our Channel/MessageStream abstraction, or else the Queue analogy fails badly. (Though what do you propose for UDP then? A MessageStream/Channel with an additional flag that says it may lose messages and may deliver them out of order? I suppose that could work. But I don't think we should force those semantics on everyone, so maybe it should be an UnreliableMessageStream/UnreliableChannel, which should be a superclass (!) of Channel/MessageStream.)

I'm not sure where to take it from here. How much of the API that a typical producer or consumer uses has to do with the closing edge cases, and how much with the continued production or consumption of bytes or messages?

@smurfix
Contributor

smurfix commented Oct 11, 2019

So these days I think of byte streams and message streams as deeply connected.

Umm, well, they are, but only as far as all the interesting machinery of closing them / concurrent writers / etc. is concerned. So, yeah, there's a common base here. We even have an ABC for that: AsyncResource.

But that's as far as it goes. I tend to think that the conceptual differences are more important. AsyncResource being a generic concept does not force us to name the classes using that concept in a similar way.

Yes we got there by a very interesting route that showed us, yes you can send objects through a queue, and yes you can send random chunks of bytes through a queue, and so you can actually convert one to the other by a simple building block. That doesn't mean bounded single objects and randomly-chunked collections of small not-quite-objects are the same.

Not-completely-inappropriate(-I-hope) analogy: you can pour water down a pipe, and you can pour marbles that way (as parents tend to figure out); if the pipe doesn't have an odor trap this actually works. You can even mix them.

This however is not a good argument to show that water and marbles are somehow fundamentally the same. Nor are water pumps and Marble Madness conveyor belts.

these days I think of byte streams and message streams as deeply connected

Well, that's given – you did all the work to develop the concepts, from the inside. That's super, in fact without Trio I for one would code a lot less Python these days, but mayyybe optimizing the experience of somebody who just uses / combines these building blocks (and I'm not just talking about the people who need more patience than I ever had before they understand why one write() at end A does not necessarily correspond to one read() at end B for streams – but it does for channels) requires a slightly different PoV.

IMHO, and all that.

@njsmith
Member Author

njsmith commented Oct 17, 2019

@smurfix

Umm, well, they are, but only as far as all the interesting machinery of closing them / concurrent writers / etc. is concerned. So, yeah, there's a common base here. We even have an ABC for that: AsyncResource.

AsyncResource is just "this object has an async close method". On top of that, the various stream-like objects add a lot:

  • the concept that there's a connection that has two ends, and this object represents one of the ends
    • maybe: that the two ends can have addresses (see High level API for accessing getsockname() / getpeername() #280) (← I totally forgot about this until now... I'm not 100% sure that we should have a generic "what's my peer's address?" operation, but it does make a lot of sense, for both byte-streams and object-streams.)
  • the concept that the other end can be closed, independently of whether we get closed
  • the concept of a half-close
  • that there are send and receive operations
  • and how send/receive act given different combinations of local close, remote close, half-close

@gvanrossum

I also note that the TCP vs. UDP (or SOCK_STREAM vs. SOCK_DGRAM) distinction doesn't capture the difference adequately, because UDP may lose messages and even deliver out of order -- I assume we don't want that for our Channel/MessageStream abstraction, or else the Queue analogy fails badly. (Though what do you propose for UDP then? A MessageStream/Channel with an additional flag that says it may lose messages and may deliver them out of order? I suppose that could work. But I don't think we should force those semantics on everyone, so maybe it should be an UnreliableMessageStream/UnreliableChannel, which should be a superclass (!) of Channel/MessageStream.)

Yeah, Trio doesn't actually use the channel/message-stream abstraction for UDP currently, though I guess we could model a UDP socket as a MessageStream[Tuple[NetworkAddress, bytes]]. I think the main reason it doesn't feel super natural is that it's not connection oriented. It's not a perfect precedent, but there don't seem to be any perfect precedents, so I'm trying to find inspiration anywhere I can :-).

There's a larger point here though – there's actually a huge range of different semantics you could have for a (byte or object) stream:

  • single- vs. multi-producer
  • single- vs. multi-consumer
  • reliable vs. unreliable
  • order-preserving vs. not
  • can you send arbitrary messages, or does the stream object enforce some protocol state machine? E.g. if you're using h11 to model HTTP/1.1, then you have to first send a headers message and then send a body message; you can't do it the other way around.
  • if you cancel a send, does that mean nothing happened, does it leave it indeterminate whether the message was sent or not, or does it leave the connection in a corrupted state so it has to be abandoned?
  • same question for cancelling a receive
  • infinite buffer vs. finite buffer vs. no buffer (this has more effects than you might expect at first, e.g. no buffer is easier to deadlock, but it lets senders reliably tell if a receiver accepted their message instead of getting left in the buffer forever)

I'm pretty sure that objects with all these variations are going to exist in the ecosystem. And these are all properties that have substantial effects on how these objects can be used; they're not just quirky little edge cases. In fact the objects in my HTTP/2-websockets example illustrate a bunch of them. Looking at just the last two levels, the main difference between the wsproto-based frame stream and the final websocket Union[str, bytes] stream is that the former is single-producer/single-consumer/breaks-on-cancellation, while the latter is multi-producer/multi-consumer/cancellation-safe.

So to me the design problem isn't just "how do we represent TCP sockets and Go-style channels". Those are just two examples from this much larger menagerie, and the design problem is to figure out how to help folks wrangle that whole menagerie effectively. It's tricky to strike a balance between providing enough structure to help folks navigate the space and write generic code, without going off into the weeds of trying to nail down every detail and then define separate ABCs for every possible combination of features.

I mention this because I think it's important context: in my mind, the ByteStream/MessageStream[T] approach is exactly a proposal for how to strike that balance. So you might have one class that implements a connection that's multi-producer/multi-consumer/in-order/reliable/cancellation-safe, and another that implements a connection that's single-producer/single-consumer/out-of-order/unreliable/cancellation-safe, and they'd both use the MessageStream[T] conventions for the common parts, even though they're definitely not drop-in replacements for each other in general.

How much of the API that a typical producer or consumer uses has to do with the closing edge cases, and how much with the continued production or consumption of bytes or messages?

Hmm, that's tricky to say!

Anyone implementing the ABCs definitely has to deal with all these edge cases, they're where a lot of the complexity is, and we expect there to be dozens of concrete implementations, so that adds up quickly. (The trio.testing module actually ships some helpers, so you can hand it your shiny new stream implementation and it goes through and checks that you're handling all the edge cases right.)

For producers and consumers... well, the happy path is mostly just send and receive, yeah :-). But if you're trying to build a robust distributed system, you probably spend a lot more time thinking about the error cases than about the happy path – a lot of what makes networking hard is that your peers might suddenly disappear on you and you need to be prepared to cope with that. And the first step is getting notified that it happened. Which means you really want the libraries you're using to have thought about this before you need it. But since this is "edge case" stuff, it's easy for one of your dependencies to not think about it, and then you're stuck – especially if you're depending on a whole stack of protocols, like in my examples, since all it takes is one lazy implementation to break everything. But if there are consistent, simple conventions that are strongly adhered to across the ecosystem, then it's more likely that everyone in the stack will do their part.

Or at least, that's how I see it.

@gvanrossum

I think we're in violent agreement about most parts!

But I still like to think that there's a class of these connections that are special, and that's the category of byte streams. This captures TCP, pipes (named or not), UNIX domain and localhost sockets with type SOCK_STREAM, and probably others (TLS?). They all have reliable ordered delivery of bytes without preserving message boundaries. And none of them are particularly amenable to multiple readers or writers (but this is probably related to the lack of message boundaries).

Interestingly, these are both abstractions (that could be) built on top of unreliable chunked protocols -- e.g. TCP streams are implemented on top of IP packets. But they are usually used as the lowest layer in a stack where at the next layer you have messages or objects (e.g. websockets).

Now, when it comes to static typing (PEP 484), I don't mind seeing all the different message/object protocols being distinguished only by the type, so we could have various things of type Channel[T] where there are methods send(msg: T) and receive() -> T. But somehow I really dislike the choice (made by BSD UNIX?) of also using these same operations for byte streams. When it comes to what I'd write as Stream[AnyStr] (recall that AnyStr is a type variable constrained to str and bytes) I'd like to see write(data: AnyStr) and read(n: int) -> AnyStr. And when it comes to reading, I think we may need a few different methods, for example one to read up to n bytes/chars, and one to read exactly n bytes/chars (when you need one, it's annoying to have to build it on top of the other). There are also probably a few more esoteric APIs, e.g. for reading into a bytearray (to save an allocation) and perhaps scatter-gather read/write APIs (e.g. writelines() -- a terrible misnomer). (PS: if you'd rather only talk about bytes here I'd be happy to drop the AnyStr from the interface.)
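
To illustrate the "read up to n" versus "read exactly n" distinction, here is a minimal sketch of the second built on top of the first. `FakeByteStream`, `receive_some` and `receive_exactly` are hypothetical names used only for this example, and `EOFError` is just one possible way to report a short read.

```python
import asyncio


class FakeByteStream:
    """In-memory stand-in for a byte stream; not a real Trio or asyncio class."""

    def __init__(self, chunks):
        self._pending = bytearray()
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes):
        # "Read up to n": returns 1..max_bytes bytes, or b"" at end-of-stream.
        if not self._pending and self._chunks:
            self._pending += self._chunks.pop(0)
        data = bytes(self._pending[:max_bytes])
        del self._pending[:max_bytes]
        return data


async def receive_exactly(stream, n):
    """"Read exactly n": loop over receive_some() until n bytes have arrived."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = await stream.receive_some(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} of {n} bytes missing")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


async def main():
    # Data arrives in chunks of 2, 5 and 1 bytes; the caller wants 4 + 4.
    stream = FakeByteStream([b"ab", b"cdefg", b"h"])
    first = await receive_exactly(stream, 4)
    second = await receive_exactly(stream, 4)
    return first, second


first, second = asyncio.run(main())
```

The loop is short, but every consumer that needs fixed-size reads would otherwise have to rewrite it (including the end-of-stream check), which is presumably why having both operations available is attractive.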

I don't expect that such a variety of APIs would be useful for the message-based connections -- I'd expect producers and consumers there to be happy with sending/receiving one message/object at a time (in part because I assume the overhead per object intrinsic to the implementation to be much larger than the cost of a function call).

@ntninja

ntninja commented Oct 25, 2019

(Recent trio user here, ie: somebody who will be affected by this.) While I think renaming things to XYStream is OK (I was never confused by the old terminology, but you probably have a point about this @njsmith), I'm not sure why you chose to replace Channel by the name MessageStream.

IMHO, the main purpose of the (Binary)Stream is to forward a stream of “binary data” and the main purpose of Channel is to forward a stream of “Python objects”. So why not call the latter an ObjectStream, rather than a MessageStream: Not only is it one byte shorter (and hence the same length as Binary), it also gives a much better idea what to expect when interacting with this kind of stream.

Feedback welcome!

PS: This library rocks by the way! 😁

@agronholm
Contributor

agronholm commented Dec 17, 2019

I'm in the process of refactoring AnyIO's ABCs in preparation for subprocess support and I am going to use the opportunity to make them more versatile. As a part of this effort, I would like to make my ABCs conform to whatever consensus we achieve here.

So here are my thoughts on the matter:

The difference between ByteStream and MessageStream[bytes] is that ByteStream does not guarantee any kind of framing for the bytestrings. In other words, it might arbitrarily join or split the bytestrings in transit, but it does guarantee that they are delivered in the same order. As noted in #959, buffering can be used to return an exact number of bytes from the stream even if the stream can only return bytes in arbitrary sized chunks.

As for reliability, can we assume the same guarantees from both? Meaning, they either deliver the bytes, or raise EndOfChannel if the peer sends an EOF? What about situations where the recipient cannot handle the stream as quickly as the sender is sending data and applying back pressure is not an option? I have a use case for async notifications where this becomes an issue. Buffering can mitigate this but it won't solve the problem. Perhaps a subclass or superclass is needed to indicate whether back pressure can be applied or not?

As for UDP: Since UDP packets are indeed framed, could we make use of the hypothetical unreliable message stream superclass here?
