
Streams reorganization #3440

Open
asvetlov opened this issue Dec 8, 2018 · 2 comments

Comments

@asvetlov
Member

asvetlov commented Dec 8, 2018

I'd like to share my thoughts about the streams architecture.

Current state

Now we have a StreamReader class.
ClientSession has an auto_decompress parameter for optionally disabling stream decompression (gzip, deflate, etc.).

WebSockets have their own buffer implementation for reading data from a stream.

Proposal

Let's use StreamReader for everything.
At the bottom there is a raw stream, which is a good data source for both gzipped content and websocket data.
The unzipper provides its own class that wraps the raw stream: the unzipper's .read() calls raw.read() and passes the received data chunks into the decompressor.
The websocket reader does the same.

The raw stream becomes the only TCP flow-control-aware structure; all descendants just use it:
Raw -> Unzipped
Raw -> WebSocket

The raw stream is accessible from the Unzipped stream via its .raw attribute, something like io.BufferedIOBase.raw.
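
A minimal sketch of the proposed layering (class names like RawStream and UnzippedStream are hypothetical illustrations, not the actual aiohttp API): a raw, flow-control-aware stream at the bottom, plus a thin decompressing wrapper that exposes the underlying stream via .raw:

```python
import zlib


class RawStream:
    """The only layer that talks to the transport and is aware of
    TCP flow control (pause_reading / resume_reading)."""

    def __init__(self):
        self._buffer = bytearray()
        self._eof = False

    def feed_data(self, data):
        self._buffer.extend(data)

    def feed_eof(self):
        self._eof = True

    async def read(self, n=-1):
        # A real implementation would await more data and drive flow
        # control here; this sketch just drains its internal buffer.
        data = bytes(self._buffer if n < 0 else self._buffer[:n])
        del self._buffer[:len(data)]
        return data


class UnzippedStream:
    """Wrapper: .read() pulls chunks from .raw and decompresses them."""

    def __init__(self, raw):
        self.raw = raw  # underlying stream, like io.BufferedIOBase.raw
        self._decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # gzip framing

    async def read(self, n=-1):
        chunk = await self.raw.read(n)
        return self._decompressor.decompress(chunk)
```

A websocket reader would wrap the same RawStream in exactly the same way, parsing frames instead of decompressing.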

I believe the proposal simplifies the design.

Optimizations

Since Python 3.7 we have the more advanced BufferedProtocol (the good old asyncio.Protocol for older Pythons can easily be supported on top of it; the price is more memory-copy operations).
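
For reference, a minimal BufferedProtocol sketch (the feed_data callback is an assumed hook into the raw stream above, not an existing API): instead of receiving a freshly allocated bytes object in data_received(), the protocol hands the transport a buffer to fill, saving one copy per chunk:

```python
import asyncio


class RawStreamProtocol(asyncio.BufferedProtocol):
    def __init__(self, feed_data):
        self._feed_data = feed_data     # e.g. RawStream.feed_data
        self._buffer = bytearray(4096)  # reused for every receive

    def get_buffer(self, sizehint):
        # The transport writes incoming bytes directly into this buffer.
        return memoryview(self._buffer)

    def buffer_updated(self, nbytes):
        # The first `nbytes` of the buffer now hold fresh data.  This
        # sketch still copies when handing the chunk over; the page
        # allocator idea below removes that copy as well.
        self._feed_data(bytes(self._buffer[:nbytes]))

    def eof_received(self):
        return False  # let the transport close itself on EOF
```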

To reduce memory copies, BufferedProtocol should use memory structures like the Linux kernel's SLAB allocator; the protocol was designed for this.
A simplified idea (sketched after this list) is:

  • Allocate a large memory page (4 KiB or more; we can start from a simple scenario and optimize the allocator for 4 MiB data chunks later)
  • Fetch a data chunk into the allocated page
  • Free the page when all of its data has been consumed by the raw StreamReader
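
A hypothetical sketch of that page scheme on top of BufferedProtocol (again, feed_data is an assumed hook into the raw stream): the transport fills a large page in place, the raw stream receives zero-copy views into it, and the page is reclaimed by ordinary refcounting once no view references it any more:

```python
import asyncio

PAGE_SIZE = 4096  # start simple; a tuned allocator could use 4 MiB pages


class PagedProtocol(asyncio.BufferedProtocol):
    def __init__(self, feed_data):
        self._feed_data = feed_data  # consumer, e.g. RawStream.feed_data
        self._page = bytearray(PAGE_SIZE)
        self._filled = 0

    def get_buffer(self, sizehint):
        if self._filled == len(self._page):
            # Page is full: allocate a fresh one.  The old page stays
            # alive only while unconsumed memoryviews still reference
            # it, then the page and its data are freed together.
            self._page = bytearray(PAGE_SIZE)
            self._filled = 0
        return memoryview(self._page)[self._filled:]

    def buffer_updated(self, nbytes):
        start = self._filled
        self._filled += nbytes
        # Zero-copy slice into the page; no per-chunk allocation.
        self._feed_data(memoryview(self._page)[start:self._filled])
```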

P.S.
IIRC nginx uses an interesting custom allocator strategy: it has a memory context tightly coupled with the HTTP request. The memory is never released until the request is gone; then all memory allocated during request processing goes away at once. We cannot steal the nginx memory strategy as-is (a lot of Python objects are created and released during the HTTP request life-cycle), but we can optimize at least the StreamReader implementation for this case.

@aio-libs-bot

GitMate.io thinks the contributor most likely able to help you is @fafhrd91.

Possibly related issues are #2555 (Simplify stream classes), #1101 (Streaming to FormData), #3260 (Deprecate stream.unread_data), #1694 (Cancelling websocket coroutine streaming task), and #1106 (Explicitly document streams API).

@aio-libs-bot added the documentation label on Dec 8, 2018
@webknjaz
Member

Sounds good. This seems to be what other networking protocol wrappers typically do.
