
Streams reorganization #3440

Open
asvetlov opened this issue Dec 8, 2018 · 2 comments

Comments

@asvetlov
Member

asvetlov commented Dec 8, 2018

I'd like to share my thoughts about the streams architecture.

Current state

Now we have a StreamReader class.
ClientSession has an auto_decompress parameter for optionally disabling stream decompression (gzip, deflate, etc.).

WebSockets have their own buffer implementation for reading data from a stream.

Proposal

Let's use StreamReader for everything.
At the bottom there is a raw stream, which is a good data source for both gzipped content and websocket data.
The unzipper provides its own class that wraps the raw stream: the unzipper's .read() calls raw.read() and passes the received data chunks into the decompressor.
The websocket reader does the same.

The raw stream becomes the only TCP flow-control-aware structure; all descendants just use it:
Raw -> Unzipped
Raw -> WebSocket

The raw stream is accessible from the Unzipped stream via its .raw attribute, something like io.BufferedIOBase.raw.
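
A minimal sketch of the proposed layering (class names like RawStream and UnzippedStream are hypothetical illustrations, not the actual aiohttp API): a raw, flow-control-aware stream at the bottom, plus a thin decompressing wrapper that exposes the underlying stream via .raw:

```python
import zlib


class RawStream:
    """The only layer that talks to the transport and is aware of
    TCP flow control (pause_reading / resume_reading)."""

    def __init__(self):
        self._buffer = bytearray()
        self._eof = False

    def feed_data(self, data):
        self._buffer.extend(data)

    def feed_eof(self):
        self._eof = True

    async def read(self, n=-1):
        # A real implementation would await more data and drive flow
        # control here; this sketch just drains its internal buffer.
        data = bytes(self._buffer if n < 0 else self._buffer[:n])
        del self._buffer[:len(data)]
        return data


class UnzippedStream:
    """Wrapper: .read() pulls chunks from .raw and decompresses them."""

    def __init__(self, raw):
        self.raw = raw  # underlying stream, like io.BufferedIOBase.raw
        self._decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # gzip framing

    async def read(self, n=-1):
        chunk = await self.raw.read(n)
        return self._decompressor.decompress(chunk)
```

A websocket reader would wrap the same RawStream in exactly the same way, parsing frames instead of decompressing.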

I believe the proposal simplifies the design.

Optimizations

Since Python 3.7 we have the more advanced BufferedProtocol (the good old asyncio.Protocol for older Pythons can easily be supported on top of it; the price is more memory-copy operations).
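
For reference, a minimal BufferedProtocol sketch (the feed_data callback is an assumed hook into the raw stream above, not an existing API): instead of receiving a freshly allocated bytes object in data_received(), the protocol hands the transport a buffer to fill, saving one copy per chunk:

```python
import asyncio


class RawStreamProtocol(asyncio.BufferedProtocol):
    def __init__(self, feed_data):
        self._feed_data = feed_data     # e.g. RawStream.feed_data
        self._buffer = bytearray(4096)  # reused for every receive

    def get_buffer(self, sizehint):
        # The transport writes incoming bytes directly into this buffer.
        return memoryview(self._buffer)

    def buffer_updated(self, nbytes):
        # The first `nbytes` of the buffer now hold fresh data.  This
        # sketch still copies when handing the chunk over; the page
        # allocator idea below removes that copy as well.
        self._feed_data(bytes(self._buffer[:nbytes]))

    def eof_received(self):
        return False  # let the transport close itself on EOF
```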

To reduce memory copies, BufferedProtocol should use memory structures like the Linux kernel's SLAB allocator; the protocol was designed for this.
A simplified idea (sketched after this list) is:

  • Allocate a large memory page (4 KiB or more; we can start from a simple scenario and optimize the allocator for 4 MiB data chunks later)
  • Fetch a data chunk into the allocated page
  • Free the page when all of its data has been consumed by the raw StreamReader
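
A hypothetical sketch of that page scheme on top of BufferedProtocol (again, feed_data is an assumed hook into the raw stream): the transport fills a large page in place, the raw stream receives zero-copy views into it, and the page is reclaimed by ordinary refcounting once no view references it any more:

```python
import asyncio

PAGE_SIZE = 4096  # start simple; a tuned allocator could use 4 MiB pages


class PagedProtocol(asyncio.BufferedProtocol):
    def __init__(self, feed_data):
        self._feed_data = feed_data  # consumer, e.g. RawStream.feed_data
        self._page = bytearray(PAGE_SIZE)
        self._filled = 0

    def get_buffer(self, sizehint):
        if self._filled == len(self._page):
            # Page is full: allocate a fresh one.  The old page stays
            # alive only while unconsumed memoryviews still reference
            # it, then the page and its data are freed together.
            self._page = bytearray(PAGE_SIZE)
            self._filled = 0
        return memoryview(self._page)[self._filled:]

    def buffer_updated(self, nbytes):
        start = self._filled
        self._filled += nbytes
        # Zero-copy slice into the page; no per-chunk allocation.
        self._feed_data(memoryview(self._page)[start:self._filled])
```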

P.S.
IIRC nginx uses an interesting custom allocator strategy: it has a memory context tightly coupled with the HTTP request. The memory is never released until the request is gone; then all memory allocated during request processing goes away at once. We cannot steal the nginx memory strategy as-is (a lot of Python objects are created and released during the HTTP request life-cycle), but we can optimize at least the StreamReader implementation for this case.

@aio-libs-bot

GitMate.io thinks the contributor most likely able to help you is @fafhrd91.

Possibly related issues are #2555 (Simplify stream classes), #1101 (Streaming to FormData), #3260 (Deprecate stream.unread_data), #1694 (Cancelling websocket coroutine streaming task), and #1106 (Explicitly document streams API).

@aio-libs-bot added the documentation label on Dec 8, 2018
@webknjaz
Member

Sounds good. This seems to be what other networking protocol wrappers typically do.
