I'd like to share my thoughts about streams architecture.
Current state
Currently we have a StreamReader class. ClientSession has an auto_compress parameter for optionally disabling stream compression (gzip, deflate, etc.).
WebSockets have their own buffer implementation for reading data from a stream.
Proposal
Let's use StreamReader for everything.
At the bottom there is a raw stream, which is a good data source for both gzipped content and WebSocket data.
The unzipper is its own class wrapping the raw stream: its .read() calls raw.read() and passes the received data chunks into the decompressor.
The WebSocket reader does the same.
The raw stream becomes the only TCP flow-control-aware structure; all descendants just use it.
Raw -> Unzipped
Raw -> WebSocket
The raw stream is accessible from the unzipped stream via its .raw attribute, something like io.BufferedIOBase.raw.
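A minimal sketch of the layering, with hypothetical names (RawStream, DeflateStream — these are not existing aiohttp classes):

```python
import zlib


class RawStream:
    """Hypothetical raw stream: the only layer that talks to the
    transport and applies TCP flow control (pause/resume reading)."""

    def __init__(self):
        self._buffer = bytearray()
        self._eof = False

    def feed_data(self, data: bytes) -> None:
        self._buffer.extend(data)

    def feed_eof(self) -> None:
        self._eof = True

    async def read(self, n: int = -1) -> bytes:
        # A real implementation would await more data and resume
        # the paused transport here.
        data = bytes(self._buffer[:n]) if n >= 0 else bytes(self._buffer)
        del self._buffer[:len(data)]
        return data


class DeflateStream:
    """Hypothetical unzipper: wraps a raw stream, exposes it as .raw
    (mirroring io.BufferedIOBase.raw) and decompresses on read()."""

    def __init__(self, raw: RawStream):
        self.raw = raw
        self._decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # gzip

    async def read(self, n: int = -1) -> bytes:
        chunk = await self.raw.read(n)
        if not chunk:
            return self._decompressor.flush()
        return self._decompressor.decompress(chunk)
```

A WebSocket reader would follow the same shape: hold the raw stream in .raw and turn raw bytes into frames instead of decompressed chunks.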
I believe the proposal simplifies the design.
Optimizations
Since Python 3.7, asyncio has had an advanced BufferedProtocol (the good old asyncio.Protocol for older Pythons can be supported easily on top of it; the price is more memory-copy operations).
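A rough sketch of how plain asyncio.Protocol semantics could be emulated on top of BufferedProtocol (the adapter class is an assumption, not existing code; the bytes() call is exactly the extra copy mentioned above):

```python
import asyncio


class ProtocolAdapter(asyncio.BufferedProtocol):
    """Sketch: emulate the classic data_received() callback on top of
    BufferedProtocol for old-style protocols."""

    def __init__(self, target: asyncio.Protocol, size: int = 64 * 1024):
        self._target = target
        self._buffer = bytearray(size)

    def connection_made(self, transport):
        self._target.connection_made(transport)

    def get_buffer(self, sizehint: int) -> bytearray:
        # Hand the transport a reusable buffer to fill in place.
        return self._buffer

    def buffer_updated(self, nbytes: int) -> None:
        # One unavoidable copy to preserve data_received() semantics.
        self._target.data_received(bytes(self._buffer[:nbytes]))

    def eof_received(self):
        return self._target.eof_received()

    def connection_lost(self, exc):
        self._target.connection_lost(exc)
```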
To reduce memory copies, BufferedProtocol should use memory structures like the Linux kernel's SLAB allocator; the protocol was designed for this.
A simplified idea is:
Allocate a large memory page (4 KB or more; we can start with a simple scenario and optimize the allocator for 4 MB data chunks later)
Fetch a data chunk into an allocated page
Free the page when all of its data has been consumed by the raw StreamReader
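A simplified illustration of those three steps on top of BufferedProtocol (PageBufferedProtocol is a hypothetical name, and the consumer side is stubbed out):

```python
import asyncio

PAGE_SIZE = 4 * 1024  # start small; could be tuned up to 4 MB chunks later


class PageBufferedProtocol(asyncio.BufferedProtocol):
    """Sketch of the page idea: get_buffer() exposes the unfilled tail of
    the current page, so the transport writes into it with no intermediate
    copy; a full page is released once its data has been consumed."""

    def __init__(self):
        self._page = bytearray(PAGE_SIZE)
        self._filled = 0  # bytes written into the current page

    def get_buffer(self, sizehint: int) -> memoryview:
        if self._filled == len(self._page):
            # The previous page is full; in the full design it stays alive
            # until the raw StreamReader consumes it, then gets freed.
            self._page = bytearray(PAGE_SIZE)
            self._filled = 0
        return memoryview(self._page)[self._filled:]

    def buffer_updated(self, nbytes: int) -> None:
        start = self._filled
        self._filled += nbytes
        # Zero-copy view over the freshly received chunk; a real
        # implementation would feed this into the raw StreamReader.
        chunk = memoryview(self._page)[start:self._filled]
        print(f"received {len(chunk)} bytes")  # placeholder consumer
```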
P.S.
IIRC nginx uses an interesting custom allocator strategy: it has a memory context tightly coupled with the HTTP request. The memory is never released until the request is gone; then all memory allocated during request processing goes away at once. We cannot steal the nginx memory strategy as-is (a lot of Python objects are created and released during the HTTP request life-cycle), but we can optimize at least the StreamReader implementation for this case.
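A toy illustration of that nginx-style idea applied to StreamReader buffers (RequestArena is invented for the example, not an aiohttp or nginx API):

```python
class RequestArena:
    """Toy sketch: buffers borrowed during one request are tracked by the
    arena and all released together when the request finishes."""

    def __init__(self, page_size: int = 4 * 1024):
        self._page_size = page_size
        self._pages = []  # every page allocated for this request

    def get_page(self) -> bytearray:
        page = bytearray(self._page_size)
        self._pages.append(page)
        return page

    def release_all(self) -> None:
        # Dropping the references frees everything the request allocated.
        self._pages.clear()


# Usage: one arena per HTTP request.
arena = RequestArena()
buf = arena.get_page()
buf[:5] = b"hello"
arena.release_all()  # request is done; all its buffers go away at once
```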