Streams are iterable + receive_some doesn't require an explicit size #1123
One wrinkle: AFAIK, each call to `socket.recv()` allocates a new `bytes` object that is large enough for the entire given chunk size. If large allocations are more expensive than small ones, passing a too-large buffer is probably bad for performance. (The allocators I know of use 128 KB as their threshold for "this is big, mmap it instead of finding a free chunk", but if one used 64 KB instead and we got an mmap/munmap pair on each receive, that feels maybe bad?)

My intuition favors a much lower buffer size, like 4 KB or 8 KB, but I also do most of my work on systems that are rarely backlogged, so my intuition might well be off when it comes to a high-throughput Trio application.
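For context, the pattern being discussed is roughly the following (a minimal sketch using the trio socket API; the 65536 is just a stand-in chunk size, not anything the code settles on):

```python
async def read_once(sock):
    # recv() allocates a fresh bytes object big enough for the whole
    # requested chunk, even if only a handful of bytes actually arrived.
    return await sock.recv(65536)
```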
Another option we could consider: the socket owns a receive buffer (a `bytearray`) which it reuses, calls `recv_into()`, and extracts just the amount actually received into a `bytes` for returning. Downside: spends 64 KB (or whatever) per socket in steady state. Counterpoint: the OS-level socket buffers are probably much larger than that (but I don't know how much memory they occupy when the socket isn't backlogged).
This is an interesting discussion, but I don't want it to hold up merging the basic functionality, so I split it off into #1139.
(Twisted has apparently used 64 KiB receive buffers for its entire existence and I can't find any evidence that anyone has ever thought twice about it. So we're probably not risking any disaster by starting with 64 KiB for now :-).)
I'm a little confused about what the benefit is of having a `DEFAULT_RECEIVE_SIZE` for `SSLStream` at all. It seems like we could instead have a nice magic-number-free policy of "ask the transport stream to `receive_some()` with no size specified, then return all the data we decrypted from whatever we got in that chunk, or loop and `receive_some()` again if we didn't get any decrypted data".
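Very roughly, something like this (a hedged sketch of the proposed policy using the stdlib `ssl.SSLObject`/`MemoryBIO` machinery; the attribute names here are illustrative, not the real `SSLStream` internals):

```python
import ssl

async def receive_some(self):
    while True:
        try:
            # SSLObject.read() still needs *some* size argument; "whatever
            # is currently buffered" is used here as a placeholder.
            data = self._ssl_object.read(self._incoming.pending or 1)
        except ssl.SSLWantReadError:
            data = b""
        if data:
            # Return everything we managed to decrypt from this chunk.
            return data
        # Nothing decrypted yet: refill from the transport, letting it
        # choose its own chunk size.
        chunk = await self.transport_stream.receive_some()
        if not chunk:
            return b""  # transport EOF
        self._incoming.write(chunk)
```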
Yeah, this is complicated... what you say makes logical sense, but openssl's API is super awkward. There isn't any way to say "please decrypt all the data in your receive buffer". You have to pick a value to pass to `SSLObject.read`. And even more annoying: you don't find out until after you've picked a value whether you have to go back to the underlying transport for more data. So you have to pick the value before you know how much data the underlying transport wants to give you. And once you've picked a value, you have to keep using that value until some data is returned.

So my logic was: well, if we already have a bunch of data in the receive buffer because the underlying transport was generous, then likely we can just decrypt and return that, and the size of the encrypted data is a plausible upper bound on the size of the decrypted data, so `self._incoming.pending` is a good value to pass to `SSLObject.read`.

But sometimes there won't be a lot of data in the receive buffer – for example, because our heuristic worked well the previous time and cleared everything out, or almost everything. Like, imagine there's 1 byte left in the receive buffer. The way TLS works, you generally can't decrypt just 1 byte – everything is transmitted in frames, and you need the whole frame, with its header and MAC and everything, before you can decrypt any of it. So if we call `ssl_object.read(1)`, openssl will end up requesting another large chunk of data from the underlying transport, then our `read(1)` call will decrypt the first byte and return it, leaving the rest of the data sitting in the buffer for next time. And that would be unfortunate.

So my first attempt at a heuristic is: use the receive buffer size, but never anything smaller than `DEFAULT_RECEIVE_SIZE`.
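In code, that first heuristic is essentially this (a sketch only; `self._incoming` is the stdlib `MemoryBIO` holding not-yet-decrypted data, and the method name is illustrative):

```python
DEFAULT_RECEIVE_SIZE = 65536  # 64 KiB

def _read_size(self):
    # Decrypt whatever the transport already handed us, but never ask
    # SSLObject.read() for less than DEFAULT_RECEIVE_SIZE, so a 1-byte
    # leftover in the buffer doesn't turn into a tiny read.
    return max(self._incoming.pending, DEFAULT_RECEIVE_SIZE)
```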
I guess this has a weird effect if the underlying transport likes to return more than `DEFAULT_RECEIVE_SIZE`. Say it gives us 65 KiB, while `DEFAULT_RECEIVE_SIZE` is 64 KiB. On our first call to `SSLStream.receive_some`, the buffer size is zero, so we do `read(64 KiB)`. This drains 65 KiB from the underlying transport, then decrypts and returns the first 64 KiB. The next time we call `SSLStream.receive_some`, we do `read(64 KiB)` again, but there's already 1 KiB of data in the buffer, so we just return that immediately without refilling the buffer. Then this repeats indefinitely, so we alternate between doing a big receive and a small receive every time. Seems wasteful – it'd be better to return 65 KiB each time.

So maybe a better strategy would be to start with some smallish default receive size, and then increase it over time if we observe the underlying transport giving us more data.
I rewrote the `SSLStream` stuff to hopefully address the above issues...