Streams are iterable + receive_some doesn't require an explicit size #1123
Conversation
This came out of discussion in gh-959
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1123      +/-   ##
==========================================
- Coverage   99.55%   99.51%   -0.04%
==========================================
  Files         105      104       -1
  Lines       12716    12666      -50
  Branches      970      977       +7
==========================================
- Hits        12659    12605      -54
- Misses         36       40       +4
  Partials       21       21
```
Force-pushed from 25bcd80 to aab5fe3.

Codecov doesn't seem to be updating its comment, but if I click through now, it says that this isn't adding any new uncovered lines.
Thanks for this -- I think it's a huge usability improvement in working with Streams. Feel free to merge if you feel you've adequately addressed my comments, as I won't be online to respond till next Monday.
```diff
@@ -10,6 +10,12 @@

 __all__ = ["SocketStream", "SocketListener"]

+# XX TODO: this number was picked arbitrarily. We should do experiments to
```
One wrinkle: AFAIK, each call to `socket.recv()` allocates a new `bytes` object that is large enough for the entire given chunksize. If large allocations are more expensive, passing a too-large buffer is probably bad for performance. (The allocators I know of use 128KB as their threshold for "this is big, mmap it instead of finding a free chunk", but if one used 64KB instead and we got a mmap/munmap pair on each receive, that feels maybe bad?)

My intuition favors a much lower buffer size, like 4KB or 8KB, but I also do most of my work on systems that are rarely backlogged, so my intuition might well be off when it comes to a high-throughput Trio application.
Another option we could consider: the socket owns a receive buffer (a `bytearray`) which it reuses, calls `recv_into()`, and extracts just the amount actually received into a `bytes` for returning. Downside: spends 64KB (or whatever) per socket in steady state. Counterpoint: the OS-level socket buffers are probably much larger than that (but I don't know how much memory they occupy when the socket isn't backlogged).
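To make that option concrete, here's a minimal sketch of the buffer-reuse idea. The class and constant names are hypothetical illustrations, not anything from trio:

```python
import socket

RECEIVE_BUFFER_SIZE = 65536  # hypothetical; 64 KiB, matching the discussion above


class BufferedReceiver:
    """Sketch of the buffer-reuse idea: one bytearray per socket, filled
    via recv_into(), with only the bytes actually received copied out.
    An illustration of the option above, not trio's implementation."""

    def __init__(self, sock: socket.socket):
        self._sock = sock
        # Reused across receives; costs RECEIVE_BUFFER_SIZE per socket in
        # steady state, which is the downside noted above.
        self._buf = bytearray(RECEIVE_BUFFER_SIZE)

    def receive_some(self) -> bytes:
        nbytes = self._sock.recv_into(self._buf)
        # Copy out just the amount that actually arrived into a fresh
        # bytes object for returning.
        return bytes(memoryview(self._buf)[:nbytes])
```

The upside is that no allocation ever exceeds the amount of data actually received, so the mmap/munmap-per-recv worry above goes away; the cost is the resident buffer per socket.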
This is an interesting discussion but I don't want it to hold up merging the basic functionality, so I split it off into #1139
(Twisted has apparently used 64 KiB receive buffers for its entire existence and I can't find any evidence that anyone has ever thought twice about it. So we're probably not risking any disaster by starting with 64 KiB for now :-).)
trio/_ssl.py (outdated):

```python
# Heuristic: normally we use DEFAULT_RECEIVE_SIZE, but if
# the transport gave us a bunch of data last time then we'll
# try to decrypt and pass it all back at once.
max_bytes = max(DEFAULT_RECEIVE_SIZE, self._incoming.pending)
```
I'm a little confused about what the benefit is of having a `DEFAULT_RECEIVE_SIZE` for `SSLStream` at all. It seems like we could instead have a nice magic-number-free policy of "ask the transport stream to `receive_some()` with no size specified, then return all the data we decrypted from whatever we got in that chunk, or loop and `receive_some()` again if we didn't get any decrypted data".
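For concreteness, a minimal sketch of the policy being described, written against the stdlib `ssl.SSLObject`/`ssl.MemoryBIO` API. The attribute names (`transport_stream`, `_ssl_object`, `_incoming`) are illustrative assumptions, not trio's actual internals:

```python
import ssl


async def receive_some(self):
    # Sketch of the "no magic number" policy: pull whatever chunk the
    # transport wants to give us, and return whatever that decrypts to.
    while True:
        try:
            # SSLObject.read still demands an explicit size (the very
            # wrinkle discussed in the reply below), so pass the pending
            # ciphertext count, with one max-size TLS record as a floor.
            return self._ssl_object.read(max(self._incoming.pending, 16384))
        except ssl.SSLWantReadError:
            # Nothing decryptable yet: get more ciphertext from the
            # transport, at whatever size it prefers, and feed openssl.
            chunk = await self.transport_stream.receive_some()
            if not chunk:
                return b""  # transport EOF; real code would handle close_notify
            self._incoming.write(chunk)
```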
Yeah, this is complicated... what you say makes logical sense, but openssl's API is super awkward. There isn't any way to say "please decrypt all the data in your receive buffer". You have to pick a value to pass to `SSLObject.read`. And even more annoying: you don't find out until after you've picked a value whether you have to go back to the underlying transport for more data. So you have to pick the value before you know how much data the underlying transport wants to give you. And once you've picked a value, you have to keep using that value until some data is returned.
So my logic was: well, if we already have a bunch of data in the receive buffer because the underlying transport was generous, then likely we can just decrypt and return that, and the size of the encrypted data is a plausible upper bound on the size of the decrypted data, so `self._incoming.pending` is a good value to pass to `SSLObject.read`.
But sometimes there won't be a lot of data in the receive buffer – for example, because our heuristic worked well the previous time and cleared everything out, or almost everything. Like, imagine there's 1 byte left in the receive buffer. The way TLS works, you generally can't decrypt just 1 byte – everything's transmitted in frames, and you need to get the whole frame with its header and MAC and everything before you can decrypt any of it. So if we call `ssl_object.read(1)`, then openssl will end up requesting another large chunk of data from the underlying transport, then our `read(1)` call will decrypt the first byte and return it, leaving the rest of the data sitting in the buffer for next time. And that would be unfortunate.
So my first attempt at a heuristic is: use the receive buffer size, but never anything smaller than `DEFAULT_RECEIVE_SIZE`.
I guess this has a weird effect if the underlying transport likes to return more than `DEFAULT_RECEIVE_SIZE`. Say it gives us 65 KiB, while `DEFAULT_RECEIVE_SIZE` is 64 KiB. On our first call to `SSLStream.receive_some`, the buffer size is zero, so we do `read(64 KiB)`. This drains 65 KiB from the underlying transport, then decrypts and returns the first 64 KiB. The next time we call `SSLStream.receive_some`, we do `read(64 KiB)` again, but there's already 1 KiB of data in the buffer, so we just return that immediately without refilling the buffer. Then this repeats indefinitely, so we alternate between doing a big receive and a small receive every time. Seems wasteful – it'd be better to return 65 KiB each time.
So maybe a better strategy would be to start with some smallish default receive size, and then increase it over time if we observe the underlying transport giving us more data.
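A minimal sketch of what that adaptive strategy might look like; the constants and the doubling policy here are assumptions for illustration, not anything trio committed to:

```python
START_RECEIVE_SIZE = 16384      # hypothetical smallish default
MAX_RECEIVE_SIZE = 1024 * 1024  # hypothetical cap on growth


class AdaptiveReceiveSize:
    """Start small; grow when the underlying transport keeps handing us
    chunks that fill the whole request, i.e. it probably had more."""

    def __init__(self):
        self._size = START_RECEIVE_SIZE

    def next_size(self) -> int:
        return self._size

    def note_received(self, nbytes: int) -> None:
        # A completely filled request suggests our size, not the peer,
        # was the limiting factor, so double it, up to the cap.
        if nbytes >= self._size:
            self._size = min(self._size * 2, MAX_RECEIVE_SIZE)
```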
I rewrote the `SSLStream` stuff to hopefully address the above issues...
Sorry for the delay here - looks good!