
WIP: Streaming #1791

Closed
wants to merge 77 commits into from

Conversation

Tronic
Member

@Tronic Tronic commented Feb 21, 2020

This is an effort to clean up the server code and make streaming requests and responses first-class citizens, with non-streaming implemented on top of them instead of having duplicate code and separate modes for each.

So far only request streaming has been tackled, and it seems to work fairly well. 7 tests fail, most likely because they expect the entire request body to be read before a response is sent, whereas the new code sends a response as soon as possible.

A possible bug discovered: I think the old code would get confused if more than one request was sent without waiting for the prior ones to be responded to first. Additionally, such parallel streaming responses could get entirely mixed up. Although pipelining in that way is non-standard, it should now be fixed.

This is NOT complete or meant to be merged yet; the idea is to finish the request handling and then attempt streaming responses without callbacks, as prototyped earlier in the Trio branch.
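For reference, a minimal sketch of the two streaming styles this PR unifies internally, written against the public Sanic API of that era (roughly 19.12/20.x); the app name and route paths are illustrative only:

```python
# Minimal sketch (assuming the 19.12/20.x public API) of request streaming
# via stream=True and of the callback-based response streaming that this PR
# eventually aims to replace with plain awaits.
from sanic import Sanic
from sanic.response import stream, text

app = Sanic("streaming_demo")

# Request streaming: read body chunks as they arrive; read() returns None at EOF.
@app.post("/upload", stream=True)
async def upload(request):
    total = 0
    while True:
        chunk = await request.stream.read()
        if chunk is None:
            break
        total += len(chunk)
    return text(f"received {total} bytes")

# Response streaming: the callback-based API that currently exists.
@app.get("/feed")
async def feed(request):
    async def sample_streaming_fn(response):
        await response.write("foo,")
        await response.write("bar")
    return stream(sample_streaming_fn, content_type="text/csv")

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8000)
```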

Testing for the adventurous:

pip install git+https://github.com/Tronic/sanic@streaming

Bug reports resolved in this:
Fix #1722
Fix #1730
Fix #1785
Fix #1790
Fix #1804

@Tronic
Member Author

Tronic commented Feb 24, 2020

Status update: rewriting the server with async code instead of a pile of callbacks made it a lot easier to work with and actually faster, despite also removing the MagicHttp stack (because it doesn't support async). The current pure-Python parser pushes 50000 req/s where the master branch only does 44000 req/s.

Quite a bit of work remains to be done, especially if all corner cases are to be covered. With a new parser it is very difficult to make behaviour identical to the old one. At this point I've got 6 failing tests and 636 passing.

@vltr
Member

vltr commented Feb 24, 2020

Wow! You took out httptools and were still able to get better performance from Sanic using plain Python? That's quite unexpected but really interesting ... I'll take a closer look at the code when I can. Great job!

@codecov

codecov bot commented Feb 26, 2020

Codecov Report

Merging #1791 into master will decrease coverage by 2.07%.
The diff coverage is 80.05%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1791      +/-   ##
==========================================
- Coverage   92.17%   90.09%   -2.08%     
==========================================
  Files          23       24       +1     
  Lines        2312     2525     +213     
  Branches      424      475      +51     
==========================================
+ Hits         2131     2275     +144     
- Misses        141      190      +49     
- Partials       40       60      +20     
| Impacted Files | Coverage Δ | |
|----------------|------------|---|
| sanic/compat.py | 81.25% <20.00%> (-4.96%) | ⬇️ |
| sanic/errorpages.py | 93.47% <40.00%> (-6.53%) | ⬇️ |
| sanic/http.py | 76.44% <76.44%> (ø) | |
| sanic/response.py | 96.92% <84.21%> (-1.15%) | ⬇️ |
| sanic/app.py | 91.37% <88.09%> (-0.89%) | ⬇️ |
| sanic/asgi.py | 92.20% <96.29%> (+1.39%) | ⬆️ |
| sanic/headers.py | 100.00% <100.00%> (ø) | |
| sanic/request.py | 99.60% <100.00%> (-0.03%) | ⬇️ |
| sanic/testing.py | 98.01% <100.00%> (+0.04%) | ⬆️ |
| sanic/router.py | 96.09% <0.00%> (-0.98%) | ⬇️ |

... and 3 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 36ffaa7...36ffaa7.

@vltr
Member

vltr commented Feb 26, 2020

So ... since we might be moving away from httptools, one thing comes to mind: we are now parsing all data ourselves, and there are specific cases that httptools (or, more specifically, http-parser) used to handle for us, with its own unit tests covering them: buffer overflows, well-formed but incomplete requests, and so on. I believe we should bring at least a handful of such tests into Sanic to avoid introducing new bugs in the void left by httptools / http-parser.
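Such tests need not touch the parser internals at all. A rough sketch (not part of this PR) of socket-level edge-case tests, assuming a Sanic instance is already listening on 127.0.0.1:8000, might look like this; the exact expected behaviour (4xx reply or silent close) is also an assumption:

```python
# Rough sketch of socket-level edge-case tests; the host/port and the expected
# behaviour (reject vs. silently close) are assumptions, not taken from this PR.
import socket

HOST, PORT = "127.0.0.1", 8000  # assumed local Sanic instance

def raw_request(data: bytes, timeout: float = 5.0) -> bytes:
    """Send raw bytes and return whatever the server answers (possibly b'')."""
    with socket.create_connection((HOST, PORT), timeout=timeout) as sock:
        try:
            sock.sendall(data)
        except OSError:
            pass  # the server may reject and close before we finish sending
        chunks = []
        while True:
            try:
                chunk = sock.recv(4096)
            except OSError:
                break  # includes read timeouts
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)

def test_truncated_headers_do_not_hang_the_parser():
    # Well-formed start line but truncated header block.
    reply = raw_request(b"GET / HTTP/1.1\r\nHost: localhost\r\nX-Trunc")
    assert reply == b"" or reply.startswith(b"HTTP/1.1 4")

def test_oversized_header_is_rejected():
    # A single absurdly long header should hit a size limit, not crash.
    big = b"GET / HTTP/1.1\r\nHost: localhost\r\nX-Big: " + b"a" * 1_000_000 + b"\r\n\r\n"
    reply = raw_request(big)
    assert reply == b"" or reply.startswith(b"HTTP/1.1 4")
```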

@Tronic
Member Author

Tronic commented Feb 26, 2020

Agreed, we definitely need comprehensive testing of request handling. There are also interesting cases around streaming, like a handler finishing its response before the request body has even been (fully) sent. I'd appreciate it if someone else were willing to write tests, because I'll probably overlook the assumptions baked into my own implementation.

Next milestones:

  • Maybe get rid of the StreamingBuffer class, as it has already been reduced to a simple wrapper.
  • Implement request.respond(...) for spawning streaming responses (see the sketch after this list)
  • Implement StreamingHttpResponse and simple responses on top of that
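For the second milestone, a very rough sketch of how request.respond(...) might eventually be used from a handler; the method does not exist yet, and the keyword arguments and the send()/eof() calls below are purely assumptions for discussion:

```python
# Hypothetical future API: request.respond(...) spawning a streaming response.
# Nothing below exists yet; names other than request.respond are guesses.
from sanic import Sanic

app = Sanic("respond_sketch")

@app.get("/ticks")
async def ticks(request):
    # Start the response directly from the request, no callback involved...
    response = await request.respond(content_type="text/plain")
    # ...then push chunks with plain awaits, as prototyped in the Trio branch.
    for i in range(3):
        await response.send(f"tick {i}\n")
    await response.eof()
```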

…nger be supported. Chunked mode is now autodetected, so do not put content-length header if chunked mode is preferred.
…), all requests made streaming. A few compatibility issues and a lot of cleanup to be done remain, 16 tests failing.
@Tronic
Member Author

Tronic commented Feb 28, 2020

All responses are made streaming now, too. That was quite a bit more work than anticipated, and there is still work to be done, but this is taking good shape and doesn't break existing interfaces much. Looks like this could be aimed at the Sanic 20.06 release, possibly with some deprecations to prepare for it in 20.03.


@vltr
Member

vltr commented Feb 28, 2020

> Agreed, we definitely need comprehensive testing of request handling. There are also interesting cases around streaming, like a handler finishing its response before the request body has even been (fully) sent. I'd appreciate it if someone else were willing to write tests, because I'll probably overlook the assumptions baked into my own implementation.

Awesome work, @Tronic ! Regarding the parser (again), I see that you already moved most of the logic into the http.py module, but to write tests for HTTP parsing only we would need to break it down a little more (more specifically, pull the parsing logic out so it is not intertwined with the protocol logic). Do you think that's "doable", even if it requires a plethora of arguments in function calls?

@Tronic
Member Author

Tronic commented Feb 29, 2020

@vltr Accessing protocol members from Http is only a temporary workaround until the tests pass and the code can be cleaned up some more. I also considered a sans-I/O implementation, but async I/O calls allow automatic flow control without extra tasks for copying data, and presumably higher performance.

After cleanup you should be able to plug in mock send and receive_more functions for unit testing the Http class, without any dependency on Protocol (or on asyncio, for that matter).
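As an illustration of that testing seam, a sketch of what such mocks might look like; how they are actually handed to the Http class is not settled in this PR, so that wiring is only described in a comment:

```python
# Sketch of Protocol-free, asyncio-free mock I/O for unit testing the parser;
# the wiring into the Http class is an assumption and left as a comment.
class MockIO:
    def __init__(self, incoming_chunks):
        self.incoming = list(incoming_chunks)  # raw bytes fed to the parser
        self.sent = bytearray()                # everything the server "sends"

    async def send(self, data):
        self.sent += data

    async def receive_more(self):
        # Hand out the next pre-split chunk, or b"" once the peer "closes".
        return self.incoming.pop(0) if self.incoming else b""

# Example input: a request whose header block is split across two reads,
# one of the corner cases mentioned above.
io = MockIO([b"GET / HTTP/1.1\r\nHost: exa", b"mple.com\r\n\r\n"])
# In a real unit test, io.send and io.receive_more would replace the protocol
# methods used by Http, and assertions would be made against io.sent.
```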

@Tronic
Member Author

Tronic commented Mar 26, 2020

For some reason the payload-too-large test fails on Windows because the test client cannot read the response. I suspected a deadlock, with the request write and the response read blocking each other, but increasing buffer sizes didn't help, and I can't currently think of other reasons why this might be happening.

@Tronic
Member Author

Tronic commented Mar 26, 2020

I've traced how that goes: the HTTP handler function writes its response to the asyncio protocol and exits cleanly, after which transport.close() is called. According to the docs, this should write out any pending data before actually closing.

My bet is that this failure occurs because httpx is still trying to send the request and doesn't notice that a response is coming. Adding a delay allows the request to flow into Sanic's receive buffer, after which httpx manages to read the response without any error. I do wonder how this is supposed to be solved...

@Tronic
Member Author

Tronic commented Mar 26, 2020

Yep, confirmed working, although that is not really a fix, just a hack around httpx in this particular test, where the request happens to fit in network buffers despite being larger than the request size limit.

@Tronic
Member Author

Tronic commented Apr 3, 2020

Deadlocks are possible in bidirectional streaming handlers if the HTTP client does not support streaming but insists on sending the entire request before reading any response. This leads to a request timeout, but Sanic could also detect when it happens and issue a proper diagnostic. Unfortunately the problem itself cannot be avoided; it is up to the application whether or not to implement such handlers.

This does not affect:

  • Normal handling (because all of request is read first)
  • Streaming in most cases

Non-streaming clients fail:

  • When the request is too large: we won't receive infinite amounts of data
    • Sanic instantly sends a 413 PayloadTooLarge and disconnects (all OK)
    • Client gets a network error on send and never reads the response
  • Streaming handlers with a large request and a large response
    • A small request or a small response fits in network buffers and avoids the deadlock.
    • Client and server both get stuck trying to send until the response timeout

The problems are not new to this branch; they occur with any other HTTP server implementing this functionality. Notably, browsers don't do bidirectional streaming, but they have large network buffers that avoid issues with moderately sized bodies. Curl, Nginx, Express and others implement proper bidirectional streaming.
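To make the deadlock concrete, here is a client-side sketch using only the standard library; the address, the /echo endpoint and the body size are assumptions for illustration, not part of Sanic:

```python
# Illustration of the buffer deadlock described above; host, port, endpoint
# and body size are assumptions for demonstration only.
import asyncio

HOST, PORT = "127.0.0.1", 8000
BODY = b"x" * (50 * 1024 * 1024)  # large enough to overflow network buffers
HEAD = (b"POST /echo HTTP/1.1\r\nHost: x\r\nConnection: close\r\n"
        b"Content-Length: %d\r\n\r\n" % len(BODY))

async def naive_client():
    """Send everything, then read: once the server starts streaming a large
    response while our upload is still in flight, both sides' buffers fill
    up and drain() below may never return."""
    reader, writer = await asyncio.open_connection(HOST, PORT)
    writer.write(HEAD + BODY)
    await writer.drain()             # potential deadlock point
    print(len(await reader.read()))  # the response is only read afterwards
    writer.close()

async def streaming_client():
    """Upload and download concurrently so neither buffer can fill up forever."""
    reader, writer = await asyncio.open_connection(HOST, PORT)
    writer.write(HEAD)

    async def upload():
        for i in range(0, len(BODY), 65536):
            writer.write(BODY[i:i + 65536])
            await writer.drain()

    async def download():
        received = 0
        while True:
            chunk = await reader.read(65536)
            if not chunk:
                break
            received += len(chunk)
        return received

    _, received = await asyncio.gather(upload(), download())
    writer.close()
    print(f"response bytes read: {received}")

if __name__ == "__main__":
    asyncio.run(streaming_client())
```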

@ahopkins
Member

@Tronic Can you squash these commits?

@Tronic
Member Author

Tronic commented May 15, 2020

@ahopkins You mean rebase and force-push? And all of them, or just the merge commits? There's further work to be done on this before merging to master, so such squashing should probably wait until then; and I believe it can be done more cleanly in the GitHub UI while merging.

@Tronic
Member Author

Tronic commented May 17, 2020

Since the new policy is not to introduce any breaking changes in the .09 and .12 releases, I guess this should target 20.06 (due to the delay with 20.03 I would have been more comfortable with 20.09). To make that happen, I'll need help with proper code review, API discussion and writing comprehensive tests. Is anyone up for that?

Also, can I expect 20.06 to be delayed at least until the end of June, or better yet, July?

@ahopkins
Member

ahopkins commented May 17, 2020

  1. I will help with the review
  2. Ideally we'd be able to publish a notice with some warnings. Not really applicable for a change like this.
  3. Our choice is to make it into this next release, or wait until March.
  4. Given the nature of this past spring release, and the short notice on declaring the deprecation schedule, I'm okay with releasing in July. @huge-success/sanic-release-managers? If there is consensus, I'll post an announcement to the Sanic front page.

I think this is a big enough change that the more @huge-success/sanic-core-devs put their eyes on this and chime in, the better.

@Tronic
Member Author

Tronic commented Jun 4, 2020

Closing PR as development has moved to huge-success/sanic:streaming

@Tronic Tronic closed this Jun 4, 2020
@Tronic Tronic deleted the streaming branch June 4, 2020 11:08
@Tronic Tronic mentioned this pull request Jun 21, 2020