
Architectural problems of Sanic framework #1176

Closed
sindzicat opened this issue Mar 26, 2018 · 20 comments
@sindzicat

sindzicat commented Mar 26, 2018

Article on Reddit

A copy of the article's content:

Performance of Sanic-based web servers is pretty good, sure.

The only problem is that people want servers that are not only fast but also stable.

From this perspective, Sanic is awful.

Malicious software can easily crash any Sanic server with an out-of-memory error.

Let me analyze several different attack vectors:

  1. Send a POST to any (even non-existent) PATH.

    Send a "Content-Length" HTTP header with the maximum allowed value, without sending a body.

    First you should discover the maximum allowed size for a POST HTTP body. Sanic by default limits it to an insane 100 megabytes, but a reverse proxy such as NGINX may reduce it to a more reasonable 1 MB, for example. Even 1 MB is enough.

    Now we have 2 options:

    a. Without closing the connection, open as many concurrent connections as you can and push all of them into the state where the HTTP headers have been sent but the HTTP body transmission is postponed. A pretty classic attack for locking up all TCP ports on a server with open idle connections.

    Sanic will drop the connection after 60 seconds by default, but a minute may be enough to push the server into a denial-of-service state. The problem is not specific to Sanic, and the consequences are relatively innocent.

    b. The more interesting case is sending almost the whole BODY but withholding the last couple of bytes.

    In fact, Sanic performs routing and request handling only after fetching the whole BODY.

    This means the BODY is COLLECTED IN MEMORY before PATH/HEADERS analysis and processing even start. You can declare a 100 MB request BODY, send 99 MB of random garbage data, and stop sending after that.

    Open another concurrent request and do the same. Repeat multiple times. Most likely the server will run out of memory before it runs out of free ports.

    Why is an out-of-memory error more dramatic than a free-ports problem? Because with it, not only do the web process(es) stop handling incoming requests, but the whole server becomes unresponsive: physical memory is exhausted, everything gets swapped to disk, the CPU is overheated by the kernel's swapper worker, and as a result even connecting to the problematic server over SSH becomes deadly slow.

    The problem can be reduced by adding monitoring tools that watch Sanic processes' memory and kill problematic web workers. But the default configuration has no such monitors.

    I bet that most web servers in the world are not configured properly (at least very many of them). Killing a web process is also a painful procedure; it is sometimes hard to distinguish normal, occasional high memory consumption from a malicious attack. As a result, normal processing of user data gets killed.

  2. Use Sanic streaming.

    Sanic has a Response Streaming feature. It is widely used for downloading big data, video streaming, etc.

    Assume you know that https://server.com/video is the resource name for a video stream powered by Sanic.

    How to screw it up? Really very easy.

    Connect to the server with a regular GET https://server.com/video and read the body SLOWLY.

    Sanic has no flow control for streaming data (in fact it has no flow control at all). Data is sent to the peer when the next data chunk is available. If the TCP socket's write buffer is full -- the data is pushed into process memory. If the HTTP peer (a browser or another client) consumes the stream more slowly than Sanic produces it -- the Sanic process will eventually run out of memory.

    The problem is very dramatic because it doesn't take malicious software to reproduce -- a slow network connection between client and server is enough to set off the bomb.

    As a result, streaming in Sanic is broken by design, and using the feature is very dangerous even if nobody wants to knock out your server -- it can be demolished by an innocent client on a slow network.
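For illustration only, the stalled-body request from 1.b could be constructed roughly like this (the host name and sizes are placeholders; this shows the mechanism, and must not be used against servers you don't own):

```python
def build_stalled_post(path: str, declared_len: int, short_by: int = 2):
    """Build (header_bytes, partial_body) for an intentionally incomplete POST.

    The request declares `declared_len` bytes of body but the returned body
    is `short_by` bytes short, so a server that buffers the full body before
    routing keeps the partial body in memory indefinitely.
    """
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        "Host: victim.example\r\n"          # placeholder host
        f"Content-Length: {declared_len}\r\n"
        "\r\n"
    ).encode()
    partial_body = b"x" * (declared_len - short_by)  # withhold the final bytes
    return headers, partial_body

# An attacker would open many sockets, send headers + partial_body on each,
# then simply go idle, e.g.:
#
#   import socket
#   s = socket.create_connection(("victim.example", 80))
#   hdr, body = build_stalled_post("/any", 1_000_000)
#   s.sendall(hdr); s.sendall(body)   # and never send the last bytes
```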

What to do?

Unfortunately, the problems described above are architectural problems of the Sanic framework; they cannot be solved on the user side.

Moreover, fixing them is not possible without changing Sanic's public API.

Good news: the Sanic development team moves so fast that backward-incompatible changes can land in master without any deprecation period or related procedures. They have done it several times; the project is still in beta.

The only real protection that can be applied right now is limiting the memory acquired by the Sanic process. Better to kill a greedy process than to allow it to grab all the memory, killing not only the Sanic process but the whole server.

A graceful restart could be very complicated, but even a rough "kill -9" is better than nothing.

A careful review of the configuration parameters for both Sanic and the reverse proxy (such as NGINX) is also very important.
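The memory-limiting protection suggested here can be sketched with Python's stdlib resource module (Unix-only; the function name and the 512 MB value are illustrative, not a recommendation):

```python
import resource  # Unix-only stdlib module

def cap_address_space(limit_bytes: int) -> None:
    """Cap this process's virtual address space via RLIMIT_AS (Linux).

    A request that tries to buffer too much data then fails with MemoryError
    inside the worker, instead of dragging the whole host into swap.
    """
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

# Call early in each worker's startup, e.g.:
# cap_address_space(512 * 1024 * 1024)   # 512 MB; tune per workload
```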

UPD: I don't have an account on Reddit.

Best regards,
Andrey.

@ashleysommer
Member

ashleysommer commented Mar 28, 2018

Issues 1.a and 1.b can be mitigated by changing some default config values.

By default REQUEST_MAX_SIZE = 100000000 # 100 megabytes, but it can be changed to REQUEST_MAX_SIZE = 1000000 # 1 megabyte by the developer integrating Sanic into their application.
The 60-second timeout for delayed requests can be changed by configuring the REQUEST_TIMEOUT variable.
By default REQUEST_TIMEOUT = 60, but it can easily be changed to REQUEST_TIMEOUT = 5 if the developer wishes.
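Putting those two suggestions together, a sketch of the tightened settings (the values are illustrative, and the exact Sanic app constructor varies by version):

```python
from sanic import Sanic

app = Sanic()  # Sanic 0.x style; newer versions require Sanic("app_name")

# Tightened values for the two settings discussed above (illustrative):
app.config.REQUEST_MAX_SIZE = 1_000_000  # 1 MB instead of the default 100 MB
app.config.REQUEST_TIMEOUT = 5           # seconds, instead of the default 60
```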

Issue 2 is harder to address. I will have a look at how hard it might be to add a limited-size FIFO buffer to the response streaming code. (EDIT: See my next comment.)

@ashleysommer
Member

ashleysommer commented Mar 28, 2018

Issue 2 is completely unfounded.

Sanic uses the TCP transport provided by the uvloop library.

Sanic does not implement any flow control, because flow control is handled internally by uvloop.

Data is sent to a peer when the next data chunk is available. If TCP socket's Write Buffer is overloaded -- the data is pushed into process memory. If HTTP peer (browser or another client) consumes stream slower than Sanic produces it -- Sanic process will end up with out-of-memory eventually.

This is false. uvloop maintains a 65 kb flow-control send buffer. If the buffer hits this limit, the TCP write() command blocks the calling thread until the buffer is successfully sent. In this case, it is not possible for Sanic to continue producing streaming data if the client is consuming it more slowly than Sanic is producing it. It will keep hitting the 65 kb buffer limit and waiting for the client to catch up.

@sindzicat
Author

Cool. It seems to me that the article on Reddit was written by Andrey Svetlov @asvetlov, the author of uvloop. But I don't know for sure whether that's true.

@sindzicat
Author

It seems to me we should create an issue on the uvloop GitHub page. But I have some difficulties with English (it's not my native language).

@ashleysommer
Member

Hmm, it seems you're right. Now I'm quite confused.

I did work out how I could add a flow-control buffer to Sanic, but it doesn't make sense to do so because uvloop already has a size-limited flow control buffer internally. Why would the author make that argument in the article if he knows uvloop has that feature?

@sindzicat
Author

Why would the author make that argument in the article if he knows uvloop has that feature?

I don't know. I'm not even sure whether we should create an issue on the uvloop GitHub page.

How can we invite @asvetlov here?

@r0fls
Contributor

r0fls commented Mar 28, 2018

Well I probably went a little over the top, but this was my response:
https://www.reddit.com/r/Python/comments/876msl/sanic_python_web_server_thats_written_to_die_fast/dweofk8/

I stopped reading after the memory body limit stuff, which appeared to be merely FUD fuel.

@asvetlov
Contributor

I'm not the uvloop author (@1st1 is), but I am a co-maintainer of asyncio itself.
uvloop is no different from asyncio on this question; in fact it supports the asyncio transport API.

transport.write() is not a blocking operation, by definition.
If the kernel-side socket write buffer is not full, .write() pushes the data into that buffer immediately.
Otherwise it saves the data into the transport's write buffer (in process memory).
That's how asyncio (and uvloop, as a third-party asyncio loop implementation) works.
asyncio streams have a flow control facility, but Sanic uses the transport API directly. That is not bad at all, but in this case Sanic should take care of write buffer overflow.
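The distinction can be sketched with nothing beyond asyncio's documented APIs; `naive_stream` mirrors writing straight to a transport, while the streams-layer `drain()` adds the backpressure being discussed:

```python
# transport.write() never blocks: when the kernel socket buffer is full,
# asyncio/uvloop queue the bytes in the transport's own buffer, which
# lives in process memory.

def naive_stream(transport, chunks):
    for chunk in chunks:
        transport.write(chunk)   # always returns immediately; with a slow
                                 # peer the in-process buffer grows unbounded

# The asyncio streams layer adds flow control on top via drain():
async def stream_with_backpressure(writer, chunks):
    for chunk in chunks:
        writer.write(chunk)
        await writer.drain()     # suspends while the transport buffer is over
                                 # its high-water mark, bounding memory use
```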

Regarding the memory body limit: it is a project choice to read the body into memory before giving the user any chance to execute code.
An aiohttp handler is called before the body is read; the request can be rejected by analyzing the HTTP headers, for example (no/invalid Authorization header).
After these pre-checks, the user has to call await request.read() and family to get the BODY data explicitly.
In aiohttp, the user can do smart things like reading the body chunk by chunk to reduce the memory footprint.
In Sanic, the body is preloaded into memory before the registered handler is called -- this is the difference.
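The aiohttp-style flow described here can be sketched framework-agnostically; `request.body_chunks()` is an illustrative stand-in for aiohttp's real `request.content.iter_chunked()`, and plain status codes stand in for response objects:

```python
# Sketch of "reject on headers first, then stream the body" with
# hypothetical names. The point: headers are checked before a single body
# byte is buffered, and the body never sits whole in memory.

async def handle(request, max_body=1_000_000):
    if "Authorization" not in request.headers:   # pre-check on headers only;
        return 401                               # the body is never buffered
    received = 0
    async for chunk in request.body_chunks():    # stream instead of preload
        received += len(chunk)
        if received > max_body:                  # enforce the cap early
            return 413
        # ... process `chunk` with a constant memory footprint ...
    return 200
```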

Guys, if you are OK with this existing design -- it is your deliberate choice.

@ashleysommer
Member

ashleysommer commented Mar 28, 2018

asyncio streams have flow control facility, but Sanic uses a transport API directly. It is not bad at all but in this case Sanic should care about write buffer overflow.

I think you have misinterpreted something there. The transport API that Sanic uses is the asyncio streams API. When Sanic executes self.transport.write() in the response streaming function here: https://github.com/channelcat/sanic/blob/79df52e519588112e43dd771dcd42ea245e732d2/sanic/response.py#L130
it is calling this line in asyncio/uvloop:
https://github.com/MagicStack/uvloop/blob/fbe6675807aa702fcfefbb7ef77c5ca8d8e99bb3/uvloop/handles/stream.pyx#L635
which in turn executes the _write() function, which does do flow control, as you mentioned: https://github.com/MagicStack/uvloop/blob/fbe6675807aa702fcfefbb7ef77c5ca8d8e99bb3/uvloop/handles/stream.pyx#L389
Sanic does not somehow bypass this flow control.

@sindzicat
Author

Noob question: have the things from this article actually been tested?

@ashleysommer
Member

Something I noticed when looking at the uvloop code: the uvloop stream transport can send on_pause and on_resume callbacks to the protocol when it wants the protocol to pause and resume its stream. Sanic does not have handlers for these callbacks, so they go unhandled. It's possible this is the key to making the uvloop flow control work properly.

I'm looking into how Sanic might implement these pause and resume functions.

@asvetlov
Contributor

I think you have misinterpreted something there. The transport API that sanic uses is the asyncio streams api. When sanic executes self.transport.write() in the response streaming function here

The asyncio streams API is https://docs.python.org/3/library/asyncio-stream.html. It is built on top of the transport/protocol abstraction layer.

A protocol has pause_writing/resume_writing methods (empty by default); you can implement them.
Look at the streams API writer for inspiration (search for the "drain" keyword).
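A minimal sketch of such a protocol, assuming only documented asyncio behavior (compare `asyncio.streams.FlowControlMixin`, which implements the same idea):

```python
import asyncio

# The event loop calls pause_writing() when the transport's write buffer
# crosses its high-water mark and resume_writing() when it drops below the
# low-water mark; a streaming handler awaits drain() before writing more.

class DrainableProtocol(asyncio.Protocol):
    def __init__(self):
        self._can_write = asyncio.Event()
        self._can_write.set()            # writable until the loop says pause

    def pause_writing(self):             # called by the loop: buffer is full
        self._can_write.clear()

    def resume_writing(self):            # called by the loop: buffer drained
        self._can_write.set()

    async def drain(self):
        await self._can_write.wait()     # suspend the producer while paused

# A streaming response loop would then apply backpressure like:
#   transport.write(chunk)
#   await protocol.drain()
```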

@pgjones

pgjones commented Mar 28, 2018

I think @asvetlov is referring to the _drain_helper method which I think is available on the Sanic HttpProtocol class and you could probably pass into the response stream method to be awaited. I think aiohttp does it here.

I went for the pause_writing / resume_writing approach in Quart in cf82d539, if you are interested (I need to add some more testing and more Slow Loris protection though).

Anyway I'm a fan of Sanic, keep up the good work. (I'm the author of Quart)

@fafhrd91

I am not sure whether you guys can't read or just don't care. I pointed out Sanic's architectural problems a long time ago; everyone was just happy to stay silent.

#770 (comment)

@vltr
Member

vltr commented Apr 13, 2018

Well, I must say I got a little worried about these problems just now. One of them seems to be fixed; the other, about request.body, IMO should be tackled, or at least the developer should be given an option to run checks before the body is received -- a hook, so to speak. Thoughts?

@ashleysommer
Member

@fafhrd91 @vltr
Don't forget, Sanic is an open-source project that relies on community members to make contributions to the codebase.

It's almost as if anyone could open a PR with some fixes.

Just sayin'...

@vltr
Member

vltr commented Apr 13, 2018

@ashleysommer of course, I agree with you. I can work on a PR in my spare time, just wanted to know what the team thinks about this topic and what would be the best approach 😉

@rubik

rubik commented Aug 5, 2018

@ashleysommer Are there any plans to implement flow control? While the request body issue is not a concern at all, because the developer can set a max limit, the second issue completely prevents using the streaming response feature.

I know that you pointed out that we can contribute, but I'm sorry to say that I'm not at all proficient in this matter, so I cannot implement it myself. Maybe someone can take inspiration from the commit that implemented flow control in Quart? Here it is: https://gitlab.com/pgjones/quart/commit/cf82d539c67125eb3c3b96e2decb65966f6d174b

@ashleysommer
Member

ashleysommer commented Aug 5, 2018

@rubik
Basic flow control using pause_writing and resume_writing was implemented in this PR back in March: #1179

It's not merged yet; we need support to get it merged into master before the next release.

@sjsadowski
Contributor

I am not going to do a full audit for this, but from my reading, the two main concerns -- response size and flow control -- have been addressed, and #1179 was merged before 0.8.0 went out the door.
