
Large file upload (streaming support) #546

Closed
nszceta opened this issue Mar 13, 2017 · 25 comments

nszceta commented Mar 13, 2017

There is currently no support for uploading very large files, i.e. files larger than the amount of RAM available to the machine.

r0fls commented Mar 13, 2017

@nszceta did you check out the streaming response type? https://github.com/channelcat/sanic/blob/master/docs/sanic/response.md

nszceta commented Mar 13, 2017

@r0fls Input, not output: request streaming to a file, not response streaming.

r0fls commented Mar 13, 2017

ah yes, that's why you said upload :)

nszceta commented Mar 13, 2017

I have successfully implemented a method for this in Flask, using a server-side session due to the insane way browsers stream files to the server.

Here is a snippet of the Flask code I used in one of my projects:

from flask import Flask, jsonify, request, session
import os

app = Flask(__name__)
app.secret_key = 'change-me'  # sessions require a secret key

@app.route('/image', methods=['POST'])
def image_post():
    rf = request.files['files[]']
    ext = rf.mimetype.split('/')[1]
    dst = session['image'] = 'tmp/{}.{}'.format(session['key'], ext)

    try:
        # Extract the starting byte from the Content-Range header,
        # e.g. "bytes 1000000-1999999/5000000" -> 1000000.
        range_str = request.headers['Content-Range']
        start_bytes = int(range_str.split(' ')[1].split('-')[0])
        # Open for in-place update ('ab' would ignore seek() and always
        # append), creating the file if this is the first chunk.
        mode = 'r+b' if os.path.exists(dst) else 'wb'
        with open(dst, mode) as f:
            f.seek(start_bytes)
            f.write(rf.stream.read())
        return jsonify({})
    except KeyError:
        # No Content-Range header: a single, non-chunked upload.
        with open(dst, 'wb') as f:
            f.write(rf.stream.read())
        return jsonify({})

A few issues:

  • Returning jsonify({}) is sub-optimal
  • There has to be a better way to identify start_bytes (see the sketch below)
  • Is there any other way to keep track of who is uploading besides a session?
I'd love to see support for this in Sanic, and I may have time to implement it as well, but I would like more feedback first.
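
On the second bullet: a slightly more robust way to recover start_bytes is to match the RFC 7233 grammar (Content-Range: bytes <start>-<end>/<total>) rather than splitting on spaces and dashes. A minimal, framework-agnostic sketch:

import re

# Matches e.g. "bytes 1000000-1999999/5000000"; the total may be "*".
CONTENT_RANGE = re.compile(r'bytes (\d+)-(\d+)/(\d+|\*)')

def parse_content_range(value):
    match = CONTENT_RANGE.fullmatch(value)
    if match is None:
        raise ValueError('malformed Content-Range: {!r}'.format(value))
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == '*' else int(match.group(3))
    return start, end, total

With this, start_bytes becomes parse_content_range(request.headers['Content-Range'])[0], and a malformed header fails loudly instead of raising an opaque IndexError mid-split.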

@frnkvieira

I would like to see this too. I miss the request.stream feature a lot (similar to request.body, but a file-like object that is not already in memory).
I'm willing to help implement it as well :)

nszceta commented Mar 13, 2017

@frnkvieira what would be the best way to get started?

@frnkvieira

@nszceta I really don't know. I'd have to check Sanic's internals, but I suspect Sanic is already reading the entire stream, which means we would need a pretty big refactoring with a lot of lazy loaders added... @r0fls do you have any opinions on this?

@sankaravadivel

@nszceta Thanks for linking #568 to this issue. This is exactly what I am looking for.

@kakarukeys

I am happy to contribute some changes to Sanic to address this. It looks like some rework is needed on the sanic.request.Request.form and sanic.request.Request.files properties and on parse_multipart_form?

That doesn't seem like a "pretty big refactoring"; am I missing something?

r0fls commented May 8, 2017

This could be covered by #697, but I'm not sure. @nszceta do you care to test with your example that was larger than the machine's RAM? cc @38elements

nszceta commented May 8, 2017

@r0fls sure. I can also add a compact unit test soon.

@Nothing4You

What's the current state of this?

Jeffwhen commented May 4, 2018

I use aiofiles to write the bytes from the request.stream queue to disk as they arrive, and it works fine; my memory is OK.

I did have to override the default REQUEST_MAX_SIZE and REQUEST_TIMEOUT, though.
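
For reference, a minimal sketch of that approach, assuming the queue-based streaming API Sanic exposed at the time (request.stream is an asyncio.Queue whose end is signalled by None) and a hypothetical destination path:

import aiofiles
from sanic import Sanic, response

app = Sanic(__name__)

# stream=True makes Sanic hand the handler the body as it arrives
# instead of buffering it first.
@app.post('/upload', stream=True)
async def upload(request):
    async with aiofiles.open('/tmp/upload.bin', 'wb') as f:  # hypothetical path
        while True:
            chunk = await request.stream.get()
            if chunk is None:  # end of the request body
                break
            await f.write(chunk)
    return response.json({'ok': True})

As noted above, REQUEST_MAX_SIZE still applies to the total body size even when streaming, so it has to be raised for uploads beyond the default limit.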

asvetlov commented May 4, 2018

Sanic stores the request body in memory even for streamed requests: https://github.com/channelcat/sanic/blob/master/sanic/server.py#L278

Unfortunately, the framework has no flow control at all.

Jeffwhen commented May 4, 2018

@asvetlov request.stream is an asyncio.Queue object. While the protocol keeps putting bytes into it, my stream handler keeps getting bytes out, so there is no memory problem. Or am I missing something?

asvetlov commented May 4, 2018

The line mentioned above pushes data into the queue; by that point the data is already in memory.
If your code consumes the data more slowly than the peer sends it, an out-of-memory error is unavoidable.
To solve this, Sanic should pause reading from the socket once some memory limit is reached.

fduxiao commented May 4, 2018

@Jeffwhen I agree with @asvetlov. You need to await the put coroutine before reading the next chunk of the body, which means you cannot use the JavaScript-like callback style. Currently, while the newly received chunk is being pushed onto the queue (as a task), the server is still receiving more data. Since earlier chunks may not have been processed (and freed) by the time new ones are pushed, total memory consumption keeps growing, which will eventually lead to out-of-memory. One option is to expose the socket to the stream handler (an async def) so it can await the reading coroutine on demand.

Jeffwhen commented May 8, 2018

Is there any way to do flow control using the asyncio.Protocol abstraction?

asvetlov commented May 8, 2018

Sure there is, but somebody should make a Pull Request :)
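
For the record, the primitive in question already exists: an asyncio.Transport exposes pause_reading() and resume_reading(), which a Protocol can call when its buffer fills up. A minimal sketch of the mechanism in bare asyncio (not Sanic code):

import asyncio

class FlowControlledProtocol(asyncio.Protocol):
    """Pause the socket when a bounded queue of body chunks fills up."""

    def __init__(self, maxsize=10):
        self.queue = asyncio.Queue(maxsize)
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # data_received is synchronous, so enqueue without awaiting;
        # when the queue is full, pause the transport and let a task
        # resume it once the consumer frees a slot.
        try:
            self.queue.put_nowait(data)
        except asyncio.QueueFull:
            self.transport.pause_reading()
            asyncio.ensure_future(self._put_and_resume(data))

    async def _put_and_resume(self, data):
        await self.queue.put(data)  # blocks until a slot is free
        self.transport.resume_reading()

A consumer coroutine then drains self.queue with await queue.get(); while it lags behind, the peer's sends back up into the kernel buffer and TCP flow control throttles the sender.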

fun0gs commented Nov 15, 2018

One temporary solution for nginx users: nginx-upload-module.

fun0gs commented Nov 15, 2018

It's a pity, given that aiohttp already supports this.

fun0gs commented Nov 15, 2018

A temporary suggestion for flow control in sanic.server.HttpProtocol:

    # line 247 nearby
    def on_headers_complete(self):
        from multidict import CIMultiDict

        self.request = self.request_class(
            url_bytes=self.url,
            headers=CIMultiDict(self.headers),
            version=self.parser.get_http_version(),
            method=self.parser.get_method().decode(),
            transport=self.transport
        )
        # Remove any existing KeepAlive handler here;
        # it will be recreated if required on the new request.
        if self._keep_alive_timeout_handler:
            self._keep_alive_timeout_handler.cancel()
            self._keep_alive_timeout_handler = None
        if self.is_request_stream:
            self._is_stream_handler = self.router.is_stream_handler(
                self.request)
            if self._is_stream_handler:
                ################################## fix-1 start
                # Bound the queue so a slow consumer creates backpressure
                # instead of unbounded buffering.
                self.request.stream = asyncio.Queue(10)  # whatever size you like
                ################################## fix-1 end
                self.execute_request_handler()

    # line 267 nearby
    def on_body(self, body):
        if self.is_request_stream and self._is_stream_handler:

            #################################### fix-2 start
            async def put_body(body):
                if self.request.stream.full():
                    # Stop reading from the socket until the handler
                    # drains a slot, then resume.
                    self.transport.pause_reading()
                    await self.request.stream.put(body)
                    self.transport.resume_reading()
                else:
                    await self.request.stream.put(body)

            self._request_stream_task = self.loop.create_task(put_body(body))
            #################################### fix-2 end
            return
        self.request.body.append(body)
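
Note that on_body is a synchronous callback, so a few more chunks can still arrive between scheduling put_body and pause_reading actually taking effect; the queue bound therefore limits memory approximately rather than strictly. Deterministic backpressure would need to pause the transport synchronously inside on_body itself.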

@sjsadowski

Closed per #1423

@ahopkins

This issue has been mentioned on Sanic Community Discussion. There might be relevant details there:

https://community.sanicframework.org/t/large-file-upload-using-sanic-api/1326/1

@rajnathsah

I started the above thread on the Sanic community forum after going through the documentation and the issues reported on GitHub. I can't see any option in Sanic to upload a file larger than RAM. I just wanted to know whether this is a limitation of Sanic or whether there is a way to enable uploads of files larger than RAM.
