Support for chunked and resumable file uploads #5516
Conversation
```python
raise exceptions.MessageException("Incorrect session start.")
source = payload.get("session_chunk")
with open(target_file, "a") as f:
    f.write(source.file.read())
```
Should this be read in chunks? I suppose this can fill up the memory.
This is already the file chunk. I added additional checks to verify the chunk size.
Depending on the configured chunk size, this can still be quite big. You can replace this line with something like:
```python
read_chunk_size = 2 ** 16
while True:
    read_chunk = source.file.read(read_chunk_size)
    if not read_chunk:
        break
    f.write(read_chunk)
```
I see, ok cool.
Thanks for the changes! Just a minor comment, sorry I missed it earlier.
```python
if not read_chunk:
    break
f.write(read_chunk)
f.close()
```
You don't need to close `f`, it gets closed automatically as part of the `with` statement.
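A minimal sketch of the reviewer's point (the scratch-file path is illustrative, not from the PR):

```python
import os
import tempfile

# Create a scratch file to append to.
fd, target_file = tempfile.mkstemp()
os.close(fd)

with open(target_file, "a") as f:
    f.write("chunk data")
# Leaving the with-block has already closed the file;
# an explicit f.close() here would be redundant.
print(f.closed)  # True

os.remove(target_file)
```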
The Python part LGTM, thanks @guerler!
Maybe we could add an integration test where `chunk_upload_size` is set to a value sufficiently smaller than a test input to exercise the new feature. I can help with that if you want.
```javascript
    error_login: "Uploads require you to log in.",
    error_retry: "Waiting for server to resume...",
}, config);
console.debug(cnf);
```
Leftover?
I left it in on purpose. Afaik we'll add a feature to the client builder to disable it for production.
Sounds good, thanks for the review. How about selenium tests?
@galaxybot test this
```python
target_size = os.path.getsize(target_file)
if session_start != target_size:
    raise MessageException("Incorrect session start.")
chunk_size = os.fstat(session_chunk.file.fileno()).st_size / self.BYTES_PER_MEGABYTE
```
I think you need to add `from __future__ import division` at the top of this file; otherwise you get the floor of the division result in Python 2.
@dannon we allow fractions of MBs. This is also used in the test cases; otherwise we would have to use MB-sized test datasets.
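The division distinction discussed above can be sketched as follows (assuming `BYTES_PER_MEGABYTE` is `1024 ** 2`, which is my assumption, not confirmed by the diff):

```python
BYTES_PER_MEGABYTE = 1024 ** 2  # assumption: 1 MB = 2**20 bytes

size_bytes = 512 * 1024  # half a megabyte

# True division: the Python 3 default, or Python 2 after
# `from __future__ import division`. Fractions of an MB survive.
print(size_bytes / BYTES_PER_MEGABYTE)   # 0.5

# Floor division: what `/` does on ints in Python 2 without the
# future import. Fractions are silently dropped.
print(size_bytes // BYTES_PER_MEGABYTE)  # 0
```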
```javascript
    error_default: "Please make sure the file is available.",
    error_server: "Upload request failed.",
    error_login: "Uploads require you to log in."
    error_file: "File not provied.",
```
s/provied/provided/
+1, looks good!
This augments the file uploader to allow resumable, chunked uploads without nginx. The client uses the `File.slice` operation to submit chunks of the target file to an API function located at `api/uploads`. If the server is not available, the client will wait 5 seconds and then make another attempt to submit the last chunk of 100 MB. The chunk size can be set in the configuration with the `chunk_upload_size` option. In addition to the file chunk, the client submits a pseudo-unique `Session-ID` to identify the target file. The `Session-ID` consists of `User ID`, `timestamp`, and `filesize`. When all chunks have been uploaded and concatenated, the client triggers the `Upload` tool and provides the file path and file name as parameters. A version of this PR using nginx and its upload module is available too; the nginx upload module supports chunked uploads.