
Support for chunked and resumable file uploads #5516

Merged (63 commits) on Mar 14, 2018


@guerler (Contributor) commented Feb 13, 2018

This augments the file uploader to allow resumable, chunked uploads without nginx. The client uses the File.slice operation to submit chunks of the target file to an API function located at api/uploads. If the server is unavailable, the client waits 5 seconds and then resubmits the most recent chunk. Chunks are 100 MB by default; the size can be set with the chunk_upload_size configuration option. Along with each file chunk, the client submits a pseudo-unique Session-ID that identifies the target file; it is composed of the user ID, a timestamp and the file size. Once all chunks have been uploaded and concatenated, the client triggers the Upload tool and passes the file path and file name as parameters. A version of this PR that uses nginx and its upload module, which supports chunked uploads, is also available.
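A minimal Python sketch of the client-side loop described above, assuming a hypothetical post_chunk callable in place of the browser's request to api/uploads (the real client is JavaScript and slices the file with File.slice). The retry cap max_attempts and the "-" separator in the Session-ID are assumptions, not part of the PR.

```python
import time
from typing import Callable


def make_session_id(user_id: int, timestamp: int, file_size: int) -> str:
    # Pseudo-unique ID from user ID, timestamp and file size.
    # The "-" separator is an assumption made for this sketch.
    return "{}-{}-{}".format(user_id, timestamp, file_size)


def upload_in_chunks(data: bytes, session_id: str,
                     post_chunk: Callable[[str, int, bytes], None],
                     chunk_size: int, retry_wait: float = 5.0,
                     max_attempts: int = 10,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Send data in slices of chunk_size bytes; return the number of chunks sent.

    post_chunk is a hypothetical stand-in for the HTTP request to api/uploads.
    It should raise ConnectionError when the server is unavailable.
    """
    offset = 0
    sent = 0
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]  # analogue of File.slice
        for attempt in range(1, max_attempts + 1):
            try:
                post_chunk(session_id, offset, chunk)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # assumed cap; give up after max_attempts tries
                sleep(retry_wait)  # wait, then resubmit the same chunk
        offset += len(chunk)
        sent += 1
    return sent
```

Injecting sleep lets the retry path be exercised (for example with a fake server that fails once) without actually waiting 5 seconds.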


    raise exceptions.MessageException("Incorrect session start.")
source = payload.get("session_chunk")
with open(target_file, "a") as f:
    f.write(source.file.read())
Member:

Should this be read in chunks? Otherwise, I suppose this could fill up the memory.

@guerler (Contributor Author), Mar 9, 2018:

This is already the file chunk. I added additional checks to verify the chunk size.

Member:

Depending on the configured chunk size, this can still be quite big. You can replace this line with something like:

read_chunk_size = 2 ** 16
while True:
    read_chunk = source.file.read(read_chunk_size)
    if not read_chunk:
        break
    f.write(read_chunk)
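The suggestion above, wrapped into a complete helper. This is a sketch: append_chunk is a hypothetical name, io.BytesIO (or any readable binary stream) stands in for source.file, and the file is opened in binary append mode ("ab") rather than the original "a", on the assumption that uploaded chunks are arbitrary bytes.

```python
import io


def append_chunk(target_path: str, chunk_stream: io.BufferedIOBase,
                 read_chunk_size: int = 2 ** 16) -> int:
    """Append chunk_stream to target_path without loading it fully into memory.

    Returns the number of bytes written. Hypothetical helper for illustration.
    """
    written = 0
    # "ab": binary append, so each uploaded chunk lands after the previous ones.
    with open(target_path, "ab") as f:
        while True:
            block = chunk_stream.read(read_chunk_size)  # at most 64 KiB by default
            if not block:
                break
            f.write(block)
            written += len(block)
    # No explicit f.close(): the with statement closes the file on exit.
    return written
```

The standard library's shutil.copyfileobj(chunk_stream, f, read_chunk_size) performs the same bounded copy and could replace the explicit loop.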

Contributor Author:

I see, ok cool.

@nsoranzo (Member) left a comment:

Thanks for the changes! Just a minor comment, sorry I missed it earlier.

        if not read_chunk:
            break
        f.write(read_chunk)
    f.close()
Member:

You don't need to close f; it gets closed automatically when the with statement's block exits.
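A short demonstration of the point above, using a throwaway temporary file in place of target_file: the with statement closes the file on exit, including when the block raises.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "target.dat")  # stand-in for target_file

with open(path, "ab") as f:
    f.write(b"chunk")
    assert not f.closed  # still open inside the block

# Leaving the block calls f.__exit__(), which closes the file,
# so a trailing f.close() is redundant (though harmless).
assert f.closed

# The same holds when the block exits through an exception.
try:
    with open(path, "ab") as g:
        raise ValueError("simulated failure during write")
except ValueError:
    pass
assert g.closed
```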

@nsoranzo (Member) left a comment:

The Python part LGTM, thanks @guerler!

Maybe we could add an integration test where chunk_upload_size is set to a value small enough, relative to a test input, to exercise the new feature. I can help with that if you want.
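A rough, self-contained sketch of what such a test could check. It does not use Galaxy's integration-test framework: assemble is a hypothetical stand-in for the client/server round trip, BYTES_PER_MEGABYTE is assumed to be 1024 * 1024, and the chunk size is given in (fractional) megabytes like chunk_upload_size.

```python
import os
import tempfile

BYTES_PER_MEGABYTE = 1024 * 1024  # assumption: binary megabytes


def assemble(target_path, data, chunk_upload_size_mb):
    """Hypothetical stand-in for the upload round trip: split data into chunks of
    chunk_upload_size_mb megabytes and append each one to target_path."""
    chunk_bytes = max(1, int(chunk_upload_size_mb * BYTES_PER_MEGABYTE))
    chunks = 0
    for start in range(0, len(data), chunk_bytes):
        # Mirror the server's resumability check: each chunk must start
        # exactly where the target file currently ends.
        current = os.path.getsize(target_path) if os.path.exists(target_path) else 0
        if current != start:
            raise ValueError("Incorrect session start.")
        with open(target_path, "ab") as f:
            f.write(data[start:start + chunk_bytes])
        chunks += 1
    return chunks


def test_upload_larger_than_chunk_size():
    data = os.urandom(5000)
    target = os.path.join(tempfile.mkdtemp(), "upload.dat")
    # 0.001 MB (about 1 KB) is much smaller than the 5000-byte input,
    # so the upload has to be split into several chunks.
    n = assemble(target, data, chunk_upload_size_mb=0.001)
    assert n > 1
    with open(target, "rb") as f:
        assert f.read() == data  # chunks were concatenated in order
```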

error_login: "Uploads require you to log in.",
error_retry: "Waiting for server to resume...",
}, config);
console.debug(cnf);
Member:

Leftover?

@guerler (Contributor Author), Mar 10, 2018:

I left it in on purpose. AFAIK, we'll add a feature to the client builder to disable it for production.

@guerler (Contributor Author) commented Mar 11, 2018

Sounds good, thanks for the review. How about Selenium tests?

@guerler (Contributor Author) commented Mar 13, 2018

@galaxybot test this

target_size = os.path.getsize(target_file)
if session_start != target_size:
    raise MessageException("Incorrect session start.")
chunk_size = os.fstat(session_chunk.file.fileno()).st_size / self.BYTES_PER_MEGABYTE
Member:

I think you need to add from __future__ import division at the top of this file; otherwise, in Python 2 you get the floor of the division result.
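A small illustration of the difference, assuming BYTES_PER_MEGABYTE is 1024 * 1024 (its actual value is not shown here). On Python 3 the __future__ import is a no-op because / is always true division; on Python 2 without it, / between two integers floors the result.

```python
from __future__ import division  # no-op on Python 3; enables true division on Python 2

BYTES_PER_MEGABYTE = 1024 * 1024  # assumed value for illustration

chunk_bytes = 3 * 512 * 1024  # a 1.5 MB chunk

# True division keeps the fractional part.
print(chunk_bytes / BYTES_PER_MEGABYTE)   # 1.5
# Floor division: what plain "/" on two ints gives on Python 2 without the import.
print(chunk_bytes // BYTES_PER_MEGABYTE)  # 1
```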

@guerler (Contributor Author) commented Mar 13, 2018

@dannon, we allow fractions of MBs. The test cases rely on this; otherwise, we would have to use MB-sized test datasets.
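A sketch of how the two server-side checks in the hunk quoted earlier fit together with a fractional-megabyte limit. validate_chunk, UploadError and the "Invalid chunk size." message are hypothetical; the PR raises MessageException, and BYTES_PER_MEGABYTE is assumed to be 1024 * 1024. The code assumes Python 3 (or the __future__ import) so that / keeps the fraction.

```python
import os
import tempfile

BYTES_PER_MEGABYTE = 1024 * 1024  # assumed; the PR uses self.BYTES_PER_MEGABYTE


class UploadError(Exception):
    """Hypothetical stand-in for Galaxy's MessageException."""


def validate_chunk(target_path, session_start, chunk_file, chunk_upload_size_mb):
    """Check a chunk before appending it (hypothetical helper, not the PR's code)."""
    target_size = os.path.getsize(target_path) if os.path.exists(target_path) else 0
    # A resumed upload must continue exactly where the partial file ends.
    if session_start != target_size:
        raise UploadError("Incorrect session start.")
    # True division keeps fractional megabytes, so limits below 1 MB work.
    chunk_size_mb = os.fstat(chunk_file.fileno()).st_size / BYTES_PER_MEGABYTE
    if chunk_size_mb > chunk_upload_size_mb:
        raise UploadError("Invalid chunk size.")
    return chunk_size_mb
```

In this sketch, checking session_start first also means a chunk that is resent after it was already appended (for example, when only the response was lost) is rejected rather than written twice.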

@dannon (Member) commented Mar 13, 2018

@guerler Got it. The change in 22dd4a8 is the sort of logic consolidation I was thinking about there; that's perfect.

error_default: "Please make sure the file is available.",
error_server: "Upload request failed.",
error_login: "Uploads require you to log in.",
error_file: "File not provied.",
Member:

s/provied/provided/

@dannon merged commit 7595114 into galaxyproject:dev on Mar 14, 2018
@dannon self-requested a review on March 14, 2018 at 13:13
@dannon (Member) left a comment:

+1, looks good!

This was referenced Mar 14, 2018
@guerler deleted the chunk_uploads branch on February 19, 2020 at 23:10

3 participants