Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Creation With Upload extension #88

Merged
merged 5 commits into from
Oct 6, 2019
Merged

Add Creation With Upload extension #88

merged 5 commits into from
Oct 6, 2019

Conversation

Acconut
Copy link
Member

@Acconut Acconut commented Jun 30, 2016

See #82 for the underlying discussion.

@kvz
Copy link
Member

kvz commented Jul 1, 2016

The creation extension is getting quite large here, and since it says:

If the Server supports the functionality described in this paragraph, it MUST advertise this by including the creation-with-chunk value to the Tus-Extension header.

Should we not rather place this paragraph under its own creation-with-chunk extension header?

@Acconut
Copy link
Member Author

Acconut commented Jul 1, 2016

@kvz Since the Creation With Chunk only applies when the Creation extension is supported, I dislike the idea of splitting these two.

@kvz
Copy link
Member

kvz commented Jul 15, 2016

How about solving that with something along the lines of:

In order to implement and/or enable the Creation With Chunk extension, the Creation extension MUST also be supported as a prerequisite.

@Acconut
Copy link
Member Author

Acconut commented Jul 18, 2016

Apart from the extensions' size is there some feedback on it's content?

@kvz
Copy link
Member

kvz commented Jul 18, 2016

As for the content, I think it could use a

If the Client wishes to use the Creation With Chunk extension,

Where it says

The Client SHOULD

Otherwise it makes sense to me, but for some reason I do have to try hard to grasp it. But I can't think of a better way to rephrase. Maybe we can let AJ take a look though?

@Acconut
Copy link
Member Author

Acconut commented Jul 21, 2016

What about @cjhenck and @MMasterson?

@cjhenck
Copy link

cjhenck commented Jul 21, 2016

I think it might make sense to make it a separate extension that requires Creation because we are adding a second advertisement to the extensions list.

I have some suggested wording changes that I will add to the diff, and personally think it would make sense to call it "Creation with Data" since it is more general.

@@ -351,6 +351,20 @@ If the length of the upload exceeds the maximum, which MAY be specified using
the `Tus-Max-Size` header, the Server MUST respond with the
`413 Request Entity Too Large` status.

The Client MAY include the entire or a chunk of the data, which is meant to be
uploaded, in the body of the `POST` request. In this case, similar rules as for
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this wording?
The Client MAY include the entirety or a chunk of the data to be uploaded in the body of the POST request.

@Acconut
Copy link
Member Author

Acconut commented Jul 22, 2016

@cjhenck Thank you for the feedback! I mostly agree with your findings and will address them correctly.

Since you also suggested to move this into a separate extension, I will follow this path. However, instead of naming it Creation With Data, I now tend more towards Creation With Upload. While "data" is a better fit than "chunk", it's still a bit to general. Data could be anything, maybe even metadata or additional information which is totally unrelated to this specific upload. In my opinion, the name Creation With Upload suites the best since it not only indicates that we are creating a new upload resource but also directly provide the entire upload data or just a part of it.

How do you think about this name, @cjhenck and @kvz?

Acconut added a commit to tus/tusd that referenced this pull request Aug 28, 2016
@Acconut
Copy link
Member Author

Acconut commented Aug 28, 2016

@kvz and @cjhenck, I have moved this into its own extension. Would you mind giving it a last review before we can shit it in the next 1.1 release? Also, I just pushed the first implementation for it to tusd.

@Acconut Acconut changed the title Add Creation With Chunk functionality Add Creation With Upload extension Aug 29, 2016
@cjhenck
Copy link

cjhenck commented Aug 29, 2016

Looks great to me! Thanks!

@MMasterson
Copy link

When do we see a possible merge and shipment of the next 1.1 release

@Acconut
Copy link
Member Author

Acconut commented Aug 30, 2016

@cjhenck Great!
@MMasterson I would like to release it in the next few weeks.

@Acconut Acconut added this to the 1.1 milestone Aug 30, 2016
### Creation With Upload

The Client MAY want to include parts of the upload in the initial Creation
request. This MAY be achieved using the Creation With Upload extension.
Copy link
Member

@kvz kvz Sep 2, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Client MAY want to

The Client MAY

This MAY be achieved

This can be achieved

Reason is: doing this is optional. But if opt-in, we can expect to use this things, and no other thing. So I feel the second MAY is superfluous

offer the Creation extension, it MUST NOT offer the Creation With Upload
extension either.

The Client MAY include the entire upload data or a chunk of it in the body of

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MAY include either the entirety or a chunk of the upload data in the body

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, that sounds eloquent 👍

@Acconut
Copy link
Member Author

Acconut commented Sep 2, 2016

Thank you, @kvz and @AJvanLoon, for your suggestions. ❤️

@alfiehanssen
Copy link

@Acconut @kvz @bhstahl @MMasterson

Apologies for chiming in late, I'm just reading through this thread now, figured I would comment here as opposed to on #82. If my comments are irrelevant at this point that's ok, I'm thinking better to add a few thoughts even if they're old news. Also, apologies if I'm writing stuff you already know.

I implemented video upload in Vimeo's iOS application and many of the challenges described in #82 are familiar.

On iOS, and mobile in general, video upload in the foreground doesn't make a ton of sense from a user perspective. Videos tend to be large and are getting larger as device cameras improve. Users expect to be able to kick off an upload and put the device back in their pocket, it's not really acceptable to expect them to keep the app in the foreground until the upload completes. Unless we're talking about a really short / small video that can be uploaded in a few seconds (e.g. video upload from within the Instagram app).

Uploading in the background requires using the NSURLSession APIs. The NSURLSession APIs for background upload are very limited. They only allow you to pass in a local file URL. No option to pass in a stream or even an offset.

This means that if the upload fails midway you have to start from 0 when you retry. You cannot resume from an offset. As the video files become larger this limitation becomes more and more cumbersome.

This means if you want to be able to resume uploads, you have to manually split the file into chunks (smaller individual files on disk), and pass the URLs to those files to the NSURLSession.

But this is quite problematic from a UX perspective also. To my knowledge the only way to manually chunk a file is to use an AVAssetExportSession to do so. And this API requires that the chunking be performed when the app is in the foreground. It's also a very time consuming process. Chunking a 2GB file will take a while, getting back to the problem of the user having to keep the app in the foreground while the operation completes. Not to mention complexities of chunking precisely, and handling chunking failures etc. It also means you're literally making a copy of the file and thereby blowing up the user's disk space (at least until the upload completes).

You could create chunks as needed (i.e. only create a chunk once the previous chunk has completed), but this is not a solution I am confident in because: if chunk A finishes uploading in the background and you want to create the next chunk (chunk B) and initiate the background upload of that chunk, the time window you have between the moment upload A completes and when you must have initiated upload B is small and unpredictable (undocumented). In other words, I'm not confident that there is enough time to create a chunk in that time (also not sure AVAssetExportSession is usable at this moment in the app lifecycle since the app is technically still in the background).

Anyway, just wanted to throw some ideas / experiences out there. Hopefully something in here is valuable.

Our upload library is open source here: https://github.com/vimeo/VimeoUpload

It's not yet consumable as a Cocoapod (coming soon) but the code is there and is operational in our iOS app.

Thanks 👍

@Acconut
Copy link
Member Author

Acconut commented Oct 22, 2016

Thank you a lot for this feedback, @alfiehanssen. It is absolutely invaluable for us. The actual reason for why we did consider to add this extension, was to provide a basis on which resumable uploads could also be achieved on the iOS platform. @MMasterson told us that the situation is pretty unfortunate, what is also confirmed by your comment. However, it seems as if this extension does still not solve the problem entirely.

This means if you want to be able to resume uploads, you have to manually split the file into chunks (smaller individual files on disk), and pass the URLs to those files to the NSURLSession.

From what you are describing, does it even make sense to attempt this chunking approach? Your comment draws a relatively depressive image of the upload capabilities for iOS on my mind. Does the VimeoUpload library use the chunking method (AVAssetExportSession) and then uploads in the background (NSURLSession)? If not, are the any alternative approaches?

@alfiehanssen
Copy link

@Acconut, phew! I'm glad this info was helpful.

The VimeoUpload library does not use the chunking method. It just retries the file and in doing so starts from 0 again. I opted for this approach for v1.0 of the library and figured that, depending on usage and product goals, we'd decide later whether to prototype the chunking approach.

Without having prototyped the chunking approach I can't really say whether it will work or not. My exploration though does make me skeptical. I think it will hinge on:

(1) Whether a chunk can be created when the app is in the background (or do the iOS APIs involved in creating the chunk only work in the foreground), and

(2) whether a chunk can be created within the window of time available, i.e. when the app is awakened in the background in response to the previous chunk upload having completed. That's a mouthful, but there's basically a callback that the background upload session invokes when a network task completes. I.e. the NSURLSession process (separate from your app's process) wakes your app up briefly. You can use this time to "respond to completion of the network task" (e.g. by updating persisted state etc.). Can you create a chunk and start uploading it before the app is killed again? As far as I know the duration of this time period is undocumented. And the time to create a chunk probably varies by device / OS profile.

Let me know if you have other questions but that's basically where I ended up when considering chunking.

@cjhenck
Copy link

cjhenck commented Oct 25, 2016

Hi all, sorry to chime in late. @alfiehanssen thanks for adding your expertise to the discussion!

If I recall, upload-with-create was a bit of a compromise: when I had considered using chunking, the problem I ran into was that there is no way to tell the server that it must either accept or reject a whole chunk. When a chunk fails, iOS will automatically retry it from the beginning, but the offsets will no longer match and all future requests will fail, and the upload is essentially stuck in an endless retry loop (hence both #82 and #83). #82 was meant to be a way to allow the application to upload individual chunks using the concatenation extension, while #83 would have allowed for sequential chunking.

I believe that this extension will actually allow us to use chunking with concatenation, but with the side-effect of having unused data on the server until it expires.

@Acconut
Copy link
Member Author

Acconut commented Oct 26, 2016

@alfiehanssen The VimeoUpload library does not use the chunking method. It just retries the file and in doing so starts from 0 again.

Thank you for clarifying this.

(2) whether a chunk can be created within the window of time available, i.e. when the app is awakened in the background in response to the previous chunk upload having completed.

While reading about the functionality behind the NSUrlSession, I stumbled upon some numbers on following site: https://krumelur.me/2015/11/25/ios-background-transfer-what-about-uploads/

The main gotcha here is that you’ll have to ensure that your app doesn’t get suspended while these standard request is running. You can use a UIApplication background task to do that, although it puts strict limits on how long you can run. Specifically:

  1. When you move from the foreground to the background, the limit is currently 3 minutes.
  2. When you are resumed in the background, the limit is 30 seconds.

(1) should be plenty but (2) is much more of a challenge.

IMPORTANT: These limits have changed in the past and will change again in the future. I’m quoting them here just so you have some idea of what to expect. You should manage this process with the UIApplication background task API, specifically:

  • monitor the time remaining using UIApplication’s backgroundTimeRemaining property
  • make sure you implement your background task expiry handler correctly

From my personal experience, these number seem to be enough to divide a medium sized file into smaller chunks and prepare them for being uploaded in the background. However, I may be absolutely wrong with my assumptions and only trying could bring light in this area.

Let me know if you have other questions but that's basically where I ended up when considering chunking.

Thank you, again, a lot for you advice.

@cjhenck If I recall, upload-with-create was a bit of a compromise: when I had considered using chunking, the problem I ran into was that there is no way to tell the server that it must either accept or reject a whole chunk.

Exactly, that's what this extension tries to solve. Were you able to try to use this approach in reality? The tusd server actually support this extension which may be relevant for testing.

@MMasterson
Copy link

@alfiehanssen I was wondering if you had any input on this topic from a macOS programming perspective. I am currently making a few changes to make TUSKit work on the macOS environment. Obviosuly factors for the iOS side don't all apply 100% for macOS, but curious if you ran into any of these issues or different ones on macOS.

@cjhenck
Copy link

cjhenck commented Oct 26, 2016

@Acconut most of our files are small enough that using the background tasks in TUSKit and restarting any cancelled uploads during significant location changes has been sufficient for our needs. It seems like this is what DropBox, etc. do as well.

I would consider trying the chunking approach again, but I'm constrained by wanting to reduce my users' data use. The current technique is clearly the most data-efficient, whereas chunks uploading in the background will retry indefinitely. The latter also presents an interesting optimization problem because you want to find a sweet spot where you maximize chunk success rates and also minimize the HTTP overhead by having as few chunks as possible.

@Acconut
Copy link
Member Author

Acconut commented Oct 28, 2016

@cjhenck most of our files are small enough that using the background tasks in TUSKit and restarting any cancelled uploads during significant location changes has been sufficient for our needs. It seems like this is what DropBox, etc. do as well.

That's good to know, thanks.

The current technique is clearly the most data-efficient.

Yes, but only if the upload works as intended and is not interrupted, I assume.

The latter also presents an interesting optimization problem because you want to find a sweet spot where you maximize chunk success rates and also minimize the HTTP overhead by having as few chunks as possible.

Absolutely correct.

After thinking and reading more about the situation on iOS, I still do not see a viable alternative than using this extension for making uploading work on Apple's mobile platform. The question, whether chunking should be used to support uploading bigger files in a reliable fashion, does not matter. We need this extension since we have to supply a single request, which combines upload creation and file data, to NSUrlSession and this extension is currently the only way for going forward in this matter. Is this correct or am I still missing something crucial?

@cjhenck
Copy link

cjhenck commented Oct 28, 2016

@cjhenck The current technique is clearly the most data-efficient.

@acconaut Yes, but only if the upload works as intended and is not interrupted, I assume.

I don't think that's true - in terms of data use, the current implementation doesn't use any retrying tasks, so as soon as an upload is interrupted, it makes a new HEAD request and restarts from the existing offset. The downside is that the upload only restarts if your app is somehow reawakened or never left the foreground.

We need this extension since we have to supply a single request, which combines upload creation and file data, to NSUrlSession and this extension is currently the only way for going forward in this matter. Is this correct or am I still missing something crucial?

Well... I think we need either this extension or one of the "PATCH with reset" options (#82 or #83). If we use this extension with background URLs then we have to wait until every chunk has been successfully uploaded before we can submit the concatenation request, and we'll (likely) have lots of dead uploads. I believe that once all of the chunks are successfully uploaded that the application would be reawakened with a "success" callback.

If we used #82 or #83 then we could create each chunk's upload, get the URLs, submit a concatenation request, and then create background tasks for the upload that would run entirely in the background without any additional intervention.

The latter approach is cleaner to me but presents opportunities for potential data corruption (in the case of #82). The former approach has the advantage of minimizing overhead for TUS in all use cases by eliminating one request-reply cycle.

@Acconut Acconut mentioned this pull request Nov 21, 2016
@ronag
Copy link
Contributor

ronag commented Nov 22, 2016

Shouldn't Content-Type: application/octet-stream be the content type for this request? Content-Type: application/offset+octet-stream kind of implies that upload-offset is required, which is not the case here?

@Acconut
Copy link
Member Author

Acconut commented Nov 30, 2016

@ronag Shouldn't Content-Type: application/octet-stream be the content type for this request? Content-Type: application/offset+octet-stream kind of implies that upload-offset is required, which is not the case here?

I don't think so. The chunk will be put at the offset 0 which is implicitly defined as the upload is newly created. Therefore the content type application/offset+octet-stream is still appropriate IMO.

@cjhenck: The latter approach [of resetting an upload's offset] is cleaner to me but presents opportunities for potential data corruption (in the case of #82).

I generally enjoy the idea of seeing an upload as an append-only file. It provides you with the guarantee that chunks which have already been received will not change. This makes it possible and easy to process a file will it's being uploaded.

The former approach has the advantage of minimizing overhead for TUS in all use cases by eliminating one request-reply cycle.

Correct.

If there is nothing else to be said about this extension, I think we can finally move to merge this.

@MMasterson
Copy link

@Acconut Any word on when this will be merged in?

@Acconut
Copy link
Member Author

Acconut commented Feb 27, 2018

Sorry about the inactivity, I lost this from my radar. I am going to try to publish a new version of the protocol soon.

@Acconut Acconut merged commit 2334dd8 into master Oct 6, 2019
@Acconut Acconut deleted the creation-with-chunk branch October 6, 2019 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants