timkay edited this page Dec 19, 2012 · 21 revisions

"aws" now supports multi-part upload seamlessly: if a file is larger than 5GB, it is automatically uploaded using multi-part upload.

Use "aws" as you normally would: "aws put BUCKET/OBJECT file". Use --progress and --parts=N to show progress. (See the last Note below.)

The current support for error checking and retries is limited. This issue will be addressed soon.

Multipart uploads can be used in these ways:

  1. "aws" automatically uses multi-part upload if the file is larger than 5GB
  2. specify --parts=N to force multi-part upload (e.g., aws put test682/ myfile.txt --parts=4)
  3. one part at a time, specifying all the parameters explicitly
  4. one part at a time, having "aws" fill in the parameters for you

Uses for Multi-Part upload:

  • Upload files larger than 5GB.
  • Upload files more quickly. (A pseudo-streaming feature is planned, where you can pipe many GB into aws, and it will upload 5MB (or whatever) at a time.)
  • Mitigate race conditions: Suppose the code checks whether uploading makes sense and then starts the upload. If something changes while the upload is in progress so that the file should no longer be uploaded, it is too late. Instead, do a multipart upload and upload all the parts, but don't "Complete" the upload. Then check again whether the file should be uploaded. If so, do the "Complete" step. Otherwise, "Abort".
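
The race-condition pattern above can be sketched as a small shell function. It uses only the shorthand "aws" invocations shown elsewhere on this page (?uploads, ?part, ?upload). The names staged_upload and still_wanted are hypothetical, and the error handling assumes "aws" exits non-zero on failure, which this page does not confirm. Because the uploadId is omitted, this is not suitable when two uploads target the same key concurrently.

```shell
# Sketch only: stage every part, then decide whether to keep the upload.
# staged_upload and still_wanted are hypothetical names, not part of "aws".
# Usage: staged_upload BUCKET KEY PART_FILE...
staged_upload() {
    bucket=$1
    key=$2
    shift 2

    # Initiate the multipart upload ("aws" fills in the UploadId later).
    aws post "$bucket/$key?uploads" || return 1

    # Upload each part in order; abort the whole upload if one fails.
    # ASSUMPTION: "aws" returns a non-zero exit status on failure.
    for part in "$@"; do
        aws put "$bucket/$key?part" "$part" || {
            aws delete "$bucket/$key?upload"
            return 1
        }
    done

    # Re-check the condition only now, after the slow transfer is done.
    if still_wanted "$bucket" "$key"; then
        aws post "$bucket/$key?upload"     # Complete
    else
        aws delete "$bucket/$key?upload"   # Abort
    fi
}
```

Here still_wanted stands for whatever check your application needs; define it as a shell function that returns 0 when the object should still be published.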

"aws" will automatically enable multi-part uploads if the file is larger than 5GB.

If you want to upload one part at a time, the first way looks like this. It's complicated, and you probably should skip to the second method below (unless you might be uploading the same file twice concurrently).

$ aws post test682/mputest?uploads
<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Bucket>test682</Bucket>
    <Key>mputest</Key>
    <UploadId>KQxiZ9rr8_eGZx_...</UploadId>
</InitiateMultipartUploadResult>
$ aws put 'test682/mputest?partNumber=1&uploadId=KQxiZ9rr8_eGZx_...' 1.part
$ aws put 'test682/mputest?partNumber=2&uploadId=KQxiZ9rr8_eGZx_...' 2.part
$ cat >mputest.xml
<CompleteMultipartUpload>
  <Part>
    <PartNumber>1</PartNumber>
    <ETag>&quot;b026324c6904b2a9cb4b88d6d61c81d1&quot;</ETag>
  </Part>
  <Part>
    <PartNumber>2</PartNumber>
    <ETag>&quot;bcb4b88d624c6904b2a9d61c81d10263&quot;</ETag>
  </Part>
</CompleteMultipartUpload>
^D
$ aws post test682/mputest?uploadId=KQxiZ9rr8_eGZx_... mputest.xml
<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Location>https://test682.s3.amazonaws.com/mputest</Location>
    <Bucket>test682</Bucket>
    <Key>mputest</Key>
    <ETag>&quot;80ec144fdfd47f7f32e96fc58b2e01ee-1&quot;</ETag>
</CompleteMultipartUploadResult>
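
Typing mputest.xml by hand gets tedious with many parts. As a rough sketch, the helper below prints the same document for parts numbered 1..N. The name complete_xml is hypothetical, and the ETags passed in are assumed to be the bare hex values that appear between the &quot; markers.

```shell
# Sketch only: print a CompleteMultipartUpload document for parts 1..N.
# complete_xml is a hypothetical helper, not part of "aws".
# Usage: complete_xml ETAG_OF_PART_1 ETAG_OF_PART_2 ... > mputest.xml
complete_xml() {
    n=1
    printf '%s\n' '<CompleteMultipartUpload>'
    for etag in "$@"; do
        printf '  <Part>\n'
        printf '    <PartNumber>%s</PartNumber>\n' "$n"
        # ETags are wrapped in double quotes; &quot; is the XML-escaped form.
        printf '    <ETag>&quot;%s&quot;</ETag>\n' "$etag"
        printf '  </Part>\n'
        n=$((n + 1))
    done
    printf '%s\n' '</CompleteMultipartUpload>'
}
```

The resulting file can then be posted exactly as in the transcript above.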

The second method is much easier. "aws" fills in all the blanks.

$ aws post test682/mputest?uploads
<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Bucket>test682</Bucket>
    <Key>mputest</Key>
    <UploadId>ggfgZUNWlfiypsxCe1NAd...</UploadId>
</InitiateMultipartUploadResult>
$ aws put test682/mputest?part 1.part
$ aws put test682/mputest?part 2.part
$ aws post test682/mputest?upload
<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Location>https://test682.s3.amazonaws.com/mputest</Location>
    <Bucket>test682</Bucket>
    <Key>mputest</Key>
    <ETag>&quot;80ec144fdfd47f7f32e96fc58b2e01ee-1&quot;</ETag>
</CompleteMultipartUploadResult>

Specify a part number explicitly (perhaps to upload the part again if it failed).

aws put test682/mputest?partNumber=2 2.part
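
Since built-in retries are limited, re-uploading a failed part can be wrapped in a simple loop. The following is only a sketch: put_part_with_retry is a hypothetical helper, and it assumes "aws" exits non-zero when the upload fails, which this page does not confirm.

```shell
# Sketch only: retry one part a few times before giving up.
# put_part_with_retry is a hypothetical helper, not part of "aws".
# ASSUMPTION: "aws" returns a non-zero exit status when the upload fails.
# Usage: put_part_with_retry BUCKET/KEY PART_NUMBER FILE [MAX_TRIES]
put_part_with_retry() {
    target=$1
    number=$2
    file=$3
    tries=${4:-3}          # default to three attempts
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if aws put "$target?partNumber=$number" "$file"; then
            return 0
        fi
        echo "part $number: attempt $attempt failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, put_part_with_retry test682/mputest 2 2.part would try part 2 up to three times.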

List all multipart uploads that are active for a given bucket.

aws get test682?uploads

List the details of a particular multipart upload, including all parts.

aws get test682/mputest?upload
<ListPartsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <Bucket>test682</Bucket>
    <Key>mputest</Key>
    <UploadId>RiBj2R0wi096qIvptvmNo...</UploadId>
    <StorageClass>STANDARD</StorageClass>
    <PartNumberMarker>0</PartNumberMarker>
    <NextPartNumberMarker>2</NextPartNumberMarker>
    <MaxParts>1000</MaxParts>
    <IsTruncated>false</IsTruncated>
    <Part>
            <PartNumber>1</PartNumber>
            <LastModified>2012-11-03T03:28:32.000Z</LastModified>
            <ETag>&quot;5f363e0e58a95f06cbe9bbc662c5dfb6&quot;</ETag>
            <Size>5242880</Size>
    </Part>
    <Part>
            <PartNumber>2</PartNumber>
            <LastModified>2012-11-03T03:28:45.000Z</LastModified>
            <ETag>&quot;26ab0db90d72e28ad0ba1e22ee510510&quot;</ETag>
            <Size>2</Size>
    </Part>
</ListPartsResult>
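
Before completing an upload, it can be handy to check how many parts have arrived and how many bytes they add up to. The sketch below reads a ListPartsResult on standard input; parts_summary is a hypothetical helper, and it assumes each <Size> element sits on its own line, as in the listing above.

```shell
# Sketch only: summarize a ListPartsResult read from standard input.
# parts_summary is a hypothetical helper, not part of "aws".
# Usage: aws get 'test682/mputest?upload' | parts_summary
parts_summary() {
    # Pull the number out of every <Size>...</Size> line, then add them up.
    sed -n 's:.*<Size>\([0-9][0-9]*\)</Size>.*:\1:p' |
        awk '{ total += $1 } END { printf "%d parts, %d bytes\n", NR, total + 0 }'
}
```

For the listing above, this would print "2 parts, 5242882 bytes".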

Abort a multipart upload.

aws delete test682/mputest?upload

Notes:

  • Each part must be at least 5MB, except for the last part.
  • Be careful: If two separate multipart uploads target the same filename concurrently, and you use the second method for upload (where you omit the uploadId), you'll end up with a complete mess on your hands.
  • The progress indicator does not work properly, because curl can't tell how big each part is; it says 100.0% all the time. However, you will see one progress bar for each part, plus 3 more bars. To get a better progress indication, use --progress with --parts=N, where N is large. Then you'll see N + 3 progress bars. For example, if you say --parts=97, you will get 100 progress bars total.
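
When choosing a large N for --parts, the 5MB minimum part size from the first note still applies. As a hedged sketch, the helper below computes the largest N that keeps every part at or above 5MB, assuming "aws" splits the file into roughly equal parts (this page does not say how it splits). The name max_parts is hypothetical. The cap of 10,000 reflects S3's limit on the number of parts per upload.

```shell
# Sketch only: largest --parts=N that keeps every part at or above 5MB.
# max_parts is a hypothetical helper, not part of "aws".
# ASSUMPTION: "aws" splits the file into roughly equal-sized parts.
# Usage: max_parts SIZE_IN_BYTES
max_parts() {
    min_part=$((5 * 1024 * 1024))   # 5MB minimum part size, in bytes
    n=$(( $1 / min_part ))
    [ "$n" -ge 1 ] || n=1           # small files still need one part
    [ "$n" -le 10000 ] || n=10000   # S3 allows at most 10,000 parts
    echo "$n"
}
```

It could be used as: aws put test682/ myfile.txt --progress --parts=$(max_parts "$(wc -c < myfile.txt)")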