Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[@aws-sdk/client-s3] GetObject - ChecksumMode does not work for multipart uploaded files that have additional checksums #5032

Closed
3 tasks done
jacklin213 opened this issue Jul 31, 2023 · 10 comments · Fixed by #5345
Assignees
Labels
bug This issue is a bug. p1 This is a high priority issue queued This issues is on the AWS team's backlog

Comments

@jacklin213
Copy link
Member

jacklin213 commented Jul 31, 2023

Checkboxes for prior research

Describe the bug

Downloading an object using GetObjectCommand with the ChecksumMode: 'ENABLED' flag for a multipart uploaded file fails with a checksum mismatch error.

C:\GitHub\s3-checksummode\node_modules\@aws-sdk\middleware-flexible-checksums\dist-cjs\validateChecksumFromResponse.js:21
            throw new Error(`Checksum mismatch: expected "${checksum}" but received "${checksumFromResponse}"` +
                  ^

Error: Checksum mismatch: expected "EYjbVvTmA+u/bYKvB+581GRo8b1lcQWsR5vpy7KFcas=" but received "mHB8HONJyh7rfIdkdP+zmuj+WpVE59Zak7FzTLbtIaI=-2" in response header "x-amz-checksum-sha256".
    at validateChecksumFromResponse (C:\GitHub\s3-checksummode\node_modules\@aws-sdk\middleware-flexible-checksums\dist-cjs\validateChecksumFromResponse.js:21:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Command works correctly when getting an object that was not a multipart uploaded object.

SDK version number

@aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v18.17.0

Reproduction Steps

I've created a minimal repo demonstrating this issue.

  1. git clone https://github.com/jacklin213/s3-checksummode.git
  2. Follow instructions to replace bucketName in index.js
  3. Run npm start

Alternatively, using the AWS Console, upload any file greater than 16MB.
Download the file using

const client = new S3Client({});
const response = await client.send(
  new GetObjectCommand({
    Bucket: bucketName,
    Key: key,
    ChecksumMode: "ENABLED"
  })
);
response .Body?.transformToString()

Observed Behavior

GetObject failed with a Checksum mismatch error:

C:\GitHub\s3-checksummode\node_modules\@aws-sdk\middleware-flexible-checksums\dist-cjs\validateChecksumFromResponse.js:21
            throw new Error(`Checksum mismatch: expected "${checksum}" but received "${checksumFromResponse}"` +
                  ^

Error: Checksum mismatch: expected "EYjbVvTmA+u/bYKvB+581GRo8b1lcQWsR5vpy7KFcas=" but received "mHB8HONJyh7rfIdkdP+zmuj+WpVE59Zak7FzTLbtIaI=-2" in response header "x-amz-checksum-sha256".
    at validateChecksumFromResponse (C:\GitHub\s3-checksummode\node_modules\@aws-sdk\middleware-flexible-checksums\dist-cjs\validateChecksumFromResponse.js:21:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Expected Behavior

GetObject should succeed if the file is downloaded correctly.

Note this does not happen in other SDKs (Tested with Python explicitly with s3.get_object)

Possible Solution

No response

Additional Information/Context

Issue is coming from the fact that @aws-sdk\middleware-flexible-checksums\src\validateChecksumFromResponse.js calculates an expected checksum based on the file contents (Line 33), however multipart uploaded file checksums are a checksum of checksums.

if (checksumFromResponse) {
const checksumAlgorithmFn = selectChecksumAlgorithmFunction(algorithm as ChecksumAlgorithm, config);
const { streamHasher, base64Encoder } = config;
const checksum = await getChecksum(responseBody, { streamHasher, checksumAlgorithmFn, base64Encoder });
if (checksum === checksumFromResponse) {
// The checksum for response payload is valid.
break;
}
throw new Error(
`Checksum mismatch: expected "${checksum}" but received "${checksumFromResponse}"` +
` in response header "${responseHeader}".`
);

@jacklin213 jacklin213 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jul 31, 2023
@jacklin213 jacklin213 changed the title [@aws-sdk/client-s3] GetObject - ChecksumMode does not work for multipart uploaded files [@aws-sdk/client-s3] GetObject - ChecksumMode does not work for multipart uploaded files that have additional checksums Jul 31, 2023
@jacklin213
Copy link
Member Author

Thought I'd also mention, I can't seem to catch the error thrown from the middleware either. Not sure if this is expected behavior

try {
	const command = new GetObjectCommand({
		Bucket: "example-bucket-name",
		Key: "multipart.txt",
		ChecksumMode: 'ENABLED'
	});
	const response = await client.send(command);
	const str = await response.Body?.transformToString();
	console.log(str?.length);
} catch (error) {
	console.log("Error was called")
}

Output

> ts-node src/index.ts

10383360
C:\Users\linjck\Documents\Development\Node\checksum\node_modules\@aws-sdk\middleware-flexible-checksums\dist-cjs\validateChecksumFromResponse.js:21
            throw new Error(`Checksum mismatch: expected "${checksum}" but received "${checksumFromResponse}"` +
                  ^
Error: Checksum mismatch: expected "EYjbVvTmA+u/bYKvB+581GRo8b1lcQWsR5vpy7KFcas=" but received "mHB8HONJyh7rfIdkdP+zmuj+WpVE59Zak7FzTLbtIaI=-2" in response header "x-amz-checksum-sha256".
    at validateChecksumFromResponse (C:\Users\linjck\Documents\Development\Node\checksum\node_modules\@aws-sdk\middleware-flexible-checksums\dist-cjs\validateChecksumFromResponse.js:21:19)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

No output showing Error was called

@jacklin213
Copy link
Member Author

The latest comment about the uncatchable error has been resolved in #5043 (Thanks @kuhe). Original problem in this issue still stands

@yenfryherrerafeliz yenfryherrerafeliz added investigating Issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Aug 23, 2023
@yenfryherrerafeliz
Copy link
Contributor

Hi @jacklin213, I will mark this issue with a needs-review label so we can further address it.

Reproduction code:

import {GetObjectCommand, S3Client} from "@aws-sdk/client-s3";
import {Upload} from "@aws-sdk/lib-storage";

const client = new S3Client({
    region: "us-east-2"
});
// Upload object
const upload = new Upload({
    client: client,
    params: {
        Bucket: process.env.TEST_BUCKET,
        Key: process.env.TEST_KEY,
        Body: "#".repeat(1024 * 1024 * 6),
        ContentType: "text",
        ChecksumAlgorithm: "SHA256"
    }
})
await upload.done();
// Get object
const response = await client.send(new GetObjectCommand({
    Bucket: process.env.TEST_BUCKET,
    Key: process.env.TEST_KEY,
    ChecksumMode: "ENABLED"
}))

console.log(response)

Thanks!

@yenfryherrerafeliz yenfryherrerafeliz added needs-review This issue/pr needs review from an internal developer. p2 This is a standard priority issue and removed investigating Issue is being investigated and/or work is in progress to resolve the issue. labels Aug 23, 2023
@ajredniwja ajredniwja added the queued This issues is on the AWS team's backlog label Sep 12, 2023
@RanVaknin RanVaknin removed the needs-review This issue/pr needs review from an internal developer. label Sep 19, 2023
@trivikr trivikr self-assigned this Oct 6, 2023
@kuhe kuhe added p1 This is a high priority issue and removed p2 This is a standard priority issue labels Oct 6, 2023
@trivikr
Copy link
Member

trivikr commented Oct 10, 2023

Verified that this issue is not reproducible in AWS CLI.

$ export MULTIPART_BUCKET_NAME=bucket-name-where-example-file-from-repro-was-uploaded
$ aws --version
aws-cli/2.13.25 Python/3.11.5 Darwin/22.6.0 exe/x86_64 prompt/off

$ aws s3api get-object --bucket $MULTIPART_BUCKET_NAME --key multipart.txt --checksum-mode ENABLED multipart.txt
{
    "AcceptRanges": "bytes",
    "LastModified": "2023-10-10T15:43:36+00:00",
    "ContentLength": 10383360,
    "ETag": "\"0e38848d5bca56cc2cad089592581871-2\"",
    "ChecksumSHA256": "mHB8HONJyh7rfIdkdP+zmuj+WpVE59Zak7FzTLbtIaI=-2",
    "ContentType": "binary/octet-stream",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}

@trivikr
Copy link
Member

trivikr commented Oct 10, 2023

There is a customization in the internal specification for flexible checksums, where we need to skip validation for whole-object multipart GET. We'll add that customization to fix this bug.

@jacklin213
Copy link
Member Author

Hey @trivikr,

Does this mean that in present day, other clients such as python, AWSCLI etc skip over checksum validation for multipart uploaded when ChecksumMode: 'ENABLED' is set on GetObject?

How are customers supposed to validate the integrity of these objects then? Does the specification provide any recommendations?

@trivikr
Copy link
Member

trivikr commented Oct 10, 2023

Does this mean that in present day, other clients such as python, AWSCLI etc skip over checksum validation for multipart uploaded when ChecksumMode: 'ENABLED' is set on GetObject?

Yes. If they're fully compliant with SDKs flexible checksums specification.
S3 does not provide checksums of parts in GetObject call responses. So SDKs do not have a way to compute checksum of checksums in case of whole object multipart GET.

Does the specification provide any recommendations?

No. The SDKs had reported this edge case with S3 when flexible checksums was being implemented, and S3 team had confirmed that customers are okay with this behavior.

@trivikr
Copy link
Member

trivikr commented Oct 11, 2023

The customization for S3 whole-object multipart GET was merged in #5345
It will be published with https://github.com/aws/aws-sdk-js-v3/releases/tag/v3.428.0, expected on Thu Oct 12th.

@jacklin213
Copy link
Member Author

jacklin213 commented Oct 20, 2023

As per this thread, is there an action item to update the JS SDK documentation to state something like, multipart uploaded files will be skipped over for Checksum validations: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/s3/command/GetObjectCommand/

Currently just has

CheckumMode: To retrieve the checksum, this mode must be enabled.

The existing description also misses information describing that the checksum of the downloaded object is validated

Copy link

github-actions bot commented Nov 4, 2023

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue is a bug. p1 This is a high priority issue queued This issues is on the AWS team's backlog
Projects
None yet
6 participants