
Problem with Setting highWaterMark in AWS S3 GetObject Stream #6890

Open
3 of 4 tasks
ujjwol05 opened this issue Feb 16, 2025 · 4 comments
Assignees
zshzbh
Labels
bug This issue is a bug. p2 This is a standard priority issue response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.

Comments

@ujjwol05

ujjwol05 commented Feb 16, 2025

Checkboxes for prior research

Describe the bug

When using the AWS SDK for JavaScript v3 to stream data from S3 (via GetObjectCommand), I cannot set highWaterMark. I've also set a custom highWaterMark value on the stream buffer to see if that works, but the chunk size remains at 16 KB.

Regression Issue

  • Select this option if this issue appears to be a regression.

SDK version number

@aws-sdk/package-name@version, ...

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.17.0

Reproduction Steps

const command = new GetObjectCommand(params);
const response = await s3Client.send(command);

const stream = response.Body as NodeJS.ReadableStream;
const customStream = stream.pipe(new Stream.PassThrough({
    highWaterMark: 32 * 1024 
}));

customStream.on("data", (chunk) => {
    console.log(`Chunk size: ${chunk.length}`);
});

Observed Behavior

INFO Chunk size: 16384
INFO Chunk size: 389

Expected Behavior

INFO Chunk size: 30243
INFO Chunk size: 30243

Possible Solution

No response

Additional Information/Context

No response

@ujjwol05 ujjwol05 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 16, 2025
@zshzbh zshzbh self-assigned this Feb 17, 2025
@zshzbh
Contributor

zshzbh commented Feb 19, 2025

Hey @ujjwol05 ,

The highWaterMark setting defines the maximum buffer size, but it doesn't guarantee that chunks will be exactly that size. The actual chunk sizes you're seeing (16384 and 389 bytes) are determined by the underlying stream implementation and the source.

@ujjwol05
Author

@zshzbh Does that mean we have no control over the S3 stream, and if we need a fixed-size buffer, we must rebuffer it ourselves?

@zshzbh
Contributor

zshzbh commented Feb 20, 2025

It depends on what you are trying to achieve.

S3 honors HTTP byte-range requests, so any file can be downloaded in fixed-size pieces using multiple GETs. You can also use the PartNumber parameter.

If this is referring to individual TCP packet sizes, then no, you cannot control this.
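A sketch of the byte-range approach: the helper below (computeRanges is a name invented here, not an SDK API) splits an object of known size into fixed-size Range headers, each of which would be fetched with its own GetObjectCommand. This assumes you already know the object's ContentLength (e.g. from a prior HeadObject call).

```javascript
// Hypothetical helper: split a known object size into HTTP Range header
// values of at most partSize bytes each (byte ranges are inclusive on
// both ends, hence the -1 on the end offset).
function computeRanges(totalSize, partSize) {
  const ranges = [];
  for (let start = 0; start < totalSize; start += partSize) {
    const end = Math.min(start + partSize, totalSize) - 1;
    ranges.push(`bytes=${start}-${end}`);
  }
  return ranges;
}

console.log(computeRanges(70 * 1024, 32 * 1024));
// Each entry becomes its own ranged GET, e.g.:
//   await s3Client.send(new GetObjectCommand({ Bucket, Key, Range: range }));
```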

@zshzbh zshzbh added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 20, 2025
@zshzbh
Contributor

zshzbh commented Feb 20, 2025

I can share the code if you want a consistent chunk size. To get consistent 32 KB chunks, you'll need to use a Transform stream instead of PassThrough:

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { Transform } from 'stream';

const s3Client = new S3Client({
  region: "us-east-1",
});

const params = {
  Bucket: "test-s3-XXXX-mm",
  Key: "large-file.txt",
};

const command = new GetObjectCommand(params);
const response = await s3Client.send(command);
const stream = response.Body;

// Create a custom transform stream that buffers data into 32KB chunks
const chunkSize = 32 * 1024; // 32KB
let buffer = Buffer.alloc(0);

const chunker = new Transform({
    transform(chunk, encoding, callback) {
        // Add new chunk to our buffer
        buffer = Buffer.concat([buffer, chunk]);

        // While we have enough data for a full chunk
        while (buffer.length >= chunkSize) {
            // Push a chunk of exactly 32KB
            this.push(buffer.slice(0, chunkSize));
            buffer = buffer.slice(chunkSize);
        }
        callback();
    },
    // Push any remaining data when the stream ends
    flush(callback) {
        if (buffer.length > 0) {
            this.push(buffer);
        }
        callback();
    }
});

const customStream = stream.pipe(chunker);

customStream.on("data", (chunk) => {
    console.log(`Chunk size: ${chunk.length}`);
});

@zshzbh zshzbh added p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Feb 20, 2025
@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 21, 2025
@zshzbh zshzbh added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 25, 2025