
Override highWaterMark didn't work #94

Open
H4ad opened this issue Jan 2, 2024 · 5 comments
Comments


H4ad commented Jan 2, 2024

The following code will send the data BEFORE 6s:

const HANDLER_HIGHWATERMARK = Symbol.for(
  'aws.lambda.runtime.handler.streaming.highWaterMark',
);

const handler = async (event, responseStream, context) => {
        responseStream.setContentType('text/html; charset=utf-8');

        responseStream.write("Thinking...");
        responseStream.write("a".repeat(64 * 1024));

        setTimeout(function() {
            responseStream.write("b".repeat(64 * 1024));
        }, 3000);

        setTimeout(function() {
            responseStream.write("After 6s");
            responseStream.end();
        }, 6000);
    };

handler[HANDLER_HIGHWATERMARK] = 512;

exports.handler = awslambda.streamifyResponse(handler, {
        highWaterMark: 512
    }
);

The following code will only send the data AFTER 6s:

const HANDLER_HIGHWATERMARK = Symbol.for(
  'aws.lambda.runtime.handler.streaming.highWaterMark',
);

const handler = async (event, responseStream, context) => {
        responseStream.setContentType('text/html; charset=utf-8');

        responseStream.write("Thinking...");
        responseStream.write("a".repeat(1 * 1024));

        setTimeout(function() {
            responseStream.write("b".repeat(1 * 1024));
        }, 3000);

        setTimeout(function() {
            responseStream.write("After 6s");
            responseStream.end();
        }, 6000);
    };

handler[HANDLER_HIGHWATERMARK] = 512;

exports.handler = awslambda.streamifyResponse(handler, {
        highWaterMark: 512
    }
);

From what I tested, the default highWaterMark is 64 KiB, and there is no way to override it to a lower value; I tried two different approaches, as shown above.

This causes odd behavior: I can't send a small piece of text and then wait 6s before sending more data; everything is delivered as a single chunk only after the 6s elapse.

This issue was initially discovered at H4ad/serverless-adapter#166

@conico974

From what I've found, the threshold seems to be around 16 KiB for reliable streaming, but even that is fairly inconsistent.
Calling responseStream.cork() before a write and responseStream.uncork() after it also seems to help a little.

What I've noticed is that even for the same data, with small chunks the stream sometimes buffers and sends nothing, and sometimes it works fine. I assume this happens when the data size is close to the highWaterMark, or when the receiving server isn't ready (it doesn't seem to happen with large chunks).

Potentially the issue might be on the other end of the stream. The runtime creates a request to an external server:

req = options.httpOptions.http.request(

and it might be on this server that the buffering happens. That would also explain why highWaterMark has no effect locally even though it appears to be applied.


stackia commented Feb 22, 2024

I found the first two writes are always buffered:

const handler = awslambda.streamifyResponse(async (event, responseStream, context)=>{
    for (let i = 1; i <= 10; i++) {
        responseStream.write(i.toString());
        await new Promise((r) => setTimeout(r, 1000));
    }
    responseStream.end();
});

export { handler };
(Attachment: Screen.Recording.2024-02-22.at.2.48.25.PM.mov)


stackia commented Feb 22, 2024

Found a tricky workaround:

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms))
}

const handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
    // Writing a 1-byte chunk, yielding before and after, pushes the
    // previously buffered data out.
    async function flush() {
        await sleep(0);
        responseStream.write(" ");
        await sleep(0);
    }

    await flush();

    for (const letter of ["a", "b", "c", "d", "e", "f"]) {
        responseStream.write(letter.repeat(1024));
        await flush();
        if (letter !== "f") await sleep(1000);
    }

    responseStream.end();
});

export { handler };

It looks like every write will flush the previous buffer (starting from the second write).

@conico974

I've delved into some experimentation with this and found some very interesting things.

The responseStream is only a restricted version of an http.request, and streaming can be achieved without awslambda.streamifyResponse by creating the request ourselves (this is what I did to rule out some potential issues).

Moreover, it seems that neither pipe nor pipeline is affected by the minimum-size constraint. The example Lambda below works fine even with small chunks of data. I think there is still a minimum size, but it's far smaller than with write.

The last thing I discovered is that using awslambda.HttpResponseStream.from can also trigger other buffering issues that cause the stream to wait for the end. This doesn't occur consistently, only about 15-20% of the time.
I found a workaround for this: if you leave enough time (roughly 25 ms has proven effective most of the time, but still not 100%) between the last write from awslambda.HttpResponseStream.from (the req.write(new Uint8Array(8))) and your first write, the stream seems to work fine.

My guess as to why this happens: if your first write and the req.write(new Uint8Array(8)) happen to land in the same chunk, the underlying implementation thinks it has not received the end marker and waits for the full response before returning anything.
It's possible that the underlying implementation expects a chunk to end with new Uint8Array(8) to signify the headers' termination and thus the start of the stream.
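To make that theory concrete, here is a sketch of the framing being described. The prelude/delimiter layout is an inference from the observed behavior, not a documented format, and splitPrelude is an invented illustration:

```javascript
// Hypothesized framing: JSON prelude, then eight NUL bytes, then the body.
// A parser that hasn't yet seen the full 8-byte marker would keep buffering,
// which would match the "waits for the full response" symptom.
const DELIMITER = Buffer.alloc(8); // eight 0x00 bytes

function splitPrelude(buf) {
  const at = buf.indexOf(DELIMITER);
  if (at === -1) return null; // marker not complete yet: keep buffering
  return {
    prelude: JSON.parse(buf.subarray(0, at).toString()),
    body: buf.subarray(at + DELIMITER.length),
  };
}

const frame = Buffer.concat([
  Buffer.from(JSON.stringify({ statusCode: 200, headers: { "content-type": "text/html" } })),
  DELIMITER,
  Buffer.from("<html>hi</html>"),
]);
const parsed = splitPrelude(frame);
console.log(parsed.prelude.statusCode, parsed.body.toString());
```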

For those interested, here is the Lambda I used:

import { request } from "http"
import { setDefaultHighWaterMark, Readable } from "stream"

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms))
}

// Changing this doesn't seem to have any effect
setDefaultHighWaterMark(false, 16384)

const handler = async (event, context) => {

    const req = request(
        {
          host: "127.0.0.1",
          port: 9001,
          path: `/2018-06-01/runtime/invocation/${context.awsRequestId}/response`,
          method: 'POST', 
          headers: {
              "Lambda-Runtime-Function-Response-Mode": "streaming",
              "transfer-encoding": "chunked",
              "Content-Type": "application/vnd.awslambda.http-integration-response"
          },
        },
        res => {
            var result = ''
            res.on('data', function (chunk) {
                result += chunk;
            });
        
            res.on('end', function () {
                console.log(result);
            });
        }
    ) 
    let _resolve
    const promise = new Promise((resolve) => {
        _resolve = resolve
    })
    const size = 64
    console.log(req.writableHighWatermark)
    
    // This is equivalent to running awslambda.HttpResponseStream.from
    req.write(
        JSON.stringify({
            statusCode: 200,
            cookies: [],
            headers: {
                "content-type": "text/html",
                "x-test": "1"
            }
        }))
    req.write(new Uint8Array(8))
    await sleep(10)
    

    async function * generate() {
        yield "<html>"
        await sleep(1)
        yield "<p>initial </p>"
        await sleep(1500)
        // yield "<html>"
        for(let i = 0; i <10; i++){
            yield `<p>${i}</p>`
            await sleep(200)
        }
        yield "</html>"
    }
    const dataStream = Readable.from(generate())
    
    
    dataStream.pipe(req)
    req.on('finish', () => _resolve()) // 'finish' fires once pipe() has ended the request
    await promise
    return {}
};

export { handler };

conico974 added a commit to opennextjs/opennextjs-aws that referenced this issue Mar 12, 2024

stackia commented Mar 13, 2024

My guess is that the gateway between awslambda.HttpResponseStream and the actual user request is the root cause of all these buffering issues.

Anything written to awslambda.HttpResponseStream is first streamed to the gateway, and the gateway decides when to send its internal buffer to the user. For example, the gateway might flush its buffer every time it receives new data after some idle time.
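That hypothesis can be sketched as a toy model (pure speculation; IdleFlushGateway and its timings are invented for illustration):

```javascript
// Toy model of the guessed gateway behavior: chunks accumulate, and the
// buffer is only forwarded after a short idle period with no new data.
class IdleFlushGateway {
  constructor(forward, idleMs = 25) {
    this.forward = forward;
    this.idleMs = idleMs;
    this.chunks = [];
    this.timer = null;
  }
  write(chunk) {
    this.chunks.push(chunk);
    clearTimeout(this.timer); // fresh data resets the idle timer
    this.timer = setTimeout(() => {
      this.forward(this.chunks.join("")); // flush everything buffered so far
      this.chunks = [];
    }, this.idleMs);
  }
}
```

Under this model, two writes in quick succession reach the user as one chunk, which is consistent with the buffering observed throughout this thread.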

conico974 added a commit to opennextjs/opennextjs-aws that referenced this issue May 3, 2024