Bug: `renderToPipeableStream()` emit mysterious mojibake whitespace chars in the result #24985

tetsuharuohzeki · 2022-07-25T13:02:52Z

renderToPipeableStream() emit mojibake whitespace chars in its result.
This would sometimes breaks the final generated html.

React version: 18.2.0

Steps To Reproduce

Run this test case with jest.

import { PassThrough } from 'node:stream';
import { renderToPipeableStream } from 'react-dom/server';

const element = (
    <html lang="ja">
        <head>
            <meta charSet="utf-8" />
            <title>吾輩は猫である</title>
        </head>
        <body>
            <p>
                どこで生れたかとんと見当がつかぬ。何でも薄暗いじめじめした所でニャーニャー泣いていた事だけは記憶している。吾輩はここで始めて人間というものを見た。しかもあとで聞くとそれは書生という人間中で一番獰悪な種族であったそうだ。この書生というのは時々我々を捕えて煮て食うという話である。しかしその当時は何という考もなかったから別段恐しいとも思わなかった。ただ彼の掌に載せられてスーと持ち上げられた時何だかフワフワした感じがあったばかりである。掌の上で少し落ちついて書生の顔を見たのがいわゆる人間というものの見始であろう。この時妙なものだと思った感じが今でも残っている。第一毛をもって装飾されべきはずの顔がつるつるしてまるで薬缶だ。その後猫にもだいぶ逢ったがこんな片輪には一度も出会わした事がない。のみならず顔の真中があまりに突起している。そうしてその穴の中から時々ぷうぷうと煙を吹く。どうも咽せぽくて実に弱った。これが人間の飲む煙草というものである事はようやくこの頃知った。
            </p>
            <p>
                この書生の掌の裏でしばらくはよい心持に坐っておったが、しばらくすると非常な速力で運転し始めた。書生が動くのか自分だけが動くのか分らないが無暗に眼が廻る。胸が悪くなる。到底助からないと思っていると、どさりと音がして眼から火が出た。それまでは記憶しているがあとは何の事やらいくら考え出そうとしても分らない。
            </p>
            <p>
                ふと気が付いて見ると書生はいない。たくさんおった兄弟が一疋も見えぬ。肝心の母親さえ姿を隠してしまった。その上今までの所とは違って無暗に明るい。眼を明いていられぬくらいだ。はてな何でも容子がおかしいと、のそのそ這い出して見ると非常に痛い。吾輩は藁の上から急に笹原の中へ棄てられたのである。
            </p>
        </body>
    </html>
);

test('renderToPipeableStream', async () => {
    const onStreamReady = new Promise((resolve, reject) => {
        const stream = renderToPipeableStream(element, {
            onAllReady() {
                resolve(stream);
            },
            onError(err) {
                reject(err);
            },
        });
    });

    const renderingStream = await onStreamReady;

    const sink = new PassThrough();
    const connected = renderingStream.pipe(sink);

    const result = [];
    for await (const chunk of connected) {
        if (typeof chunk === 'string') {
            result.push(chunk);
        } else {
            const txt = String(chunk);
            result.push(txt);
        }
    }

    const actual = result.join('');
    expect(actual).toMatchSnapshot();
});

The current behavior

The above testcase generate this result:

"<!DOCTYPE html><html lang="ja"><head><meta charSet="utf-8"/><title>吾輩は猫である</title></head><body><p>どこで生れたかとんと見当がつかぬ。何でも薄暗いじめじめした所でニャーニャー泣いていた事だけは記憶している。吾輩はここで始めて人間というものを見た。しかもあとで聞くとそれは書生という人間中で一番獰悪な種族であったそうだ。この書生というのは時々我々を捕えて煮て食うという話である。しかしその当時は何という考もなかったから別段恐しいとも思わなかった。ただ彼の掌に載せられてスーと持ち上げられた時何だかフワフワした感じがあったばかりである。掌の上で少し落ちついて書生の顔を見たのがいわゆる人間というものの見始であろう。この時妙なものだと思った感じが今でも残っている。第一毛をもって装飾されべきはずの顔がつるつるしてまるで薬缶だ。その後猫にもだいぶ逢ったがこんな片輪には一度も出会わした事がない。のみならず顔の真中があまりに突起している。そうしてその穴の中から時々ぷうぷうと煙を吹く。どうも咽せぽくて実に弱った。これが人間の飲む煙草というものである事はようやくこの頃知った。</p><p>この書生の掌の裏でしばらくはよい心持に坐っておったが、しばらくすると非常な速力で運転し始めた。書生が動くのか自分だけが動くのか分らないが無暗に眼が廻る。胸が悪くなる。到底助からないと思っていると、どさりと音がして眼から火が出た。それまでは記憶しているがあとは何の事やらいくら考え出そうとしても分らない。</p><p>ふと気が付いて見ると書生はいない。たくさんおった兄弟が一疋も見えぬ。肝心の母親さえ姿を隠してしまった。その上今まで��の所とは違って無暗に明るい。眼を明いていられぬくらいだ。はてな何でも容子がおかしいと、のそのそ這い出して見ると非常に痛い。吾輩は藁の上から急に笹原の中へ棄てられたのである。</p></body></html>"

The expected behavior

The current behavior's result contains その上今まで��の所とは違って無暗に明るい.
But we expect this sentence should be その上今までの所とは違って無暗に明るい.

So renderToPipeableStream() should not emit mysterious whitespace chars.

The text was updated successfully, but these errors were encountered:

tetsuharuohzeki · 2022-07-25T13:03:07Z

I confirmed:

This bug is not reproducible with renderToStaticNodeStream().
This bug is not reproducible in v18.0.0.
- But this bug is reproducble from v18.1.0.

So I guess some regressions happens in this range.
v18.0.0...v18.1.0

tetsuharuohzeki · 2022-07-25T13:04:31Z

This bug may be related to #24592

sophiebits · 2023-02-24T06:39:23Z

Thanks for the clear repro case!

We encode strings 2048 UTF-8 bytes at a time. If the string we are encoding crosses to the next chunk but the current chunk doesn't fit an integral number of characters, we need to make sure not to send the whole buffer, only the bytes that are actually meaningful. Fixes facebook#24985. I was able to verify that this fixes the repro shared in the issue (be careful when testing because the null bytes do not show when printed to my terminal, at least). However, I don't see a clear way to add a test for this that will be resilient to small changes in how we encode the markup (since it depends on where specific multibyte characters fall against the 2048-byte boundaries).

sophiebits · 2023-02-24T06:45:55Z

(I am guessing #24592 is not related, since this issue seems to be with non-ASCII characters and I don't believe that one is.)

We encode strings 2048 UTF-8 bytes at a time. If the string we are encoding crosses to the next chunk but the current chunk doesn't fit an integral number of characters, we need to make sure not to send the whole buffer, only the bytes that are actually meaningful. Fixes facebook#24985. I was able to verify that this fixes the repro shared in the issue (be careful when testing because the null bytes do not show when printed to my terminal, at least). However, I don't see a clear way to add a test for this that will be resilient to small changes in how we encode the markup (since it depends on where specific multibyte characters fall against the 2048-byte boundaries).

We encode strings 2048 UTF-8 bytes at a time. If the string we are encoding crosses to the next chunk but the current chunk doesn't fit an integral number of characters, we need to make sure not to send the whole buffer, only the bytes that are actually meaningful. Fixes #24985. I was able to verify that this fixes the repro shared in the issue (be careful when testing because the null bytes do not show when printed to my terminal, at least). However, I don't see a clear way to add a test for this that will be resilient to small changes in how we encode the markup (since it depends on where specific multibyte characters fall against the 2048-byte boundaries).

We encode strings 2048 UTF-8 bytes at a time. If the string we are encoding crosses to the next chunk but the current chunk doesn't fit an integral number of characters, we need to make sure not to send the whole buffer, only the bytes that are actually meaningful. Fixes #24985. I was able to verify that this fixes the repro shared in the issue (be careful when testing because the null bytes do not show when printed to my terminal, at least). However, I don't see a clear way to add a test for this that will be resilient to small changes in how we encode the markup (since it depends on where specific multibyte characters fall against the 2048-byte boundaries). DiffTrain build for [96cdeaf](96cdeaf)

tetsuharuohzeki · 2023-02-25T14:00:44Z

@sophiebits Thank you!

hanayashiki · 2024-03-11T04:04:11Z

For those who have this problem in remix, I can provide a fix in entry.server.tsx:

For quick code:

function handleBrowserRequest(
  request: Request,
  responseStatusCode: number,
  responseHeaders: Headers,
  remixContext: EntryContext,
) {
  return new Promise((resolve, reject) => {
    let shellRendered = false;
    const { pipe, abort } = renderToPipeableStream(
      <RemixServer
        context={remixContext}
        url={request.url}
        abortDelay={ABORT_DELAY}
      />,
      {
        onShellReady() {
          shellRendered = true;
          
          /**
           * https://github.com/facebook/react/pull/26228
           * Null bytes are possible at the end of the chunk, if we are using non-ascii characters.
           * The fix is not released to 18.2.0 so all we can do is transforming the chunks.
           */
          const body = new Transform({
            transform(chunk: Buffer, encoding, callback) {
              let endingZeroes =
                Number(chunk.at(-1) === 0) + Number(chunk.at(-2) === 0);

              callback(
                null,
                endingZeroes > 0
                  ? chunk.subarray(0, chunk.length - endingZeroes)
                  : chunk,
              );
            },
          });

          responseHeaders.set("Content-Type", "text/html");

          resolve(
            new Response(createReadableStreamFromReadable(body), {
              headers: responseHeaders,
              status: responseStatusCode,
            }),
          );

          pipe(body);
        },
        onShellError(error: unknown) {
          reject(error);
        },
        onError(error: unknown) {
          responseStatusCode = 500;
          // Log streaming rendering errors from inside the shell.  Don't log
          // errors encountered during initial shell rendering since they'll
          // reject and get logged in handleDocumentRequest.
          if (shellRendered) {
            console.error(error);
          }
        },
      },
    );

    setTimeout(abort, ABORT_DELAY);
  });
}

Details:

To fix the bug while not hurt the performance, we only need to check if the last two bytes are zeroes.

Because utf-8 is 3-byte at most and the bug is only triggered if they are filling a fixed length array which doesn't fit a 3-byte character at the end. For example, given they are encoding into a 8-byte array, AAの will be [0x41 0x41 0x41 0xe3 0x81 0xae 0x00 0x00], leaving the last two bytes empty. They will pass the entire 8-byte array downstream to the response instead of only the valid bytes. So we need to trim at most 2 bytes for this specific bug.

tats-u · 2024-10-20T10:31:23Z

We have to wait for v19 or patch react-dom using patch-package, pnpm patch, or yarn patch by yourself.

pnpm example: https://github.com/pnpm/pnpm.io/pull/588/files

tetsuharuohzeki added the Status: Unconfirmed A potential issue that we haven't yet confirmed as a bug label Jul 25, 2022

sophiebits mentioned this issue Feb 24, 2023

[Fizz Node] Fix null bytes written at text chunk boundaries #26228

Merged

sophiebits closed this as completed in #26228 Feb 24, 2023

tats-u mentioned this issue Oct 15, 2024

Bug: React 18 emit null chars in renderToPipeableStream #31134

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: `renderToPipeableStream()` emit mysterious mojibake whitespace chars in the result #24985

Bug: `renderToPipeableStream()` emit mysterious mojibake whitespace chars in the result #24985

tetsuharuohzeki commented Jul 25, 2022 •

edited

Loading

tetsuharuohzeki commented Jul 25, 2022

tetsuharuohzeki commented Jul 25, 2022

sophiebits commented Feb 24, 2023

sophiebits commented Feb 24, 2023

tetsuharuohzeki commented Feb 25, 2023

hanayashiki commented Mar 11, 2024

tats-u commented Oct 20, 2024 •

edited

Loading

Bug: renderToPipeableStream() emit mysterious mojibake whitespace chars in the result #24985

Bug: renderToPipeableStream() emit mysterious mojibake whitespace chars in the result #24985

Comments

tetsuharuohzeki commented Jul 25, 2022 • edited Loading

Steps To Reproduce

The current behavior

The expected behavior

tetsuharuohzeki commented Jul 25, 2022

tetsuharuohzeki commented Jul 25, 2022

sophiebits commented Feb 24, 2023

sophiebits commented Feb 24, 2023

tetsuharuohzeki commented Feb 25, 2023

hanayashiki commented Mar 11, 2024

tats-u commented Oct 20, 2024 • edited Loading

Bug: `renderToPipeableStream()` emit mysterious mojibake whitespace chars in the result #24985

Bug: `renderToPipeableStream()` emit mysterious mojibake whitespace chars in the result #24985

tetsuharuohzeki commented Jul 25, 2022 •

edited

Loading

tats-u commented Oct 20, 2024 •

edited

Loading