fix(chunker): Fix chunk length calculation for unicode characters #726

SyntheticGoop · 2024-01-25T04:39:28Z

What kind of change does this PR introduce?

Bug fix

What is the current behavior?

Cookies containing URI-escaped characters can exceed the valid cookie length of 4096, so the browser rejects some of the split cookie chunks.

What is the new behavior?

After URI encoding, each cookie chunk now fits within the specified chunk size.

Additional context

This happens because we chunk based on string length before escaping. MAX_CHUNK_SIZE leaves some buffer for longer Unicode characters, but that buffer is not always enough.

For example, ﷽ has a length of 1 in JavaScript, and /(.{1})/g.exec("﷽") captures it whole as $1 = ﷽, but it escapes to %EF%B7%BD, which is 9 characters long.
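The mismatch can be reproduced directly. This is a minimal sketch; the variable names are illustrative and do not come from the PR.

```typescript
// U+FDFD (﷽) is a single UTF-16 code unit but three bytes in UTF-8.
const ligature = "\uFDFD";

// Length as seen by String.prototype.length (and by `.` in a regex).
const rawLength = ligature.length;

// After URI escaping, each UTF-8 byte becomes a "%XX" triplet.
const encodedLigature = encodeURIComponent(ligature);
const encodedLength = encodedLigature.length;

console.log(rawLength, encodedLigature, encodedLength); // 1 %EF%B7%BD 9
```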

The current approach of chunking with a buffer for some Unicode characters will never be "correct" unless the buffer accounts for every character being a maximum-size Unicode character.
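To see what such a worst-case buffer would cost, the arithmetic can be sketched as follows. The `limit` of 4096 is taken from the cookie length mentioned earlier and is only an assumption here; it is not necessarily the value of MAX_CHUNK_SIZE.

```typescript
// Hypothetical limit, taken from the 4096 figure mentioned above.
const limit = 4096;

// Worst case per UTF-16 code unit: a BMP character that needs three UTF-8
// bytes escapes to three "%XX" triplets, i.e. 9 characters.
const worstCasePerCodeUnit = encodeURIComponent("\uFDFD").length;

// A character outside the BMP is 2 code units and 4 UTF-8 bytes
// (12 escaped characters), so it costs 6 per code unit.
const astralPerCodeUnit = encodeURIComponent("\uD83D\uDE00").length / 2;

// Largest pre-encoding chunk length that can never exceed the limit.
const safeRawChunkLength = Math.floor(limit / worstCasePerCodeUnit);

console.log(worstCasePerCodeUnit, astralPerCodeUnit, safeRawChunkLength); // 9 6 455
```

Under these assumptions, a pre-encoding split would have to cap chunks at 455 code units, which wastes most of each cookie for plain ASCII values.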

I've reimplemented the createChunks function to do the following:

Split the value into chunks after encoding it.
Ensure that chunks are split between Unicode characters instead of inside one.
Avoid regex, as it is hard to determine a "correct" Unicode character boundary with a regex once the string is escaped.
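The steps above can be sketched in a compact form. This is an illustrative helper under assumed names (`splitAfterEncoding`, `size`), not the PR's createChunks, and it returns only the decoded chunk values.

```typescript
// Hypothetical helper: splits `value` so that each returned part, once
// passed through encodeURIComponent, is at most `size` characters long.
function splitAfterEncoding(value: string, size: number): string[] {
  let rest = encodeURIComponent(value);
  const parts: string[] = [];

  while (rest.length > 0) {
    let head = rest.slice(0, size);

    // Drop an escape sequence that was cut off at the end ("%" or "%X").
    const lastPercent = head.lastIndexOf("%");
    if (lastPercent >= 0 && lastPercent > head.length - 3) {
      head = head.slice(0, lastPercent);
    }

    // Back off one "%XX" triplet at a time until the head decodes, so a
    // multi-byte character is never split across two parts.
    let decoded: string | undefined;
    while (head.length > 0) {
      try {
        decoded = decodeURIComponent(head);
        break;
      } catch {
        head = head.slice(0, -3);
      }
    }

    // Guard against a size too small to hold even one encoded character.
    if (decoded === undefined) {
      throw new RangeError("size is too small for one encoded character");
    }

    parts.push(decoded);
    rest = rest.slice(head.length);
  }

  return parts;
}

const original = "a\uFDFDb\uFDFD";
const parts12 = splitAfterEncoding(original, 12); // cut lands on a bare "%"
const parts14 = splitAfterEncoding(original, 14); // cut lands inside ﷽
console.log(parts12, parts14);
```

Both calls back off to the same boundary, so joining the parts gives the original string and no part exceeds the requested size once encoded.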

There are currently other related solutions and issues:

PRs

Issues

0xfacad3

createChunks() in ssr/src/utils/chunker.ts, as it appears in the diff of this pull request, should be updated as follows to satisfy TypeScript constraints:

export function createChunks(key: string, value: string, chunkSize?: number): Chunk[] {
	const resolvedChunkSize = chunkSize ?? MAX_CHUNK_SIZE;

	let encodedValue = encodeURIComponent(value);

	if (encodedValue.length <= resolvedChunkSize) {
		return [{ name: key, value }];
	}

	const chunks: Chunk[] = [];

	while (encodedValue.length > 0) {
		let encodedChunkHead = encodedValue.slice(0, resolvedChunkSize);

		const lastEscapePos = encodedChunkHead.lastIndexOf('%');

		// Check if the last escaped character is truncated.
		if (lastEscapePos > resolvedChunkSize - 3) {
			// If so, reslice the string to exclude the whole escape sequence.
			// We only reduce the size of the string as the chunk must
			// be smaller than the chunk size.
			encodedChunkHead = encodedChunkHead.slice(0, lastEscapePos);
		}

		let valueHead;

		// Check if the chunk was split along a valid unicode boundary.
		while (encodedChunkHead.length > 0) {
			try {
				// Try to decode the chunk back and see if it is valid.
				// Stop when the chunk is valid.
				valueHead = decodeURIComponent(encodedChunkHead);
				break;
			} catch (error) {
				if (
					error instanceof URIError &&
					encodedChunkHead.at(-3) === '%' &&
					encodedChunkHead.length > 3
				) {
					encodedChunkHead = encodedChunkHead.slice(0, encodedChunkHead.length - 3);
				} else {
					throw error;
				}
			}
		}

		chunks.push({
			name: `${key}.${chunks.length}`,
			value: valueHead as string,
		});
		encodedValue = encodedValue.slice(encodedChunkHead.length);
	}

	return chunks.map((value, i) => ({
		name: `${key}.${i}`,
		value: value.value as string,
	}));
}

0xfacad3 · 2024-01-26T20:57:21Z

@SyntheticGoop
Hey! Your pull request was exactly what I wanted. However, when I forked and used it, a problem occurred when deploying to Vercel. The function that makes it work is shown above. The problem was a simple TypeScript constraint.

SyntheticGoop · 2024-01-27T04:15:38Z

I've updated the types so that the package builds on my system with pnpm build.

kangmingtay · 2024-01-30T09:48:24Z

packages/ssr/src/utils/chunker.ts

+		const lastEscapePos = encodedChunkHead.lastIndexOf('%');
+
+		// Check if the last escaped character is truncated.
+		if (lastEscapePos > resolvedChunkSize - 3) {


Just to clarify: we subtract 3 here because an escaped character takes up 3 characters in the string (a % followed by 2 hexadecimal characters), right?

Yes, that's exactly it.
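A small illustration of that condition, assuming a hypothetical helper `endsWithTruncatedEscape` that mirrors the check in the diff:

```typescript
// Hypothetical helper mirroring the condition in the diff above: after
// slicing to `size`, a "%" in one of the last two positions cannot be
// followed by both of its hexadecimal digits.
function endsWithTruncatedEscape(encoded: string, size: number): boolean {
  const head = encoded.slice(0, size);
  return head.lastIndexOf("%") > size - 3;
}

const sample = encodeURIComponent("ab\uFDFD"); // "ab%EF%B7%BD", 11 characters

console.log(endsWithTruncatedEscape(sample, 11)); // ends with "%BD": false
console.log(endsWithTruncatedEscape(sample, 10)); // ends with "%B": true
console.log(endsWithTruncatedEscape(sample, 9)); // ends with "%": true
console.log(endsWithTruncatedEscape(sample, 8)); // ends with "%B7": false
```

The last case ends on a complete escape but still cuts ﷽ in the middle of its three bytes, which is why the decode-and-retry loop follows this check in createChunks.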

fix(chunker): Fix chunk length calculation for unicode characters

76ea9d2

SyntheticGoop force-pushed the main branch from 76c1d5b to 76ea9d2 Compare January 25, 2024 16:50

kangmingtay requested review from dijonmusters, hf, kangmingtay and a team and removed request for hf and kangmingtay January 26, 2024 09:34

0xfacad3 approved these changes Jan 26, 2024

sscr197 approved these changes Jan 26, 2024

0xfacad3 suggested changes Jan 26, 2024

SyntheticGoop added 2 commits January 27, 2024 12:06

fix: Update types

f5f66fc

fix: Update types

5251364

kangmingtay approved these changes Jan 30, 2024

kangmingtay merged commit 156b96e into supabase:main Jan 30, 2024

This was referenced Jan 31, 2024

Fixed chunker chunk length calculation #680

Closed

feat(ssr): base64-encode cookie value #701

Closed

kangmingtay added a commit that referenced this pull request Jan 31, 2024

chore: add missing changeset for #726

31ff7c4

kangmingtay mentioned this pull request Jan 31, 2024

chore: add missing changesets #736

Merged

kangmingtay added a commit that referenced this pull request Jan 31, 2024

chore: add missing changesets (#736)

18327fc

* chore: add missing changeset for #726 * chore: add missing changeset for #722

This was referenced Jan 31, 2024

Failed to set cookie when OAuth username includes non-ASCII characters #729

Closed

SSR: Chunked cookie is setting wrong if new cookie has less length #715

Closed

Auth token cookie chunk exceeds the size limit when using SSR setup #707

Closed

kangmingtay mentioned this pull request Feb 16, 2024

Large Auth Cookie Split Into 2 Causing Realtime Failures, etc supabase/supabase-js#963

Closed
