Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect validation regexes in serialize #165

Closed
bewinsnw opened this issue Jun 17, 2024 · 4 comments · Fixed by #167
Closed

incorrect validation regexes in serialize #165

bewinsnw opened this issue Jun 17, 2024 · 4 comments · Fixed by #167

Comments

@bewinsnw
Copy link
Contributor

bewinsnw commented Jun 17, 2024

The code here:
https://github.com/jshttp/cookie/blob/master/index.js#L119-L125

... tests cookie names and values against the field-content regex from RFC7230. Unfortunately, that's the wrong pattern; that's the regex for the whole header value, not the individual parts. The cookie name and value per https://datatracker.ietf.org/doc/html/rfc6265#section-4.1.1 use token from rfc7230 for the name, and a specific
pattern in rfc6265 for the value. While the default encode function will keep values within the required pattern, it can be overridden, so the validation can fail on user-supplied encoders; the name validation is always wrong.

It's pretty easy to demonstrate:

cookie = require('cookie');

function identity(s) {
  return s;
}

tests = {
  goodname: 'goodvalue',
  'bad name': 'bad value',
  'bad=name': 'good=value',
  'bad;name': 'bad;value',
  'bad☹name': 'bad;value',
  name1: 'bad☹value',
  name2: 'bad"value',
  name3: '"goodvalue"',
};
// https://datatracker.ietf.org/doc/html/rfc6265#section-4.1.1 refers
// to https://datatracker.ietf.org/doc/html/rfc2616#section-2.2
// which is obsoleted and replaced by the token definition in
// https://www.rfc-editor.org/rfc/rfc7230#appendix-B
tokenRegex = /^[!#$%&'*+.^_`|~0-9A-Za-z-]+$/;
// https://datatracker.ietf.org/doc/html/rfc6265#section-4.1.1
cookieValueRegex =
  /^("?)[\u0021\u0023-\u002B\u002D-\u003A\u003C-\u005B\u005D-\u007E]*\1$/;

for (const [name, value] of Object.entries(tests)) {
  var validName = tokenRegex.test(name);
  var validValue = cookieValueRegex.test(value);
  var serialized;
  try {
    serialized = cookie.serialize(name, value, { encode: identity });
  } catch (error) {
    serialized = error;
  }
  console.log(
    `name: ${name} (${validName}) value: ${value} (${validValue}) serialized: ${serialized}`
  );
}

(demonstrating both good/bad values where the true regexes and cookie agree, and a handful of the many values where cookie would incorrectly say it's valid)

Discovered looking for whatever was leaking whitespace into our cookies; I'm not certain cookie did it, but seeing that the validation allowed whitespace jumped out at me.

@wesleytodd
Copy link
Member

Hey, thanks for the report! I have not had a chance to dig deeply into this report, but with my limited time I wonder if you were willing to submit a PR for this? It feels like it would be easier to make sure we are all on the same page of what the report means for changes in the behavior of this relatively stable lib. If not I can loop back around sometime next week and read the relevant spec sections to make sure I follow the issue.

@bewinsnw
Copy link
Contributor Author

Sure. Coincidentally we later had a production incident that very definitely did involve this bug so I'm happy to contribute

@wesleytodd
Copy link
Member

Cool, well I will look for that PR and we can discuss there. When you open that please close this so we don't have the same conversation going on in two places.

bewinsnw added a commit to bewinsnw/cookie that referenced this issue Jul 17, 2024
fixes jshttp#165. Previously, validation on serialization used the field-content
regexp from RFC7230 to perform all checks. However, that is the regexp
for the cookie header as a whole, but each individual part of a cookie
has a tighter restriction.

In the bug report jshttp#165 I demonstrated that whitespace in names and values
was invalid but allowed by the field-content pattern. In this code I
started by adding tests for those two cases, then added the expressions
derived from the RFC; the relevant portions of the RFC have been inlined
as comments.

Removing the field-content regexp unearthed that as well as those two
it was being used to validate domain names and paths. Paths are fairly
unrestricted (being any ascii character except control characters and
semicolon) but the code was wrong here too - it allowed all 8-bit characters
not just 7-bit characters.

Domain names have a well recognised pattern, there are only a couple of
gotchas: RFC6265 explicitly says RFC1123 applies, so the leading character
of a label is allowed to be a digit; and while I wondered for a moment
if this should allow things like [::1] the domain matching section of
https://datatracker.ietf.org/doc/html/rfc6265#section-5.1.3
teaches that domain values are _not_ ip addresses.
@bewinsnw
Copy link
Contributor Author

Closing per discussion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants