Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: Expose http.validateHeaderName/Value #33119

Closed
wants to merge 1 commit into from
Closed

http: Expose http.validateHeaderName/Value #33119

wants to merge 1 commit into from

Conversation

osher
Copy link
Contributor

@osher osher commented Apr 28, 2020

The use-case is for any framework that provides user mw a response replacement, that collects the desired response state, and sends it only on conclusion.

As such a framework, I'd want to validate the header names and values as soon as the user-code provides them.
This - to

  1. eliminate errors on response-send time,
  2. provide developer stack trace that contains the line that submits the offending values

Doing it this will allow to:

const { validateHeaderName, validateHeaderValue } = require('http');

and use them.

Checklist

@nodejs-github-bot nodejs-github-bot added the http Issues or PRs related to the http subsystem. label Apr 28, 2020
@osher osher marked this pull request as draft April 28, 2020 09:36
@osher
Copy link
Contributor Author

osher commented Apr 28, 2020

BTW - My current work-around is a hack, roughly in the spirit below.
I'd be more than happy to just use the validators as proposed...

(req, res, next) => {
   const headers = {}
   //TRICKY: have to do fool around the sybmol here because of a
   //      "[kOutHeaders]  === null" check in `setHeader`
   //        - even though cautious code initiates it when it's not found!!
   //      (classic case of a strict check that defeats the purpose)
   //      anyway - this is fragile in case the symbol form changes ... :stuck_out_tongue: 
   const headersProp = Object.getOwnPropertySymbols(res).find(
      s => s.toString().includes('kOutHeaders')
   );
   const { setHeader } = res
   //.... more stuff

   const reply = {
      //... more stuff
      setHeader: (k, v) => {
          try {
               setHeader.call({ [headersProp] : null }, k, v)
          } catch (e) {
               throw augmentWithFrameworkContextInfo(e)
          }
      },
      //... more stuff
   }

   controller[method]( req, reply).then(finishIt).catch(next)
}

@mscdex
Copy link
Contributor

mscdex commented Apr 28, 2020

-1 It's not clear to me why this is required. Besides, if the internals change where other public API methods end up using additional validation functions, those would need to be exported too and trying to keep up with that would be a nightmare.

If the end goal here is to mutate the error object thrown by setHeader(), there are less brittle solutions without needing to expose and keep on top of these internal functions:

  1. Use a custom response class via ServerResponse: MyServerResponseConstructor in a config object passed to http(s).createServer(). This avoids having to monkey-patch methods every time for every request/response.

  2. If the purpose of the custom reply is to restrict access to res methods/properties, you could do:

   const reply = {
      //... more stuff
      setHeader: (...args) => {
          try {
               setHeader.call(this, ...args)
          } catch (e) {
               throw augmentWithFrameworkContextInfo(e)
          }
      },
      //... more stuff
   }
   OutgoingMessage.call(reply);
  1. If restricting access to res methods/properties is not a goal, you can simply just monkey-patch setHeader directly (although option 1 would be better for this instead):
  res.setHeader = (...args) => {
    try {
      setHeader.call(this, ...args)
    } catch (e) {
      throw augmentWithFrameworkContextInfo(e)
    }
  }

@osher osher marked this pull request as ready for review April 28, 2020 11:51
@osher
Copy link
Contributor Author

osher commented Apr 28, 2020

@mscdex - thanks for your quick response.

My personal goal - as in, the specific use-case I'm currently at now - is indeed to mediate and restrict access to res, but I was not thinking only about myself but trying to think generically.
I want to be able to reuse the same validators. The fact that I'm building a mediator is a private case.
It could be a proofer, a code analysis tool, a server-code generator, a network analyzer, a VS/Atom editor-plugin, anything that might want to assist with http-headers.

It's a purely functional logic with no implications of state... 😉

Keeping this logic hidden basically results in code duplications in user codes...
Examples of code duplications from the open-source word - without searching a lot I saw this:
https://www.npmjs.com/package/http-headers-validation
That's only because of the obviousness of the relevance implied by the name.
I'm afraid we can be sure it's possible to find more.

Anyway - the example I brought here is very simplified, I have more going on (potentially proprietary).

I admit that what you proposed in your points will work - but IMHO - it is by far more cumbersome than just using the validator functions.
in cumbersome - I mean:

  • it needs more lines of code
  • reading them explains less the goal, and will require a lot of stories in comments and lore passed in code-reviews.

For the work-around - I consider replacing my workaround with option 2 just until such change can be accepted - however - that's basically replacing one specific monkey patch with another, both of which are workaround hacks tailored for this use-case and do not solve the fact that this logic should be reusable and it is not.
I'm considering option 2 because indeed it's less fragile - so as a start - sincere thanks here 😄

About the argument of needing future validations - do you feel that any future validation will not be applicable as part of validateHeaderName or validateHeaderValue?
Would there be anything else to validate except header names and values? If there were - wouldn't that be a breaking change?

I considered for a moment to add validateHeader(name, value), but strived to keep my changes minimal and not introduce new functions, because this my first PR here, and I'm yet to get tests running locally. Admittedly lame, but I'll do it in baby steps and let myself be led from here 😄

@osher
Copy link
Contributor Author

osher commented Apr 28, 2020

@mscdex - thanks for the lead!! - FYI - I got a simpler hack:

const dummy = new OutgoingMessage()
const reply {
   //...
   setHeader: (k, v) => {
       try {
          dummy.setHeader(k, v)
       //....
   },
   //.. 
}

(never thought this constructor will work without all the wireworks...)

Still monkeying, just shorter, and does not mount lotta dirt on an Object passed to user-code.
The fact that I have to run the OutgoingMessage constructor at all is ...:thinking: ...not pretty...

So I still believe this PR has value and no risk 😉

@mscdex
Copy link
Contributor

mscdex commented Apr 28, 2020

I still think we shouldn't expose them. If internals change and/or these functions change in one way or another (e.g. split into smaller functions or even removed entirely), we don't want to have to be maintaining these utility functions because they've now been exposed to userland.

I think the solutions outlined in this PR are fine for userland. If others happen to share this similar (and uncommon IMO) need, they can use one of those -- or develop their own validators.

@osher
Copy link
Contributor Author

osher commented Apr 28, 2020

would you please consider for a moment that whatever any additional validations may be needed in the future, they can be implemented as part of validateHeaderName and/or validateHeaderValue?

Let us also consider that if there will be such a need for future validations, it would also mean that the code duplications of this logic will need it too... 😉

As for the 3 ways - I have played with them, and they do not fit the use-case of mediation and restricting access to res.
Way 1 - I end with working on the response object - which is exactly what I want to prevent.
Way 2 - I end up with loads of attributes created by the constructor mounted on the object I pass the user
Way 3 - works on the actual response object, and assumes that all I want is to augment the error.
Way 4 - (the one I added) - requires me to create and keep a hidden dummy instance of an OutgoingMessage, which I have doubts about its footprint. Feels like abuse. I'd do it happily on tests, not on production code.

Hmmm.
You know what? Let's ask this. I have another direction I can propose.

If you would like not to export these validators, - here's an alternative that does not expose anything that is not already exposed.

How about the following change instead. Would that be considerable?

 OutgoingMessage.prototype.setHeader = function setHeader(name, value) {
   if (this._header) {
     throw new ERR_HTTP_HEADERS_SENT('set');
   }
   validateHeaderName(name);
   validateHeaderValue(name, value);

   let headers = this[kOutHeaders];
-  if (headers === null)
+  if (!headers)
     this[kOutHeaders] = headers = ObjectCreate(null);

   headers[name.toLowerCase()] = [name, value];
 };

it could also be

-  if (headers == null)
+  if (!headers)

or

-  if (headers === null)
+  if (!Boolean(headers))

whatever fits the lint rules...

Unfortunately, this line is what prevents me from a clear and efficient code, in the spirit of:

  const { setHeader } = OutgoingMessage.prototype.setHeader;
  ...
  try {
     setHeader.call({}, k, v)

because, you see, it ensists on having this[kOutgoingHeaders] === null in order to initiate that object...
If it did not ensist on null - it would have initiated it and move on happily

What do you think?

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems okay to me, but it should come with tests and documentation

@addaleax addaleax added the semver-minor PRs that contain new features and should be released in the next minor version. label Apr 29, 2020
@osher
Copy link
Contributor Author

osher commented Apr 30, 2020

Let`s start with tests.
well. here's a cultural shock... 🤔
I took a look at the tests directory... not what I expected to see.

As a basis - I can obviously show the validators are exposed and that they have the expected signature.
I can also show that they are usable as purely functional "static methods".
Last - I can show that they pass correct values and fail wrong values - but I'm not sure if I should - this should already be tested somewhere already, no?

Anyway - I'll appreciate a point in the right direction

So what I'm expected here is something in the spirit of /test/parallel/test-http-outgoing-header-validators.js?
does it matter if it`s one file per tested API or it can be one for both?

I'm also confused that there are no test titles... hardly comments...
Does that mean I should isolate a test-case per file? an API per file?

Thanks in advance for any guidance.
I've shied from getting involved for a long time, but here we are...

Edited

ok. I found the writing-tests.md guide and the BUILDING.md, I'm running make for the first time (in my life, ever)...

I mean to do it in one file for both validators if that's alright with ppl here
does /test/parallel/test-http-outgoing-header-validators.js sound correct?
or should it be /test/parallel/test-http-header-validators.js ?
And if so - should they be exposed as require('http').validateHeaderName|validateHeaderName as well?

@osher
Copy link
Contributor Author

osher commented May 5, 2020

@addaleax
I added tests, I'm looking for the right place to add docs - I could not find any page dedicated to OutgoingMessage.

I do suspect that it's what referred in docs as ServerResponse, am I right? Is that were I should document it?

When I'm looking at how it looks from user perspective, I start to think that it will provide a better interface if they'll be exposed directly on http, so that instead:

const { OutgoingMessage: { validateHeaderName, validateHeaderValue } } = require('http')

it will be

const { validateHeaderName, validateHeaderValue } = require('http')

I was also surprized to find that some obvious validations do not happen.

E.g - this does not fail validation:

const str = "1234567890"
const obviouslyTooLong = Array.from(Array(5e6)).map(() => str) // Gives length 5e7
validateHeaderValue('x-too-long', obviouslyTooLong)

I actually got to fail on the max length of strings before I saw any error thrown for invalid length of header name or value.

I don't mind if it's not part of the infra, this length validation may indeed belong to user-land, I just have to double-check with you...

To sum up:

  • LMK how to proceed with docs - I'll be happy to go on
  • LMK about what's the policy re. validations
  • LMK about the final interface exposed to the user - http.validate.... or http.OutgoingMessage.validate... (or both, one a friendlier shorthand to the other)

Thanks!!

EDITED

I'm getting the feeling that OutgoingMessage is an undocumented implementation detail, and therefore the validators should not be exposed on it, but either on http or on http.ServerResponse.

Can you please confirm this or correct me so I know how to proceed?
Thanks again!!

@addaleax
Copy link
Member

addaleax commented May 5, 2020

I'm getting the feeling that OutgoingMessage is an undocumented implementation detail, and therefore the validators should not be exposed on it, but either on http or on http.ServerResponse.

Can you please confirm this or correct me so I know how to proceed?

Yes, that’s correct – OutgoingMessage serves as the base class for client requests and server responses, but it’s not intended to be used directly by users. Providing these methods on the http module itself seems to make sense to me 👍

@osher
Copy link
Contributor Author

osher commented May 5, 2020 via email

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the linter failures fixed

@nodejs/http

doc/api/http.md Show resolved Hide resolved
doc/api/http.md Show resolved Hide resolved
doc/api/http.md Outdated Show resolved Hide resolved
doc/api/http.md Outdated Show resolved Hide resolved
doc/api/http.md Outdated Show resolved Hide resolved
doc/api/http.md Outdated
try {
validateHeaderName('')
} catch(err) {
err instanceof TypeError // --> true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure we should document this, partly because it seems wrong to me to use TypeError here.

Suggested change
err instanceof TypeError // --> true

Copy link
Contributor Author

@osher osher May 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have a look at the docs for setHeader - it explicitly says TypeError, so I took it from there.
Do you still believe we should remove it?

P.S - good that you note it - because many infrastructures I've witnessed (the lionshare are proprietary, but not only) use error types to distinguish between and sort errors, often with a secondary look at the code.

I personally detest class hierarchies and consider them to be implementation details, but, you know... let them have what they say they can't live without...

doc/api/http.md Outdated
try {
validateHeaderValue('x-my-header', undefined);
} catch(err) {
err instanceof TypeError; // --> true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto here,

Suggested change
err instanceof TypeError; // --> true

The use-case is for any framework that provides user mw a response
replacement, that collects the desired response state, and applies them
only on conclusion. As such a framework, I'd want to validate the header
names and values as soon as the user-code provides them. This - to
eliminate errors on response-send time, and provide developer stack trace
that contains the line that submits the offending values.
@osher osher changed the title http: Expose OutgoingMessage.validateHeaderName/Value http: Expose http.validateHeaderName/Value May 5, 2020
@osher osher requested a review from addaleax May 6, 2020 13:53
@osher
Copy link
Contributor Author

osher commented May 6, 2020

@addaleax - I adhered everything except one thing I would like to double-check -

about removing from doc examples:

   err instanceof TypeError; // --> true

Please have a look at the docs for setHeader - it explicitly says TypeError, so I took it from there.
Do you still believe we should remove it?

@addaleax
Copy link
Member

addaleax commented May 7, 2020

@osher It’s not essential but yes, I’d prefer to remove those lines – partially for the exact reason that you mentioned, which is that if you want to check for this type of error, looking at the error code is the right thing to do 👍

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label May 7, 2020
@nodejs-github-bot
Copy link
Collaborator

addaleax pushed a commit that referenced this pull request May 9, 2020
The use-case is for any framework that provides user mw a response
replacement, that collects the desired response state, and applies them
only on conclusion. As such a framework, I'd want to validate the
header names and values as soon as the user-code provides them.
This - to eliminate errors on response-send time, and provide developer
stack trace that contains the line that submits the offending values.

PR-URL: #33119
Reviewed-By: Anna Henningsen <[email protected]>
@addaleax
Copy link
Member

addaleax commented May 9, 2020

Landed in 96faea1, thanks for the PR!

@addaleax addaleax closed this May 9, 2020
@mscdex
Copy link
Contributor

mscdex commented May 9, 2020

It seems these functions are still tacked onto OutgoingMessage instead of being added to the module.exports for lib/_http_outgoing.js?

@osher osher deleted the patch-1 branch May 10, 2020 07:39
@osher
Copy link
Contributor Author

osher commented May 10, 2020

@mscdex -
they are exposed on http, see here: https://github.com/nodejs/node/blob/master/lib/http.js#L67
the fact that they stay on outgoing-message where they are primarily used is an implementation detail 😉

@addaleax - I agree that the err.code has a finer resolution, and as a consultant, when I'm asked about error handling - that's what I always recommend. However, people come to node from many different backgrounds and bring their habits with them, and they kinda expect it to work like they're used to, even if they're missing the spirit of the platform. So I thought that as long as it's not a mistake - it would be a part of making the platform welcoming for foreigners.
I see you took the PR as is - Thank you for your responsiveness and tolerance - I will be happy to work with you again 👍

@mscdex
Copy link
Contributor

mscdex commented May 10, 2020

the fact that they stay on outgoing-message where they are primarily used is an implementation detail

I don't understand how it's an implementation detail. There's no need for it to be set on OutgoingMessage solely for the purposes of exporting from require('http'). The functions could have been added to the module.exports in lib/_http_outgoing.js just like OutgoingMessage is, and then those would get pulled out of there from lib/http.js.

@osher
Copy link
Contributor Author

osher commented May 10, 2020

Yes. I see what you point at, that's true.
You mean in _http_outgoing

- OutgoingMessage.validateHeaderName = validateHeaderName;
- OutgoingMessage.validateHeaderValue = validateHeaderValue;

  module.exports = {
+   validateHeaderName,
+   validateHeaderValue,
    OutgoingMessage
  };

and in http:

- const { OutgoingMessage } = require('_http_outgoing');
- const { validateHeaderName, validateHeaderValue } = OutgoingMessage;
+ const { OutgoingMessage, validateHeaderName, validateHeaderValue } = require('_http_outgoing');

(plus similar effect on tests)

I can agree It looks tidier.

Would you like a PR for that?
given that the tests and docs are ready - it should be a fast one, right?

@mscdex
Copy link
Contributor

mscdex commented May 10, 2020

PR would be great.

codebytere pushed a commit that referenced this pull request May 11, 2020
The use-case is for any framework that provides user mw a response
replacement, that collects the desired response state, and applies them
only on conclusion. As such a framework, I'd want to validate the
header names and values as soon as the user-code provides them.
This - to eliminate errors on response-send time, and provide developer
stack trace that contains the line that submits the offending values.

PR-URL: #33119
Reviewed-By: Anna Henningsen <[email protected]>
codebytere added a commit that referenced this pull request May 18, 2020
Notable changes:

async_hooks:
  * (SEMVER-MINOR) move PromiseHook handler to JS (Stephen Belanger) #32891
cli:
  * (SEMVER-MINOR) add `--trace-atomics-wait` flag (Anna Henningsen) #33292
fs:
  * (SEMVER-MINOR) add .ref() and .unref() methods to watcher classes (rickyes) #33134
http:
  * (SEMVER-MINOR) expose http.validate-header-name/value (osher) #33119
repl:
  * (SEMVER-MINOR) deprecate repl._builtinLibs (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) remove obsolete completer variable (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) deprecate repl.inputStream and repl.outputStream (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) improve repl autocompletion for require calls (Ruben Bridgewater) #33282
  * (SEMVER-MINOR) replace hard coded core module list with actual list (Ruben Bridgewater) #33282
  * (SEMVER-MINOR) show reference errors during preview (Ruben Bridgewater) #33282
  * (SEMVER-MINOR) improve repl preview (Ruben Bridgewater) #33282
src:
  * add support for TLA (Gus Caplan) #30370
test:
  * (SEMVER-MINOR) refactor test/parallel/test-bootstrap-modules.js (Ruben Bridgewater) #33282

PR-URL: TODO
@codebytere codebytere mentioned this pull request May 18, 2020
codebytere added a commit that referenced this pull request May 19, 2020
Notable changes:

async_hooks:
  * (SEMVER-MINOR) move PromiseHook handler to JS (Stephen Belanger) #32891
cli:
  * (SEMVER-MINOR) add `--trace-atomics-wait` flag (Anna Henningsen) #33292
fs:
  * (SEMVER-MINOR) add .ref() and .unref() methods to watcher classes (rickyes) #33134
http:
  * (SEMVER-MINOR) expose http.validate-header-name/value (osher) #33119
repl:
  * (SEMVER-MINOR) deprecate repl._builtinLibs (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) deprecate repl.inputStream and repl.outputStream (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) show reference errors during preview (Ruben Bridgewater) #33282
  * (SEMVER-MINOR) improve repl preview (Ruben Bridgewater) #33282
src:
  * add support for TLA (Gus Caplan) #30370

PR-URL: TODO
codebytere added a commit that referenced this pull request May 19, 2020
Notable changes:

async_hooks:
  * (SEMVER-MINOR) move PromiseHook handler to JS (Stephen Belanger) #32891
cli:
  * (SEMVER-MINOR) add `--trace-atomics-wait` flag (Anna Henningsen) #33292
fs:
  * (SEMVER-MINOR) add .ref() and .unref() methods to watcher classes (rickyes) #33134
http:
  * (SEMVER-MINOR) expose http.validate-header-name/value (osher) #33119
repl:
  * (SEMVER-MINOR) deprecate repl._builtinLibs (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) deprecate repl.inputStream and repl.outputStream (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) show reference errors during preview (Ruben Bridgewater) #33282
  * (SEMVER-MINOR) improve repl preview (Ruben Bridgewater) #33282
src:
  * add support for TLA (Gus Caplan) #30370

PR-URL: TODO
codebytere added a commit that referenced this pull request May 19, 2020
Notable changes:

async_hooks:
  * (SEMVER-MINOR) move PromiseHook handler to JS (Stephen Belanger) #32891
cli:
  * (SEMVER-MINOR) add `--trace-atomics-wait` flag (Anna Henningsen) #33292
fs:
  * (SEMVER-MINOR) add .ref() and .unref() methods to watcher classes (rickyes) #33134
http:
  * (SEMVER-MINOR) expose http.validate-header-name/value (osher) #33119
repl:
  * (SEMVER-MINOR) deprecate repl._builtinLibs (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) deprecate repl.inputStream and repl.outputStream (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) show reference errors during preview (Ruben Bridgewater) #33282
  * (SEMVER-MINOR) improve repl preview (Ruben Bridgewater) #33282
src:
  * add support for TLA (Gus Caplan) #30370

PR-URL: TODO
codebytere added a commit that referenced this pull request May 19, 2020
Notable changes:

async_hooks:
  * (SEMVER-MINOR) move PromiseHook handler to JS (Stephen Belanger) #32891
cli:
  * (SEMVER-MINOR) add `--trace-atomics-wait` flag (Anna Henningsen) #33292
fs:
  * (SEMVER-MINOR) add .ref() and .unref() methods to watcher classes (rickyes) #33134
http:
  * (SEMVER-MINOR) expose http.validate-header-name/value (osher) #33119
repl:
  * (SEMVER-MINOR) deprecate repl._builtinLibs (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) deprecate repl.inputStream and repl.outputStream (Ruben Bridgewater) #33294
  * (SEMVER-MINOR) show reference errors during preview (Ruben Bridgewater) #33282
  * (SEMVER-MINOR) improve repl preview (Ruben Bridgewater) #33282
src:
  * add support for TLA (Gus Caplan) #30370

PR-URL: #33452
@MylesBorins
Copy link
Contributor

Would we want to backport this to v12.x? If so it seems like they should come with #33371

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. http Issues or PRs related to the http subsystem. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants