Envoy should be able to preserve header casing in HTTP/1.1 #14363

Closed
esmet opened this issue Dec 10, 2020 · 40 comments · Fixed by #15619

@esmet
Contributor

esmet commented Dec 10, 2020

This ticket continues the various discussions that have happened over the years on the topic of header casing in HTTP/1.1.

In short, while the spec does not require header casing to be preserved, many legacy applications were written to assume the casing of HTTP headers and may be hard or impossible for some organizations to fix.

I will (soon) be preparing a design doc for this feature. The high-level idea is that we might be able to modify the HTTP1 codec to be aware of the original header casing observed from either downstream clients (when forwarding headers upstream) or from upstream servers (when sending headers downstream) and use that context to format headers, perhaps re-using the existing HttpHeaderFormatter.

As additional context, to support a much simpler use case, I wrote datawire#9 to force (not preserve) header casing on certain configured header keys. This was relatively straightforward because forcing the casing of a fixed set of keys doesn't require complex context to be passed around. Ideally, that change could be replaced by the more comprehensive solution that we come up with here.

Continues: #8463

cc @mattklein123 @snowp

@esmet esmet added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Dec 10, 2020
@mattklein123 mattklein123 added area/http design proposal Needs design doc/proposal before implementation help wanted Needs help! and removed enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Dec 10, 2020
@chradcliffe
Contributor

Would you be open to designing this in such a way that would preserve the input order as well (or at least allow for preserving input order)? We've seen cases where poorly-written clients/servers can break on header ordering as well as case changes.

@mattklein123
Member

As long as the feature has zero cost when disabled I don't have a strong opinion about how it's done. I think the header formatter per stream with appropriate callbacks from the codec could probably allow order to be modified somehow but it would require a bit more thinking.

@alyssawilk
Contributor

cc @jmarantz

@anirudha-oc

We can have the following approach to solve this problem:

  1. In the current Envoy code, when a downstream header is received, Envoy creates a header entry for it in a header map by calling the addViaMove() or addCopy() header-map functions. A newly created header entry stores the key_ and value_ of the header as HeaderStrings. key_ stores the header field name in lower case. I am proposing to add a case_preserved_key_ HeaderString, which will store the header field name in its original casing, alongside key_ and value_ in the same header entry. For example, if an incoming header is "NaMe: Envoy", then key_ == "name", value_ == "Envoy", and case_preserved_key_ == "NaMe".
  2. I propose to add another function:
    void addCasePreservedKey(const LowerCaseString& key, absl::string_view case_preserved_key);
    This function will store the original casing of the header field name in the case_preserved_key_ HeaderString. It needs to be called after addViaMove() / addCopy() on the header map to preserve the casing.
  3. After this, the same header map (populated in the steps above) is passed to various HTTP / network filters, which may add, modify, or remove headers. Most of the HTTP filters I looked at (e.g. the Lua filter) use addCopy(LowerCaseString& key, absl::string_view value); to add new headers. Thus, with the above approach, all filters have to call addCasePreservedKey() alongside addCopy(). Another approach is to create a wrapper around addCopy() (and similarly for the others), such as:
    void addCopy(absl::string_view key, absl::string_view value) {
      const LowerCaseString lower_key(key);
      addCopy(lower_key, value);
      if (headerKeysCasePreserved()) {
        // Remember the original casing next to the lower-cased entry.
        addCasePreservedKey(lower_key, key);
      }
    }

    Still, each filter has to call this new function instead of the old addCopy.
    The point here is that with the above approach, we need to make all existing filters and all new filter authors aware of the new function to call for case preservation. Is this a feasible/good approach?
  4. Also, there are some headers added by Envoy programmatically, such as x-forwarded-proto and x-request-id, which always remain in lower case. Thus, the above approach only preserves incoming headers (downstream client -> Envoy, upstream service -> Envoy).
  5. Once all the filters have populated the header map with zero or more headers, optionally preserving their casing, it is passed to the upstream codec, where the encoder checks whether each header-map entry has case_preserved_key_ set. If so, it encodes the upstream header using case_preserved_key_; if not, it uses the lower-case key_.
  6. All of this can be made configurable from the Envoy config by adding an HTTP codec setting, something like: Http1Settings::HeaderKeyFormat::PreserveCase.

Any thoughts on the above approach?
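
A minimal sketch of the per-entry idea in the steps above, using standard-library types only. HeaderEntry and HeaderMap here are simplified, hypothetical stand-ins rather than Envoy's real classes, and the constructor flag is an assumption standing in for headerKeysCasePreserved().

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, simplified header entry (not Envoy's HeaderEntry).
struct HeaderEntry {
  std::string key;                 // always lower case
  std::string value;
  std::string case_preserved_key;  // empty when the casing was not recorded
};

std::string toLower(std::string_view s) {
  std::string out(s);
  std::transform(out.begin(), out.end(), out.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return out;
}

class HeaderMap {
public:
  explicit HeaderMap(bool preserve_case) : preserve_case_(preserve_case) {}

  // Wrapper from step 3: store the lower-cased key and, if enabled,
  // remember the original spelling in the same entry.
  void addCopy(std::string_view key, std::string_view value) {
    assert(!key.empty());  // header names are never empty
    HeaderEntry entry{toLower(key), std::string(value), std::string()};
    if (preserve_case_) {
      entry.case_preserved_key = std::string(key);
    }
    entries_.push_back(std::move(entry));
  }

  // Encoder from step 5: prefer the preserved key, fall back to key.
  std::string encode() const {
    std::string out;
    for (const HeaderEntry& e : entries_) {
      out += e.case_preserved_key.empty() ? e.key : e.case_preserved_key;
      out += ": " + e.value + "\r\n";
    }
    return out;
  }

private:
  bool preserve_case_;
  std::vector<HeaderEntry> entries_;
};
```

In this sketch every entry carries the extra string member whether or not the feature is enabled, which is the footprint cost of storing the casing per entry.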

@jmarantz
Contributor

Per @mattklein123 we want to achieve zero-cost when the feature is disabled, so I don't think you want to add a new field to the header structures.

I also don't think you need/want to modify existing filters. You mainly want to modify the H1 codec to (if the new feature is enabled) add some annotation metadata. One way to do this without adding any existing cost (when disabled) is to add hidden header-map entries that will not serialize, but will store enough bits to decode the header-map values.

E.g. let's say I have these headers presented to the codec:

   CACHE-CONTROL: max-age=100
   CoNtEnT-eNcOdInG: deflate

We could leave that in its case-folded form for use in the system but add an extra hidden header to capture the case-preserved names:

   cache-control: max-age=100
   content-encoding: deflate
   :case-preserve: CACHE-CONTROL,CoNtEnT-eNcOdInG

You'd modify the deserialization code to populate :case-preserve and the serialization code to look up ":case-preserve" value and if present, use it to build a temp case-correction map locally for use while looping over the headers.

This doesn't require modifying any filters unless those filters want to add to the case-preservation hidden entry. And it doesn't require increasing the footprint of any existing data structures.
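
A rough sketch of how the hidden-entry idea could look, assuming a header list modeled as a vector of name/value pairs. The decode() and encode() functions and the Headers alias are hypothetical illustrations, not Envoy codec functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

using Headers = std::vector<std::pair<std::string, std::string>>;

std::string toLower(std::string_view s) {
  std::string out(s);
  for (char& c : out) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return out;
}

// Deserialization: fold names to lower case and record the originals
// in a hidden ":case-preserve" entry.
Headers decode(const Headers& wire) {
  Headers out;
  std::string preserved;
  for (const auto& [name, value] : wire) {
    assert(name.find(',') == std::string::npos);  // names cannot contain commas
    out.emplace_back(toLower(name), value);
    if (!preserved.empty()) {
      preserved += ',';
    }
    preserved += name;
  }
  out.emplace_back(":case-preserve", preserved);
  return out;
}

// Serialization: build a temporary case-correction map from the hidden
// entry, then write each header with its original spelling if known.
std::string encode(const Headers& headers) {
  std::unordered_map<std::string, std::string_view> original;
  for (const auto& [name, value] : headers) {
    if (name != ":case-preserve") {
      continue;
    }
    std::string_view rest = value;
    while (!rest.empty()) {
      const std::size_t comma = rest.find(',');
      const std::string_view token = rest.substr(0, comma);
      original.emplace(toLower(token), token);
      if (comma == std::string_view::npos) {
        break;
      }
      rest.remove_prefix(comma + 1);
    }
  }
  std::string out;
  for (const auto& [name, value] : headers) {
    if (name == ":case-preserve") {
      continue;  // the hidden entry is never serialized
    }
    const auto it = original.find(name);
    if (it != original.end()) {
      out.append(it->second.data(), it->second.size());
    } else {
      out += name;
    }
    out += ": " + value + "\r\n";
  }
  return out;
}
```

Headers added by filters after decoding are simply missing from the map, so they are written in lower case unless the filter also appends to the hidden entry.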

@chradcliffe
Contributor

In the :case-preserve case, I'm a bit concerned about the cost of tokenizing and iterating on the fields on output. Is there a way to back the hidden header by something like a vector and provide access to each entry in the value? I can't see anything obvious in the current API.

I like the general approach of the :case-preserve header, though, since it would allow for adding support for preservation of input header order as well at some later date.

@cretz
Contributor

cretz commented Jan 28, 2021

How about just a std::vector<std::string> case_preserved_keys_ inside the HeaderMapImpl? That can accomplish it without a special header value, or is there concern that it won't be persisted across some map copies?

@jmarantz
Contributor

jmarantz commented Jan 28, 2021

RE speed of iterating over the comma-separated string? Let's get some numbers with a microbenchmark. Iterating over a comma-separated string-view I think can be made pretty fast.

RE order-of-headers; that's already preserved, no?

RE new case_preserved_keys_ -- sure but that adds memory footprint even when not in use. I'm also not convinced it would be measurably faster. While you wouldn't need to search for the next comma, you may not get as good memory locality as with a simple comma-split string. This might be hard to capture with a microbenchmark though because often with those everything will be in L1 cache. In any case the dominant cost is probably building the map.

And while I'm thinking of it: that map doesn't need to copy any strings. It should actually be a set: absl::flat_hash_set<absl::string_view, CaseInsensitiveHash, CaseInsensitiveCompare> -- see the definition of struct CaseInsensitiveCompare for refs to those.

RE "persisted across copies"; that would not be my concern. If you did add a new field you'd need to preserve it on copies.
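
A small sketch of the copy-free set idea, under the assumption that std::unordered_set stands in for absl::flat_hash_set so the example needs only the standard library. The hash and comparison functors below are hand-written illustrations that reuse the names from the comment; they are not Envoy's definitions, and buildCaseSet() and originalCase() are hypothetical helpers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <unordered_set>

// FNV-1a over the lower-cased bytes, so "Name" and "name" hash alike.
struct CaseInsensitiveHash {
  std::size_t operator()(std::string_view s) const {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : s) {
      h ^= static_cast<unsigned char>(std::tolower(c));
      h *= 1099511628211ull;
    }
    return static_cast<std::size_t>(h);
  }
};

struct CaseInsensitiveCompare {
  bool operator()(std::string_view a, std::string_view b) const {
    if (a.size() != b.size()) {
      return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
      if (std::tolower(static_cast<unsigned char>(a[i])) !=
          std::tolower(static_cast<unsigned char>(b[i]))) {
        return false;
      }
    }
    return true;
  }
};

using CaseSet =
    std::unordered_set<std::string_view, CaseInsensitiveHash, CaseInsensitiveCompare>;

// Splits "A,B,C" into views that point into the caller's buffer; no
// string is copied, so the buffer must outlive the returned set.
CaseSet buildCaseSet(std::string_view comma_separated) {
  CaseSet set;
  while (!comma_separated.empty()) {
    const std::size_t comma = comma_separated.find(',');
    const std::string_view token = comma_separated.substr(0, comma);
    if (!token.empty()) {
      set.insert(token);
    }
    if (comma == std::string_view::npos) {
      break;
    }
    comma_separated.remove_prefix(comma + 1);
  }
  return set;
}

// Returns the original spelling for a lower-cased key, or the key itself.
std::string_view originalCase(const CaseSet& set, std::string_view lower_key) {
  const auto it = set.find(lower_key);
  return it == set.end() ? lower_key : *it;
}
```

Because the stored element is the original spelling and lookups ignore case, a set is enough: finding the lower-cased key returns the case-preserved name directly.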

@chradcliffe
Contributor

> RE order-of-headers; that's already preserved, no?

You're right, my mistake.

@mattklein123
Member

See Slack around here https://envoyproxy.slack.com/archives/C78HA81DH/p1607554081053300. I think we can do this with a header formatter per stream which stores a map. It should be pretty clean, no cost when not used, and cover most cases.

@cretz
Contributor

cretz commented Jan 29, 2021

@mattklein123 - Can you elaborate on this a tad bit and/or confirm my understanding is correct? So a new concept called a "header formatter" that is stored as a member on stream info? A pure virtual class w/ an overridable formatKey(LowerCaseString&) or some such? Then there is a default no-op formatter impl that leaves keys lowercase, a "proper case" stateless formatter impl that converts to canonical casing when that setting today is used, and then a "preserve case" stateful formatter impl that contains the original key casings to be used when headers are written? Can the stateful formatter support programmatic additions of preserve-case keys by filters instead of just preserving incoming keys? We have a use case where we want to put case-specific header keys programmatically too.
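
The three formatter flavors described in this question could be sketched roughly as follows. All class and method names here (HeaderKeyFormatter, format(), remember()) are hypothetical and chosen for illustration only. The proper-case rule follows the description of ProperCaseWords quoted elsewhere in this thread: capitalize the first character and any alphabetic character that follows a special character.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <unordered_map>

std::string toLower(std::string_view s) {
  std::string out(s);
  for (char& c : out) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return out;
}

// Hypothetical formatter interface.
class HeaderKeyFormatter {
public:
  virtual ~HeaderKeyFormatter() = default;
  virtual std::string format(std::string_view lower_key) const = 0;
};

// Default: leave the key in lower case.
class NoOpFormatter : public HeaderKeyFormatter {
public:
  std::string format(std::string_view lower_key) const override {
    return std::string(lower_key);
  }
};

// Stateless: "content-type" becomes "Content-Type".
class ProperCaseFormatter : public HeaderKeyFormatter {
public:
  std::string format(std::string_view lower_key) const override {
    std::string out(lower_key);
    bool capitalize_next = true;
    for (char& c : out) {
      const unsigned char uc = static_cast<unsigned char>(c);
      if (capitalize_next && std::isalpha(uc)) {
        c = static_cast<char>(std::toupper(uc));
      }
      capitalize_next = !std::isalnum(uc);
    }
    return out;
  }
};

// Stateful: remembers spellings seen on the wire or set by a filter.
class PreserveCaseFormatter : public HeaderKeyFormatter {
public:
  void remember(std::string_view original_key) {
    assert(!original_key.empty());
    original_[toLower(original_key)] = std::string(original_key);
  }
  std::string format(std::string_view lower_key) const override {
    const auto it = original_.find(std::string(lower_key));
    return it == original_.end() ? std::string(lower_key) : it->second;
  }

private:
  std::unordered_map<std::string, std::string> original_;
};
```

With this shape, a filter that wants a programmatically added key written as X-AbCd would only need access to the stateful formatter's remember() call; how that access is provided is the open design question.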

@jmarantz
Contributor

You want to add case-sensitive header names programmatically in a custom C++ filter?

Rewriting to ":case-preserve" would work for that I suppose, though you'd need to be careful if you do that a lot to avoid n^2 manipulation of that header's value.

Otherwise you'd need to give the filter access to the formatter, which seems plausible. Matt, this formatter doesn't exist currently, right?

@snowp
Contributor

snowp commented Jan 29, 2021

@jmarantz The concept of the formatter exists in the HTTP codec, but nothing exists that propagates the original casings. I think this would all be easily done if we're just storing a map/set of header casings on StreamInfo (either via FilterState or via explicit StreamInfo fields) that can be set to the original upstream/downstream casings if configured to do so. This can be made mutable so that filters can modify the casing, or even dictate the entire casing if preserving original casing is not actually desired.

I think once we have this all we'd need is to plumb this through to the upstream/downstream codecs and we can use this to power a new kind of header formatter that uses this map/set.

@jmarantz
Contributor

Ah cool -- this sounds reasonable and better than :case-preserve then.

@mattklein123
Member

> I think once we have this all we'd need is to plumb this through to the upstream/downstream codecs and we can use this to power a new kind of header formatter that uses this map/set.

Yeah, so per @snowp right now the header formatter exists but it's a static concept. What I'm suggesting is that instead of the formatter being static/set by config for all requests, we allow it to be allocated on a per-stream basis. So we would allow allocating an extension formatter on downstream stream creation, which is attached to the stream end-to-end. Then, on the encoding side it can do whatever it wants. In this case, it would probably internally create some type of map and then reference the map on encoding. This is roughly semantically equivalent to what @snowp is suggesting, though I would suggest doing it the way I'm suggesting vs. stream info, mainly because I think you are going to need extension code that runs on the downstream side no matter what, and I think it would be cleaner to just allow that to own the process end-to-end vs. involving stream info, but I don't feel that strongly about it.

@cretz
Contributor

cretz commented Feb 1, 2021

Pardon my ignorance again here. I think I am following with regards to setting the header key formatter on Envoy::Http::Http1::StreamEncoderImpl (i.e. the impl of stream) and propagating it that way. One thing I am not understanding is how I, in my filter extension, can programmatically mark a key as using a certain case without involving stream info (meaning more than just preserving the request keys)? How do I update that map on the stream to say I want X-AbCd as the key in decodeHeaders?

I was gonna go with just an absl::flat_hash_map<LowerCaseString, string> case_preserved_keys_ on StreamInfoImpl w/ mutators exposed (but could expose a non-const HeaderKeyFormatter instead of having the map right there). But at this point, I'm kinda lost on the suggestion here.

@mattklein123
Member

Here is what I'm concretely suggesting but it may need tweaking depending on the requirements. At this point I think you would be better off creating a gdoc with various options that is easier to iterate on:

  • Add a new extension point here for custom formatters that can be specified as HTTP/1 protocol options:
    oneof header_format {
      option (validate.required) = true;

      // Formats the header by proper casing words: the first character and any character following
      // a special character will be capitalized if it's an alpha character. For example,
      // "content-type" becomes "Content-Type", and "foo$b#$are" becomes "Foo$B#$Are".
      // Note that while this results in most headers following conventional casing, certain headers
      // are not covered. For example, the "TE" header will be formatted as "Te".
      ProperCaseWords proper_case_words = 1;
  • As part of the extension factory, have a way of indicating whether the formatter is static (usable by all streams in the HCM), or is per-stream, and thus will be created by the HCM on a per-stream basis and then passed through the chain. Note, there is some weirdness here regarding whether we define this extension point as part of the protocol options or in the HCM options. We may want the latter since otherwise we could get a conflict between HCM options and cluster options. We need to think this through and figure out how to reconcile it.
  • When a new stream is created in the HCM, optionally create a formatter and pass it to the decoder. The formatter can have some type of onHeader(...) method, which internally can keep a map or whatever.
  • Plumb the formatter all the way through to the encoder. Perhaps this is done explicitly, or maybe it is an optional part of the stream info. There are multiple options there.
  • Then have a format(...) method like we have today, but one that can use the internal state (the map) to figure out the case to write.
  • This should be zero cost for people not using this feature.

Now, from your previous description, it seems like you also want to change the state in a filter? For that perhaps your custom formatter can operate on some well known stream metadata?

Again, I think I would do a gdoc that clearly lists out end-user requirements, and then looks into implementation options. The hard requirement is this is zero cost (other than possibly some pointer checks) for people not using this feature.
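
The plan in the bullets above could be sketched like this, with a factory that the connection manager would call once per stream and a null pointer when the feature is off. Every name here (StatefulFormatter, FormatterFactory, Stream, newStream, decodeKey, encodeKey) is a hypothetical placeholder; only onHeader(...) and format(...) are taken from the bullets.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>

std::string toLower(std::string_view s) {
  std::string out(s);
  for (char& c : out) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return out;
}

// Hypothetical per-stream formatter with decode and encode hooks.
class StatefulFormatter {
public:
  virtual ~StatefulFormatter() = default;
  virtual void onHeader(std::string_view wire_key) = 0;
  virtual std::string format(std::string_view lower_key) const = 0;
};

class PreserveCase : public StatefulFormatter {
public:
  void onHeader(std::string_view wire_key) override {
    seen_[toLower(wire_key)] = std::string(wire_key);
  }
  std::string format(std::string_view lower_key) const override {
    const auto it = seen_.find(std::string(lower_key));
    return it == seen_.end() ? std::string(lower_key) : it->second;
  }

private:
  std::unordered_map<std::string, std::string> seen_;
};

// Hypothetical extension factory: one formatter per stream.
class FormatterFactory {
public:
  virtual ~FormatterFactory() = default;
  virtual std::unique_ptr<StatefulFormatter> create() const = 0;
};

class PreserveCaseFactory : public FormatterFactory {
public:
  std::unique_ptr<StatefulFormatter> create() const override {
    return std::make_unique<PreserveCase>();
  }
};

// The stream owns the formatter; nullptr means the feature is disabled.
struct Stream {
  std::unique_ptr<StatefulFormatter> formatter;
};

Stream newStream(const FormatterFactory* factory) {
  return Stream{factory != nullptr ? factory->create() : nullptr};
}

// Decoder side: a single pointer check when disabled.
std::string decodeKey(Stream& stream, std::string_view wire_key) {
  if (stream.formatter) {
    stream.formatter->onHeader(wire_key);
  }
  return toLower(wire_key);
}

// Encoder side: same pointer check, then the stateful lookup.
std::string encodeKey(const Stream& stream, std::string_view lower_key) {
  return stream.formatter ? stream.formatter->format(lower_key)
                          : std::string(lower_key);
}
```

When no factory is configured, the only added work per header is the null check, which matches the stated requirement of zero cost other than possibly some pointer checks.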

@cretz
Contributor

cretz commented Feb 1, 2021

> Now, from your previous description, it seems like you also want to change the state in a filter? For that perhaps your custom formatter can operate on some well known stream metadata?

That was where I got confused about "not involving stream info", because my use case is I have externally defined headers that I set in a filter extension programmatically and I must preserve their casing (i.e. this is more than just request casing).

But now that I see that this is a completely new typed extension endpoint that needs to be created, yes, I can choose to have the filter set some state on the stream the formatter references. This is a bit difficult to do across multiple filters, but we'll see. Therefore, instead of writing a preserve-case formatter, we are writing a custom-formatter extension of which anyone can implement preserve-case. I assume to satisfy this ticket itself we also need a preserve-case formatter? (I doubt the original issue opener and others were expecting to have to write custom code to preserve request header case). This became a bit bigger of a task :-)

If the requirements get too complicated during impl, definitely will create a gdoc and share it. Otherwise it may be a POC PR sans tests/docs/polish to confirm approach.

@jmarantz
Contributor

jmarantz commented Feb 1, 2021

I think once the infrastructure is plumbed in and a preserve-case formatter is supplied, anyone using Envoy would be able to select case-preservation for HTTP 1.* in configuration, without writing any code.

If you want to add a case-preserved header in a new filter then you'd just have to make a different or extra call from the filter implementation.

It's probably worthwhile iterating in a doc first as that can be faster than iterating in code :)

@mattklein123
Member

+1 please do a doc before writing code. This is pretty nuanced and I want to make sure we are all on the same page. Thank you!

@dalegaspi

dalegaspi commented Feb 6, 2021

i think that i'm speaking for a lot of people when i say that i am looking forward to this feature to be available. for us, it's not really a convenient solution to change the client to conform with the HTTP 1.1 headers being case-insensitive.

@jtway
Contributor

jtway commented Feb 10, 2021

I've been discussing this with @cretz off-issue. I'm looking into putting a design proposal together on this soon.

@esmet
Contributor Author

esmet commented Feb 10, 2021

@jtway et al., thanks for discussing this further. I've been meaning to work on this but have some other things on my plate right now. Maintainers, please feel free to reassign as necessary, in case someone ends up being ready to work on this before me.

@jtway
Contributor

jtway commented Feb 17, 2021

I have been working to put together a basic design doc/proposal for this feature. I do believe there are quite a few outstanding questions, but I believe there is enough to further the discussion.
https://docs.google.com/document/d/1GNL4iaNTwEYpPeWFslGUXnoX5K7mn3tipYAvSQ9_-TE/edit?usp=sharing

@mattklein123
Member

@jtway thanks that doc looks like a good overview. Please ping me when it's more fleshed out and I can help with the review! Excited to see this issue finally fixed.

@ericfrancis

@mattklein123 do you have an example of what you are looking for?

@snowp
Contributor

snowp commented Feb 18, 2021

Briefly scanning over the doc I would love to see

  • what creates the formatter? where exactly is it stored?
  • how will extensions be able to access the formatter?

@jtway
Contributor

jtway commented Feb 18, 2021

@snowp Will look to add those details to the document. That being said, I was hoping for some thoughts from the maintainers here, as there seems to be some preference. Initially I was thinking StreamInfo, but @mattklein123 seems to be discouraging that.

@snowp
Contributor

snowp commented Feb 18, 2021

The first alternative that comes to mind would be to store it on the HCM ActiveStream (one for upstream and one for downstream) and then expose HTTP filter callbacks that allow reading the downstream formatter or setting the upstream formatter. This would allow the HCM to initialize this formatter when desired and expose it to the router filter which would pass it to the upstream codec, and allow the router filter to initialize the formatter based on the upstream formatter and have the HCM read it back when constructing the downstream response.
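
One way to picture this alternative is a per-stream object that holds two casing maps and exposes a getter and a setter to filters. This is only a sketch under assumptions: the ActiveStream and CaseMap classes below are hypothetical simplifications, and the two router functions stand in for what the router filter would do through filter callbacks.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

std::string toLower(std::string_view s) {
  std::string out(s);
  for (char& c : out) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return out;
}

// Hypothetical formatter state: lower-cased key to original spelling.
class CaseMap {
public:
  void record(std::string_view wire_key) {
    map_[toLower(wire_key)] = std::string(wire_key);
  }
  std::string format(std::string_view lower_key) const {
    const auto it = map_.find(std::string(lower_key));
    return it == map_.end() ? std::string(lower_key) : it->second;
  }

private:
  std::unordered_map<std::string, std::string> map_;
};

// Hypothetical per-stream object owned by the connection manager.
class ActiveStream {
public:
  explicit ActiveStream(bool preserve_case) {
    if (preserve_case) {
      downstream_ = std::make_unique<CaseMap>();
    }
  }
  // Called while decoding the downstream request.
  void onRequestHeader(std::string_view wire_key) {
    if (downstream_) {
      downstream_->record(wire_key);
    }
  }
  // Filter callbacks: read the downstream formatter, set the upstream one.
  const CaseMap* downstreamFormatter() const { return downstream_.get(); }
  void setUpstreamFormatter(std::unique_ptr<CaseMap> formatter) {
    upstream_ = std::move(formatter);
  }
  // Called while encoding the downstream response.
  std::string responseKey(std::string_view lower_key) const {
    return upstream_ ? upstream_->format(lower_key) : std::string(lower_key);
  }

private:
  std::unique_ptr<CaseMap> downstream_;  // casing seen on the request
  std::unique_ptr<CaseMap> upstream_;    // casing seen on the response
};

// Router filter, request path: hand request casing to the upstream codec.
std::string routerRequestKey(const ActiveStream& stream, std::string_view lower_key) {
  const CaseMap* formatter = stream.downstreamFormatter();
  return formatter ? formatter->format(lower_key) : std::string(lower_key);
}

// Router filter, response path: capture response casing for the HCM.
void routerOnResponseHeaders(ActiveStream& stream,
                             const std::vector<std::string>& wire_keys) {
  if (stream.downstreamFormatter() == nullptr) {
    return;  // feature disabled for this stream
  }
  auto formatter = std::make_unique<CaseMap>();
  for (const std::string& key : wire_keys) {
    formatter->record(key);
  }
  stream.setUpstreamFormatter(std::move(formatter));
}
```

Keeping the request and response maps separate means a spelling seen on the request never leaks into the response headers, and the reverse.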

@jtway
Contributor

jtway commented Feb 18, 2021

Thanks @snowp, I completely forgot that I had been thinking about ActiveStream at one point as well. I will think through this a bit more and update the document.

@mattklein123
Member

Yes I would roughly suggest what @snowp outlines above. What I'm asking for is a more fully fleshed out implementation proposal. What the doc contains now is mostly a summary of my comments in this issue. If it turns out that doing what @snowp outlines above is difficult for some reason, I'm not 100% opposed to attaching a custom formatter to stream info somehow, but it wouldn't be my first choice. The doc should outline the final proposal which might require some limited prototyping.

@sunnoy

sunnoy commented Mar 19, 2021

> i think that i'm speaking for a lot of people when i say that i am looking forward to this feature to be available. for us, it's not really a convenient solution to change the client to conform with the HTTP 1.1 headers being case-insensitive.

Our equipment is distributed all over the world. Code changes are convenient, but equipment updates are almost impossible to complete. Looking forward to the release of this feature!

@jtway
Contributor

jtway commented Mar 19, 2021

I plan on circling back to this, but have been delayed by a few other priorities.

@mattklein123
Member

@jtway I would like to see this closed out so I can potentially work on this. Let me know if you won't get to it soon.

@jtway
Contributor

jtway commented Mar 19, 2021

How soon were you thinking? As I am still figuring out the specifics of how the suggested approach would work, I would be open to you taking this on. This will largely be an "as time permits" task for me, as we have a workaround. I'm still relatively new to the Envoy ecosystem, and the suggested approach contained a lot more moving parts than I thought it did at first glance.

I can definitely find time in the next few business days to spruce up the gdoc, extend with a few more specifics, and most importantly challenges I've encountered. Hopefully that will at least prove helpful.

One other relatively big challenge is figuring out how we would have this work with HTTP filters that may inject headers (which would need their case preserved as well), since filters often access the HeaderMap directly instead of going through the codec.

@mattklein123
Member

I can probably work on this next week, so if you want to clean up the gdoc that would be helpful.

@jtway
Contributor

jtway commented Mar 19, 2021

I will try to make the time to update as much as I can, and flesh some areas out by CoB Monday. It no doubt will still be woefully incomplete, but I do hope it will at least save a little time.

@mattklein123 mattklein123 self-assigned this Mar 19, 2021
@mattklein123
Member

> I will try to make the time to update as much as I can, and flesh some areas out by CoB Monday. It no doubt will still be woefully incomplete, but I do hope it will at least save a little time.

@jtway don't worry about it. I have a good idea of what needs to be done and I will report back!

@jtway
Contributor

jtway commented Mar 22, 2021

Understood. I would very much like to stay involved with further discussion and looking at the PR. I do not know if you plan on updating the document at all, however, if you do, I'll use it to better gauge the level of detail you all are looking for in the future.

mattklein123 added a commit that referenced this issue Mar 23, 2021
1) Add new stateful header formatter extension point
2) Add preserve case formatter extension

Fixes #14363

Signed-off-by: Matt Klein <[email protected]>
mattklein123 added a commit that referenced this issue Mar 28, 2021
1) Add new stateful header formatter extension point
2) Add preserve case formatter extension

Fixes #14363

Signed-off-by: Matt Klein <[email protected]>
lizan pushed a commit to envoyproxy/data-plane-api that referenced this issue Mar 28, 2021
1) Add new stateful header formatter extension point
2) Add preserve case formatter extension

Fixes envoyproxy/envoy#14363

Signed-off-by: Matt Klein <[email protected]>

Mirrored from https://github.com/envoyproxy/envoy @ 2a4d97ce66db565d191b42cdf51f4b99edf04f12
rexengineering pushed a commit to rexengineering/istio-envoy that referenced this issue Oct 15, 2021
1) Add new stateful header formatter extension point
2) Add preserve case formatter extension

Fixes envoyproxy/envoy#14363

Signed-off-by: Matt Klein <[email protected]>
@mandarjog
Contributor

@esmet do you or anyone else use this extension in production? Is the extension considered "production ready"? I assume so, since the doc does not say anything about it being beta or experimental.

jiangshantao-dbg pushed a commit to istio-mt/envoy that referenced this issue Mar 30, 2022
1) Add new stateful header formatter extension point; 2) Add preserve case formatter extension;
Fixes envoyproxy#14363

Signed-off-by: jiangshantao <[email protected]>
jiangshantao-dbg added a commit to istio-mt/envoy that referenced this issue Mar 31, 2022
* feat: http: add HTTP/1.1 case preservation (envoyproxy#15619)

1) Add new stateful header formatter extension point; 2) Add preserve case formatter extension;
Fixes envoyproxy#14363

Signed-off-by: jiangshantao <[email protected]>

* fix: fix Error envoy_cc_library() got unexpected keyword argument: category

Signed-off-by: jiangshantao <[email protected]>

* fix: source/common/http/http1/settings.cc:39:39: error: unused parameter 'validate_scheme'

Signed-off-by: jiangshantao <[email protected]>

* feat: preserve case with proper case config

Signed-off-by: jiangshantao <[email protected]>

Co-authored-by: jiangshantao <[email protected]>