Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: Non-validated HttpHeaders enumeration #35126

Closed
stephentoub opened this issue Apr 17, 2020 · 55 comments · Fixed by #53555
Closed

API Proposal: Non-validated HttpHeaders enumeration #35126

stephentoub opened this issue Apr 17, 2020 · 55 comments · Fixed by #53555
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Net.Http tenet-performance Performance related issue
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Apr 17, 2020

EDIT June 1, 2021 by @stephentoub: This was previously approved, but we'd like to make some tweaks: #35126 (comment)


Proposed API

New NonValidated property returning all headers received by the client without any validation is to be added to HttpHeaders collection.

public abstract class HttpHeaders // existing type
{
    public NonValidatedEnumerator NonValidated { get; } ; // new property

    public struct NonValidatedEnumerator : IEnumerable<KeyValuePair<string, HeaderStringValues>>, IEnumerable, IEnumerator<KeyValuePair<string, HeaderStringValues>>
    {
        public NonValidatedEnumerator GetEnumerator();
        public bool MoveNext();
        public KeyValuePair<string, HeaderStringValues> Current { get; }
        public void Dispose();
        ... // explicitly implemented interface members
    }
}

Usage example

foreach (var h in resp.Headers.NonValidated) { }
foreach (var h in resp.Content.Headers.NonValidated) { }

Original proposal

Today to enumerate the response headers from an HttpClient HTTP request, you can write code like this:

foreach (KeyValuePair<string, IEnumerable<string>> h in resp.Headers) { ... }
foreach (KeyValuePair<string, IEnumerable<string>> h in resp.Content.Headers) { ... }

There are a few issues, here:

  1. Validation. TryAddWithoutValidation can be used to add request headers to be sent, and those headers will never be validated. And the handler can similarly use TryAddWithoutValidation to store response headers into the response headers collection. But enumerating the headers in the above manner will force validation of all the headers, which incurs measurable overhead.
  2. Allocation. We're forced to allocate enumerables to hand back the response value(s), even if there's only one. And for an unvalidated header, there generally will be only one, because it won't have been validated/parsed.
  3. Verbosity. Some developers are frustrated by the need to enumerate two different collections, with the headers split across response.Headers and response.Content.Headers.

We can address (1) and (2) by adding an API like this:

public abstract class HttpHeaders // existing type
{
    public NonValidatedEnumerator EnumerateWithoutValidation(); // new method

    public struct NonValidatedEnumerator : IEnumerable<KeyValuePair<string, HeaderStringValues>>, IEnumerable, IEnumerator<KeyValuePair<string, HeaderStringValues>>
    {
        public NonValidatedEnumerator GetEnumerator();
        public bool MoveNext();
        public KeyValuePair<string, HeaderStringValues> Current { get; }
        public void Dispose();
        ... // explicitly implemented interface members
    }
}

public readonly struct HeaderStringValues : IEnumerable<string>
{
    public Enumerator GetEnumerator();
    public struct Enumerator : IEnumerator<string>
    {
        public bool MoveNext();
        public string Current { get; }
        public void Dispose();
        ... // explicitly implemented interface members
    }
}

As a test, I looked to see what headers my Edge browser is sending when connecting to a particular service on the net, and what response headers I got in response, ~12 headers of varying lengths in each direction, some standard, some non-standard, and I put that into a benchmark that hits a local server. The benchmark makes the request and enumerates all the response and response content headers:

var req = new HttpRequestMessage(HttpMethod.Get, uri);
req.Headers.TryAddWithoutValidation(..., ...);
...

using (HttpResponseMessage resp = await client.SendAsync(req, default))
{
    foreach (var h in resp.Headers) { }
    foreach (var h in resp.Content.Headers) { }
}

With .NET Core 3.1, I get results like this:

Method Toolchain Mean Allocated
MakeRequests \netcore31\corerun.exe 96.41 us 16.64 KB

We've already made very measurable perf improvements for .NET 5, so running against master I get:

Method Toolchain Mean Allocated
MakeRequests \master\corerun.exe 81.54 us 10.31 KB

When I then change the test to use an implementation of the above proposal:

foreach (var h in resp.Headers.EnumerateWithoutValidation()) { }
foreach (var h in resp.Content.Headers.EnumerateWithoutValidation()) { }

I get:

Method Toolchain Mean Allocated
MakeRequests \proposal\corerun.exe 70.10 us 7.65 KB

Note that very similar numbers are also possible without a new API but with a (small?) behavioral change: we could change HttpHeaders.GetEnumerator to a) not validate and b) to return arrays that are only valid until MoveNext is called again, such that we can reuse the same array for all headers in the enumeration.

Another variation on this would also address the 3rd cited issue around verbosity: we could instead expose such an API on HttpResponseMessage:

public class HttpResponseMessage
{
    public NonValidatedEnumerator EnumerateWithoutValidation();
    ...
}

in which case it internally would enumerate both Headers and Content.Headers. If we did that, to go along with it we could add:

public class HttpRequestMessage
{
    public bool TryAddWithoutValidation(string key, string value);
}

which would determine whether the specified header should go into Headers or Content.Headers.

If we did want to add new API here, there's a question around the HeaderStringValues type above. ASP.NET uses the StringValues type from Microsoft.Extensions.Primitives. A variety of options exist:

  1. Just reuse that type, taking a dependency from System.Net.Http.dll onto Microsoft.Extensions.Primitives.dll and having devs that want to use this import the additional Microsoft. namespace.
  2. Create a new type in a new System. namespace that's similar in nature, albeit with a few things cleaned up.
  3. Do (2), and change ASP.NET APIs to use it instead of the Microsoft.Extensions.Primitives one.

cc: @scalablecory, @davidsh, @Tratcher, @samsp-msft

@stephentoub stephentoub added api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Net.Http labels Apr 17, 2020
@stephentoub stephentoub added this to the 5.0 milestone Apr 17, 2020
@ghost
Copy link

ghost commented Apr 17, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Apr 17, 2020
@samsp-msft
Copy link
Member

What exactly does the validation do? I am assuming its primarily mapping the header name to the specific properties on the headers class? Or is this to split them up into content and request collections?

@stephentoub
Copy link
Member Author

stephentoub commented Apr 17, 2020

It parses the header value according to the rules for that header, creating the relevant data structure to hold the pieces; in some cases that's just the raw string, in some cases it's a primitive like an int or a DateTimeOffset, in some cases it's a dedicated type for that header, such as a MediaTypeHeaderValue. See all of the XxParser and XxValue types in https://github.com/dotnet/runtime/tree/master/src/libraries/System.Net.Http/src/System/Net/Http/Headers.

@samsp-msft
Copy link
Member

Does SocketsHttpHandler need the validated headers, or is that more because of legacy API design choices for the headers collection? Could we have something like

IDictionary<string, HeaderStringValues> RawHeaders { get;}

That would expose all the headers as a dictionary that could be enumerated, inserted into, removed, modified without having to validate or know whether its a content header or not?
If a request / response were made and only used that property, would that mean that the headers would not need to be validated and turned into strongly typed properties etc?

@Tratcher
Copy link
Member

Legacy design choices. The default enumerator was supposed to filter out invalid headers to stay consistent with the strongly typed properties.

@stephentoub
Copy link
Member Author

stephentoub commented Apr 17, 2020

Does SocketsHttpHandler need the validated headers

The ones it needs it will have already validated, e.g. Content-Length. It won't have done anything with the rest.

That would expose all the headers as a dictionary that could be enumerated, inserted into, removed, modified without having to validate or know whether its a content header or not?

It would. It would also be an additional allocation for the IDictionary<> (even if that instance was just a pass through to the underlying collection(s)) and another for an enumerator if someone wanted to enumerate it.

If a request / response were made and only used that property, would that mean that the headers would not need to be validated and turned into strongly typed properties etc?

Request headers are already not validated by the implementation. If someone adds one with HttpHeaders.Add, the Add method itself is validating them, but if you call TryAddWithoutValidation, then it obviously doesn't, and SocketsHttpHandler isn't either (other than ones it needs, e.g. Content-Length).

Similarly, SocketsHttpHandler isn't doing any general validation of response headers, just storing them into the relevant response headers collection with TryAddWithoutValidation. It's the act of taking them out that's validating them, whether via the enumerator or via the strongly-typed properties. This issue is about a mechanism to get them without that validation.

@samsp-msft
Copy link
Member

Yuck - IMHO the current WithoutValidation functions are a band-aid on what is already becoming a clunky API - its not clear what the validation is actually doing. It may force an allocation, but putting a new collection on the request/response is probably cleaner, and could lead to eventual deprecation of the typed/split collections we have today.

@stephentoub
Copy link
Member Author

stephentoub commented Apr 17, 2020

functions are a band-aid

That suggests you think they were added later? They were not.

@Tratcher
Copy link
Member

The initial design emphasized protocol correctness and strongly typed object models representing aspects of that protocol. That's quite at odds with the current emphasis on performance.

and could lead to eventual deprecation of the typed/split collections we have today.

That's unlikely, and would be a step backwards from a general usability perspective. Developers use the strongly typed APIs all the time. They just don't work as well for the bulk copy scenario we're focused on at the moment. Yes we likely need some new APIs for this copy scenario, but they'll need to interop with the old ones indefinitely.

@karelz
Copy link
Member

karelz commented May 26, 2020

Triage:

  • Pros and cons of various approaches -- perf should be driving factor
  • gRPC needs just few headers
  • Consider encoding (potentially per header)
  • Related to TryAddWithoutValidation

Useful for reverse proxy and gRPC scenarios.

@karelz karelz removed the untriaged New issue has not been triaged by the area owner label May 26, 2020
@karelz
Copy link
Member

karelz commented Jul 28, 2020

@alnikola can you please take a look at this one as well?

@geoffkizer
Copy link
Contributor

I wonder if we could get away with simply changing the enumerator here to not parse and return invalid values.

It's a change in behavior, but I wonder how many people are actually relying on this.

That wouldn't solve all of the problems enumerated above, but it would be a big improvement, and it wouldn't require any new API.

@samsp-msft
Copy link
Member

What we need for YARP is the ability to read and write headers as fast as possible as we transfer data from the source request to the proxied call and back again. When looking at perf graphs of RPS/size, the per-request overhead of YARP seems to be a weak point compared to the other proxies - and headers are going to be an important part of that.
Stepping back, the question is how do we copy the headers across - leaving most as is, but with the ability to remove, modify and add headers within the proxy. Hydrating to a collection is probably the friendliest from an API perspective, but I don't know how that would compare perf-wise to a reader/writer API. The corollary would be XML or JSON where there are reader APIs which are forward only read/write, verses DOM apis where the whole structure is hydrated into objects.

@alnikola
Copy link
Contributor

I wonder if we could get away with simply changing the enumerator here to not parse and return invalid values.

Agree, it looks like the best solution from cost/benefit perspective.

@alnikola
Copy link
Contributor

Triage:

Plan A

The simplest approach that is to disable validation on condition is a safe change.

Plan B

We decided to proceed with one of the following options for .NET 5.

foreach (var h in resp.Headers.GetWithoutValidation()) { }
foreach (var h in resp.Content.Headers.GetWithoutValidation()) { }
foreach (var h in resp.Headers.NonValidated) { }
foreach (var h in resp.Content.Headers.NonValidated) { }

@stephentoub
Copy link
Member Author

We decided to proceed with one of the following options for .NET 5.

What is the static type returned from these?

@geoffkizer
Copy link
Contributor

What is the static type returned from these?

Something like this:

public struct HeadersWithoutValidation : IEnumerable<string, IEnumerable<string>> {}

@stephentoub
Copy link
Member Author

stephentoub commented Jul 30, 2020

Should it implement IDictionary and/or expose such lookup methods?

@stephentoub
Copy link
Member Author

(And as long as we're exposing this, we don't also want to avoid the per header allocation for the value (s)?)

@scalablecory
Copy link
Contributor

we don't also want to avoid the per header allocation for the value (s)

True. We could have something like:

struct HeaderValuesWithoutValidation : IEnumerable<string> {}
struct HeadersWithoutValidation : IEnumerable<KeyValuePair<string, HeaderValuesWithoutValidation>> {}

@geoffkizer
Copy link
Contributor

Should it implement IDictionary

That does seem like goodness, but then, HttpHeaders itself does not implement IDictionary, so it would be weird to do it here.

and/or expose such lookup methods?

I can certainly see some value here...

Methods on HttpHeaders like GetValues, TryGetValues, Contains, etc presumably don't operate on invalid header values today, so it makes sense to have equivalents of these on HeadersWithoutValidation (or whatever this type is called).

We already have HttpHeaders.TryAddWithoutValidation, but it would make sense to have an equivalent Add or TryAdd method on HeadersWithoutValidation.

This all kinda makes me want to avoid creating new API here and just change so we no longer validate (i.e. Plan A).

@MihaZupan
Copy link
Member

struct HeaderValuesWithoutValidation : IEnumerable<string> {}

Would this contain a single value in the case we have the value as a single string, or still try to split it into values?

@scalablecory
Copy link
Contributor

I would leave these as minimal design -- only enumeration -- and we can add more APIs (lookup etc.) later if we want.

@stephentoub stephentoub added the blocking Marks issues that we want to fast track in order to unblock other important work label Jun 1, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 1, 2021
@terrajobst
Copy link
Member

How does one construct an HeaderStringValues, e.g. for testing?

@stephentoub
Copy link
Member Author

How does one construct an HeaderStringValues, e.g. for testing?

Testing what? The implementation Isn't currently tied to the internal details of HttpHeaders, but it easily could be (and I've considered doing so).

@terrajobst
Copy link
Member

terrajobst commented Jun 2, 2021

I meant consumers testing their own header processing code.

I guess the answer is: write your own structs or use KV<K,V>

@stephentoub
Copy link
Member Author

I guess the answer is: write your own structs or use KV<K,V>

Or if you mean they want to unit test their code that consumes HeaderStringValues from an HttpResponseMessage, something like:

private static HeaderStringValues Create(string headerName, string[] values)
{
    var m = new HttpResponseMessage();
    m.Headers.TryAddWithoutValidation(headerName, values);
    return m.Headers.NonValidated[headerName];
}

@bartonjs
Copy link
Member

bartonjs commented Jun 3, 2021

Video

The proposed amendments look good. The constructors can always be added later when there's a compelling scenario.

The net approved shape is now

namespace System.Net.Http.Headers
{
    public partial class HttpHeaders
    {
        public HttpHeadersNonValidated NonValidated { get; }
    }

    public readonly struct HttpHeadersNonValidated : 
         IReadOnlyDictionary<string, HeaderStringValues>
    {
         public int Count { get; }
         public bool Contains(string headerName);
         public bool TryGetValues(string headerName, out HeaderStringValues values);
         public HeaderStringValues this[string headerName];
         public Enumerator GetEnumerator();
         public readonly struct Enumerator : IEnumerator<KeyValuePair<string, HeaderStringValues>>
         {
             public bool MoveNext();
             public KeyValuePair<string, HeaderStringValues> Current { get; }
             public void Dispose();
             ... // explicitly implemented interface members
         }
         ... // explicitly implemented interface members
    }

    public readonly struct HeaderStringValues : 
        IReadOnlyCollection<string>
    {
         public int Count { get; }
         public override string ToString();
         public Enumerator GetEnumerator();
         public readonly struct Enumerator : IEnumerator<string>
         {
             public bool MoveNext();
             public string Current { get; }
             public void Dispose();
             ... // explicitly implemented interface members
         }
         ... // explicitly implemented interface members
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work labels Jun 3, 2021
@davidfowl
Copy link
Member

HeaderStringValues is basically a clone of StringValues but in the System.Net.Http.Headers namespace right?

@Tratcher
Copy link
Member

Tratcher commented Jun 3, 2021

Seems like, and YARP will have to convert it to a StringValues as efficiently as possible.

@stephentoub
Copy link
Member Author

At the moment implementation is very similar, a wrapper for a string or a string[]. I've toyed with instead having it be a wrapper for the internal HeaderItemInfoStore, but the benefit of doing that is primarily in the "the header has been parsed into multiple values" case, which should be relatively rare in this scenario, such that the 99% case should just be HeaderStringValues is wrapping a string. The ToString in HeaderStringValues also factors in the right separator based on the internal HeaderDescriptor used by HttpHeaders, whereas StringValues always uses ",". And then of course there's the layering, namespace, etc.

We discussed this a bit in the API review (noting that this re-review was just about adding the additional members to the struct that was already approved a year ago), and @halter73 is on point to start a discussion about what it would mean to move StringValues lower in the stack.

@davidfowl
Copy link
Member

Seems like, and YARP will have to convert it to a StringValues as efficiently as possible.

We should spike this as soon as the new APIs are available.

@GrabYourPitchforks
Copy link
Member

If you bring StringValues lower in the stack, we should run it through API review. Extensions types like StringSegment and StringValues could use some love if we're going to make them first-class citizens within the BCL, as they currently exhibit behaviors that we tend not to want in BCL types. Example: StringValues.ToArray() really should return a duplicate array (as all other ToArray methods in the BCL do), not the original array instance.

@halter73
Copy link
Member

halter73 commented Jun 4, 2021

@halter73 is on point to start a discussion about what it would mean to move StringValues lower in the stack.

I've started the discussion with @davidfowl. If YARP can convert efficiently enough, it's probably not worth the time and effort to get a Microsoft.Extensions API into the core runtime for .NET 6. I'm assuming if we don't do it in .NET 6, we probably will never move StringValues. The redundancy is a little sad, but not a huge deal.

@samsp-msft
Copy link
Member

I'm assuming if we don't do it in .NET 6, we probably will never move StringValues. The redundancy is a little sad, but not a huge deal.

The thing is that we have to live with decisions like this for decades. Lets try and clean this up in 6, or define an interface for the pattern so the different classes can be interoperable.

@stephentoub
Copy link
Member Author

or define an interface for the pattern

That would defeat the non-allocating nature in the common case.

@stephentoub
Copy link
Member Author

stephentoub commented Jun 4, 2021

If we're really concerned (and don't move down StringValues), we could add a (string?, string[]?)- returning method to HeaderStringValues and a corresponding ctor to StringValues.

(FWIW, I expect the HeaderStringValues to StringValues conversion cost to not be more than a blip either way. )

@terrajobst
Copy link
Member

Both are structs. StringValues could just define an implicit conversion from HeaderStringValues to it.

@stephentoub
Copy link
Member Author

StringValues could just define an implicit conversion from HeaderStringValues to it.

It could, but that would be about usability rather than efficiency, in terms of getting the data out of HeaderStringValues.

@terrajobst
Copy link
Member

Right, I kind of assumed that even if we were to TF StringValues we couldn't get the same efficiency as HeaderStringValues as StringValues is basically just a wrapper around a string or string[].

@stephentoub
Copy link
Member Author

Right, I kind of assumed that even if we were to TF StringValues we couldn't get the same efficiency as HeaderStringValues as StringValues is basically just a wrapper around a string or string[].

At the moment we would: as with StringValues, HeaderStringValues just wraps a string or a string[]. If they remain separate, it's possible in the future that could change. Functionally right now the main difference is that HeaderStringValues.ToString() uses the proper separator for the represented header (rather than always using ","), which means it also needs to store that.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 6, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Net.Http tenet-performance Performance related issue
Projects
None yet