HTTP header encoding #37024

Closed
samsp-msft opened this issue May 26, 2020 · 7 comments
Labels
area-System.Net.Http enhancement Product code improvement that does NOT require public API changes/additions tenet-compatibility Incompatibility with previous versions or .NET Framework

Comments

@samsp-msft
Member

The HTTP spec is weak on which encoding is used for headers. Some implementations treat header values as UTF-8, others as ASCII/Latin1. There should be a way to control which encoding is used when converting headers to/from string.
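
To make the ambiguity concrete, here is a minimal Python sketch (not tied to any .NET API; the header value `café` is made up for illustration). It shows that the same string yields different bytes under UTF-8 and Latin-1, and that a receiver assuming the wrong encoding either misreads or rejects the value.

```python
# Hypothetical header value containing one non-ASCII character.
value = "caf\u00e9"  # "café"

# The same string produces different bytes on the wire per encoding.
utf8_bytes = value.encode("utf-8")      # 63 61 66 C3 A9
latin1_bytes = value.encode("latin-1")  # 63 61 66 E9

# A receiver that assumes Latin-1 reads the two UTF-8 bytes as two characters.
misread = utf8_bytes.decode("latin-1")

# A receiver that assumes UTF-8 cannot decode the Latin-1 bytes at all.
try:
    latin1_bytes.decode("utf-8")
    utf8_rejects_latin1 = False
except UnicodeDecodeError:
    utf8_rejects_latin1 = True
```

Neither side can detect the mismatch from the bytes alone in the general case, which is why an explicit way to choose the encoding is being requested.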

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net.Http untriaged New issue has not been triaged by the area owner labels May 26, 2020
@ghost

ghost commented May 26, 2020

Tagging subscribers to this area: @dotnet/ncl
Notify danmosemsft if you want to be subscribed.

@samsp-msft
Member Author

Should consider in relation to #35126.
@Tratcher - what is ASP.NET doing for this?

@Tratcher
Member

See dotnet/aspnetcore#17400

@karelz karelz added this to the 5.0 milestone May 29, 2020
@karelz karelz added api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels May 29, 2020
@karelz karelz changed the title How to read/write http headers with specific encodings HTTP header encoding Jun 4, 2020
@karelz karelz added enhancement Product code improvement that does NOT require public API changes/additions tenet-compatibility Incompatibility with previous versions or .NET Framework and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jun 4, 2020
@karelz
Member

karelz commented Jun 4, 2020

This used to work on .NET Framework. From a compat perspective we will have to enable it, maybe as an opt-in config or API, maybe as the default.

@karelz
Member

karelz commented Jun 9, 2020

@MihaZupan can you please look into a repro? I think we need to investigate how these cases behave today on .NET Framework 4.8 and .NET Core 3.1/5.0, so we can decide how we want them to behave in 5.0:

  1. What happens if we pass a byte (a "Latin1"-encoded character) cast to char, in the range 0x80-0xFF, in request headers? What do we send over the wire on .NET Framework? (Wireshark/Fiddler may be needed.) Also confirm that .NET Core throws.
  2. What happens if we get these bytes over the wire from the server in response headers? How do we present them in our APIs: as bytes cast to char, or do we try to do some decoding?
  3. What happens if a full UTF-16 character is passed to request headers? How does it look over the wire?
  4. What happens if a full UTF-16 character is received from the server in response headers? What if the 2nd byte is the end-of-line terminator for headers?
  5. What happens if a multi-byte UTF-8 character is received from the server in response headers? What if the 2nd/3rd/4th byte is the end-of-line terminator for headers?

(I might have missed some interesting cases)
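
As a rough illustration of cases 4 and 5, the Python sketch below encodes one sample character, U+0A10 (picked because its UTF-16 form happens to contain a line-feed byte), and checks each encoding for CR/LF bytes. The helper name `contains_line_terminator` is hypothetical. The sketch only inspects bytes; it says nothing about how any HTTP stack reacts to them.

```python
ch = "\u0a10"  # sample character whose code point is 0x0A10

utf16_le = ch.encode("utf-16-le")  # bytes 10 0A: second byte is LF (0x0A)
utf8 = ch.encode("utf-8")          # bytes E0 A8 90


def contains_line_terminator(data: bytes) -> bool:
    """Return True if any byte is CR (0x0D) or LF (0x0A)."""
    return any(b in (0x0A, 0x0D) for b in data)


utf16_has_terminator = contains_line_terminator(utf16_le)
utf8_has_terminator = contains_line_terminator(utf8)
```

In well-formed UTF-8 every byte of a multi-byte sequence is 0x80 or above, so case 5 can only arise with malformed input, whereas raw UTF-16 can legitimately contain 0x0A or 0x0D bytes.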

We will need a repro to use as a test case in .NET 5.
Looking at the .NET Framework code is a bonus, but not a necessity at this point.

All this info will help us decide how sane the .NET Framework behavior is and how to expose it in .NET Core: as the default behavior, as opt-in behavior behind a context switch, or as a new API.

@markalward

markalward commented Jun 16, 2020

I set up a repro to answer @karelz's questions above.

Test setup:

  • .NET Framework 4.8 using HttpClient with WebRequestHandler
  • Server is a test app that reads/writes HTTP messages using a socket directly. The client-to-server test results were also confirmed using Wireshark.

On the client side, I'm showing header values as strings in the tables below, since that's the representation exposed by the HttpClient APIs. On the server side, I'm showing the raw header bytes.

Request header tests:

| Scenario | Header sent by client | Header bytes received by server | Comment |
| --- | --- | --- | --- |
| Character codes from 00-FF | `"\u0000\u0001\u0002 ... \u00FF"` | `00 01 02 .. FF` | Low byte of each char is sent. All the character codes are sent as-is, including CTL characters. |
| Character codes from 100-FFFF | `"\u0100\u0101\u0102 ... \uFFFF"` | `00 01 02 .. FF` | Low byte of each char is sent. The high byte is discarded. |

Response header tests:

| Scenario | Header bytes sent by server | Header received by client | Comment |
| --- | --- | --- | --- |
| CTL chars | `7F` | HttpClient throws HttpRequestException | |
| CTL chars | `0E 0F` | HttpClient throws HttpRequestException | |
| Non-CTL chars | `20 21 22 .. 7E` | `"\u0020\u0021\u0022 ... \u007E"` | Each byte is mapped to the char with the same numeric value. |
| Non-CTL chars | `80 81 82 .. FF` | `"\u0080\u0081\u0082 ... \u00FF"` | Each byte is mapped to the char with the same numeric value. |
| UTF-16 encoded w/ newline byte | `10 0A` | HttpClient throws HttpRequestException | |
| UTF-8 encoded | `E0 A8 90` | `"\u00E0\u00A8\u0090"` | This is the same 0x0A10 character code as the last example, but now encoded in UTF-8. HttpClient does not decode as UTF-8, but instead casts each byte into a char as in the other examples above. |

For request headers, WebRequestHandler is just sending the low byte of each character code in the string. If the high byte is set, it gets truncated.
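
The truncation described here can be modeled in a few lines of Python. This is only a sketch of the observed behavior, not the WebRequestHandler implementation; `encode_header_low_byte` is a hypothetical name, and the model assumes characters in the Basic Multilingual Plane, where a Python code point matches a single .NET char.

```python
def encode_header_low_byte(value: str) -> bytes:
    """Keep only the low byte of each character, discarding the high byte."""
    return bytes(ord(ch) & 0xFF for ch in value)


# Characters up to U+00FF survive unchanged (equivalent to Latin-1).
latin1_range = encode_header_low_byte("caf\u00e9")

# Above U+00FF the high byte is lost, so distinct strings can collide:
# U+0141 has low byte 0x41, which is the ASCII letter "A".
collision = encode_header_low_byte("\u0141")

# U+0A10 is reduced to the single control byte 0x10.
control_byte = encode_header_low_byte("\u0a10")
```

The last two cases show why the truncation is lossy: the server cannot tell whether the client meant the original character or the one that shares its low byte.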

For response headers, the behavior is similar. Each byte on the wire is mapped to a char with the same character code. The only exceptions are CTL chars (byte values 0-31 and 127), which don't seem to be permitted in the response.
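
A similar sketch for the response side, again in Python and again only a model of what the tests observed. `decode_header_bytes` and `InvalidHeaderError` are hypothetical names; the exception stands in for the HttpRequestException seen in the table. The CTL range follows the description above, although the tests listed only exercised bytes 7F, 0E, 0F and 0A.

```python
class InvalidHeaderError(ValueError):
    """Hypothetical stand-in for the HttpRequestException seen in the tests."""


def decode_header_bytes(raw: bytes) -> str:
    """Map each byte to the char with the same numeric value, rejecting CTLs."""
    for b in raw:
        # CTL range as described above: 0-31 and 127.
        if b < 0x20 or b == 0x7F:
            raise InvalidHeaderError(f"control byte 0x{b:02X} in header value")
    # Latin-1 decoding is exactly the byte-to-char identity mapping.
    return raw.decode("latin-1")


# UTF-8 bytes for U+0A10 are not decoded as UTF-8; each byte becomes one char.
three_chars = decode_header_bytes(b"\xe0\xa8\x90")
```

Under this model the client would need to re-encode the string as Latin-1 and decode it as UTF-8 to recover the character the server intended.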

The test project is attached:

HeaderEncodingTests.zip

@MihaZupan
Member

Closing as the APIs to control header value encoding have been merged.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020