Content-Encoding and other HTTP Headers #315

Closed
Racum opened this issue Aug 31, 2016 · 8 comments

@Racum

Racum commented Aug 31, 2016

I think we should start the conversation about how to standardise the use of Zstandard with HTTP and REST environments.

Content-Encoding Token

Does Zstandard already have an IANA-registered Content-Encoding token? If not, an obvious candidate would be zstd itself, for example:

GET /api/posts/ HTTP/1.1
Accept: */*
Accept-Encoding: zstd, gzip
…

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: zstd
…
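
To make the negotiation concrete, here is a minimal Python sketch of how a server might pick a coding from the Accept-Encoding header above. It assumes a hypothetical server that supports zstd and gzip; it honours q-values and the * wildcard, breaks ties by server preference, and falls back to identity when nothing matches. This simplifies the full RFC 7231 rules (for example, it ignores an explicit identity;q=0), and the helper names are illustrative only.

```python
def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each coding listed in an Accept-Encoding value to its q-value."""
    prefs: dict[str, float] = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # a coding without an explicit q-value has weight 1
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[coding.lower()] = q
    return prefs


def choose_encoding(header: str, supported: list[str]) -> str:
    """Hypothetical helper: pick the supported coding the client weights highest.

    Ties go to the earlier entry in `supported` (the server's preference);
    when nothing acceptable matches, fall back to "identity".
    """
    prefs = parse_accept_encoding(header)
    default_q = prefs.get("*", 0.0)  # "*" covers codings not listed explicitly
    best, best_q = "identity", 0.0
    for coding in supported:
        q = prefs.get(coding, default_q)
        if q > best_q:
            best, best_q = coding, q
    return best
```

For the request above, choose_encoding("zstd, gzip", ["zstd", "gzip"]) returns "zstd"; a client sending "zstd;q=0.5, gzip" would get gzip instead.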

Dictionary Definition Header (Zstd-Dict)

The end of section 14.11 of RFC 2616 (HTTP/1.1) says:

Additional information about the encoding parameters MAY be provided by other entity-header fields not defined by this specification.

Dictionaries can qualify as "additional information about the encoding parameters". Ideally, client and server would agree on a predefined, built-in dictionary, but in some scenarios it can be useful to change dictionaries on the fly. For those cases, I propose a custom header, Zstd-Dict, that points to the URL of the dictionary used to encode the data being sent:

GET /api/posts/ HTTP/1.1
Accept: */*
Accept-Encoding: zstd, gzip
…

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: zstd
Zstd-Dict: https://mydomain.com/zdicts/myapi_json_bc1b45d.zdict
…
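
On the client side, a response carrying Zstd-Dict cannot be decoded until the referenced dictionary is available. A minimal sketch, assuming an in-memory cache keyed by dictionary URL and an injected fetch callable (both hypothetical, not part of any existing library), so that each dictionary is downloaded only once even when the server switches dictionaries on the fly:

```python
from typing import Callable, Optional


class DictionaryCache:
    """Hypothetical in-memory store of dictionaries, keyed by Zstd-Dict URL.

    `fetch` downloads a URL and returns its bytes; it is injected so the
    sketch stays independent of any particular HTTP client.
    """

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url not in self._store:  # download only on a cache miss
            self._store[url] = self._fetch(url)
        return self._store[url]


def resolve_dictionary(headers: dict[str, str], cache: DictionaryCache) -> Optional[bytes]:
    """Return the dictionary needed to decode a response, or None if none is needed."""
    lowered = {k.lower(): v.strip() for k, v in headers.items()}  # header names are case-insensitive
    dict_url = lowered.get("zstd-dict")
    if dict_url is None:
        return None
    if lowered.get("content-encoding", "").lower() != "zstd":
        # In this proposal, Zstd-Dict only has meaning next to Content-Encoding: zstd.
        raise ValueError("Zstd-Dict present without Content-Encoding: zstd")
    return cache.get(dict_url)
```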

Note that this header is not prefixed with X-, in line with RFC 6648.

Thoughts?

@kitten

kitten commented Sep 1, 2016

I'd love this, since it perfectly fits into the narrative that Google is following with Brotli.

@KrzysFR
Contributor

KrzysFR commented Sep 1, 2016

  • It would be great if the Zstd-Dict header included some hash or timestamp, to prevent issues caused by operator error that would put an updated dictionary at the same URI (copying the wrong file, a bug in the dictionary generator that reuses previous names, ...).
  • What would be the expected behavior if the client cannot retrieve the dictionary? (timeout, 200/500/418, firewall, server offline, web farm where the new dict has not made it to all the hosts yet...).
  • How can the client determine that the downloaded dictionary is complete (not corrupted or truncated)?
  • When inspecting captured HTTP traffic, the capture tool may not be able to retrieve the corresponding dictionary (offline, server is dead, dictionary has long been removed/revoked, ...), making it impossible to inspect the body.
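
The first and third points could both be addressed by content-addressed dictionary names, where part of a hash of the dictionary bytes is embedded in the URL. A minimal sketch, assuming a hypothetical naming scheme of `<stem>_<first 16 hex chars of SHA-256>.zdict` (the 16-character length is an arbitrary choice, not part of any spec):

```python
import hashlib
import re

# Hypothetical naming scheme: "<stem>_<first 16 hex chars of SHA-256>.zdict".
DIGEST_LEN = 16
_NAME_RE = re.compile(r"_([0-9a-f]{%d})\.zdict$" % DIGEST_LEN)


def dictionary_filename(stem: str, dictionary: bytes) -> str:
    """Derive the published file name from the dictionary's own bytes."""
    digest = hashlib.sha256(dictionary).hexdigest()[:DIGEST_LEN]
    return f"{stem}_{digest}.zdict"


def verify_dictionary(url: str, dictionary: bytes) -> bool:
    """True only if the downloaded bytes hash to the digest embedded in the URL."""
    match = _NAME_RE.search(url)
    if match is None:
        return False  # no digest in the URL, so nothing can be checked
    actual = hashlib.sha256(dictionary).hexdigest()[:DIGEST_LEN]
    return actual == match.group(1)
```

Because a changed dictionary gets a new name, an updated file can no longer silently replace the one behind an existing URL, and a truncated or corrupted download fails the check. The trade-off is that old URLs must stay available for as long as stored responses may reference them.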

How would this work in the reverse direction, for POST/PUT?

@Racum
Author

Racum commented Sep 1, 2016

@KrzysFR

  • It would be great if the Zstd-Dict header included some hash or timestamp, to prevent issues caused by operator error that would put an updated dictionary at the same URI (copying the wrong file, a bug in the dictionary generator that reuses previous names, ...).

We can check the zdict URL for ETag and/or Last-Modified.
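
Revalidating a cached dictionary with ETag or Last-Modified is an ordinary conditional GET. A minimal sketch, assuming a hypothetical cache entry type and leaving the actual HTTP call to the application; a 304 Not Modified response keeps the local copy:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedDictionary:
    """Hypothetical cache entry for one Zstd-Dict URL."""
    data: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: Optional[CachedDictionary]) -> dict[str, str]:
    """Validator headers to send when re-requesting a dictionary URL."""
    if cached is None:
        return {}  # first download: there is nothing to validate against yet
    if cached.etag is not None:
        return {"If-None-Match": cached.etag}  # ETag is the stronger validator
    if cached.last_modified is not None:
        return {"If-Modified-Since": cached.last_modified}
    return {}


def update_cache(
    cached: Optional[CachedDictionary],
    status: int,
    body: bytes,
    headers: dict[str, str],
) -> CachedDictionary:
    """Fold the response to a (conditional) GET of the dictionary into the cache."""
    if status == 304 and cached is not None:
        return cached  # not modified: keep the local copy
    if status != 200:
        raise RuntimeError(f"dictionary request failed with status {status}")
    lowered = {k.lower(): v for k, v in headers.items()}
    return CachedDictionary(body, lowered.get("etag"), lowered.get("last-modified"))
```

Note that the first branch of conditional_headers returns nothing: validators only help once a copy has been downloaded.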

  • What would be the expected behavior if the client cannot retrieve the dictionary? (timeout, 200/500/418, firewall, server offline, web farm where the new dict has not made it to all the hosts yet...).

Should we specify this behaviour?

  • How can the client determine that the downloaded dictionary is complete (not corrupted or truncated)?

This looks like a job for Content-Length and Content-MD5 on the zdict URL.
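
The check itself is small. Content-MD5 carries the base64-encoded MD5 digest of the body (RFC 1864); note that RFC 7231 dropped Content-MD5 from HTTP/1.1, so a server may not send it. A minimal sketch of a hypothetical helper that checks whichever of the two headers is present:

```python
import base64
import hashlib
from typing import Optional


def verify_download(body: bytes, content_length: Optional[str], content_md5: Optional[str]) -> bool:
    """Hypothetical helper: check a downloaded dictionary against its response headers."""
    if content_length is not None and len(body) != int(content_length):
        return False  # truncated (or over-long) transfer
    if content_md5 is not None:
        # Content-MD5 is base64(MD5(body)), not a hex string.
        expected = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
        if expected != content_md5.strip():
            return False  # corrupted transfer
    return True
```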

  • When inspecting captured HTTP traffic, the capture tool may not be able to retrieve the corresponding dictionary (offline, server is dead, dictionary has long been removed/revoked, ...), making it impossible to inspect the body.

Yep, you are right! This approach requires better tooling and some guarantees from both sides (especially the backend) to work.

  • How would this work in the reverse direction, for POST/PUT?

A Zstd-Dict header only makes sense in the same message as a Content-Encoding header. To be honest, I didn't know whether the RFC allows Content-Encoding on a request; the specification doesn't say so directly, but based on this paragraph it is safe to infer that it is legal:

An origin server MAY respond with a status code of 415 (Unsupported Media Type) if a representation in the request message has a content coding that is not acceptable.

-- Last paragraph of RFC 7231, section 3.1.2.2.
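
On the server, that rule could look like the following. A minimal sketch, assuming a hypothetical server that accepts zstd and gzip request bodies; advertising the accepted codings in an Accept-Encoding response header follows RFC 7694 (client-initiated content-encoding):

```python
from typing import Iterable


def check_request_coding(
    headers: dict[str, str],
    supported: Iterable[str] = ("zstd", "gzip"),
) -> tuple[int, dict[str, str]]:
    """Hypothetical check of a PUT/POST body's Content-Encoding.

    Returns (status, extra_response_headers): 200 if every listed coding is
    acceptable, otherwise 415 plus an Accept-Encoding header saying what is.
    """
    accepted = tuple(c.lower() for c in supported)
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("content-encoding", "")
    # Content-Encoding may list several codings, e.g. "gzip, zstd".
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    unsupported = [c for c in codings if c != "identity" and c not in accepted]
    if unsupported:
        return 415, {"Accept-Encoding": ", ".join(accepted)}
    return 200, {}
```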

@KrzysFR
Contributor

KrzysFR commented Sep 2, 2016

We can check the zdict URL for ETag and/or Last-Modified.

This would work if you have already downloaded the dictionary once, but not for the first time.

Should we specify this behaviour?

Someone (app or client) would need to handle this case anyway, so general guidance on how to do it would help implementors avoid this trap.
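
One possible policy, sketched under assumptions: retry the dictionary download a few times, and if it still fails, re-request the resource without zstd in Accept-Encoding. The request and fetch callables, the retry count, and the exception type are hypothetical placeholders for whatever HTTP client an application uses, and the fallback only works if the server also offers a dictionary-free coding.

```python
from typing import Callable, Optional

Fetch = Callable[[str], bytes]
Request = Callable[[str, str], tuple[dict[str, str], bytes]]  # (url, accept_encoding) -> (headers, body)


class DictionaryUnavailable(Exception):
    """Hypothetical error: a Zstd-Dict URL could not be retrieved."""


def fetch_dictionary(url: str, fetch: Fetch, attempts: int = 3) -> bytes:
    last_error: Optional[OSError] = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # urllib's URLError/HTTPError and timeouts subclass OSError
            last_error = exc
    raise DictionaryUnavailable(url) from last_error


def get_with_fallback(url: str, request: Request, fetch: Fetch):
    """Return (headers, body, dictionary_or_None) for `url`.

    If the dictionary named by Zstd-Dict cannot be fetched, ask again without
    advertising zstd so the server picks another coding.
    """
    headers, body = request(url, "zstd, gzip")
    lowered = {k.lower(): v for k, v in headers.items()}
    dict_url = lowered.get("zstd-dict")
    if dict_url is None:
        return headers, body, None
    try:
        return headers, body, fetch_dictionary(dict_url, fetch)
    except DictionaryUnavailable:
        headers, body = request(url, "gzip")
        return headers, body, None
```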

This looks like a job for Content-Length and Content-MD5 on the zdict URL.

Yes, but then this means that the download of the dictionary is not a regular GET: the HTTP server must know to include the Content-MD5 header for these files, and the HTTP client must know to verify it.

Dictionaries are meant for very small files (a few KB or less), which makes me think this would mostly target REST APIs where you GET /SomeEntity/1234 from a single-page or mobile app and want to reduce the size of the downloaded JSON (or other format). This means the dictionary handling should be specified at the level of the API, not at the level of a generic HTTP client implementation. In that case, it seems to me that specifying these headers (and the behavior expected to validate dictionaries, manage their lifetime, ...) would make sense, because they would then be part of the API contract.

If this is targeting APIs, then some of the points above become less of an issue (how to validate a dictionary, how to retry if there is an issue, ...), and the client -> server case could also be specified as part of this (the client knows in advance that the server supports zstd with dictionaries because it is part of the API contract, and there would be some endpoint to discover the list of supported dictionary URIs).
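
As an illustration of such a discovery endpoint, suppose the API contract publishes a JSON manifest like {"dictionaries": [{"id": ..., "url": ..., "sha256": ...}]}. That format, its field names, and the helpers below are all hypothetical; the sketch only shows how a client could index the manifest and build headers for a zstd-compressed upload:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One dictionary advertised by a hypothetical discovery endpoint."""
    dict_id: str
    url: str
    sha256: str


def parse_manifest(text: str) -> dict[str, DictionaryEntry]:
    """Index the manifest's dictionaries by id."""
    doc = json.loads(text)
    return {
        item["id"]: DictionaryEntry(item["id"], item["url"], item["sha256"].lower())
        for item in doc["dictionaries"]
    }


def upload_headers(entries: dict[str, DictionaryEntry], dict_id: str) -> dict[str, str]:
    """Headers a client would send with a zstd-compressed PUT/POST body."""
    entry = entries[dict_id]  # KeyError: the API contract does not offer this dictionary
    return {"Content-Encoding": "zstd", "Zstd-Dict": entry.url}
```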

@dimkr
Contributor

dimkr commented Sep 16, 2016

I think there should be support for Zstd-Dict-less compression as well, for two reasons:

  1. Sometimes, in a security-hardened environment, you wouldn't want the server to download a file (or initiate TCP connections in general), since this increases the server's attack surface.
  2. It makes it easy to implement support in HTTP clients (the difference between gzip, deflate and zstd is just a matter of calling the right function).
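
In Python, for instance, dictionary-less decoding can be a table from coding token to decompression function. A minimal sketch: gzip and deflate use the standard library, while zstd is registered only when compression.zstd is importable (it entered the standard library in Python 3.14); the table and helper names are illustrative.

```python
import gzip
import zlib
from typing import Callable

# Illustrative table: coding token -> decompression function.
DECODERS: dict[str, Callable[[bytes], bytes]] = {
    "identity": lambda data: data,
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,  # HTTP "deflate" means zlib-wrapped (RFC 1950) data
}

try:
    from compression import zstd as _zstd  # standard library since Python 3.14
    DECODERS["zstd"] = _zstd.decompress
except ImportError:
    pass  # no zstd decoder available, so zstd is simply not advertised


def decode_body(content_encoding: str, body: bytes) -> bytes:
    """Undo each listed coding, last-applied first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):  # codings are listed in the order they were applied
        decoder = DECODERS.get(coding)
        if decoder is None:
            raise ValueError(f"unsupported content coding: {coding}")
        body = decoder(body)
    return body


def accept_encoding_header() -> str:
    """Advertise exactly the codings this client can decode."""
    return ", ".join(c for c in DECODERS if c != "identity")
```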

@Cyan4973
Contributor

A request for an HTTP content encoding has formally been started:
https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-00.html

@Cyan4973
Contributor

Cyan4973 commented Oct 3, 2018

Zstandard is now published as RFC 8478.

It's also fully registered as an IANA media type.

With these two conditions fulfilled, it's now possible to make progress on the topics above.

@Cyan4973
Contributor

The zstd content encoding is now defined and has become a real option.

It's possible to deliver content in zstd format using this nginx module.

Client-side support is still sparse but exists; wget2 is one example.

We wish to build up support throughout 2019.

5 participants