Content-Encoding and other HTTP Headers #315

Racum · 2016-08-31T19:56:57Z

I think we should start the conversation about how to standardise the use of Zstandard with HTTP and REST environments.

Content-Encoding Token

Does Zstandard already have an IANA-defined Content-Encoding token? If not, an obvious candidate would be zstd itself, for example:

GET /api/posts/ HTTP/1.1
Accept: */*
Accept-Encoding: zstd, gzip
…

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: zstd
…
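To make the negotiation above concrete, here is a minimal sketch of how a server might pick a coding from the Accept-Encoding header. The function names (`parse_accept_encoding`, `choose_coding`) and the server-side preference list are assumptions for illustration, not part of any existing implementation.

```python
from typing import Dict, List, Optional


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Parse an Accept-Encoding value into {coding: qvalue}."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, _, params = item.partition(";")
        q = 1.0  # a coding without an explicit q parameter has weight 1
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[coding.strip().lower()] = q
    return prefs


def choose_coding(accept_encoding: str, supported: List[str]) -> Optional[str]:
    """Pick the supported coding the client weights highest.

    `supported` is in (hypothetical) server preference order, which breaks ties.
    Returns None when nothing matches, meaning: send the body unencoded.
    """
    prefs = parse_accept_encoding(accept_encoding)
    best, best_q = None, 0.0
    for coding in supported:
        q = prefs.get(coding, prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = coding, q
    return best
```

With the request shown above, `choose_coding("zstd, gzip", ["zstd", "gzip"])` selects zstd, and a client that only sends `Accept-Encoding: gzip` still gets gzip.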

Dictionary Definition Header (Zstd-Dict)

The end of section 14.11 of RFC 2616 (HTTP/1.1) says:

Additional information about the encoding parameters MAY be provided by other entity-header fields not defined by this specification.

Dictionaries can qualify as "additional information about the encoding parameters". Ideally, client and server should agree on a predefined built-in dictionary, but in some scenarios it can be useful to change dictionaries on the fly. For those cases, I propose the custom header Zstd-Dict, which points to a URL with the dictionary used to encode the data being sent:

GET /api/posts/ HTTP/1.1
Accept: */*
Accept-Encoding: zstd, gzip
…

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: zstd
Zstd-Dict: https://mydomain.com/zdicts/myapi_json_bc1b45d.zdict
…
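A rough sketch of what a client could do with this header: look at Content-Encoding and Zstd-Dict, and fetch the dictionary once per URL. Everything here is hypothetical (the `DictionaryCache` class, the `dictionary_for` helper, and the injected `fetch` callable); the actual decompression step is left out because it depends on the zstd binding in use.

```python
from typing import Callable, Dict, Mapping, Optional


class DictionaryCache:
    """Keeps downloaded dictionaries in memory, keyed by their URL."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # e.g. a function doing an HTTP GET
        self._dicts: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Download only the first time a given URL is seen.
        if url not in self._dicts:
            self._dicts[url] = self._fetch(url)
        return self._dicts[url]


def dictionary_for(headers: Mapping[str, str],
                   cache: DictionaryCache) -> Optional[bytes]:
    """Return the dictionary needed to decode this response, or None."""
    # Header field names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    codings = [c.strip().lower()
               for c in lowered.get("content-encoding", "").split(",")]
    url = lowered.get("zstd-dict")
    if "zstd" not in codings or not url:
        return None  # plain zstd (or no zstd at all): no dictionary involved
    return cache.get(url.strip())
```

In a real client, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`; it is injected here so the lookup logic can be exercised without network access.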

Notice that this header is not prefixed with X-, in line with RFC 6648.

Thoughts?

kitten · 2016-09-01T14:01:16Z

I'd love this, since it perfectly fits into the narrative that Google is following with Brotli.

KrzysFR · 2016-09-01T16:12:43Z

It would be great if the Zstd-Dict would include some hash or timestamp, to prevent issues from operator error that would result in an updated dictionary at the same uri (copying the wrong file, a bug in the dictionary generator that would reuse previous names, ...).
What would be the expected behavior if the client cannot retrieve the dictionary? (timeout, 200/500/418, firewall, server offline, web farm where the new dict has not made it to all the hosts yet...).
How can the client determine that the downloaded dictionary is complete (not corrupted or truncated)?
When inspecting captured HTTP traffic, the capture tool may not be able to retrieve the corresponding dictionary (offline, server is dead, dictionary has long been removed/revoked, ...), which makes inspecting the body impossible.

How would this work in the reverse direction, for POST/PUT?

Racum · 2016-09-01T17:17:26Z

@KrzysFR

It would be great if the Zstd-Dict would include some hash or timestamp, to prevent issues from operator error that would result in an updated dictionary at the same uri (copying the wrong file, a bug in the dictionary generator that would reuse previous names, ...).

We can check the zdict URL for ETag and/or Last-Modified.

What would be the expected behavior if the client cannot retrieve the dictionary? (timeout, 200/500/418, firewall, server offline, web farm where the new dict has not made it to all the hosts yet...).

Should we specify this behaviour?

How can the client determine that the downloaded dictionary is complete (not corrupted or truncated)?

This looks like a job for Content-Length and Content-MD5 on the zdict URL.
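As a sketch of that check, assuming the dictionary server actually sends both headers: Content-MD5 carries the base64 of the raw MD5 digest of the body. The function name `dictionary_is_intact` is made up for illustration. (Worth keeping in mind that RFC 7231 later dropped Content-MD5 from HTTP, so a server may well not send it.)

```python
import base64
import hashlib
from typing import Mapping


def dictionary_is_intact(body: bytes, headers: Mapping[str, str]) -> bool:
    """Check a downloaded dictionary against Content-Length and Content-MD5.

    A header that is absent is simply not checked, so this returns True
    when the server sent neither of them.
    """
    lowered = {k.lower(): v.strip() for k, v in headers.items()}

    length = lowered.get("content-length")
    if length is not None:
        try:
            if int(length) != len(body):
                return False  # truncated (or padded) download
        except ValueError:
            return False  # malformed Content-Length

    md5 = lowered.get("content-md5")
    if md5 is not None:
        # Content-MD5 is the base64 encoding of the 128-bit MD5 digest.
        expected = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
        if md5 != expected:
            return False  # corrupted download
    return True
```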

When inspecting captured HTTP traffic, the capture tool may not be able to retrieve the corresponding dictionary (offline, server is dead, dictionary has long been removed/revoked, ...), which makes inspecting the body impossible.

Yep, you are right! This approach requires better tooling and some guarantees from both sides (especially the backend) to work.

How would this work in the reverse direction, for POST/PUT?

Having a Zstd-Dict header only makes sense in the same message as a Content-Encoding header. To be honest, I didn't know whether the RFC allows Content-Encoding on a request; the specification doesn't say so directly, but based on this paragraph it is safe to infer that it is legal:

An origin server MAY respond with a status code of 415 (Unsupported
Media Type) if a representation in the request message has a content
coding that is not acceptable.

-- Last paragraph of RFC-7231, 3.1.2.2.
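For the request direction, a server-side check along the lines of that paragraph might look like the sketch below. The set of supported codings and the function name `check_request_encoding` are assumptions; a real server would plug in whatever it can actually decode.

```python
from typing import Mapping, Optional, Set

# Hypothetical: the codings this server is able to decode on requests.
SUPPORTED_REQUEST_CODINGS: Set[str] = {"identity", "gzip", "zstd"}


def check_request_encoding(headers: Mapping[str, str]) -> Optional[int]:
    """Return 415 if the request body uses a coding we cannot decode.

    Returns None when the request may proceed.
    """
    # Header field names are case-insensitive.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("content-encoding", "")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    if any(c not in SUPPORTED_REQUEST_CODINGS for c in codings):
        return 415  # Unsupported Media Type
    return None
```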

KrzysFR · 2016-09-02T08:14:13Z

We can check the zdict URL for ETag and/or Last-Modified.

This would work if you have already downloaded the dictionary once, but not for the first time.
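To illustrate the difference between the first download and later ones, here is a small sketch of conditional revalidation. `CachedDict`, `revalidation_headers` and `apply_response` are hypothetical names; only the If-None-Match / If-Modified-Since / 304 mechanics are standard HTTP.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class CachedDict:
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def revalidation_headers(cached: Optional[CachedDict]) -> Dict[str, str]:
    """Build conditional request headers for re-fetching a dictionary."""
    headers: Dict[str, str] = {}
    if cached is None:
        return headers  # first download: there is nothing to compare against
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def apply_response(cached: Optional[CachedDict], status: int, body: bytes,
                   response_headers: Mapping[str, str]) -> CachedDict:
    """Update the cache entry from the server's answer."""
    if status == 304 and cached is not None:
        return cached  # unchanged at this URL, keep using the cached copy
    if status == 200:
        return CachedDict(body,
                          response_headers.get("ETag"),
                          response_headers.get("Last-Modified"))
    raise ValueError(f"could not obtain dictionary (HTTP {status})")
```

On the very first request `revalidation_headers(None)` is empty, so the validators only help detect a dictionary that changed at the same URI after it has been cached once.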

Should we specify this behaviour?

Someone (app or client) would need to handle this case anyway, so if there was a general guidance on how to do this, it would help implementors to not fall into this trap.

This looks like a job for Content-Length and Content-MD5 on the zdict URL.

Yes, but then this means that the download of the dictionary is not a regular GET: the HTTP server must know to include the Content-MD5 header for these files, and the HTTP client must know to verify it.

Dictionaries are supposed to be used for very small files (a few KB or less), which makes me think that this would probably target REST APIs where you GET /SomeEntity/1234 from a single-page or mobile app, and want to reduce the size of the JSON (or other) payload downloaded. This means that the dictionary handling should be specified at the level of the API, not at the level of a generic HTTP client implementation. In that case, it seems to me that specifying these headers (and the behavior expected to validate the dictionaries, manage their lifetime, ...) would make sense, because then it would be part of the API contract.

If this is targeting APIs, then some of the points above become less of an issue (how to validate a dictionary, how to retry if there is an issue, ...), and the client -> server case could also be specified as part of this (the client knows in advance that the server supports zstd with dictionaries because it is part of the API contract, and there would be some endpoint to discover the list of supported dictionary URIs).

dimkr · 2016-09-16T09:53:47Z

I think there should be support for Zstd-Dict-less compression as well, for two reasons:

Sometimes, in security-hardened environments, you wouldn't want the server to download a file (or initiate TCP connections in general), since this increases the server's attack surface.
This allows easy implementation of supporting HTTP clients (since the difference between gzip, deflate and zstd is just a matter of calling the right function).
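The second point can be sketched as a plain dispatch table. This is only an illustration: the `DECODERS` table and `decode_body` are made-up names, and the zstd entry assumes the `compression.zstd` module from the Python 3.14+ standard library (on older versions it is simply left out, and a third-party binding could be registered instead).

```python
import gzip
import zlib
from typing import Callable, Dict

# One decode function per content coding.
DECODERS: Dict[str, Callable[[bytes], bytes]] = {
    "identity": lambda data: data,
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,  # HTTP "deflate" is the zlib format
}

try:
    # Assumption: Python 3.14+, where Zstandard is in the standard library.
    from compression import zstd
    DECODERS["zstd"] = zstd.decompress
except ImportError:
    pass  # no zstd decoder available; "zstd" will be reported as unsupported


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Undo the codings listed in a Content-Encoding header value."""
    codings = [c.strip().lower()
               for c in content_encoding.split(",") if c.strip()]
    # Codings are listed in the order they were applied, so undo them
    # in reverse order.
    for coding in reversed(codings):
        try:
            decoder = DECODERS[coding]
        except KeyError:
            raise ValueError(f"unsupported content coding: {coding}") from None
        body = decoder(body)
    return body
```

Without dictionaries, adding zstd to a client like this is one extra table entry, which is the ease of implementation argued for above.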

Cyan4973 · 2017-09-27T08:55:48Z

The request for an HTTP content encoding has been formally started:
https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-00.html

Cyan4973 · 2018-10-03T18:49:36Z

Zstandard is now published as RFC 8478.

It's also fully registered as an IANA media type.

With these two conditions fulfilled, it's now possible to make progress on the above topics.

Cyan4973 · 2018-12-22T18:05:14Z

zstd content encoding is now defined and becomes a real option.

It's possible to deliver content in zstd format using this nginx module.

Client-side support is still sparse, but it exists; wget2 is one example.

We wish to build up support throughout 2019.

Cyan4973 closed this as completed Dec 22, 2018

fabiang mentioned this issue Feb 1, 2019: Composer v2: Pool/Solver/Repo/Installer Tasks composer/composer#7630 (Closed, 17 tasks)

stapelberg mentioned this issue May 30, 2020: Use transparent zstd over HTTP for fetching and exporting packages distr1/distri#77 (Open, 9 tasks)

felixhandte mentioned this issue Mar 21, 2022: Feature request: Standard dictionary for web usage #3100 (Open)
