Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash-based etag layer #324

Open
casey opened this issue Jan 25, 2023 · 16 comments
Open

Hash-based etag layer #324

casey opened this issue Jan 25, 2023 · 16 comments
Labels
A-new-middleware Area: new middleware proposals E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate

Comments

@casey
Copy link
Contributor

casey commented Jan 25, 2023

I'm thinking about writing a hash-based etag layer.

My plan is to:

  • When a response is going to go out, hash the body with blake3 or xxhash
  • Add a strong etag header with the digest
  • If the request contains an If-Match with a matching etag value and the response code is 200, set the code to 304 and remove the body

I probably can't contribute what I wrote to tower-http, since I'm very busy with the project that this is for, but if I wind up writing it I can definitely post or link to the code, in case a tower-http contributor is interested in getting it into tower-http. Also, even if it doesn't get into tower-http, it might give some insight into the viability of the approach.

Some reasons why this might not be a good idea:

  • Calculating response body digests adds too much CPU load for not enough benefit
  • Replacing the body of responses interacts badly with content encoding, or other layers

Any thoughts on the viability of this approach are welcome, especially if it's been considered for tower-http. I've been looking around for other applications that automatically add etags using a hash function, and I haven't found many. This makes me wonder if the idea is actually wrothwhile.

@davidpdrsn
Copy link
Member

Yeah we could have something like that. The issue is that it requires buffering the whole response body in the middleware which doesn't work for streaming responses but for responses made using Full it should be fine.

@davidpdrsn davidpdrsn added A-new-middleware Area: new middleware proposals E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate labels Jan 27, 2023
@casey
Copy link
Contributor Author

casey commented Feb 8, 2023

Our axum web server is now behind cloudflare, so this is basically at the bottom of my priorities list. I started working on it, but I want to make sure I get the most out of cloudflare before this, since there are downsides, like the CPU needed to hash and the buffering.

By the way, huge thank you to everyone working on tokio, tower, and axum, and the massive long tail of dependencies which enabled us to ship something great.

We are up to 13 million requests per day, around 150 qps, and we absolutely could not have done this without tower. Everything we do is open source and CC0, so everywhere it makes sense we want to upstream.

Thank you! ❤️

@jplatte
Copy link
Collaborator

jplatte commented Feb 8, 2023

We are up to 13 million requests per day, around 150 qps, and we absolutely could not have done this without tower. Everything we do is open source and CC0, so everywhere it makes sense we want to upstream.

I'm curious, who is 'we'?

@casey
Copy link
Contributor Author

casey commented Feb 8, 2023

We are me and the small cadre of lunatics behind casey/ord, which is the block explorer and wallet for ordinals, a Bitcoin NFT protocol that has been getting popular recently. The main feed of new NFTs is here. Things are amazingly rough, but the response has been incredible.

@jplatte
Copy link
Collaborator

jplatte commented Feb 8, 2023

Oh 😐

@SvizelPritula
Copy link
Contributor

I think it could make sense to implement this for ServeDir, where the ETag could be computed without reading everything into memory by reading the file twice, cached or even precomputed like can be done for compression.

@DoumanAsh
Copy link

Rather than hashing whole file, I would suggest to consider computing etag from last modified timestamp, length and optionally hash of file name

@SvizelPritula
Copy link
Contributor

SvizelPritula commented Jul 3, 2023 via email

@billythedummy
Copy link

Related: tower-rs/tower#692. This seems like a pretty often requested feature

@billythedummy
Copy link

I've started on an implementation at https://github.com/billythedummy/tower-etag-cache/tree/master/tower-etag-cache. The implemented ConstLruProvider probably won't work well for streaming responses due to #324 (comment) but the generic traits exported by the crate (hopefully) allows for better implementations. It currently seems to work ok with the examples/simple example axum server in the repo

@Weltpilot
Copy link

@billythedummy I am also interested in this feature, so I had a look at your implementation and example.
Thank you for your work.

I tested the caching behaviour with the index.css file of your example app:

  1. Request the file index.css => 200 OK (Etag is returned)
  2. Request the file again (including if-none-match with matching Etag) => 304 Not Modified
  3. Modify the css file on the server
  4. Request the file again (including if-none-match with same Etag as in step 2) => 304 Not Modified

But in step 4, the server should return the new file (as it has been modified).
Doesn't this defeat the purpose of the Etag middleware?

@billythedummy
Copy link

@Weltpilot thank you for the review. I wanted to just do a basic proof-of-concept with the ConstLruProvider so it has a very simple model that assumes all resources are static and don't change throughout the lifetime of the server. You can build your own cache provider with different behavior - perhaps it invalidates certain entries when files in a specific directory change or perhaps it always reruns the request and recalculates the etag

@Weltpilot
Copy link

If you assume that all resources are static and don't change, why not simply always return "304 Not Modified",
when the request contains a "If-None-Match" or "If-Modified-Since" header?

@billythedummy
Copy link

billythedummy commented Oct 17, 2023

@Weltpilot does not change throughout the lifetime of the server. What this enables:

  • user GETs index.html, server responds with document + etag abc
  • i modify the contents of index.html and push the update, restarting my server. The etag of index.html is now xyz due to the different content
  • user GETs index.html again with if-none-match abc. The new server does not see a index.html: abc entry in its cache and responds with the new index.html + etag xyz

@Weltpilot
Copy link

@billythedummy So, when you want to change a static file, you need to restart the server? Sorry, that doesn't make any sense to me.
@jplatte, @davidpdrsn Maybe one of the maintainers can make a proposal how to implement an ETag middleware that has a chance of being accepted as pull request?
I would opt for the suggestion by @SvizelPritula to extend ServeDir by an option to read the ETag from the file system,
that has been generated in advance (similar to the compressed files).

@davidpdrsn
Copy link
Member

I think the best path is to develop it as a standalone crate, unless it specifically needs private API for tower-http, that we can’t just make public.

tower-http doesn’t get much maintenance these days (personally I’m busy with work and axum) so don’t wanna add more stuff to tower-http which further burdens maintainers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-new-middleware Area: new middleware proposals E-help-wanted Call for participation: Help is requested to fix this issue. E-medium Call for participation: Experience needed to fix: Medium / intermediate
Projects
None yet
Development

No branches or pull requests

7 participants