Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pgs: Support for conditional requests #177

Open
mac-chaffee opened this issue Dec 19, 2024 · 3 comments
Open

pgs: Support for conditional requests #177

mac-chaffee opened this issue Dec 19, 2024 · 3 comments

Comments

@mac-chaffee
Copy link
Contributor

mac-chaffee commented Dec 19, 2024

I did some looking into conditional requests now that we have caching in place and files pulled from minio include Etag and last-modified headers.

Short summary: Conditional requests allow browsers to efficiently refresh their locally cached versions of files by making a request like: curl -H "If-Modified-Since: <date>" https://some-site.pgs.sh/file.html. If pgs supported conditional requests, it would return a 304 Not Modified response with an empty body, saving everyone a lot of bandwidth and time.

Here's the current situation: Souin supports conditional requests if the file requested is cached and the TTL hasn't expired. But browsers will basically never benefit from this, because this is what happens:

  1. Browser sends an unconditional GET file.html request
  2. pgs serves the file with Etag and Last-Modified headers (from minio), and Cache-Control: max-age=600. Souin saves the file with a 600 second ttl.
  3. The browser saves file.html in a local cache with a 600 second ttl.
  4. 601 seconds later, the user requests file.html again.
  5. The browser's cache has expired, but because the cached version had Etag and Last-Modified headers, it sends a conditional GET file.html request with If-Modified-Since and If-None-Match headers.
    • (Any image files served by imgproxy currently do not include Etag or Last-Modified headers, so browsers don't even attempt conditional requests for images).
  6. Souin's cached version has expired, so it forwards the conditional request to pgs.
    • (The redis storage provider has more advanced management of "stale" cache entries via max-stale, but Otter doesn't AFAIK)
  7. pgs doesn't support conditional requests, so it returns 200 OK with the entire contents of file.html even though it hasn't changed in the last 601 seconds.

I think we can fix this with two small-ish changes:

1 - Use checkPreconditions from the Go standard library

EDIT: I originally planned to useServeContent but that didn't end up working because responses returned from imgproxy are not seekable. This section has been edited to describe the replacement solution I ended up using.

The Golang standard library has a utility called http.ServeContent which handles conditional requests. But we can't use that directly since it requires seekable files (files returned from pobj and imgproxy are not seekable).

Instead, we can just use the helper function checkPreconditions directly. The downside is that we still don't support Range requests, which would definitely require seekable files.

checkPreconditions supports all the RFCs so we don't have to worry about it.

PS: It's not clear in the docs, but checkPreconditions compares the etag/mod time from the If-None-Match/If-Modified-Since headers on the *Request to the Etag/Last-Modified headers in the ResponseWriter, so you have to set those headers manually before calling it.

PPS: the pobj storage interface we use could accept the Etag and Last-Modified headers as parameters so that it can instantly return after the StatObject check, avoiding the need to read the whole file. But then we'd have to move the checkPreconditions check into pobj. The extra disk read operation for re-validation is not a big deal IMO.

Implemented in #178

2 - Add etag and last-modified headers from imgproxy

Imgproxy can already provide etag and last-modified headers, we just need to enable that and forward them in the response.

If we do both of those things, then I think that's all that is required to support conditional requests for pgs. Analytics would still be counted like normal (would require visitHost to count 304s as a "visit" but actually checkPreconditions sets the status code after we log the visit as a 200 anyway, so no change needed).

Implemented in #179.

Thoughts?

@neurosnap
Copy link
Member

Hey @mac-chaffee

This all sounds great to me, I didn't know ServeContent was a thing! I'd be happy to work on this or if you want to submit a PR I can review.

Thanks!

@mac-chaffee
Copy link
Contributor Author

Hey @neurosnap , has #178 and #179 been deployed yet? No rush, just wanted to make sure it wasn't bugged since I'm not seeing the correct etags in responses yet

@neurosnap
Copy link
Member

It has not been deployed yet. I'm working on moving pgs out of the pico repo in the hope that it will be easier to self host and collaborate. I will update in thread when it is deployed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants