Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to caching #62

Closed
marfire opened this issue Aug 6, 2017 · 10 comments
Closed

Changes to caching #62

marfire opened this issue Aug 6, 2017 · 10 comments

Comments

@marfire
Copy link

marfire commented Aug 6, 2017

I'd like to suggest some changes to the way django-s3-storage handles the caching of static files.

The first issue is that the default expiration time—1 year—is too long. That default applies not just to the versioned ManifestStaticS3Storage files but to the regular StaticS3Storage files as well. By comparison, django-storages doesn't set the header at all, by default, and whitenoise uses either 0 or 60 seconds, depending on DEBUG.

It's important to set a conservative default here because once a user's browser has cached a file there's no way for the developer to force it to refresh. So someone who starts with the default value and then decides to change it later will find that it's too late for some users.

The second issue is that ManifestStaticS3Storage doesn't distinguish between the versioned (e.g. myfile.123456.css) and the unversioned (myfile.css) files when it comes to caching. The way this should work is that the versioned files get cached forever, while the unversioned files get cached for a relatively short time. whitenoise gets this right, using the filename to figure out if it's a versioned file or not and setting the expiration time accordingly.

So my suggestions are:

  1. Lower the default expiration time (perhaps to one hour, to match the file storage default, or lower).
  2. With ManifestStaticS3Storage, set the expiration time differently depending on whether or not it's a versioned file. If it is, set it to cache forever. If it isn't, use the AWS_S3_MAX_AGE_SECONDS_STATIC setting.
@etianen
Copy link
Owner

etianen commented Aug 7, 2017 via email

@marfire
Copy link
Author

marfire commented Aug 7, 2017

Thanks for your response.

Sorry if I wasn't clear about "default expiration time". I was talking about the AWS_S3_MAX_AGE_SECONDS_STATIC setting, which defaults to one year, not anything to do with the filesystem. When I mentioned matching the "file storage default" I was talking about the AWS_S3_MAX_AGE_SECONDS setting in the file storage backend, which defaults to one hour.

Backwards compatibility is certainly a fair point. I agree that it would be bad to have some files with the shorter default and some with the longer default. Fortunately, there's an easy solution to that: people can explicitly set AWS_S3_MAX_AGE_SECONDS_STATIC to one year, if they like, or they can run s3_sync_meta to adopt the new default. By contrast, there's no good way to recover from setting a cache time that's too long (with regard to users that have already cached the file). I think the issue is serious enough to warrant a backwards-incompatible change, but that's up to you.

You may have missed my second point, which is about how the manifest storage implementation here has a more serious problem. The issue is that unversioned (myfile.css) and versioned (myfile.123456.css) files need to be cached differently. Just because you're using a manifest backend doesn't mean that the only references to your static files come from your dynamically generated pages. (To be clear, I'm not talking about the difference between the static and manifest backends, I'm talking about the two different kinds of file generated by the manifest backend itself.)

Let's take robots.txt as an example. After collectstatic is run (with the manifest backend) you'll have two files uploaded, robots.txt and robots.123456.txt. If for some reason you generate an internal reference to that file then, sure, it will be robots.123546.txt and the long cache time will be appropriate. But browsers are looking for robots.txt, and a long cache time there would be inappropriate.

That's a trivial example, but there are plenty of other scenarios in which your static files need to be available by their real file name (say, a mobile app that accesses static files directly rather than by fetching HTML pages first). That's why I suggest you adopt the approach of whitenoise (linked above), where the versioned files are cached forever, and the unversioned files are cached for a user-configurable amount of time.

@etianen
Copy link
Owner

etianen commented Aug 9, 2017 via email

@marfire
Copy link
Author

marfire commented Aug 9, 2017

If you decide not to change the expiration time, you might want to add a note in the documentation mentioning that it's one year for backwards-compatibility reasons, and suggest that users set it explicitly.

Regarding the manifest backend, you probably don't even need the regex. Looking at whitenoise's implementation, they just see if there's any suffix separated by periods, and if there is they redo the versioning and compare the filenames. Or you could do the stricter regex to avoid doing some needless work (at the cost of being more closely coupled to Django's naming convention). Either way should work.

If you want to keep the logic in the regular file backend you'll probably have to redo the hash. That's not a big deal, but wouldn't you have to fetch the original file back from S3? The Storage API just passes you the bytes on a file-by-file basis, right, you have no way to access an arbitrary static file on disk? That wouldn't be ideal. If that's true it would probably be better to modify the manifest backend and access its global mapping of hashes. (Or I suppose you could create your own global mapping in the file backend for this purpose.)

@etianen
Copy link
Owner

etianen commented Aug 14, 2017 via email

@marfire
Copy link
Author

marfire commented Aug 18, 2017

One straightforward idea is to just change the cache time in post_process(). I believe it's the case that all files saved before post_process() is called will be unversioned, and all files that are saved during post_process() will be versioned. If true, it would look something like this pseudocode:

class S3Storage(Storage):
    def __init__():
        ...
        self.max_age = self.settings.MAX_AGE_SECONDS

    def _object_put_params(self, name):
        params = {"CacheControl": "max-age={}".format(self.max_age)}
        ...

class ManifestStaticS3Storage(ManifestFilesMixin, StaticS3Storage):
    def post_process(...):
        # At this point all the unversioned files have already been saved with
        # the correct, short cache time.
        self.max_age = forever

        # This will save all the versioned files with the long cache time.
        super().post_process(...)

        # Set it back. May not be necessary.
        self.max_age = self.settings.MAX_AGE_SECONDS

If that doesn't work for some reason, my other idea would be to simply update the metadata in post_process(). After calling super().post_process() you'll have access to the the versioned file mapping in self.hashed_files. So you know which files are versioned at that point and can just iterate through them and update the cache headers (as you're doing in sync_meta).

@etianen
Copy link
Owner

etianen commented Aug 18, 2017 via email

@etianen
Copy link
Owner

etianen commented Aug 22, 2017

Good call. Here's the actual implementation:

def post_process(self, *args, **kwargs):

@etianen etianen closed this as completed Aug 22, 2017
@etianen
Copy link
Owner

etianen commented Aug 22, 2017

v0.12.0 is out!

@marfire
Copy link
Author

marfire commented Aug 22, 2017

Great, thanks for taking this on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants