Changes to caching #62
Comments
Thanks for putting this together. There are some good points.
I don't think that the default file storage has a default expiration time,
since it just stores files on disk. The HTTP cache headers are up to
whatever web server is used to cache these files. S3 is different in that
it requires you to choose the cache headers as you upload the file, rather
than having it be configurable on a per-bucket basis.
I think that, starting from scratch, a conservative default would be best.
However, the default expiration time has been 1 year since forever, and
changing it now will affect existing projects. Given that S3 hardcodes this
setting on each uploaded file, this would mean existing projects that
updated naively would end up with some files having 1 year expiry, and
other files having 1 hour expiry. When they notice this, changing the
setting back in their project, and re-syncing the changes across the entire
bucket, can be a very time-consuming operation.
I guess a way to do this in a backwards-compatible way would be to have a
warning emitted if the setting is not explicitly set, or to make a new
storage class with a different default. The former is going to be quite
noisy on the console, and the latter is an ongoing source of maintenance
and confusion for people.
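For illustration, the "warn when the setting is not explicitly set" option could look roughly like the sketch below. It is not taken from django-s3-storage: the function name, the `LEGACY_MAX_AGE` constant, and the use of a plain dict in place of Django settings are all assumptions made for the example.

```python
import warnings

# Assumption: the long-standing default of 1 year, expressed in seconds.
LEGACY_MAX_AGE = 60 * 60 * 24 * 365


def resolve_static_max_age(project_settings):
    """Return the static max-age, warning when the project relies on the default.

    `project_settings` is a plain dict standing in for Django settings.
    """
    try:
        return project_settings["AWS_S3_MAX_AGE_SECONDS_STATIC"]
    except KeyError:
        warnings.warn(
            "AWS_S3_MAX_AGE_SECONDS_STATIC is not set; falling back to 1 year. "
            "Set it explicitly to silence this warning.",
            UserWarning,
            stacklevel=2,
        )
        return LEGACY_MAX_AGE
```

A project that sets the value explicitly never sees the warning, which is what makes this route backwards compatible, at the cost of the console noise mentioned above.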
So, while I agree this would have been a better way to go from scratch, I
think it's a bit damaging to make the change now.
Fundamentally, switching to the manifest storage fixes all these problems.
On 6 August 2017 at 04:25, Kevin Christopher Henry wrote:
I'd like to suggest some changes to the way django-s3-storage handles the
caching of static files.
The first issue is that the default expiration time—1 year—is too long.
That default applies not just to the versioned ManifestStaticS3Storage
files but to the regular StaticS3Storage files as well. By comparison,
django-storages doesn't set the header at all, by default, and whitenoise
uses <http://whitenoise.evans.io/en/stable/django.html#WHITENOISE_MAX_AGE>
either 0 or 60 seconds, depending on DEBUG.
It's important to set a conservative default here because once a user's
browser has cached a file there's no way for the developer to force it to
refresh. So someone who starts with the default value and then decides to
change it later will find that it's too late for some users.
The second issue is that ManifestStaticS3Storage doesn't distinguish
between the versioned (e.g. myfile.123456.css) and the unversioned (
myfile.css) files when it comes to caching. The way this should work is
that the versioned files get cached forever, while the unversioned files
get cached for a relatively short time. whitenoise gets this right
<https://github.com/evansd/whitenoise/blob/8c16bbccfcd771d0f434d11a6609c95a93323102/whitenoise/base.py#L206>,
using the filename to figure out if it's a versioned file or not and
setting the expiration time accordingly.
So my suggestions are:
1. Lower the default expiration time (perhaps to one hour, to match
the file storage default, or lower).
2. With ManifestStaticS3Storage, set the expiration time differently
depending on whether or not it's a versioned file. If it is, set it to
cache forever. If it isn't, use the AWS_S3_MAX_AGE_SECONDS_STATIC
setting.
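A rough sketch of suggestion 2, choosing the header from the filename alone. The regex assumes versioned names carry a 12-character hex suffix before the extension, and `cache_control_for` and `FOREVER` are hypothetical names, not part of django-s3-storage.

```python
import re

# Assumption: versioned names look like "myfile.<12 hex chars>.css".
VERSIONED_NAME = re.compile(r"^.+\.[0-9a-f]{12}\.[^./]+$")

# One year in seconds, a common stand-in for "cache forever".
FOREVER = 60 * 60 * 24 * 365


def cache_control_for(name, static_max_age):
    """Return a Cache-Control value: long for versioned names, short otherwise."""
    if VERSIONED_NAME.match(name):
        return "max-age={}".format(FOREVER)
    return "max-age={}".format(static_max_age)
```

Here `static_max_age` would come from AWS_S3_MAX_AGE_SECONDS_STATIC. A name-only check can misclassify a file that merely happens to contain a hex-looking segment, so a stricter test may be preferable in practice.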
|
About the default expiration time: I think you're probably right, but I'm
going to ponder it some more while I finish my current deadline. :)
It's interesting about setting different cache expiration for hashed files.
Yes, you're right, sometimes you simply cannot link to the hashed file and
solve all the issues! I guess the question here is how do we test if a
given file is hashed or not? The only way I can think of is a heuristic,
where we consider a file hashed if:
- Its name matches the regex .*?\.([a-f0-9]{12})\..*?
- The md5 of the file content matches the hash captured by the regex.
This shouldn't slow down collectstatic too much.
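The heuristic could be sketched as follows. It assumes the hash in the filename is the first 12 hex characters of the MD5 of the file content; `looks_hashed` is a hypothetical helper name.

```python
import hashlib
import re

# The filename pattern from the heuristic, with a capture group for the hash.
HASHED_NAME = re.compile(r".*?\.([a-f0-9]{12})\..*?")


def looks_hashed(name, content):
    """Return True if `name` embeds the truncated MD5 of `content` (bytes)."""
    match = HASHED_NAME.fullmatch(name)
    if match is None:
        return False
    # Assumption: the embedded hash is the first 12 hex digits of the MD5.
    digest = hashlib.md5(content).hexdigest()[:12]
    return match.group(1) == digest
```

Both conditions are checked: the regex filters out most names cheaply, and the MD5 comparison guards against names that only look hashed.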
On 8 August 2017 at 00:42, Kevin Christopher Henry wrote:
Thanks for your response.
Sorry if I wasn't clear about "default expiration time". I was talking
about the AWS_S3_MAX_AGE_SECONDS_STATIC setting, which defaults to one
year, not anything to do with the filesystem. When I mentioned matching the
"file storage default" I was talking about the AWS_S3_MAX_AGE_SECONDS
setting in the file storage backend, which defaults to one hour.
Backwards compatibility is certainly a fair point. I agree that it would
be bad to have some files with the shorter default and some with the longer
default. Fortunately, there's an easy solution to that: people can
explicitly set AWS_S3_MAX_AGE_SECONDS_STATIC to one year, if they like,
or they can run s3_sync_meta to adopt the new default. By contrast,
there's no good way to recover from setting a cache time that's too long
(with regard to users that have already cached the file). I think the issue
is serious enough to warrant a backwards-incompatible change, but that's up
to you.
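As a settings sketch, pinning the old behaviour explicitly would be a single line in the project's settings module (the value is simply one year expressed in seconds):

```python
# settings.py (sketch): keep the previous one-year cache time for static files.
AWS_S3_MAX_AGE_SECONDS_STATIC = 60 * 60 * 24 * 365
```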
You may have missed my second point, which is about how the manifest
storage implementation here has a more serious problem. The issue is that
unversioned (myfile.css) and versioned (myfile.123456.css) files need to
be cached differently. Just because you're using a manifest backend doesn't
mean that the only references to your static files come from your
dynamically generated pages. (To be clear, I'm not talking about the
difference between the static and manifest backends, I'm talking about the
two different kinds of file generated by the manifest backend itself.)
Let's take robots.txt as an example. After collectstatic is run (with the
manifest backend) you'll have two files uploaded, robots.txt and
robots.123456.txt. If for some reason you generate an internal reference
to that file then, sure, it will be robots.123456.txt and the long cache
time will be appropriate. But browsers are looking for robots.txt, and a
long cache time there would be inappropriate.
That's a trivial example, but there are plenty of other scenarios in which
your static files need to be available by their real file name (say, a
mobile app that accesses static files directly rather than by fetching HTML
pages first). That's why I suggest you adopt the approach of whitenoise
(linked above), where the versioned files are cached forever, and the
unversioned files are cached for a user-configurable amount of time.
|
I've pushed up a change to make the default expiration time for static
files 1 hour.
I've also had a look at treating hashed names differently, but it's
unfortunately not straightforward. Unlike whitenoise, django-s3-storage has
to decide the cache time when the file is being saved, as opposed to when
it is being retrieved. This means that the internal mapping of names to
hashed names doesn't exist yet, so the trick of asking the storage to
regenerate the hash won't work.
The only thing it has access to is the filename. This means that it would
have to download the file from S3 and regenerate the hash, which is going
to double the time taken to run collectstatic. I'm not sure that this is a
good idea.
On 10 August 2017 at 00:09, Kevin Christopher Henry wrote:
If you decide not to change the expiration time, you might want to add a
note in the documentation mentioning that it's one year for
backwards-compatibility reasons, and suggest that users set it explicitly.
Regarding the manifest backend, you probably don't even need the regex.
Looking at whitenoise's implementation
<https://github.com/evansd/whitenoise/blob/bf292167eed07d901a2649ba97aaf11f117ee6ab/whitenoise/middleware.py#L107>,
they just see if there's any suffix separated by periods, and if there is
they redo the versioning and compare the filenames. Or you could do the
stricter regex to avoid doing some needless work (at the cost of being more
closely coupled to Django's naming convention). Either way should work.
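A sketch of the "strip the suffix, redo the versioning, compare" check described above. This is not whitenoise's code: `hashed_name` is a stand-in for the storage's own versioning (assumed here to insert 12 hex characters of the MD5), and `read_original` is a hypothetical callable that returns the bytes of an unversioned file, or None if it does not exist.

```python
import hashlib
import posixpath


def hashed_name(name, content):
    """Stand-in versioning: insert 12 hex chars of the MD5 before the extension."""
    root, ext = posixpath.splitext(name)
    digest = hashlib.md5(content).hexdigest()[:12]
    return "{}.{}{}".format(root, digest, ext)


def is_versioned(name, read_original):
    """Return True if `name` is the versioned form of an existing original file."""
    root, ext = posixpath.splitext(name)
    original_root, candidate_hash = posixpath.splitext(root)
    if not candidate_hash:
        return False  # no extra dotted suffix, so it cannot be versioned
    original = original_root + ext
    content = read_original(original)
    if content is None:
        return False  # e.g. "jquery.min.js" with no "jquery.js" alongside it
    # Redo the versioning and compare with the name we were given.
    return hashed_name(original, content) == name
```

Because the comparison is against a freshly computed name, no regex is needed, but the original file's bytes must be available at the point of the check.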
If you want to keep the logic in the regular file backend you'll probably
have to redo the hash. That's not a big deal, but wouldn't you have to
fetch the original file back from S3? The Storage API just passes you the
bytes on a file-by-file basis, right, you have no way to access an
arbitrary static file on disk? That wouldn't be ideal. If that's true it
would probably be better to modify the manifest backend and access its
global mapping of hashes. (Or I suppose you could create your own global
mapping in the file backend for this purpose.)
|
That's a pretty good idea. I'll investigate that next week.
On 18 August 2017 at 09:15, Kevin Christopher Henry wrote:
One straightforward idea is to just change the cache time in
post_process(). I believe it's the case that all files saved before
post_process() is called will be unversioned, and all files that are
saved during post_process() will be versioned. If true, it would look
something like this pseudocode:
from django.contrib.staticfiles.storage import ManifestFilesMixin
from django.core.files.storage import Storage

class S3Storage(Storage):
    def __init__(self, **kwargs):
        ...
        self.max_age = self.settings.MAX_AGE_SECONDS

    def _object_put_params(self, name):
        params = {"CacheControl": "max-age={}".format(self.max_age)}
        ...

class ManifestStaticS3Storage(ManifestFilesMixin, StaticS3Storage):
    def post_process(self, *args, **kwargs):
        # At this point all the unversioned files have already been saved with
        # the correct, short cache time.
        self.max_age = forever  # placeholder for a very long max-age
        try:
            # post_process() is a generator, so its results must be yielded.
            # This will save all the versioned files with the long cache time.
            yield from super().post_process(*args, **kwargs)
        finally:
            # Set it back. May not be necessary.
            self.max_age = self.settings.MAX_AGE_SECONDS
If that doesn't work for some reason, my other idea would be to simply
update the metadata in post_process(). After calling
super().post_process() you'll have access to the versioned file
mapping in self.hashed_files. So you know which files are versioned at
that point and can just iterate through them and update the cache headers
(as you're doing in sync_meta).
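A sketch of that second idea, rewriting the headers on objects that are already uploaded. It assumes a boto3-style S3 client and uses only its `head_object` and `copy_object` calls; `set_long_cache` is a hypothetical helper, and how the keys are derived from self.hashed_files (including any key prefix) is left out.

```python
def set_long_cache(client, bucket, keys, max_age):
    """Rewrite Cache-Control on existing S3 objects by copying each onto itself."""
    for key in keys:
        # Replacing metadata discards the old headers, so read the content
        # type first and pass it back in.
        head = client.head_object(Bucket=bucket, Key=key)
        client.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            MetadataDirective="REPLACE",
            CacheControl="max-age={}".format(max_age),
            ContentType=head.get("ContentType", "binary/octet-stream"),
        )
```

This costs two extra requests per versioned file but avoids downloading any content. A fuller version would also carry over other headers, such as Content-Encoding, that the REPLACE directive drops.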
|
Good call. Here's the actual implementation:
|
v0.12.0 is out! |
Great, thanks for taking this on. |