Large files get re-uploaded on every deploy #8

Open
krosaen opened this issue Jun 30, 2016 · 2 comments

krosaen commented Jun 30, 2016

When deploying a Lektor site that hosts a podcast, every mp3 file is re-uploaded on each deploy:

$ AWS_PROFILE=lektor-deploy lektor deploy
Deploying to S3
  Build cache: /Users/krosaen/Library/Caches/Lektor/builds/47d71c9cd6c3bea470ee10d73758dde5
  Target: s3://brosaen.com
  adding podcasts/pistons/27/.__transQy8kFO
  adding podcasts/pistons/27/brosaen-episode-27.mp3
  adding podcasts/pistons/27/index.html
  updating index.html
  updating podcasts/pistons.xml
  updating podcasts/pistons/1/brosaen-episode-1.mp3
  updating podcasts/pistons/1/index.html
  updating podcasts/pistons/10/brosaen-episode-10.mp3
  updating podcasts/pistons/10/index.html
  updating podcasts/pistons/11/brosaen-episode-11.mp3
  updating podcasts/pistons/11/index.html
  updating podcasts/pistons/12/brosaen-episode-12.mp3
...

I'm guessing this is due to, as the code states,

MD5s can be stored in the 'ETag' field of S3 objects. The
field doesn't store the MD5 in two cases: objects uploaded
with Multipart Upload and objects encrypted with SSE-C or
SSE-KMS. In those cases, we'll just return an empty string.

So perhaps this is a feature request: is there a way to set the ETag field for larger objects?
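
For reference, here is a minimal sketch of the kind of ETag comparison that quoted comment describes. It is not the plugin's actual code; the function names `etag_as_md5`, `local_md5`, and `is_unchanged` are made up for illustration. It assumes an ETag is only usable when it looks like a plain 32-character hex digest, and treats anything else (for example a multipart-style value with a `-N` suffix) as unknown, so the file gets uploaded again.

```python
import hashlib
import re

# A plain MD5 hex digest is exactly 32 hex characters.
_MD5_HEX = re.compile(r"[0-9a-f]{32}")


def etag_as_md5(etag):
    """Return the ETag as an MD5 hex digest, or '' if it does not look like one."""
    value = etag.strip('"').lower()
    return value if _MD5_HEX.fullmatch(value) else ""


def local_md5(path, chunk_size=1024 * 1024):
    """Hash a local file in chunks so large mp3s are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, etag):
    """True only when the remote ETag is a usable MD5 and matches the local file."""
    remote = etag_as_md5(etag)
    return bool(remote) and remote == local_md5(path)
```

With this shape, an object whose ETag is not a plain MD5 never compares as unchanged, which would produce the "updating" lines in the log above on every deploy.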

Owner

spenczar commented Jul 1, 2016

Amazon's docs say that, for multipart uploads, the ETag "will not necessarily be an MD5 hash of the object data." They don't actually say what it will be, just that we can't count on it being the MD5.

So, we don't have a confident way to know that the mp3s have not changed.

One option is to just look at file size and last-modified time for files that don't have usable ETags. This seems a little dangerous because the file contents could change without changing the size and we wouldn't know to push the update. Maybe we could disable it by default, but let users enable it through config if they understand the risk.
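
A rough sketch of what that opt-in fallback could look like, under some assumptions: the function name `probably_unchanged` and the `allow_size_mtime_check` flag are hypothetical, and the remote size and last-modified time are assumed to have been fetched already, with the time as a timezone-aware `datetime`.

```python
import os
from datetime import datetime, timezone


def probably_unchanged(path, remote_size, remote_last_modified,
                       allow_size_mtime_check=False):
    """Heuristic for objects without a usable ETag.

    Returns True only if the user opted in, the sizes match, and the local
    file was not modified after the remote copy was last written.
    """
    if not allow_size_mtime_check:
        # Off by default: without a hash we cannot prove the content is the same.
        return False
    stat = os.stat(path)
    if stat.st_size != remote_size:
        return False
    local_mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return local_mtime <= remote_last_modified
```

The risk described above still applies: a file edited in place to the same size, with an older or reset modification time, would be skipped.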

Author

krosaen commented Jul 12, 2016

Thanks for the additional details, Spencer, that makes sense. I can certainly give something like what you suggest a try if redeploying the mp3s really becomes a hassle.

spenczar changed the title from "cache comparison for mp3 files not working (re-uploads all mp3s each deploy)" to "Large files get re-uploaded on every deploy" on Jul 12, 2016