Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 Rebuild Caching #1150

Closed

Conversation

rogermparent
Copy link
Contributor

This is another pretty fat commit, and with less of an excuse than the previous one. However, this one trades theoretical messiness for functionality!

The previous solution would upload public to the prefix ${s3Prefix}/public/, but I believe the website is served from ${s3Prefix}. This solution fixes that and changes .cache to go into the prefix ${s3Prefix}-cache so it doesn't get picked up in "folder" queries for the root prefix.

Also, the pruning functionality that used to happen in the deploy script by parsing the XML sitemap is now moved to a more "native" solution in Gatsby's onPostBuild hook.

I've tested this locally and successfully got rebuilds with no image processing after running gatsby clean and then downloaded a previous cache with the deploy script. (It even handled a partial cache after I unthrottled the page count!)

dmpetrov and others added 29 commits April 9, 2020 00:33
There was double remote name, so it didn't work.
fix synopsis sections of list, update cmd refs
* april draft

* add new DVC features

* revisions

* comments link

* add link in new release
* readme/contributing: update after migration to Gatsby

* README: fix copyright, include proper links, refactor sections

* docs contib guide: update, fix linters a litle bit

* contrib: update and improve writing a blog post section

* readme/contrib: address PRs review

* readme/contrib: address second review
const publicDirName = 'public'
const publicDirEntry = [publicDirName, '/']
const cacheDirEntry = ['.cache', '-cache/']
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tuple syntax can be changed to something like objects if we're not all into it.

pavelgrinchenko and others added 14 commits April 15, 2020 07:30
…iterative#1154)

already replace original href with redirect if it exists

Co-authored-by: Pavel Grinchenko <[email protected]>
This way the code doesn't depend on the source file's relative directory.
I'm not sure if there's any rate limits in the stack that will cause this to break, but on first blush this looks like a quick and easy improvement.

The cached folders are also now accessed by a shared array that is hard-coded for now. This both makes them easier to work with uniformly and open to sourcing directories from elsewhere in the future.
I believe the crux of what keeps this PR from working is that `public` and
`.cache` the folders were being uploaded to `public` and `,cache` the S3
prefixes when the site is deployed from the root prefix. This change alters how
the cache directories are stored: now as tuple pairs with the local directory in
slot 0 and a string that's appended to the base prefix in slot 2. `public` work
with the root prefix and the `.cache` folder works with the root prefix with
`-cache` appended to it such that it's another "folder" that doesn't get
downloaded with the root prefix.

I find the tuple syntax more concise, but I'm open to changing it if we prefer
objects for better grokking at a glance.

This change also moves the responsibility of cache culling into an `onPostBuild`
Gatsby hook. This means we don't have to parse the sitemap XML after the fact,
and as such the XML parser used to do it is removed. Since this replaces
`cleanUpPageData`, it also invalidates the whole `page-data-utils` file which
has been removed.
@rogermparent
Copy link
Contributor Author

I rebased my fork's branch onto master earlier today without thinking of the ramifications on the PR. I think my changes will have to be opened up on a new PR to master.

@shcheklein shcheklein temporarily deployed to dvc-landing-rebuilds-ca-075zz5 April 16, 2020 01:27 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants