Only copy HTML build output to web servers #6326

humitos · 2019-10-24T11:04:54Z

We do not need to copy localmedia, search, pdf and epub anymore because they are uploaded to Cloud storage and served from there directly via django-storages.

This PR removes this code completely because we no longer need it. However, it may still be needed for local development while we still use readthedocs.builds.syncers.LocalSyncer (it won't be needed if we decide to use #6295, though).

Another possibility is to modify readthedocs.builds.syncers.RemotePuller (used in production) to skip these paths. That would keep LocalSyncer working in development.

Our corporate site is going to start using Cloud storage soon, so this PR makes sense in that context as well. However, that should happen in Corporate before merging this PR here.

agjohnson · 2019-10-24T15:25:58Z

There are some test failures here to address before review maybe.

So far the change looks good! I won't sign off on it just yet as I think @davidfischer can provide the best feedback around storage and our asset syncing. I'm excited to get closer to not having web disks around though.

stsewd

However, it may still be needed for local development while we still use readthedocs.builds.syncers.LocalSyncer (it won't be needed if we decide to use #6295, though).

We can just use a local file system storage backend, I think we already do actually.

There are more things that can be removed. Actually, all parameters that are not html can be removed from those functions.

agjohnson · 2019-10-25T17:22:58Z

Wouldn't that break file syncing on the commercial site, where we are not yet serving files from remote storage?

humitos · 2019-10-28T10:07:19Z

Wouldn't that break file syncing on the commercial site, where we are not yet serving files from remote storage?

I thought that we were serving files from remote storage whenever RTD_BUILD_MEDIA_STORAGE was defined. However, I just saw this conditional:

readthedocs.org/readthedocs/projects/views/public.py

Line 304 in f41a84d

if settings.DEFAULT_PRIVACY_LEVEL == 'public' or settings.DEBUG:

I think we can remove that condition since this URL (storage.url(..)) will contain the AccessKeyId with an expiration for private buckets:

readthedocs.org/readthedocs/projects/views/public.py

Lines 311 to 312 in f41a84d

    
    if storage.exists(storage_path):
        return HttpResponseRedirect(storage.url(storage_path))

So, I think there is no need to keep serving files from disk in corporate.
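The redirect-to-storage behaviour discussed in this comment can be sketched without Django as follows. This is only an illustration: `FakeStorage` and `download_location` are hypothetical names, and the stub returns a plain URL where a real private-bucket backend would be expected to return a signed, expiring one.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStorage:
    """Hypothetical stand-in exposing the two storage methods used above."""

    files: set = field(default_factory=set)
    base_url: str = "https://storage.example.com/"

    def exists(self, path):
        return path in self.files

    def url(self, path):
        # Assumption: a real private-bucket backend would add signing
        # parameters (access key, expiration) here. This stub does not.
        return self.base_url + path


def download_location(storage, storage_path):
    """Return the URL to redirect to, or None if the artifact is missing."""
    if storage.exists(storage_path):
        return storage.url(storage_path)
    return None
```

With this shape the view needs no privacy-level check of its own: whether the returned URL is public or signed is decided by the storage backend.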

humitos · 2019-10-28T10:09:01Z

There are more things that can be removed. Actually, all parameters that are not html can be removed from those functions.

We can't remove them all because they are used to call an API endpoint that sets the has_* fields. If you see that I forgot to remove any in particular, please point them out to me.

davidfischer

So, I think there is no need to keep serving files from disk in corporate.

If we feel comfortable serving from S3 for corporate, that's fine. I figured by just using S3 for search at first, we could ease into more "storage" features on corporate. I'm fine speeding that process up.

PDFs and ePubs are currently served from disk but we could change that and that was my eventual plan. I'd like to get the settings and setup between community and corporate as close as possible (using cloud storage, serving from storage, etc.). We can't merge this until PDFs and ePubs are served from storage AND we have copied all the existing ones into storage.

davidfischer · 2019-10-28T22:54:58Z

readthedocs/projects/tasks.py

@@ -1207,61 +1200,6 @@ def move_files(
        target = version.project.rtd_build_path(version.slug)
        Syncer.copy(from_path, target, host=hostname)

-    if search:


Overall, I'm fine removing this (large) block of code although I have no idea if there are any private installs using it. We probably want to make that clear in the changelog.

Eventually however, it's likely that the entire move_files function can be removed as everything will be done with storage. I like this solution because it doesn't treat HTML as a special case separate from PDFs or ePubs which has been the source of some problems. We will also then be able to remove the syncers entirely.
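As a rough sketch of what an HTML-only `move_files` could look like after this change: the signature below is simplified and `RecordingSyncer` is a hypothetical test double, not the project's code. The non-HTML flags are kept only because callers may still pass them.

```python
class RecordingSyncer:
    """Hypothetical syncer that records copy calls instead of copying."""

    calls = []

    @classmethod
    def copy(cls, path, target, host=None):
        cls.calls.append((path, target, host))


def move_files(from_path, target, hostname, html=False, localmedia=False,
               search=False, pdf=False, epub=False, syncer=RecordingSyncer):
    """Copy only the HTML output to the web servers."""
    if html:
        syncer.copy(from_path, target, host=hostname)
    # Assumption: localmedia, search, pdf and epub were already uploaded
    # to storage by the build, so nothing is synced for them here.
```

If `move_files` is eventually removed altogether, as suggested above, the HTML branch would go the same way as the others.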

humitos · 2019-10-29T10:57:11Z

PDFs and ePubs are currently served from disk but we could change that and that was my eventual plan.

Are you talking about corporate, community or both? My guess is that we are serving PDFs and ePubs in community from storage and in corporate from disk.

humitos · 2019-10-29T10:59:24Z

Some good points were raised here:

are we OK serving artifacts (pdf, epub, htmlzip) from S3 with AccessKey and Expiration parameters?
we need to copy all the artifacts from media to S3 before merging/deploying this PR into corporate
make very explicit in the changelog the backward incompatibility that we are adding here
development environment needs to adopt the change (LocalSyncer will stop working)

I'm 👍 going in this direction.

humitos · 2019-10-29T11:12:53Z

Hrm... Another idea, maybe easier and that does not need code changes, could be to use the NullSyncer in community once El Proxito is deployed. That way, nothing will be copied to webs.
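The no-code-change idea can be illustrated with a minimal sketch in which the syncer is chosen by configuration. The class bodies and the `SYNCERS` mapping below are assumptions for illustration only and are not the actual classes in readthedocs.builds.syncers.

```python
import shutil


class CopyingSyncer:
    """Hypothetical syncer that copies a build directory to its target."""

    @classmethod
    def copy(cls, path, target, host=None):
        shutil.copytree(path, target, dirs_exist_ok=True)


class NullSyncer:
    """Syncer that intentionally does nothing."""

    @classmethod
    def copy(cls, path, target, host=None):
        pass


# Assumption: the active syncer is picked by name from a setting.
SYNCERS = {"copy": CopyingSyncer, "null": NullSyncer}


def sync_html(syncer_name, source, target):
    SYNCERS[syncer_name].copy(source, target)
```

Switching the configured name to `"null"` stops all copying while leaving every call site unchanged.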

davidfischer · 2019-10-29T16:04:08Z

Are you talking about corporate, community or both? My guess is that we are serving PDFs and ePubs in community from storage and in corporate from disk.

Just corporate. On community, ePubs and PDFs are served from cloud storage.

davidfischer · 2019-10-30T16:50:49Z

are we OK serving artifacts (pdf, epub, htmlzip) from S3 with AccessKey and Expiration parameters?

For now, yes. Longer term, I think we'll reverse proxy to storage.

humitos · 2019-11-25T10:40:37Z

There are some test failures here to address before review maybe.

@agjohnson I updated the tests and all of them should be passing now.

ericholscher

This looks good to me once we have downloads serving properly from .com 👍

ericholscher · 2019-11-25T16:40:46Z

Of note, I'd like to hold off merging this until after the deploy when we start serving via proxito. I like having a good deprecation path, instead of turning off the old thing in the same deploy when we turn on the new thing.

Especially in this case, once this code is deployed, it's going to be almost impossible to revert, because we won't have the files on disk, and will have to try and copy them back down from cloud hosting.

humitos · 2019-12-04T10:08:45Z

This looks good to me once we have downloads serving properly from .com +1

This was deployed yesterday and it's working.

We can plan to merge this PR and deploy on next release.

humitos · 2020-01-14T13:09:25Z

From our talk in chat:

probably worth holding off, and doing it all in the same release as we remove the HTML syncing, and noting it in the changelog
it will use the storage backends
and sync properly
it will only break when the builds & webs are different filesystems

agjohnson · 2020-01-27T21:32:21Z

To clarify where we are with this PR: the worry here is about backwards compatibility and we will be timing this at the same time as removing HTML syncing, to make one tidy backwards incompatible release.

stale · 2020-03-12T22:20:51Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

humitos · 2020-03-14T01:00:59Z

We are probably close to serving all the docs via El Proxito and removing all this code completely at this point. I'm not sure if it's worth keeping this PR open or even merging it before #6535.

ericholscher · 2020-04-16T18:09:56Z

I think we can close this, and take a larger approach addressing a full removal of this code at some point.

Only copy HTML build output to web servers

122a5e8

humitos added the Needed: design decision A core team decision is required label Oct 24, 2019

humitos requested review from davidfischer and a team October 24, 2019 11:04

agjohnson added the PR: work in progress Pull request is not ready for full review label Oct 24, 2019

stsewd reviewed Oct 24, 2019

davidfischer reviewed Oct 28, 2019

humitos mentioned this pull request Nov 25, 2019

Serve non-html files from nginx (X-Accel-Redirect) #6404

Merged

humitos added 2 commits November 25, 2019 11:25

Merge remote-tracking branch 'origin/master' into humitos/only-copy-h…

7a6f11f

…tml-to-webs

Update tests to check that Sync.copy is not called on non-html

1749f59

humitos removed Needed: design decision A core team decision is required PR: work in progress Pull request is not ready for full review labels Nov 25, 2019

humitos requested review from davidfischer and agjohnson November 25, 2019 10:40

ericholscher approved these changes Nov 25, 2019

ericholscher added the Status: blocked Issue is blocked on another issue label Nov 25, 2019

humitos removed the Status: blocked Issue is blocked on another issue label Dec 4, 2019

humitos mentioned this pull request Jan 18, 2020

Remove code replaced by El Proxito and stateless servers #6535

Merged

humitos mentioned this pull request Jan 27, 2020

Remove webs from broadcast list #6592

Closed

stale bot added the Status: stale Issue will be considered inactive soon label Mar 12, 2020

stale bot removed the Status: stale Issue will be considered inactive soon label Mar 14, 2020

ericholscher closed this Apr 16, 2020

stsewd deleted the humitos/only-copy-html-to-webs branch April 16, 2020 18:10
