Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only copy HTML build output to web servers #6326

Closed
wants to merge 3 commits into from

Conversation

humitos
Copy link
Member

@humitos humitos commented Oct 24, 2019

We do not need to copy localmedia, search, pdf and epub anymore because they are uploaded to Cloud storage and served from there directly via django-storages.

This PR removes this code completely because we don't really need anymore. Although, this may still needed for local development while we still use readthedocs.builds.syncers.LocalSyncer (won't be needed anymore if we decide to use #6295, though)

Another possibility here is to modify readthedocs.builds.syncers.RemotePuller (used in production) to skip these paths. This will keep LocalSyncer to continue working on development.

Our corporate site is going to start using Cloud storage soon, so this PR also makes sense in that context as well. Although, that should happen in Corporate before merging this PR here.

We do not need to copy localmedia, search, pdf and epub anymore
because they are uploaded to Cloud storage and served from there
directly via django-storages.
@humitos humitos added the Needed: design decision A core team decision is required label Oct 24, 2019
@humitos humitos requested review from davidfischer and a team October 24, 2019 11:04
@agjohnson
Copy link
Contributor

There are some test failures here to address before review maybe.

So far the change looks good! I won't sign off on it just yet as I think @davidfischer can provide the best feedback around storage and our asset syncing. I'm excited to get closer to not having web disks around though.

@agjohnson agjohnson added the PR: work in progress Pull request is not ready for full review label Oct 24, 2019
Copy link
Member

@stsewd stsewd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, this may still needed for local development while we still use readthedocs.builds.syncers.LocalSyncer (won't be needed anymore if we decide to use #6295, though)

We can just use a local file system storage backend, I think we already do actually.

There are more things that can be removed. Actually, all parameters that are not html can be removed from those functions.

@agjohnson
Copy link
Contributor

Wouldn't that break file syncing on the commercial site, where we are not yet serving files from remote storage?

@humitos
Copy link
Member Author

humitos commented Oct 28, 2019

Wouldn't that break file syncing on the commercial site, where we are not yet serving files from remote storage?

I thought that we were serving files from remote storage if the RTD_BUILD_MEDIA_STORAGE was defined. Although, I just saw this conditional

if settings.DEFAULT_PRIVACY_LEVEL == 'public' or settings.DEBUG:

I think we can remove that condition since this URL (storage.url(..)) will contain the AccessKeyId with an expiration for private buckets:

if storage.exists(storage_path):
return HttpResponseRedirect(storage.url(storage_path))

So, I think there is no need to keep serving files from disk in corporate.

@humitos
Copy link
Member Author

humitos commented Oct 28, 2019

There are more things that can be removed. Actually, all parameters that are not html can be removed from those functions.

We can't remove them all because they are used to call an API endpoint that set has_* fields. If you saw that I forget to remove some in particular, please point them out to me.

Copy link
Contributor

@davidfischer davidfischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think there is no need to keep serving files from disk in corporate.

If we feel comfortable serving from S3 for corporate, that's fine. I figured by just using S3 for search at first, we could ease into more "storage" features on corporate. I'm fine speeding that process up.

PDFs and ePubs are currently served from disk but we could change that and that was my eventual plan. I'd like to get the settings and setup between community and corporate as close as possible (using cloud storage, serving from storage, etc.). We can't merge this until PDFs and ePubs are served from storage AND we have copied all the existing ones into storage.

@@ -1207,61 +1200,6 @@ def move_files(
target = version.project.rtd_build_path(version.slug)
Syncer.copy(from_path, target, host=hostname)

if search:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I'm fine removing this (large) block of code although I have no idea if there are any private installs using it. We probably want to make that clear in the changelog.

Eventually however, it's likely that the entire move_files function can be removed as everything will be done with storage. I like this solution because it doesn't treat HTML as a special case separate from PDFs or ePubs which has been the source of some problems. We will also then be able to remove the syncers entirely.

@humitos
Copy link
Member Author

humitos commented Oct 29, 2019

PDFs and ePubs are currently served from disk but we could change that and that was my eventual plan.

Are you talking about corporate, community or both? My guess is that we are serving PDFs and ePubs in community from storage and in corporate from disk.

@humitos
Copy link
Member Author

humitos commented Oct 29, 2019

Some good points were raised here:

  • are we OK serving artifacts (pdf, epub, htmlzip) from S3 with AccessKey and Expiration parameters?
  • we need to copy all the artifacts from media to S3 before merging/deploying this PR into corporate
  • make very explicit in the changelog the backward incompatibility that we are adding here
  • development environment needs to adopt the change (LocalSyncer will stop working)

I'm 👍 going in this direction.

@humitos
Copy link
Member Author

humitos commented Oct 29, 2019

Hrm... Another idea, maybe easier and that does not need code changes, could be to use the NullSyncer in community once El Proxito is deployed. That way, nothing will be copied to webs.

@davidfischer
Copy link
Contributor

Are you talking about corporate, community or both? My guess is that we are serving PDFs and ePubs in community from storage and in corporate from disk.

Just corporate. On community, ePubs and PDFs are served from cloud storage.

@davidfischer
Copy link
Contributor

are we OK serving artifacts (pdf, epub, htmlzip) from S3 with AccessKey and Expiration parameters?

For now, yes. Longer term, I think we'll reverse proxy to storage.

@humitos
Copy link
Member Author

humitos commented Nov 25, 2019

There are some test failures here to address before review maybe.

@agjohnson I updated the tests and all of them should be passing now.

@humitos humitos removed Needed: design decision A core team decision is required PR: work in progress Pull request is not ready for full review labels Nov 25, 2019
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me once we have downloads serving properly from .com 👍

@ericholscher
Copy link
Member

Of note, I'd like to hold off merging this until after the deploy when we start serving via proxito. I like having a good deprecation path, instead of turning off the old thing in the same deploy when we turn on the new thing.

Especially in this case, once this code is deployed, it's going to be almost impossible to revert, because we won't have the files on disk, and will have to try and copy them back down from cloud hosting.

@ericholscher ericholscher added the Status: blocked Issue is blocked on another issue label Nov 25, 2019
@humitos
Copy link
Member Author

humitos commented Dec 4, 2019

This looks good to me once we have downloads serving properly from .com +1

This was deployed yesterday and it's working.

We can plan to merge this PR and deploy on next release.

@humitos humitos removed the Status: blocked Issue is blocked on another issue label Dec 4, 2019
@humitos
Copy link
Member Author

humitos commented Jan 14, 2020

From our talk in chat:

probably worth holding off, and doing it all in the same release as we remove the HTML syncing, and noting it in the changelog
it will use the storage backends
and sync properly
it will only break when the builds & webs are different filesystems

@agjohnson
Copy link
Contributor

To clarify where we are with this PR: the worry here is about backwards compatibility and we will be timing this at the same time as removing HTML syncing, to make one tidy backwards incompatible release.

@stale
Copy link

stale bot commented Mar 12, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Mar 12, 2020
@humitos
Copy link
Member Author

humitos commented Mar 14, 2020

We are probably closed to serve all the docs via El Proxito and remove all this code completely at this point. I'm not sure if it worth to keep this PR open or even merge it before #6535

@stale stale bot removed the Status: stale Issue will be considered inactive soon label Mar 14, 2020
@ericholscher
Copy link
Member

I think we can close this, and take a larger approach addressing a full removal of this code at some point.

@stsewd stsewd deleted the humitos/only-copy-html-to-webs branch April 16, 2020 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants