Replace the GitHub proxy with git over CORS proxy #1467

Closed
adamziel opened this issue May 28, 2024 · 7 comments
Comments

@adamziel
Collaborator

adamziel commented May 28, 2024

Let's provide a scalable way for Playground to interact with remote resources and GitHub repositories. The current one, github-proxy.com, might be challenging to scale in the longer term.

With JavaScript git clone sparse checkout support and a PHP CORS Proxy, the footprint would be as small as it gets.

  • Start a new site, e.g. playground-proxy.wordpress.net to separate this from the Playground server.
  • Deploy the PHP CORS Proxy there.
  • Stress-test it with 2000 clones per hour to verify it's a viable approach.
  • Set up rate limiting.
  • Perhaps restrict the CORS proxy to git URLs for now.
  • Map out a project of shutting down the plugin-proxy.php endpoint in favor of that CORS proxy.
  • For git operations, run git clone via JavaScript and do a sparse checkout as needed.
  • Create a new git resource type for referencing git repositories.
  • Rewrite github-proxy.com and github.com references in Blueprints as the new git resource type.
  • Document the official and recommended way of interacting with git.
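
For the "restrict the CORS proxy to git URLs" item, here is a minimal sketch of what such a check could look like. It assumes the proxy only needs the two endpoints Git's smart HTTP protocol uses for cloning (`info/refs?service=git-upload-pack` and `git-upload-pack`). The function name and the HTTPS-only policy are assumptions for illustration, and the real proxy is PHP, so treat this as a statement of the rule rather than its implementation.

```javascript
// Sketch only: decide whether a target URL looks like a Git smart-HTTP
// clone/fetch request. isGitSmartHttpUrl is a hypothetical helper name.
function isGitSmartHttpUrl(targetUrl) {
  let url;
  try {
    url = new URL(targetUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Assumption: only HTTPS remotes are allowed through the proxy.
  if (url.protocol !== 'https:') {
    return false;
  }
  // Ref discovery: GET <repo>/info/refs?service=git-upload-pack
  if (url.pathname.endsWith('/info/refs')) {
    return url.searchParams.get('service') === 'git-upload-pack';
  }
  // Pack negotiation and download: POST <repo>/git-upload-pack
  return url.pathname.endsWith('/git-upload-pack');
}
```

With this rule, a request for `https://w.org/` would be refused, while the ref-discovery request for a repository would pass.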

Optionally, we could migrate the PHP CORS proxy to Async\HttpClient to give that library more real-world testing.

GitHub rate-limiting shouldn't be a problem for those git clone operations.

Long term, we might run into issues with the actual throughput. If and when that happens, let’s explore edge-caching zip files, queuing and rate limiting requests, size limits for cloned repos and artifacts, and maybe keeping hot clones of the 10 most popular repos.

@adamziel adamziel added this to the Innovative Developer Tools milestone May 28, 2024
@adamziel adamziel changed the title from "Scaling GotHub proxy" to "Scaling GitHub proxy" May 28, 2024
@adamziel
Collaborator Author

adamziel commented Jun 21, 2024

Let's move all the Git-related computations (fetching, computing deltas, decompression etc.) into the browser – no server, no problem (with CPUs, memory, or storage). PHP and JavaScript can talk to Git directly. We'll still need a CORS proxy to make the repositories accessible via fetch(), but that's much easier to scale and maintain.

For v1 we need:

  • Git sparse checkout support – I've prototyped it here: https://github.com/adamziel/git-sparse-checkout-in-js/tree/trunk. In the long run we could port it to PHP and allow using it in WordPress core. See also my explainer blog post.
  • A CORS proxy – we'll need to host a streaming CORS proxy on a separate domain, like cors.playground.wordpress.net or even cors-playground.wordpress.net or playground-cors.com to avoid any subdomains-related risks.
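
To make "JavaScript can talk to Git directly" a bit more concrete: Git's smart HTTP protocol frames its messages as pkt-lines, which are easy to produce in the browser before sending them with `fetch()`. The sketch below shows only that framing. `encodeWants` is a hypothetical helper, and a real fetch request would also carry capabilities on the first `want` line and a closing `done` line; this is not taken from the linked prototype.

```javascript
// Sketch of Git's pkt-line framing. Each packet is a 4-digit hex length
// (which counts those 4 digits too) followed by the payload.
// "0000" is the flush packet that ends a section.
const FLUSH_PKT = '0000';

function encodePktLine(payload) {
  // Length is measured in bytes, not UTF-16 code units.
  const byteLength = new TextEncoder().encode(payload).length + 4;
  if (byteLength > 65520) {
    throw new RangeError('pkt-line payload too large');
  }
  return byteLength.toString(16).padStart(4, '0') + payload;
}

// Hypothetical helper: frame a list of object ids as "want" lines.
function encodeWants(objectIds) {
  const lines = objectIds.map((oid) => encodePktLine(`want ${oid}\n`));
  return lines.join('') + FLUSH_PKT;
}
```

For example, `encodePktLine('foobar\n')` yields `000bfoobar\n`: seven payload bytes plus the four length digits.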

@adamziel
Collaborator Author

I think we could have the same for SVN. We’d need to implement svn+ssh over a CORS proxy which would take some time so I’m not prioritizing it, but it’s a nice future option to have.

Also porting that to PHP would unlock installing WordPress plugins from the latest trunk version and updating them on new commit. Probably not good for production sites, but I can think of a few use-cases already.

@adamziel adamziel changed the title from "Scaling GitHub proxy" to "Replace GitHub proxy with git over CORS proxy" Jun 28, 2024
adamziel added a commit that referenced this issue Jun 29, 2024
Work in progress.

To integrate [git clone](https://adamadam.blog/2024/06/21/cloning-a-git-repository-from-a-web-browser-using-fetch/)
via `fetch()`, we need a CORS proxy. This PR explores an implementation.

Assumptions:

* It will run on a separate hostname – ideally not even a subdomain
* No auth headers should make it through either way
* No requests to private IPs
* Stream data both ways, don't buffer
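
A rough sketch of the "no requests to private IPs" assumption, for illustration only. It covers IPv4 and a handful of well-known reserved ranges; the range list, the helper names, and the fail-closed behaviour are assumptions, not the proxy's actual code. A real check would also need to run against the address the hostname resolves to, and handle IPv6.

```javascript
// Parse a dotted-quad IPv4 string into a number, or null if malformed.
function ipv4ToInt(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// [network, prefix length] pairs: "this network", RFC 1918 private ranges,
// loopback, and link-local.
const BLOCKED_IPV4_RANGES = [
  ['0.0.0.0', 8],
  ['10.0.0.0', 8],
  ['127.0.0.0', 8],
  ['169.254.0.0', 16],
  ['172.16.0.0', 12],
  ['192.168.0.0', 16],
];

function isBlockedIPv4(ip) {
  const address = ipv4ToInt(ip);
  if (address === null) return true; // fail closed on anything unparseable
  return BLOCKED_IPV4_RANGES.some(([network, prefix]) => {
    const start = ipv4ToInt(network);
    const size = 2 ** (32 - prefix);
    return address >= start && address < start + size;
  });
}
```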

Remaining work:

* For now, refuse to process non-GET non-POST non-OPTIONS requests
* Refuse to process POST request body larger than, say, 100KB
* Refuse to process responses larger than, say, 100MB
* Smart rate limiting
* Support for query args
* More unit tests

See #1467
@adamziel adamziel moved this to Project: Up Soon in Playground Board Jun 30, 2024
@adamziel adamziel moved this from Project: Triage to Project: Up soon in Playground Board Jul 1, 2024
adamziel added a commit that referenced this issue Jul 12, 2024
## Description

Ships a PHP-based CORS proxy we'll need to integrate [git
clone](https://adamadam.blog/2024/06/21/cloning-a-git-repository-from-a-web-browser-using-fetch/)
via `fetch()`.

### Usage

1. Run `dev.sh` to start a local server, then go to
`http://127.0.0.1:5263/proxy.php/https://w.org/` and confirm it worked.
2. Request `http://127.0.0.1:5263/proxy.php/https://w.org/?test=1` to
get the response from `https://w.org/?test=1` plus the CORS headers.
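
On the client side, the URL scheme above (target URL appended after `proxy.php/`) could be wrapped in a small helper. This is a sketch; `toProxiedUrl` is a hypothetical name, and limiting targets to `http:` and `https:` is an assumption made here.

```javascript
// Build the proxied URL for a target, following the
// <proxy>/proxy.php/<target URL> pattern from the usage notes.
function toProxiedUrl(proxyBase, targetUrl) {
  // Validate the target; new URL() throws a TypeError for non-absolute URLs.
  const target = new URL(targetUrl);
  if (target.protocol !== 'http:' && target.protocol !== 'https:') {
    throw new TypeError(`Unsupported protocol: ${target.protocol}`);
  }
  // Drop trailing slashes from the proxy base so exactly one "/" separates
  // it from the target URL.
  return proxyBase.replace(/\/+$/, '') + '/' + targetUrl;
}
```

A caller would then pass the result to `fetch()`, for example `fetch(toProxiedUrl('http://127.0.0.1:5263/proxy.php', 'https://w.org/?test=1'))`.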

### Technical Design

Assumptions:

* Run on a separate hostname for increased origin separation, like
`playground-proxy.wordpress.net`. Do not use a subdomain, like
`proxy.playground.wordpress.net`.
* Stream data both ways, don't buffer.
* Don't pass auth headers in either direction.
* Refuse to request private IPs.
* Refuse to process non-GET non-POST non-OPTIONS requests.
* Refuse to process POST request body larger than, say, 100KB.
* Refuse to process responses larger than, say, 100MB.
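
The request-side rules in this list can be summarized as a small gate, sketched here in JavaScript purely to illustrate the logic (the shipped proxy is PHP). The HTTP status codes, the exact set of headers treated as "auth headers", and the function names are assumptions. Because the proxy streams, a real implementation would also have to count bytes as they pass through rather than trust `Content-Length`.

```javascript
const ALLOWED_METHODS = new Set(['GET', 'POST', 'OPTIONS']);
const MAX_POST_BODY_BYTES = 100 * 1024; // "say, 100KB"
// Assumption: which headers count as credentials.
const BLOCKED_HEADERS = new Set(['authorization', 'cookie', 'proxy-authorization']);

// Decide whether a request may be forwarded at all.
function checkRequest(method, contentLength) {
  const verb = method.toUpperCase();
  if (!ALLOWED_METHODS.has(verb)) {
    return { ok: false, status: 405 }; // method not allowed
  }
  if (verb === 'POST' && contentLength > MAX_POST_BODY_BYTES) {
    return { ok: false, status: 413 }; // payload too large
  }
  return { ok: true, status: 200 };
}

// Return a copy of the headers without anything that carries credentials.
function stripAuthHeaders(headers) {
  const forwarded = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!BLOCKED_HEADERS.has(name.toLowerCase())) {
      forwarded[name] = value;
    }
  }
  return forwarded;
}
```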

## Follow-up work

* Start a server at `playground-proxy.wordpress.net`.
* Implement rate limiting (could be at the hosting platform level).
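
If rate limiting ends up in application code rather than at the hosting platform level, a token bucket is one common option. The sketch below is an assumption about how that might look, not a decision made in this thread. The class name, the parameters, and the injectable clock (there to make it testable) are all illustrative.

```javascript
// Token bucket: holds up to `capacity` tokens, refilled continuously at
// `refillPerSecond`. Each request takes one token or is rejected.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // clock returning milliseconds
    this.tokens = capacity;
    this.lastRefill = now();
  }

  tryTake() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Add the tokens earned since the last call, capped at capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

One bucket per client IP would be the usual arrangement. A budget like the 2000 clones per hour mentioned in the issue corresponds to a refill rate of `2000 / 3600` tokens per second.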

## Testing instructions

* Run `dev.sh` to start a local server, then go to
`http://127.0.0.1:5263/proxy.php/https://w.org/` and confirm it worked.
* Run `test.sh` to run PHPUnit tests, confirm they all pass.

See #1467
@adamziel
Collaborator Author

adamziel commented Aug 2, 2024

Here's the way forward for this issue:

  • Start a new site, e.g. playground-proxy.wordpress.net.
  • Deploy the PHP CORS Proxy there.
  • Stress-test it with 2000 clones per hour to verify it's a viable approach.
  • Set up rate limiting.
  • Perhaps restrict the CORS proxy to git URLs for now.
  • Map out a project of shutting down the plugin-proxy.php endpoint in favor of that CORS proxy.
  • For git operations, run git clone via JavaScript and do a sparse checkout as needed.
  • Create a new git resource type for referencing git repositories.
  • Rewrite github-proxy.com and github.com references in Blueprints as the new git resource type.
  • Document the official and recommended way of interacting with git.
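
As a sketch of the Blueprint rewrite step, the function below turns a github-proxy.com URL into a git resource object. Both shapes are assumptions that this thread does not specify: the input is assumed to use `repo`, `branch`, and `directory` query parameters, and the output is assumed to be a `git:directory` resource with `url`, `ref`, and `path` fields. Check both against the actual Blueprint schema before relying on them.

```javascript
// Assumed input:  https://github-proxy.com/proxy/?repo=<owner>/<name>&branch=<ref>&directory=<path>
// Assumed output: { resource: 'git:directory', url, ref, path }
function toGitDirectoryResource(proxyUrl) {
  const url = new URL(proxyUrl);
  if (url.hostname !== 'github-proxy.com') return null;
  const repo = url.searchParams.get('repo');
  if (!repo) return null;
  return {
    resource: 'git:directory',
    url: `https://github.com/${repo}`,
    // Fall back to the default branch and the repository root when the
    // optional parameters are missing.
    ref: url.searchParams.get('branch') ?? 'HEAD',
    path: url.searchParams.get('directory') ?? '',
  };
}
```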

Optionally, we could migrate the PHP CORS proxy to Async\HttpClient to give that library more real-world testing.


GitHub does not have specific rate-limiting for the regular git client operations:

"Git operations do not consume part of your API rate limit, as there are no API calls made to GitHub.com via the git client.

We don't have any hard limits for clones, though we may delay requests if they come in fast enough to potentially cause overload on one of our servers (this would be determined by the amount of load being placed on our servers at the time of the clones, and would have to be exceptionally high to occur). So while Git operations do have dynamic limits, this might result in slower clones, but shouldn't cause any failures.

However, they do on REST API requests:

In terms of API requests: authenticated ones are typically limited to ~5000/hr per IP address, 5000/hr per authenticated user, and significantly lower for anonymous requests (typically ~100/hr or less)."

@adamziel
Collaborator Author

@brandonpayton can we resolve this one now?

@adamziel adamziel moved this from Project: Up soon to In progress in Playground Board Oct 23, 2024
@brandonpayton
Member

brandonpayton commented Oct 31, 2024

> @brandonpayton can we resolve this one now?

@adamziel, I'm not sure how to answer because there are multiple kinds of things mentioned in this issue, including both the GitHub proxy and plugin-proxy.php.

It seems like the issue title "Replace the GitHub proxy with git over CORS proxy" could be considered complete because of GitDirectoryResource using cors-proxy.php automatically, but we still have plugin-proxy.php references like these for WP core and Gutenberg PR previews:

In case it helps close this issue, I created #1970 to track migration away from plugin-proxy.php.

There are also other CORS proxy changes to make in response to feedback from A8C Systems, including switching to a separate, dedicated domain.

@brandonpayton
Member

Let's wait to close this until we're relying entirely on the CORS proxy.

@brandonpayton
Member

> Let's wait to close this until we're relying entirely on the CORS proxy.

Actually, I forgot that we created #1970 "Replace uses of plugin-proxy.php with generic CORS proxy" for that.

I'll go ahead and close this.

@github-project-automation github-project-automation bot moved this from In progress to Done in Playground Board Nov 1, 2024