Move the repo off of the server #433

Closed
tobias opened this issue Dec 4, 2015 · 6 comments

Comments

@tobias
Member

tobias commented Dec 4, 2015

Almost all of the usage of clojars (> 99%) is reading from the
repo. The repo is currently served as static files by nginx. If the
server goes down, the repo cannot be read. I would like to move the
repo off of the server and on to something that isn't affected by the
state of the server. A blobstore system (Amazon S3, Rackspace CBS,
etc) would be one approach, but I'm curious if there are other
options.

Whatever we move to would need the following features:

  • an API for uploading artifacts
  • an API for reading download statistics
  • an API for removing entries (rare, but we'd need to do that when
    deleting artifacts)
  • it would need to be directly accessible over HTTP
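
To make the requirements above concrete, here is a rough sketch in Python of the interface such a store would need to offer. All names (`RepoStore`, `InMemoryStore`, `download_count`, `public_url`) are hypothetical and not taken from the Clojars codebase or from any blobstore SDK; the in-memory class only stands in for a real backend.

```python
from abc import ABC, abstractmethod


class RepoStore(ABC):
    """Hypothetical interface covering the required features listed above."""

    @abstractmethod
    def upload(self, path, data):
        """Store an artifact (bytes) under a repo-relative path."""

    @abstractmethod
    def remove(self, path):
        """Delete an entry (rare, needed when deleting artifacts)."""

    @abstractmethod
    def download_count(self, path):
        """Return how many times the entry has been read."""

    @abstractmethod
    def public_url(self, path):
        """Return the URL where the entry is reachable over HTTP."""


class InMemoryStore(RepoStore):
    """Toy backend for illustration; a real one would call a blobstore API."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self.blobs = {}  # path -> bytes
        self.reads = {}  # path -> number of reads

    def upload(self, path, data):
        self.blobs[path] = data
        self.reads.setdefault(path, 0)

    def remove(self, path):
        self.blobs.pop(path, None)
        self.reads.pop(path, None)

    def read(self, path):
        # Simulates an HTTP GET and records it for the statistics API.
        data = self.blobs[path]
        self.reads[path] += 1
        return data

    def download_count(self, path):
        return self.reads.get(path, 0)

    def public_url(self, path):
        return f"{self.base_url}/{path}"
```

Any candidate service could then be judged by whether all four methods can be implemented against its API.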

Known impacts of this change would be:

  • Deploying an artifact would have a higher chance of failure, since
    we're replacing what is essentially a filesystem move operation with
    a network operation. This would require making deployments
    transactional, something that is already planned (see
    "Deployment is not atomic", #226).
  • We currently read project/artifact data from the pom files for
    display in the UI (per request). Since the poms would no longer be
    on disk, we would need to parse them at deploy time and insert that
    data in the db.
  • We currently provide rsync access to the repo
    (https://github.com/clojars/clojars-web/wiki/Data#rsync-the-whole-classic-repository) -
    this would no longer be available. I'm curious how many people use
    this. I see network spikes a couple of times a day, which makes me
    think at least a couple of people are rsync'ing.
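
One possible reading of "transactional" for the first impact above: upload every file of a deploy, and if any upload fails, remove the files that already made it so the repo never exposes a partial deploy. The sketch below assumes that reading; `deploy_atomically` and the `upload`/`remove` callables are hypothetical names, and #226 may well settle on a different design.

```python
from typing import Callable, Dict


def deploy_atomically(
    files: Dict[str, bytes],
    upload: Callable[[str, bytes], None],
    remove: Callable[[str], None],
) -> None:
    """Upload every file of a deploy, or undo the partial upload on failure."""
    uploaded = []
    try:
        for path, data in files.items():
            upload(path, data)
            uploaded.append(path)
    except Exception:
        # Roll back: remove whatever reached the store before the failure.
        for path in reversed(uploaded):
            remove(path)
        raise
```

This assumes a deploy only writes new paths; if an upload could overwrite an existing file, the rollback would need to restore the old content rather than delete it.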

We already have code in place for uploading to s3, since it's
currently used for the releases.clojars.org repository.

@danielcompton
Member

If we're going to provide direct access (instead of going through a CDN), then whatever block store we choose will need to support HTTPS connections from Java 6 clients.

@tobias
Member Author

tobias commented Jan 10, 2016

More thoughts on this:

I think it makes sense to point a separate domain at the off-disk repo
(repo.clojars.org or somesuch). We could then completely separate the
repo and server (other than DNS), since we wouldn't need something
routing GETs against clojars.org/repo to the repo, and PUTs to the
server. We could update boot & lein to use the repo domain.

If we then also keep a copy of the repo on disk, it gives us the
following advantages:

  • older clients/configs could continue to use the on disk repo without
    issue (at the sacrifice of reliability, which would really remain at
    current levels)
  • rsync would still work
  • we can decouple publishing to the off disk repo from deployment, and
    make it a background process
  • processes that need the artifacts local could be left alone for this
    implementation (pom parsing, signature checking, etc) since the
    artifacts would still be available locally

This sounds quite a bit like the original implementation of the
releases repo, except we would be pushing everything, not just
artifacts that met certain criteria (promotion was originally
automatic, but became a manual process when it exposed concurrency
issues with sqlite).

Disadvantages:

  • deployments may not end up in the off disk repo right away, even
    though they would appear to be successfully deployed
  • we'd have another process to maintain
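
A minimal sketch of the background-publishing idea, assuming an in-process queue and a worker thread. The names `start_publisher`, `deploy`, `disk_repo`, and `upload` are hypothetical; the real process could just as well be a separate program.

```python
import queue
import threading


def start_publisher(pending, upload, max_attempts=3):
    """Start a daemon thread that uploads queued (path, data) pairs."""

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: stop the worker
                pending.task_done()
                return
            path, data = item
            for _ in range(max_attempts):
                try:
                    upload(path, data)
                    break
                except Exception:
                    # A real worker would log and back off before retrying.
                    continue
            pending.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def deploy(path, data, disk_repo, pending):
    """A deploy succeeds once the artifact is on disk; publishing is queued."""
    disk_repo[path] = data
    pending.put((path, data))
```

The first disadvantage listed above shows up directly: `deploy` returns before the worker has uploaded anything, so there is a window in which the artifact exists on disk but not in the off-disk repo.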

@neverfox

neverfox commented Feb 2, 2016

👍 Could have used this today ;)

tobias added a commit that referenced this issue Apr 9, 2016
This provides a tool to upload the existing repo to cloudfiles, and it
uploads new deploys there as well. If that upload fails, it won't yet
fail the deploy, since this is just experimental at this point.
tobias added a commit that referenced this issue Apr 30, 2016
This will allow non-clojure scripts to upload files (the pom list, maven
index files, etc).
tobias added a commit that referenced this issue May 1, 2016
Otherwise, they end up with "application/unknown".
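
For illustration, a sketch of picking a content type from the file extension before uploading. The mapping below is an assumption for common Maven repo files, not the one used in the commit.

```python
import os

# Assumed mapping for common repo files; not taken from the Clojars source.
CONTENT_TYPES = {
    ".jar": "application/java-archive",
    ".pom": "application/xml",
    ".xml": "application/xml",
    ".asc": "application/pgp-signature",
    ".sha1": "text/plain",
    ".md5": "text/plain",
}


def content_type_for(path, default="application/octet-stream"):
    """Pick a Content-Type from the file extension, falling back to a default."""
    _, ext = os.path.splitext(path)
    return CONTENT_TYPES.get(ext.lower(), default)
```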
tobias added a commit that referenced this issue May 5, 2016
Once we move to the cloudfiles repo fully, we won't be able to load the
pom files from disk, so we now store the data we need from the pom in
the db at deploy time.

This also includes a tool to update the db from existing poms.
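
A rough sketch of what storing pom data at deploy time could look like, using Python's standard library. The table and column names are hypothetical, and the sketch assumes the pom declares the standard Maven namespace and gives its groupId directly rather than inheriting it from a parent.

```python
import sqlite3
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}
FIELDS = ("groupId", "artifactId", "version", "description", "url")


def pom_fields(pom_xml):
    """Extract the fields the UI would need from a pom file's text."""
    root = ET.fromstring(pom_xml)

    def text(tag):
        node = root.find(f"m:{tag}", POM_NS)
        return node.text.strip() if node is not None and node.text else None

    return {tag: text(tag) for tag in FIELDS}


def store_pom(conn, pom_xml):
    """Insert the parsed pom data so later page views never touch the file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artifacts "
        "(group_id TEXT, artifact_id TEXT, version TEXT, description TEXT, url TEXT)"
    )
    f = pom_fields(pom_xml)
    conn.execute(
        "INSERT INTO artifacts VALUES (?, ?, ?, ?, ?)",
        tuple(f[tag] for tag in FIELDS),
    )
    conn.commit()
```

Running `store_pom` over every existing pom would play the role of the backfill tool mentioned above.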
tobias self-assigned this May 7, 2016
@tobias
Member Author

tobias commented Jun 20, 2016

An update on where this is at the moment:

  • the existing repo has been uploaded to cloudfiles
  • the next server deploy will include two changes:
    • each new artifact deploy will be uploaded to cloudfiles in addition to the on-disk repo
    • artifact metadata will be stored in and read from the db instead of from pom files on disk
  • we're working with Fastly to serve the repo from their CDN, with a custom SSL certificate which will allow us to support Java 6 clients
  • there are a few related issues that need to be resolved before we can move away from the on-disk repo completely (see the off-disk-repo tag, with more issues to come)

@tobias
Member Author

tobias commented Sep 21, 2016

Given the issues with the indexer (#534) and the fact that we'd lose rsync, we're keeping a copy of the repo on-disk for now. We have a protocol for storage (on the https://github.com/clojars/clojars-web/tree/storage-protocol branch) that handles repo operations and strives to keep the on-disk and cloudfiles versions in sync.
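
As an illustration of the idea only (this is not the protocol on that branch), a storage abstraction that fans each operation out to several backends might look like the Python sketch below. `DictStorage` and `MultiStorage` are hypothetical names.

```python
class DictStorage:
    """Toy backend; stands in for the on-disk repo or a cloudfiles container."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def remove(self, path):
        self.files.pop(path, None)

    def exists(self, path):
        return path in self.files


class MultiStorage:
    """Applies every write and remove to all backends so they stay in step."""

    def __init__(self, *backends):
        self.backends = backends

    def write(self, path, data):
        for backend in self.backends:
            backend.write(path, data)

    def remove(self, path):
        for backend in self.backends:
            backend.remove(path)

    def exists(self, path):
        # Reads are answered by the first backend (assumed to be the on-disk one).
        return self.backends[0].exists(path)
```

Callers only ever talk to the combined storage, so code that deploys or deletes artifacts does not need to know how many copies of the repo exist. A failure partway through the loop would still leave the copies out of step, which is presumably why the aim is described as "strives to".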

@tobias
Member Author

tobias commented Jan 8, 2017

This is done.

tobias closed this as completed Jan 8, 2017