Disaster recovery instructions

Eric Carmichael edited this page Jul 23, 2019 · 4 revisions

Codalab backup recovery instructions

From VM snapshot

These instructions are brief; for more detail, see the Restoring from VM Snapshot Guide.

Create a new machine using the snapshot. If you also need to restore the database, follow the instructions below, starting from Step 3 - Upload.

Once the VM is created, SSH in and edit the .env so that its settings match the domain/IP you are now running from. These include:

  • BROKER_URL and RABBITMQ_HOST must contain the full DNS name instead of rabbit
  • SSL_ALLOWED_HOSTS
  • CODALAB_SITE_DOMAIN should be something like competitions.codalab.org
  • CHAHUB_API_URL and CHAHUB_API_KEY
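
As a quick sanity check after editing, the settings listed above can be verified with a small shell helper. This is a minimal sketch, not part of the CodaLab tooling: missing_env_vars is a hypothetical helper name, and it assumes .env uses plain NAME=value lines without quoting.

```shell
# Print every requested variable that is absent or empty in an env file.
# Usage: missing_env_vars <env-file> <NAME>...
# (hypothetical helper; assumes unquoted NAME=value lines)
missing_env_vars() {
    local env_file="$1"
    shift
    local name
    for name in "$@"; do
        # A variable counts as set only if NAME= is followed by at least one character.
        if ! grep -Eq "^${name}=.+" "$env_file"; then
            echo "$name"
        fi
    done
}

# Example call (run from the codalab-competitions directory):
#   missing_env_vars .env BROKER_URL RABBITMQ_HOST SSL_ALLOWED_HOSTS \
#       CODALAB_SITE_DOMAIN CHAHUB_API_URL CHAHUB_API_KEY
```

An empty result means every listed variable has a value; it does not confirm that the values are correct for the new domain.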

Non-VM snapshot instructions

Step 1 - Download

Download the backup from storage. How you do this depends on your storage endpoint: it will be different for Minio, S3, Azure, and Google Cloud Storage.

In our case, for production, backups are accessible via SSH into our minio server. Backups are stored in /path/to/private/bucket/backups.
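
If the backups directory holds several dumps, the one to restore is usually the most recent. The following sketch picks it by modification time. It assumes backups end in .dump (as in Step 3) and that file timestamps reflect backup order; latest_dump is a hypothetical helper name.

```shell
# Print the most recently modified .dump file in a directory (nothing if none exist).
# Usage: latest_dump <backup-dir>
# (hypothetical helper; assumes file names contain no newlines)
latest_dump() {
    # ls -t sorts by modification time, newest first.
    ls -t "$1"/*.dump 2>/dev/null | head -n 1
}

# Example call on the storage server:
#   latest_dump /path/to/private/bucket/backups
```

The printed path can then be copied to your workstation or the new server with scp or whichever transfer tool your storage endpoint supports.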

Step 2 - Set up new server

On your new server...

Set up Docker for your platform, along with docker-compose.

Clone the codalab-competitions GitHub repository:

$ git clone https://github.com/codalab/codalab-competitions.git
$ cd codalab-competitions

Upload your SSL certificates to /path/to/codalab-competitions/certs

Make a docker-compose.override.yml according to these instructions

If you have an old .env

Put it in /path/to/codalab-competitions/.env

Check with your administrator for a backed up .env

If you don't have a previous .env...

cp .env_production_sample .env

Set up the initial environment variables

Pay particular attention to:

  • BROKER_URL and RABBITMQ_HOST must contain the full DNS name instead of rabbit
  • RABBITMQ_DEFAULT_USER and RABBITMQ_DEFAULT_PASS
  • FLOWER_BASIC_AUTH
  • EMAIL_HOST, EMAIL_HOST_USER, etc. email settings
  • ADMINS
  • SSL_CERTIFICATE
  • SSL_CERTIFICATE_KEY
  • SSL_ALLOWED_HOSTS
  • CODALAB_SITE_DOMAIN should be something like competitions.codalab.org
  • CHAHUB_API_URL and CHAHUB_API_KEY
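
One way to catch a BROKER_URL that still points at the internal name rabbit is to extract the host part and compare it. The sketch below assumes BROKER_URL has the common scheme://user:password@host:port/vhost shape; broker_host is a hypothetical helper, not something shipped with CodaLab.

```shell
# Print the host portion of a broker URL.
# Usage: broker_host <url>
# (hypothetical helper; assumes scheme://[user:password@]host[:port][/vhost])
broker_host() {
    local url="$1"
    local rest="${url#*://}"   # drop the scheme
    rest="${rest##*@}"         # drop credentials, if any
    rest="${rest%%/*}"         # drop the path / vhost
    echo "${rest%%:*}"         # drop the port
}

# Example call (run from the codalab-competitions directory):
#   host=$(broker_host "$(grep '^BROKER_URL=' .env | cut -d= -f2-)")
#   [ "$host" = "rabbit" ] && echo "BROKER_URL still uses the internal name rabbit"
```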

Make sure the following ports are open (respecting your .env settings):

  • 80
  • 443
  • 5555
  • 5671/5672
  • 15671/15672

Then start the containers:

docker-compose up -d
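
Once the containers are up, the port list can be checked from a machine outside the server. This is a rough sketch that relies on bash's /dev/tcp feature and the coreutils timeout command; port_open and check_ports are hypothetical helper names, and the port numbers should be adjusted to match your .env.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Usage: port_open <host> <port>   (hypothetical helper; needs bash and timeout)
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print "<port> open" or "<port> closed" for each port given.
# Usage: check_ports <host> <port>...
check_ports() {
    local host="$1"
    shift
    local port
    for port in "$@"; do
        if port_open "$host" "$port"; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}

# Example call:
#   check_ports <your domain> 80 443 5555 5671 5672 15671 15672
```

A port reported as closed can mean either a firewall rule or a service that is not listening, so check both the firewall and docker-compose ps.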

Step 3 - Upload

Upload your backup to /path/to/codalab-competitions/backups. This directory is accessible from the Postgres Docker container.

Using DB_NAME and the other settings from .env...

The backup we uploaded will be available from inside the container at /app/backups/<filename>.dump

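# from within the codalab-competitions directory
$ docker-compose exec postgres bash

# drop and recreate the database as the same user, then restore in a single transaction (-1)
container$ dropdb $DB_NAME -U $DB_USER
container$ createdb $DB_NAME -U $DB_USER
container$ pg_restore -U $DB_USER -d $DB_NAME -1 /app/backups/<filename>.dump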
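
Because dropdb is destructive, it can be worth confirming that the dump is actually present inside the container before running it. A minimal sketch of such a guard follows; require_dump is a hypothetical helper name and only checks that the file exists and is non-empty.

```shell
# Succeed only if the given dump file exists and is non-empty.
# Usage: require_dump <path-to-dump>   (hypothetical helper)
require_dump() {
    if [ ! -s "$1" ]; then
        echo "missing or empty dump: $1" >&2
        return 1
    fi
}

# Example call inside the postgres container, before dropping anything:
#   require_dump /app/backups/<filename>.dump && dropdb $DB_NAME -U $DB_USER
```

For a stronger check, pg_restore --list <file> prints the archive's table of contents without touching the database, and fails if the file is not a valid archive.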

Step 4 - Finalize & Test

The last step fixes a RabbitMQ login problem: set the guest account password to the value of RABBITMQ_DEFAULT_PASS from .env. You do this in the RabbitMQ management portal at http://<your domain>:15672/#/users/guest (or whatever your RabbitMQ management port is).

Then, in the "Update this user" section, you can set the password.

Hand testing, basics

  1. Make sure all ports are open.
  2. Create a competition.
  3. Create a queue. (This confirms the RabbitMQ connection; you do not need to set up a compute worker.)
  4. Make a submission to the competition.

Hand testing, complete

  1. Make sure all ports are open.
  2. Create a competition.
  3. Create a queue.
  4. Edit the competition to point to the queue.
  5. Create a compute worker machine somewhere pointing to this BROKER_URL.

New auto-testing script NOT READY!

$ sudo apt-get install -y python3-pip
$ pip3 install tqdm requests
$ python3 /path/to/codalab-competitions/scripts/test_restored_instance.py -w True -d <domain> -u <username> -p <password>