The datachile project has been configured to run on Docker containers. Make sure `git`, `docker`, and `docker-compose` are installed and running on the machine where you intend to set up datachile. You must also have previously prepared the DNS configuration for the domain where this machine will be located. This is a required step, as the virtual hosts are configured to run for a specific domain.
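A quick way to verify these prerequisites and confirm the Docker daemon is running:

```sh
# Check that the required tools are installed
git --version
docker --version
docker-compose --version

# Verify the Docker daemon is up (this command fails if it isn't)
docker info > /dev/null && echo "docker daemon is running"
```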
The deployment procedure can be divided into two steps: preparation and setup. However, it's convenient to understand the internal structure of each container before you start the deployment.
Each container runs a different part of the project and shares some common elements that connect them.
As part of the setup procedure, a restui image is built, run, and then deleted. This image is based on the mondrian-rest-ui repository, specifically on the Dockerfile it contains. The files resulting from the build are stored in a Docker volume, which is then shared with the nginx container.
The certbot container contains an instance of EFF's Certbot, used to obtain the SSL certificates for the domain. This container runs on setup but doesn't do anything; it exits with code 1 immediately. Instead, it is meant to be run from the outside with `docker-compose run certbot` commands.
The obtained certificates are saved in a Docker volume and shared with the nginx container.
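As a rough sketch, renewing the certificates later could look like the following; the exact Certbot arguments depend on how the `certbot` service is defined in `docker-compose.yml`, so treat this as an assumption rather than the project's exact invocation:

```sh
# Run the certbot service once and remove the container afterwards.
# "renew" checks the stored certificates and renews the ones close
# to expiration; adjust the arguments to match the service definition.
docker-compose run --rm certbot renew
```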
The db container runs a postgres instance, based on the official `postgres:latest` Docker image, which will contain the main database.
According to the instructions for the official postgres image, the first time the container is run it checks for the database files in the internal `/var/lib/postgresql` folder (which is mounted as an external volume from `/datastore/postgres`). If the needed files don't exist, it creates them and then runs all the scripts in the internal `/docker-entrypoint-initdb.d` folder (which is mounted from the external `./db/init.d` folder); otherwise it just runs the postgres instance as usual.
Additionally, the folders `/datastore/dumps` and `/datastore/shared` are mounted into the internal `/app/dumps` and `/app/shared` folders of the container, so you can exchange dumps and other files with other containers and with the host machine.
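To confirm these mounts behave as described, you can look inside the running container; this assumes the service is named `db` in `docker-compose.yml`:

```sh
# Watch the init scripts run the first time the container starts
docker-compose logs -f db

# Files placed in /datastore/dumps and /datastore/shared on the host
# should be visible inside the container
docker-compose exec db ls /app/dumps /app/shared
```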
The mondrian container runs an instance of mondrian-rest, built from the datachile-mondrian repository. The container mounts the external `./mondrian/schema.xml` and `./mondrian/config.yaml` files, so make sure both exist. They're not included in the repo by default, but you can rename `config.yaml.example` if you haven't made modifications to the db container. The `schema.xml` file can be obtained from the datachile-mondrian repository; you can update it and restart the container to apply the changes.
This container also mounts `/datastore/shared` into the internal `/app/shared` folder.
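For example, after editing `./mondrian/schema.xml`, restarting the container should be enough to apply the change; this assumes the service is named `mondrian` in `docker-compose.yml`:

```sh
# Apply schema changes by restarting the mondrian-rest container
docker-compose restart mondrian

# Follow the logs to confirm the schema loaded without errors
docker-compose logs -f mondrian
```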
The canon container runs the frontend of the datachile project. Canon is the name of the framework datachile is built upon. When the image is built on the first run, the packages needed to run canon will be installed. To run the container, make sure all the needed environment variables are set in the `docker-compose.yml` file.

The container mounts a Docker volume to save the static files, so they can be served directly by nginx, and also mounts the `/datastore/shared` folder into the internal `/app/shared` path.
The nginx container connects all the other containers and exposes the needed content to the internet. Only ports 80 and 443 are exposed, and by default all HTTP traffic on port 80 is redirected to HTTPS. The nginx container will run once the canon, restui, and mondrian containers have started correctly.
The configuration files are located in `./nginx/` and are organized into 3 folders:

- `hosts/`, which is mounted in `/etc/nginx/conf.d/`, contains the configuration for the virtual hosts nginx will handle.
- `snippets/`, mounted in `/etc/nginx/snippets/`, contains directives shared between the various hosts files.
- `ssl/`, mounted in `/etc/nginx/ssl/`, contains the snippets that handle the path to the SSL certificates each virtual host needs.
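After changing any of these files, you can validate and reload the configuration from inside the running container; `nginx -t` and `nginx -s reload` are standard nginx commands, and the service name `nginx` is assumed from `docker-compose.yml`:

```sh
# Check the syntax of the mounted configuration files
docker-compose exec nginx nginx -t

# Reload nginx without restarting the container
docker-compose exec nginx nginx -s reload
```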
Besides the volumes previously mentioned, there's a Docker volume to handle the ACME challenge files; the external `/datastore/cache-canon` and `/datastore/cache-mondrian` folders are mounted to the internal `/ncache/canon` and `/ncache/mondrian` folders to store the cache for canon and mondrian requests; and the external `/datastore/shared` folder is mounted to the internal `/app/shared` folder to access files from other containers.
Before meddling with the Docker containers, make sure the configuration files are correctly set. There are some example files that you can simply rename to the needed file name, and some other files that will need to be edited and/or downloaded from another repository.
Create a `datastore` folder at the root level of the filesystem. This folder will contain all the big files needed to run the containers, as well as some other files with high I/O. It's recommended that an SSD be mounted on this folder.
Once the folder is prepared, create the `cache-canon`, `cache-mondrian`, `dumps`, `postgres`, `public`, and `shared` folders inside it.
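A minimal sketch of this step, run with enough privileges to write at the root of the filesystem:

```sh
# Create the datastore folder and the subfolders the containers expect
sudo mkdir -p /datastore/{cache-canon,cache-mondrian,dumps,postgres,public,shared}
```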
To initialize the database you will need to ingest the data for mondrian. You can use the ETL procedure, but if there's another database instance running on another server, you can dump that database and ingest it on this machine using the files included in this repository. The command to export the database to a file from another postgres database is:
pg_dump --no-acl --no-owner --file=/absolute/path/to/file.sql <dbname>
This will generate a `file.sql` in the intended path. The filename doesn't matter, but it should have the `.sql` extension. Move this file to `/datastore/dumps`, and the script will ingest it automatically. Inside the db container, the file will be available in `/app/dumps`.
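For example, if the dump was created on another server reachable as `old-server` (a hypothetical hostname), copying it into place could look like this:

```sh
# Copy the dump from the other server to the folder the init script watches;
# the hostname and source path are only examples
scp old-server:/absolute/path/to/file.sql /datastore/dumps/
```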
The `01-init.sh` script will create the database and the user, and will insert that file. You can change the connection data for that user here, but remember to also update it in the `./mondrian/config.yaml` file. For more information on what the `01-init.sh` file does, check its readme file.
This folder needs 2 files:

- A `config.yaml`, with the required info to connect to the database. You can rename the `config.yaml.example` file and use it if you haven't changed anything in `./db/init.d/01-init.sh`.
- A `schema.xml`, with the mondrian cube schema to be used. You can use the one available in the datachile-mondrian repository (see the example commands after this list).
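A sketch of how to put both files in place, assuming you have a local clone of the datachile-mondrian repository at `../datachile-mondrian` (a hypothetical path; the exact location of `schema.xml` inside that repository may differ):

```sh
# Use the example connection settings if the db defaults were not changed
cp ./mondrian/config.yaml.example ./mondrian/config.yaml

# Copy the cube schema from a local clone of datachile-mondrian
cp ../datachile-mondrian/schema.xml ./mondrian/schema.xml
```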
On the first run, Mondrian will need to create a few functions and extensions in the database. To do this, the connection used must belong to a superuser. The `init.sh` script will take care of the replacement, so take it into account if you change the access credentials for the default postgres superuser. The file `./mondrian/config.su.yaml` has the connection parameters for this.
Make sure the environment variables are correctly set. No other files are needed here.
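One way to double-check is to render the effective configuration, with variables interpolated, and inspect the `environment` section of the canon service:

```sh
# Print the effective docker-compose configuration, including the
# environment variables that will be passed to the canon container
docker-compose config
```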
Check that the hosts are correctly set. All relative paths use the internal `/etc/nginx/` folder as base; absolute paths start with a `/`. It is especially important to check that the `server_name` directives match the root domain where this instance will run, that `./nginx/ssl/<root_domain>` exists, and that the virtual hosts refer to this file as intended.
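A quick sanity check, assuming the root domain is `example.org` (replace it with your own; the exact name of the ssl snippet file may differ):

```sh
# Verify every virtual host uses the expected root domain
grep -rn "server_name" ./nginx/hosts/

# Verify the ssl snippet for the root domain exists and is referenced
ls ./nginx/ssl/example.org
grep -rn "ssl/example.org" ./nginx/hosts/
```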
The included virtual host files assume `chilecube` (the endpoint where the mondrian-rest API is available) will run as a subdomain of the root domain. If that's not the case, make the needed modifications, and don't execute the normal setup from the next section.
This repository contains an `./init_local.sh` file to set up the environment on a local machine, and an `./init_prod.sh` file for production, both with the steps needed to run the setup. This includes steps like creating the subfolders in the `/datastore` folder, initializing the database, building the needed Docker images, running the containers, and getting the SSL certificates. You can comment out parts of the scripts if needed.
If everything is configured correctly, run the following command:
# on a local machine
$ bash ./init_local.sh
# on a production machine
$ bash ./init_prod.sh datachile.io
Make sure you run the command through `bash`; don't `source` it.
For the local machine, the domain to test is `datachile.localhost`. If your browser can't handle the redirection to localhost, update your hosts file accordingly.
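If needed, you can map the test domain to the loopback address yourself; this is a generic `/etc/hosts` edit, not something the setup scripts do for you:

```sh
# Point datachile.localhost at the local machine (Linux/macOS)
echo "127.0.0.1 datachile.localhost" | sudo tee -a /etc/hosts
```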
In the production script, the first argument, `datachile.io`, is the root domain where this instance is running. The script will perform the procedure needed to get the certificates from the Let's Encrypt Authority.
As mentioned in the previous section, if the `chilecube` endpoint won't be hosted as a subdomain of the same root domain, you can't run this file directly. Check the scripts to understand the procedure.