Installation Guide
This guide is for modern, UNIX-based systems such as Linux and macOS. (I recommend Ubuntu 22.04.)
You must have these installed:
- Python 3.8+ with pip
  - (Ubuntu 22: apt install python3-pip && pip3 install --upgrade pip)
- Postgres 9.5+ with support for psycopg2
  - (Ubuntu 22: apt install postgresql)
Switch to the user for Postgres. Typically, this shell command will work:
su postgres
Next, make the Gutendex database:
createdb gutendex
(You can replace gutendex above with something else if you update the DATABASE_NAME environment variable accordingly, described below.)
Make a Postgres user for the database, and remember the password that you enter:
createuser -P gutendex
(You can also replace gutendex above with another user name, updating the DATABASE_USER environment variable below.)
Now, open Postgres on the command line with psql and enter this SQL:
GRANT ALL PRIVILEGES ON DATABASE gutendex TO gutendex;
(If you entered your own database and/or user name earlier, replace the first and second gutendex above with them, respectively.)
Exit Postgres by pressing ctrl+d on your keyboard.
Switch back to the root user by pressing ctrl+d again.
Python packages required by Gutendex are listed with their version numbers in requirements.txt.
I recommend that you install these with pip in a virtual environment created with Virtualenv. You can do this as the root user in the Gutendex root directory like this:
pip install virtualenv
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
This creates a new directory, venv, which holds the files necessary for the virtual environment, and activates that environment. Later, when you are done working with Gutendex, you can enter the deactivate command to exit the virtual environment.
A number of environment variables are used by Gutendex, and they are listed in gutendex/.env.template.
I recommend that you copy this file to a new file called .env and edit the values after the = sign on each line with the proper data. The Django project will automatically read this file when the server starts.
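To make the KEY=VALUE format concrete, here is a minimal sketch of how such a file can be read. It is not Gutendex's own loader (the project may well use a library for this); parse_env is a hypothetical helper, and the sample values are placeholders.

```python
# Hypothetical helper: a minimal KEY=VALUE parser, not Gutendex's own loader.
def parse_env(text):
    """Return a dict of the KEY=VALUE pairs found in `text`."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=" sign.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Placeholder content in the same shape as a .env file.
sample = """
# Example values only
DATABASE_NAME=gutendex
DATABASE_USER=gutendex
"""
settings = parse_env(sample)
print(settings["DATABASE_NAME"])
```

Everything after the first = sign is taken as the value, so avoid stray spaces or quotes unless you intend them to be part of the value.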
Some of the variables require a way to send email. I recommend Mailgun.
Descriptions of each required variable are below.
This is a list of email addresses of the project administrators in ADMIN_NAMES. Addresses should be separated by commas and be in the same quantity and order as the names in ADMIN_NAMES.
This is a list of names of project administrators that will be emailed with catalog download logs and various Django messages, such as security warnings. Names should be separated with commas.
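The "same quantity and order" rule for names and addresses can be sketched as follows. This is an illustration only: pair_admins is a hypothetical helper, the sample names and addresses are invented, and I have not checked how Gutendex itself combines the two variables. Django's ADMINS setting has traditionally been a list of (name, email) pairs, which is what this produces.

```python
# Hypothetical helper showing why names and addresses must line up.
def pair_admins(names_value, emails_value):
    """Pair comma-separated names with comma-separated email addresses."""
    names = [n.strip() for n in names_value.split(",") if n.strip()]
    emails = [e.strip() for e in emails_value.split(",") if e.strip()]
    # A mismatch means some administrator has no address, or vice versa.
    if len(names) != len(emails):
        raise ValueError(
            f"{len(names)} names but {len(emails)} email addresses"
        )
    # The first name goes with the first address, and so on.
    return list(zip(names, emails))


# Invented sample values.
admins = pair_admins(
    "Ada Lovelace, Alan Turing",
    "ada@example.com, alan@example.com",
)
print(admins)
```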
This is a list of domains and IP addresses on which you allow Gutendex to be served. Domains should be separated by commas. To allow any subdomain of a domain, add a . before the domain (e.g. .gutendex.com allows gutendex.com, api.gutendex.com, etc.). I recommend including 127.0.0.1 and/or localhost for development and testing on your local machine.
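The leading-dot rule can be illustrated with a short sketch. This is a simplified re-implementation for explanation, not Django's actual host validation code; host_matches and is_allowed are hypothetical names.

```python
# Simplified illustration of the leading-dot rule, not Django's own code.
def host_matches(host, pattern):
    """Return True if `host` is permitted by a single allowed-host pattern."""
    host = host.lower()
    pattern = pattern.lower()
    if pattern.startswith("."):
        # ".gutendex.com" matches "gutendex.com" and any of its subdomains.
        return host == pattern[1:] or host.endswith(pattern)
    # Without a leading dot, only an exact match is accepted.
    return host == pattern


def is_allowed(host, allowed_value):
    """Check `host` against a comma-separated list of patterns."""
    patterns = [p.strip() for p in allowed_value.split(",") if p.strip()]
    return any(host_matches(host, p) for p in patterns)


allowed = ".gutendex.com,127.0.0.1,localhost"
print(is_allowed("api.gutendex.com", allowed))
print(is_allowed("example.org", allowed))
```

Note that a name such as evilgutendex.com is rejected, because the match requires the dot before gutendex.com.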
This is the domain or IP address on which your Postgres database runs. It is typically 127.0.0.1 for local databases.
This is the name of the Postgres database that you used. (Instructions for creating this database are above.) I recommend gutendex.
This is the password for DATABASE_USER.
This is the port number on which the Gutendex database runs. This will typically be 5432.
This is the name of a database user with all privileges for the Gutendex database. I recommend gutendex.
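As a rough sketch of how the database variables fit together, the function below builds a Django-style DATABASES dictionary from them. DATABASE_NAME and DATABASE_USER appear in this guide; the names DATABASE_HOST, DATABASE_PORT, and DATABASE_PASSWORD are my assumptions, so check gutendex/.env.template for the real ones. The function database_settings is hypothetical and is not taken from the Gutendex source.

```python
import os


# Hypothetical helper; variable names other than DATABASE_NAME and
# DATABASE_USER are assumptions and may differ in .env.template.
def database_settings(env=os.environ):
    """Build a Django DATABASES dict from environment-style values."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DATABASE_NAME", "gutendex"),
            "USER": env.get("DATABASE_USER", "gutendex"),
            "PASSWORD": env.get("DATABASE_PASSWORD", ""),
            # Typical values for a local Postgres server.
            "HOST": env.get("DATABASE_HOST", "127.0.0.1"),
            "PORT": env.get("DATABASE_PORT", "5432"),
        }
    }


# Placeholder values, passed in explicitly instead of the real environment.
example = database_settings({"DATABASE_NAME": "books", "DATABASE_PORT": "5433"})
print(example["default"]["NAME"], example["default"]["PORT"])
```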
This is a Django setting for displaying useful debugging information. true will show this information in API responses when errors occur during development. It is important for security that you set this to false before serving Gutendex on a public address.
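Because the value arrives as text, it has to be turned into a boolean somewhere. The sketch below shows one cautious way to do that, where anything other than true is treated as off. parse_debug is a hypothetical helper, and Gutendex may interpret the value differently.

```python
# Hypothetical helper: only the text "true" (in any letter case) enables debugging.
def parse_debug(value):
    """Return True only when `value` is the text 'true', ignoring case."""
    # Typos and empty values fall back to False, the safe choice in production.
    return value.strip().lower() == "true"


print(parse_debug("true"))
print(parse_debug("false"))
```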
This is the address of the SMTP server that Gutendex will try to use when sending email. (For Mailgun, this is smtp.mailgun.org.)
This is the email address where Gutendex email will appear to come from. (For Mailgun, this can probably be any email address.)
This is the password for the EMAIL_HOST_USER described below. (For Mailgun, this is the 'Default Password' value in your Domain Information.)
This is the user name that Gutendex will use to send email from the SMTP server in EMAIL_HOST. (For Mailgun, this is the 'Default SMTP Login' value in your Domain Information.)
This is a list of email addresses of the website managers in MANAGER_NAMES. Addresses should be separated by commas and be in the same quantity and order as the names in MANAGER_NAMES.
This is a list of names of website managers that can be emailed with various Django messages. Names should be separated with commas.
This is the path to a server directory where any API user media (currently nothing) can be stored.
This is a password that Django uses for security reasons, such as cryptographic signing. It should be a long, random string of characters, and it should be kept secret. You do not need to copy or remember it anywhere.
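If you need a value for this, one way to produce a long random string is with Python's standard secrets module. This is only a sketch; generate_secret is a hypothetical helper, and any sufficiently long random string will do.

```python
import secrets


# Hypothetical helper for producing a long random string.
def generate_secret(num_bytes=48):
    """Return a random URL-safe string built from `num_bytes` random bytes."""
    # token_urlsafe uses only letters, digits, "-" and "_", so the result
    # contains no "=", "#", or spaces that could confuse a KEY=VALUE file.
    return secrets.token_urlsafe(num_bytes)


print(generate_secret())
```

Run it once, paste the output after the = sign in your .env file, and do not share it.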
This is the path to a server directory where website assets, such as CSS styles for HTML pages, are stored.
cd to the root directory of the project.
Set up the database for storing the catalog data. Run this in the virtual environment mentioned above:
./manage.py migrate
Enter the Project Gutenberg catalog data into the Gutendex database. This takes a long time (several minutes on my machine):
./manage.py updatecatalog
This downloads a file archive of Project Gutenberg's catalog data and decompresses the files into a new directory, catalog_files. It places the contained files in catalog_files/rdf, and it stores a log in catalog_files/log and emails it to the administrators in the environment variables mentioned above.
If your database already contains catalog data, the above command will update it with any new or updated data from Project Gutenberg. I recommend that you schedule this command to run on your server daily – for example, using cron on Unix-like machines – to keep your database up-to-date.
To show styled HTML pages (i.e. the home page and error pages), you must put the necessary stylesheets into a static-file directory:
./manage.py collectstatic
Now you can serve your Django project. On your local machine, you can do this with the following command for development and testing purposes:
./manage.py runserver
In a production environment, I recommend using the Apache v2 HTTP Server instead. You can install this on Ubuntu 22 for use with Django with the following command:
apt install apache2 libapache2-mod-wsgi-py3
Next, you need to configure Apache to serve
- static files,
- robots.txt,
- any future user media, and
- the web API itself.
You can do this by editing the file /etc/apache2/sites-available/000-default.conf on your server and adding the following lines before the line containing </VirtualHost>, replacing /path/to/gutendex with the Gutendex path on your server:
Alias /static /path/to/gutendex/static
<Directory /path/to/gutendex/static>
Require all granted
</Directory>
Alias /robots.txt /path/to/gutendex/static/robots.txt
Alias /media /path/to/gutendex/media
<Directory /path/to/gutendex/media>
Require all granted
</Directory>
<Directory /path/to/gutendex/gutendex>
<Files wsgi.py>
Require all granted
</Files>
</Directory>
WSGIDaemonProcess gutendex python-home=/path/to/gutendex/venv python-path=/path/to/gutendex
WSGIProcessGroup gutendex
WSGIScriptAlias / /path/to/gutendex/gutendex/wsgi.py
You also need to give Apache's web server user permission to access the Gutendex files. You can do this with the following command, again replacing /path/to/gutendex with the Gutendex path on your server:
chown :www-data /path/to/gutendex
You can now serve Gutendex with this command:
service apache2 restart
You should also collect the static files and run the above command again whenever you add or update Gutendex files.