Installing 4CAT
🎥 A video walkthrough of installing 4CAT via Docker can be found on YouTube here.
You can install 4CAT on your local machine or on a server (see also our guidelines on what type of hardware you need). This can be useful if you want to capture data from various online platforms, import and process data collected with Zeeschuimer, or otherwise analyse data you have captured previously. This page describes how to install and run 4CAT. A full list of available data sources can be found here.
Please note that various data sources require additional configuration after installation. It is recommended to go to the Control Panel's "Settings" page after installation to complete 4CAT's configuration.
The recommended method is to use Docker.
- Install Docker Desktop and start it. Note that on Windows, you may need to ensure that WSL (Windows Subsystem for Linux) integration is enabled in Docker. You can find this in Docker Desktop under Settings -> Resources -> WSL Integration -> Enable integration with required distros.
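Before continuing, you may want to confirm that the Docker tools can be found from a terminal. The sketch below is a hypothetical helper (not part of 4CAT) that uses Python's standard library to look for the docker and docker-compose executables on your PATH. It only checks that the programs exist, not that Docker Desktop is actually running. Newer Docker versions may provide Compose as the "docker compose" subcommand instead, which this check does not detect.

```python
import shutil
from typing import Dict


def find_docker_tools() -> Dict[str, bool]:
    """Map each tool name to whether an executable with that name is on PATH.

    Hypothetical helper: it does not start Docker or check that it is running.
    """
    tools = ("docker", "docker-compose")
    return {tool: shutil.which(tool) is not None for tool in tools}
```

For example, print(find_docker_tools()) shows which of the two commands your shell can find.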
Simple: Use 4CAT's Docker Hub Image
- Download our docker-compose.yml file located here and our .env file located here (you can use your browser's "Save page as" to download these files). Make a folder in a stable location to save them in.
  - Optionally, edit the .env file to change database passwords, server name, or ports.
    - DOCKER_TAG can be set to stable for the latest official release, latest for the current development version (which may have bugs AND bug fixes), or any previous release.
- Open a terminal window and navigate to the folder you saved the files in (e.g. cd ~/Documents/4cat to change directories to your Documents/4cat folder).
  - Sometimes the downloaded .env file is renamed to env.txt, which will not work. ls -a will show you all files (including hidden ones) in the folder you are currently in. If the file is named env.txt, you can rename it yourself with the command mv env.txt .env (newer versions of macOS will not let you give a file a name starting with a . in Finder, so you will need to use Terminal and run mv env.txt .env from the same directory/folder).
- Run the command docker-compose up --detach in your Terminal/Command Prompt from the folder where you downloaded the above files.
  - You may need to run this command as an administrator, either by adding sudo to the beginning of the command (Mac/Linux, i.e. sudo docker-compose up --detach; you will be asked for your password) or by right-clicking Command Prompt and choosing "Run as administrator" to open it (Windows).
- Congrats! Open http://localhost/ in your browser of choice to start using 4CAT.
  - If http://localhost/ opens an incorrect page, such as the Docker welcome page (from Docker's starter project), try pressing F5 to reload, or clear your browser cache and then reload the page.
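If you prefer a script to renaming files by hand, the following hypothetical Python helper (not shipped with 4CAT) sketches the env.txt check described in the steps above. It renames env.txt to .env when .env is missing, and it lists all file names in the folder, including hidden ones, much like ls -a (without the . and .. entries).

```python
from pathlib import Path
from typing import List


def ensure_env_file(folder: Path) -> Path:
    """Make sure `folder` contains a .env file, renaming env.txt if necessary.

    Hypothetical helper: equivalent to running `mv env.txt .env` in that folder.
    """
    env_path = folder / ".env"
    txt_path = folder / "env.txt"
    if not env_path.exists() and txt_path.exists():
        # The browser saved the file as env.txt; give it the name Docker expects.
        txt_path.rename(env_path)
    if not env_path.exists():
        raise FileNotFoundError(f"No .env (or env.txt) file found in {folder}")
    return env_path


def list_all_files(folder: Path) -> List[str]:
    """Return every entry name in `folder`, hidden ones included, sorted."""
    return sorted(entry.name for entry in folder.iterdir())
```

For example, ensure_env_file(Path.cwd()) run from your 4CAT folder either returns the existing .env path or fixes the name first.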
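Note: you can of course edit files directly in the Docker containers from the simple install; however, these changes will be lost if you rebuild the containers. To work on your own copy of the code, you can instead build the Docker images from your local 4CAT files: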
- Clone the 4CAT repository, or download the most recent release and unzip it.
- In a terminal/command prompt, navigate to the folder in which you just installed 4CAT (the folder that contains the docker-compose_build.yml file).
- Run the command docker-compose -f docker-compose_build.yml build to build from your local 4CAT files.
  - If this is the first time you're starting the Docker container, it will take a while for all components to be built.
- Run the command docker-compose -f docker-compose_build.yml up -d to start your container for the first time.
  - Once this is done, you can access the 4CAT interface via http://localhost:80.
- You can now edit files locally and rebuild your 4CAT version via docker-compose -f docker-compose_build.yml down followed by docker-compose -f docker-compose_build.yml up --build -d (or edit files directly in the Docker containers and restart the backend via python3 4cat-daemon.py restart).
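When rebuilding, the compose file has to be passed with -f on every call; otherwise docker-compose falls back to the default docker-compose.yml instead of docker-compose_build.yml. A small wrapper makes that harder to forget. This is a hypothetical sketch, not part of 4CAT; it assumes docker-compose is on your PATH and that you run it from the folder containing docker-compose_build.yml.

```python
import subprocess
from typing import List

COMPOSE_FILE = "docker-compose_build.yml"


def compose(*args: str, compose_file: str = COMPOSE_FILE) -> List[str]:
    """Build a docker-compose command that always targets `compose_file`."""
    return ["docker-compose", "-f", compose_file, *args]


def rebuild_commands() -> List[List[str]]:
    """Stop the running containers, then rebuild and start them in the background."""
    return [compose("down"), compose("up", "--build", "-d")]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(rebuild_commands()) performs the down/up cycle; run_all([compose("build")]) corresponds to the initial build step.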
Note: if your computer/server is already using some of the ports that Docker wants to use, you can change the ports Docker uses in the .env file in the 4CAT main directory. Any modifications to configuration files will require you to rebuild the Docker images with docker-compose -f docker-compose_build.yml up --build.
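To check whether a port is already taken before editing .env, you can try to bind a socket to it. The hypothetical check below (not part of 4CAT) reports a port as busy when the bind fails. Binding to ports below 1024, such as 80, requires administrator rights on many systems, so a PermissionError is re-raised rather than being reported as "in use".

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if another process already holds (host, port).

    0.0.0.0 means "all interfaces", which is what a published Docker port uses
    by default on most setups.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except PermissionError:
            # Not enough privileges to test this port; don't guess.
            raise
        except OSError:
            # Typically "address already in use".
            return True
    return False
```

For example, port_in_use(8080) returning True suggests choosing a different PUBLIC_PORT in .env.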
With Docker, 4CAT is set to host itself on localhost:80 by default. This can be modified in the .env file located in the main directory. If you plan on serving 4CAT to multiple users and/or at a public domain, we recommend putting a web server such as Nginx or Apache in front of it for both performance and security. SERVER_NAME in the .env file is only used by 4CAT during the first installation; afterwards, you can update it to your domain or IP address via 4CAT's UI. Once you have set this up, you can further configure 4CAT to take various proxy headers into account via Control Panel -> Settings -> Flask settings -> Use proxy headers for URL.
# Modify SERVER_NAME and/or PUBLIC_PORT to make 4CAT available externally
# You could also use your server's IP address for SERVER_NAME
SERVER_NAME=4cat.example.com
PUBLIC_PORT=80
# This example would allow you to navigate to http://4cat.example.com (or http://4cat.example.com:80) and access your version of 4CAT
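To check which address a given .env file will produce, the sketch below parses simple KEY=VALUE lines and builds the URL from SERVER_NAME and PUBLIC_PORT. It is a hypothetical helper, not part of 4CAT. It assumes plain http, as in the example above, and it does not handle quoted or multi-line values.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop inline " # ..." comments
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def public_url(env: Dict[str, str]) -> str:
    """Build the http address 4CAT should be reachable at."""
    host = env.get("SERVER_NAME", "localhost")
    port = env.get("PUBLIC_PORT", "80")
    # Port 80 is the http default, so browsers do not need it in the URL.
    return f"http://{host}" if port == "80" else f"http://{host}:{port}"
```

For example, public_url(parse_env(open(".env").read())) prints the address you should be able to open in your browser.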
If you cannot or do not want to use Docker, you can run 4CAT directly from the source code. This requires more setup and the manual installation of various dependencies, but it can be useful if you want to develop data sources or processors for 4CAT.
It is recommended that you run 4CAT on a UNIX-like system (e.g. Linux or macOS). It will also run under Windows, but the instructions below are written with a UNIX-like system in mind. 4CAT further requires Python 3.8 or higher and PostgreSQL 9.5 or higher. Lower versions of either may work, but are not officially supported.
Clone the repository somewhere:
git clone https://www.github.com/digitalmethodsinitiative/4cat.git
After cloning the repository, copy config/config.ini-example to config/config.ini and edit the file to match your machine's configuration. The various options are explained in the file itself:
cd 4cat/config
cp config.ini-example config.ini
nano config.ini
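After editing config.ini, one quick way to catch empty or missing values is to load the file with Python's configparser. The section and key names below ([DATABASE] with db_name, db_user and so on) are assumptions made for illustration; check config.ini-example for the names your 4CAT version actually uses and adjust REQUIRED accordingly.

```python
import configparser
from typing import Dict, List

# Assumed section/key names; adjust these to match config.ini-example.
REQUIRED: Dict[str, List[str]] = {
    "DATABASE": ["db_name", "db_user", "db_password", "db_host", "db_port"],
}


def missing_settings(config_text: str) -> List[str]:
    """Return "SECTION.key" for every required setting that is absent or empty."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    missing = []
    for section, keys in REQUIRED.items():
        for key in keys:
            # fallback="" also covers a missing section, not just a missing key.
            if not parser.get(section, key, fallback="").strip():
                missing.append(f"{section}.{key}")
    return missing
```

Usage: missing_settings(open("config/config.ini").read()) returns an empty list when all assumed keys have values.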
Next, install the dependencies. On Linux systems that use apt, the following should suffice:
apt install python3-pip libpq-dev python3-dev postgresql-server-dev-all unzip postgresql-client ffmpeg
Adapt these to your own package manager (e.g. yum, brew, choco) as necessary. From here on, we are working within Python, so it is recommended that you create a virtual environment to install Python packages and run 4CAT in. There are several ways to set up a virtual environment; the link earlier in this paragraph lists the best practices.
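To confirm that you are using a suitable interpreter before installing packages, you can run a short check from inside the environment. This hypothetical sketch (not part of 4CAT) compares the running Python version with the 3.8 minimum mentioned above and detects a virtual environment created with Python's built-in venv module (e.g. python3 -m venv venv), where sys.prefix differs from sys.base_prefix.

```python
import sys
from typing import Dict, Union

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def environment_report() -> Dict[str, Union[str, bool]]:
    """Report the interpreter version, whether it is new enough, and venv status."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info >= MIN_PYTHON,
        # Inside a venv-created environment, sys.prefix points at the venv folder.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }
```

If in_virtualenv is False, activate your environment (e.g. source venv/bin/activate on UNIX-like systems) before running pip3.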
Within your virtual environment, while in the 4CAT root folder, install the required Python packages:
pip3 install -r requirements.txt
Some of the dependencies may have their own dependencies. For instance, on Windows the pyahocorasick library requires Microsoft Visual C++ Build Tools to be installed. If you encounter similar issues, please file an issue!
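If pip3 install fails partway through, it can help to see which requirements are still missing. The following hypothetical check (not part of 4CAT) reads the contents of requirements.txt and uses importlib.metadata (available from Python 3.8) to report packages that are not installed in the active environment. It is deliberately simplified: it ignores version constraints and skips pip options and URL-based requirements.

```python
import re
from importlib import metadata
from typing import List

# Leading project name of a requirement, e.g. "requests>=2.0" -> "requests".
NAME_PATTERN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def missing_requirements(requirements_text: str) -> List[str]:
    """Return requirement names that have no installed distribution."""
    missing = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # remove comments
        if not line or line.startswith("-") or "://" in line:
            continue  # blank lines, pip options and URL requirements are skipped
        match = NAME_PATTERN.match(line)
        if not match:
            continue
        name = match.group(1)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Usage: missing_requirements(open("requirements.txt").read()) lists the packages pip still needs to install.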
Next, you should make sure a database is available for 4CAT. 4CAT requires a PostgreSQL database to store dataset metadata, the job queue and other assorted data. You should create the database yourself and add the database login details to config.ini. After doing so, run the following command to create the tables, indices, et cetera, required by 4CAT:
psql --user=[username] --dbname=[database name] < backend/database.sql
Replace [username] and [database name] with the relevant values. You may be prompted for a password. On Windows, the following command should work:
psql -U [username] -d [database name] -a -f backend/database.sql
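The same schema import can be scripted from Python, which behaves the same on every operating system. This is a hypothetical sketch, not part of 4CAT. It adds psql's ON_ERROR_STOP variable so that psql exits with an error code if a statement fails, and it uses the standard PGPASSWORD environment variable (only if you pass a password) to avoid the interactive prompt.

```python
import os
import subprocess
from typing import Dict, List, Optional


def psql_schema_command(user: str, database: str,
                        schema_file: str = "backend/database.sql") -> List[str]:
    """Build a psql call that executes the 4CAT schema file."""
    return ["psql", "-v", "ON_ERROR_STOP=1", "-U", user, "-d", database, "-f", schema_file]


def load_schema(user: str, database: str, password: Optional[str] = None) -> None:
    """Run the schema import from the 4CAT root folder; raise if psql fails."""
    env: Dict[str, str] = dict(os.environ)
    if password is not None:
        env["PGPASSWORD"] = password  # read by psql/libpq instead of prompting
    subprocess.run(psql_schema_command(user, database), check=True, env=env)
```

Usage: load_schema("[username]", "[database name]") with the same values you put in config.ini.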
Finally, to make sure everything is in working order, run the following command and follow the instructions:
python3 helper-scripts/migrate.py
You can now run 4CAT!
The backend is run as a daemon that can be started and stopped using the included 4cat-daemon.py script:
python3 4cat-daemon.py start
Other valid arguments are stop, restart and status. Note that if you change any configuration options, you will need to restart the daemon for the changes to take effect. For development/testing it may be helpful to run 4cat-daemon.py interactively with the -i switch (i.e., python3 4cat-daemon.py -i start). This will log output to the terminal as well.
Note: The 4CAT daemon was made to run on a UNIX-like system and the above will not work on Windows. On Windows, the 4CAT daemon will always run interactively, and can be quit by entering 'q' and pressing Enter.
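On UNIX-like systems, you could wrap the daemon script so that it always runs with the interpreter from your virtual environment and only accepts the documented arguments. This hypothetical helper (not part of 4CAT) uses sys.executable for that reason and must be run from the 4CAT root folder.

```python
import subprocess
import sys
from typing import List

VALID_ACTIONS = {"start", "stop", "restart", "status"}


def daemon_command(action: str, interactive: bool = False) -> List[str]:
    """Build the 4cat-daemon.py call for one of the documented actions."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"Unknown action {action!r}; expected one of {sorted(VALID_ACTIONS)}")
    cmd = [sys.executable, "4cat-daemon.py"]  # same Python as the current venv
    if interactive:
        cmd.append("-i")  # log to the terminal as well
    cmd.append(action)
    return cmd


def run_daemon(action: str, interactive: bool = False) -> int:
    """Run the daemon script and return its exit code."""
    return subprocess.run(daemon_command(action, interactive)).returncode
```

For example, run_daemon("status") mirrors python3 4cat-daemon.py status.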
4CAT logs to 4cat.log in the root folder by default.
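When troubleshooting, it is often enough to look at the end of the log. The sketch below is a hypothetical helper (not part of 4CAT) that returns the last lines of a text file such as 4cat.log, assuming the default location mentioned above. It keeps only the requested number of lines in memory, even for large logs.

```python
from collections import deque
from pathlib import Path
from typing import List


def tail(path: Path, lines: int = 20) -> List[str]:
    """Return the last `lines` lines of a text file without trailing newlines."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        # deque with maxlen discards older lines as newer ones are read.
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
```

Usage: print("\n".join(tail(Path("4cat.log"), 50))) from the 4CAT root folder.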
The web tool is a Flask app. It is recommended that you run the web tool as a WSGI module: see the Flask documentation for more details. For testing and development, you can run the Flask app locally from the command line. On macOS and Linux:
FLASK_APP=webtool flask run
For Windows:
set FLASK_APP=webtool
flask run
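Because setting environment variables differs between shells (an inline assignment on macOS and Linux, set in the Windows Command Prompt), one option is to start the development server from Python instead. This is a hypothetical helper, not part of 4CAT. It sets FLASK_APP in a copy of the environment and invokes Flask's command-line interface through python -m flask, so it uses the Flask installed in your active virtual environment. Run it from the 4CAT root folder.

```python
import os
import subprocess
import sys
from typing import Dict, List


def flask_env(app: str = "webtool") -> Dict[str, str]:
    """Copy the current environment and point FLASK_APP at the web tool."""
    env = dict(os.environ)
    env["FLASK_APP"] = app
    return env


def flask_run_command(port: int = 5000) -> List[str]:
    """Build `python -m flask run --port <port>` for the current interpreter."""
    return [sys.executable, "-m", "flask", "run", "--port", str(port)]


def run_webtool(port: int = 5000) -> None:
    """Start the development server; blocks until you stop it with Ctrl+C."""
    subprocess.run(flask_run_command(port), env=flask_env(), check=True)
```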
With the default configuration, you can now navigate to http://localhost:5000, where you'll find the web tool that allows you to query the database and create datasets. On your first visit you will be asked to create an admin account through which the tool can be managed.
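If the page does not load, it helps to know whether anything is answering at that address at all. The hypothetical check below (not part of 4CAT) treats any HTTP response, including an error status, as "reachable", and only a failed connection as "not reachable". It deliberately ignores proxy settings, because it is meant for checking a local server.

```python
import urllib.error
import urllib.request


def is_reachable(url: str = "http://localhost:5000", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url` with any status code."""
    # An empty ProxyHandler bypasses any http_proxy environment settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, unknown host, etc.
        return False
```

For example, is_reachable() returning False usually means the Flask app is not running or is listening on another port.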
Most settings are now stored in the settings table of the 4CAT database. If you are an admin user, you can update settings via the 4CAT web interface by navigating to the "Control Panel" tab at the top of the interface and then using the buttons at the top of the page.
If you are unable to navigate to the 4CAT web interface with default settings (e.g., if you are deploying 4CAT on a server instead of localhost), you can also modify settings directly in the database.
- Connect to psql
  - psql --user=[username] --dbname=[database name] (changing username and database name as appropriate)
- View settings
  - SELECT * FROM settings;
  - or perhaps SELECT * FROM settings WHERE name LIKE '%flask%'; to view Flask settings
- Update as needed (values are stored as JSON, which is why a string value is wrapped in double quotes inside the single-quoted SQL literal)
  - UPDATE settings SET value = '"my.server.com"' WHERE name = 'flask.server_name';
  - UPDATE settings SET value = '["localhost", "my.server.com"]' WHERE name = 'flask.autologin.hostnames';
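Getting the nested quotes right by hand is error-prone. Judging by the examples above, setting values are JSON-encoded, so json.dumps can produce them for you. The sketch below is hypothetical and not part of 4CAT. It builds a parameterised UPDATE against the settings table (name and value columns, as in the SQL above) and applies it with psycopg2, which is assumed to be installed. The connection string uses the usual libpq format, e.g. "dbname=fourcat user=fourcat".

```python
import json
from typing import Any, Tuple

UPDATE_SQL = "UPDATE settings SET value = %s WHERE name = %s"


def setting_update(name: str, value: Any) -> Tuple[str, Tuple[str, str]]:
    """Return the UPDATE statement and its parameters, with the value JSON-encoded."""
    return UPDATE_SQL, (json.dumps(value), name)


def apply_setting(dsn: str, name: str, value: Any) -> None:
    """Write one setting to the database using psycopg2 (assumed installed)."""
    import psycopg2  # imported here so setting_update() works without psycopg2

    query, params = setting_update(name, value)
    # The connection context manager commits on success but does not close.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(query, params)
    conn.close()
```

For example, apply_setting(dsn, "flask.autologin.hostnames", ["localhost", "my.server.com"]) corresponds to the second UPDATE above.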
You may wish to redeploy 4CAT in the future using previously collected data. This is possible in principle, though we cannot guarantee complete compatibility between datasets and analyses over time. As a general principle, we aim to keep the collected datasets as close as possible to their original sources, and we do not modify them. That said, the original sources sometimes modify their data, so datasets can contain different information depending on when they were collected.
The two most important pieces you need are your 4CAT database and the accompanying dataset files. There is also a version file that lets the migration process know from which version your previous 4CAT instance needs to be updated.
If you installed 4CAT with Docker, these are stored as follows:
- 4CAT database: stored in the Docker volume listed under the db service in the docker-compose.yml file used (default name is 4cat_db)
- 4CAT dataset files: stored in the Docker volume listed under the backend and frontend services in the docker-compose.yml file used (default name is 4cat_data)
- 4CAT config/.current-version file: stored in the Docker volume listed under the backend and frontend services in the docker-compose.yml file used (default name is 4cat_share)
If you still have these volumes, you can download a new .env file and a new docker-compose.yml file and install normally. If the volumes have not been deleted from Docker and have the same names as in your new docker-compose.yml file, the installation process should use these existing volumes when you set up the new 4CAT instance (use docker volume ls or docker system df -v to find more information). If you have copied/moved the volumes from a different source/system, you can modify the docker-compose.yml file to point to the directory containing those files (e.g. /path/to/local/dir:/usr/src/app/data/ for the 4cat_data volume). Use docker-compose pull to ensure you have the desired version (as defined in .env) and docker-compose up -d to redeploy 4CAT.
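Before redeploying, you can check whether the expected volumes still exist. This hypothetical sketch (not part of 4CAT) lists volume names with docker volume ls --format {{.Name}} and looks for the default names mentioned above. If you renamed the volumes in your docker-compose.yml, adjust EXPECTED_VOLUMES accordingly.

```python
import subprocess
from typing import Dict, Iterable, List

# Default volume names from the list above; yours may differ.
EXPECTED_VOLUMES = ("4cat_db", "4cat_data", "4cat_share")


def check_volumes(names: Iterable[str]) -> Dict[str, bool]:
    """Map each expected 4CAT volume to whether it appears in `names`."""
    present = {name.strip() for name in names if name.strip()}
    return {volume: volume in present for volume in EXPECTED_VOLUMES}


def list_docker_volumes() -> List[str]:
    """Return the names of all Docker volumes on this machine."""
    result = subprocess.run(
        ["docker", "volume", "ls", "--format", "{{.Name}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Usage: check_volumes(list_docker_volumes()) shows which of the three volumes Docker still knows about.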
If you installed 4CAT manually, the equivalent pieces are:
- 4CAT database: this PostgreSQL database was created by you; edit the database section of the config/config.ini file to connect to this database
- 4CAT dataset files: stored in the path_data folder listed in the config/config.ini file
- 4CAT config/.current-version file: stored in the config folder of your 4CAT working directory (config/.current-version)

Once you have these files, you can upgrade by following these instructions. The migrate.py step will ensure your database is upgraded from the version listed in .current-version.
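To confirm which version the migration will start from, you can read the version file before running migrate.py. This is a hypothetical helper (not part of 4CAT). It assumes the file contains just the version string, and it takes the path of your 4CAT working directory.

```python
from pathlib import Path


def current_version(root: Path) -> str:
    """Read the version recorded in config/.current-version under `root`."""
    version_file = root / "config" / ".current-version"
    if not version_file.exists():
        raise FileNotFoundError(
            f"{version_file} not found; migrate.py uses it to know which version to upgrade from"
        )
    return version_file.read_text(encoding="utf-8").strip()
```

Usage: current_version(Path("/path/to/4cat")) before running python3 helper-scripts/migrate.py.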
🐈🐈🐈🐈