This monorepo contains all relevant code for the ReCLAIM platform. The repository is structured into multiple packages.
A demo screencast is available in `ReCLAIM.Demo.Screencast.mp4`.
The monorepo uses `turbo` to manage the development environment. To install `turbo`, run the following command:

```sh
npm install turbo --global
```
To install all dependencies, run the following command:
```sh
turbo install
```
This will create the necessary virtual environments and install all dependencies.
Please also follow these steps:

- Create a `.env` file in the project root by copying the `.env.example` file, and fill in the necessary environment variables. Do not check the `.env` file into version control, as it may contain secrets.
- Create symlinks to the `.env` file in all subfolders. On a Unix-based system, you can run `ln -s ../.env .env` in each subfolder.
- (If you are using the centralized neo4j database, you can skip this step.) If you are using a local neo4j database, you can start it by running `docker-compose up dev_db` in the `backend` folder. For the search to work, the fulltext index must be created; create it by running `poetry run python createPrimaryFulltextIndex.py` in the `backend` folder.
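The symlink step above can be scripted. Below is a minimal sketch; the package folder names are an assumption based on the packages described in this README, and a temporary directory stands in for the project root so the sketch has no side effects:

```shell
# Sketch: link a root .env into each package folder.
# The package list below is an assumption taken from this README.
set -eu
root=$(mktemp -d)              # stand-in for the project root in this sketch
touch "$root/.env"             # the file copied from .env.example
for pkg in backend frontend etl matching ontology scraper; do
  mkdir -p "$root/$pkg"
  ln -sf ../.env "$root/$pkg/.env"
done
ls -l "$root/backend/.env"     # shows the symlink pointing at ../.env
```

In practice, run only the `for` loop from the real project root, without the `mktemp` scaffolding.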
To start the development environment, run the following command:
```sh
turbo dev
```
This will start the frontend and backend in development mode.
For streamlined development of both frontend and backend, we automatically generate a REST client for the backend API. To generate the client explicitly, run the following command:
```sh
turbo run generate:client
```
This requires the backend to be running.
To also create a development instance of our neo4j database, either run `turbo db:up` to start only a detached neo4j database, or run `turbo dx` to start both the database and the server.
We use pre-commit to ensure that all code is formatted correctly. To install pre-commit, run:

```sh
brew install pre-commit
```

using Homebrew on macOS, or

```sh
pip install pre-commit
```

otherwise.
To install the pre-commit hooks, run the following command:
```sh
pre-commit install
```
This will install the pre-commit hooks, which will run every time you commit code. If the hooks fail, the commit will be aborted. To run the hooks manually, run the following command:
```sh
pre-commit run --all-files
```
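For reference, a minimal `.pre-commit-config.yaml` sketch using the standard pre-commit-hooks repository is shown below; the hooks actually configured in this repo may differ, so check the `.pre-commit-config.yaml` in the project root:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```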
The project is hosted on our server at DHC-Lab. To deploy the latest version of the project from main and seed the database with the data from the current ETL scripts, log in to the server as root (type `sudo -i`), and type:

```sh
./deploy
```

To deploy without re-seeding the neo4j database and executing the ETL scripts, type:

```sh
./deploy_without_seed
```
The deployment script relies on the `docker-compose.yml` file being placed in the same directory; this file defines network and volume information as well as the environment variables for the containers. Both the script and the `docker-compose.yml` should be placed above the project root. The Dockerfiles for the frontend and backend are under `kunstgraph/frontend/Dockerfile` and `kunstgraph/backend/Dockerfile` respectively.
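The expected layout on the server can be sketched as follows (an illustration based on the description above; `server-root` stands for whatever directory sits above the checkout):

```
server-root/
├── deploy
├── deploy_without_seed
├── docker-compose.yml
└── kunstgraph/
    ├── backend/
    │   └── Dockerfile
    └── frontend/
        └── Dockerfile
```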
Requirements:
- Python 3.12
- Poetry for Python
To set up the project for deployment on a new Linux server, make sure Python 3.12 is installed. Then, install Poetry by running:

```sh
curl -sSL https://install.python-poetry.org | python3 -
```

(refer to https://python-poetry.org/ for updates on the instructions, should they change). To clone the repository, it is recommended to add a GitHub SSH deploy key, so that the deployment scripts can fetch the repository.

It might be necessary to add `poetry` to `PATH`. Alternatively, run `poetry` using its explicit path.
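Generating a deploy key can look like the following sketch (the key path and comment are illustrative, and a temporary directory is used so the sketch has no side effects; in practice you would write the key to `~/.ssh` and add the printed public key as a read-only deploy key in the GitHub repository settings):

```shell
# Sketch: generate an SSH key pair for use as a GitHub deploy key.
set -eu
keydir=$(mktemp -d)            # stand-in for ~/.ssh in this sketch
ssh-keygen -t ed25519 -f "$keydir/deploy_key" -N "" -C "kunstgraph-deploy" -q
cat "$keydir/deploy_key.pub"   # paste this into the repository's deploy-key settings
```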
After installation and cloning the project repository, install both the `backend` and the `etl` poetry projects into the `backend` project's venv by running the following commands:

```sh
cd kunstgraph/backend
poetry install
poetry shell
cd ../etl
poetry install
exit
```
Copy the deployment scripts into the parent folder and make them executable:

```sh
cp deploy deploy_without_seed ..
cd ..
chmod +x deploy
chmod +x deploy_without_seed
```
You can now deploy the newest version by running either `./deploy` or `./deploy_without_seed`.
When deploying the neo4j docker instance, the password for database access defaults to "PASSWORD" in this repo. To change it, update the `NEO4J_PASSWORD` environment variable in both the `deploy` script and the `docker-compose.yml` file, as well as the password part of `NEO4J_AUTH` in the `docker-compose.yml` file.
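As an illustration, the relevant part of the `docker-compose.yml` might look like the following hypothetical excerpt (the actual file above the project root is authoritative; the service and image names here are assumptions):

```yaml
services:
  neo4j:
    image: neo4j:5
    environment:
      # NEO4J_AUTH has the form "<user>/<password>"; keep the password part
      # in sync with the NEO4J_PASSWORD variable used by the deploy script.
      NEO4J_AUTH: neo4j/${NEO4J_PASSWORD}
```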
The backend is a Python FastAPI application that provides an API to extract data from the Neo4j database and the ontology. The backend also provides an API to search for cultural assets and other entities.
You can find the backend code in the `backend` folder. Instructions to run the backend can be found in its README.md file.
The frontend is written using React and Next.js. It provides a user interface to search for cultural assets and other entities. You can find the frontend code in the `frontend` folder. Instructions to run the frontend can be found in the README.md in the `frontend` folder.
The matching package contains code for the matching of cultural assets. The code is written in Python using PyTorch. Instructions to run the matching code can be found in the README.md file.
The ETL package contains code to extract, transform and load data from the CSV files provided into the Neo4j database. Instructions to run the ETL code can be found in the README.md file.
The ontology package contains code to build the ontology we created for the project. Instructions to run the ontology code can be found in the README.md file.
The scraper package contains all code relevant to scraping information from different websites for the project. Instructions to run the scraper code can be found in the README.md file.
The delab package contains utility functions for running jobs on the DE-Lab cluster.
When developing e2e tests using Playwright, put them into the `e2e_tests` folder. Instructions to run the code can be found in the README.md file.
The raw data could not be made publicly available as part of the project repository. To add data sources for integration, use the `data` folder.