- make sure you have python3 installed (https://docs.python-guide.org/starting/installation/)
- (Python 3.12 or newer is required)
- go to project root
- Run the following commands:
sudo apt install python3-dev python3-pip python3-venv libpq-dev -y
python3 -m venv .venv
source .venv/bin/activate
(on Linux Unix)
.venv\Scripts\activate.bat
(on Windows)
pip3 install poetry
poetry install
- Step 1: Make sure that you have Poetry v1.5.0+ installed
- for detailed instructions, please consult the Poetry Installation Guide
- Step 2: Open your terminal in the project root directory:
- Step 2.1: If you want to have your
.venv
to be created inside the project root directory:poetry config virtualenvs.in-project true
- (this is an optional, strictly personal preference)
- Step 2.1: If you want to have your
- Step 3: Install dependencies (according to
pyproject.toml
) by running:poetry install
If you have Docker installed, use docker-compose up
to start up the multi-container for Splash
and Playwright
-integration.
As a last step, set up your config variables by copying the .env.example
-file and modifying it if necessary:
cp converter/.env.example converter/.env
- A crawler can be run with
scrapy crawl <spider-name>
.- (It assumes that you have an edu-sharing v6.0+ instance in your
.env
settings configured which can accept the data.)
- (It assumes that you have an edu-sharing v6.0+ instance in your
- If a crawler has Scrapy Spider Contracts implemented, you can test those by running
scrapy check <spider-name>
git clone https://github.com/openeduhub/oeh-search-etl
cd oeh-search-etl
cp converter/.env.example .env
# modify .env with your edu sharing instance
export CRAWLER=your_crawler_id_spider # i.e. wirlernenonline_spider
docker compose build scrapy
docker compose up
- We use Scrapy as a framework. Please check out the guides for Scrapy spider (https://docs.scrapy.org/en/latest/intro/tutorial.html)
- To create a new spider, create a file inside
converter/spiders/<myname>_spider.py
- We recommend inheriting the
LomBase
class in order to get out-of-the-box support for our metadata model - You may also Inherit a Base Class for crawling data, if your site provides LRMI metadata, the
LrmiBase
is a good start. If your system provides an OAI interface, you may use theOAIBase
- As a sample/template, please take a look at the
sample_spider.py
orsample_spider_alternative.py
- To learn more about the LOM standard we're using, you'll find useful information at https://en.wikipedia.org/wiki/Learning_object_metadata
If you need help getting started or setting up your work environment, please don't hesitate to visit our GitHub Wiki at https://github.com/openeduhub/oeh-search-etl/wiki