The injestor is a daemon that runs a series of services for ingesting the edX data package. Each service scans a data directory for specific data and ingests it into either a MySQL or a MongoDB database. Each scan is repeated every 5 minutes. The daemon also provides a REST API for reading the current state of the ingestion.
The ingested databases are designed to work in conjunction with the UQx-API application: https://github.com/UQ-UQx/uqx_api
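The scan-and-repeat behaviour described above can be pictured with a minimal sketch. This is not the injestor's actual code: the class name DirectoryScanService, the suffix filter, and the ingest callback are hypothetical, chosen only to illustrate a service that scans a directory, ingests files it has not seen, and repeats every 5 minutes.

```python
import os
import time

# The README states that scans repeat every 5 minutes.
SCAN_INTERVAL_SECONDS = 5 * 60


class DirectoryScanService:
    """Hypothetical service: scans one directory and ingests unseen files."""

    def __init__(self, data_dir, suffix, ingest):
        self.data_dir = data_dir
        self.suffix = suffix      # only files ending with this suffix are ingested
        self.ingest = ingest      # callback that would write to MySQL or MongoDB
        self.seen = set()         # paths already ingested by this process

    def scan_once(self):
        """Ingest every matching file not seen before; return the new paths."""
        new_files = []
        for name in sorted(os.listdir(self.data_dir)):
            path = os.path.join(self.data_dir, name)
            if not name.endswith(self.suffix) or not os.path.isfile(path):
                continue
            if path in self.seen:
                continue
            self.ingest(path)
            self.seen.add(path)
            new_files.append(path)
        return new_files

    def run_forever(self, interval=SCAN_INTERVAL_SECONDS, sleep=time.sleep):
        """Repeat the scan indefinitely, pausing between passes."""
        while True:
            self.scan_once()
            sleep(interval)
```

In this sketch each service keeps its own record of ingested files in memory, so a restart would rescan everything; a real service would more likely persist that state in the database it writes to.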
To use the iptocountry service, you will need to install the GeoIP2-Country.mmdb database (available from https://www.maxmind.com/en/country):
cp GeoIP2-Country.mmdb [BASE_PATH]/services/iptocountry/lib/
[BASE_PATH] is the path where you want the injestor installed (such as /var/injestor)
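As a rough illustration of how a country lookup against that database might work, here is a sketch that assumes the third-party geoip2 Python package is installed (pip install geoip2). The helper names geoip_db_path and lookup_country are hypothetical and are not taken from the iptocountry service itself.

```python
import os


def geoip_db_path(base_path):
    """Build the path where the README says the database should be copied."""
    return os.path.join(
        base_path, "services", "iptocountry", "lib", "GeoIP2-Country.mmdb"
    )


def lookup_country(db_path, ip_address):
    """Return the ISO country code for an IP address, e.g. 'AU'.

    Assumes the geoip2 package; raises geoip2.errors.AddressNotFoundError
    if the address is not in the database.
    """
    import geoip2.database  # imported here so the path helper works without it

    with geoip2.database.Reader(db_path) as reader:
        return reader.country(ip_address).country.iso_code
```

For example, lookup_country(geoip_db_path("/var/injestor"), "203.0.113.7") would open the database under /var/injestor and return the country code for that address, provided the file has been copied into place.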
Clone the repository
git clone https://github.com/UQ-UQx/injestor.git [BASE_PATH]
Install system libraries and pip requirements
sudo apt-get install libxml2-dev libxslt1-dev python-dev
pip install -r requirements.txt
Set injestor configuration
cp [BASE_PATH]/config.example.py [BASE_PATH]/config.py
vim [BASE_PATH]/config.py
[[EDIT THE VALUES]]
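A forgotten or unedited config.py is an easy mistake at this step. The following is a small optional check, not part of the injestor. The function name config_status is hypothetical, and it only relies on the two file names mentioned above.

```python
import filecmp
import os


def config_status(base_path):
    """Return 'missing', 'unedited', or 'ok' for [BASE_PATH]/config.py."""
    config = os.path.join(base_path, "config.py")
    example = os.path.join(base_path, "config.example.py")

    if not os.path.isfile(config):
        return "missing"
    # shallow=False compares file contents, not just size and timestamps.
    if os.path.isfile(example) and filecmp.cmp(config, example, shallow=False):
        return "unedited"
    return "ok"
```

Calling config_status("/var/injestor") before starting the daemon would report whether config.py is still a byte-for-byte copy of the example file.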
Install init.d script (optional, only tested on CentOS 6.5)
ln [BASE_PATH]/init.d/injestor /etc/init.d/injestor
Run injestor
/etc/init.d/injestor start
(or) [BASE_PATH]/injestor.py start
If you need to test a particular service, you can run
[BASE_PATH]/test.py [[SERVICENAME]]
If you wish to deploy the injestor to a server, you can use the supplied fab (http://www.fabfile.org/) script once the config is set:
fab prepare deploy
The injestor is not meant to run standalone; it is designed to work with an unencrypted version of the Research Data Package. For more information on getting the data package, please see: https://edx-wiki.atlassian.net/wiki/pages/viewpage.action?pageId=36044863 .
The flow of the data is as follows:
The project is currently at an early stage and does not yet have reliable tests.
This project is licensed under the terms of the MIT license.
The injestor project is currently at a very early stage and is unlikely to accept pull requests in a timely fashion, as the structure may change without notice. However, feel free to open issues for any problems you run into and we will look at them as soon as we can.
The best contact point, apart from opening GitHub issues or comments, is to email [email protected]