DataStore for 360 Giving data
Example:
In this example we create a user named test with the password test for development use.
$ sudo apt-get install postgresql-12 postgresql-server-dev-12
$ sudo -u postgres createuser -P -e test --interactive
$ createdb -U test -W 360givingdatastore
Note: Special PostgreSQL extensions are used for indexes. If you don't want these installed in the database, or the database user doesn't have permission to install extensions, set the environment variable SKIP_SPECIAL_DB_INDEX=true before you run migrate. In development you can also set the DATABASE_HOST, DATABASE_NAME, DATABASE_USER and DATABASE_PASSWORD environment variables.
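As an illustration of how these variables could be consumed, here is a minimal sketch of a settings helper. The function names and the default values are assumptions made for this example (the defaults mirror the test user and database created above); the project's actual settings module may read the variables differently.

```python
import os


def database_settings(environ=os.environ):
    """Build a Django-style DATABASES dict from the variables named above."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            # The fallback values are illustrative assumptions, not project defaults.
            "NAME": environ.get("DATABASE_NAME", "360givingdatastore"),
            "USER": environ.get("DATABASE_USER", "test"),
            "PASSWORD": environ.get("DATABASE_PASSWORD", "test"),
            "HOST": environ.get("DATABASE_HOST", "localhost"),
        }
    }


def skip_special_db_index(environ=os.environ):
    """Return True when SKIP_SPECIAL_DB_INDEX is set to 'true' (case-insensitive)."""
    return environ.get("SKIP_SPECIAL_DB_INDEX", "").lower() == "true"
```

Passing the environment in as an argument keeps the helper easy to check without touching the real process environment.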
$ virtualenv --python=python3 ./.ve/
$ source ./.ve/bin/activate
$ pip install -r requirements.txt
$ export DJANGO_SETTINGS_MODULE=settings.settings_dev
$ manage.py migrate # see note above about Special Postgres extensions
$ manage.py createsuperuser
$ manage.py runserver
Note: Before loading grant data you may wish to load the additional_data sources.
$ manage.py load_datagetter_data ../path/to/data/dir/from/datagetter/
Create/update the Recipient/Funder model entries from grant data.
$ python manage.py manage_entities_data --update
A number of the sources for additional_data have their own local caches, which need to be kept up to date.
To better understand additional data, refer to 360Giving Datastore - additional data.
For a script which combines all the steps, see datastore/additional_data/sources/update_all_sources.sh
Occasionally we also need to update the upstream URLs from which data is fetched; these are found in datastore/additional_data/sources/*.py.
Our API docs / schema are based on OpenAPI 3.0 (as generated by drf-spectacular). OpenAPI 3.0 is incompatible with the JSON Schema used by 360G, so we keep a copy of 360G's schema converted into OpenAPI 3.0 format. When 360G updates their standard/schema, we should update this copy too.
To do this, first install the CLI tool used to convert JSON Schema to OpenAPI 3.0:
npm install -g --save @openapi-contrib/json-schema-to-openapi-schema
When the schema changes, copy it from the standard repo to static/ and convert it from JSON Schema to OpenAPI 3.0, e.g.:
STANDARD_VERSION=1.3
cd datastore/static/
curl https://raw.githubusercontent.com/ThreeSixtyGiving/standard/${STANDARD_VERSION}/schema/360-giving-schema.json > 360-giving-schema-${STANDARD_VERSION}-jsonschema.json
json-schema-to-openapi-schema convert 360-giving-schema-${STANDARD_VERSION}-jsonschema.json > 360-giving-schema-${STANDARD_VERSION}-openapi.json
Then update the TSG_SCHEMA_STATICFILE setting in settings.py.
Downloads codelists from the ThreeSixtyGiving/standard GitHub repo.
./manage.py load_codelist_codes
Look at the datastore_num_current_grants_with_beneficiary_location_geocode_without_lookup metric of the getter run before and after updating geodata; it should go down.
./manage.py load_geocode_names # CHD Data
./manage.py load_geolookups # from https://github.com/drkane/geo-lookups
./manage.py load_nspl
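To compare the metric before and after a geodata update, one option is to read it out of the Prometheus text output. The sketch below assumes the metric is exposed without labels, and the sample values are invented for illustration; read_metric is a hypothetical helper, not part of the datastore.

```python
def read_metric(exposition_text, metric_name):
    """Return the value of an unlabelled metric from Prometheus text output, or None."""
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == metric_name:
            return float(parts[1])
    return None


METRIC = "datastore_num_current_grants_with_beneficiary_location_geocode_without_lookup"

# Invented sample output standing in for two scrapes of the metrics endpoint.
before = read_metric(f"# TYPE {METRIC} gauge\n{METRIC} 120\n", METRIC)
after = read_metric(f"# TYPE {METRIC} gauge\n{METRIC} 45\n", METRIC)

if after < before:
    print("Fewer grants are missing a geocode lookup after the update.")
```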
# Got to delete the old org data before loading in the new
./manage.py delete_org_data --no-prompt
./additional_data/sources/load_all_org_data.sh
There are many useful management commands; see:
$ manage.py --help
Developers can also use Docker Compose to get a local development environment.
docker-compose -f docker-compose.dev.yml up
The website should be available at http://localhost:8000
Use Ctrl-C to exit.
Whilst leaving the up command running, you should use docker-compose run with the commands from the above sections.
E.g. instead of running:
$ manage.py load_geocode_names
Run:
$ docker-compose -f docker-compose.dev.yml run datastore-web python datastore/manage.py load_geocode_names
Run:
$ docker-compose -f docker-compose.dev.yml run -e PGPASSWORD=postgres postgres psql -h postgres -U postgres
$ pip install -r ./requirements_dev.txt
You will also need the chromedriver for your machine's Chromium-based browser. See https://chromedriver.chromium.org/downloads
Alternatively, edit the selenium test setup in test_browser to use your preferred selenium setup.
$ ./manage.py test tests
$ flake8
$ black --check ./
Note: You may want to run this with SKIP_SPECIAL_DB_INDEX=true to avoid the need for the test database user to have permissions for installing PostgreSQL extensions when running the tests.
You can run any particular tests individually e.g.:
$ manage.py test tests.test_additional_data_tsgorgtype
See manage.py test --help for more info.
We target python3.8 for our requirements.
Use pip-compile, provided by the pip-tools package, to process the requirements .in files.
This module is the central datastore for 360 Giving data. It contains the models which define the database and the ORM for accessing, creating and updating the grant data.
A key function is managing the Latest data, which represents the created datasets that are built from datagetter grant data. These datasets are used in GrantNav.
Management commands here allow for loading and managing datasets as well as a mechanism for external scripts to update the current status of the system (status is used in the UI and for GrantNav API).
This contains the API endpoints that are used to control the system from the UI and to indicate the status and data download URL for GrantNav updates, as well as an experimental REST API built using django-rest-framework.
Templates and static HTML/JS live here. There is a basic dashboard which shows the current status of the system, as well as a mechanism to trigger a full datarun (fetch and load).
During the load of grant data (datagetter data), which is done by the db module command load_datagetter_data, each grant is passed to the create method of the AdditionalDataGenerator. Here various sources are used to add to an additional_data object that is available on the Grant model.
additional_data data sources come in various forms: static files which are loaded, as well as caches of data in our local database (for example postcode lookups).
The generator ensures a particular order in which additional_data fields are added, which allows one source to depend on another.
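The ordering idea can be sketched as follows. This is not the project's AdditionalDataGenerator: the class names, the update_additional_data method, the field names and the sample data are all hypothetical, chosen only to show a later source reading a field written by an earlier one.

```python
class PostcodeSource:
    """Hypothetical source: maps a grant's postcode to an area code."""

    def __init__(self, lookup):
        self.lookup = lookup

    def update_additional_data(self, grant, additional_data):
        postcode = grant.get("postcode")
        if postcode in self.lookup:
            additional_data["area_code"] = self.lookup[postcode]


class AreaNameSource:
    """Hypothetical source that depends on the field written by PostcodeSource."""

    def __init__(self, names):
        self.names = names

    def update_additional_data(self, grant, additional_data):
        code = additional_data.get("area_code")
        if code in self.names:
            additional_data["area_name"] = self.names[code]


class OrderedGenerator:
    """Runs sources in a fixed order so later ones can read earlier results."""

    def __init__(self, sources):
        self.sources = list(sources)

    def create(self, grant):
        additional_data = {}
        for source in self.sources:
            source.update_additional_data(grant, additional_data)
        return additional_data


# Invented sample data for illustration only.
generator = OrderedGenerator(
    [PostcodeSource({"AB1 2CD": "X01"}), AreaNameSource({"X01": "Example Area"})]
)
result = generator.create({"postcode": "AB1 2CD"})
```

If the two sources were listed in the opposite order, area_name would never be set, because area_code would not exist yet when AreaNameSource ran.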
Provides a Prometheus endpoint to monitor vital metrics on the datastore.
An example datarun script. It orchestrates running a datagetter, updating the statuses and loading the data into the datastore.
Django settings for the datastore. Includes the location for data run logs and the data run script / pid.
Various cross-module tests.