npg_porch is an OpenAPI web service and database schema that can be used for managing invocations of pipelines. The user of the pipeline provides definitions for how the pipeline will be invoked, a pipeline wrapper that can interpret the definitions, and whatever mechanism is needed to initiate the pipeline with a job scheduler such as LSF or WR. npg_porch provides tracking for pipeline invocations in the same way that a workflow engine might track jobs.
npg_porch's singular purpose is to ensure that pipelines are run once and only once for each definition of work. The requirements and software used by the Institute's faculty are so diverse that no single product can provide for tracking, scheduling and execution of pipelines. npg_porch handles the tracking element without restricting any other aspect of workflow use.
npg_porch does not:
- understand the pipelines
- store any pipeline data beyond that necessary to ensure idempotency of pipeline runs
- run your jobs for you
- provide deep insight into how each pipeline is running
npg_porch does:
- provide a central system to register new work
- enforce uniqueness of work to prevent replication of effort
- allow for re-running of pipelines
- store messages from pipeline wrappers that may help in failure diagnosis
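The uniqueness guarantee listed above can be pictured with a small sketch. This is not npg_porch's implementation: the `task` table, its columns and the `register` helper are hypothetical, and an in-memory SQLite database stands in for the real one. It assumes a definition of work is a JSON-serialisable dictionary, hashes a canonical form of it, and relies on a primary key constraint to reject duplicates.

```python
import hashlib
import json
import sqlite3


def definition_digest(definition: dict) -> str:
    """Return a stable SHA-256 digest for a definition of work."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def register(conn: sqlite3.Connection, definition: dict) -> bool:
    """Record a definition; return False if it was already registered."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO task (digest, definition) VALUES (?, ?)",
                (definition_digest(definition), json.dumps(definition, sort_keys=True)),
            )
    except sqlite3.IntegrityError:
        # The primary key constraint rejected a duplicate digest.
        return False
    return True


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (digest TEXT PRIMARY KEY, definition TEXT NOT NULL)")

first = register(conn, {"study": "1234", "run": 42})
second = register(conn, {"run": 42, "study": "1234"})  # same work, different key order
```

In this sketch the second call is rejected because both dictionaries serialise to the same canonical JSON, so only one row is stored.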
- Python >= 3.7
- sqlite3 >= 3.9
To run the server, please execute the following from the root directory:
pip3 install -e .
cd src
mkdir -p logs
export DB_URL=postgresql+asyncpg://npg_rw:$PASS@npg_porch_db:$PORT/$DATABASE
export DB_SCHEMA='non_default'
uvicorn npg_porch.server:app --host 0.0.0.0 --port 8080 --reload --log-config logging.json
and open your browser at http://localhost:8080 to see links to the docs.
On macOS you will need to ensure that a version of the sqlite3 library that supports SQLite extensions is used when installing the pysqlite3 package. The system library on macOS does not, so an alternative such as the one provided by MacPorts or Homebrew should be used. For example, when using MacPorts this can be done by setting the CPPFLAGS environment variable before running the pip install command:
export CPPFLAGS="-I/opt/local/include"
The server will not start without DB_URL in the environment.
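A wrapper or launch script can fail early with a clear message when the variable is missing. The sketch below is only an illustration of that idea; how npg_porch itself performs the check is not shown here, and the `require_db_url` helper and the example URL are hypothetical.

```python
import os


def require_db_url(env=None) -> str:
    """Return DB_URL from the given mapping (default: os.environ) or raise."""
    env = os.environ if env is None else env
    url = env.get("DB_URL", "")
    if not url:
        raise RuntimeError("DB_URL must be set before starting the server")
    return url


# Example with an explicit mapping instead of the real environment.
example_env = {"DB_URL": "postgresql+asyncpg://user:secret@host:5432/db"}
url = require_db_url(example_env)
```

Passing the environment as an argument keeps the check easy to test without modifying the process environment.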
When you want HTTPS, logging and all that jazz:
uvicorn npg_porch.server:app --workers 2 --host 0.0.0.0 --port 8080 --log-config ~/logging.json --ssl-keyfile ~/.ssh/key.pem --ssl-certfile ~/.ssh/cert.pem --ssl-ca-certs /usr/local/share/ca-certificates/institute_ca.crt
Consider running with nohup or similar.
Some notes on arguments:
--workers: How many pre-forks to run. Async should mean we don't need many. Directly increases memory consumption.
--host: 0.0.0.0 = bind to all network interfaces. Reliable but greedy in some situations.
--log-config: Refers to a JSON file for the Python logging library. An example file is found in /src/logging.json. Uvicorn provides its own logging configuration via uvicorn.access and uvicorn.error. These may behave undesirably, and can be overridden in the JSON file with an alternate config. Likewise, fastapi logs to fastapi if that needs filtering. For logging to files, set use_colors = False in the relevant handlers or shell colour settings will appear as garbage in the logs.
--ssl-keyfile: A PEM format key for the server certificate.
--ssl-certfile: A PEM format certificate for signing HTTPS communications.
--ssl-ca-certs: A CRT format certificate authority file that pleases picky clients. Uvicorn does not automatically find the system certificates, or so it seems.
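As a rough illustration of the --log-config notes, the sketch below builds a logging configuration in Python and serialises it to JSON. It is not the project's src/logging.json. The log file path is hypothetical, and the sketch assumes that, as in uvicorn's default logging configuration, use_colors is an argument of the uvicorn.logging.DefaultFormatter entry, so colour is switched off there.

```python
import json

# Hypothetical log file location.
LOG_FILE = "logs/npg_porch.log"

log_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {
            # Assumption: uvicorn's formatter class accepts use_colors.
            "()": "uvicorn.logging.DefaultFormatter",
            "fmt": "%(asctime)s %(levelname)s %(name)s %(message)s",
            "use_colors": False,  # avoid colour escape codes in files
        },
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": LOG_FILE,
            "formatter": "plain",
        },
    },
    "loggers": {
        # Override uvicorn's own loggers and filter fastapi separately.
        "uvicorn.error": {"handlers": ["file"], "level": "INFO", "propagate": False},
        "uvicorn.access": {"handlers": ["file"], "level": "INFO", "propagate": False},
        "fastapi": {"handlers": ["file"], "level": "WARNING", "propagate": False},
    },
}

# Text that could be saved to a file and passed to --log-config.
config_text = json.dumps(log_config, indent=2)
```

The levels chosen here are arbitrary; adjust them to whatever each logger needs.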
export NPG_PORCH_MODE=TEST
# Only do this as needed
pip install -e .[test]
pytest
Individual tests are run in the form pytest tests/init_test.py
Fixtures reside under tests/fixtures and are registered in tests/conftest.py. They can also be listed by invoking pytest --fixtures. Any fixtures that are not imported in conftest.py will not be detected.
Create a schema on a Postgres server:
psql --host=npg_porch_db --port=$PORT --username=npg_admin --password -d postgres
CREATE SCHEMA npg_porch;
SET search_path = npg_porch, public;
GRANT USAGE ON SCHEMA npg_porch TO npgtest_ro, npgtest_rw;
The SET command makes the new schema visible, for the current session only, to the \d* commands you might use in psql. Then run a script that deploys the ORM to this schema:
DB=npg_porch
export DB_URL=postgresql+psycopg2://npg_admin:$PASS@npg_porch_db:$PORT/$DB
# note that the script requires a regular PG driver, not the async version shown above
src/deploy_schema.py
psql --host=npg_porch_db --port=$PORT --username=npg_admin --password -d $DB
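The server and the deployment script use the same kind of connection URL with different drivers: asyncpg for the server, psycopg2 for the script. A small helper can derive one from the other. The function name, the placeholder password and the port below are hypothetical; only the two URL prefixes come from the commands in this document.

```python
ASYNC_PREFIX = "postgresql+asyncpg://"
SYNC_PREFIX = "postgresql+psycopg2://"


def to_sync_url(url: str) -> str:
    """Swap the asyncpg driver for psycopg2, leaving the rest of the URL untouched."""
    if url.startswith(SYNC_PREFIX):
        return url  # already uses the regular driver
    if not url.startswith(ASYNC_PREFIX):
        raise ValueError("expected a postgresql+asyncpg:// or postgresql+psycopg2:// URL")
    return SYNC_PREFIX + url[len(ASYNC_PREFIX):]


# Placeholder credentials and port, for illustration only.
server_url = "postgresql+asyncpg://npg_admin:secret@npg_porch_db:5432/npg_porch"
deploy_url = to_sync_url(server_url)
```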
The read-write and read-only users (npgtest_rw and npgtest_ro in the statements below) must be granted permissions on the newly created schema:
GRANT USAGE ON ALL SEQUENCES IN SCHEMA npg_porch TO npgtest_rw;
GRANT SELECT ON ALL TABLES IN SCHEMA npg_porch TO npgtest_ro;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA npg_porch TO npgtest_rw;
Note that granting usage on sequences is required to allow autoincrement columns to work during an insert. This is a quirk of newer Postgres versions.
Until token support is implemented, a row will need to be inserted manually into the token table. Otherwise none of the event logging works.