# airflow-pex-example

Deploying Apache Airflow typically involves installing Airflow and your custom libraries in a virtualenv on the production host, along with your DAGs and other files (e.g., SQL). To simplify deployment, here we explore using pex.

Build an Airflow pex by running:

```
./pants binary src/python/example/airflow
```

This produces `airflow.pex`, a single file that is analogous to a statically-linked binary. It's a self-contained, runnable Airflow you can scp to another machine and run.
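For context, the `./pants binary` invocation above points at a `python_binary` target in `src/python/example/airflow/BUILD`. A minimal sketch of what such a target could look like, in Pants v1 BUILD syntax; the entry point and dependency names here are illustrative assumptions, not the repo's actual contents:

```python
# src/python/example/airflow/BUILD -- hypothetical sketch
python_binary(
    name='airflow',
    # Assumption: a small wrapper module that hands control to Airflow's CLI.
    entry_point='example.airflow.main',
    dependencies=[
        # Assumed 3rdparty requirements target pinning apache-airflow.
        '3rdparty/python:apache-airflow',
    ],
)
```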

You can then run Airflow commands as usual, using `dist/airflow.pex`.

```
$ ./dist/airflow.pex list_dags
[2018-04-13 17:34:39,885] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-04-13 17:34:39,924] {models.py:189} INFO - Filling up the DagBag from /home/travis/airflow/dags


-------------------------------------------------------------------
DAGS
-------------------------------------------------------------------
example_bash_operator
example_branch_dop_operator_v3
example_branch_operator
example_http_operator
example_passing_params_via_test_command
example_python_operator
example_short_circuit_operator
example_skip_dag
example_subdag_operator
example_subdag_operator.section-1
example_subdag_operator.section-2
example_trigger_controller_dag
example_trigger_target_dag
example_xcom
latest_only
latest_only_with_trigger
test_utils
tutorial
```

You can then scp or otherwise distribute this file to a production host.

## python_app example

When using Pants 1.7.0rc0 or later, which contains `python_app` support, we can build a self-contained binary along with DAGs in a single deployable artifact.

Note how `src/dags:analytics` contains a directory of DAGs, which may be useful if multiple teams require separate DAGs; alternatively, you can use just one DAG dir.
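As a rough sketch of how this might be wired up, `src/dags/BUILD` could pair the binary with the DAG files via a `python_app` target (Pants 1.7 syntax; the target and fileset names are inferred from the command and output below, so treat them as assumptions):

```python
# src/dags/BUILD -- hypothetical sketch
python_app(
    name='analytics',
    # Assumption: a python_binary named 'main', matching the main.pex
    # that appears in the bundle output below.
    binary='src/python/example/airflow:main',
    bundles=[
        # Ship the analytics DAG directory alongside the binary.
        bundle(fileset=globs('analytics/*.py')),
    ],
)
```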

```
$ ./pants bundle src/dags:analytics --bundle-py-archive=tgz
$ cd dist/src.dags.analytics-bundle/
$ find .
.
./main.pex
./analytics
./analytics/analytics_daily.pyc
./analytics/analytics_daily.py
$ AIRFLOW_HOME=$(pwd) AIRFLOW__CORE__DAGS_FOLDER=$(pwd)/analytics ./main.pex list_tasks analytics_daily
[2018-06-01 17:34:47,127] {__init__.py:45} INFO - Using executor SequentialExecutor
[2018-06-01 17:34:47,174] {models.py:189} INFO - Filling up the DagBag from /Users/travis/src/airflow-pex-example/dist/src.dags.analytics-bundle/analytics
print_date
```
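Based on the `analytics_daily` DAG and its single `print_date` task in the output above, the bundled DAG file presumably looks something like the following (Airflow 1.x API; a sketch, not the repo's actual file):

```python
# analytics/analytics_daily.py -- hypothetical sketch
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='analytics_daily',
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
)

# The single task visible in the list_tasks output above.
print_date = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)
```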

## Working environment

We show a working example of Airflow integrated into Pants for development. direnv is required to load `bin/airflow` and `bin/gunicorn` directly into the environment; this is explained in detail in `bin/README.md`. We use pyenv in the example, which is recommended but not required.

Install dependencies

```
brew install direnv pyenv openssl
```

Copy the environment file

```
cp .envrc.example .envrc && direnv allow
```

Build

```
make
```

View the help

```
airflow --help
```

List the available DAGs

```
airflow list_dags
```

Initialize the database

```
airflow initdb
```

Run the example workflow

```
airflow backfill analytics_daily -s 2018-01-01 -e 2018-01-01
```

List projects

```
change_project
```

Change project

```
change_project eng
```