forked from apache/airflow
-
Notifications
You must be signed in to change notification settings - Fork 0
/
INSTALL
304 lines (199 loc) · 15.7 KB
/
INSTALL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
# INSTALL / BUILD instructions for Apache Airflow
## Basic installation of Airflow from sources and development environment setup
This is a generic installation method that requires minimum starndard tools to develop airflow and
test it in local virtual environment (using standard CPyhon installation and `pip`).
Depending on your system you might need different prerequisites, but the following
systems/prerequisites are known to work:
Linux (Debian Bookworm):
sudo apt install -y --no-install-recommends apt-transport-https apt-utils ca-certificates \
curl dumb-init freetds-bin gosu krb5-user libgeos-dev \
ldap-utils libsasl2-2 libsasl2-modules libxmlsec1 locales libffi8 libldap-2.5-0 libssl3 netcat-openbsd \
lsb-release openssh-client python3-selinux rsync sasl2-bin sqlite3 sudo unixodbc
You might need to install MariaDB development headers to build some of the dependencies
sudo apt-get install libmariadb-dev libmariadbclient-dev
MacOS (Mojave/Catalina) you might need to to install XCode command line tools and brew and those packages:
brew install sqlite mysql postgresql
## Downloading and installing Airflow from sources
While you can get Airflow sources in various ways (including cloning https://github.com/apache/airflow/), the
canonical way to download it is to fetch the tarball published at https://downloads.apache.org where you can
also verify checksum, signatures of the downloaded file. You can then and un-tar the source move into the
directory that was un-tarred.
When you download source packages from https://downloads.apache.org, you download sources of Airflow and
all providers separately, however when you clone the GitHub repository at https://github.com/apache/airflow/
you get all sources in one place. This is the most convenient way to develop Airflow and Providers together.
otherwise you have to separately install Airflow and Providers from sources in the same environment, which
is not as convenient.
## Creating virtualenv
Airflow pulls in quite a lot of dependencies in order to connect to other services. You generally want to
test or run Airflow from a virtual env to make sure those dependencies are separated from your system
wide versions. Using system-installed Python installation is strongly discouraged as the versions of Python
shipped with operating system often have a number of limitations and are not up to date. It is recommended
to install Python using either https://www.python.org/downloads/ or other tools that use them. See later
for description of `Hatch` as one of the tools that is Airflow's tool of choice to build Airflow packages.
Once you have a suitable Python version installed, you can create a virtualenv and activate it:
python3 -m venv PATH_TO_YOUR_VENV
source PATH_TO_YOUR_VENV/bin/activate
## Installing airflow locally
Installing airflow locally can be done using pip - note that this will install "development" version of
Airflow, where all providers are installed from local sources (if they are available), not from `pypi`.
It will also not include pre-installed providers installed from PyPI. In case you install from sources of
just Airflow, you need to install separately each provider that you want to develop. In case you install
from GitHub repository, all the current providers are available after installing Airflow.
pip install .
If you develop Airflow and iterate on it you should install it in editable mode (with -e) flag and then
you do not need to re-install it after each change to sources. This is useful if you want to develop and
iterate on Airflow and Providers (together) if you install sources from cloned GitHub repository.
Note that you might want to install `devel` extra when you install airflow for development in editable env
as this one contains minimum set of tools and dependencies that are needed to run unit tests.
pip install -e ".[devel]"
You can also install optional packages that are needed to run certain tests. In case of local installation
for example you can install all prerequisites for google provider, tests and
all hadoop providers with this command:
pip install -e ".[google,devel-tests,devel-hadoop]"
or you can install all packages needed to run tests for core, providers and all extensions of airflow:
pip install -e ".[devel-all]"
You can see the list of all available extras below.
# Using Hatch to manage your Python, virtualenvs and build packages
Airflow uses [hatch](https://hatch.pypa.io/) as a build and development tool of choice. It is one of popular
build tools and environment managers for Python, maintained by the Python Packaging Authority.
It is an optional tool that is only really needed when you want to build packages from sources, but
it is also very convenient to manage your Python versions and virtualenvs.
Airflow project contains some pre-defined virtualenv definitions in ``pyproject.toml`` that can be
easily used by hatch to create your local venvs. This is not necessary for you to develop and test
Airflow, but it is a convenient way to manage your local Python versions and virtualenvs.
## Installing Hatch
You can install hat using various other ways (including Gui installers).
Example using `pipx`:
pipx install hatch
We recommend using `pipx` as you can manage installed Python apps easily and later use it
to upgrade `hatch` easily as needed with:
pipx upgrade hatch
## Using Hatch to manage your Python versions
You can also use hatch to install and manage airflow virtualenvs and development
environments. For example, you can install Python 3.10 with this command:
hatch python install 3.10
or install all Python versions that are used in Airflow:
hatch python install all
## Using Hatch to manage your virtualenvs
Airflow has some pre-defined virtualenvs that you can use to develop and test airflow.
You can see the list of available envs with:
hatch show env
This is what it shows currently:
┏━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ Type ┃ Features ┃ Description ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ default │ virtual │ devel │ Default environment with Python 3.8 for maximum compatibility │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-38 │ virtual │ │ Environment with Python 3.8. No devel installed. │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-39 │ virtual │ │ Environment with Python 3.9. No devel installed. │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-310 │ virtual │ │ Environment with Python 3.10. No devel installed. │
├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
│ airflow-311 │ virtual │ │ Environment with Python 3.11. No devel installed │
└─────────────┴─────────┴──────────┴───────────────────────────────────────────────────────────────┘
The default env (if you have not used one explicitly) is `default` and it is a Python 3.8
virtualenv for maximum compatibility with `devel` extra installed - this devel extra contains the minimum set
of dependencies and tools that should be used during unit testing of core Airflow and running all `airflow`
CLI commands - without support for providers or databases.
The other environments are just bare-bones Python virtualenvs with Airflow core requirements only,
without any extras installed and without any tools. They are much faster to create than the default
environment, and you can manually install either appropriate extras or directly tools that you need for
testing or development.
hatch env create
You can create specific environment by using them in create command:
hatch env create airflow-310
You can install extras in the environment by running pip command:
hatch -e airflow-310 run -- pip install -e ".[devel,google]"
And you can enter the environment with running a shell of your choice (for example zsh) where you
can run any commands
hatch -e airflow-310 shell
Once you are in the environment (indicated usually by updated prompt), you can just install
extra dependencies you need:
[~/airflow] [airflow-310] pip install -e ".[devel,google]"
You can exit the environment by just exiting the shell.
You can also see where hatch created the virtualenvs and use it in your IDE or activate it manually:
hatch env find airflow-310
You will get path similar to:
/Users/jarek/Library/Application Support/hatch/env/virtual/apache-airflow/TReRdyYt/apache-airflow
Then you will find `python` binary and `activate` script in the `bin` sub-folder of this directory and
you can configure your IDE to use this python virtualenv if you want to use that environment in your IDE.
You can also set default environment name by HATCH_ENV environment variable.
You can clean the env by running:
hatch env prune
More information about hatch can be found in https://hatch.pypa.io/1.9/environment/
## Using Hatch to build your packages
You can use hatch to build installable package from the airflow sources. Such package will
include all metadata that is configured in `pyproject.toml` and will be installable with pip.
The packages will have pre-installed dependencies for providers that are always
installed when Airflow is installed from PyPI. By default both `wheel` and `sdist` packages are built.
hatch build
You can also build only `wheel` or `sdist` packages:
hatch build -t wheel
hatch build -t sdist
## Installing recommended version of dependencies
Whatever virtualenv solution you use, when you want to make sure you are using the same
version of dependencies as in main, you can install recommended version of the dependencies by using
constraint-python<PYTHON_MAJOR_MINOR_VERSION>.txt files as `constraint` file. This might be useful
to avoid "works-for-me" syndrome, where you use different version of dependencies than the ones
that are used in main, CI tests and by other contributors.
There are different constraint files for different python versions. For example this command will install
all basic devel requirements and requirements of google provider as last successfully tested for Python 3.8:
pip install -e ".[devel,google]"" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.8.txt"
You can upgrade just airflow, without paying attention to provider's dependencies by using
the 'constraints-no-providers' constraint files. This allows you to keep installed provider dependencies
and install to latest supported ones by pure airflow core.
pip install -e ".[devel]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.8.txt"
## All airflow extras
Airflow has a number of extras that you can install to get additional dependencies. They sometimes install
providers, sometimes enable other features where packages are not installed by default.
You can read more about those extras in the extras reference:
https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html
The list of available extras is below.
Regular extras that are available for users in the Airflow package.
# START REGULAR EXTRAS HERE
aiobotocore, airbyte, alibaba, all, all-core, all-dbs, amazon, apache-atlas, apache-beam, apache-
cassandra, apache-drill, apache-druid, apache-flink, apache-hdfs, apache-hive, apache-impala,
apache-kafka, apache-kylin, apache-livy, apache-pig, apache-pinot, apache-spark, apache-webhdfs,
apprise, arangodb, asana, async, atlas, atlassian-jira, aws, azure, cassandra, celery, cgroups,
cloudant, cncf-kubernetes, cohere, common-io, common-sql, crypto, databricks, datadog, dbt-cloud,
deprecated-api, dingding, discord, docker, druid, elasticsearch, exasol, fab, facebook, ftp, gcp,
gcp_api, github, github-enterprise, google, google-auth, graphviz, grpc, hashicorp, hdfs, hive,
http, imap, influxdb, jdbc, jenkins, kerberos, kubernetes, ldap, leveldb, microsoft-azure,
microsoft-mssql, microsoft-psrp, microsoft-winrm, mongo, mssql, mysql, neo4j, odbc, openai,
openfaas, openlineage, opensearch, opsgenie, oracle, otel, pagerduty, pandas, papermill, password,
pgvector, pinecone, pinot, postgres, presto, rabbitmq, redis, s3, s3fs, salesforce, samba, saml,
segment, sendgrid, sentry, sftp, singularity, slack, smtp, snowflake, spark, sqlite, ssh, statsd,
tableau, tabular, telegram, trino, vertica, virtualenv, weaviate, webhdfs, winrm, yandex, zendesk
# END REGULAR EXTRAS HERE
Devel extras - used to install development-related tools. Only available during editable install.
# START DEVEL EXTRAS HERE
devel, devel-all, devel-all-dbs, devel-ci, devel-debuggers, devel-devscripts, devel-duckdb, devel-
hadoop, devel-mypy, devel-sentry, devel-static-checks, devel-tests
# END DEVEL EXTRAS HERE
Doc extras - used to install dependencies that are needed to build documentation. Only available during
editable install.
# START DOC EXTRAS HERE
doc, doc-gen
# END DOC EXTRAS HERE
## Compiling front end assets
Sometimes you can see that front-end assets are missing and website looks broken. This is because
you need to compile front-end assets. This is done automatically when you create a virtualenv
with hatch, but if you want to do it manually, you can do it after installing node and yarn and running:
yarn install --frozen-lockfile
yarn run build
Currently we are running yarn coming with note 18.6.0, but you should check the version in
our `.pre-commit-config.yaml` file (node version).
Installing yarn is described in https://classic.yarnpkg.com/en/docs/install
Also - in case you use `breeze` or have `pre-commit` installed you can build the assets with:
pre-commit run --hook-stage manual compile-www-assets --all-files
or
breeze compile-www-assets
Both commands will install node and yarn if needed to a dedicated pre-commit node environment and
then build the assets.
Finally you can also clean and recompile assets with ``custom`` build target when running hatch build
hatch build -t custom -t wheel -t sdist
This will also update `git_version` file in airflow package that should contain the git commit hash of the
build. This is used to display the commit hash in the UI.