INSTALL / BUILD instructions for Apache Airflow
Basic installation of Airflow from sources and development environment setup
============================================================================
This is a generic installation method that requires a minimum of standard tools to develop Airflow and
test it in a local virtual environment (using a standard CPython installation and `pip`).
Depending on your system, you might need different prerequisites, but the following
systems/prerequisites are known to work:
Linux (Debian Bookworm):
sudo apt install -y --no-install-recommends apt-transport-https apt-utils ca-certificates \
curl dumb-init freetds-bin krb5-user libgeos-dev \
ldap-utils libsasl2-2 libsasl2-modules libxmlsec1 locales libffi8 libldap-2.5-0 libssl3 netcat-openbsd \
lsb-release openssh-client python3-selinux rsync sasl2-bin sqlite3 sudo unixodbc
You might need to install MariaDB development headers to build some of the dependencies:
sudo apt-get install libmariadb-dev libmariadbclient-dev
On macOS (Mojave/Catalina), you might need to install the XCode command line tools, Homebrew (`brew`), and these packages:
brew install sqlite mysql postgresql
`pip` is one of the packaging front-ends that can be used to install Airflow. It is the one
we recommend (see below) for reproducible installation of specific versions of Airflow.
As of version 2.8, Airflow follows PEP 517/518 and uses the `pyproject.toml` file to define build dependencies
and the build process, and it requires relatively modern versions of packaging tools to build Airflow from
local sources or sdist packages, as PEP 517 compliant build hooks are used to determine dynamic build
dependencies. In the case of `pip`, this means that at least version 22.1.0 is needed (released at the beginning of
2022) to build or install Airflow from sources. This does not affect the ability to install Airflow from
released wheel packages.
Downloading and installing Airflow from sources
-----------------------------------------------
While you can get Airflow sources in various ways (including cloning https://github.com/apache/airflow/), the
canonical way to download it is to fetch the tarball (published at https://downloads.apache.org), after
verifying the checksum and signatures of the downloaded file.
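For example, a typical verification flow looks like the sketch below. The release version and file names
are illustrative - substitute the release you actually downloaded:
curl -O https://downloads.apache.org/airflow/KEYS
gpg --import KEYS
gpg --verify apache-airflow-2.10.2-source.tar.gz.asc apache-airflow-2.10.2-source.tar.gz
shasum -a 512 apache-airflow-2.10.2-source.tar.gz
cat apache-airflow-2.10.2-source.tar.gz.sha512
Compare the computed SHA512 checksum with the published one before unpacking the sources.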
When you download source packages from https://downloads.apache.org, you download sources of Airflow and
all providers separately. However, when you clone the GitHub repository at https://github.com/apache/airflow/
you get all sources in one place. This is the most convenient way to develop Airflow and Providers together.
Otherwise, you have to install Airflow and Providers separately from sources in the same environment, which
is not as convenient.
Using ``uv`` to manage your Python, virtualenvs, and install airflow for development (recommended)
==================================================================================================
While you can manually install Airflow locally from sources, Airflow committers recommend using
``uv`` (https://docs.astral.sh/uv/) as a build and development tool. It is a modern, recently
introduced, popular packaging front-end and environment manager for Python.
It is an optional tool that is only really needed when you want to build packages from sources; you can use
many other packaging front-ends (for example ``hatch``), but ``uv`` is very fast and also convenient for
managing your Python versions and virtualenvs.
Installing ``uv``
-----------------
You can install uv following the instructions: https://docs.astral.sh/uv/getting-started/installation/
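For example, on Linux and macOS you can use the standalone installer (see the linked page for other
installation methods such as Homebrew or pipx):
curl -LsSf https://astral.sh/uv/install.sh | sh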
Using ``uv`` to manage your project dependencies
------------------------------------------------
You can sync to the latest versions of the Airflow dependencies using:
uv sync
This will download and install an appropriate Python version, create a virtual environment in the `.venv`
folder, and install all dependencies needed to run tests for Airflow and import providers.
This is the recommended way to install Airflow for development. You can repeat the `uv sync` command at any time
to synchronize your environment with the latest dependencies.
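For example, a typical refresh after updating your checkout looks like this:
git pull --rebase
uv sync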
You can also synchronize all packages, including development dependencies of some providers,
by running the following command:
uv sync --all-packages
With `uv` you can also install tools that are needed for other tasks described later: breeze,
pre-commit, hatch, cherry-picker, etc. It is highly recommended to install breeze and pre-commit; hatch
and flit are useful for building packages (so they might be useful for release managers), and cherry-picker is
useful for backporting changes to previous versions of Airflow.
uv tool install -e ./dev/breeze
uv tool install pre-commit --with pre-commit-uv
uv tool install hatch
uv tool install flit
uv tool install cherry-picker
Those are all tools useful for Airflow development.
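You can verify which tools (and versions) are installed with:
uv tool list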
It is recommended to run `pre-commit install` after installing `pre-commit` to install the git hooks - they
will take care of running the Airflow pre-commit checks automatically.
pre-commit install
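You can also run the checks manually at any time, for example against all files:
pre-commit run --all-files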
You can run any command in the virtual environment created by `uv` by prefixing it with `uv run`:
uv run pytest
uv run airflow standalone
This will automatically synchronize your environment with the latest dependencies needed.
Compiling front-end assets
--------------------------
In order to see the UI in Airflow, you need to compile the front-end assets first.
If you have already installed `breeze` and `pre-commit`, you can build the assets with
the following commands:
pre-commit run --hook-stage manual compile-ui-assets --all-files
or simply:
breeze compile-ui-assets
Both commands will install node and pnpm under the hood, into a dedicated pre-commit
node environment, and then build the assets.
If you want to build the assets manually, check the required node and pnpm versions in the
`.pre-commit-config.yaml` file and run:
pnpm install --frozen-lockfile --config.confirmModulesPurge=false
pnpm run build
Finally, you can also clean and recompile assets with the `custom` build target when running the Hatch build:
hatch build -t custom -t wheel -t sdist
This will also update the `git_version` file in the Airflow package, which should contain the git commit hash of the
build. It is used to display the commit hash in the UI. The result of this command is an Airflow package built
in the `dist` folder.
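You can then inspect the built artifacts and, if you wish, install the wheel (the file name below is a
glob pattern, not a literal name):
ls dist/
pip install dist/apache_airflow-*.whl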
Using pip and manually managing your virtualenv
===============================================
While `uv` manages dependencies and the venv automatically, you might want to manage both manually with
pip and virtualenv. You need to have Python installed in your preferred way for that to work. It is also
much slower than with `uv`, and you need to manage your environment manually.
Creating virtualenv
-------------------
Airflow pulls in quite a lot of dependencies to connect to other services. You generally want to
test or run Airflow from a virtualenv to ensure those dependencies are separated from your system-wide
versions. Using the system-installed Python is strongly discouraged, as the versions of Python
shipped with the operating system often have some limitations and are not up to date. It is recommended
to install Python using the official releases (https://www.python.org/downloads/) or Python project
management tools such as Hatch. See below for a description of `Hatch`, Airflow's tool of choice for
building Airflow packages.
Once you have a suitable Python version installed, you can create a virtualenv and activate it:
python3 -m venv PATH_TO_YOUR_VENV
source PATH_TO_YOUR_VENV/bin/activate
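Since building Airflow from sources requires at least pip 22.1.0 (see above), it is a good idea to
upgrade pip inside the freshly created virtualenv first:
python -m pip install --upgrade pip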
Installing Airflow locally with pip
-----------------------------------
Installing Airflow locally can be done using pip - note that this will install the "development" version of
Airflow, with no providers installed.
pip install .
If you develop Airflow and iterate on it, you should install it in editable mode (with the -e flag); then
you do not need to re-install it after each change to the sources.
pip install -e .
Note that for development you also need to install `devel-common` in editable mode.
This distribution contains the set of tools and dependencies needed to run unit tests and perform all
development tasks.
pip install -e "./devel-common"
If you want to develop a provider, you need to install it in editable mode from the "providers" folder. For
example, this will install the Amazon provider in editable mode:
pip install -e "./providers/amazon"
However, with pip, in order to run tests for some providers you also (for now) need to install optional
extras. For example, with a local installation you can install all prerequisites for the Google provider,
tests, and all Hadoop providers with this command:
pip install -e ".[google, hadoop]"
This is not needed with `uv`, and in the near future, once `pip` supports https://peps.python.org/pep-0735/,
this installation will be simplified.
You can also install ALL possible dependencies of Airflow (in editable mode only) with:
pip install -e ".[all]"
This is usually not recommended (it takes a lot of time to resolve and install all dependencies), but
if you want to replicate the CI environment where "all" extras are installed, you can do it this way.
You can also install ALL possible core dependencies of Airflow (in editable mode only) with:
pip install -e ".[all-core]"
Building airflow packages with Hatch
====================================
While building packages will work with any compliant packaging front-end tool, for reproducibility we
recommend using ``hatch``. It is a modern, fast, and convenient tool, maintained by the Python Packaging
Authority, for building packages from sources. It is also used by Airflow to build packages in CI/CD, as
well as by release managers to build packages locally to verify the reproducibility of the build.
Installing ``hatch``
--------------------
More information about hatch can be found at https://hatch.pypa.io/
We recommend installing ``hatch`` using the ``uv tool`` command, which will make hatch available as a CLI
command globally:
uv tool install hatch
You can still install ``hatch`` using ``pipx`` if you prefer:
pipx install hatch
It's important to keep your hatch up to date. You can do this by running:
uv tool upgrade hatch
Using Hatch to build your packages
----------------------------------
You can use Hatch to build installable packages from the Airflow sources. Such a package will
include all the metadata configured in `pyproject.toml` and will be installable with ``pip`` and any other
PEP-compliant packaging front-end.
The packages will have pre-installed dependencies for providers that are available when Airflow is
installed from PyPI. Both `wheel` and `sdist` packages are built by default.
hatch build
You can also build only `wheel` or `sdist` packages:
hatch build -t wheel
hatch build -t sdist
Installing recommended version of dependencies
==============================================
Whatever virtualenv solution you use, when you want to make sure you are using the same
versions of dependencies as in main, you can install the recommended versions of the dependencies by using
the constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt files as the `constraint` file. This might be useful
to avoid the "works-for-me" syndrome, where you use different versions of dependencies than the ones
used in main CI tests and by other contributors.
There are different constraint files for different Python versions. For example, this command will install
all basic devel requirements and the requirements of the Google provider, as last successfully tested for Python 3.9:
uv pip install -e ".[devel,google]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.9.txt"
Using the 'constraints-no-providers' constraint files, you can upgrade Airflow without paying attention to the providers' dependencies. These constraint files pin only the dependencies of Airflow core, so already-installed provider dependencies are kept while the latest supported core dependencies are installed.
uv pip install -e ".[devel]" \
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-no-providers-3.9.txt"
Note that you can also use `pip install` if you do not use `uv`.
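For example, the plain `pip` equivalent of the first command above is:
pip install -e ".[devel,google]" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-main/constraints-3.9.txt"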
Airflow extras
==============
Airflow has several extras that you can install to get additional dependencies. They sometimes install
providers, and sometimes enable other features whose packages are not installed by default.
You can read more about those extras in the extras reference:
https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html
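For example, to install Airflow from local sources with the `celery` provider extra and the `kerberos`
core extra (both appear in the lists below):
pip install -e ".[celery,kerberos]"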
The list of available extras is below.
Core extras
-----------
These extras are available as regular core Airflow extras; they install optional features of Airflow.
# START CORE EXTRAS HERE
aiobotocore, apache-atlas, apache-webhdfs, async, cgroups, cloudpickle, github-enterprise,
google-auth, graphviz, kerberos, ldap, leveldb, otel, pandas, password, rabbitmq, s3fs, sentry, statsd, uv
# END CORE EXTRAS HERE
Provider extras
---------------
Those extras are available as regular Airflow extras; they install provider packages in standard builds
or dependencies that are necessary to enable the feature in an editable build.
# START PROVIDER EXTRAS HERE
airbyte, alibaba, amazon, apache.beam, apache.cassandra, apache.drill, apache.druid, apache.flink,
apache.hdfs, apache.hive, apache.iceberg, apache.impala, apache.kafka, apache.kylin, apache.livy,
apache.pig, apache.pinot, apache.spark, apprise, arangodb, asana, atlassian.jira, celery, cloudant,
cncf.kubernetes, cohere, common.compat, common.io, common.messaging, common.sql, databricks,
datadog, dbt.cloud, dingding, discord, docker, edge, elasticsearch, exasol, fab, facebook, ftp,
github, google, grpc, hashicorp, http, imap, influxdb, jdbc, jenkins, microsoft.azure,
microsoft.mssql, microsoft.psrp, microsoft.winrm, mongo, mysql, neo4j, odbc, openai, openfaas,
openlineage, opensearch, opsgenie, oracle, pagerduty, papermill, pgvector, pinecone, postgres,
presto, qdrant, redis, salesforce, samba, segment, sendgrid, sftp, singularity, slack, smtp,
snowflake, sqlite, ssh, standard, tableau, telegram, teradata, trino, vertica, weaviate, yandex,
ydb, zendesk
# END PROVIDER EXTRAS HERE
Doc extras
----------
Doc extras are used to install dependencies that are needed to build documentation. They are only
available during an editable install.
# START DOC EXTRAS HERE
doc, doc-gen
# END DOC EXTRAS HERE