This is a proof of concept of using `[external]` metadata - i.e., metadata describing the build and runtime dependencies of Python packages on non-Python packages, see PEP 725 - plus a "name mapping mechanism" to build wheels from source in clean Docker containers with a plain:

```
pip install <package-name> --no-binary <package-name>
```
The purpose of the name mapping mechanism is to translate `[external]` metadata, which uses Package URLs (PURLs) plus PURL-like "virtual dependencies" for more abstract requirements like "a C++ compiler", into package names specific to each system package manager.
The CLI interface to the name mapping mechanism is the `py-show` tool. It can also show install commands specific to the system package manager, which is potentially useful for end users.
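For instance, for a package whose `[external]` metadata lists `virtual:compiler/c` and `pkg:generic/openssl`, the install command generated for Fedora would be something along the lines of the following (illustrative only, not verbatim `py-show` output):

```sh
dnf install -y gcc openssl-devel
```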
Note: all of this is currently experimental, and under the hood doesn't look anything like a production-ready version would. Please don't use this for anything beyond experimenting.
The scripts, CI setup and results in the repo basically do the following:
- Determine which of the top 150 most downloaded packages (current monthly downloads, data from hugovk/top-pypi-packages) have platform-specific wheels on PyPI.
- For each such package, determine its external dependencies and write those into a `package_name.toml` file.
- In a matrix'ed set of CI jobs, build each package separately from source in a clean Docker container, with the external dependencies being installed with a "system" package manager. This is currently done for three package managers and distros: `dnf` (Fedora), `pacman` (Arch Linux), and `micromamba` (conda-forge). The CI jobs do roughly the following (see the shell sketch after this list):
  - Spin up a clean Docker container for the base OS
  - Install `python` with the system package manager
  - Download the sdist for the latest release of the package being built from PyPI
  - Patch the sdist to append the `[external]` metadata at the end of `pyproject.toml` (for packages without a `pyproject.toml`, inject a basic 3-line one to enable `setuptools.build_meta` as the build backend)
  - Use the `py-show` tool to read the `[external]` metadata and generate an install command for the system package manager from that.
  - Invoke the package manager to install the external dependencies.
  - Build the package with `pip install amended_sdist.tar.gz` (no custom config-settings, environment variables or other tweaks allowed).
  - If the build succeeds, do a basic `import pkg_import_name` check.
- Analyze the results: whether each package built successfully, build duration, and the external dependencies used.
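For a rough idea of what one such CI job does, here is a minimal shell sketch of a single Fedora build. Package names and paths are placeholders, and the exact `py-show` invocation is omitted; the real CI scripts in this repo differ in detail:

```sh
pkg=some-package                                    # placeholder PyPI package name

dnf install -y python3 python3-pip                  # install Python with the system package manager

python3 -m pip download --no-deps --no-binary :all: "$pkg"   # fetch the latest sdist from PyPI
tar xzf "$pkg"-*.tar.gz

# Append this package's [external] metadata to its pyproject.toml and repack the sdist.
cat external-metadata/"$pkg".toml >> "$pkg"-*/pyproject.toml
tar czf amended_sdist.tar.gz "$pkg"-*/

# py-show turns the [external] metadata into a `dnf install ...` command;
# running that command installs the external (non-Python) dependencies.

python3 -m pip install ./amended_sdist.tar.gz       # build from source; no config-settings or env tweaks
python3 -c "import some_package"                    # basic import check (import name may differ from the PyPI name)
```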
These are the main results as of 19 Oct 2023.
Overall number of successful builds per distro:
| distro | success |
|---|---|
| Arch | 35/37 |
| Fedora | 33/37 |
| conda-forge | 33/37 |
Average CI job duration per package for the heaviest builds:
| package | duration |
|---|---|
| scipy | 13m 39s |
| scikit-learn | 13m 5s |
| grpcio-tools | 9m 17s |
| pandas | 7m 55s |
| pyarrow | 5m 44s |
| numpy | 5m 16s |
| pynacl | 4m 20s |
| pydantic-core | 4m 6s |
| matplotlib | 3m 41s |
| cryptography | 2m 26s |
| pillow | 1m 56s |
| sqlalchemy | 1m 41s |
Per-package success/failure:
| package | Fedora | Arch | conda-forge |
|---|---|---|---|
| charset-normalizer | ✔️ | ✔️ | ✔️ |
| cryptography | ✔️ | ✔️ | ✔️ |
| pyyaml | ✔️ | ✔️ | ✔️ |
| numpy | ✔️ | ✔️ | ✔️ |
| protobuf | ✔️ | ✔️ | ✔️ |
| pandas | ✔️ | ✔️ | ✔️ |
| markupsafe | ✔️ | ✔️ | ✔️ |
| cffi | ✔️ | ✔️ | ✔️ |
| psutil | ✔️ | ✔️ | ✔️ |
| lxml | ❌ | ❌ | ❌ |
| sqlalchemy | ✔️ | ✔️ | ✔️ |
| aiohttp | ❌ | ✔️ | ❌ |
| grpcio | ❌ | ❌ | ❌ |
| pyarrow | ✔️ | ✔️ | ✔️ |
| wrapt | ✔️ | ✔️ | ✔️ |
| frozenlist | ✔️ | ✔️ | ✔️ |
| coverage | ✔️ | ✔️ | ✔️ |
| pillow | ✔️ | ✔️ | ✔️ |
| greenlet | ✔️ | ✔️ | ✔️ |
| yarl | ✔️ | ✔️ | ✔️ |
| multidict | ✔️ | ✔️ | ✔️ |
| scipy | ❌ | ✔️ | ✔️ |
| httptools | ✔️ | ✔️ | ✔️ |
| pynacl | ✔️ | ✔️ | ✔️ |
| psycopg2-binary | ✔️ | ✔️ | ✔️ |
| rpds-py | ✔️ | ✔️ | ✔️ |
| bcrypt | ✔️ | ✔️ | ✔️ |
| scikit-learn | ✔️ | ✔️ | ✔️ |
| msgpack | ✔️ | ✔️ | ✔️ |
| matplotlib | ✔️ | ✔️ | ❌ |
| regex | ✔️ | ✔️ | ✔️ |
| kiwisolver | ✔️ | ✔️ | ✔️ |
| pydantic-core | ✔️ | ✔️ | ✔️ |
| pyrsistent | ✔️ | ✔️ | ✔️ |
| grpcio-tools | ✔️ | ✔️ | ✔️ |
| pycryptodomex | ✔️ | ✔️ | ✔️ |
| google-crc32c | ✔️ | ✔️ | ✔️ |