-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pre-compiled Python from official Docker image #7934
Use pre-compiled Python from official Docker image #7934
Conversation
6429875
to
4d45fd0
Compare
This seems like a better way forward than #7928 for the reasons explained here: In particular, I was surprised by the 15s speedup in build times compared to that. |
4d45fd0
to
b97b5dd
Compare
## 3.8 | ||
# Docker doesn't support parametrizing `COPY --from:python:$PY_1_23-bookworm`, so work around it using an alias. | ||
# TODO: If upstream adds support for Ubuntu, use that instead of Debian as the base suffix: https://github.com/docker-library/python/pull/791 | ||
FROM python:$PY_3_8-bookworm as upstream-python-3.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phillipuniverse will be happy to see that this is fully parametrized so it doesn't have to be updated when we bump to latest python patch releases. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!! I am indeed happy to see this!
Also, good notes overall on optimizing the Docker build with pyenv. I have struggled with long build times myself in other projects for the same reasons!
# `pip` and other scripts need their shebangs rewritten for the new location | ||
RUN find $PYTHON_INSTALL_LOCATION/bin -type f -exec sed -i "1s|^#!/usr/local/bin/python|#!$PYTHON_INSTALL_LOCATION/bin/python|" {} + | ||
# Ensure pyenv works and it's the python version we expect | ||
RUN PYENV_VERSION=$PY_3_8 pyenv exec python --version | grep "Python $PY_3_8" || exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this command fails and returns exit 1
? Does the whole Docker build fail? and would we be able to notice that it's failing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes!
IMO we should be doing this test on main
regardless of whether we go with this approach or not, as a safeguard against copy/pasting a change from one version to another and forgetting to update the version number... although I suppose in that case having it parametrized kinda defeats the purpose 😁.
But I figured this guards against at least some breakage, or if the upstream python docker image or pyenv
changed behavior / file structure in some way...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a safeguard against copy/pasting a change from one version to another and forgetting to update the version number... although I suppose in that case having it parametrized kinda defeats the purpose 😁.
Agreed, this is a good idea. It would also reveal if pyenv changes where/how it recognizes valid python installs
But I figured this guards against at least some breakage, or if the upstream python docker image or pyenv changed behavior / file structure in some way...
Yeah I think this method of installing python packages could be seen as temperamental if upstream changes their behavior (especially with swapping out the shebang path), but we can always revert to using pyenv if we need to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wasn't super excited about needing to mung the shebang path, and for compiling ourselves we could even tune the compiler for our use cases, but since our use case is mostly CI / ephemeral jobs that run just a few lines of Python, rather than a long-lived service, it makes a lot more sense to trade faster build times for slower runtimes since it's likely a few ms
runtime improvement at most...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this approach of using the official docker images. The faster CI build step is a huge win 🏅
Also, merging this PR would mean #7891 is no longer necessary.
Your comments helped a lot in understanding the changes, thanks! You may want to open this as a separate issue though
TODO: Now that switched from
pyenv install
which compiled from source to downloading / copying a pre-compiled python we could entirely drop pyenv if we change our ruby code that callspyenv exec
to track which version of python to call and uses the full python paths.
Our Python docker builds are slow... a typical CI test run for Python takes ~21 minutes, of which the first ~10 minutes are spent building the image. This results in slower local development, slow CI test suites, and slow deployments. The main culprit is `pyenv install` which under the covers downloads the python source and then compiles it locally. Profiling showed that the download was quick, so even though `pyenv` supports `aria2c`, there's not much to be gained there. Unfortunately, a quick look at the `pyenv` issue tracker showed [there's no way to to pass pre-compiled artifacts to `pyenv`](https://github.com/orgs/pyenv/discussions/1872). For a long time we've bandied about the idea of switching from `pyenv` to downloading pre-compiled Pythons.However, we use `pyenv local` + `pyenv exec` throughout our Ruby code for switching to different Python versions. So we thought that it'd take a week or more to fully migrate away from `pyenv`. Today I had to rebuild the python image multiple times, and got so annoyed that I decided to poke at it a bit. It turns out that `pyenv` is simply a shim layer, and as long as `/usr/local/.pyenv/versions/<x.y.z>/bin` exists, it will happily pass commands to anything in that folder. So I was able to come up with an intermediate solution that speeds the builds up drastically without requiring a large code refactor. Running this locally results in the Python download/install/build step going from ~500 seconds all the way down to ~33 seconds, a savings of nearly 8 minutes. Given that a full CI run of the python test suite previously took ~21 minutes, this cuts it by 1/3.
b97b5dd
to
c2c420d
Compare
Our Python docker builds are slow... a typical CI test run for Python takes ~21 minutes, of which the first ~10 minutes are spent building the image.
This results in slower local development, slow CI test suites, and slow deployments.
The main culprit is
pyenv install
which under the covers downloads the python source and then compiles it locally. Profiling showed that the download was quick, so even thoughpyenv
supportsaria2c
, there's not much to be gained there. Unfortunately, a quick look at thepyenv
issue tracker showed there's no way to to pass pre-compiled artifacts topyenv
.For a long time we've bandied about the idea of switching from
pyenv
to downloading pre-compiled Pythons.However, we usepyenv local
+pyenv exec
throughout our Ruby code for switching to different Python versions. So we thought that it'd take a week or more to fully migrate away frompyenv
.Today I had to rebuild the python image multiple times, and got so annoyed that I decided to poke at it a bit.
It turns out that
pyenv
is simply a shim layer, and as long as/usr/local/.pyenv/versions/<x.y.z>/bin
exists, it will happily pass commands to anything in that folder.So I was able to come up with an intermediate solution that speeds the builds up drastically without requiring a large code refactor.
Running this locally results in the Python download/install/build step going from ~500 seconds all the way down to ~33 seconds, a savings of nearly 8 minutes. Given that a full CI run of the python test suite previously took ~21 minutes, this cuts it by 1/3.
Related but pulling the pre-compiled Python from a different source: