Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add heroku/heroku:24 and heroku/heroku:24-build as multiarch images #245

Merged
merged 58 commits into from
Mar 1, 2024

Conversation

joshwlewis
Copy link
Member

@joshwlewis joshwlewis commented Jan 18, 2024

This PR will make heroku-24 available as a multi-arch image in Dockerhub. However, heroku-24 is still in development and not recommended for production workloads. heroku-24 will not be made available on the Heroku platform until a later date.

General Changes from heroku-22

  • heroku/heroku:24 and heroku/heroku:24-build are multi-arch images, supporting linux/amd64 and linux/arm64
  • Separate run user (heroku / 1001) and build user (heroku-build / 1002) for heightened security posture in CNB environments. Edit: This was reverted in Heroku-24: Use the same user for the run and build images #281.
  • Default user is no longer root. It is now 1001 or 1002 depending on the image, as described above. It is now heroku (with ID 1000).
  • No *cnb* variants (like heroku/heroku:22-cnb-build) in the heroku-24 line. heroku/heroku:24 and heroku/heroku:24-build are compatible with Cloud Native Buildpacks.

Package changes from heroku-22

We are dropping some of the less popular packages in an effort to reduce the image size. Significant demand could mean we reintroduce packages prior to release.

  • Updated several packages versions to versions available in ubuntu:24.04 / noble.
  • Dropped syslinux, which is not available for arm64. A collection of DOS bootloaders is unlikely to be used on the platform.
  • Dropped libgnutlsxx28, which is not available in noble. There isn't a clear drop-in replacement. However, it's dependencies are still included via other packages.
  • Dropped mercurial from the -build variant. Heroku's deployment experience is based on git, and most of the tools in our ecosystem are hosted via git. Even mercurial servers can serve git protocols these days.
  • Dropped bzr / brz from the -build variant. For the same reasons mentioned about mercurial.
  • Dropped python3-dev from the -build variant.
  • Dropped ghostscript (and it's dependencies/recommends libgs10, libgs10-dev, and gsfonts). Many languages have native PDF library implementations and/or use other tools like wkhtmltopdf, webkit, convert, and/or pdfkit for their PDF needs. ghostscript isn't as ubiquitous as it once was, and has seen a number of vulnerabilities in recent years too.
  • Dropped language-pack-en. This generates a locale data for around 20 english variants (like en_AG UTF-8,
    en_NZ.UTF-8 UTF-8,en_ZW.UTF-8 UTF-8, etc.). en_US.UTF-8 is now made available via locale-gen instead. The default locale remains unchanged.

Build and Publish Changes

Docker Desktop can build, store, and retrieve multi-arch images locally. To do so, either enable the containerd image store in Docker Desktop settings and/or enable the docker-container driver with docker buildx create --use.

Linux based docker can build multi-arch images as long as the docker-container driver is enabled with docker buildx create --use. It does not have the ability to store or retrieve multi-arch images locally, though. The only place to store these images is in an image registry. This introduces some challenges to the CI process since it currently verifies packages prior to pushing.

Potentially, the ideal CI solution for this is to build architecture specific images on instances with that architecture, then have another process that creates and pushes the multi-arch manifest list afterwards. However, arm64 GitHub actions runners aren't generally available.

Instead, this PR modifies the CI pipeline. When running tests, images are built for amd64 only, and the package list is generated and checked for amd64. Then, when publishing images, multi-arch images are built and pushed to temp tags, then retagged to stable tags once that succeeds.

GUS-W-14658761.
GUS-W-15157866.

@joshwlewis joshwlewis force-pushed the heroku-24 branch 2 times, most recently from ec74bc0 to 20fad1a Compare January 18, 2024 21:32
@edmorley
Copy link
Member

ubuntu:24.04 ships with a ubuntu/1000 user by default. There is no need to create the heroku/1000 user anymore.

I wonder if removing our custom user will resolve:
https://cloud-native.slack.com/archives/C033DV8D9FB/p1705518815253729

@edmorley
Copy link
Member

Some other potential package candidates to drop:

  • bzr
  • python3-dev
  • socat
  • telnet
  • ed

bin/build.sh Outdated Show resolved Hide resolved
@joshwlewis joshwlewis marked this pull request as ready for review January 29, 2024 21:10
@joshwlewis joshwlewis requested a review from a team as a code owner January 29, 2024 21:10
@runesoerensen
Copy link

runesoerensen commented Feb 5, 2024

Instead, this PR modifies the CI pipeline. When running tests, images are built for amd64 only, and the package list is generated and checked for amd64. Then, when publishing images, multi-arch images are built and pushed to a private repo, then retagged to public repo tags at the end.

Seems like we'd want to generate/check/test both architectures (when possible and feasible). But for multiarch base images it seems we probably shouldn't include installed-packages.txt files that only reflect installed packages on one of the supported architectures?

The differences are pretty minor, but would it perhaps make sense to generate package lists for both supported architectures (e.g. installed-packages-amd64.txt/installed-packages-arm64.txt) in the heroku-24 and heroku-24-build directories)?

A rough implementation of a write_package_list implementation that'd achieve this while supporting the current stacks:

write_package_list() {
    local image_tag="$1"
    local dockerfile_dir="$2"

    # Extract the stack version from the dockerfile_dir variable (e.g., heroku-24)
    local stack_version=$(echo "$dockerfile_dir" | sed -n 's/^heroku-\([0-9]*\).*$/\1/p')

    if [ "$stack_version" -lt 24 ]; then
        # For Heroku-22 and older, use the generic file name
        local output_file="${dockerfile_dir}/installed-packages.txt"
        echo '# List of packages present in the final image. Regenerate using bin/build.sh' > "$output_file"
        docker run --rm "$image_tag" dpkg-query --show --showformat='${Package}\n' >> "$output_file"
    else
        # For Heroku-24 and newer, generate architecture-specific package list filenames (only works with containerd)
        if [ "$containerd_snapshotter" = 0 ]; then
            local archs=("arm64" "amd64")
            for arch in ${archs[@]}; do
                local output_file="${dockerfile_dir}/installed-packages-${arch}.txt"
                echo "# List of packages present in the final image (${arch}). Regenerate using bin/build.sh" > "$output_file"
                docker run --rm --platform=linux/${arch} "$image_tag" dpkg-query --show --showformat='${Package}\n' >> "$output_file"
            done
        else
            echo "Writing list of installed packages for multi-arch images is only supported when containerd is enabled."
        fi
    fi
}

Not suggesting we actually use that, just a quick example that'd work without changes to callers of the function :)

@joshwlewis
Copy link
Member Author

would it perhaps make sense to generate package lists for both supported architectures

@runesoerensen - Yes, makes sense to me. I added a modified version of your suggestion in b739fbe.

@edmorley edmorley self-requested a review February 7, 2024 09:15
Copy link
Member

@edmorley edmorley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! It's great to see both multi-arch support, plus some simplifications/cleanups of the image hierarchy + contents for the CNB-specific parts.

I didn't review the list of packages, since that will take much longer (this initial review pass took 90 mins as is, and it's not entirely complete even ignoring the packages list), and is something I can do after this merges, since we agreed it was fine to change/remove packages prior to GA.

.github/workflows/ci.yml Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
heroku-24/Dockerfile Outdated Show resolved Hide resolved
heroku-24/Dockerfile Outdated Show resolved Hide resolved
heroku-24-build/Dockerfile Outdated Show resolved Hide resolved
heroku-24/Dockerfile Show resolved Hide resolved
heroku-24/Dockerfile Show resolved Hide resolved
bin/publish-to-registries.sh Outdated Show resolved Hide resolved
bin/publish-to-registries.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
@dzuelke
Copy link
Contributor

dzuelke commented Feb 9, 2024

Some other potential package candidates to drop:

  • bzr
  • python3-dev
  • socat
  • telnet
  • ed

I think we should keep socat, telnet and ed. They're very basic, bare-bones tools one can use for simple debugging tasks.

Definitely okay with dropping bzr and python3-dev.

dzuelke
dzuelke previously requested changes Feb 9, 2024
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
Copy link

@runesoerensen runesoerensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a bit of testing (with and without containerd enabled, heroku-22 and heroku-24) this afternoon and it seems to be working great.

Looks good to me! But I know @dzuelke is planning to review as well, so probably best he approve it :)

bin/build.sh Outdated Show resolved Hide resolved
heroku-24-build/installed-packages-amd64.txt Show resolved Hide resolved
heroku-24-build/installed-packages-arm64.txt Show resolved Hide resolved
Signed-off-by: Josh W Lewis <josh.lewis@salesforce.com>
Copy link

@runesoerensen runesoerensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Copy link
Member

@edmorley edmorley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing, thank you!

Since today is the upstream feature freeze day, it's likely there will be more package churn - so we may want to (a) hold off merging until later in the day/tomorrow, (b) retrigger CI before merging in case there has already been further changes.

bin/unpublish-tags.sh Outdated Show resolved Hide resolved
bin/unpublish-tags.sh Outdated Show resolved Hide resolved
bin/unpublish-tags.sh Outdated Show resolved Hide resolved
@edmorley edmorley mentioned this pull request Feb 29, 2024
Copy link
Contributor

@dzuelke dzuelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bin/build.sh still has spaces for indentation in a few places, should be tabs throughout the entire file

Copy link
Contributor

@dzuelke dzuelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bin/build.sh still has spaces for indentation in a few places, should be tabs throughout the entire file

bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
bin/build.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@dzuelke dzuelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See @edmorley's remaining suggestions re: curl and jq error handling etc.

joshwlewis and others added 2 commits February 29, 2024 10:54
Co-authored-by: Ed Morley <501702+edmorley@users.noreply.github.com>
Co-authored-by: Ed Morley <501702+edmorley@users.noreply.github.com>
bin/build.sh Show resolved Hide resolved
@joshwlewis joshwlewis enabled auto-merge (squash) March 1, 2024 18:03
@joshwlewis joshwlewis merged commit d1ab0cb into main Mar 1, 2024
4 of 5 checks passed
@joshwlewis joshwlewis deleted the heroku-24 branch March 1, 2024 18:06
@boboldehampsink
Copy link

I don't see the arm64 build in Docker Hub yet, correct? https://hub.docker.com/r/heroku/heroku/tags

@edmorley
Copy link
Member

edmorley commented Mar 4, 2024

@joshwlewis It looks like the ARM64 build isn't being uploaded to the .nightly tag variants, only the linux/amd64 variant:
https://hub.docker.com/repository/docker/heroku/heroku/tags?page=1&ordering=last_updated&name=24

eg compare:

Screenshot - heroku

...with the upstream ubuntu image:

Screenshot - upstream

@edmorley
Copy link
Member

edmorley commented Mar 4, 2024

The above issue has now been resolved by #252 + #253:
https://hub.docker.com/repository/docker/heroku/heroku/tags?page=1&ordering=last_updated&name=24

(Though do bear in mind that Ubuntu 24.04 hasn't even reached beta yet, so the Heroku-24 images are extremely experimental, and we also plan on removing some packages over the coming weeks to reduce image size, so expect breaking changes at any point before Heroku-24 GA.)

@edmorley
Copy link
Member

edmorley commented Mar 5, 2024

I noticed that after this merged, the CI build/publish times increased from ~5 mins to ~40 mins:
https://github.com/heroku/base-images/actions/workflows/ci.yml?query=branch%3Amain

I presume this is due to two things:

  1. We're now building two architectures in series (ie: double the work)
  2. For the ARM64 build, Docker's QEMU emulation is being used since it's an AMD64 GHA runner, which makes the build much slower (now that we're using Rosetta support locally on macOS I'd forgotten how much slower QEMU was)

ie: This is expected/unavoidable for now, until we switch to GitHub Actions' upcoming ARM64 runners and build the architectures in parallel across multiple jobs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants