How to push image to multiple repositories? #248

Open
prestonvanloon opened this issue May 19, 2023 · 16 comments
Labels: enhancement, performance

Comments

@prestonvanloon
Contributor

I am trying to replicate logic from rules_docker where I can have a container_bundle given to a docker_push.

container_bundle(
    name = "image_bundle",
    images = {
        "gcr.io/prysmaticlabs/prysm/beacon-chain:latest": ":image_with_creation_time",
        "index.docker.io/prysmaticlabs/prysm-beacon-chain:latest": ":image_with_creation_time",
    },
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

docker_push(
    name = "push_images",
    bundle = ":image_bundle",
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

The end result was that I pushed an image to multiple repositories with a single target.

@thesayyn
Collaborator

# Target names must be unique within a BUILD file, so each push gets its own name.
oci_push(
    name = "push_gcr",
    image = ":image_with_creation_time",
    repository = "gcr.io/prysmaticlabs/prysm/beacon-chain",
    remote_tags = ["latest"],
)

oci_push(
    name = "push_dockerhub",
    image = ":image_with_creation_time",
    repository = "index.docker.io/prysmaticlabs/prysm-beacon-chain",
    remote_tags = ["latest"],
)

this can be done by simply having two oci_push targets.

@prestonvanloon
Contributor Author

prestonvanloon commented May 22, 2023

> this can be done by simply having two oci_push targets.

I understand that, but it doesn't scale well. In our case, we have 4 repository / tag variations per image. Even with macros to expand to multiple targets, it is not possible to push all from one target / bazel command. (bazelbuild/bazel#10855)

I'm looking for feature parity with the functionality of docker_push from rules_docker.

@thesayyn
Collaborator

> Even with macros to expand to multiple targets, it is not possible to push all from one target / bazel command.

this could be done by running all oci_push targets in a sh_binary target.

> I'm looking for feature parity with the functionality of docker_push from rules_docker.

this is essentially what container_bundle does as I said above.

Unfortunately, we are in the release-candidate (-rc) phase, so we cannot introduce breaking changes.

@alexeagle
Collaborator

You can use https://github.com/keith/rules_multirun to make a single bazel runnable target.

You could write a macro that emulates container_bundle or even spell it out in a BUILD file:

load("@rules_oci//oci:defs.bzl", "oci_image", "oci_push")
load("@rules_multirun//:defs.bzl", "command", "multirun")

oci_image(
    name = "image",
    os = "linux",
    architecture = "amd64",
)

_REPOS = ["index.docker.io/alexeagle/test1", "ghcr.io/<OWNER>/image"]

[
    oci_push(
        name = "push{}".format(i),
        image = ":image",
        repository = repo,
    )
    for i, repo in enumerate(_REPOS)
]

[
    command(
        name = "cmd{}".format(i),
        command = ":push{}".format(i),
        arguments = ["--tag", "latest"],
    )
    for i in range(len(_REPOS))
]

multirun(
    name = "deliver",
    commands = [
        "cmd{}".format(i)
        for i in range(len(_REPOS))
    ],
    jobs = 0, # Set to 0 to run in parallel, defaults to sequential
)

WDYT?

As a design choice, we want rules_oci to only contain things that aren't already possible by layering with other rulesets, keeping it orthogonal and low-maintenance.
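
The macro route mentioned above could look roughly like the sketch below. It is a minimal, unofficial illustration: the file name `push_bundle.bzl` and the macro name `oci_push_bundle` are hypothetical and not part of rules_oci or rules_multirun. It only uses `oci_push`, `command`, and `multirun` with the attributes already shown in this thread, and it assumes every key of `images` ends in `:tag`.

```starlark
# push_bundle.bzl (hypothetical helper; not shipped by rules_oci or rules_multirun)
load("@rules_multirun//:defs.bzl", "command", "multirun")
load("@rules_oci//oci:defs.bzl", "oci_push")

def oci_push_bundle(name, images, jobs = 0, tags = None, visibility = None):
    """Declares one runnable target that pushes every entry in `images`.

    Args:
        name: name of the multirun target to `bazel run`.
        images: dict mapping "registry/repository:tag" to an image label,
            the same shape as container_bundle's `images` attribute.
            Every key must end in ":tag".
        jobs: forwarded to multirun; 0 runs the pushes in parallel.
        tags: Bazel tags applied to every generated target.
        visibility: visibility of the multirun target only.
    """
    commands = []
    for i, item in enumerate(images.items()):
        ref, image = item

        # Split at the last ":" so a registry port (host:5000/repo:tag) survives.
        repository, _, tag = ref.rpartition(":")
        if not repository:
            fail("expected 'repository:tag', got '{}'".format(ref))

        # One oci_push per repository/tag pair.
        push = "{}_push{}".format(name, i)
        oci_push(
            name = push,
            image = image,
            repository = repository,
            remote_tags = [tag],
            tags = tags,
        )

        # Wrap each push so multirun can execute it.
        cmd = "{}_cmd{}".format(name, i)
        command(
            name = cmd,
            command = ":" + push,
            tags = tags,
        )
        commands.append(":" + cmd)

    # The single target that runs all pushes.
    multirun(
        name = name,
        commands = commands,
        jobs = jobs,
        tags = tags,
        visibility = visibility,
    )
```

With that in place, a BUILD file could declare `oci_push_bundle(name = "push_images", images = {"gcr.io/prysmaticlabs/prysm/beacon-chain:latest": ":image_with_creation_time", "index.docker.io/prysmaticlabs/prysm-beacon-chain:latest": ":image_with_creation_time"}, tags = ["manual"])` and push everything with a single `bazel run` of `:push_images`. Tags are fixed at analysis time in this sketch, so it does not by itself cover tags chosen at run time.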

@aignas
Contributor

aignas commented May 31, 2023

@alexeagle, thanks for the example. Having it in the rules_oci docs, as part of a guide on how things in rules_docker translate to rules_oci, would be useful. I agree that keeping it orthogonal may be the right approach here.

@prestonvanloon
Contributor Author

prestonvanloon commented Jun 26, 2023

@alexeagle sorry for the late reply... Thanks for the suggestion. Using rules_multirun works OK, but I am not able to pass -- --tag latest on the command line like I could with oci_push. In your suggestion, the tag is hard-coded to "latest", but it won't always be "latest" in our CI.

Edit: My original example was also hard-coded, but we use an environment variable from the workspace status, which worked in rules_docker but does not work here.

container_bundle(
    name = "image_bundle",
    images = {
        "gcr.io/prysmaticlabs/prysm/beacon-chain:{DOCKER_TAG}": ":image_with_creation_time",
        "index.docker.io/prysmaticlabs/prysm-beacon-chain:{DOCKER_TAG}": ":image_with_creation_time",
    },
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

docker_push(
    name = "push_images",
    bundle = ":image_bundle",
    tags = ["manual"],
    visibility = ["//beacon-chain:__pkg__"],
)

See: https://github.com/bazelbuild/rules_docker#stamping
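
One possible way to get a stamped tag without a command-line `--tag` is to let `oci_push` read its tags from a generated file. The sketch below assumes that `expand_template` from aspect_bazel_lib is available at the load path shown (the path may differ between bazel_lib versions) and that the workspace status command emits a key named `STABLE_DOCKER_TAG`, which is a hypothetical name chosen for illustration.

```starlark
load("@aspect_bazel_lib//lib:expand_template.bzl", "expand_template")
load("@rules_oci//oci:defs.bzl", "oci_push")

# Writes a one-line tags file. When building with --stamp, the placeholder
# "0.0.0" is replaced by the value of STABLE_DOCKER_TAG (assumed key) from
# the workspace status command; without --stamp the placeholder is kept.
expand_template(
    name = "stamped_tags",
    out = "_stamped.tags.txt",
    template = ["0.0.0"],
    stamp_substitutions = {"0.0.0": "{{STABLE_DOCKER_TAG}}"},
)

# remote_tags points at the generated file instead of a literal list.
oci_push(
    name = "push_gcr",
    image = ":image_with_creation_time",
    repository = "gcr.io/prysmaticlabs/prysm/beacon-chain",
    remote_tags = ":stamped_tags",
)
```

Because the tag lives in the target rather than in the arguments, a rules_multirun `command` wrapping `:push_gcr` would not need `arguments = ["--tag", ...]`. The push would be run with `--stamp` and a `--workspace_status_command` that prints a `STABLE_DOCKER_TAG <value>` line.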

@wyattanderson

> WDYT?
>
> As a design choice, we want rules_oci to only contain things that aren't already possible by layering with other rulesets, keeping it orthogonal and low-maintenance.

An example of a "thing that isn't possible by layering other rulesets" might be efficiently pushing multiple images at once. For example, using the MultiWrite API from google/go-containerregistry (which I think crane uses under the hood) to push multiple images in an efficient fashion. We have a build process where we push potentially hundreds of images with new tags but very few (if any) actual layer changes, and it sounds like this would be the most performant way to push all of those images.

It feels like this should eventually be possible when this crane issue is resolved; if there are other feature additions that need to be made to crane to facilitate this, I'd be happy to lend a hand there.

I don't think go-containerregistry adequately handles rate limiting either at the moment, but that's another thing that I think would only be possible with an in-process implementation of parallel push, versus a naive approach of just spawning as many processes as there are images and hoping for the best. We currently run into issues with pushing to AWS ECR because container_push from rules_docker doesn't have any knobs for controlling concurrency.

@SanjayVas

Building off of #248 (comment), there are arguably two separate but related issues here:

  1. Pushing the same image to multiple registries/repositories.
  2. Efficiently bundling image pushes (e.g. pushing multiple images to the same registry).

Perhaps (2) should be split off into another issue, as that's the part that is difficult to do on top of rules_oci. The fact that a solution for it might also resolve (1) is just a bonus.

@blackliner

IMHO oci_push should not be a run target, but something that happens during build time. This way, bazel will automatically handle the parallelism. For small scale the run approach is nice, but if you have a monorepo and potentially hundreds of container images that need to be released ...

@alexeagle
Collaborator

bazel build shouldn't have side effects, at least following the idiom that you only expect it to update the bazel-out tree. It should be idempotent, but if it talks to a remote registry then building the same thing twice will do two different things.
OTOH you can squint and see a Bazel remote cache as a CAS with an Action Cache in front of it, which is a lot like an OCI registry. So perhaps OCI artifacts are just 'cached intermediate artifacts' and populating that cache is a fine side-effect for bazel build to have.

There's some discussion of this on some other thread that I'm having trouble finding right now. TBH the maintainers here just don't have time and effort to make progress on a design for that right now. I'm also not clear whether such an experiment necessarily has to be performed in rules_oci, or if you could prove the concept in a separate derivative ruleset to start with.

@blackliner

Agree with your reasoning. It is all about reproducibility, and in my book this includes oci_images and honestly any pkg_tar that ends up synced to S3.

But with the current (design) limitation of being able to run only a single "run target" per invocation (I need to take a crack at rules_multirun at some point), it is really tricky to scale nicely. We are really just at the beginning of migrating to rules_oci, but we have a plethora of container images in our monorepo to migrate; most of them serve as CI containers, others are being released to customers. Having to specify each target individually is somewhat more annoying than just saying bazel build --config=some_release_config //... (the oci_image and s3_sync targets would be manual by default, and the config would make them part of the :all build tree).

@alexeagle
Collaborator

There shouldn't be a need to list them explicitly. The output of a bazel query command can be piped into a shell one-liner.

@blackliner

blackliner commented Jul 22, 2024

Correct, but I would now need to maintain this additional "job executor" (even if it is just some xargs and parallel). I would rather see bazel handling this from the get-go.

Don't get me wrong please, I know it is doable, I am just questioning if it is the right way/design.

@SanjayVas

SanjayVas commented Jul 22, 2024

IMO the issue here isn't really about how to push to multiple repositories with a single target, as that's reasonably simple to do with your own wrapper rules or something like rules_multirun. Also I don't think considering changing something fundamental to Bazel like bazel build having side effects outside of Bazel output trees/caches is in scope or even makes sense.

The focus should be on what functionality can either only be provided by implementing it in rules_oci or would be significantly more difficult to provide outside it. I believe that's primarily making pushes more efficient.

There is additionally an argument for having common functionality be in rules_oci so it can be maintained by those who already have experience writing/maintaining Bazel rules. That is to say there has been a general philosophy thus far that only maintainers of Bazel rules repos should need to know how to write a rule. For better or worse, I believe that's an argument that the rules_oci maintainers have rejected.

@jjmaestro
Contributor

Adding to what @SanjayVas said, and replying to @alexeagle:

> As a design choice, we want rules_oci to only contain things that aren't already possible by layering with other rulesets, keeping it orthogonal and low-maintenance.

I agree on low-maintenance and on re-using other rulesets, but that's precisely the point: implement it once in rules_oci, which IMO is arguably the best place for a "push OCI images to multiple registries" feature, so that experienced maintainers (and the community in general) can maintain one implementation instead of each user or team rewriting their own.

Of course there's always a fine line of what functionality should be added to any project but I think that, in this case, this is a really good one to have in rules_oci. Plus, since it was already in rules_docker and the expectation (AFAIK) is for people to migrate to rules_oci, it would be a very nice thing to have.
