Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support entrypoint extraction from inside prow job #25383

Closed
wants to merge 3 commits into from

Conversation

chaodaiG
Copy link
Contributor

This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.

Ran ./hack/update-deps.sh and it failed, so probably will defer this to be after the bazel deprecation. Will also fix all failed unit tests afterwards.

(This PR was an effort baked on top of separate offline brainstorming with @cjwagner and @BenTheElder )
(The entrypoint extraction and docker auth parts were mainly from @imjasonh's work at ko-build/ko#581)

Early feedbacks are welcome:
/cc @cjwagner @BenTheElder @alvaroaleman

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/prow Issues or PRs related to prow area/prow/entrypoint Issues or PRs related to prow's entrypoint component area/prow/pod-utilities Issues or PRs related to prow's pod-utilities component area/prow/sidecar Issues or PRs related to prow's sidecar component area/prow/spyglass Issues or PRs related to prow's spyglass UI sig/testing Categorizes an issue or PR as relevant to SIG Testing. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Feb 23, 2022
@BenTheElder
Copy link
Member

The implementation is even cleaner than I expected 👍 , that was a fast turnaround 🙃

If we do support this, we should clearly document the limitations (around authentication in particular ...), it will be relevant both where the image is pulled from for the networking perspective (e.g. imagine localhost registry on the node) and the auth perspective.

Copy link

@imjasonh imjasonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need to support non-linux/amd64 platforms with this code? As it stands, if the image in question is multi-platform, remote.Image will default to only returning the image for linux/amd64 (overrideable with remote.WithPlatform). If you only need to support that one platform, or if the image happens to have the same entrypoint for all platforms (as ko-built images do) you'll be fine, just checking if this is something you need to consider.

(Tekton does a lot of gross hacks to support this, and I wouldn't recommend following in our footsteps unless you really need it!)

prow/entrypoint/run.go Outdated Show resolved Hide resolved
prow/entrypoint/run.go Show resolved Hide resolved
prow/entrypoint/run.go Outdated Show resolved Hide resolved
@chaodaiG chaodaiG force-pushed the entrypoint-respected branch from e39da9f to 7bfc06d Compare February 23, 2022 16:28
@chaodaiG
Copy link
Contributor Author

Is there any need to support non-linux/amd64 platforms with this code? As it stands, if the image in question is multi-platform, remote.Image will default to only returning the image for linux/amd64 (overrideable with remote.WithPlatform). If you only need to support that one platform, or if the image happens to have the same entrypoint for all platforms (as ko-built images do) you'll be fine, just checking if this is something you need to consider.

This is great to know, as the first step would prefer support only the scenario that all arches variants share the same entrypoints for simplicity. Left a comment in the code explaining this

@@ -404,6 +404,9 @@ type DecorationConfig struct {
// after sending SIGINT to send SIGKILL when aborting
// a job. Only applicable if decorating the PodSpec.
GracePeriod *Duration `json:"grace_period,omitempty"`
// UseDefaultEntrypoint indicates whether entrypoint should try to discover
// the default entrypoint from the image
UseDefaultEntrypoint *bool `json:"use_default_entrypoint,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this implicit by command being unset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now prow is generous enough to combine command and args, then pick the [0] as executable, so it's totally fine for a job to set executable as the first arg under args. I have done a quick search in this repo and there are 140+ jobs going down this route, so relying on command being unset would be a breaking change. (it's trivial to change though)

on the other hand, the initial idea is to start this feature as opt-in mode, so that it's not surprising for scenarios that this feature doesn't work well. If it proved to work well we can migrate to opt-out mode


amazonKeychain authn.Keychain = authn.NewKeychainFromHelper(ecr.NewECRHelper(ecr.WithLogOutput(ioutil.Discard)))
azureKeychain authn.Keychain = authn.NewKeychainFromHelper(credhelper.NewACRCredentialsHelper())
keychain = authn.NewMultiKeychain(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most common way to authenticate pulls in a cluster is to a) set an image pull secret on the pod b) bind a pull secret to a SA and c) set it up pull secret on the kubelet, all of which won't work here. Maybe we should document this somehow?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a keychain implementation that can use a k8s client to find imagePullSecrets and use those, in case that's helpful here or in the future: https://pkg.go.dev/github.com/google/go-containerregistry/pkg/authn/kubernetes#New

Example:

k8skc, err := kubernetes.New(ctx, client, kubernetes.Options{"my-ns", "my-sa", []string{"my-secret1"})
...
keychain := authn.NewMultiKeychain(
  authn.DefaultKeychain,
  amazonKeychain,
  ...
  k8skc,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that would be elegant and likely cover almost all cases

// https://github.com/kubernetes/test-infra/pull/25383#pullrequestreview-891109864,
// so for now we only support either "linux/amd64" or other arches that
// share the same entrypoint with its "linux/amd64" variant.
img, err := remote.Image(o.ImageRef,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just downloads some metadata, not the actual image, correct?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the img is just a struct containing metadata like layer blob digests (not contents). When you call img.ConfigFile it reuses the creds it used to fetch the image manifest to fetch the config blob (again, just a small JSON blob, no image contents).

Only if you call img.Layers() and consume any of the resulting layers' Compressed() or Uncompressed() ReadClosers will you actually fetch any of the layer data. Otherwise it'll only fetch small JSON blobs for the manifest and image config.

So that entrypoint can use the info to decide whether the prow job is trying to use the default entrypoint or not
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed
@chaodaiG chaodaiG force-pushed the entrypoint-respected branch from 7bfc06d to ef98d2e Compare February 24, 2022 21:36
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 24, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chaodaiG

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +184 to +185
case 0: // Fall back to use CMD if Entrypoint is empty
if cfg.Config.Cmd != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both Entrypoint and Cmd are empty we should probably log a warning or error like we do in the default case.

@k8s-ci-robot
Copy link
Contributor

@chaodaiG: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-test-infra-bazel ef98d2e link true /test pull-test-infra-bazel
pull-test-infra-verify-lint ef98d2e link true /test pull-test-infra-verify-lint
pull-test-infra-integration ef98d2e link true /test pull-test-infra-integration
pull-test-infra-prow-image-build-test ef98d2e link true /test pull-test-infra-prow-image-build-test
pull-test-infra-unit-test ef98d2e link true /test pull-test-infra-unit-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot
Copy link
Contributor

@chaodaiG: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 10, 2022
chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request Mar 14, 2022
Supercedes kubernetes#25383, where in kubernetes#25383 the logic is in the pod where tests run, the problem there is that imagePullSecret defined on job pod is not accessible, which limits it's use case. The other problem is that the containers inside a test pod are not aware of the image they are from, so it's a little bit weird to let them know about it.

This PR moves the logic to plank, which feels more natural

-----------------------------
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.

(This PR was an effort baked on top of separate offline brainstorming with @cjwagner and @BenTheElder )
(The entrypoint extraction and docker auth parts were mainly from @imjasonh's work at ko-build/ko#581)
chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request Mar 14, 2022
Supercedes kubernetes#25383, where in kubernetes#25383 the logic is in the pod where tests run, the problem there is that imagePullSecret defined on job pod is not accessible, which limits it's use case. The other problem is that the containers inside a test pod are not aware of the image they are from, so it's a little bit weird to let them know about it.

This PR moves the logic to plank, which feels more natural

-----------------------------
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.

(This PR was an effort baked on top of separate offline brainstorming with @cjwagner and @BenTheElder )
(The entrypoint extraction and docker auth parts were mainly from @imjasonh's work at ko-build/ko#581)
chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request Mar 14, 2022
Supercedes kubernetes#25383, where in kubernetes#25383 the logic is in the pod where tests run, the problem there is that imagePullSecret defined on job pod is not accessible, which limits it's use case. The other problem is that the containers inside a test pod are not aware of the image they are from, so it's a little bit weird to let them know about it.

This PR moves the logic to plank, which feels more natural

-----------------------------
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.
@chaodaiG
Copy link
Contributor Author

chaodaiG commented Mar 14, 2022

it might not be the best idea to determine the entrypoint from the test process itself, prefers #25618, which contains more detail in its description

chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request Mar 14, 2022
Supercedes kubernetes#25383, where in kubernetes#25383 the logic is in the pod where tests run, the problem there is that imagePullSecret defined on job pod is not accessible, which limits it's use case. The other problem is that the containers inside a test pod are not aware of the image they are from, so it's a little bit weird to let them know about it.

This PR moves the logic to plank, which feels more natural

-----------------------------
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.
chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request Mar 14, 2022
Supercedes kubernetes#25383, where in kubernetes#25383 the logic is in the pod where tests run, the problem there is that imagePullSecret defined on job pod is not accessible, which limits it's use case. The other problem is that the containers inside a test pod are not aware of the image they are from, so it's a little bit weird to let them know about it.

This PR moves the logic to plank, which feels more natural

-----------------------------
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.
@chaodaiG
Copy link
Contributor Author

/close

@k8s-ci-robot
Copy link
Contributor

@chaodaiG: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request Apr 21, 2022
Supercedes kubernetes#25383, where in kubernetes#25383 the logic is in the pod where tests run, the problem there is that imagePullSecret defined on job pod is not accessible, which limits it's use case. The other problem is that the containers inside a test pod are not aware of the image they are from, so it's a little bit weird to let them know about it.

This PR moves the logic to plank, which feels more natural

-----------------------------
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.
chaodaiG added a commit to chaodaiG/test-infra that referenced this pull request May 3, 2022
Supercedes kubernetes#25383, where in kubernetes#25383 the logic is in the pod where tests run, the problem there is that imagePullSecret defined on job pod is not accessible, which limits it's use case. The other problem is that the containers inside a test pod are not aware of the image they are from, so it's a little bit weird to let them know about it.

This PR moves the logic to plank, which feels more natural

-----------------------------
This is an unfortunate fact of prow, that user need to explictly set the entrypoint.

The migration of prow images from being built with bazel to ko introduced a side effect of all prow jobs that use gcr.io/k8s-prow images, such as robots/comment, robots/pr-creator etc. would fail due to the location of default entrypoint change. It would be trivial amount of work to update the binary location in prow jobs definition, but would like to use this opportunity to try to get this fixed.
@chaodaiG chaodaiG deleted the entrypoint-respected branch September 26, 2022 01:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/prow/entrypoint Issues or PRs related to prow's entrypoint component area/prow/pod-utilities Issues or PRs related to prow's pod-utilities component area/prow/sidecar Issues or PRs related to prow's sidecar component area/prow/spyglass Issues or PRs related to prow's spyglass UI area/prow Issues or PRs related to prow cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants