-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configurable sandbox (pause) image #2020
Comments
According to the kubeadm charter, we expect that users have configured their container runtimes properly to be able to be used in a Kubernetes cluster. As such, kubeadm refrains from having to start, stop, configure or in any way try to change the configuration and state of the container runtime itself. |
I agree that kubeadm should not enter in the business of configuring CRIs, unless this is not done through the kubelet.
|
I also agree that configuring the CRI is out of the scope for kubeadm. Adding the In general, we can add to the documentation that if you want to configure the pause image, you have several options:
|
This command line flag is tightly coupled to the docker shim. So I suspect that its long term life is going to be connected to that (rather than the kubelet itself).
We already do that through kubelet's docker shim/command line. Also, I don't see a clear user benefit here (in kubeadm managing the sandbox image). Image pre-pull for air-gaped env? - That can be done by users way before the k8s packages are installed. |
Is there any way today to tell kubeadm today "hey actually I'm using this pause image"? While I don't need kubeadm to configure the CRI I'm aware of what image the CRI implementation is using and I'd like kubelet flag and the image pull to match (or just no image pulling, which seems to be possible in 1.13+ with Not configuring this properly in kubeadm indeed breaks airgapping. Also worth noting: IIRC kubelet has an image GC blacklist internally of the pod sandbox image specifically, so you probably want kubelet to know about this until we get a more flexible whitelis t.. (at some point I want to revive an old KEP for a configurable whitelist) Other than the image pulling (which is never desirable in our case..) and the image GC thing, in practice the CRI implementation implements the pod sandbox and it should be an implementation detail what this image is without any kubeadm involvement, unless kubeadm wants to start trying to manage CRI implementations. |
TBD, though there is provisional support now for building kubelet w/o dockershim, I think eventually it will move out.
IMHO it's mildly useful to make sure the flag aligns, but it would be just as good for kind at least to just not have kubeadm manage the pause image at all and leave us to supply this flag. Probably the main reason this is a thing is for people configuring IMHO this is a somewhat questionable feature to begin with though, users could instead configure docker / containerd / cri-o to use a mirror for |
The other problem with NOT having a structured field for this in kubeadm config is that kubeadm also doesn't have a structured way to discover what will be used, so you can't automate aligning these cleanly when using CRI instead of dockershim. At best you can do It's also entirely possible to implement the CRI spec without the pause image or a totally custom pause image for that matter. |
users (especially in China) have air-gap local registries where they rely on from there CR docs need to tell them where to apply this image in the CR config. from my outdated investigation this is not exactly well documented by the CRI providers. 70% of the kubeadm users use dockershim and "pause" is managed for them (minus upgrades)
1.18 kubeadm has structured output for
i think the dockershim changes will tell us what is going to happen in terms of the sandbox image. the two options are:
|
... precisely? but only if kubeadm is managing this image / flag. Do you have any additional reasons for managing this image in kubeadm?
It is abnormal to override this in a container runtime implementing pods because the sandbox image is part of the pods. However it's not difficult to find the pod sandbox image option in containerd CRI plugin config, it's no less well documented than anything else. https://github.com/containerd/cri/blob/master/docs/config.md#cri-plugin-config-guide
I don't know what percentage kind users are but the issue here is that there isn't not only does this not work properly it doesn't functionally make sense. Even for the dockershim users kubeadm only needs to be specifying this if you're overriding the image registry. Otherwise kubelet defaults this just fine, and the image pulling for pause is of questionable value.
Unfortunately this output while "structured" contains no more data, it's still just a grab bag list of images.
I see no related details there -- it's still just in dockershim mode the pod is implemented by us and we specify this and in CRI mode it's only relevant to the kubelet imageGC.
As I said if you don't manage this, usage of imageRepository will become problematic, users will wind up reimplementing imageRepository for this image anyhow. The entire imageRepository concept could be obviated by setting a comparable mirror at the runtime level though. If users override nothing "manage pause upgrades" is a no-op except when airgapped. The new kubelet / dockershim will have a new default pause image. External CRI implementations upgrade on their own schedule and aren't tied to kubernetes versions. |
only prepull / air-gap, i'd say.
is the request on the kind side related to?
actually it has to be
that is a side effect, but we may have to tolerate this. |
if we end up adding #524 with support for a pull-policy of alternatively we can break down preflight into sub-phases and this way one can skip |
the kind issue is just what brought me here, we'll need to employ workarounds for older k8s versions anyhow, but I dont think this is a good state to be in going forward with CRI becoming more common (e.g. CAP*)
yeah, which sadly 1.11 / 1.12 don't appear to have. that still leaves us doing substring matches on the images which I guess works but feels wrong.
yeah, again I might suggest documenting the simpler way: https://docs.docker.com/registry/recipes/mirror/ |
it's hard to maintain such a big skew. i'd drop support for these older versions at this point. related issue to the kind issue, that was logged today; the user is asking why they have two pause images: i see this as a potential miss configuration of containerd. yet this combined with the kind issue maybe signals that it's time kubeadm to stop managing the sandbox image. |
er, the pause image is not kubernetes version specific unless we're talking about dockershim because that is itself a partial container runtime. containerd, cri-o etc. should use whatever pod sandbox image meets their needs. for example, using the latest containerd with any kubernetes version works fine, but will use the 3.2 image, while older kubeadm will assume the 3.1 image. you can make containerd / the CRI plugin use whatever pause image kubeadm expects, but there's not a good reason to except for the double image pulling. FTR we are currently rewriting containerd config to match roughly Ideally we should use defer to the CR for podsandbox management, which probably doesn't need to involve kubeadm except that's it's going to be best practice to prevent imageGC of the sandbox image since all pods need it. (currently this is done via the kubelet sandbox image flag even if not using dockershim, imho it should be an image GC denylist in the future) |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
minor update here. with the dockershim extraction supposedly happening in the next few releases, kubeadm no longer has to manage pause. but kubeadm continuing to manage the pause image will allow users to see the default pause bundled with a k8s version. this is a good idea in general, related to security fixes in pause, although it doesn't change often. the API change requested here (which can land in v1beta3) still makes sense for pre-pull - e.g. users might choose to use the GCR pause, but configure an image repository for everything else. |
One side effect of this flag is kubelet will never try to GC this image ever (and only this image), which may be desirable. Kind now avoids the preflight / pull, but I still think it makes sense for other users. I've also seen chitchat about CRI implemented without the pause image entirely / your CRI implementation will have a tested default independent of the Kubernetes release cycle, so it may make sense to decouple this entirely. At the very least making it configurable in kubeadm would make it easier to align both on the CRI preferred version. |
that's a good point. if the kubelet actually guarantees that via this flag we should be passing it from kubeadm for all CRs. |
Yeah that's it exactly, I investigated the sources previously and found the behavior described in that issue: the flag is not gated only to dockershim as documented and imageGC at least will skip this image. It also has a default. I think the better approach would be a way to tell kubelet "don't ever delete these images" but I don't know if anyone has the bandwidth to get something like that through (technically easy, but features take a lot of effort to get approved). It could also arguably be done in the CRI instead (rejecting deletes?) |
In kubernetes/kubernetes#97438 (comment), containerd uses its configuration in |
cc @adisky |
As I can understand there are two things here -- for which same flag
[1] applies only to docker-shim, [2] applies to all container runtimes. (possibly missed/side-effect) I am not sure but AFAIU we do not want [1] for all container runtimes but we are concerned about [2].
|
i don't think the kubelet would like to check the config.toml. |
similar to the cgroup driver problem, instead of passing a flag to the kubelet for feature X, the kubelet should be able to ask about the configuration at the CRI socket. i can see that |
@neolit123 Thanks for the pointer, I too tried containerd returns "sandboxImage":"k8s.gcr.io/pause:3.2" in |
it has to be standardized eventually, until then the kubelet may have to accept the image to not be GCed. |
any objections to starting to pass --pod-infra-container-image to the kubelet for all container runtimes (not only docker) in 1.21? as discussed above this flag tells the kubelet image GC to not remove it. code in kubeadm is here: |
I still think there should be a more flexible way to not evict critical images, but short of that any configured "don't evict this pause image" autodetected from CRI SGTM. The desync between dockershim's / kubeadm's preferred pause image and CRI has caused us to avoid preflight entirely (so as not to break airgapping). GC we have to blanket disable for now anyhow though. |
here is the PR: |
had a chat with @fabriziopandini about this API change for v1beta3 (which is to be released in v1.22) main argument was already covered in discussions above. in the future the kubelet should stop allowing the user to configure the "pod infra container". CRI implementors and the kubelet ideally should be able to communicate what image to:
with respect to kubeadm image pulls:
|
Connecting the dots:
I agree that long term it probably makes sense to move towards decoupling awareness of the pause image, since it's an implementation detail of the CRI implementation at this point. |
Originally posted by @Random-Liu in #1796 (comment)
Example PR by @Random-Liu kubernetes/kubernetes#87827
Moving the discussion here for the sake of keeping the v1beta3 issue relatively clean and on track.
/kind api-change
/area ecosystem
/assign
The text was updated successfully, but these errors were encountered: