When creating a v1.18 cluster, it still pulls the k8s.gcr.io/pause:3.1 image #1471

Closed
tao12345666333 opened this issue Apr 13, 2020 · 16 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

@tao12345666333
Member

What happened:

When I create a v1.18 cluster (offline), it fails.

(MoeLove) ➜  ~ kind create cluster --image=kindest/node:v1.18.0@sha256:0e20578828edd939d25eb98496a685c76c98d54084932f76069f886ec315d694 --name=v1.18                                                                                                           
Creating cluster "v1.18" ...                                                                                                                                                                                                                                   
 ✓ Ensuring node image (kindest/node:v1.18.0) 🖼                                                                                                                                                                                                                
 ✓ Preparing nodes 📦                                                                                                                                                                                                                                          
 ✓ Writing configuration 📜                                                                                                                                                                                                                                    
 ✗ Starting control-plane 🕹️                                                                                                                                                                                                                                    
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged v1.18-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1  

Debug:

(MoeLove) ➜  kubernetes git:(master) kind create cluster --image=kindest/node:v1.18.0@sha256:0e20578828edd939d25eb98496a685c76c98d54084932f76069f886ec315d694 --name=v1.18 --retain                                                                                                                                                    
Creating cluster "v1.18" ...                                                                                                                                                                                                                                   
 ✓ Ensuring node image (kindest/node:v1.18.0) 🖼                                                                                                                                                                                                                
 ✓ Preparing nodes 📦                                                                                                                                                                                                                                          
 ✓ Writing configuration 📜                                                                                                                                                                                                                                    
 ✗ Starting control-plane 🕹️                                                                                                                                                                                                                                    
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged v1.18-control-plane kubeadm init --ignore-preflight-errors=all --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1     
(MoeLove) ➜  kubernetes git:(master) docker ps                                                                                                                                                                                                                 
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                       NAMES                                                                                                                  
01fad0389680        kindest/node:v1.18.0   "/usr/local/bin/entr…"   6 minutes ago       Up 6 minutes        127.0.0.1:39295->6443/tcp   v1.18-control-plane  
(MoeLove) ➜  kubernetes git:(master) docker exec -it 01fad0389680 bash
root@v1:/# systemctl status containerd                                                                                                                                                                                                                         
● containerd.service - containerd container runtime                                                                                                                                                                                                            
   Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: enabled)                                                                                                                                                                    
   Active: active (running) since Sun 2020-04-12 00:51:13 UTC; 10min ago                                                                                                                                                                                       
     Docs: https://containerd.io                                                                                                                                                                                                                               
 Main PID: 126 (containerd)                                                                                                                                                                                                                                    
    Tasks: 15                                                                                                                                                                                                                                                  
   Memory: 26.6M                                                                                                                                                                                                                                               
   CGroup: /system.slice/docker-01fad03896801f36a8de368f3265dad862bb66cf02d52a1a2f341d062a180578.scope/system.slice/containerd.service                                                                                                                         
           └─126 /usr/local/bin/containerd                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                               
Apr 12 01:01:16 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:16.024975301Z" level=error msg="RunPodSandbox for &PodSandbo                                                                                                                       
xMetadata{Name:kube-scheduler-v1.18-control-plane,Uid:2208005057033f6461474a4b1eaeb34f,Namespace:kube-system,Attempt:0,} failed, error"                                                                                                                        
error="failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack im                                                                                                                       
age \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: Head https://k8s.gcr.io/v2/pau                                                                                                                       
se/manifests/3.1: dial tcp 74.125.203.82:443: i/o timeout"                                                                                                                                                                                                     
Apr 12 01:01:19 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:19.345354143Z" level=error msg="Failed to load cni configura
tion" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"                                                                                                                         
Apr 12 01:01:20 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:20.024485908Z" level=error msg="RunPodSandbox for &PodSandbo                                                                                                                       
xMetadata{Name:etcd-v1.18-control-plane,Uid:c9a0fb45a6d0163b4056d67af760b788,Namespace:kube-system,Attempt:0,} failed, error" error="fai
led to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.
gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: Head https://k8s.gcr.io/v2/pause/manifes
ts/3.1: dial tcp 74.125.203.82:443: i/o timeout"                                                                                                                                                                                                               
Apr 12 01:01:22 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:22.023952642Z" level=error msg="RunPodSandbox for &PodSandbo                                     
xMetadata{Name:kube-controller-manager-v1.18-control-plane,Uid:b1c39986355aaa05d871c42958815492,Namespace:kube-system,Attempt:0,} failed                  
, error" error="failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and                                                                                                                        
unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.io/pause:3.1\": failed to do request: Head https://k8s.gcr.
io/v2/pause/manifests/3.1: dial tcp 74.125.203.82:443: i/o timeout"                                                                                                                                                                                            
Apr 12 01:01:24 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:24.347973630Z" level=error msg="Failed to load cni configura
root@v1:/# journalctl -u containerd                                                                                            
-- Logs begin at Sun 2020-04-12 00:51:13 UTC. --                                                                                                                                                                                                               
Apr 12 01:01:31 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:31.239271331Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-scheduler-v1.18-control-plane,Uid:2208005057033f6461474a4b1eaeb34f,Namespace:kube-system,Attempt:0,
} failed, error" error="rpc error: code = Canceled desc = failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.
io/pause:3.1\": failed to do request: context canceled"                                                                                                                                                                                                        
Apr 12 01:01:31 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:31.241440158Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kube-apiserver-v1.18-control-plane,Uid:49aefb7d9ab550220c600e6b2d8245f9,Namespace:kube-system,Attempt:0,
} failed, error" error="rpc error: code = Canceled desc = failed to get sandbox image \"k8s.gcr.io/pause:3.1\": failed to pull image \"k8s.gcr.io/pause:3.1\": failed to pull and unpack image \"k8s.gcr.io/pause:3.1\": failed to resolve reference \"k8s.gcr.
io/pause:3.1\": failed to do request: context canceled"                                                                                                                                                                                                        
Apr 12 01:01:42 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:42.646801294Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to l
oad cni config"                                                                                                                                                                                                                                                
Apr 12 01:01:42 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:42.715350898Z" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to l
oad cni config"                                                                                                                                                                                                                                                
Apr 12 01:01:43 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:43.077874875Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:etcd-v1.18-control-plane,Uid:c9a0fb45a6d0163b4056d67af760b788,Namespace:kube-system,Attempt:0,}"         
Apr 12 01:01:43 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:43.081317436Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-apiserver-v1.18-control-plane,Uid:49aefb7d9ab550220c600e6b2d8245f9,Namespace:kube-system,Attempt:0,}
"                                                                                                                                                                                                                                                              
Apr 12 01:01:43 v1.18-control-plane containerd[126]: time="2020-04-12T01:01:43.085858677Z" level=info msg="RunPodsandbox for &PodSandboxMetadata{Name:kube-controller-manager-v1.18-control-plane,Uid:b1c39986355aaa05d871c42958815492,Namespace:kube-system,At
tempt:0,}" 

containerd image list:

root@v1:/# ctr --namespace=k8s.io i ls                                                                                         
REF                                                                     TYPE                                       DIGEST                                                                  SIZE      PLATFORMS   LABELS                          
docker.io/kindest/kindnetd:0.5.4                                        application/vnd.oci.image.manifest.v1+json sha256:f7dbcdbc1e1cfda232bf13225de69fcdeeb64a81fd496e3c25414e6347ce374d 108.0 MiB linux/amd64 io.cri-containerd.image=managed 
docker.io/rancher/local-path-provisioner:v0.0.12                        application/vnd.oci.image.manifest.v1+json sha256:dd36600950cf353e88107d524031334abd32c8cc2982e331d2b5f6e200af7913 40.0 MiB  linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/coredns:1.6.7                                                application/vnd.oci.image.manifest.v1+json sha256:5dfcb0bdbe73888a8a8a8fad86b8a1943579e3ea482148225fc505c80f32757b 41.9 MiB  linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/debian-base:v2.0.0                                           application/vnd.oci.image.manifest.v1+json sha256:810d45197dc61cee861b30e6311e9a14a36050f758b47bc278ae8dfb578e4404 51.4 MiB  linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/etcd:3.4.4-0                                                 application/vnd.oci.image.manifest.v1+json sha256:8cf466d7ca35c35198f4ff270a9e5ae0ab9ad52e5c8d986ab5b3887568359a39 324.9 MiB linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/kube-apiserver:v1.19.0-alpha.1.512_ee6b88ddf904b4            application/vnd.oci.image.manifest.v1+json sha256:0d3e92bfe4e4df2e38a85276040b252cae53370feadf3cbc23e6ab124ca800e9 140.0 MiB linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/kube-controller-manager:v1.19.0-alpha.1.512_ee6b88ddf904b4   application/vnd.oci.image.manifest.v1+json sha256:3f72e5726c3605fb0a8e11c39b5c5f02e78d312cc6b6ac5173fec6fbe5fbc99e 127.0 MiB linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/kube-proxy:v1.19.0-alpha.1.512_ee6b88ddf904b4                application/vnd.oci.image.manifest.v1+json sha256:b1c21981b1f730269df234ec5dab759f3bed01e7250345771428dc60365d559e 127.3 MiB linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/kube-scheduler:v1.19.0-alpha.1.512_ee6b88ddf904b4            application/vnd.oci.image.manifest.v1+json sha256:33348ce6e79f45fb4f399133fbfabbee5de2c2dc7ad5e04b1ce764b3c42b81d3 108.2 MiB linux/amd64 io.cri-containerd.image=managed 
k8s.gcr.io/pause:3.2                                                    application/vnd.oci.image.manifest.v1+json sha256:61e45779fc594fcc1062bb9ed2cf5745b19c7ba70f0c93eceae04ffb5e402269 669.7 KiB linux/amd64 io.cri-containerd.image=managed 
sha256:0e8d7e76ed346ae63c1eb2f17047b3c727bc5783fa6b51d3ee12f89cea964dbc application/vnd.oci.image.manifest.v1+json sha256:0d3e92bfe4e4df2e38a85276040b252cae53370feadf3cbc23e6ab124ca800e9 140.0 MiB linux/amd64 io.cri-containerd.image=managed 
sha256:12f992c4835e95e8e820cabd88d3ee5e55c2cb456e45b358bf9631e78814de2b application/vnd.oci.image.manifest.v1+json sha256:3f72e5726c3605fb0a8e11c39b5c5f02e78d312cc6b6ac5173fec6fbe5fbc99e 127.0 MiB linux/amd64 io.cri-containerd.image=managed 
sha256:2186a1a396deb58f1ea5eaf20193a518ca05049b46ccd754ec83366b5c8c13d5 application/vnd.oci.image.manifest.v1+json sha256:f7dbcdbc1e1cfda232bf13225de69fcdeeb64a81fd496e3c25414e6347ce374d 108.0 MiB linux/amd64 io.cri-containerd.image=managed 
sha256:30f347e5200f5451133fd7b8966c2403d94c3336600b756cd865bd8c40c7c314 application/vnd.oci.image.manifest.v1+json sha256:b1c21981b1f730269df234ec5dab759f3bed01e7250345771428dc60365d559e 127.3 MiB linux/amd64 io.cri-containerd.image=managed 
sha256:67da37a9a360e600e74464da48437257b00a754c77c40f60c65e4cb327c34bd5 application/vnd.oci.image.manifest.v1+json sha256:5dfcb0bdbe73888a8a8a8fad86b8a1943579e3ea482148225fc505c80f32757b 41.9 MiB  linux/amd64 io.cri-containerd.image=managed 
sha256:6fab4b32ce98a757fa14abc91d504d992a972844326b9fcd70080397343403a5 application/vnd.oci.image.manifest.v1+json sha256:8cf466d7ca35c35198f4ff270a9e5ae0ab9ad52e5c8d986ab5b3887568359a39 324.9 MiB linux/amd64 io.cri-containerd.image=managed 
sha256:80d28bedfe5dec59da9ebf8e6260224ac9008ab5c11dbbe16ee3ba3e4439ac2c application/vnd.oci.image.manifest.v1+json sha256:61e45779fc594fcc1062bb9ed2cf5745b19c7ba70f0c93eceae04ffb5e402269 669.7 KiB linux/amd64 io.cri-containerd.image=managed 
sha256:9bd6154724425e6083550fd85a91952fa2f79ef0b9844f0d009c37a72d075757 application/vnd.oci.image.manifest.v1+json sha256:810d45197dc61cee861b30e6311e9a14a36050f758b47bc278ae8dfb578e4404 51.4 MiB  linux/amd64 io.cri-containerd.image=managed 
sha256:c5161a19f4e358a6b4df024b355aefe04e1afb1b9be0a9c1224414b75037dc2c application/vnd.oci.image.manifest.v1+json sha256:33348ce6e79f45fb4f399133fbfabbee5de2c2dc7ad5e04b1ce764b3c42b81d3 108.2 MiB linux/amd64 io.cri-containerd.image=managed 
sha256:db10073a6f829f72cc09655e92fbc3c74410c647c626b431ecd5257d1f6b59c1 application/vnd.oci.image.manifest.v1+json sha256:dd36600950cf353e88107d524031334abd32c8cc2982e331d2b5f6e200af7913 40.0 MiB  linux/amd64 io.cri-containerd.image=managed 
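
A quick way to confirm the mismatch between the containerd log and the list above is to test for the exact reference containerd asks for. The sketch below is illustrative only: `has_image` is a hypothetical helper name, and it assumes the quiet listing (`ctr --namespace=k8s.io images ls -q`, one reference per line) is piped in on stdin.

```shell
# has_image REF: succeed if REF appears as an exact line on stdin.
# Hypothetical helper; feed it the output of `ctr --namespace=k8s.io images ls -q`.
has_image() {
  grep -Fxq -- "$1"
}

# Example use inside the node container (not run here):
#   ctr --namespace=k8s.io images ls -q | has_image k8s.gcr.io/pause:3.1 \
#     || echo "k8s.gcr.io/pause:3.1 is not preloaded"
```

With the image list above, the check for pause:3.2 would pass and the check for pause:3.1 would fail, which matches the pull attempts in the containerd log.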

What you expected to happen:

The cluster is created successfully.

How to reproduce it (as minimally and precisely as possible):

In an offline environment:

kind create cluster --image=kindest/node:v1.18.0@sha256:0e20578828edd939d25eb98496a685c76c98d54084932f76069f886ec315d694 --name=v1.18

Anything else we need to know?:

Environment:

  • kind version: (use kind version): kind v0.8.0-alpha+c68a1cf537d680 go1.14.2 linux/amd64
  • Kubernetes version: (use kubectl version):
(MoeLove) ➜  ~ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.0-rc.1", GitCommit:"dbbed7806681109f541264ab37284f9a51c87fcc", GitTreeState:"clean", BuildDate:"2020-03-17T17:16:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
  • Docker version: (use docker info):
  • OS (e.g. from /etc/os-release):
tao12345666333 added the kind/bug label Apr 13, 2020
@tao12345666333
Member Author

We need to update CRI to use the new pause:3.2.

@BenTheElder
Member

We really need kind & containerd to be in sync here. Older Kubernetes versions will not use 3.2.

We can probably configure CRI & kubeadm ourselves and auto preload it ourselves

@tao12345666333
Member Author

tao12345666333 commented Apr 14, 2020

We can probably configure CRI & kubeadm ourselves and auto preload it ourselves

Do you mean that the user uses the pod-infra-container-image configuration item in the configuration file?

Edit: --pod-infra-container-image is only supported for Docker.

Or do we directly preload both pause:3.1 and pause:3.2? The pause image is small.
Consider that users may build the node image themselves, but not necessarily the base image.

k8s.gcr.io/pause                                                              3.2                            80d28bedfe5d        8 weeks ago         683kB
k8s.gcr.io/pause                                                              3.1                            da86e6ba6ca1        2 years ago         742kB

@BenTheElder
Member

BenTheElder commented Apr 14, 2020 via email

BenTheElder added this to the v0.8.0 milestone Apr 14, 2020
BenTheElder added the priority/important-soon label Apr 16, 2020
@BenTheElder
Member

this one is going to be frustrating. there is no non-brittle way to detect this.

@tao12345666333
Member Author

Maybe we can change containerd's config file?

containerd's CRI plugin has a sandbox_image configuration item.
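
For reference, this setting lives in the CRI plugin section of containerd's TOML config. The sketch below reads it back with sed. `current_sandbox_image` is a hypothetical helper, and both the `/etc/containerd/config.toml` path and the `[plugins."io.containerd.grpc.v1.cri"]` section name are assumptions about the containerd version in the node image (older config layouts use `[plugins.cri]`).

```shell
# The relevant config fragment is assumed to look like this:
#
#   [plugins."io.containerd.grpc.v1.cri"]
#     sandbox_image = "k8s.gcr.io/pause:3.1"
#
# current_sandbox_image CONFIG: print the first sandbox_image value in CONFIG.
# Hypothetical helper; prints nothing if the key is not set explicitly.
current_sandbox_image() {
  sed -n -E 's/^[[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*"([^"]*)".*$/\1/p' "$1" | head -n 1
}

# Example use inside the node container (path is an assumption, not run here):
#   current_sandbox_image /etc/containerd/config.toml
```

If the key is absent, containerd falls back to its built-in default, so an empty result does not mean no sandbox image is used.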

@BenTheElder
Member

Yes, that's known. The problem is knowing which image to use in a non-brittle fashion.

Also kubeadm does not have an option to specify this.

Frankly, kubeadm should not be pulling images in kind, and the preflights are useless and time-wasting in this context.

If we finally drop 1.11 and 1.12 k8s we can just skip the preflight and ship whatever image we tell containerd to use.

We'd wind up back at square 1 for detecting etcd vs pause though.

I think we're going to have to do the awful substring match that totally won't ever bite us 🙄

@BenTheElder
Member

/lifecycle active

k8s-ci-robot added the lifecycle/active label Apr 22, 2020
@BenTheElder
Member

xref: kubernetes/kubeadm#2020

@BenTheElder
Member

AFAICT kubeadm config images list at least does not respect kubeletExtraArgs containing pod-infra-container-image.

Per kubernetes/kubeadm#2020 there's no config field for this and they're currently not inclined to add one.

We can do the kubeadm config images list | grep pause => inject to containerd config at node build time, but that's pretty brittle and we really should actually just be using the preferred pod sandbox image for the CRI.

There's no reason for this to be tied to the kubernetes version.

We could work around for Kubernetes 1.13+ by skipping preflight entirely (and thus no pulling / we don't care if kubeadm is aware of this, we don't really have much use for preflight anyhow) but we'd still need to ignore it from the kubeadm config images list output at node build time and have some workaround for 1.11 / 1.12 (unless we drop those entirely).
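
The `kubeadm config images list | grep pause` => containerd config idea from this comment could look roughly like the sketch below. It is not the fix that was eventually merged. `pick_pause_image` and `set_sandbox_image` are hypothetical helper names, the config path in the usage comment is an assumption, and the match on `/pause:` is exactly the brittle substring logic the comment warns about.

```shell
# pick_pause_image: read image references (one per line) on stdin and print
# the first one whose final path component is "pause:<tag>".
pick_pause_image() {
  grep -E '/pause:[^/]+$' | head -n 1
}

# set_sandbox_image CONFIG IMAGE: rewrite an existing sandbox_image line in
# CONFIG so that it points at IMAGE. Writes via a temp file rather than
# relying on in-place editing flags, which differ between sed variants.
set_sandbox_image() {
  local config="$1" image="$2"
  sed -E "s|^([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*).*$|\1\"${image}\"|" \
    "$config" > "${config}.tmp" && mv "${config}.tmp" "$config"
}

# Example use at node build time (path is an assumption, not run here):
#   image="$(kubeadm config images list | pick_pause_image)"
#   set_sandbox_image /etc/containerd/config.toml "$image"
```

Note that this only rewrites a `sandbox_image` line that already exists; a config without one would be left unchanged, which is one more way the approach is fragile.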

@tao12345666333
Member Author

We can do the kubeadm config images list | grep pause => inject to containerd config at node build time, but that's pretty brittle and we really should actually just be using the preferred pod sandbox image for the CRI.

+1

I'm thinking that, for this problem, it is easier to load two images directly than to maintain brittle logic (although this is a temporary solution).

Then we can consider publishing prebuilt base images for different Kubernetes versions.
For example, kindest/base:v1.12, kindest/base:v1.13, etc. (just like cri-tools: https://github.com/kubernetes-sigs/cri-tools#current-status)

WDYT?

@BenTheElder
Member

I'm thinking that, for this problem, it is easier to load two images directly than to maintain brittle logic (although this is a temporary solution).

That's a ~1MB cop out already though, and we'll eventually need 3 or more.

Then we can consider publishing prebuilt base images for different Kubernetes versions.
For example, kindest/base:v1.12, kindest/base:v1.13, etc. (just like cri-tools: https://github.com/kubernetes-sigs/cri-tools#current-status)

CRI tools is backwards compatible, it adds functionality. Those versions are just new over time.

Adding multiple versions makes custom base images less manageable (e.g. for another architecture), and we would still be managing the pause image incorrectly: it should be related only to the pod implementation.

@BenTheElder
Member

(also the pause image is loaded at node image build time, so multiple bases are unnecessary)

@BenTheElder
Member

I started writing something after thinking about this morning's discussion, but the node build code is such a mess and I don't want to just insert yet another hack here.

I'll get a PR up soon.

@BenTheElder
Member

should be fixed by #1505

we may modify the approach in the future, but at least this is good enough for now I hope.

@tao12345666333
Member Author

Thanks!
