Cannot talk to cluster inside dind container #52

Closed
mitar opened this issue Oct 3, 2018 · 36 comments
Assignees
Labels
kind/documentation Categorizes issue or PR as related to documentation. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Milestone

Comments

@mitar
Contributor

mitar commented Oct 3, 2018

I am trying to use it on GitLab CI which uses DIND. I am trying to set up a cluster inside a Docker container. I have tried the following:

$ docker run --privileged --rm -d --name dind docker:dind

$ docker run --rm -t -i --link dind:docker -e DOCKER_HOST=tcp://docker:2375 ubuntu:artful

Inside container:

$ apt-get update
$ apt-get install --yes golang-go git curl unzip wget apt-transport-https curl ca-certificates

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF

$ curl -s https://download.docker.com/linux/ubuntu/gpg | apt-key add -
$ echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" > /etc/apt/sources.list.d/docker.list

$ apt-get update
$ apt-get install --yes kubectl docker-ce

$ export GOPATH=/usr/local/go
$ export PATH="${GOPATH}/bin:${PATH}"

$ go get sigs.k8s.io/kind

$ kind create

$ export KUBECONFIG="/root/.kube/kind-config-1"
$ kubectl cluster-info 
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The connection to the server localhost:32771 was refused - did you specify the right host or port?

$ docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                     NAMES
898cd8aeedce        kindest/node:v1.11.3   "/usr/local/bin/entr…"   2 minutes ago       Up 2 minutes        0.0.0.0:32771->6443/tcp   kind-1-control-plane

$ docker exec kind-1-control-plane ps
  PID TTY          TIME CMD
    1 ?        00:00:00 systemd
   55 ?        00:00:00 systemd-journal
   71 ?        00:00:12 dockerd
   88 ?        00:00:00 docker-containe
  812 ?        00:00:08 kubelet
 1755 ?        00:00:00 ps

$ kind delete
@BenTheElder
Member

Hmm, we're running kind in our DIND setup, I've not tried it in this particular fashion yet though.
/assign

@BenTheElder
Member

BenTheElder commented Oct 3, 2018

er if I'm reading this correctly, you installed a new version of docker after starting a dind container? that seems like a bad idea. investigating locally with a dind container

edit: nevermind, reread that 🙃

@BenTheElder
Member

BenTheElder commented Oct 3, 2018

So the problem here appears to be the network connection from your linked container to kind. The cluster is actually running, but you can't talk to it, since it's running over in the dind container.

In our CI we do it like:

(host vm / kubernetes node, running docker) 
|-> [a kubernetes pod, in which we run docker in docker, run `kind` in this container] 
      |-> [kind "node" container is a sub container, which itself runs docker inside]
             |-> [kubernetes / docker containers for things running on kind]

But with your setup it appears to be more like:

(host vm, running docker presumably?)
|-> [dind container, running docker]
|     |-> [kind "node" container, running docker itself]
|           |-> [kubernetes pod container(s)]
|
|-> [ubuntu container, linked to the dind container,    ]
|-> [running `kind`, talking to docker in dind container]

Would it be possible for you to avoid the dind container if you can already run docker containers? Can you give more details on your setup? Also note that --link appears to be a legacy feature docker may remove.

@mitar
Contributor Author

mitar commented Oct 3, 2018

So I am using --link just for debugging purposes to simulate what I believe GitLab CI is doing. Otherwise I am targeting Docker executor with docker-in-docker. They have a concept of services and then you connect to those services.

@BenTheElder
Member

Thanks, taking a look.

This job runs kind in a docker-in-docker pod on kubernetes to run the conformance tests, so kind in docker in docker is definitely a supported use case, however we are also mounting some things to the pod (read only /lib/modules and /sys/fs/cgroup) ...

When you run in the docker executor, are you running everything with the dind container, or are you running the dind container alongside another container there as well?

@BenTheElder
Member

BenTheElder commented Oct 3, 2018

I think the docker executor is actually a thin layer over kubernetes, cc @munnerz who I believe was involved here...

edit: nope, but it has very similar config, we can mount the volumes if necessary but they shouldn't strictly be necessary

It looks like we can mimic our pod setup if needed, per:
https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section

One of our actual pods looks like this: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-kind-conformance/508/artifacts/prow_podspec.yaml

@mitar
Contributor Author

mitar commented Oct 3, 2018

When you run in the docker executor, are you running everything with the dind container, or are you running the dind container alongside another container there as well?

So currently I run dind as a service container, and then I have my main container in which I would like to run my tests on the kubernetes cluster. I have done this setup in the past for using regular Docker images/containers and it works well.

See a bit about this here as well.

@BenTheElder
Member

Have you done it while talking to a networked service running over in the dind container before? We should just need to fix up:

  • a line in the kubeconfig that currently points to localhost:some-port as where the kubernetes API server is running
  • possibly tell kubeadm that it needs to sign the server certs for some other IP / host depending on the setup details

I can't tell from these details what that address is, though. Given what you've shown and my local replication of this setup, as best I can tell everything else should be working fine.
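
For the first item in the list above, a minimal sketch of the rewrite might look like the following. The function name `rewrite_kubeconfig_server` is hypothetical (it is not part of kind), and the sketch assumes the kubeconfig contains a single `server: https://...` line. It does nothing about the certificate question raised in the second item.

```shell
#!/bin/sh
# Hypothetical helper (not part of kind): point the kubeconfig's
# "server:" entry at a different host:port.
# Assumes one "server: https://..." line and an endpoint without "|" or "&".
rewrite_kubeconfig_server() {
    kubeconfig="$1"   # path to the kubeconfig file
    endpoint="$2"     # new host:port, e.g. 172.17.0.2:6443
    tmp="$(mktemp)" || return 1
    # Keep the "server: https://" prefix and replace the rest of the line.
    sed "s|\(server: https://\).*|\1${endpoint}|" "$kubeconfig" > "$tmp" \
        || { rm -f "$tmp"; return 1; }
    # Write back into the original file so its permissions are preserved.
    cat "$tmp" > "$kubeconfig"
    rm -f "$tmp"
}
```

It could be called as `rewrite_kubeconfig_server /root/.kube/kind-config-1 172.17.0.2:6443`, where the address is only an example value.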

@mitar
Contributor Author

mitar commented Oct 3, 2018

Have you done it while talking to a networked service running over in the dind container before?

Oh, I remember. I think I had issues with that in the past. The issue was that from outside, I can see only the dind container, and not any network behind it. So I had to publish ports in Docker containers running through dind, so that they became available on the dind container. So the dind container is like a host, and you do not have direct access to the containers behind it.

That might also be an additional problem for me, because I then want to run pods on the cluster, which again might not be reachable from my testing container, because they would again be behind the dind container/host.

@BenTheElder
Member

Yes exactly, it may be possible to forward ports from the dind container but it might be tricky to manage, and we'd need to possibly add some small feature to kind to inform it of the expected address instead of localhost.

Alternatively, if you can run your other code + kind within a container or host running docker (eg dind), it will be a bit simpler. We know this works.

If we can get to the "kubernetes API server is forwarded through dind, and we've told kind to sign the certs for this, and point the kubeconfig at it", then we could use the kubernetes api server proxy functionality to talk to pods on the cluster...
but it will again likely be simpler and more robust if we can avoid the kind cluster being in a different "host" docker / network space.
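
To illustrate the API server proxy idea, here is a small sketch of the URL path the Kubernetes API server exposes for proxying to a pod port. The helper name `pod_proxy_path` and the pod name in the usage line are made up for illustration, and this only helps once the kubeconfig can actually reach the API server.

```shell
#!/bin/sh
# Hypothetical helper: build the API server proxy path for a pod port.
pod_proxy_path() {
    namespace="$1"   # pod namespace
    pod="$2"         # pod name
    port="$3"        # port the pod listens on
    printf '/api/v1/namespaces/%s/pods/%s:%s/proxy/\n' "$namespace" "$pod" "$port"
}
```

With a working kubeconfig, something like `kubectl get --raw "$(pod_proxy_path default my-pod 8080)"` would then send the request through the API server rather than directly to the pod network.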

@mitar
Contributor Author

mitar commented Oct 3, 2018

OK. So I do not know about gitlab.com CI, but on our private GitLab instance I discovered that it seems I am given docker.sock mounted into my container from host, I guess. (I have some thoughts about such setup and security of it, but I will not complain at the moment.) So I can simply have one Docker container inside which I do everything. I do the following (in an image with Go and Docker client already installed):

go get sigs.k8s.io/kind
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update -q -q
apt-get install --yes kubectl
kind create
export KUBECONFIG="/root/.kube/kind-config-1"
kubectl cluster-info || true
docker ps
docker exec kind-1-control-plane ps

So I just install kind and kubectl and then test it out. Sadly, it still does not work, but I think this is closer. The output of the final commands is as follows:

$ export KUBECONFIG="/root/.kube/kind-config-1"
$ kubectl cluster-info || true

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The connection to the server localhost:32768 was refused - did you specify the right host or port?
$ docker ps
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS              PORTS                     NAMES
4aaf4aaa8acf        kindest/node:v1.11.3   "/usr/local/bin/entr…"   2 minutes ago       Up About a minute   0.0.0.0:32768->6443/tcp   kind-1-control-plane
83494fdd744b        a37a729110ce           "sh -c 'if [ -x /usr…"   3 minutes ago       Up 3 minutes                                  runner-b938861c-project-880-concurrent-0-build
$ docker exec kind-1-control-plane ps
  PID TTY          TIME CMD
    1 ?        00:00:01 systemd
   52 ?        00:00:00 systemd-journal
   66 ?        00:00:38 dockerd
   97 ?        00:00:00 docker-containe
  814 ?        00:00:06 kubelet
  922 ?        00:00:00 docker-containe
  923 ?        00:00:00 docker-containe
  925 ?        00:00:00 docker-containe
  926 ?        00:00:00 docker-containe
  992 ?        00:00:00 pause
  995 ?        00:00:00 pause
 1003 ?        00:00:00 pause
 1005 ?        00:00:00 pause
 1069 ?        00:00:00 docker-containe
 1088 ?        00:00:01 kube-scheduler
 1097 ?        00:00:00 docker-containe
 1118 ?        00:00:32 kube-apiserver
 1119 ?        00:00:00 docker-containe
 1135 ?        00:00:00 docker-containe
 1168 ?        00:00:06 etcd
 1180 ?        00:00:03 kube-controller
 1412 ?        00:00:00 docker-containe
 1431 ?        00:00:00 pause
 1453 ?        00:00:00 docker-containe
 1470 ?        00:00:00 kube-proxy
 1555 ?        00:00:00 docker-containe
 1591 ?        00:00:00 pause
 1695 ?        00:00:00 exe
 1707 ?        00:00:00 ps

So you see that docker ps now shows my CI container alongside kind-1-control-plane container. I think this is better because it means I should be able to connect directly to stuff in kind-1-control-plane container. But I am not yet able. Any suggestions here?

@BenTheElder
Member

I think you just need your second container to use --network=host when creating it, I think I've got this setup replicated locally and that works for me.

@BenTheElder
Member

from within a dind container I first tried what I think was your setup and confirmed the issue (still not the same network, but I can see the "cluster" running), then I created another container from within the dind container with:
docker run -it --network=host -v /var/run/docker.sock:/var/run/docker.sock ubuntu /bin/bash

and proceeded to install docker + kubectl, copy the kind config over from the other container, etc., and I can talk to the cluster, listing pods etc.

@BenTheElder
Member

Also:

(I have some thoughts about such setup and security of it, but I will not complain at the moment.)

Absolutely! Any dind solution should be a major security concern, including this one. Please be careful.

kind and friends are cheap, fast, and work in situations where nested VMs are not available, but any current docker in docker solution is effectively ~root on the host regardless of whether the docker socket is passed. It needs either the socket (in which case it can create arbitrary containers on the host...) or to be run with --privileged to run docker itself (which is also ~root). We make sure to run this on CI VMs that don't have any particularly sensitive credentials. More sensitive workloads are on entirely separate CI cluster(s).

I of course also run kind locally, but I don't schedule any untrusted workloads on it, unless you count Kubernetes itself to some degree.

@BenTheElder BenTheElder changed the title from "Cannot run it inside DIND" to "Cannot talk to cluster inside dind container" Oct 3, 2018
@BenTheElder BenTheElder added the kind/documentation Categorizes issue or PR as related to documentation. label Oct 3, 2018
@mitar
Contributor Author

mitar commented Oct 3, 2018

Hm, running with --network=host is sadly not possible for me. I think the fix is simply to update the address where Kubernetes is running. I tried:

$ docker run --rm -t -i --privileged -v /var/run/docker.sock:/var/run/docker.sock ubuntu:artful

$ apt-get update
$ apt-get install --yes golang-go git curl unzip wget apt-transport-https curl ca-certificates

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list

$ curl -s https://download.docker.com/linux/ubuntu/gpg | apt-key add -
$ echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" > /etc/apt/sources.list.d/docker.list

$ apt-get update
$ apt-get install --yes kubectl docker-ce

$ export GOPATH=/usr/local/go
$ export PATH="${GOPATH}/bin:${PATH}"

$ go get sigs.k8s.io/kind

$ kind create

$ sed -i "s/localhost:32781/$(docker inspect --format '{{.NetworkSettings.IPAddress}}' kind-1-control-plane):6443/" /root/.kube/kind-config-1

$ export KUBECONFIG="/root/.kube/kind-config-1"
$ kubectl cluster-info

But I am guessing certificates do not match? I still get connection refused. How can I be sure that the other container really runs properly? I can ping it now from my container.

I did nmap port scan and only port 10250/tcp is open on the container.

@BenTheElder
Member

Are you sure --network=host is not doable? it should work if we're talking about another container in the dind. It should be the network of the dind container, not the actual host network.
EG it looks like network_mode = "host" will tell an executor to do this.
https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-docker-section

The certificates may not match if you use another address. When kind inits kubernetes it requests localhost as an additional address for the API server certs so it can authenticate, besides the auto-detected IP(s). We can add a field to add another address to sign, but we'd have to specify it ahead of time.

@mitar
Contributor Author

mitar commented Oct 3, 2018

So currently I am not using dind anymore, but the Docker socket from the host. So the container I am in is already running. I could try to create another container inside and then go inside it and so on, but to me it looks like the issue is somewhere else, because the container runs and I can ping it (it is just not on localhost), but no ports besides 10250 are open in the container.

@BenTheElder
Member

BenTheElder commented Oct 3, 2018

There should be one randomly allocated port (allocated by docker) open on the container forwarding to the secure API server port (6443), and the exported kubeconfig will match localhost:${THE_PORT}. If you run a process on the same level as the docker daemon it should be able to talk to it. 10250 is probably the random port from that session.

If you run something in a nested container, that container ideally needs to use --network=host to avoid going into another network namespace.
Or we can point at the container IP instead (which the cert should actually already be signed for if it's just the docker container IP), the config with that IP is currently obtainable with docker cp kind-1-control-plane:/etc/kubernetes/admin.conf admin.conf.

EDIT: adjacent -> nested. for adjacent we just want to use the actual node container IP, which is actually in the default config, when we export the config to the host we rewrite this to match the forwarded port. I'm thinking about ways we could better expose that...
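
A rough sketch of the "use the node container IP" route, built only from the two docker commands quoted in this thread. The helper names are hypothetical, and the claim that the in-container config already points at the node IP comes from the comment above rather than from anything verified here.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the docker commands are the
# ones quoted in this thread.

# Print the node container's IP address on the default Docker network.
node_ip() {
    docker inspect --format '{{.NetworkSettings.IPAddress}}' "$1"
}

# Copy the in-container admin kubeconfig to a local file.
fetch_node_kubeconfig() {
    node="$1"   # node container name, e.g. kind-1-control-plane
    out="$2"    # local destination, e.g. admin.conf
    docker cp "${node}:/etc/kubernetes/admin.conf" "$out"
}
```

For an adjacent container this might be used as `fetch_node_kubeconfig kind-1-control-plane admin.conf` followed by `export KUBECONFIG="$PWD/admin.conf"`.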

@mitar
Contributor Author

mitar commented Oct 3, 2018

So something else is wrong. I am connecting directly to the container, bypassing Docker port mapping. Connecting to 6443 does not work. 10250 is the port on which kubelet is listening. But why is nothing else running correctly in the container?

@mitar
Contributor Author

mitar commented Oct 3, 2018

I managed to get it working with the following:

$ docker run --rm -t -i --privileged ubuntu:artful

$ apt-get update
$ apt-get install --yes golang-go git curl unzip wget apt-transport-https curl ca-certificates

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
$ echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" > /etc/apt/sources.list.d/kubernetes.list

$ curl -s https://download.docker.com/linux/ubuntu/gpg | apt-key add -
$ echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable" > /etc/apt/sources.list.d/docker.list

$ apt-get update
$ apt-get install --yes kubectl docker-ce

$ echo '{"storage-driver": "vfs"}' > /etc/docker/daemon.json
$ service docker start

$ export GOPATH=/usr/local/go
$ export PATH="${GOPATH}/bin:${PATH}"

$ go get sigs.k8s.io/kind

$ kind create

$ export KUBECONFIG="/root/.kube/kind-config-1"
$ kubectl cluster-info

So instead of using the host's Docker socket, I simply run a proper dind inside my container.

@BenTheElder
Member

Awesome!
You can likely avoid some performance loss (I don't have numbers currently) by making /var/lib/docker a volume of some kind (eg tmpfs) instead of switching to the vfs driver.

If you use kind with defaults this should continue to work as is for the foreseeable future. The config is not yet stable (PR #36) and logging etc needs work. I'll be stabilizing it and looking into multinode this quarter though, we intend to use it for more CI ourselves. :-)

Please let me know if you have any more feedback or issues. I know user and development guides are very high on my list currently besides UX and stability fixes.

@mitar
Contributor Author

mitar commented Oct 3, 2018

So while I was able to make this work, it would be great if this also worked on gitlab.com. It would be useful to try it there as well.

@mitar
Contributor Author

mitar commented Oct 3, 2018

And thanks for all this work and thank you for all the help.

@mitar
Contributor Author

mitar commented Oct 3, 2018

OK, as a note to my future self and others. I had issues running Docker inside my own privileged container so that I could run kind inside, and the reason was that I wanted to use overlay2 on top of overlay2 already. The solution was to define VOLUME /var/lib/docker so that Docker files went directly to host's volume and not overlay inside the container.

@BenTheElder
Member

Yes exactly, I tried to mention this above but failed I think. kind does something like this for its own docker in docker on the "node"s for the same reason 🙃

Doing that has worked flawlessly for dind in our CI at least. overlay fs don't stack, but it works fine if you just make sure the docker graph (/var/lib/docker) is a volume for any dind containers.
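
As a hedged sketch of what that can look like at `docker run` time (rather than with `VOLUME` in the image), the function below starts a privileged container with an anonymous volume mounted at /var/lib/docker. The function name and both arguments are placeholders, not anything kind provides.

```shell
#!/bin/sh
# Sketch: start a privileged container whose /var/lib/docker is a volume,
# so the inner Docker does not put overlay2 on top of the outer overlay2.
# The function name and its arguments are placeholders.
run_dind_with_graph_volume() {
    name="$1"    # container name
    image="$2"   # an image that runs its own dockerd
    docker run --privileged -d --name "$name" -v /var/lib/docker "$image"
}
```

Swapping `-v /var/lib/docker` for `--tmpfs /var/lib/docker` would give the tmpfs variant mentioned earlier in the thread, at the cost of keeping the inner images and containers in memory.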

I'll be sure to add this to the docs soon!

@mitar mitar mentioned this issue Oct 3, 2018
@BenTheElder BenTheElder added this to the 2018 Goals milestone Dec 18, 2018
@BenTheElder BenTheElder modified the milestones: 2018 Goals, 2019 goals Mar 10, 2019
@BenTheElder BenTheElder added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Mar 10, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 8, 2019
@TheErk

TheErk commented Jun 18, 2019

After fixing the cluster name issue from #619 I think I hit the same issue.
In my CI I get:

kind cluster name is kind947216
Creating cluster "kind947216" ...
 • Ensuring node image (kindest/node:v1.14.2) 🖼  ...
 ....
 ✓ Joining worker nodes 🚜
Cluster creation complete. You can now use the cluster with:
export KUBECONFIG="$(kind get kubeconfig-path --name="kind947216")"
kubectl cluster-info
+ kubectl cluster-info

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
The connection to the server localhost:40781 was refused - did you specify the right host or port?

I am not sure I understand in which container /var/lib/docker must be specified as a volume.
Is it the one I use as the image for my gitlab job?

Since this issue is quite old now, are there any docs about that?

@BenTheElder
Member

@TheErk to run docker in docker that path must be a volume in the container you run docker in.

There are no docs for this because we don't have any gitlab CI and nobody has contributed any 😅

As mentioned previously, any contributions to https://github.com/kind-ci/examples would also be extremely welcome, we aim to eventually have starter configs etc. for use everywhere.

@BenTheElder
Member

xref: #620 (comment)

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 19, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@remram44

remram44 commented Nov 17, 2019

I've had success running kind with the docker:dind service by using the Networking section of the config, e.g. setting networking.apiServerAddress to the IP address of the docker service.

Using something like apiServerAddress: 0.0.0.0 won't work because this single setting is used for three things:

  • The listening address for the node container's exported port
  • The certificate subject name
  • The address kubectl on the host will be configured to connect to

Example:

cat >>kind.yaml <<END
networking:
  apiServerAddress: $(host docker)
END

See my full setup here: https://gitlab.com/ViDA-NYU/reproserver/commit/4e9e8adfca37ca091e5c02ad3a3b070736e3b0ec
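
A small variation on the example above, as a sketch: the helper below takes the address as a plain argument, so whatever lookup is used can be trimmed down to a bare IP first. The name `append_api_server_address` is hypothetical, and like the original snippet it only appends the networking section to an existing config file.

```shell
#!/bin/sh
# Hypothetical helper: append a networking section with a fixed API server
# address to an existing kind config file.
append_api_server_address() {
    config="$1"    # kind config file, e.g. kind.yaml
    address="$2"   # bare IP address of the docker service
    cat >>"$config" <<END
networking:
  apiServerAddress: ${address}
END
}
```

One possible call, assuming `getent` is available in the job image and the service is reachable under the name `docker`, is `append_api_server_address kind.yaml "$(getent hosts docker | awk '{ print $1 }')"`.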

@imrajdas

imrajdas commented Dec 31, 2019

SOLUTION:
@mitar You can run KIND inside the docker container by having the container use the host network, with the following command:

  • docker run -ti --privileged --network="host" gitlab/dind:latest

Note:

  • use privileged mode (To mount the kubeconfig path and for the docker runtime)
  • use gitlab/dind official image

@BenTheElder
Member

FYI for future folks finding this issue we now have a contrib repo https://kind.sigs.k8s.io/docs/user/resources/#using-kind-in-ci that documents CI setups such as this.

@BenTheElder BenTheElder removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 23, 2021
@BenTheElder
Member

I think this is covered there now. See also perhaps #303.

yankay pushed a commit to yankay/kind that referenced this issue Mar 17, 2022
Removing the old kind binary install since .11 is released and kube-p…
stg-0 pushed a commit to stg-0/kind that referenced this issue Mar 14, 2023
…ble_iam_avoid_creation

[EOS-11007] Do not run the "IAM security" step if --avoid-creation is specified