Nodes fail to come up when using custom CA and Kubeconfig #2778

Closed

karansinghneu opened this issue Nov 5, 2022 · 6 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


karansinghneu commented Nov 5, 2022

/kind bug

Before submitting an issue, have you checked the Troubleshooting Guide?
Yes

What steps did you take and what happened:

  1. Created a bootstrap kind cluster
  2. Provisioned a Target Management Cluster that has a VM identity using ServicePrincipal
  3. Bootstrapped and pivoted to the Target Management Cluster
  4. Deleted the kind cluster
  5. Created CA certs and used them to create a kubeconfig
  6. Mounted the CA certs and kubeconfig as secrets onto the management cluster
  7. Provisioned a Workload cluster from the management cluster that has a VM identity using UserAssignedManagedIdentity and uses the custom CA certs and kubeconfig, plus a custom frontendIP name, public IP name, and FQDN (a sketch of steps 5-6 follows this list)
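For reference, a minimal sketch of steps 5-6, assuming the standard Cluster API convention of pre-creating the cluster CA as a secret named `<cluster-name>-ca` (keys `tls.crt`/`tls.key`) and the kubeconfig as `<cluster-name>-kubeconfig` (key `value`); file names here are illustrative:

# Generate a self-signed CA for the workload cluster
$ openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 -nodes \
    -keyout tls.key -out tls.crt -subj "/CN=kubernetes"

# Pre-create the CA secret on the management cluster; if it exists before the
# Cluster object is created, CAPI reuses it instead of generating its own CA
$ kubectl create secret tls capz-acr-cluster-workload-2-ca --cert=tls.crt --key=tls.key

# Store a kubeconfig built against that CA under the key CAPI expects
$ kubectl create secret generic capz-acr-cluster-workload-2-kubeconfig \
    --from-file=value=./custom-kubeconfig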

What did you expect to happen:
Workload cluster to provision successfully

Anything else you would like to add:

  1. The control plane node is Provisioned and has a providerID but no Nodename
  2. The worker nodes have neither a providerID nor a Nodename (commands to check both follow this list)
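A quick way to see this pairing from the management cluster (a hedged sketch; a Machine exposes the providerID in .spec and the node name via .status.nodeRef):

$ kubectl get machines -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID,NODENAME:.status.nodeRef.name

A Machine with a providerID but an empty NODENAME was provisioned in Azure, but its kubelet never registered with the workload cluster's API server.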

Environment:

  • cluster-api-provider-azure version: v1.5.3
  • Kubernetes version: (use kubectl version): Client Version: v1.25.3, Kustomize Version: v4.5.7, Server Version: v1.25.0
  • OS (e.g. from /etc/os-release): Linux (ubuntu 20.04)
k8s-ci-robot added the kind/bug label Nov 5, 2022

CecileRobertMichon commented Nov 7, 2022

@karansinghneu can you please upload any logs that you've collected (controller logs and cloud-init logs would be helpful; see https://capz.sigs.k8s.io/topics/troubleshooting.html) and the cluster YAML spec you used for the AzureCluster?

(make sure to redact any secrets)
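For example, assuming the default capz-system install (resource names illustrative):

$ kubectl logs deploy/capz-controller-manager -n capz-system manager > capz-controller.log
$ kubectl get azurecluster <cluster-name> -o yaml > azurecluster.yaml
# and, from an affected VM:
$ sudo cat /var/log/cloud-init-output.log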


karansinghneu commented Nov 7, 2022

@CecileRobertMichon
Controller logs:

I1107 22:57:13.937207 1 azuremachine_controller.go:243] controllers.AzureMachineReconciler.reconcileNormal "msg"="Reconciling AzureMachine" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-6xc4w","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-6xc4w" "namespace"="default" "reconcileID"="60fcfd13-cb41-452a-883e-288cfb3bcbc0" "x-ms-correlation-request-id"="dd9ef87a-942f-4d58-971a-3e39b98684d3"
I1107 22:57:13.938142 1 machine.go:655] scope.MachineScope.GetVMImage "msg"="No image specified for machine, using default Linux Image" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-6xc4w","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-6xc4w" "namespace"="default" "reconcileID"="60fcfd13-cb41-452a-883e-288cfb3bcbc0" "x-ms-correlation-request-id"="dd9ef87a-942f-4d58-971a-3e39b98684d3" "machine"="capz-acr-cluster-workload-1-control-plane-6xc4w"
I1107 22:57:13.938245 1 images.go:124] virtualmachineimages.Service.getSKUAndVersion "msg"="Getting VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-6xc4w","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-6xc4w" "namespace"="default" "reconcileID"="60fcfd13-cb41-452a-883e-288cfb3bcbc0" "x-ms-correlation-request-id"="dd9ef87a-942f-4d58-971a-3e39b98684d3" "k8sVersion"="v1.25.0" "location"="australiaeast" "offer"="capi" "osAndVersion"="ubuntu-2004" "publisher"="cncf-upstream"
I1107 22:57:13.938331 1 cache.go:122] virtualmachineimages.Cache.Get "msg"="VM images cache hit" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-6xc4w","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-6xc4w" "namespace"="default" "reconcileID"="60fcfd13-cb41-452a-883e-288cfb3bcbc0" "x-ms-correlation-request-id"="dd9ef87a-942f-4d58-971a-3e39b98684d3" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1"
I1107 22:57:13.938403 1 images.go:176] virtualmachineimages.Service.getSKUAndVersion "msg"="Found VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-6xc4w","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-6xc4w" "namespace"="default" "reconcileID"="60fcfd13-cb41-452a-883e-288cfb3bcbc0" "x-ms-correlation-request-id"="dd9ef87a-942f-4d58-971a-3e39b98684d3" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1" "version"="125.0.20220824"
I1107 22:57:13.939590 1 machine.go:655] scope.MachineScope.GetVMImage "msg"="No image specified for machine, using default Linux Image" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-r5b7j","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-r5b7j" "namespace"="default" "reconcileID"="dfe5729b-701f-47d6-bfb5-2a97d61e150d" "x-ms-correlation-request-id"="37e77e38-10a2-42b5-a9f6-5530c55cff8e" "machine"="capz-acr-cluster-workload-1-control-plane-r5b7j"
I1107 22:57:13.941461 1 azuremachine_controller.go:243] controllers.AzureMachineReconciler.reconcileNormal "msg"="Reconciling AzureMachine" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-d72hp","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-d72hp" "namespace"="default" "reconcileID"="fbe64a85-3b54-4479-a1ca-bda436260e4b" "x-ms-correlation-request-id"="791b2ed9-5e25-4a0f-8635-1a24ac5a0f8c"
I1107 22:57:13.944061 1 azuremachine_controller.go:243] controllers.AzureMachineReconciler.reconcileNormal "msg"="Reconciling AzureMachine" "azureMachine"={"name":"capz-acr-cluster-workload-1-md-0-x59mx","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-md-0-x59mx" "namespace"="default" "reconcileID"="632b9667-aff5-4f6c-9210-c93d633fd0dc" "x-ms-correlation-request-id"="f046dd0c-e4d2-48e8-b5de-b0036afd0e60"
I1107 22:57:13.949187 1 images.go:124] virtualmachineimages.Service.getSKUAndVersion "msg"="Getting VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-r5b7j","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-r5b7j" "namespace"="default" "reconcileID"="dfe5729b-701f-47d6-bfb5-2a97d61e150d" "x-ms-correlation-request-id"="37e77e38-10a2-42b5-a9f6-5530c55cff8e" "k8sVersion"="v1.25.0" "location"="australiaeast" "offer"="capi" "osAndVersion"="ubuntu-2004" "publisher"="cncf-upstream"
I1107 22:57:13.945320 1 azuremachine_controller.go:243] controllers.AzureMachineReconciler.reconcileNormal "msg"="Reconciling AzureMachine" "azureMachine"={"name":"capz-acr-cluster-workload-2-control-plane-wwr6v","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-2-control-plane-wwr6v" "namespace"="default" "reconcileID"="9c5fa450-e2ae-471e-befa-1b566a8f60d0" "x-ms-correlation-request-id"="5c6f5b6e-9e9c-4380-aec5-d2e7ca93f816"
I1107 22:57:13.949270 1 cache.go:122] virtualmachineimages.Cache.Get "msg"="VM images cache hit" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-r5b7j","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-r5b7j" "namespace"="default" "reconcileID"="dfe5729b-701f-47d6-bfb5-2a97d61e150d" "x-ms-correlation-request-id"="37e77e38-10a2-42b5-a9f6-5530c55cff8e" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1"
I1107 22:57:13.949344 1 images.go:176] virtualmachineimages.Service.getSKUAndVersion "msg"="Found VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-r5b7j","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-r5b7j" "namespace"="default" "reconcileID"="dfe5729b-701f-47d6-bfb5-2a97d61e150d" "x-ms-correlation-request-id"="37e77e38-10a2-42b5-a9f6-5530c55cff8e" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1" "version"="125.0.20220824"
I1107 22:57:13.957569 1 machine.go:655] scope.MachineScope.GetVMImage "msg"="No image specified for machine, using default Linux Image" "azureMachine"={"name":"capz-acr-cluster-workload-2-control-plane-wwr6v","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-2-control-plane-wwr6v" "namespace"="default" "reconcileID"="9c5fa450-e2ae-471e-befa-1b566a8f60d0" "x-ms-correlation-request-id"="5c6f5b6e-9e9c-4380-aec5-d2e7ca93f816" "machine"="capz-acr-cluster-workload-2-control-plane-wwr6v"
I1107 22:57:13.957655 1 images.go:124] virtualmachineimages.Service.getSKUAndVersion "msg"="Getting VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-2-control-plane-wwr6v","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-2-control-plane-wwr6v" "namespace"="default" "reconcileID"="9c5fa450-e2ae-471e-befa-1b566a8f60d0" "x-ms-correlation-request-id"="5c6f5b6e-9e9c-4380-aec5-d2e7ca93f816" "k8sVersion"="v1.25.0" "location"="australiaeast" "offer"="capi" "osAndVersion"="ubuntu-2004" "publisher"="cncf-upstream"
I1107 22:57:13.957685 1 machine.go:655] scope.MachineScope.GetVMImage "msg"="No image specified for machine, using default Linux Image" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-d72hp","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-d72hp" "namespace"="default" "reconcileID"="fbe64a85-3b54-4479-a1ca-bda436260e4b" "x-ms-correlation-request-id"="791b2ed9-5e25-4a0f-8635-1a24ac5a0f8c" "machine"="capz-acr-cluster-workload-1-control-plane-d72hp"
I1107 22:57:13.957729 1 cache.go:122] virtualmachineimages.Cache.Get "msg"="VM images cache hit" "azureMachine"={"name":"capz-acr-cluster-workload-2-control-plane-wwr6v","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-2-control-plane-wwr6v" "namespace"="default" "reconcileID"="9c5fa450-e2ae-471e-befa-1b566a8f60d0" "x-ms-correlation-request-id"="5c6f5b6e-9e9c-4380-aec5-d2e7ca93f816" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1"
I1107 22:57:13.957787 1 images.go:124] virtualmachineimages.Service.getSKUAndVersion "msg"="Getting VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-d72hp","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-d72hp" "namespace"="default" "reconcileID"="fbe64a85-3b54-4479-a1ca-bda436260e4b" "x-ms-correlation-request-id"="791b2ed9-5e25-4a0f-8635-1a24ac5a0f8c" "k8sVersion"="v1.25.0" "location"="australiaeast" "offer"="capi" "osAndVersion"="ubuntu-2004" "publisher"="cncf-upstream"
I1107 22:57:13.957799 1 images.go:176] virtualmachineimages.Service.getSKUAndVersion "msg"="Found VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-2-control-plane-wwr6v","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-2-control-plane-wwr6v" "namespace"="default" "reconcileID"="9c5fa450-e2ae-471e-befa-1b566a8f60d0" "x-ms-correlation-request-id"="5c6f5b6e-9e9c-4380-aec5-d2e7ca93f816" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1" "version"="125.0.20220824"
I1107 22:57:13.957875 1 cache.go:122] virtualmachineimages.Cache.Get "msg"="VM images cache hit" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-d72hp","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-d72hp" "namespace"="default" "reconcileID"="fbe64a85-3b54-4479-a1ca-bda436260e4b" "x-ms-correlation-request-id"="791b2ed9-5e25-4a0f-8635-1a24ac5a0f8c" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1"
I1107 22:57:13.957961 1 images.go:176] virtualmachineimages.Service.getSKUAndVersion "msg"="Found VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-control-plane-d72hp","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-control-plane-d72hp" "namespace"="default" "reconcileID"="fbe64a85-3b54-4479-a1ca-bda436260e4b" "x-ms-correlation-request-id"="791b2ed9-5e25-4a0f-8635-1a24ac5a0f8c" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1" "version"="125.0.20220824"
I1107 22:57:13.959239 1 machine.go:655] scope.MachineScope.GetVMImage "msg"="No image specified for machine, using default Linux Image" "azureMachine"={"name":"capz-acr-cluster-workload-1-md-0-x59mx","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-md-0-x59mx" "namespace"="default" "reconcileID"="632b9667-aff5-4f6c-9210-c93d633fd0dc" "x-ms-correlation-request-id"="f046dd0c-e4d2-48e8-b5de-b0036afd0e60" "machine"="capz-acr-cluster-workload-1-md-0-x59mx"
I1107 22:57:13.959538 1 images.go:124] virtualmachineimages.Service.getSKUAndVersion "msg"="Getting VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-md-0-x59mx","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-md-0-x59mx" "namespace"="default" "reconcileID"="632b9667-aff5-4f6c-9210-c93d633fd0dc" "x-ms-correlation-request-id"="f046dd0c-e4d2-48e8-b5de-b0036afd0e60" "k8sVersion"="v1.25.0" "location"="australiaeast" "offer"="capi" "osAndVersion"="ubuntu-2004" "publisher"="cncf-upstream"
I1107 22:57:13.960282 1 cache.go:122] virtualmachineimages.Cache.Get "msg"="VM images cache hit" "azureMachine"={"name":"capz-acr-cluster-workload-1-md-0-x59mx","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-md-0-x59mx" "namespace"="default" "reconcileID"="632b9667-aff5-4f6c-9210-c93d633fd0dc" "x-ms-correlation-request-id"="f046dd0c-e4d2-48e8-b5de-b0036afd0e60" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1"
I1107 22:57:13.960378 1 images.go:176] virtualmachineimages.Service.getSKUAndVersion "msg"="Found VM image SKU and version" "azureMachine"={"name":"capz-acr-cluster-workload-1-md-0-x59mx","namespace":"default"} "controller"="azuremachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachine" "name"="capz-acr-cluster-workload-1-md-0-x59mx" "namespace"="default" "reconcileID"="632b9667-aff5-4f6c-9210-c93d633fd0dc" "x-ms-correlation-request-id"="f046dd0c-e4d2-48e8-b5de-b0036afd0e60" "location"="australiaeast" "offer"="capi" "publisher"="cncf-upstream" "sku"="ubuntu-2004-gen1" "version"="125.0.20220824"

$ kubectl get azuremachines

capz-acr-cluster-mgmt-control-plane-xrtk9 True Succeeded
capz-acr-cluster-mgmt-md-0-pqdnb True Succeeded
capz-acr-cluster-workload-1-control-plane-6xc4w True Succeeded
capz-acr-cluster-workload-1-control-plane-d72hp True Succeeded
capz-acr-cluster-workload-1-control-plane-r5b7j True Succeeded
capz-acr-cluster-workload-1-md-0-2t9t8 True Succeeded
capz-acr-cluster-workload-1-md-0-nldvq True Succeeded
capz-acr-cluster-workload-1-md-0-x59mx True Succeeded
capz-acr-cluster-workload-2-control-plane-wwr6v True Succeeded
capz-acr-cluster-workload-2-md-0-hfd6v False WaitingForBootstrapData
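For a machine stuck in WaitingForBootstrapData, the bootstrap secret usually hasn't been generated yet; one hedged way to dig in from the management cluster (object names hypothetical):

$ kubectl get machines -o wide
$ kubectl describe machine <machine-name>        # look for the BootstrapReady condition
$ kubectl describe kubeadmconfig <config-name>   # look for the DataSecretAvailable condition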

From the control plane node:
$ kubectl get azuremachines

The connection to the server localhost:8080 was refused - did you specify the right host or port?
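(The localhost:8080 refusal just means kubectl on the node has no kubeconfig to read, so it falls back to the insecure default; AzureMachine objects only exist on the management cluster anyway. To query the workload API server from the control-plane node itself:)

$ sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes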

$ less /var/log/cloud-init-output.log
[2022-11-04 21:39:42] Generating public/private rsa key pair.
[2022-11-04 21:39:42] Your identification has been saved in /etc/ssh/ssh_host_rsa_key
[2022-11-04 21:39:42] Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub
[2022-11-04 21:39:42] The key fingerprint is:
[2022-11-04 21:39:42] SHA256:* root@capz-acr-cluster-workload-2-control-plane-wwr6v
[2022-11-04 21:39:42] The key's randomart image is:
...
...
[2022-11-04 21:39:42] Generating public/private dsa key pair.
[2022-11-04 21:39:42] Your identification has been saved in /etc/ssh/ssh_host_dsa_key
[2022-11-04 21:39:42] Your public key has been saved in /etc/ssh/ssh_host_dsa_key.pub
[2022-11-04 21:39:42] The key fingerprint is:
[2022-11-04 21:39:42] SHA256:* root@capz-acr-cluster-workload-2-control-plane-wwr6v
[2022-11-04 21:39:42] The key's randomart image is:
...
...
[2022-11-04 21:39:42] Generating public/private ecdsa key pair.
[2022-11-04 21:39:42] Your identification has been saved in /etc/ssh/ssh_host_ecdsa_key
[2022-11-04 21:39:42] Your public key has been saved in /etc/ssh/ssh_host_ecdsa_key.pub
[2022-11-04 21:39:42] The key fingerprint is:
[2022-11-04 21:39:42] SHA256:* root@capz-acr-cluster-workload-2-control-plane-wwr6v
[2022-11-04 21:39:42] The key's randomart image is:
...
...
[2022-11-04 21:39:42] Generating public/private ed25519 key pair.
[2022-11-04 21:39:42] Your identification has been saved in /etc/ssh/ssh_host_ed25519_key
[2022-11-04 21:39:42] Your public key has been saved in /etc/ssh/ssh_host_ed25519_key.pub
[2022-11-04 21:39:42] The key fingerprint is:
[2022-11-04 21:39:42] SHA256:* root@capz-acr-cluster-workload-2-control-plane-wwr6v
[2022-11-04 21:39:42] The key's randomart image is:
...
...
[2022-11-04 21:39:50] Cloud-init v. 22.2-0ubuntu1~20.04.3 running 'modules:config' at Fri, 04 Nov 2022 21:39:49 +0000. Up 26.88 seconds.
[2022-11-04 21:39:55] [init] Using Kubernetes version: v1.25.0
[2022-11-04 21:39:55] [preflight] Running pre-flight checks
[2022-11-04 21:39:59] [preflight] Pulling images required for setting up a Kubernetes cluster
[2022-11-04 21:39:59] [preflight] This might take a minute or two, depending on the speed of your internet connection
[2022-11-04 21:39:59] [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[2022-11-04 21:39:59] [certs] Using certificateDir folder "/etc/kubernetes/pki"
[2022-11-04 21:39:59] [certs] Using existing ca certificate authority
[2022-11-04 21:39:59] [certs] Generating "apiserver" certificate and key
[2022-11-04 21:39:59] [certs] apiserver serving cert is signed for DNS names [capz-acr-cluster-workload-2-control-plane-wwr6v capz-acr-cluster-workload-2-pdns.australiaeast.cloudapp.azure.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.0.4]
[2022-11-04 21:39:59] [certs] Generating "apiserver-kubelet-client" certificate and key
[2022-11-04 21:39:59] [certs] Using existing front-proxy-ca certificate authority
[2022-11-04 21:39:59] [certs] Generating "front-proxy-client" certificate and key
[2022-11-04 21:39:59] [certs] Using existing etcd/ca certificate authority
[2022-11-04 21:40:00] [certs] Generating "etcd/server" certificate and key
[2022-11-04 21:40:00] [certs] etcd/server serving cert is signed for DNS names [capz-acr-cluster-workload-2-control-plane-wwr6v localhost] and IPs [10.0.0.4 127.0.0.1 ::1]
[2022-11-04 21:40:00] [certs] Generating "etcd/peer" certificate and key
[2022-11-04 21:40:00] [certs] etcd/peer serving cert is signed for DNS names [capz-acr-cluster-workload-2-control-plane-wwr6v localhost] and IPs [10.0.0.4 127.0.0.1 ::1]
[2022-11-04 21:40:00] [certs] Generating "etcd/healthcheck-client" certificate and key
[2022-11-04 21:40:00] [certs] Generating "apiserver-etcd-client" certificate and key
[2022-11-04 21:40:00] [certs] Using the existing "sa" key
[2022-11-04 21:40:00] [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[2022-11-04 21:40:01] [kubeconfig] Writing "admin.conf" kubeconfig file
[2022-11-04 21:40:01] [kubeconfig] Writing "kubelet.conf" kubeconfig file
[2022-11-04 21:40:01] [kubeconfig] Writing "controller-manager.conf" kubeconfig file
[2022-11-04 21:40:01] [kubeconfig] Writing "scheduler.conf" kubeconfig file
[2022-11-04 21:40:01] [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[2022-11-04 21:40:01] [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[2022-11-04 21:40:01] [kubelet-start] Starting the kubelet
[2022-11-04 21:40:02] [control-plane] Using manifest folder "/etc/kubernetes/manifests"
[2022-11-04 21:40:02] [control-plane] Creating static Pod manifest for "kube-apiserver"
[2022-11-04 21:40:02] [control-plane] Creating static Pod manifest for "kube-controller-manager"
[2022-11-04 21:40:02] [control-plane] Creating static Pod manifest for "kube-scheduler"
[2022-11-04 21:40:02] [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[2022-11-04 21:40:02] [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 20m0s
[2022-11-04 21:40:35] [apiclient] All control plane components are healthy after 32.567248 seconds
[2022-11-04 21:40:35] [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[2022-11-04 21:40:35] [kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[2022-11-04 21:40:35] [upload-certs] Skipping phase. Please see --upload-certs
[2022-11-04 21:40:35] [mark-control-plane] Marking the node capz-acr-cluster-workload-2-control-plane-wwr6v as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[2022-11-04 21:40:35] [mark-control-plane] Marking the node capz-acr-cluster-workload-2-control-plane-wwr6v as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[2022-11-04 21:40:36] [bootstrap-token] Using token: evxv6y.vdzajrctk6p0jn8h
[2022-11-04 21:40:36] [bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[2022-11-04 21:40:36] [bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[2022-11-04 21:40:36] [bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[2022-11-04 21:40:36] [bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[2022-11-04 21:40:36] [bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[2022-11-04 21:40:36] [bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[2022-11-04 21:40:36] [kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[2022-11-04 21:40:36] [addons] Applied essential addon: CoreDNS
[2022-11-04 21:40:36] [addons] Applied essential addon: kube-proxy
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] Your Kubernetes control-plane has initialized successfully!
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] To start using your cluster, you need to run the following as a regular user:
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] mkdir -p $HOME/.kube
[2022-11-04 21:40:36] sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[2022-11-04 21:40:36] sudo chown $(id -u):$(id -g) $HOME/.kube/config
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] Alternatively, if you are the root user, you can run:
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] export KUBECONFIG=/etc/kubernetes/admin.conf
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] You should now deploy a pod network to the cluster.
[2022-11-04 21:40:36] Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
[2022-11-04 21:40:36] https://kubernetes.io/docs/concepts/cluster-administration/addons/
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] You can now join any number of control-plane nodes by copying certificate authorities
[2022-11-04 21:40:36] and service account keys on each node and then running the following as root:
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] kubeadm join capz-acr-cluster-workload-2-pdns.australiaeast.cloudapp.azure.com:6443 --token evxv6y.vdzajrctk6p0jn8h \
[2022-11-04 21:40:36] --discovery-token-ca-cert-hash sha256:1f408c6bd2e95036ceeca613eaf3340452abd3c376de7b9852d6729148e9ba13 \
[2022-11-04 21:40:36] --control-plane
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] Then you can join any number of worker nodes by running the following on each as root:
[2022-11-04 21:40:36]
[2022-11-04 21:40:36] kubeadm join capz-acr-cluster-workload-2-pdns.australiaeast.cloudapp.azure.com:6443 --token evxv6y.vdzajrctk6p0jn8h \
[2022-11-04 21:40:36] --discovery-token-ca-cert-hash sha256:1f408c6bd2e95036ceeca613eaf3340452abd3c376de7b9852d6729148e9ba13
[2022-11-04 21:40:36] Cloud-init v. 22.2-0ubuntu1~20.04.3 running 'modules:final' at Fri, 04 Nov 2022 21:39:51 +0000. Up 29.01 seconds.
[2022-11-04 21:40:36] Cloud-init v. 22.2-0ubuntu1~20.04.3 finished at Fri, 04 Nov 2022 21:40:36 +0000. Datasource DataSourceAzure [seed=/dev/sr0]. Up 74.55 seconds

YAML Spec:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cni: calico
  name: capz-acr-cluster-workload-2
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: capz-acr-cluster-workload-2-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureCluster
    name: capz-acr-cluster-workload-2
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: capz-acr-cluster-workload-2
  namespace: default
spec:
  identityRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureClusterIdentity
    name: dogfood5-acr-custom-script-identity
    namespace: default
  location: australiaeast
  networkSpec:
    apiServerLB:
      type: Public
      frontendIPs:
        - name: capz-acr-cluster-workload-2-public-lb-frontEnd
          publicIP:
            name: pip-capz-acr-cluster-workload-2-apiserver
            dnsName: capz-acr-cluster-workload-2-pdns.australiaeast.cloudapp.azure.com
    subnets:
    - name: control-plane-subnet
      role: control-plane
    - name: node-subnet
      natGateway:
        name: node-natgateway
      role: node
    vnet:
      name: capz-acr-cluster-workload-2-vnet
  resourceGroup: capz-acr-cluster-workload-2
  subscriptionID: a8a17819
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: capz-acr-cluster-workload-2-control-plane
  namespace: default
spec:
  kubeadmConfigSpec:
    clusterConfiguration:
      apiServer:
        extraArgs:
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
        extraVolumes:
        - hostPath: /etc/kubernetes/azure.json
          mountPath: /etc/kubernetes/azure.json
          name: cloud-config
          readOnly: true
        timeoutForControlPlane: 20m
      controllerManager:
        extraArgs:
          allocate-node-cidrs: "false"
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
          cluster-name: capz-acr-cluster-workload-2
        extraVolumes:
        - hostPath: /etc/kubernetes/azure.json
          mountPath: /etc/kubernetes/azure.json
          name: cloud-config
          readOnly: true
      etcd:
        local:
          dataDir: /var/lib/etcddisk/etcd
          extraArgs:
            quota-backend-bytes: "8589934592"
    diskSetup:
      filesystems:
      - device: /dev/disk/azure/scsi1/lun0
        extraOpts:
        - -E
        - lazy_itable_init=1,lazy_journal_init=1
        filesystem: ext4
        label: etcd_disk
      - device: ephemeral0.1
        filesystem: ext4
        label: ephemeral0
        replaceFS: ntfs
      partitions:
      - device: /dev/disk/azure/scsi1/lun0
        layout: true
        overwrite: false
        tableType: gpt
    files:
    - contentFrom:
        secret:
          key: control-plane-azure.json
          name: capz-acr-cluster-workload-2-control-plane-azure-json
      owner: root:root
      path: /etc/kubernetes/azure.json
      permissions: "0644"
    initConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          azure-container-registry-config: /etc/kubernetes/azure.json
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
        name: '{{ ds.meta_data["local_hostname"] }}'
    joinConfiguration:
      nodeRegistration:
        kubeletExtraArgs:
          azure-container-registry-config: /etc/kubernetes/azure.json
          cloud-config: /etc/kubernetes/azure.json
          cloud-provider: azure
        name: '{{ ds.meta_data["local_hostname"] }}'
    mounts:
    - - LABEL=etcd_disk
      - /var/lib/etcddisk
    postKubeadmCommands: []
    preKubeadmCommands: []
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: AzureMachineTemplate
      name: capz-acr-cluster-workload-2-control-plane
  replicas: 1
  version: v1.25.0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: capz-acr-cluster-workload-2-control-plane
  namespace: default
spec:
  template:
    spec:
      identity: UserAssigned
      dataDisks:
      - diskSizeGB: 256
        lun: 0
        nameSuffix: etcddisk
      osDisk:
        diskSizeGB: 128
        osType: Linux
      sshPublicKey: ""
      userAssignedIdentities:
      - providerID: dogfood5-acr-custom-script-identity
      vmSize: Standard_D2s_v3
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: capz-acr-cluster-workload-2-md-0
  namespace: default
spec:
  clusterName: capz-acr-cluster-workload-2
  replicas: 1
  selector:
    matchLabels: null
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: capz-acr-cluster-workload-2-md-0
      clusterName: capz-acr-cluster-workload-2
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachineTemplate
        name: capz-acr-cluster-workload-2-md-0
      version: v1.25.0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachineTemplate
metadata:
  name: capz-acr-cluster-workload-2-md-0
  namespace: default
spec:
  template:
    spec:
      identity: UserAssigned
      osDisk:
        diskSizeGB: 128
        osType: Linux
      sshPublicKey: ""
      userAssignedIdentities:
      - providerID: dogfood5-acr-custom-script-identity
      vmSize: Standard_D2s_v3
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: capz-acr-cluster-workload-2-md-0
  namespace: default
spec:
  template:
    spec:
      files:
      - contentFrom:
          secret:
            key: worker-node-azure.json
            name: capz-acr-cluster-workload-2-md-0-azure-json
        owner: root:root
        path: /etc/kubernetes/azure.json
        permissions: "0644"
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            azure-container-registry-config: /etc/kubernetes/azure.json
            cloud-config: /etc/kubernetes/azure.json
            cloud-provider: azure
          name: '{{ ds.meta_data["local_hostname"] }}'
      preKubeadmCommands: []
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureClusterIdentity
metadata:
  labels:
    clusterctl.cluster.x-k8s.io/move-hierarchy: "true"
  name: dogfood5-acr-custom-script-identity
  namespace: default
spec:
  allowedNamespaces: {}
  clientID: cfa59eda-e284-4d05-9582-c540d1379376
  resourceID: "dogfood5-acr-custom-script-identity"
  tenantID: 33e01921-4d64-4f8c-a055-5bdaffd5e33d
  type: UserAssignedMSI


karansinghneu commented Nov 9, 2022

Update: it's most likely the custom FQDN causing the issue.
I tried spinning up a cluster by just mounting the certificates as secrets, without the custom FQDN, and everything works fine; as soon as I put in the custom FQDN, things start to fail. Still investigating!

Further investigation:
Mounting the CA certs as secrets and providing a custom FQDN results in one worker node being unable to join the cluster; everything else comes up normally. When I spin up a workload cluster with 3 control plane nodes and 3 worker nodes, 2 worker nodes come up and 1 doesn't, while the MachineDeployment gets stuck in the WaitingForAvailableMachines state. Similarly, when I spin up a workload cluster with 1 control plane node and 1 worker node, the 1 worker node fails to come up.
NOTE: The worker VMs are created successfully; they just fail to join as nodes. (A debugging sketch follows.)
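A hedged checklist for a worker VM that exists but never joins (standard kubeadm paths assumed):

# on the stuck worker VM:
$ sudo journalctl -u kubelet --no-pager | tail -n 50
$ grep -i -A 2 'kubeadm join' /var/log/cloud-init-output.log
# verify the API server FQDN the node was told to join actually resolves, e.g.:
$ nslookup capz-acr-cluster-workload-2-pdns.australiaeast.cloudapp.azure.com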

karansinghneu commented

@CecileRobertMichon I think I've reached a point where I'm now intermittently hitting kubernetes-sigs/cluster-api#6029.

CecileRobertMichon commented

@karansinghneu did you ever figure this one out? Is there anything that needs to be fixed in CAPZ and/or CAPI?

karansinghneu commented

As far as I recall, it was a minor mistake on my end: I used an incorrect region name in the subdomain of the FQDN field. I should have closed this earlier, sorry about that.
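For anyone landing here later: the region segment of a custom cloudapp.azure.com FQDN has to match the AzureCluster's spec.location, otherwise the DNS label is created in one region while the name points at another, and the API server FQDN may never resolve. An illustrative sanity check:

$ fqdn="capz-acr-cluster-workload-2-pdns.australiaeast.cloudapp.azure.com"
$ location="australiaeast"   # must equal AzureCluster spec.location
$ [ "$(echo "$fqdn" | cut -d. -f2)" = "$location" ] && echo "region matches" || echo "region mismatch"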
