The Gardener Controller Manager (often referred to as "GCM") is a component that runs next to the Gardener API server, similar to the Kubernetes Controller Manager. It runs several control loops that do not require talking to any seed or shoot cluster. Also, as of today, it exposes an HTTPS server that serves several webhook endpoints for certain resources.
This document explains the various functionalities of the Gardener Controller Manager and their purpose.
The Project controller consists of two reconciliation loops: the main loop reconciles `Project` resources, while the second loop performs the necessary actions for stale projects.
This reconciler will create a dedicated `Namespace` prefixed with `garden-` for each `Project` resource. The name of the namespace can either be stated in `.spec.namespace`, or it will be auto-generated by the reconciler. If `.spec.namespace` is set, the reconciler creates the namespace if it does not exist yet; otherwise, it tries to adopt it. Adoption only succeeds if the `Namespace` was previously labeled with `gardener.cloud/role=project` and `project.gardener.cloud/name=<project-name>`. This prevents end-users from adopting arbitrary namespaces and escalating their privileges, e.g. via the `kube-system` namespace.
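For illustration, a pre-existing namespace would need to carry labels like the following before the reconciler can adopt it (the project and namespace names are placeholders):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: garden-my-project          # placeholder name
  labels:
    gardener.cloud/role: project
    project.gardener.cloud/name: my-project
```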
After the namespace has been created/adopted, the reconciler creates several `ClusterRole`s and `ClusterRoleBinding`s that allow the project members to access related resources based on their roles. These RBAC resources are prefixed with `gardener.cloud:system:project{-member,-viewer}:<project-name>`. Gardener administrators and extension developers can define their own roles; see this document for more information.
In addition, operators can configure the Project controller to maintain a default `ResourceQuota` for project namespaces. Quotas can especially limit the creation of user-facing resources, e.g. `Shoots`, `SecretBindings`, and `Secrets`, and thus protect the Garden cluster from massive resource exhaustion, but they also enable operators to align quotas with respective enterprise policies.

⚠️ Gardener itself is not exempted from configured quotas. For example, Gardener creates `Secrets` for every shoot cluster in the project namespace, and these count against the configured quota. Please mind this additional resource consumption.

The GCM configuration provides a template section `controllers.project.quotas` where such a `ResourceQuota` (see example below) can be deposited.
```yaml
controllers:
  project:
    quotas:
    - config:
        apiVersion: v1
        kind: ResourceQuota
        spec:
          hard:
            count/shoots.core.gardener.cloud: "100"
            count/secretbindings.core.gardener.cloud: "10"
            count/secrets: "800"
      projectSelector: {}
```
The Project controller takes the shown `config` and creates a `ResourceQuota` with the name `gardener` in the project namespace. If a `ResourceQuota` resource with the name `gardener` already exists, the controller will only update fields in `spec.hard` which are unavailable at that time. An optional `projectSelector` narrows down the set of projects that are equipped with the given `config`. If multiple configs match a project, only the first match in the list is applied to the project namespace.
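Applied to a matching project, the example configuration above would result in a `ResourceQuota` roughly like the following sketch (the namespace name is a placeholder):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gardener
  namespace: garden-my-project     # placeholder project namespace
spec:
  hard:
    count/shoots.core.gardener.cloud: "100"
    count/secretbindings.core.gardener.cloud: "10"
    count/secrets: "800"
```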
The `.status.phase` of the `Project` resources will be set to `Ready` or `Failed` by the reconciler to indicate whether the reconciliation loop was performed successfully. Also, it will generate `Events` to provide further information about its operations.
As Gardener is a large-scale Kubernetes-as-a-Service offering, it is designed to be used by a large number of end-users. Over time, it is likely that some of the hundreds or thousands of `Project` resources are no longer actively used. Gardener offers the "stale projects" reconciler, which identifies such stale projects, marks them with a "warning", and eventually deletes them after a certain time period. This reconciler is enabled by default and works as follows:
- Projects are considered "stale"/not actively used when all of the following conditions apply: the namespace associated with the `Project` does not have any...
  - `Shoot` resources.
  - `Plant` resources.
  - `BackupEntry` resources.
  - `Secret` resources that are referenced by a `SecretBinding` that is in use by a `Shoot` (not necessarily in the same namespace).
  - `Quota` resources that are referenced by a `SecretBinding` that is in use by a `Shoot` (not necessarily in the same namespace).
If a project is considered "stale", then its `.status.staleSinceTimestamp` will be set to the time when it was first detected to be stale. If it gets actively used again, this timestamp will be removed. After some time, the `.status.staleAutoDeleteTimestamp` will be set to a timestamp after which Gardener will auto-delete the `Project` resource if it still is not actively used.
The component configuration of the Gardener Controller Manager offers the following options (a configuration sketch is shown after this list):

- `minimumLifetimeDays`: Don't consider newly created `Project`s as "stale" too early, to give people/end-users some time to onboard and get familiar with the system. The "stale project" reconciler won't set any timestamp for `Project`s younger than `minimumLifetimeDays`. When you change this value, projects marked as "stale" may no longer be marked as "stale" if they are young enough, or vice versa.
- `staleGracePeriodDays`: Don't compute auto-delete timestamps for stale `Project`s that have been unused for less than `staleGracePeriodDays`. This is to not unnecessarily make people/end-users nervous "just because" they haven't actively used their `Project` for a given amount of time. When you change this value, already assigned auto-delete timestamps may be removed again if the new grace period is not yet exceeded.
- `staleExpirationTimeDays`: Expiration time after which stale `Project`s are finally auto-deleted (after `.status.staleSinceTimestamp`). If this value is changed and an auto-delete timestamp has already been assigned to the projects, then the new value will only take effect if it is increased. Hence, decreasing the `staleExpirationTimeDays` will not decrease already assigned auto-delete timestamps.
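For illustration only, these options could be deposited in the component configuration roughly like this (the placement under `controllers.project` and the chosen values are assumptions; consult the component configuration reference for the authoritative schema):

```yaml
controllers:
  project:
    minimumLifetimeDays: 30
    staleGracePeriodDays: 14
    staleExpirationTimeDays: 90
```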
Gardener administrators/operators can exclude specific `Project`s from the stale check by annotating the related `Namespace` resource with `project.gardener.cloud/skip-stale-check=true`.
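For example, the annotation on the project namespace could look as follows (the namespace name is a placeholder):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: garden-my-project          # placeholder project namespace
  annotations:
    project.gardener.cloud/skip-stale-check: "true"
```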
With the Gardener Event Controller, you can prolong the lifespan of events related to Shoot clusters. This is an optional controller which becomes active once you provide the configuration mentioned below.
All events in Kubernetes are deleted after a configurable time-to-live (controlled via the kube-apiserver argument `--event-ttl`, defaulting to 1 hour). The need to prolong the time-to-live for Shoot cluster events frequently arises when debugging customer issues on live systems. This controller leaves events involving Shoots untouched while deleting all other events after a configured time.
To activate it, provide the following configuration (a sketch is shown after this list):
- `concurrentSyncs`: The number of goroutines scheduled for reconciling events.
- `ttlNonShootEvents`: When an event reaches this time-to-live, it gets deleted unless it is a Shoot-related event (defaults to `1h`, equivalent to the `event-ttl` default).
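A minimal configuration sketch, assuming the two fields live under `controllers.event` in the GCM component configuration (values are examples only):

```yaml
controllers:
  event:
    concurrentSyncs: 5
    ttlNonShootEvents: 1h
```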
⚠️ In addition, you should also configure the `--event-ttl` for the kube-apiserver to define an upper limit of how long Shoot-related events should be stored. The `--event-ttl` should be larger than the `ttlNonShootEvents`, or this controller will have no effect.
Shoot objects may specify references to further objects in the Garden cluster which are required for certain features. For example, users can configure various DNS providers via `.spec.dns.providers` and usually need to refer to a corresponding `Secret` with valid DNS provider credentials inside. Such objects need special protection against deletion requests as long as they are still being referenced by one or multiple shoots.

Therefore, the Shoot Reference Controller scans shoot clusters for referenced objects and adds the finalizer `gardener.cloud/reference-protection` to their `.metadata.finalizers` list. The scanned shoot also gets this finalizer to enable proper garbage collection in case the Gardener Controller Manager is offline at the moment of an incoming deletion request. When an object is no longer actively referenced, because the shoot specification has changed or all related shoots were deleted (or are in deletion), the controller will remove the added finalizer again so that the object can safely be deleted or garbage collected.
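As a sketch, a referenced DNS credentials `Secret` protected by the controller might look like this (name, namespace, and data are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-dns-credentials         # placeholder
  namespace: garden-my-project     # placeholder project namespace
  finalizers:
  - gardener.cloud/reference-protection
type: Opaque
data: {}                           # credentials omitted
```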
The Shoot Reference Controller can inspect the following references:
- Enabled by default:
  - DNS provider secrets (`.spec.dns.providers`)
- Disabled by default:
  - Audit policy configmaps (`.spec.kubernetes.kubeAPIServer.auditConfig.auditPolicy.configMapRef`)

If you want to enable the audit policy configmap protection, you can set `.controllers.shootReference.protectAuditPolicyConfigMaps` to `true` in the component configuration (see the sketch below).
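A minimal sketch, assuming the flag sits under `controllers.shootReference` in the GCM component configuration:

```yaml
controllers:
  shootReference:
    protectAuditPolicyConfigMaps: true
```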
Further checks might be added in the future.
The Seed controller in the Gardener Controller Manager reconciles `Seed` objects with the help of the following reconcilers.
This reconciliation loop takes care of seed-related operations in the Garden cluster. When a new `Seed` object is created, the reconciler creates a new `Namespace` named `seed-<seed-name>` in the garden cluster. `Namespaces` dedicated to single seed clusters allow us to segregate access permissions, i.e., a Gardenlet must not have permissions to access objects in all `Namespaces` in the Garden cluster. There are objects in a Garden environment which are created once by the operator, e.g., the default domain secret and alerting credentials, and which are required for operations happening in the Gardenlet. Therefore, we not only need a seed-specific `Namespace` but also a copy of these "shared" objects.
The "main" reconciler takes care of this replication:

| Kind   | Namespace | Label Selector        |
|--------|-----------|-----------------------|
| Secret | garden    | `gardener.cloud/role` |
Every time a `BackupBucket` object is created or updated, the referenced `Seed` object is enqueued for reconciliation. It's the reconciler's task to check the `status` subresource of all existing `BackupBuckets` that belong to this seed. If at least one `BackupBucket` has `.status.lastError`, the seed condition `BackupBucketsReady` will turn `false`, and consequently the seed is considered as `NotReady`. Once the `BackupBucket` is healthy again, the seed will be re-queued and the condition will turn `true`.
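For illustration, a failing bucket would be reflected in the `Seed` status roughly like this (the `reason` and `message` values are placeholders; the fields follow the usual Kubernetes condition schema):

```yaml
status:
  conditions:
  - type: BackupBucketsReady
    status: "False"
    reason: BackupBucketError      # placeholder
    message: "..."                 # placeholder
```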
The "Lifecycle" reconciler processes `Seed` objects which are enqueued every 10 seconds in order to check if the responsible Gardenlet is still responding and operable. Therefore, it checks renewals via `Lease` objects of the seed in the garden cluster, which are renewed regularly by the Gardenlet.

In case a `Lease` is not renewed for the configured amount of time in `config.controllers.seed.monitorPeriod.duration`, the reconciler assumes that the Gardenlet stopped operating and updates the `GardenletReady` condition to `Unknown`. Additionally, conditions and constraints of all `Shoot` resources scheduled on the affected seed are set to `Unknown` as well, because a non-operating Gardenlet won't be able to maintain these conditions any more.
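As a sketch, the monitor period mentioned above would be configured roughly like this (the value is only an example):

```yaml
controllers:
  seed:
    monitorPeriod:
      duration: 40s
```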