Make antrea-controller not tolerate Node unreachable
When a Node becomes unreachable, it currently takes at least 5m45s for
Kubernetes to move the antrea-controller Pod to another Node. The time
breaks down as follows:

* 40s (default value of NodeMonitorGracePeriod) to mark the Node's Ready
  condition as Unknown
* 5s to taint the Node with `node.kubernetes.io/unreachable:NoExecute`
* 5m (default value of defaultUnreachableTolerationSeconds) during which
  the Pod tolerates the taint

The first duration is largely unavoidable. The second appears to be a
bug in kube-controller-manager; I have opened
kubernetes/kubernetes#120815 for it, and it may be fixed in a future
release. The third exists because Kubernetes automatically adds a
default toleration for `node.kubernetes.io/unreachable:NoExecute` with
tolerationSeconds of 300s if the Pod doesn't have one.
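
The defaulting described above can be modelled roughly as follows. This is a simplified Python sketch, not the actual admission plugin code: the helper name `add_default_unreachable_toleration` and the dict-based toleration list are assumptions made for illustration, and only an exact key match is considered.

```python
# Simplified model of the default toleration behaviour; not the real plugin.
UNREACHABLE_KEY = "node.kubernetes.io/unreachable"
DEFAULT_TOLERATION_SECONDS = 300  # default of defaultUnreachableTolerationSeconds


def add_default_unreachable_toleration(tolerations):
    """Return a copy of the tolerations, adding the 300s default only when
    the Pod has no NoExecute toleration for the unreachable taint.
    (Hypothetical helper, for illustration only.)"""
    for t in tolerations:
        if t.get("key") == UNREACHABLE_KEY and t.get("effect") == "NoExecute":
            # The Pod already declares its own toleration: leave it untouched.
            return list(tolerations)
    default = {
        "key": UNREACHABLE_KEY,
        "operator": "Exists",
        "effect": "NoExecute",
        "tolerationSeconds": DEFAULT_TOLERATION_SECONDS,
    }
    return list(tolerations) + [default]


# A Pod without the toleration gets the 300s default appended.
without = add_default_unreachable_toleration([])

# A Pod that sets tolerationSeconds explicitly keeps its own value.
explicit = [{
    "key": UNREACHABLE_KEY,
    "operator": "Exists",
    "effect": "NoExecute",
    "tolerationSeconds": 0,
}]
with_explicit = add_default_unreachable_toleration(explicit)
```

In this model, declaring the toleration explicitly is what prevents the 300s default from being added.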

This commit explicitly adds a toleration for the unreachable taint with
tolerationSeconds of 0, so that antrea-controller does not tolerate an
unreachable Node. This reduces the failover time by 5m.
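
As a quick check of the arithmetic, the sketch below adds up the three durations from the commit message, before and after the change. The constant and function names are illustrative assumptions, and the totals are lower bounds because rescheduling and Pod startup time are not included.

```python
# Lower-bound failover time, using the durations listed in the commit message.
NODE_MONITOR_GRACE_PERIOD = 40  # seconds until the Ready condition is Unknown
TAINT_DELAY = 5                 # seconds until the NoExecute taint is applied
DEFAULT_TOLERATION = 300        # default tolerationSeconds added by Kubernetes
EXPLICIT_TOLERATION = 0         # tolerationSeconds set by this commit


def failover_seconds(toleration_seconds):
    """Time from the Node becoming unreachable to the Pod's eviction."""
    return NODE_MONITOR_GRACE_PERIOD + TAINT_DELAY + toleration_seconds


before = failover_seconds(DEFAULT_TOLERATION)   # 345 s, i.e. 5m45s
after = failover_seconds(EXPLICIT_TOLERATION)   # 45 s
print(before, after, before - after)            # prints "345 45 300"
```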

Signed-off-by: Quan Tian <qtian@vmware.com>
tnqn committed Sep 22, 2023
1 parent 813372b commit 0e4e4c4
Showing 6 changed files with 27 additions and 0 deletions.
7 changes: 7 additions & 0 deletions build/charts/antrea/values.yaml
@@ -296,6 +296,13 @@ controller:
# Control-plane taint for Kubernetes >= 1.24.
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
# Evict it immediately once Node is detected unreachable.
# Must be set explicitly, otherwise DefaultTolerationSeconds plugin will
# add a default toleration with tolerationSeconds of 300s.
- key: node.kubernetes.io/unreachable
effect: NoExecute
operator: Exists
tolerationSeconds: 0
# -- Node selector for the antrea-controller Pod.
nodeSelector:
kubernetes.io/os: linux
4 changes: 4 additions & 0 deletions build/yamls/antrea-aks.yml
@@ -7078,6 +7078,10 @@ spec:
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 0
serviceAccountName: antrea-controller
containers:
- name: antrea-controller
4 changes: 4 additions & 0 deletions build/yamls/antrea-eks.yml
@@ -7079,6 +7079,10 @@ spec:
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 0
serviceAccountName: antrea-controller
containers:
- name: antrea-controller
4 changes: 4 additions & 0 deletions build/yamls/antrea-gke.yml
@@ -7076,6 +7076,10 @@ spec:
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 0
serviceAccountName: antrea-controller
containers:
- name: antrea-controller
4 changes: 4 additions & 0 deletions build/yamls/antrea-ipsec.yml
@@ -7135,6 +7135,10 @@ spec:
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 0
serviceAccountName: antrea-controller
containers:
- name: antrea-controller
4 changes: 4 additions & 0 deletions build/yamls/antrea.yml
@@ -7076,6 +7076,10 @@ spec:
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 0
serviceAccountName: antrea-controller
containers:
- name: antrea-controller