optionally allow CLUSTER_ENDPOINT to be used rather than the cluster-ip #2138
@bwagner5 We were exploring removing the API Server connectivity check dependency from the CNI altogether. We only do it once during boot-up, and it is not the CNI's job to set up and/or validate the worker nodes to CPI connectivity anyway. So, we can just remove that check altogether.
That would be nice, although it doesn't solve the latency problem, since something within the VPC CNI will still try to connect to the apiserver and then likely time out if kube-proxy isn't done yet. AFAIK nothing else guarantees that kube-proxy is done setting up routes though, right? So pods could schedule without connectivity, which could result in weird crashing behavior or higher-latency pod start times like the VPC CNI runs into.
VPC CNI only checks for API Server connectivity during the initial start-up phase, so once we remove the check we shouldn't run into any additional init latency due to this race condition. The CNI itself runs a controller when it is operating in custom networking mode, but at that point it is like any other controller running on worker nodes. As far as other application pods running into issues (because Also, the pods will always rely on the service endpoints to reach the API Server, and if we're bypassing that in the CNI by providing the cluster endpoint to improve the start-up time of VPC CNI, then we're hiding the kube-proxy problem from VPC CNI, but other application pods will now run into the same issue while trying to connect to the API Server, as you outlined above.
Even when removing the API Server connectivity check, I believe the K8s cached client would still hang on synchronization blocking the rest of the initialization flow: amazon-vpc-cni-k8s/pkg/k8sapi/k8sutils.go Line 77 in f8bc3b8
I agree that it's no worse than what happens today with the connectivity check, but it still adds unneeded latency when initializing the node.
Yeah, I can see the problem there. But, I do remember someone from NW team tested without the check for API Server connectivity to make sure the init goes fine when But, the
README.md
Outdated
@@ -410,6 +410,17 @@ Specifies the cluster name to tag allocated ENIs with. See the "Cluster Name tag

---

#### `CLUSTER_ENDPOINT`
Nit: Let's mention the release version: CLUSTER_ENDPOINT (v1.12.1+)
updated!
just a minor nit, lgtm.
* create publisher with logger (#2119)
* Add missing rules when NodePort support is disabled (#2026)
  * Add missing rules when NodePort support is disabled
  * the rules that need to be installed for NodePort support and SNAT support are very similar. The same traffic mark is needed for both. As a result, rules that are currently installed only when NodePort support is enabled should also be installed when external SNAT is disabled, which is the case by default.
  * remove "-m state --state NEW" from a rule in the nat table. This is always true for packets that traverse the nat table.
  * fix typo in one rule's name (extra whitespace).
    Fixes #2025
    Co-authored-by: Quan Tian <qtian@vmware.com>
    Signed-off-by: Antonin Bas <abas@vmware.com>
  * Fix typos and unit tests
    Signed-off-by: Antonin Bas <abas@vmware.com>
  * Minor improvement to code comment
    Signed-off-by: Antonin Bas <abas@vmware.com>
  * Address review comments
  * Delete legacy nat rule
  * Fix an unrelated log message
    Signed-off-by: Antonin Bas <abas@vmware.com>
    Signed-off-by: Antonin Bas <abas@vmware.com>
    Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
    Co-authored-by: Sushmitha Ravikumar <58063229+sushrk@users.noreply.github.com>
* downgrade test go.mod to align with root go.mod (#2128)
* skip addon installation when addon info is not available (#2131)
* Merging test/Makefile and test/go.mod to the root Makefil and go.mod, adjust the .github/workflows and integration test instructions (#2129)
* update troubleshooting docs for CNI image (#2132)
  fix location where make command is run
* fix env name in test script (#2136)
* optionally allow CLUSTER_ENDPOINT to be used rather than the cluster-ip (#2138)
  * optionally allow CLUSTER_ENDPOINT to be used rather than the kubernetes cluster ip
  * remove check for kube-proxy
  * add version to readme
* Add resources config option to cni metrics helper (#2141)
  * Add resources config option to cni metrics helper
  * Remove default-empty resources block; replace with conditional
* Add metrics for ec2 api calls made by CNI and expose via prometheus (#2142)
  Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
* increase workflow role duration to 4 hours (#2148)
* Update golang 1.19.2 EKS-D (#2147)
  * Update golang
  * Move to EKS distro builds
* [HELM]: Move CRD resources to a separate folder as per helm standard (#2144)
  Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
* VPC-CNI minimal image builds (#2146)
  * VPC-CNI minimal image builds
  * update dependencies for ginkgo when running integration tests
  * address review comments and break up init main function
  * review comments for sysctl
  * Simplify binary installation, fix review comments
    Since init container is required to always run, let binary installation for external plugins happen in init container. This simplifies the main container entrypoint and the dockerfile for each image.
* when IPAMD connection fails, try to teardown pod network using prevResult (#2145)
* add env var to enable nftables (#2155)
* fix failing weekly cron tests (#2154)
* Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER and remove no-op setter (#2153)
  * Deprecate AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER
  * update release version comments

Signed-off-by: Antonin Bas <abas@vmware.com>
Co-authored-by: Jeffrey Nelson <jdnelson@amazon.com>
Co-authored-by: Antonin Bas <antonin.bas@gmail.com>
Co-authored-by: Jayanth Varavani <1111446+jayanthvn@users.noreply.github.com>
Co-authored-by: Sushmitha Ravikumar <58063229+sushrk@users.noreply.github.com>
Co-authored-by: Jerry He <37866862+jerryhe1999@users.noreply.github.com>
Co-authored-by: Brandon Wagner <wagnerbm@amazon.com>
Co-authored-by: Jonathan Ogilvie <679297+jcogilvie@users.noreply.github.com>
Co-authored-by: Jay Deokar <jsdeokar@amazon.com>
What type of PR is this?
Which issue does this PR fix:
awslabs/amazon-eks-ami#1099
What does this PR do / Why do we need it:
When a node is starting, the kube-proxy and VPC CNI DaemonSets get scheduled and started on the node at roughly the same time. This causes a race in the VPC CNI, since it currently depends on kube-proxy to finish setting up cluster IP routes on the node. If the VPC CNI reaches its k8s API server check before kube-proxy has finished, the check waits out the current 5 second timeout. This delays the node from getting to a Ready state.
If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
Testing done on this change:
Installed with and without setting env.CLUSTER_ENDPOINT:
This log shows a race where kube-proxy wasn't finished when aws-node reached out to the api-server through the cluster IP, resulting in a delay in initialization:
This PR adds an optional CLUSTER_ENDPOINT environment variable that allows aws-node to start initializing without waiting on kube-proxy. kube-proxy is still required before aws-node finishes initializing the CNI plugin.
This log shows the components starting at the same time, where kube-proxy has not finished but aws-node can continue bootstrapping with the CLUSTER_ENDPOINT:
The first api-server connectivity check uses the CLUSTER_ENDPOINT passed in, and the second one uses the cluster IP to make sure that kube-proxy is finished. The second check blocks the CNI from initializing.
Automation added to e2e:
Will this PR introduce any new dependencies?:
NO
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Have not tested upgrading a cluster, but this change only affects the endpoint being used.
Does this change require updates to the CNI daemonset config files to work?:
No
Does this PR introduce any user-facing change?:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.