
IP_COOLDOWN_PERIOD environment variable for ip cooldown period configuration #2492

Merged: 1 commit, Aug 16, 2023

@jchen6585 commented Aug 9, 2023

What type of PR is this?
Continuing PR #2397

Which issue does this PR fix:
Issue #2378

What does this PR do / Why do we need it:
Continuing PR #2397: adds a new environment variable, IP_COOLDOWN_PERIOD.

- cmd/aws-vpc-cni/main.go defines two new constants: envIPCooldownPeriod, set to IP_COOLDOWN_PERIOD, and defaultIPCooldownPeriod, set to 30s as a time.Duration.
- pkg/ipamd/datastore/data_store.go adds a new function, getCooldownPeriod(), which returns a time.Duration: the custom cooldown period if it is not smaller than 0, otherwise the 30s default. getCooldownPeriod() is called when a new DataStore is created, and its result is stored in the variable ipCooldownPeriod, which inCoolingPeriod() uses.
- pkg/ipamd/datastore/data_store_test.go adds a new test case, TestGetIPStatsV4WithCustomIPCooldown().
- utils/utils.go adds a new function, GetIntAsStringEnvVar(), to parse the environment variable string into an integer.

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:

Ran the CNI and IPAMD integration tests with IP_COOLDOWN_PERIOD unset and set to 0, 10, and 20.
[Screenshots: test results, 2023-08-09 2:25:25 PM and 2:26:11 PM]

Automation added to e2e:

No

Will this PR introduce any new dependencies?:

No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
Have not updated a running cluster

Does this change require updates to the CNI daemonset config files to work?:

No

Does this PR introduce any user-facing change?:

Yes

#### `IP_COOLDOWN_PERIOD` (v1.13.4+)

Type: Integer as a String

Default: `30`

Specifies the number of seconds an IP address is in cooldown after pod deletion. The cooldown period gives kube-proxy time to update node iptables rules when the IP was registered as a valid endpoint, such as for a service. Modify this value with caution, as kube-proxy update time scales with the number of nodes and services.

**Note:** 0 is a supported value; however, it is highly discouraged.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jchen6585 requested a review from a team as a code owner, August 9, 2023 21:30
@jdn5126 left a comment

A few comments; this is almost there.

@jchen6585 force-pushed the master branch 2 times, most recently from 3527033 to f6af3b9, August 11, 2023 00:44
@jdn5126 left a comment

Sorry for the delay; these are the last nits, then this is good to go! Let's also update the title and clean up the description a bit for future readers.

@jchen6585 force-pushed the master branch 3 times, most recently from 521022a to 593cd07, August 14, 2023 18:10
@jchen6585 changed the title from "Continuing PR #2397" to "IP_COOLDOWN_PERIOD environment variable for ip cooldown period configuration" Aug 14, 2023
@jayanthvn

Can you add tests here (https://github.com/aws/amazon-vpc-cni-k8s/tree/master/test/integration/ipamd) before merging the PR? Also, call out that increasing the cooldown timer will cause more EC2 calls, since IPs will stay in the cache longer.

@jdn5126 commented Aug 14, 2023

> Can you add tests here (https://github.com/aws/amazon-vpc-cni-k8s/tree/master/test/integration/ipamd) before merging the PR? Also, call out that increasing the cooldown timer will cause more EC2 calls, since IPs will stay in the cache longer.

@jayanthvn I am not sure that an integration test case adds much value here, as the number itself is really arbitrary. We have only been using 30 seconds as a default because that's what upstream Kubernetes recommended early on. If we run the service connectivity test with 10 second cooldown and it passes consistently, that isn't going to change our default, so this overhead doesn't seem worth it.

@jchen6585 the callout for the README is a good point. More IPs in cooldown means more IPs need to be allocated, which means more EC2 API calls.

@jdn5126 previously approved these changes Aug 14, 2023
@jayanthvn commented Aug 14, 2023

But IMO, I feel we need to ensure the datastore (DS) functionality is fine, i.e., the number of IPs/total cooldown, or at least that the IPs/ENIs allocated are as expected with the setting. Also, if some change breaks this functionality, we should be able to capture it, which unit tests will not do.

@jchen6585 - Do add our recommendation of 30s, and the reason for it, to the EKS best practices.

@jchen6585 (author)

> But IMO, I feel we need to ensure the datastore (DS) functionality is fine, i.e., the number of IPs/total cooldown, or at least that the IPs/ENIs allocated are as expected with the setting. Also, if some change breaks this functionality, we should be able to capture it, which unit tests will not do.
>
> @jchen6585 - Do add our recommendation of 30s, and the reason for it, to the EKS best practices.

@jayanthvn is this what you are referring to? https://aws.github.io/aws-eks-best-practices/networking/index/
If so, how do I get access to edit it?

@jayanthvn

> > But IMO, I feel we need to ensure the datastore (DS) functionality is fine, i.e., the number of IPs/total cooldown, or at least that the IPs/ENIs allocated are as expected with the setting. Also, if some change breaks this functionality, we should be able to capture it, which unit tests will not do.
> > @jchen6585 - Do add our recommendation of 30s, and the reason for it, to the EKS best practices.
>
> @jayanthvn is this what you are referring to? https://aws.github.io/aws-eks-best-practices/networking/index/ If so, how do I get access to edit it?

You can open a request here - https://github.com/aws/aws-eks-best-practices

@jdn5126 merged commit e6173fe into aws:master Aug 16, 2023