Handle delays tied to V6 interfaces #1631
@@ -43,6 +44,8 @@ const (
	fromContainerRulePriority = 1536
	// Main routing table number
	mainRouteTable = unix.RT_TABLE_MAIN

	WAIT_INTERVAL = 50 * time.Millisecond
Good find! Not sure if we plan to support anything beyond AL2 for the initial cut, but it might be worth checking whether this delay holds on other distributions as well.
Side note: Going forward we will have multiple eth attachments on a single pod with 5G to separate out different flows. Having this delay as a configurable option would help until we characterize the actual number for different use cases.
So, the total wait time is actually `10s`. `WAIT_INTERVAL` is essentially how long we wait before checking the status again. I'm assuming `10s` might be long enough, and the function in the `ip` package that most of the CNI plugins rely on caps it at `10s` as well. I see that it usually takes between 1-2s in my testing, but if we do run into a specific requirement/use case, I guess we can definitely consider making the upper bound configurable.
Fwiw, I don't think it needs to be configurable - we should be able to just 'wait long enough' for every use case, and there's no benefit from timing out and aborting aggressively.
Re other distros: It would be odd to pick something wildly different from the current Linux kernel default values, and I expect the delay will always be around the few-seconds mark. One alternative here is to use 'optimistic DAD', which allows userspace to use the address for some purposes while it is still tentative. A better alternative is to just disable DAD altogether on veth interfaces, because we control both ends anyway so there are no surprises here. Meh, at best it gains 1-2s, and we can come back to this later. Even if we disable DAD on veth, we're still going to want this function at some point for "real" network interfaces (e.g. trunk, EFA, ENI+ipvlan).
We could also remove the above timer by using netlink events rather than polling (see AddrSubscribe). Again, meh, we can come back to this if this 50ms poll ever becomes an issue.
(nice, code style comments only)
Co-authored-by: Angus Lees <gus@inodes.org>
What type of PR is this?
bug
What does this PR do / Why do we need it:
V6 addresses assigned to an interface might take a while to transition from the `tentative` state to the `stable` state, as all addresses need to go through Duplicate Address Detection (DAD). This PR introduces a check to make sure the address is in the `stable` state before CNI returns.
Testing done on this change:
Verified that no packet loss is observed right after pod boot-up due to the issue documented above.
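As an illustration of the tentative-versus-stable distinction described above, the sketch below classifies one line of `/proc/net/if_inet6` by its flags column. This is not the check the PR implements; `dadState` is a hypothetical helper, and the flag values are the `IFA_F_*` constants from `linux/if_addr.h`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IFA_F_* values from linux/if_addr.h.
const (
	ifaFDadFailed = 0x08
	ifaFTentative = 0x40
)

// dadState classifies one line of /proc/net/if_inet6, whose columns are:
// address, ifindex, prefix length, scope, flags (all hex), interface name.
func dadState(line string) (iface, state string, err error) {
	f := strings.Fields(line)
	if len(f) != 6 {
		return "", "", fmt.Errorf("unexpected field count %d", len(f))
	}
	flags, err := strconv.ParseUint(f[4], 16, 8)
	if err != nil {
		return "", "", fmt.Errorf("bad flags %q: %w", f[4], err)
	}
	switch {
	case flags&ifaFDadFailed != 0:
		// Checked first because a failed address may also carry the tentative bit.
		return f[5], "dadfailed", nil
	case flags&ifaFTentative != 0:
		return f[5], "tentative", nil
	default:
		return f[5], "stable", nil
	}
}

func main() {
	// Sample line with flags 0x40, i.e. DAD has not completed yet.
	line := "20010db8000000000000000000000001 02 40 00 40     eth0"
	iface, state, err := dadState(line)
	fmt.Println(iface, state, err)
}
```

Traffic sourced from an address that is still tentative is not usable yet, which is consistent with the packet loss right after pod boot-up that this PR addresses.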
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.