-
Notifications
You must be signed in to change notification settings - Fork 386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VM Agent] Bugfix: antrea-agent failed to delete ExternalNode #5191
Conversation
/test-vm-e2e |
// name is recovered. So the ips and routes in "adapterConfig" are actually read from the uplink and no need to | ||
// move the configurations back. The issue was seen on VM with RHEL 8.4 on azure cloud. | ||
if !hostInterfaceExists(uplinkIfName) { | ||
klog.InfoS("Uplink is not existing on the host, return") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add the uplinkIfName in the log message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/LGTM
/test-vm-e2e |
// try after the error is returned, at this time the host internal interface is already deleted, and the uplink's | ||
// name is recovered. So the ips and routes in "adapterConfig" are actually read from the uplink and no need to | ||
// move the configurations back. The issue was seen on VM with RHEL 8.4 on azure cloud. | ||
if !hostInterfaceExists(uplinkIfName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this run in the beginning of the function since it means running the other code is dummy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
// name is recovered. So the ips and routes in "adapterConfig" are actually read from the uplink and no need to | ||
// move the configurations back. The issue was seen on VM with RHEL 8.4 on azure cloud. | ||
if !hostInterfaceExists(uplinkIfName) { | ||
klog.InfoS("Uplink is not existing on the host, return", "uplinkIfName", uplinkIfName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.InfoS("Uplink is not existing on the host, return", "uplinkIfName", uplinkIfName) | |
klog.InfoS("The interface with uplink name did not exist on the host, skipping its recovery", "uplinkIfName", uplinkIfName) |
to avoid ambiguity (It's not that uplink doesn't exist) and grammar issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
The issue is seen on a RHEL 8.4 VM on azure cloud, which is configured with dhclient to manage the network interface. The root cause is antrea-agent fails to recover the IP/Routes from host internal interface to uplink after ExternalNode is deleted, because the added IP/Routes is deleted or conflicted with dhclient configuration. Then in the continuous retry with ExternalNode deletion, antrea-agent is blocking at the precheck on the existentance of host internal interface always returns true as the uplink's name is already recovered. Signed-off-by: wenyingd <wenyingd@vmware.com>
/test-vm-e2e |
1 similar comment
/test-vm-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-vm-e2e |
// This is for issue #5111 (https://github.com/antrea-io/antrea/issues/5111), which may happen if an error occurs | ||
// when moving the configuration back from host internal interface to uplink. This logic is run in the second | ||
// try after the error is returned, at this time the host internal interface is already deleted, and the uplink's | ||
// name is recovered. So the ips and routes in "adapterConfig" are actually read from the uplink and no need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may miss some thing here. If the error occurs, the uplink's name is recovered. But ip and routes can still be wrong, right? Our current logic can not handle this case and make it right later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For your mentioned case, we can not fix it in this change, and it may also break agent without this change.
The motivation for this change is issue #5111 , in which ip address/routes configuration are failed because dhclient is modifying the interface in the meanwhile. Then antrea-agent may fail in the first try, and it still blocks at the check on the existence host internal interface (antrea thought the interface was supposed to not exist because it was removed from OVS and the uplink was not renamed yet), but the fact is the uplink is already renamed. As for the ip address/routes are actually configured back by dhclient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the background details. It is clear to me now.
/skip-conformance |
The issue is seen on a RHEL 8.4 VM on azure cloud, which is configured with dhclient to manage the network interface. The root cause is antrea-agent fails to recover the IP/Routes from host internal interface to uplink after ExternalNode is deleted, because the added IP/Routes is deleted or conflicted with dhclient configuration. Then in the continuous retry with ExternalNode deletion, antrea-agent is blocking at the precheck on the existentance of host internal interface always returns true as the uplink's name is already recovered.
Fix: #5111