Fix Egress not working with kube-proxy IPVS strictARP mode #3837

Merged (1 commit) on May 31, 2022

Contributor

@xliuxu xliuxu commented May 27, 2022

Check the arp_ignore sysctl value of the transport interface
and start a userspace ARP responder if it has a value other
than 0.
Copy the IPs assigned on the dummy interface to the ARP/NDP
responders during initialization, fixing an issue where the
responders may not work as expected after the agent restarts.

Fixes: #3804

Signed-off-by: Xu Liu xliu2@vmware.com
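
For readers unfamiliar with the check described above, here is a minimal Go sketch of how an agent might decide whether a userspace ARP responder is needed. It is not the code from this PR: `sysctlReader`, `readProcSysctl`, `effectiveARPIgnore`, and the `eth0` interface name are hypothetical. It assumes the documented kernel behaviour for `arp_ignore`, where the larger of the `conf/all` and `conf/<interface>` values applies to ARP requests received on that interface.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// sysctlReader returns the integer value of a sysctl given its path relative
// to /proc/sys. Injecting it keeps the decision logic testable.
type sysctlReader func(name string) (int, error)

// readProcSysctl reads a sysctl from the live /proc/sys tree (Linux only).
func readProcSysctl(name string) (int, error) {
	data, err := os.ReadFile(filepath.Join("/proc/sys", name))
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(data)))
}

// effectiveARPIgnore returns the arp_ignore value that applies to ARP requests
// received on iface: the larger of the "all" and per-interface settings.
func effectiveARPIgnore(read sysctlReader, iface string) (int, error) {
	all, err := read("net/ipv4/conf/all/arp_ignore")
	if err != nil {
		return 0, err
	}
	dev, err := read("net/ipv4/conf/" + iface + "/arp_ignore")
	if err != nil {
		return 0, err
	}
	if all > dev {
		return all, nil
	}
	return dev, nil
}

func main() {
	// "eth0" is a placeholder for the Node's transport interface.
	v, err := effectiveARPIgnore(readProcSysctl, "eth0")
	if err != nil {
		fmt.Println("could not read arp_ignore:", err)
		return
	}
	// Any value other than 0 means the kernel may not answer ARP requests
	// for IPs that live on a different (dummy) interface.
	fmt.Printf("arp_ignore=%d, userspace ARP responder needed: %t\n", v, v > 0)
}
```

Passing the reader as a function is a design choice for the sketch only; it lets the max-of-two-values logic be exercised without touching `/proc/sys`.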

@xliuxu xliuxu requested review from jianjuns and tnqn May 27, 2022 08:59
codecov-commenter commented May 27, 2022

Codecov Report

Merging #3837 (896dc3a) into main (5efac86) will decrease coverage by 16.50%.
The diff coverage is 35.03%.

❗ Current head 896dc3a differs from pull request most recent head d97a262. Consider uploading reports for the commit d97a262 to get more accurate results

@@             Coverage Diff             @@
##             main    #3837       +/-   ##
===========================================
- Coverage   62.15%   45.64%   -16.51%     
===========================================
  Files         281      253       -28     
  Lines       40094    36933     -3161     
===========================================
- Hits        24919    16859     -8060     
- Misses      13206    18383     +5177     
+ Partials     1969     1691      -278     
Flag Coverage Δ
e2e-tests 45.64% <35.03%> (?)
kind-e2e-tests ?
unit-tests ?

Impacted Files Coverage Δ
pkg/agent/config/node_config.go 88.88% <ø> (-11.12%) ⬇️
pkg/agent/nodeportlocal/k8s/annotations.go 84.44% <ø> (ø)
pkg/agent/nodeportlocal/npl_agent_init.go 57.14% <ø> (ø)
.../certificatesigningrequest/approving_controller.go 0.00% <0.00%> (ø)
pkg/controller/certificatesigningrequest/common.go 0.00% <0.00%> (ø)
...er/certificatesigningrequest/ipsec_csr_approver.go 0.00% <0.00%> (ø)
...catesigningrequest/ipsec_csr_signing_controller.go 0.00% <0.00%> (ø)
pkg/features/antrea_features.go 11.11% <ø> (ø)
pkg/agent/controller/egress/egress_controller.go 69.67% <28.57%> (-4.60%) ⬇️
pkg/agent/ipassigner/ip_assigner_linux.go 44.02% <35.38%> (-15.11%) ⬇️
... and 180 more

Contributor Author

xliuxu commented May 27, 2022

I have verified the Egress feature with this patch locally with IPVS in strictARP mode and it works as expected.

Member

@tnqn tnqn left a comment

Thanks for fixing it. One nit:

s/Fix Egress not work with .../Fix Egress not working with/

if a.dummyDevice == nil && a.arpResponder != nil {
// Start the ARP responder if the dummy device does not exist or
// arp_ignore of the transport interface has value other than 0.
if (a.arpIgnore > 0 || a.dummyDevice == nil) && a.arpResponder != nil {
Member

Should we move this to initialization: create arpResponder only when it's required, and run it only when it exists?
That avoids creating an instance we don't use and saves a member variable.
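
The suggestion above can be sketched roughly as follows. This is a simplified illustration, not the merged code: `responder`, `noopResponder`, `needsARPResponder`, and `newIPAssigner` are hypothetical names, and the real constructor takes interface names rather than booleans. The point is only that the "is a responder needed" decision is made once at construction time, so `Run` just checks for nil.

```go
package main

import "fmt"

// responder is a hypothetical stand-in for the real ARP responder type.
type responder interface {
	Run(stopCh <-chan struct{})
}

// noopResponder does nothing useful; it exists only to make the sketch runnable.
type noopResponder struct{}

func (noopResponder) Run(stopCh <-chan struct{}) { <-stopCh }

type ipAssigner struct {
	// arpResponder stays nil when the kernel can answer ARP requests itself.
	arpResponder responder
}

// needsARPResponder mirrors the condition in the diff above: a responder is
// required if there is no dummy device or if arp_ignore is non-zero.
func needsARPResponder(dummyDeviceExists bool, arpIgnore int) bool {
	return !dummyDeviceExists || arpIgnore > 0
}

// newIPAssigner creates the responder only when it is required, so no extra
// arpIgnore field has to be stored on the assigner.
func newIPAssigner(dummyDeviceExists bool, arpIgnore int) *ipAssigner {
	a := &ipAssigner{}
	if needsARPResponder(dummyDeviceExists, arpIgnore) {
		a.arpResponder = noopResponder{}
	}
	return a
}

// Run starts the responder only if one was created.
func (a *ipAssigner) Run(stopCh <-chan struct{}) {
	if a.arpResponder != nil {
		go a.arpResponder.Run(stopCh)
	}
}

func main() {
	for _, arpIgnore := range []int{0, 1} {
		a := newIPAssigner(true, arpIgnore)
		fmt.Printf("arp_ignore=%d -> responder created: %t\n", arpIgnore, a.arpResponder != nil)
	}
}
```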

@tnqn tnqn added action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes. labels May 27, 2022
@tnqn tnqn added this to the Antrea v1.7 release milestone May 27, 2022
@xliuxu xliuxu force-pushed the arp_responder_ipvs branch from 55ee05d to 41cccc1 Compare May 27, 2022 10:34
@xliuxu xliuxu changed the title Fix Egress not work with kube-proxy IPVS strictARP mode Fix Egress not work withing kube-proxy IPVS strictARP mode May 27, 2022
@xliuxu xliuxu changed the title Fix Egress not work withing kube-proxy IPVS strictARP mode Fix Egress not working with kube-proxy IPVS strictARP mode May 27, 2022
tnqn previously approved these changes May 27, 2022
Member

@tnqn tnqn left a comment

LGTM

Member

tnqn commented May 27, 2022

/test-all

Contributor Author

xliuxu commented May 27, 2022

@tnqn I found a subtle issue: after the agent restarts, the ARP responder will not work. Because the IPs already exist on the dummy interface, AssignIP is not called for them, so they are never added to the ARP responder. I think this also applies to the IPv6 NDP responder. We may need to copy the assigned IPs to the ARP/NDP responders to handle agent restarts.

Member

tnqn commented May 27, 2022

> @tnqn I found a subtle issue: after the agent restarts, the ARP responder will not work. Because the IPs already exist on the dummy interface, AssignIP is not called for them, so they are never added to the ARP responder. I think this also applies to the IPv6 NDP responder. We may need to copy the assigned IPs to the ARP/NDP responders to handle agent restarts.

good catch!
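
To illustrate the restart case discussed above: on startup, the IPs already present on the dummy interface need to be handed to the responders, IPv4 to the ARP responder and IPv6 to the NDP responder. The following is a minimal sketch of that partitioning step using only the standard library. `splitByFamily` and the sample addresses are hypothetical, and how the real agent lists the addresses on the dummy device is not shown here.

```go
package main

import (
	"fmt"
	"net"
)

// splitByFamily partitions the IP strings found on the dummy device into the
// IPv4 set (for the ARP responder) and the IPv6 set (for the NDP responder).
// Strings that do not parse as IPs are skipped.
func splitByFamily(assigned []string) (v4, v6 []net.IP) {
	for _, s := range assigned {
		ip := net.ParseIP(s)
		if ip == nil {
			continue
		}
		if ip4 := ip.To4(); ip4 != nil {
			v4 = append(v4, ip4)
		} else {
			v6 = append(v6, ip)
		}
	}
	return v4, v6
}

func main() {
	// Pretend these were read back from the dummy interface after a restart.
	v4, v6 := splitByFamily([]string{"10.10.0.5", "fd00::5"})
	fmt.Println("seed ARP responder with:", v4)
	fmt.Println("seed NDP responder with:", v6)
}
```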

ip := net.ParseIP(assigned)
var err error
if utilnet.IsIPv4(ip) && a.arpResponder != nil {
err = a.arpResponder.AddIP(ip)
Member

I found that this will unconditionally trigger a GARP, which may disrupt existing connections if this node already had an IP assigned before the agent restarted. Should we add a ReplaceIPs method, meant to be called on initialization, that only adds the desired IPs to the responders?

Contributor Author

Added the ReplaceIPs interface to the responder. Thanks!
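
The difference between the two calls discussed in this thread can be sketched as below. This is an illustration under assumptions, not the responder from this PR: `arpResponder`, `newARPResponder`, `Has`, and the `announce` callback (standing in for sending a gratuitous ARP) are hypothetical, and IPs are kept as plain strings for brevity. `AddIP` announces the new IP, while `ReplaceIPs` silently installs the full desired set, which is what an agent would want when re-seeding after a restart.

```go
package main

import (
	"fmt"
	"sync"
)

// arpResponder tracks the IPs it should answer ARP requests for.
type arpResponder struct {
	mu       sync.Mutex
	ips      map[string]struct{}
	announce func(ip string) // stands in for sending a gratuitous ARP
}

func newARPResponder(announce func(ip string)) *arpResponder {
	return &arpResponder{ips: map[string]struct{}{}, announce: announce}
}

// AddIP starts answering for ip and announces it to the network.
func (r *arpResponder) AddIP(ip string) {
	r.mu.Lock()
	r.ips[ip] = struct{}{}
	r.mu.Unlock()
	r.announce(ip)
}

// ReplaceIPs swaps in the full desired set without announcing anything, so
// IPs that were already advertised before a restart cause no new GARP.
func (r *arpResponder) ReplaceIPs(desired []string) {
	next := make(map[string]struct{}, len(desired))
	for _, ip := range desired {
		next[ip] = struct{}{}
	}
	r.mu.Lock()
	r.ips = next
	r.mu.Unlock()
}

// Has reports whether the responder currently answers for ip.
func (r *arpResponder) Has(ip string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.ips[ip]
	return ok
}

func main() {
	r := newARPResponder(func(ip string) { fmt.Println("GARP for", ip) })
	r.ReplaceIPs([]string{"10.10.0.5"}) // agent restart: seeded silently
	r.AddIP("10.10.0.6")                // new assignment: announced
	fmt.Println(r.Has("10.10.0.5"), r.Has("10.10.0.6"))
}
```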

@@ -60,11 +61,19 @@ func NewIPAssigner(nodeTransportInterface string, dummyDeviceName string) (*ipAs
assignedIPs: sets.NewString(),
}
if ipv4 != nil {
arpResonder, err := responder.NewARPResponder(externalInterface)
// Create an ARP responder if the dummy device does not exist or
Contributor

Question: why don't we always use the ARP responder and stop assigning IPs to the dummy device?

Member

Egress needs the Egress IPs to be configured on the host because they are used as tunnel endpoints.

Contributor

OK. Could we add a comment here? The implication is not easy to figure out.

@xliuxu xliuxu force-pushed the arp_responder_ipvs branch 2 times, most recently from 2ded772 to f155893 Compare May 28, 2022 02:22
Contributor

@jianjuns jianjuns left a comment

Nits

@@ -60,11 +61,22 @@ func NewIPAssigner(nodeTransportInterface string, dummyDeviceName string) (*ipAs
assignedIPs: sets.NewString(),
}
if ipv4 != nil {
arpResonder, err := responder.NewARPResponder(externalInterface)
// For Egress scenario, the external IPs should always be present on the dummy
// interface as they are used as tunnel endpoints. If arp_ignore it set to values
Contributor

it set -> is set

values -> a value

Contributor Author

done.

// For Egress scenario, the external IPs should always be present on the dummy
// interface as they are used as tunnel endpoints. If arp_ignore it set to values
// other than 0, the host will not reply to ARP requests received on the transport
// interface and the target IPs are assigned on the dummy interface. So a userspace
Contributor

"when the target IPs"?

Contributor Author

done.

@@ -60,11 +61,22 @@ func NewIPAssigner(nodeTransportInterface string, dummyDeviceName string) (*ipAs
assignedIPs: sets.NewString(),
}
if ipv4 != nil {
arpResonder, err := responder.NewARPResponder(externalInterface)
// For Egress scenario, the external IPs should always be present on the dummy
Contributor

the Egress scenario

Contributor Author

done. thanks for the review!

@xliuxu xliuxu force-pushed the arp_responder_ipvs branch 3 times, most recently from 95885b9 to 8210a1d Compare May 30, 2022 01:11
@xliuxu xliuxu force-pushed the arp_responder_ipvs branch 3 times, most recently from 9c7b2d1 to 339d536 Compare May 30, 2022 15:28
tnqn previously approved these changes May 31, 2022
Member

@tnqn tnqn left a comment

LGTM, thanks

Member

tnqn commented May 31, 2022

/test-all
/test-ipv6-e2e
/test-ipv6-only-e2e

Contributor Author

xliuxu commented May 31, 2022

/test-all
/test-ipv6-e2e
/test-ipv6-only-e2e

Check the arp_ignore sysctl value of the transport interface
and start a userspace ARP responder if it has a value other
than 0.
Copy the assigned IPs on the dummy interface to the ARP/NDP
responder on initializing to fix an issue that the responders
may not work as expected after the agent restarts.

Fixes: antrea-io#3804

Signed-off-by: Xu Liu <xliu2@vmware.com>
@xliuxu xliuxu force-pushed the arp_responder_ipvs branch from 4f31d03 to d97a262 Compare May 31, 2022 08:24
Contributor Author

xliuxu commented May 31, 2022

/test-all
/test-ipv6-e2e
/test-ipv6-only-e2e

Member

tnqn commented May 31, 2022

@xliuxu Was any change made in the latest version that I should review? I couldn't find one.

Contributor Author

xliuxu commented May 31, 2022

@tnqn The latest change is for the integration tests of the IP assigner.

Member

@tnqn tnqn left a comment

LGTM

Member

tnqn commented May 31, 2022

/test-conformance

@jianjuns jianjuns merged commit cc4fe0b into antrea-io:main May 31, 2022
@xliuxu xliuxu deleted the arp_responder_ipvs branch June 1, 2022 00:52
Development

Successfully merging this pull request may close these issues.

Make Egress work with kube-proxy IPVS strictARP mode
4 participants