Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reject loop issue and add error handling #3569

Merged
merged 2 commits into from
Apr 7, 2022

Conversation

GraysonWu
Copy link
Contributor

Fixes #3559
Skip the reject response generation when neither src nor dst are on
current Node.

Re-write the MAC address for RejectPodLocal reject type no matter
AntreaIPAM is on or not. And send the packetOut directly to the
dstPod instead of L3Forwarding table.

Signed-off-by: wgrayson wgrayson@vmware.com

@codecov-commenter
Copy link

codecov-commenter commented Mar 31, 2022

Codecov Report

Merging #3569 (a5de58d) into main (ac50657) will decrease coverage by 18.63%.
The diff coverage is 34.83%.

❗ Current head a5de58d differs from pull request most recent head 29be1e2. Consider uploading reports for the commit 29be1e2 to get more accurate results

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #3569       +/-   ##
===========================================
- Coverage   64.15%   45.52%   -18.64%     
===========================================
  Files         278      362       +84     
  Lines       27833    50754    +22921     
===========================================
+ Hits        17857    23105     +5248     
- Misses       8061    25439    +17378     
- Partials     1915     2210      +295     
Flag Coverage Δ
e2e-tests 48.68% <34.83%> (?)
integration-tests 38.16% <ø> (?)
kind-e2e-tests ?
unit-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/agent_linux.go 5.55% <ø> (+1.23%) ⬆️
pkg/agent/config/node_config.go 88.88% <ø> (-11.12%) ⬇️
pkg/agent/openflow/multicast.go 0.00% <0.00%> (ø)
pkg/controller/ipam/validate.go 0.00% <0.00%> (-82.26%) ⬇️
pkg/agent/openflow/pipeline.go 68.88% <7.14%> (-5.29%) ⬇️
pkg/features/antrea_features.go 11.11% <10.00%> (-5.56%) ⬇️
pkg/agent/proxy/proxier.go 49.07% <34.78%> (-13.65%) ⬇️
pkg/agent/openflow/service.go 69.76% <42.30%> (-13.91%) ⬇️
pkg/agent/openflow/egress.go 69.44% <50.00%> (-14.77%) ⬇️
pkg/agent/controller/networkpolicy/reject.go 76.47% <55.00%> (-11.44%) ⬇️
... and 395 more

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there was a regression, we should add a test so that it doesn't happen again

@GraysonWu
Copy link
Contributor Author

Do you have any good ideas about how to detect the loop in the test?

@antoninbas
Copy link
Contributor

Do you have any good ideas about how to detect the loop in the test?

Isn't the reject loop causing the Reject functionality to break in some way?

If not, you could think about unit testing this function to make sure it generates the correct Reject packets for a few interesting input PacketIns. Based on the contents of that patch, it seems the loop is caused by a bug in the function and is not related to the OVS flows.

@GraysonWu
Copy link
Contributor Author

Do you have any good ideas about how to detect the loop in the test?

Isn't the reject loop causing the Reject functionality to break in some way?

If not, you could think about unit testing this function to make sure it generates the correct Reject packets for a few interesting input PacketIns. Based on the contents of that patch, it seems the loop is caused by a bug in the function and is not related to the OVS flows.

My bad. I'm stupid. No matter if the loop will crash the agent or not. The test client won't receive the reject response in a loop situation. Then the test will fail.

@GraysonWu
Copy link
Contributor Author

Added tests.

Fixes antrea-io#3559
Skip the reject response generation when neither src nor dst are on
current Node.

Re-write the MAC address for `RejectPodLocal` reject type no matter
AntreaIPAM is on or not. And send the packetOut directly to the
dstPod instead of L3Forwarding table.

Signed-off-by: wgrayson <wgrayson@vmware.com>
tnqn
tnqn previously approved these changes Apr 1, 2022
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one question

test/e2e/antreapolicy_test.go Show resolved Hide resolved
pkg/agent/controller/networkpolicy/reject.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/reject.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/reject.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/reject.go Outdated Show resolved Hide resolved
pkg/agent/controller/networkpolicy/reject.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
test/e2e/antreapolicy_test.go Outdated Show resolved Hide resolved
Signed-off-by: wgrayson <wgrayson@vmware.com>
@GraysonWu
Copy link
Contributor Author

/test-all

@GraysonWu
Copy link
Contributor Author

/test-integration

@GraysonWu GraysonWu requested review from antoninbas and tnqn April 4, 2022 21:21
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/agent/controller/networkpolicy/reject.go Outdated Show resolved Hide resolved
@antoninbas
Copy link
Contributor

/test-e2e

@antoninbas
Copy link
Contributor

@GraysonWu looks like jenkins-e2e failed twice, maybe you need to take a look

@GraysonWu
Copy link
Contributor Author

GraysonWu commented Apr 6, 2022

@GraysonWu looks like jenkins-e2e failed twice, maybe you need to take a look

Those tests failed in apply antrea phase. I think they shouldn't be introduced by this PR. And I'm unable to reproduce it in my env. Opened PR #3590 to just run those failed tests in jenkins. Will update here after test complete.

@GraysonWu
Copy link
Contributor Author

Tests passed when running alone. One more retry here.
/test-e2e

@GraysonWu
Copy link
Contributor Author

/test-e2e

@tnqn tnqn added the kind/bug Categorizes issue or PR as related to a bug. label Apr 7, 2022
@tnqn tnqn added action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes. labels Apr 7, 2022
@tnqn tnqn merged commit 5e23ce9 into antrea-io:main Apr 7, 2022
@antoninbas
Copy link
Contributor

@GraysonWu could you backport this to 1.6?

@GraysonWu
Copy link
Contributor Author

Backported to 1.6 and 1.5

antoninbas pushed a commit that referenced this pull request Apr 13, 2022
…andling (#3601)

* Fix reject loop issue and add error handling

Fixes #3559
Skip the reject response generation when neither src nor dst are on
current Node.

Re-write the MAC address for `RejectPodLocal` reject type no matter
AntreaIPAM is on or not. And send the packetOut directly to the
dstPod instead of L3Forwarding table.

Signed-off-by: wgrayson <wgrayson@vmware.com>

* Address comments

Signed-off-by: wgrayson <wgrayson@vmware.com>

* Change test function parameters matching release-1.5 format

Signed-off-by: wgrayson <wgrayson@vmware.com>
tnqn pushed a commit that referenced this pull request Apr 14, 2022
…andling (#3600)

Fixes #3559

Skip the reject response generation when neither src nor dst are on
current Node.

Re-write the MAC address for `RejectPodLocal` reject type no matter
AntreaIPAM is on or not. And send the packetOut directly to the
dstPod instead of L3Forwarding table.

Signed-off-by: wgrayson <wgrayson@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/backport Indicates a PR that requires backports. action/release-note Indicates a PR that should be included in release notes. kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

antrea 1.6 agent crashing [signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x1bb2f45] │
5 participants