Enhance error handling of policy reconciler #1667
/test-all
Codecov Report
@@            Coverage Diff             @@
##           master    #1667      +/-   ##
==========================================
+ Coverage   63.31%   64.42%   +1.10%
==========================================
  Files         170      181      +11
  Lines       14250    15494    +1244
==========================================
+ Hits         9023     9982     +959
- Misses       4292     4484     +192
- Partials      935     1028      +93
LGTM, but I need a clarification regarding the necessity of forgetRuleImmediately. I understand why it could be an "optimization" to release the rule ID immediately when there is an error, but I don't understand why it is necessary for code correctness.
LGTM. I suspect @antoninbas and @Dyanngg may be better reviewers than I am here.
@antoninbas @suwang48404 I just realized that the BatchInstallPolicyRuleFlows method is not atomic: it doesn't roll back some of the conjunction cache in the agent upon failure (confirmed with @Dyanngg). So using it would introduce another bug if there is a transient error in OVS. I can't think of a simple change that would make the method atomic, so I have to use another approach that tracks each OpenFlow rule's realization status... Let me know if you have a good idea for this problem.
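To make the "track each rule's realization status" idea concrete, here is a minimal sketch. It is not the code in this PR: `ruleTracker`, `sync`, and `errTransient` are hypothetical names, and the `install` callback merely stands in for whatever installs one rule's flows. The point is only that when the outcome is recorded per rule, a failed rule can be retried without rolling back the rules that succeeded.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical stand-in for a transient OVS failure.
var errTransient = errors.New("transient OVS error")

// ruleTracker is a hypothetical helper (not Antrea's actual type) that
// remembers, per rule ID, whether the rule's flows were installed.
type ruleTracker struct {
	realized map[string]bool
}

func newRuleTracker() *ruleTracker {
	return &ruleTracker{realized: make(map[string]bool)}
}

// sync installs every rule that is not yet realized, one rule at a time.
// A failure only affects the failing rule, so nothing has to be rolled back
// for the rules that succeeded. It returns the IDs that still need a retry.
func (t *ruleTracker) sync(ruleIDs []string, install func(id string) error) []string {
	var failed []string
	for _, id := range ruleIDs {
		if t.realized[id] {
			continue // already installed by an earlier attempt
		}
		if err := install(id); err != nil {
			failed = append(failed, id)
			continue
		}
		t.realized[id] = true
	}
	return failed
}

func main() {
	t := newRuleTracker()
	attempts := 0
	// flakyInstall simulates a transient error on the second call only.
	flakyInstall := func(id string) error {
		attempts++
		if attempts == 2 {
			return errTransient
		}
		return nil
	}
	rules := []string{"rule-a", "rule-b", "rule-c"}
	fmt.Println("first pass, failed:", t.sync(rules, flakyInstall))
	fmt.Println("second pass, failed:", t.sync(rules, flakyInstall))
}
```

In this sketch the second pass only touches the rule that failed; the trade-off compared with an atomic batch is one install call per rule.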
45695a6 to fa9bb85
/test-all
34743cf to a12ab7b
/test-all
I would think that addressing this TODO https://github.com/vmware-tanzu/antrea/blob/8b7430cc193bd657ca00277a417406ca9558df9b/pkg/agent/openflow/network_policy.go#L848 would be the right thing to do, but I am not super familiar with that code TBH. It seems that you went with a different approach. Does it cover all possible error cases when adding flows, or just the one Su ran into?
I would say what Su discovered is the only possible error case I can think of as of now that the |
@antoninbas I agree that making the This patch fixes the case Su reported, it should be the only case could happen if |
minor comments, otherwise LGTM
@antoninbas addressed your comments, thanks for the review.
/test-all
The reconciler might fail to install flows due to a transient OVS error. When it retries, the "update" method is called, and in that case it should add the IPBlocks to the PolicyRule. This patch fixes it and adds a unit test for it.
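The retry path described above can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the reconciler in this patch: `rule`, `reconciler`, `installAll`, and `errTransient` are hypothetical, and the model assumes a rule is recorded in `lastRealized` before its flows are installed, so a retry after a failed add goes through update.

```go
package main

import (
	"errors"
	"fmt"
)

// All names below are hypothetical and only illustrate the retry path.
var errTransient = errors.New("transient OVS error")

// rule is the desired state: Pod addresses plus IPBlock CIDRs.
type rule struct {
	ID       string
	PodIPs   []string
	IPBlocks []string
}

type reconciler struct {
	lastRealized map[string]bool     // rules that went through add at least once
	installed    map[string][]string // rule ID -> addresses whose flows exist
	install      func(id string, addrs []string) error
}

func newReconciler(install func(id string, addrs []string) error) *reconciler {
	return &reconciler{
		lastRealized: make(map[string]bool),
		installed:    make(map[string][]string),
		install:      install,
	}
}

// Reconcile calls add for a rule seen for the first time and update otherwise.
func (r *reconciler) Reconcile(ru rule) error {
	if !r.lastRealized[ru.ID] {
		r.lastRealized[ru.ID] = true // assumption: recorded before installing
		return r.installAll(ru)      // the "add" path
	}
	return r.update(ru)
}

// installAll installs the complete rule: Pod addresses and IPBlocks.
func (r *reconciler) installAll(ru rule) error {
	addrs := append(append([]string{}, ru.PodIPs...), ru.IPBlocks...)
	if err := r.install(ru.ID, addrs); err != nil {
		return err
	}
	r.installed[ru.ID] = addrs
	return nil
}

// update normally installs only newly added Pod addresses. If the earlier
// add failed, no flows exist yet, so it must install the complete rule,
// IPBlocks included; otherwise the IPBlocks would never be installed.
func (r *reconciler) update(ru rule) error {
	if _, ok := r.installed[ru.ID]; !ok {
		return r.installAll(ru)
	}
	have := make(map[string]bool)
	for _, a := range r.installed[ru.ID] {
		have[a] = true
	}
	var added []string
	for _, ip := range ru.PodIPs {
		if !have[ip] {
			added = append(added, ip)
		}
	}
	if len(added) == 0 {
		return nil
	}
	if err := r.install(ru.ID, added); err != nil {
		return err
	}
	r.installed[ru.ID] = append(r.installed[ru.ID], added...)
	return nil
}

func main() {
	calls := 0
	rec := newReconciler(func(id string, addrs []string) error {
		calls++
		if calls == 1 {
			return errTransient // the first attempt fails
		}
		return nil
	})
	ru := rule{ID: "r1", PodIPs: []string{"10.0.0.1"}, IPBlocks: []string{"192.168.0.0/16"}}
	fmt.Println("first attempt:", rec.Reconcile(ru))
	fmt.Println("retry:", rec.Reconcile(ru))
	fmt.Println("installed:", rec.installed["r1"])
}
```

Without the first branch in `update`, the retry in this model would install only the Pod address and silently skip the IPBlock.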
/test-all
Fixes #1635