Fix race conditions in NetworkPolicyController #4028

tnqn · 2022-07-18T14:56:23Z

There were a few race conditions in NetworkPolicyController:

* An AppliedToGroup or AddressGroup that is still in use may be removed in situations like this:
  1. addANP creates a group for ANP A;
  2. addNetworkPolicy reuses the group for KNP B and is about to create an internal NetworkPolicy;
  3. deleteANP deletes the group for ANP A because, at that moment, no other internal NetworkPolicies are using the group;
  4. addNetworkPolicy commits the internal NetworkPolicy for KNP B to storage, but the group no longer exists.

* An Antrea-native NetworkPolicy may be out of date in situations like this:
  1. An ACNP event is received; `updateCNP` calculates the new internal NetworkPolicy for the ACNP and is about to commit it to storage;
  2. A ClusterGroup event triggers an update of the ACNP via triggerCNPUpdates;
  3. triggerCNPUpdates calls reprocessCNP, which updates the internal NetworkPolicy for the ACNP and commits it to storage;
  4. `updateCNP` from step 1 commits its internal NetworkPolicy to storage, overwriting the update triggered by the ClusterGroup event.

The second one caused the test case
"TestGroupNoK8sNP/Case=ACNPNestedClusterGroup" to flake.

To resolve the race conditions completely and make NetworkPolicy
handling less error-prone, this patch refactors NetworkPolicyController:

* Event handlers no longer update the internal NetworkPolicy storage
  directly; they only trigger a resync of the affected policies, which
  ensures that at most one worker handles a given internal NetworkPolicy
  at any moment.
* Ensure atomicity when updating internal NetworkPolicies and creating or
  deleting AddressGroups and AppliedToGroups.

Duplicate code and tests are deleted with the refactoring.

Signed-off-by: Quan Tian <qtian@vmware.com>

Fixes #4127
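Making event handlers only enqueue fixes the second race because the work queue never hands the same key to two workers at once. client-go's workqueue (k8s.io/client-go/util/workqueue) does this with a "dirty" set of pending keys and a "processing" set of keys held by workers. The sketch below reimplements only those semantics with a hypothetical `keyedQueue` type (Go 1.18+ generics; no rate limiting or shutdown, and `Get` blocks forever on an empty queue), so it is an illustration, not client-go's or Antrea's code: an ACNP update and a ClusterGroup-triggered resync of the same policy collapse into one pending item, and a re-Add made while a worker holds the key waits until that worker calls `Done`.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedQueue is a hypothetical, minimal deduplicating work queue with the
// same per-key guarantees as client-go's workqueue.
type keyedQueue[K comparable] struct {
	mu         sync.Mutex
	cond       *sync.Cond
	queue      []K
	dirty      map[K]struct{} // keys waiting to be processed
	processing map[K]struct{} // keys currently held by a worker
}

func newKeyedQueue[K comparable]() *keyedQueue[K] {
	q := &keyedQueue[K]{dirty: map[K]struct{}{}, processing: map[K]struct{}{}}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Add marks key as needing processing. Repeated Adds collapse into one item,
// and a key held by a worker is only re-queued when that worker calls Done.
func (q *keyedQueue[K]) Add(key K) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if _, ok := q.dirty[key]; ok {
		return
	}
	q.dirty[key] = struct{}{}
	if _, ok := q.processing[key]; ok {
		return
	}
	q.queue = append(q.queue, key)
	q.cond.Signal()
}

// Get blocks until a key is available and hands it to exactly one caller.
func (q *keyedQueue[K]) Get() K {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.queue) == 0 {
		q.cond.Wait()
	}
	key := q.queue[0]
	q.queue = q.queue[1:]
	delete(q.dirty, key)
	q.processing[key] = struct{}{}
	return key
}

// Done releases key; if it was re-added meanwhile, it goes back on the queue.
func (q *keyedQueue[K]) Done(key K) {
	q.mu.Lock()
	defer q.mu.Unlock()
	delete(q.processing, key)
	if _, ok := q.dirty[key]; ok {
		q.queue = append(q.queue, key)
		q.cond.Signal()
	}
}

// Len returns the number of keys ready to be handed out.
func (q *keyedQueue[K]) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.queue)
}

func main() {
	q := newKeyedQueue[string]()
	q.Add("acnp-1")
	q.Add("acnp-1")      // collapses into the pending item
	fmt.Println(q.Len()) // 1
	k := q.Get()
	q.Add("acnp-1")      // a ClusterGroup event arrives while a worker holds the key
	fmt.Println(q.Len()) // 0: not handed to a second worker
	q.Done(k)
	fmt.Println(q.Len()) // 1: reprocessed after the first worker finishes
}
```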

codecov · 2022-07-18T15:05:22Z

Codecov Report

Merging #4028 (d483bf7) into main (cab72fc) will increase coverage by 1.87%.
The diff coverage is 92.47%.

@@            Coverage Diff             @@
##             main    #4028      +/-   ##
==========================================
+ Coverage   63.65%   65.52%   +1.87%     
==========================================
  Files         300      304       +4     
  Lines       46567    46604      +37     
==========================================
+ Hits        29640    30538     +898     
+ Misses      14596    13658     -938     
- Partials     2331     2408      +77

Flag	Coverage Δ
integration-tests	`34.98% <ø> (+0.02%)`	⬆️
kind-e2e-tests	`48.76% <50.75%> (+2.29%)`	⬆️
unit-tests	`44.38% <90.35%> (+0.42%)`	⬆️

Impacted Files	Coverage Δ
...kg/controller/networkpolicy/store/networkpolicy.go	`81.60% <ø> (+4.52%)`	⬆️
pkg/controller/networkpolicy/group.go	`41.66% <50.00%> (+6.80%)`	⬆️
pkg/controller/networkpolicy/clustergroup.go	`75.55% <66.66%> (-0.69%)`	⬇️
...g/controller/networkpolicy/clusternetworkpolicy.go	`75.11% <86.66%> (+7.02%)`	⬆️
...kg/controller/networkpolicy/antreanetworkpolicy.go	`74.54% <91.22%> (+2.70%)`	⬆️
...ntroller/networkpolicy/networkpolicy_controller.go	`82.58% <97.26%> (+3.85%)`	⬆️
pkg/controller/networkpolicy/crd_utils.go	`89.71% <100.00%> (-1.33%)`	⬇️
pkg/controller/types/networkpolicy.go	`100.00% <100.00%> (ø)`
pkg/ipam/poolallocator/allocator.go	`49.76% <0.00%> (-5.96%)`	⬇️
... and 69 more

tnqn · 2022-07-18T16:33:03Z

/test-all

tnqn · 2022-07-18T17:11:55Z

/test-all

antoninbas

I am not super familiar with this code at this stage, but I took a stab at a review

antoninbas · 2022-07-19T18:59:03Z

pkg/controller/networkpolicy/antreanetworkpolicy.go

@@ -24,19 +24,22 @@ import (
 	antreatypes "antrea.io/antrea/pkg/controller/types"
 )

+func getAntreaNetworkPolicyReference(anp *crdv1alpha1.NetworkPolicy) controlplane.NetworkPolicyReference {


I'm curious about why we are not returning a pointer here (*controlplane.NetworkPolicyReference)?
Ultimately we use it as key for internalNetworkPolicyQueue. I don't know if this causes unnecessary copies, or if on the contrary less indirection is better.

internalNetworkPolicyQueue must use a value rather than a pointer as the key; otherwise the same NetworkPolicy will not be treated as the same item, because the pointers may differ. Other places can use pointers to save some copying overhead, and I have updated them.
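The value-versus-pointer point can be seen directly, because client-go's workqueue tracks pending items in a map and therefore deduplicates by map-key equality. The sketch below uses a hypothetical `policyRef` struct as a stand-in for controlplane.NetworkPolicyReference (its fields here are assumptions; all that matters is that they are comparable values): two references built independently for the same policy are one key by value but two keys by pointer.

```go
package main

import "fmt"

// policyRef is a hypothetical stand-in for controlplane.NetworkPolicyReference.
// What matters here is only that all of its fields are comparable values.
type policyRef struct {
	Type      string
	Namespace string
	Name      string
	UID       string
}

// newRef builds a fresh reference, as each event handler would for the same policy.
func newRef(name string) policyRef {
	return policyRef{Type: "AntreaClusterNetworkPolicy", Name: name, UID: "uid-" + name}
}

// countDistinct adds the same policy twice to two sets, one keyed by value and
// one keyed by pointer. A work queue deduplicates with the same map-key
// equality, so the sets show what it would consider "the same item".
func countDistinct() (byValue, byPointer int) {
	values := map[policyRef]struct{}{}
	pointers := map[*policyRef]struct{}{}
	for i := 0; i < 2; i++ {
		ref := newRef("acnp-1") // a new variable, so a new address, each iteration
		values[ref] = struct{}{}
		pointers[&ref] = struct{}{}
	}
	return len(values), len(pointers)
}

func main() {
	v, p := countDistinct()
	fmt.Println(v, p) // 1 2
}
```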

antoninbas · 2022-07-19T19:05:16Z

pkg/controller/networkpolicy/antreanetworkpolicy.go

@@ -132,7 +88,7 @@ func (n *NetworkPolicyController) processAntreaNetworkPolicy(np *crdv1alpha1.Net
 	// Create AppliedToGroup for each AppliedTo present in AntreaNetworkPolicy spec.
 	for _, at := range np.Spec.AppliedTo {
 		appliedToGroupNamesSet.Insert(n.createAppliedToGroup(
-			np.Namespace, at.PodSelector, at.NamespaceSelector, at.ExternalEntitySelector))
+			networkPolicyRef, np.Namespace, at.PodSelector, at.NamespaceSelector, at.ExternalEntitySelector))


nit: maybe we could remove the second parameter of createAppliedToGroup if the namespace name can be easily retrieved from networkPolicyRef?

Made some changes to how references are tracked. Now only the policy UID is required for the reference, so the Namespace is still passed.

Changed to ensure atomicity when updating internal NetworkPolicies and creating or deleting AddressGroups and AppliedToGroups; no extra argument is needed.

antoninbas · 2022-07-19T19:28:41Z

pkg/controller/networkpolicy/clustergroup.go

+	_, exists, _ := c.appliedToGroupStore.Get(cg)
+	if exists {
+		c.enqueueAppliedToGroup(cg)


is it possible for the AppliedToGroup to be deleted between the call to get and the call to enqueue? I assume it is possible and that it's not an issue, but a comment to that effect would be nice.
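One common reason such a check-then-enqueue race is harmless is that the worker re-reads the store when it dequeues the key instead of trusting the enqueuer's earlier check. The sketch below illustrates that pattern under the reviewer's assumption that the real AppliedToGroup worker tolerates a missing group; `groupStore` and `syncGroup` are hypothetical names, not Antrea's code.

```go
package main

import (
	"fmt"
	"sync"
)

// groupStore is a hypothetical thread-safe stand-in for appliedToGroupStore.
type groupStore struct {
	mu     sync.RWMutex
	groups map[string]bool
}

func (s *groupStore) Exists(name string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.groups[name]
}

func (s *groupStore) Delete(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.groups, name)
}

// syncGroup is a hypothetical worker body for a dequeued key. It re-reads the
// store rather than trusting the enqueuer's check, so a group deleted between
// "Get" and "enqueue" is skipped instead of being processed.
func syncGroup(s *groupStore, name string) string {
	if !s.Exists(name) {
		return "skipped " + name
	}
	return "synced " + name
}

func main() {
	s := &groupStore{groups: map[string]bool{"cg-1": true}}
	var queue []string
	if s.Exists("cg-1") { // the event handler's existence check
		s.Delete("cg-1") // simulate the group disappearing before the enqueue
		queue = append(queue, "cg-1")
	}
	for _, key := range queue {
		fmt.Println(syncGroup(s, key)) // skipped cg-1
	}
}
```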

antoninbas · 2022-07-19T19:47:02Z

pkg/controller/networkpolicy/clustergroup_test.go

+		key, _ := npc.internalNetworkPolicyQueue.Get()
+		npc.internalNetworkPolicyQueue.Done(key)


maybe for the clarity of the test, we should check that the keys match {getAntreaClusterNetworkPolicyReference(cnp1), getAntreaClusterNetworkPolicyReference(cnp2)}?

antoninbas · 2022-07-19T19:49:26Z

pkg/controller/networkpolicy/clustergroup_test.go

-	// cnp2 is added after the ClusterGroup.
-	npc.addCNP(cnp2)
-
+	// cnp1 is added before the ClusterGroup. The rule's From should be empty as the ClusterGroup hasn't been synced,


nit: I find the "added" terminology a bit confusing here, since we don't call the add handler anymore. Maybe just "synced"?

antoninbas · 2022-07-19T20:02:51Z

pkg/controller/networkpolicy/clusternetworkpolicy_test.go

+	_, npc := newController()
+	cnp := getCNP()
+	newCNP := cnp.DeepCopy()
+	newCNP.Spec.Priority = float64(100)


could you add a comment explaining why we set the priority to this value here?

antoninbas · 2022-07-19T20:03:02Z

pkg/controller/networkpolicy/antreanetworkpolicy_test.go

+	_, npc := newController()
+	anp := getANP()
+	newANP := anp.DeepCopy()
+	newANP.Spec.Priority = float64(1)


could you add a comment explaining why we set the priority to this value here?

antoninbas · 2022-07-19T20:04:48Z

pkg/controller/networkpolicy/crd_utils_test.go

@@ -428,7 +428,7 @@ func TestToAntreaPeerForCRD(t *testing.T) {
 			_, npc := newController()
 			npc.addClusterGroup(&cgA)
 			npc.cgStore.Add(&cgA)
-			actualPeer := npc.toAntreaPeerForCRD(tt.inPeers, testCNPObj, tt.direction, tt.namedPortExists)
+			actualPeer := npc.toAntreaPeerForCRD(getAntreaClusterNetworkPolicyReference(testCNPObj), tt.inPeers, testCNPObj, tt.direction, tt.namedPortExists)


I just thought about this: could we use getACNPReference / getANPReference as function names? It gets pretty long to read and we already use the ACNP / ANP abbreviations in other function names

antoninbas · 2022-07-19T20:06:02Z

pkg/controller/networkpolicy/networkpolicy_controller.go

+	// addressGroupReferences tracks the reference count of policies for AddressGroups.
+	addressGroupReferences map[string]networkPolicyReferences
+	// appliedToGroupMutex prevents race conditions between multiple internalNetworkPolicyWorkers when they create or
+	// delete same AppliedToGroups and ensures atomicity of updating appliedToGroupStore and appliedToGroupReferences.


delete the same

antoninbas · 2022-07-19T20:06:19Z

pkg/controller/networkpolicy/networkpolicy_controller.go

+	// delete same AppliedToGroups and ensures atomicity of updating appliedToGroupStore and appliedToGroupReferences.
+	appliedToGroupMutex sync.Mutex
+	// addressGroupMutex prevents race conditions between multiple internalNetworkPolicyWorkers when they create or
+	// delete same  AddressGroups and ensures atomicity of updating addressGroupStore and addressGroupReferences.


delete the same

there is also an extra whitespace


tnqn · 2022-08-08T03:11:32Z

/skip-integration tested manually

qiyueyao · 2022-08-09T04:55:37Z

PR #4047 updated the documentation in the enableLogging section to tell users to avoid updating the annotation, to prevent NetworkPolicies from operating in a stale state. That note should be deleted once this PR fixes the race conditions. Thanks!

tnqn · 2022-08-11T14:36:18Z

/test-all

tnqn · 2022-08-11T15:10:30Z

pkg/controller/networkpolicy/clustergroup.go

 	for _, p := range parentGroupObjs {
 		parentGrp := p.(*antreatypes.Group)
 		c.enqueueInternalGroup(parentGrp.SourceReference.ToGroupName())
 	}
 }

+// triggerDerivedGroupUpdates triggers processing of AppliedToGroup and AddressGroup derived from the provided group.
+func (c *NetworkPolicyController) triggerDerivedGroupUpdates(grp string) {


This replaces the previous group enqueue operations in reprocessCNP, which blindly enqueued all AppliedToGroups and AddressGroups.

tnqn · 2022-08-14T17:36:18Z

/test-all

tnqn · 2022-08-15T05:27:33Z

/test-all
/test-ipv6-all
/test-ipv6-only-all
/test-windows-all

tnqn · 2022-08-15T15:47:37Z

/test-all

antoninbas

LGTM. I think @GraysonWu should take another look as well.

antoninbas · 2022-08-15T17:45:12Z

pkg/controller/networkpolicy/networkpolicy_controller.go

+	// It must use value instead of pointer as the key, otherwise the same NetworkPolicies will not be treated as same
+	// item because the pointers may be different.


thanks for adding this comment

antoninbas · 2022-08-15T17:46:08Z

pkg/controller/networkpolicy/antreanetworkpolicy.go

-// in case of ADD event or modified and store the updated instance, in case
-// of an UPDATE event.
-func (n *NetworkPolicyController) processAntreaNetworkPolicy(np *crdv1alpha1.NetworkPolicy) *antreatypes.NetworkPolicy {
+// instance to the caller wherein.


remove "wherein", which is no longer valid

GraysonWu

LGTM. Just curious: is there a way to "test" a race condition?

tnqn · 2022-08-16T03:26:12Z

LGTM. Just curious: is there a way to "test" a race condition?

Good question. I added a unit test, TestSyncInternalNetworkPolicyConcurrently, to verify that one of the race conditions is resolved. Without the n.internalNetworkPolicyMutex.Lock() call, the test had a chance to fail:

# go test -run TestSyncInternalNetworkPolicyConcurrently  -count 500 -race ./pkg/controller/networkpolicy/
--- FAIL: TestSyncInternalNetworkPolicyConcurrently (0.01s)
    networkpolicy_controller_test.go:3155:
                Error Trace:    networkpolicy_controller_test.go:3155
                                                        networkpolicy_controller_test.go:3130
                Error:          "[]" should have 1 item(s), but has 0
                Test:           TestSyncInternalNetworkPolicyConcurrently
    networkpolicy_controller_test.go:3158:
                Error Trace:    networkpolicy_controller_test.go:3158
                                                        networkpolicy_controller_test.go:3130
                Error:          Should be true
                Test:           TestSyncInternalNetworkPolicyConcurrently
FAIL
FAIL    antrea.io/antrea/pkg/controller/networkpolicy   5.405s
FAIL

Although the code before the refactoring created and deleted groups not in syncInternalNetworkPolicy but in addNetworkPolicy, addANP, addCNP, updateNetworkPolicy, etc., those functions could execute concurrently because they are event handlers of different resources, so the same race condition could happen as above.
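The invariant such a test checks can be modeled in a few lines. The sketch below is a hypothetical, heavily simplified model (`groupController`, `commitPolicy`, `deletePolicy` and `runConcurrently` are made-up names, and this is not the code of TestSyncInternalNetworkPolicyConcurrently): creating a group and committing the policy happen in one critical section, and orphan detection plus group deletion in another, both under the same mutex, so a deleter can never see a shared group as unused while a committer is between its two steps. Running it with `go run -race` is a reasonable way to exercise it; removing the lock would typically trip the race detector or Go's concurrent map write check.

```go
package main

import (
	"fmt"
	"sync"
)

// groupController is a hypothetical model of the state a single mutex must
// protect: a group store plus a policy store whose policy->group mapping acts
// as the index used to find orphaned groups.
type groupController struct {
	mu       sync.Mutex
	groups   map[string]struct{}
	policies map[string]string // policy name -> group name
}

func newGroupController() *groupController {
	return &groupController{groups: map[string]struct{}{}, policies: map[string]string{}}
}

// commitPolicy creates the group if needed and stores the policy atomically.
func (c *groupController) commitPolicy(policy, group string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.groups[group] = struct{}{}
	c.policies[policy] = group
}

// deletePolicy removes the policy and garbage-collects its group only if no
// remaining policy references it, under the same lock.
func (c *groupController) deletePolicy(policy string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	group, ok := c.policies[policy]
	if !ok {
		return
	}
	delete(c.policies, policy)
	for _, g := range c.policies { // a linear scan stands in for GetByIndex
		if g == group {
			return
		}
	}
	delete(c.groups, group)
}

// danglingPolicies counts policies whose group is missing; it should stay 0.
func (c *groupController) danglingPolicies() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := 0
	for _, g := range c.policies {
		if _, ok := c.groups[g]; !ok {
			n++
		}
	}
	return n
}

// runConcurrently mimics "ANP A" being added and deleted while "KNP B" reuses
// the same group, across many goroutines.
func runConcurrently(rounds int) *groupController {
	c := newGroupController()
	var wg sync.WaitGroup
	for i := 0; i < rounds; i++ {
		wg.Add(2)
		go func() { // ANP A is added and then deleted
			defer wg.Done()
			c.commitPolicy("anp-a", "shared")
			c.deletePolicy("anp-a")
		}()
		go func() { // KNP B reuses the same group
			defer wg.Done()
			c.commitPolicy("knp-b", "shared")
		}()
	}
	wg.Wait()
	return c
}

func main() {
	c := runConcurrently(1000)
	fmt.Println("dangling policies:", c.danglingPolicies()) // dangling policies: 0
}
```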

tnqn · 2022-08-16T03:27:47Z

/skip-all The latest change only added a unit test and updated a comment.

tnqn · 2022-08-16T03:30:43Z

@antoninbas @GraysonWu could you give another approval?


Dyanngg · 2022-08-16T03:58:36Z

pkg/controller/networkpolicy/antreanetworkpolicy.go

-		return "", nil
+		// Internal Group is not found, which means the corresponding namespaced group is either not created yet or not
+		// processed yet. Once the internal Group is created and processed, the sync worker for internal group will
+		// re-enqueue the ClusterNetworkPolicy processing which will trigger the creation of AppliedToGroup.


Suggested change

// re-enqueue the ClusterNetworkPolicy processing which will trigger the creation of AppliedToGroup.

// re-enqueue the AntreaNetworkPolicy processing which will trigger the creation of AppliedToGroup.

Thanks for pointing it out; I will fix it in another PR.


tnqn · 2022-08-16T06:42:36Z

/skip-all

jianjuns · 2022-08-16T18:26:09Z

@tnqn Sorry for the late review. I added several questions about locking in the comments, and have a general question: why earlier we chose to process related NPs in event handlers directly? Any side effects (performance, etc.) with the new approach of enqueuing NPs?

tnqn · 2022-08-17T15:42:52Z

@tnqn Sorry for the late review. I added several questions about locking in the comments

@jianjuns I don't find your questions about locking in the comments. Have you published them?

and have a general question: why earlier we chose to process related NPs in event handlers directly?

It's because we only supported K8s NetworkPolicy when we designed the workflow. At that time it worked fine, and it was more efficient to calculate the spec in event handlers and the span in workers separately. But once Antrea-native policies and group features were added, the workflow lost its advantage and became complex and error-prone:

When we only supported K8s NetworkPolicy:

* The specs of InternalNetworkPolicy, AddressGroup, and AppliedToGroup had a single source of truth: K8s NetworkPolicy. Only K8s NetworkPolicy creation, update, and deletion events could trigger their creation, deletion, and spec updates.
* Event handlers for a single kind of resource (e.g. addNetworkPolicy, updateNetworkPolicy, deleteNetworkPolicy) are executed sequentially, so creating and deleting InternalNetworkPolicies, AddressGroups, and AppliedToGroups didn't need to account for race conditions.

Both facts changed after NetworkPolicyController started to support Antrea-native policies and groups:

* The specs of InternalNetworkPolicy, AddressGroup, and AppliedToGroup have multiple sources of truth: K8s NetworkPolicy, Antrea ClusterNetworkPolicy, Antrea NetworkPolicy, Group, ClusterGroup, and the Namespace annotation (which determines the Audit Logging property of K8s NetworkPolicies). Events for all of these resources can trigger the creation, deletion, and spec updates of InternalNetworkPolicies, AddressGroups, and AppliedToGroups. That's why there are functions like triggerANPUpdates, triggerCNPUpdates, triggerParentGroupSync, etc.
* Event handlers for different kinds of resources can execute concurrently, so race conditions between them need to be taken into consideration. Also, because ClusterGroup, Group, and even the Namespace annotation can affect the spec of an InternalNetworkPolicy, the workers that sync ClusterGroups and Groups, as well as the Namespace event handlers, can update InternalNetworkPolicies, which can also conflict with the NetworkPolicy event handlers.

Mutexes need to be used very carefully to avoid race conditions. For example, internal NetworkPolicies alone may be written by the following goroutines:

* K8s NetworkPolicy event handler executor
* Antrea ClusterNetworkPolicy event handler executor
* Antrea NetworkPolicy event handler executor
* Namespace event handler executor
* multiple InternalNetworkPolicy workers
* multiple InternalGroup workers

But even if the mutex is used properly, it becomes a bottleneck for all the executors above: only one of them at a time can execute the code block that may create, update, or delete InternalNetworkPolicies, AddressGroups, and AppliedToGroups. The previous code's critical section was not large enough to avoid race conditions, hence the errors. Fixing it the old way would require executing even more code with the lock held. The old code was also quite redundant, as the size of the change shows: the refactoring added 1,504 lines of code and deleted 2,735, and the unit test coverage of this package still improved from 58.2% to 63.3%.
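Under the new design, an event handler for any of these sources shrinks to "find the affected policies and enqueue them", which is the role the thread describes for functions like triggerCNPUpdates. The sketch below shows that shape only; `policiesByGroup` and `onClusterGroupEvent` are hypothetical names, the index is a plain map, and a set stands in for the deduplicating queue, so none of this is Antrea's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// policiesByGroup is a hypothetical index from a ClusterGroup name to the
// ACNPs that reference it.
type policiesByGroup map[string][]string

// onClusterGroupEvent is the whole handler under an enqueue-only design: it
// computes no policy spec and writes no policy state; it only records which
// policies must be resynced by a worker.
func onClusterGroupEvent(idx policiesByGroup, group string, enqueue func(string)) {
	for _, p := range idx[group] {
		enqueue(p)
	}
}

func main() {
	idx := policiesByGroup{"cg-web": {"acnp-1", "acnp-2"}, "cg-db": {"acnp-2"}}
	pending := map[string]struct{}{} // stands in for the deduplicating work queue
	enqueue := func(p string) { pending[p] = struct{}{} }

	// Events from different sources all funnel into the same set of keys.
	onClusterGroupEvent(idx, "cg-web", enqueue)
	onClusterGroupEvent(idx, "cg-db", enqueue)
	enqueue("acnp-2") // e.g. an ACNP update event for the same policy

	keys := make([]string, 0, len(pending))
	for k := range pending {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [acnp-1 acnp-2]
}
```

However many sources fire, each affected policy ends up as a single pending key, and the worker that dequeues it recomputes the policy from the current state of all sources.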

For the record, I just created issue #4127 to describe the race conditions and why a refactoring is necessary.

Any side effects (performance, etc.) with the new approach of enqueuing NPs?

In theory, syncInternalNetworkPolicy's overhead is greater than before, because it now calculates both spec and span, whereas it previously calculated only span. But concurrency is improved: previously, spec calculation was executed sequentially, and the use of internalNetworkPolicyMutex also serialized span calculation even though there were multiple workers. Now spec and span calculation run concurrently and can scale out by increasing the number of workers; only the storage update is executed sequentially. Calculating spec should not cost much, as it's a simple translation, unlike calculating span, which may depend on the number of Nodes, so I think recalculating spec in syncInternalNetworkPolicy should be fine. Even if it adds some overhead to each round of processing, it should still be better than serializing all of them across the 6 executors above.
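The "compute concurrently, commit sequentially" split described above can be sketched as a worker body that does its expensive work without holding any lock and takes the mutex only to write shared state. In this sketch `computeSpec` is a hypothetical placeholder for spec and span calculation, `policyStore` and `runWorkers` are made-up names, and a plain channel replaces the work queue because every key is distinct; per-key exclusivity in the real controller comes from the queue, not from this code.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// policyStore is a hypothetical stand-in for the internal NetworkPolicy store.
type policyStore struct {
	mu   sync.Mutex
	data map[string]string
}

// computeSpec stands in for spec and span calculation. It reads only its
// input, so several workers can run it at once without holding any lock.
func computeSpec(policy string) string {
	return "spec(" + strings.ToUpper(policy) + ")"
}

// syncPolicy is the worker body: the expensive part runs unlocked, and only
// the commit to shared state is serialized.
func (s *policyStore) syncPolicy(policy string) {
	spec := computeSpec(policy)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[policy] = spec
}

// runWorkers drains distinct keys with n concurrent workers.
func runWorkers(keys []string, n int) map[string]string {
	s := &policyStore{data: map[string]string{}}
	ch := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for k := range ch {
				s.syncPolicy(k)
			}
		}()
	}
	for _, k := range keys {
		ch <- k
	}
	close(ch)
	wg.Wait()
	return s.data
}

func main() {
	out := runWorkers([]string{"knp-a", "anp-b", "acnp-c"}, 4)
	fmt.Println(len(out), out["anp-b"]) // 3 spec(ANP-B)
}
```

Adding workers scales the unlocked part; the critical section stays as short as a single store write.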

jianjuns

Thanks Quan for the detailed explanation. It all makes sense to me.

Yes, it seems I forgot to publish my previous comments.

jianjuns · 2022-08-16T18:14:44Z

pkg/controller/networkpolicy/networkpolicy_controller.go

+		n.internalNetworkPolicyMutex.Lock()
+		defer n.internalNetworkPolicyMutex.Unlock()
+
+		if !oldInternalPolicyExists {


Question - does NP store update need to be protected by the mutex too? Why?

Yes, they need to be protected by the same mutex because the NP store has an index of AddressGroups and AppliedToGroups, which is used to determine whether an AddressGroup/AppliedToGroup is orphaned. For example, the following code means the AppliedToGroup is no longer used and can be deleted:

objs, _ := n.internalNetworkPolicyStore.GetByIndex(store.AppliedToGroupIndex, atgName)
if len(objs) == 0 {

We must ensure that the AddressGroup/AppliedToGroup data (stored in their own stores) and their index (stored in the NP store) are updated atomically; otherwise, the first race condition described in the PR description could happen:

1. addANP creates a group for ANP A;
2. addNetworkPolicy reuses the group for KNP B and is about to create an internal NetworkPolicy;
3. deleteANP deletes the group for ANP A because, at that moment, no other internal NetworkPolicies are using the group;
4. addNetworkPolicy commits the internal NetworkPolicy for KNP B to storage, but the group no longer exists.

Ok. Got it.

jianjuns · 2022-08-16T18:21:11Z

pkg/controller/networkpolicy/networkpolicy_controller.go

+	case controlplane.AntreaClusterNetworkPolicy:
+		cnp, err := n.cnpLister.Get(key.Name)
+		if err != nil {
+			n.deleteInternalNetworkPolicy(internalNetworkPolicyName)


deleteInternalNetworkPolicy() can delete groups too, but I don't see it being protected by the mutex?

It's protected by the mutex:

antrea/pkg/controller/networkpolicy/networkpolicy_controller.go, lines 1540 to 1542 in 9c9b098:

func (n *NetworkPolicyController) deleteInternalNetworkPolicy(name string) {
	n.internalNetworkPolicyMutex.Lock()
	defer n.internalNetworkPolicyMutex.Unlock()

Hmm, not sure why I missed these lines.


tnqn force-pushed the fix-acnp branch 2 times, most recently from 6855b40 to 30ff4d7, July 18, 2022 16:32

tnqn force-pushed the fix-acnp branch from 30ff4d7 to ecfb9dd, July 18, 2022 17:11

tnqn marked this pull request as ready for review July 19, 2022 02:13

tnqn requested review from GraysonWu, antoninbas, jianjuns and qiyueyao July 19, 2022 02:14

antoninbas reviewed Jul 19, 2022

GraysonWu reviewed Jul 21, 2022 (pkg/controller/networkpolicy/networkpolicy_controller.go)

tnqn added this to the Antrea v1.8 release milestone Jul 28, 2022

GraysonWu reviewed Jul 29, 2022 (pkg/controller/networkpolicy/crd_utils.go)

GraysonWu reviewed Jul 29, 2022 (pkg/controller/networkpolicy/networkpolicy_controller.go)

GraysonWu reviewed Jul 29, 2022 (pkg/controller/networkpolicy/endpoint_querier_test.go)

tnqn mentioned this pull request Aug 4, 2022

Audit Logging support K8s Networkpolicy #4047

Merged

tnqn force-pushed the fix-acnp branch from ecfb9dd to 93440f6, August 11, 2022 14:35

tnqn commented Aug 11, 2022

tnqn marked this pull request as draft August 11, 2022 16:59

tnqn force-pushed the fix-acnp branch from 93440f6 to 0021811, August 14, 2022 17:35

tnqn force-pushed the fix-acnp branch 3 times, most recently from 80326a0 to 4844c6e, August 15, 2022 05:27

tnqn marked this pull request as ready for review August 15, 2022 05:27

tnqn force-pushed the fix-acnp branch from 4844c6e to 4a58e50, August 15, 2022 15:47

tnqn requested review from antoninbas and GraysonWu August 15, 2022 15:47

antoninbas previously approved these changes Aug 15, 2022

GraysonWu previously approved these changes Aug 16, 2022

tnqn dismissed stale reviews from GraysonWu and antoninbas via 7ffacc0 August 16, 2022 03:19

tnqn force-pushed the fix-acnp branch from 4a58e50 to 7ffacc0, August 16, 2022 03:19

tnqn force-pushed the fix-acnp branch from 7ffacc0 to d483bf7, August 16, 2022 03:44

antoninbas approved these changes Aug 16, 2022

Dyanngg reviewed Aug 16, 2022

tnqn merged commit d1c6a43 into antrea-io:main Aug 16, 2022

tnqn deleted the fix-acnp branch August 16, 2022 06:43

tnqn mentioned this pull request Aug 17, 2022

Race conditions in NetworkPolicyController caused NetworkPolicy realization issues #4127

Closed

jianjuns reviewed Aug 17, 2022

qiyueyao added the action/backport label (indicates a PR that requires backports) Sep 3, 2022

xliuxu mentioned this pull request Mar 21, 2023

Automated cherry pick of #4028: Fix race conditions in NetworkPolicyController #4727 (cherry pick of #4028 on release-1.7)

Merged
