Audit Logging support K8s Networkpolicy #4047
Codecov Report
@@ Coverage Diff @@
## main #4047 +/- ##
==========================================
+ Coverage 64.25% 68.15% +3.89%
==========================================
Files 294 297 +3
Lines 44805 45699 +894
==========================================
+ Hits 28789 31145 +2356
+ Misses 13687 12202 -1485
- Partials 2329 2352 +23
185d85e to 66fdf99
Reviewers, there's a problem for the Namespace UPDATE event. When modifying the |
if oldNamespace.Annotations[EnableNPLoggingAnnotationKey] != curNamespace.Annotations[EnableNPLoggingAnnotationKey] {
	affectedNPs, err := n.networkPolicyLister.NetworkPolicies(curNamespace.Name).List(labels.Everything())
	if err != nil {
		klog.Errorf("Error fetching NetworkPolicies in the Namespace %s: %v", curNamespace.Name, err)
Please use structured logging for new logs. Here, though, the error can be ignored: this is an in-memory operation, and the List call will never return an error.
		return
	}
	for _, np := range affectedNPs {
		n.updateNetworkPolicy(np, np)
Calling the event handler directly would introduce another race condition similar to the ones I'm trying to fix via #4028:
1. updateNamespace gets the NP from the lister and calls updateNetworkPolicy.
2. An NP update event updates the NP in the lister and triggers updateNetworkPolicy with the new object.
3. The updateNetworkPolicy call triggered in step 2 finishes and updates the internal NP.
4. The updateNetworkPolicy call triggered in step 1 finishes and overrides the internal NP with a stale state.
But I don't have a good solution to quickly fix it in this patch without refactoring the workflow like I did in #4028. Perhaps we could handle the annotation update case later and wait for #4028 to refactor the workflow first; otherwise we would risk NetworkPolicies working in a stale state.
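The four steps above can be replayed deterministically in a small model. This sketch is not Antrea code: `policy`, `controller`, `applyObject`, and `syncKey` are hypothetical stand-ins for a NetworkPolicy, the controller's two stores, a handler that is handed an object, and a handler that is handed only a key. It first reproduces the stale overwrite, then shows that a handler which re-reads the lister when it runs does not write the stale copy. Whether #4028 takes this exact approach is not stated in this thread.

```go
package main

import "fmt"

// policy is a hypothetical stand-in for a NetworkPolicy object.
type policy struct {
	Name    string
	Version int
}

// controller models two stores: the informer cache (lister) and the
// controller's own internal state derived from it.
type controller struct {
	lister   map[string]policy
	internal map[string]policy
}

func newController() *controller {
	return &controller{lister: map[string]policy{}, internal: map[string]policy{}}
}

// applyObject writes the object it was handed, even if the lister has moved on.
func (c *controller) applyObject(p policy) {
	c.internal[p.Name] = p
}

// syncKey re-reads the lister when it runs, so it applies the latest copy.
func (c *controller) syncKey(name string) {
	if p, ok := c.lister[name]; ok {
		c.internal[name] = p
	}
}

// replayObjectPassing replays steps 1 to 4 with object-passing handlers.
func replayObjectPassing() policy {
	c := newController()
	c.lister["np"] = policy{Name: "np", Version: 1}
	// Step 1: the Namespace handler reads the NP from the lister.
	stale := c.lister["np"]
	// Step 2: an NP update event replaces the object in the lister.
	c.lister["np"] = policy{Name: "np", Version: 2}
	// Step 3: the handler triggered in step 2 finishes first.
	c.applyObject(c.lister["np"])
	// Step 4: the handler triggered in step 1 finishes last.
	c.applyObject(stale)
	return c.internal["np"]
}

// replayKeyBased replays the same interleaving with key-based handlers.
func replayKeyBased() policy {
	c := newController()
	c.lister["np"] = policy{Name: "np", Version: 1}
	key := "np"
	c.lister["np"] = policy{Name: "np", Version: 2}
	c.syncKey(key)
	c.syncKey(key)
	return c.internal["np"]
}

func main() {
	fmt.Println("object-passing version:", replayObjectPassing().Version)
	fmt.Println("key-based version:", replayKeyBased().Version)
}
```

In the object-passing replay the internal store ends up holding the older object even though the lister holds the newer one, which is the stale state described in step 4.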
If enableLogging changes, the ID of the rule also changes, so the original rule would be deleted and the new rule installed: antrea/pkg/agent/controller/networkpolicy/cache.go, lines 650 to 665 in 2a37aec.
So it's already handled. Otherwise the current code wouldn't work, because the reconciler doesn't call functions like AddPolicyRuleAddress for a previously installed rule if there is no address change.
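To illustrate why a change in enableLogging replaces a rule instead of updating it, here is a minimal sketch that derives a rule ID from a hash of the rule's content. The `rule` struct, its fields, and `ruleID` are hypothetical and far simpler than the code referenced in cache.go; the only point is that when the logging flag is part of the hashed content, flipping it yields a different ID.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// rule is a hypothetical, heavily simplified rule description.
type rule struct {
	PolicyUID     string
	Direction     string
	Port          int
	EnableLogging bool
}

// ruleID hashes every field of the rule, including the logging flag,
// and returns a short hex identifier.
func ruleID(r rule) string {
	content := fmt.Sprintf("%s/%s/%d/%t", r.PolicyUID, r.Direction, r.Port, r.EnableLogging)
	sum := sha1.Sum([]byte(content))
	return hex.EncodeToString(sum[:8])
}

func main() {
	r := rule{PolicyUID: "uid-1", Direction: "In", Port: 80}
	before := ruleID(r)
	r.EnableLogging = true
	after := ruleID(r)
	// A different ID means the consumer sees "old rule deleted, new rule added".
	fmt.Println("ID changed:", before != after)
}
```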
I guess Code is not needed? |
Probably I didn't express it clearly, for the flow generated by the rule directly, it indeed changes due to |
@qiyueyao thanks for the explanation. Now I understand the problem is the dropFlow is not specific to a rule but specific to a match condition. Then does it work if we store |
I think Quan's suggestion could work. Since we only use it for |
Signed-off-by: Qiyue Yao <yaoq@vmware.com>
Storing |
@@ -659,6 +659,9 @@ for all NetworkPolicies in the Namespace. Packets of any connection that match
a NetworkPolicy rule will be logged with a reference to the NetworkPolicy name,
but packets dropped by the implicit "default drop" (not allowed by any NetworkPolicy)
will only be logged with consistent name `K8sNetworkPolicy` for reference.
Note that currently, Antrea only retrieves the logging Annotation once when adding
NetworkPolicies and in case of agent restart, users should not update Namespace
logging Annotations, otherwise it would risk NetworkPolicies working in a stale state.
As a reader (probably a bad one :)), I understand this to mean that once an agent is restarted, users should no longer update the Namespace logging annotation.
If I am not mistaken, the concept to convey is that events where the namespace annotation is processed concurrently with the network policy, such as agent restart, might trigger a risk of having stale NetworkPolicy info realized. If my understanding is correct, we might consider a little rephrasing above.
@salv-orlando The race condition was explained here: #4047 (comment). I think we should remove this restriction in this release. It would be a simple change after #4028 is merged. I was waiting for these feature PRs to be merged first; otherwise all of them would need to address many conflicts caused by the refactoring in #4028.
I do not get what exact issues can happen under what exact conditions either. Could you explain more? @qiyueyao
I guess we are talking about two issues here:
- We do not handle annotation changes after NetworkPolicies are created.
- For agent restart (before? after? during?), users cannot change the Namespace annotation, otherwise we can run into some issues (but what issues? What does "working in a stale state" mean?).
Discussed offline and opened follow up PR #4099.
@@ -731,6 +734,15 @@ func (c *clause) addConjunctiveMatchFlow(featureNetworkPolicy *featureNetworkPol
changeType: insertion,
}
}
} else if context.dropFlowEnableLogging != enableLogging {
I assume this will also work if the namespace annotation is changed while the antrea agent is down, as this code will be invoked upon agent restart. Is that correct?
Actually, there is another code path for agent restart: BatchInstallPolicyRuleFlows -> addRuleToConjunctiveMatch -> addActionToConjunctiveMatch. It is specific to batch install, which is triggered upon Antrea agent restart.
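A rough sketch of the idea discussed in this thread: the drop flow belongs to a match condition that several rules can share, so the context remembers which logging setting its drop flow was installed with and schedules a modification when a rule arrives with a different setting. Only the field name `dropFlowEnableLogging` comes from the diff; `matchContext`, `ensureDropFlow`, and the `changeType` values other than `insertion` are hypothetical, and the real addConjunctiveMatchFlow does considerably more.

```go
package main

import "fmt"

// changeType describes what has to happen to the drop flow.
type changeType int

const (
	noChange changeType = iota
	insertion
	modification
)

// matchContext is a hypothetical, simplified per-match-condition context.
type matchContext struct {
	hasDropFlow           bool
	dropFlowEnableLogging bool
}

// ensureDropFlow reports the change needed so that the drop flow exists
// and reflects the requested logging setting.
func (c *matchContext) ensureDropFlow(enableLogging bool) changeType {
	if !c.hasDropFlow {
		// First rule for this match: install the drop flow.
		c.hasDropFlow = true
		c.dropFlowEnableLogging = enableLogging
		return insertion
	}
	if c.dropFlowEnableLogging != enableLogging {
		// The flow exists but was installed with the other setting.
		c.dropFlowEnableLogging = enableLogging
		return modification
	}
	return noChange
}

func main() {
	ctx := &matchContext{}
	fmt.Println(ctx.ensureDropFlow(false) == insertion)
	fmt.Println(ctx.ensureDropFlow(false) == noChange)
	fmt.Println(ctx.ensureDropFlow(true) == modification)
}
```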
		if ns.Status.Phase == corev1.NamespaceTerminating {
			return fmt.Errorf("error when creating '%s' Namespace: namespace exists but is in 'Terminating' phase", namespace)
		}
	}
	return nil
}

// createNamespaceWithAnnotations creates the namespace with Annotations.
func (data *TestData) UpdateNamespace(namespace string, mutateFunc func(*corev1.Namespace)) error {
	ns, _ := data.clientset.CoreV1().Namespaces().Get(context.TODO(), namespace, metav1.GetOptions{})
It should not ignore the error, since this accesses the K8s API, which could fail for several reasons; otherwise the following code would panic because ns is nil.
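A minimal sketch of the error handling being asked for. To stay self-contained it does not use client-go: `Namespace`, `namespaceClient`, and `fakeClient` are hypothetical stand-ins for the K8s types and clientset, and `updateNamespace` only mirrors the shape of the helper in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// Namespace is a hypothetical stand-in for corev1.Namespace.
type Namespace struct {
	Name        string
	Annotations map[string]string
}

// namespaceClient is a hypothetical stand-in for the Namespaces() client.
type namespaceClient interface {
	Get(name string) (*Namespace, error)
	Update(ns *Namespace) error
}

// updateNamespace fetches the Namespace, applies mutate, and writes it back.
// Both API calls can fail, so both errors are returned to the caller.
func updateNamespace(c namespaceClient, name string, mutate func(*Namespace)) error {
	ns, err := c.Get(name)
	if err != nil {
		// Returning here avoids dereferencing a nil Namespace below.
		return fmt.Errorf("error when getting Namespace %q: %w", name, err)
	}
	if mutate != nil {
		mutate(ns)
	}
	if err := c.Update(ns); err != nil {
		return fmt.Errorf("error when updating Namespace %q: %w", name, err)
	}
	return nil
}

var errNotFound = errors.New("not found")

// fakeClient is an in-memory client used only to exercise the sketch.
type fakeClient struct{ store map[string]*Namespace }

func (f *fakeClient) Get(name string) (*Namespace, error) {
	ns, ok := f.store[name]
	if !ok {
		return nil, errNotFound
	}
	return ns, nil
}

func (f *fakeClient) Update(ns *Namespace) error {
	f.store[ns.Name] = ns
	return nil
}

func main() {
	c := &fakeClient{store: map[string]*Namespace{"x": {Name: "x"}}}
	// A missing Namespace yields an error instead of a nil-pointer panic.
	fmt.Println(updateNamespace(c, "missing", func(ns *Namespace) {}))
	err := updateNamespace(c, "x", func(ns *Namespace) {
		ns.Annotations = map[string]string{"policy.antrea.io/enable-np-logging": "true"}
	})
	fmt.Println(err, c.store["x"].Annotations)
}
```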
// testAuditLoggingEnableNP tests that audit logs are generated when K8s NP is applied
// tests both Allow traffic by K8s NP and Drop traffic by implicit K8s policy drop
func testAuditLoggingEnableNP(t *testing.T, data *TestData) {
	data.updateNamespaceWithAnnotations(namespaces["x"], map[string]string{networkpolicy.EnableNPLoggingAnnotationKey: "true"})
The returned error should be checked:
failOnError(data.updateNamespaceWithAnnotations(namespaces["x"], map[string]string{networkpolicy.EnableNPLoggingAnnotationKey: "true"}), t)
Changed in follow up PR #4099.
/test-all |
Will merge the PR as is. We could improve the tests via a separate PR.
@qiyueyao I had to rewrite the commit message and close the corresponding issue manually when merging the PR. Could you please write a detailed commit message (and perhaps squash commits in a timely manner) for future PRs? If you prefer appending commits instead of force pushing, the first commit should have a proper commit message which could be used when merging. The PR description and commit message should use the well-known format to link the issue, like "Fixes #XXX", instead of a hyperlink such as "This PR is to address this issue", which can't close the issue automatically. You could find more details here.
Previously, audit logging only worked for Antrea NetworkPolicy and ClusterNetworkPolicy, by setting the `enableLogging` field in the rule. This PR is to address this issue. A method overview of this PR is at this design doc.
To enable logging for K8s NetworkPolicy, set `annotations` in `metadata` for the Namespace as `policy.antrea.io/enable-np-logging: "true"`. K8s NetworkPolicies applied to this Namespace will then be logged to `/var/log/antrea/networkpolicy/np.log`.
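As a sketch of what the annotation check amounts to, the helper below reads the logging annotation from a Namespace's annotation map. The key is the one given in the description; the helper name `npLoggingEnabled` and the exact comparison against the string "true" are assumptions, not taken from the Antrea source.

```go
package main

import "fmt"

// enableNPLoggingAnnotationKey is the Namespace annotation named in the PR description.
const enableNPLoggingAnnotationKey = "policy.antrea.io/enable-np-logging"

// npLoggingEnabled is a hypothetical helper: it reports whether the
// annotation is present and set to "true". A nil map reads as disabled.
func npLoggingEnabled(annotations map[string]string) bool {
	return annotations[enableNPLoggingAnnotationKey] == "true"
}

func main() {
	annotated := map[string]string{enableNPLoggingAnnotationKey: "true"}
	fmt.Println(npLoggingEnabled(annotated))
	fmt.Println(npLoggingEnabled(nil))
}
```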