
netpolicy logs should also log drops from standard k8s netpol rules #3640

Closed
jsalatiel opened this issue Apr 14, 2022 · 17 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@jsalatiel

As the cluster admin, I use Antrea ClusterNetworkPolicy (ACNP) to define global rules, plus one final "default deny" policy with logging:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: default-deny
spec:
  priority: 99
  tier: Baseline
  appliedTo:
   - namespaceSelector: {}
  ingress:
    - action: Drop
      enableLogging: true
  egress:
    - action: Drop
      enableLogging: true

This way, all drops are logged to /var/log/antrea/networkpolicy/np.log.
The developers are responsible for building the network policies (standard K8s NetworkPolicies) for their applications, as long as those do not conflict with the global rules defined in the ACNP. The problem is that as soon as their netpol is applied, the selected pods no longer log anything to /var/log/antrea/networkpolicy/np.log.
In Calico, if I define global logging, every DROP is logged (as an iptables LOG target in syslog) even if the pod has been isolated by standard network policies. This global logging behaviour helps a lot with debugging, mainly when a developer forgets an allow rule. It would be nice if Antrea could also log DROPs caused by pod isolation in standard K8s netpol.

Describe the solution you'd like
Add some global parameter to the agent, for example "LogStandardNetpolDrops", that would also log all drops from isolated pods to /var/log/antrea/networkpolicy/np.log. This is one of the main problems I face when debugging developers' network policies while new deployments are not working as expected.
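For illustration only, such a (hypothetical, not currently existing) option could live in the antrea-agent section of the Antrea ConfigMap, something like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: antrea-config          # illustrative name; the real ConfigMap has a generated suffix
  namespace: kube-system
data:
  antrea-agent.conf: |
    # Hypothetical option proposed in this issue, not an existing Antrea setting:
    # when true, drops caused by K8s NetworkPolicy isolation would also be
    # written to /var/log/antrea/networkpolicy/np.log.
    logStandardNetpolDrops: true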

@jsalatiel jsalatiel added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 14, 2022
@Dyanngg Dyanngg added the good first issue Good for newcomers label Apr 15, 2022
@tnqn
Member

tnqn commented Apr 18, 2022

@Dyanngg I feel this may not be a good first issue. I have some questions:

  1. When multiple policies are applied to a Pod and all of them isolate it, which policy should be logged as the effective one? @jsalatiel
  2. Can Traceflow meet the requirement (identify which policy is responsible for the drop)? A sample Traceflow is sketched below. @jsalatiel
  3. Can antctl query endpoint meet the requirement (identify which policies are applied to the Pod)? @jsalatiel
  4. Do you think it's good to add a global option specific to K8s NetworkPolicy's drop action? Logging as a default behavior may cause some overhead, and toggling the config and making it effective is also kind of heavy, compared to updating the enableLogging field of an ACNP. @Dyanngg
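For reference, a Traceflow request looks roughly like this (a sketch based on the Traceflow docs; the Pod names are placeholders):

apiVersion: crd.antrea.io/v1alpha1
kind: Traceflow
metadata:
  name: tf-debug-drop
spec:
  source:
    namespace: testing
    pod: testpod7            # placeholder source Pod
  destination:
    namespace: testing
    pod: some-backend-pod    # placeholder destination Pod
  packet:
    ipHeader:
      protocol: 6            # TCP
    transportHeader:
      tcp:
        dstPort: 80

The observations in the Traceflow status should indicate which NetworkPolicy dropped the packet.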

@Dyanngg Dyanngg removed the good first issue Good for newcomers label Apr 19, 2022
@jsalatiel
Author

Hi @tnqn ,

  1. IMHO the log line could just say the packet was dropped by a "standard K8s network policy" in that case. The important thing is that the drops are logged somewhere. Besides making it easier to debug rules the developer forgot, by seeing those drops in the logs, there are several tools that can parse the drops in /var/log/antrea/networkpolicy/np.log and trigger alerts if excessive drops are being generated by some pod, which could indicate that the pod has been compromised. That is why logging drops from isolated pods also matters.
  2. I will read about how to use Traceflow and get back with this info later.
  3. Yes, we can identify the network policies applied to the pod, but that is still not good enough, because one cannot tell which traffic is being blocked.
    For example, see this output:
# antctl  query endpoint -p testpod7 -n testing
799-default-deny    <NONE>    6d55d6df-207a-4fd6-9f7d-73188eff46f4
default-deny-all    testing   1228d1b1-3b0b-4698-8b45-242e39ecc0c5

799-default-deny is an ACNP that blocks all traffic and has logging enabled, so the user would expect to be able to see all dropped traffic logged somewhere.

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: 799-default-deny
spec:
  priority: 99
  tier: Baseline
  appliedTo:
   - namespaceSelector: {}
  ingress:
    - action: Drop
      enableLogging: true
  egress:
    - action: Drop
      enableLogging: true

But since default-deny-all is a standard K8s network policy that isolates the pod, I cannot see which connections are being attempted and blocked in /var/log/antrea/networkpolicy/np.log. Maybe Traceflow can help, but it would still be better to see that as an entry in the logs.
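For context, default-deny-all is just a plain K8s NetworkPolicy along these lines (a sketch, not the exact manifest from my cluster):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: testing
spec:
  podSelector: {}        # selects every Pod in the namespace, making them all KNP-isolated
  policyTypes:
    - Ingress
    - Egress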

@jsalatiel
Author

Getting back to this: Traceflow is not a good replacement for the logs in np.log.

@jdoylei

jdoylei commented May 4, 2022

Hi @tnqn, I completely agree with the issue described by @jsalatiel here, and their responses to your questions.

Background:

  • We are implementing a Kubernetes cluster with VMware's "vSphere with Tanzu" product and adopted Antrea as the CNI plugin on their recommendation.
  • We successfully tested using Kubernetes NetworkPolicy with Antrea, apart from logging policy denies, which our Cyber Security team specified as a requirement.
  • After upgrading our "vSphere with Tanzu" environment so that it includes Antrea v0.13.5, we are supposed to be able to use Antrea's logging feature to capture denies from the overall policy.
  • But it doesn't appear to work like that. Pods isolated by the Kubernetes NetworkPolicy (KNP) have no way to log denied traffic to Antrea's np.log, as @jsalatiel explained.

Configuration tested:

Antrea ClusterNetworkPolicy (ACNP) to drop and log all traffic associated with baseline Tier:

apiVersion: security.antrea.tanzu.vmware.com/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: default-cluster-network-policy
spec:
  priority: 1
  tier: baseline
  appliedTo:
    - namespaceSelector:
        matchLabels:
          nstype: app
  ingress:
    - action: Drop              # For all Pods in those Namespaces, drop and log all ingress traffic from anywhere
      name: drop-all-ingress
      enableLogging: true
  egress:
    - action: Drop              # For all Pods in those Namespaces, drop and log all egress traffic towards anywhere
      name: drop-all-egress
      enableLogging: true

Kubernetes NetworkPolicy (KNP) objects in the namespace select the pods under test (making them KNP-isolated pods):

$ kubectl get networkpolicy
NAME                                        POD-SELECTOR                                                                                   AGE
database-admin-client-webservices           database-name in (admin)                                                                       36m
department-service-integration              deployment-artifact-image-name in (department-service)                                         36m
dns-from-webservices                        deployment-artifact-image-name in (department-service,employee-service,organization-service)   36m
employee-service-integration                deployment-artifact-image-name in (employee-service)                                           36m
external-services-from-webservices          deployment-artifact-image-name in (department-service)                                         36m
k8s-hostnetwork-components-to-webservices   deployment-artifact-image-name in (department-service,employee-service,organization-service)   36m
k8sapi-from-webservices                     deployment-artifact-image-name in (department-service,employee-service,organization-service)   36m
organization-service-integration            deployment-artifact-image-name in (organization-service)                                       36m
testpods                                    testpod                                                                                        36m
utilities                                   utility                                                                                        36m
webservices-from-testpods                   deployment-artifact-image-name in (department-service,employee-service,organization-service)   36m

antctl query endpoint reports the combined list of ACNPs and KNPs for the pod under test:

$ kubectl -n kube-system exec -c antrea-controller ... -- antctl query endpoint -p department-service-2021-mid-tier-eai-release1-hourly-mp-a-d6dpf -n poc1-dev
Endpoint poc1-dev/department-service-2021-mid-tier-eai-release1-hourly-mp-a-d6dpf
Applied Policies:
Name                                      Namespace UID
default-cluster-network-policy            <NONE>    7c51db5e-c046-45a3-b613-295b10210cf3
department-service-integration            poc1-dev  1be90b65-3cb2-467e-9af4-fd3e09770a00
dns-from-webservices                      poc1-dev  63cf73cf-a9b0-451c-af03-64da24a52ac1
external-services-from-webservices        poc1-dev  cea8355d-1865-435f-ae93-e37544de3562
k8s-hostnetwork-components-to-webservices poc1-dev  23b68e85-1243-4b55-9fe7-d9e24b6de273
k8sapi-from-webservices                   poc1-dev  f7c8f82b-3321-4f28-ba12-dd25622b8619
webservices-from-testpods                 poc1-dev  b8f44e19-b22c-4c0b-abbf-1ff156eb4099

Tests on pod traffic show traffic is dropped (client programs time out), but np.log does not include any related messages.

If I remove labels from a pod under test so that none of the KNPs select it, traffic is dropped and np.log shows drop events for the ACNP:

2022/05/04 16:15:47.368165 EgressDefaultRule AntreaClusterNetworkPolicy:default-cluster-network-policy Drop 170 SRC: 192.0.4.53 DEST: 192.0.8.7 130 UDP
2022/05/04 16:15:52.371214 EgressDefaultRule AntreaClusterNetworkPolicy:default-cluster-network-policy Drop 170 SRC: 192.0.4.53 DEST: 192.0.8.3 103 UDP

I also want to point out that this test does not use any "default-deny-all" KNP like the case shown by @jsalatiel; no KNP selects every pod. All it takes is one KNP selecting a pod to make it KNP-isolated, and then the denies/drops for that pod are handled by Antrea's KNP mechanism only, which appears unable to log those drops to np.log.

Exactly as stated in the issue title, Antrea should be able to log dropped traffic for pods isolated by KNP. Even though the drop is implicit in KNP's design and, as @tnqn pointed out, is not tied to a specific KNP object, there are teams that need to be informed of that dropped traffic via the log (np.log). Without it, the logging only captures dropped traffic for pods not isolated by KNP, which might cover a spurious pod the development team did not intend. But traffic involving the "known" pods covered by KNPs is still of interest: if those pods behave unexpectedly or are used unexpectedly, those events are important. Unless those drops can be logged, Antrea's logging feature offers almost no benefit to organizations using KNP.

@jdoylei

jdoylei commented May 4, 2022

About configuring the proposed logging behavior for Kubernetes NetworkPolicies/KNPs - since the NetworkPolicies within a namespace are evaluated collectively when deciding whether a pod is isolated by KNP, maybe the configuration is something that makes sense per namespace? Some options:

  • Enable KNP drop logging by installing an Antrea NetworkPolicy in the given namespace, assuming Antrea NetworkPolicy has some new field for this.
  • Enable KNP drop logging by configuring some newly-introduced Antrea CRD in the given namespace.
  • Enable KNP drop logging by adding an Antrea-specific annotation or label to the given Namespace object itself.

Options like these might also make configuration easier for users who are using Antrea as part of a pre-packaged solution like VMware's vSphere with Tanzu. VMware doesn't support users changing the Antrea configuration (the antrea-config-xxx ConfigMap) for this product. Users can only add new Antrea-type resources via the Antrea CRDs.
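For example, the third option could look something like this (the annotation key is purely illustrative, since nothing like it exists yet):

apiVersion: v1
kind: Namespace
metadata:
  name: poc1-dev
  labels:
    nstype: app
  annotations:
    networkpolicy.antrea.io/enable-logging: "true"   # hypothetical annotation key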

@Dyanngg
Contributor

Dyanngg commented May 4, 2022

/cc @antoninbas @jianjuns

@antoninbas
Contributor

I think that this is a legitimate ask. The idea of adding an Antrea-specific Namespace annotation to toggle logging for K8s NetworkPolicies in that Namespace is interesting IMO.

@jianjuns
Contributor

jianjuns commented May 5, 2022

Is the reasoning that users prefer K8s NetworkPolicy (which they are more familiar with) over Antrea NetworkPolicy, but are willing to learn the new annotation?

@jsalatiel
Author

Is the reasoning that users prefer K8s NetworkPolicy (which they are more familiar with) over Antrea NetworkPolicy, but are willing to learn the new annotation?

AFAIK, the annotation would be created by the cluster admins, and the KNPs by the devs.

@jianjuns
Contributor

jianjuns commented May 5, 2022

I was thinking about that too, and it makes more sense to me. + @yfauser to see if he has an opinion.

@jdoylei

jdoylei commented May 5, 2022

AFAIK, the annotation would be created by the cluster admins, and the KNPs by the devs.

That's exactly what I was thinking, too. It could allow a separation of duties like with the Namespace labels used by Pod Security Admission (pod-security.kubernetes.io/enforce etc.). The KNPs and other application resources deployed inside the Namespace can stay CNI-agnostic and reflect app architecture information the devs have. The Antrea annotation on the Namespace can be added by an admin with more permissions and more information about the cluster implementation and the CNI plugin involved.

@tnqn
Member

tnqn commented May 6, 2022

@jsalatiel @jdoylei thanks for the explanation. Now I understand the requirement is to log unintended access. The requirement and the proposal that uses a namespace annotation to control the behavior make sense to me.

If everyone is on the same page, @qiyueyao could you evaluate the change and see which release we can add the support?

@jianjuns jianjuns added this to the Antrea v1.8 release milestone May 26, 2022
@qiyueyao
Contributor

qiyueyao commented May 28, 2022

@jsalatiel @jdoylei @jianjuns @tnqn @Dyanngg thanks for the proposal and discussion. I had a question regarding admission control for the two methods:

  1. Namespace annotation -- Antrea would not be able to enforce that only cluster admins create the annotation; admission depends on the specific cluster, and Antrea would only retrieve this information.
  2. Antrea CRD -- Antrea could be responsible for verifying admission for this CRD.

If the namespace annotation still sounds good, we can proceed further. Thanks.

Doc for tracking this issue change.

@jianjuns
Contributor

I think Namespace annotation should work.

@jsalatiel
Author

annotation is fine

@jdoylei

jdoylei commented Jun 2, 2022

@qiyueyao - The namespace annotation sounds fine to me, even without Antrea itself enforcing who can or can't create the annotation.

I think organizations deploying a cluster with Antrea could still choose one or both of these options outside of Antrea:

  • Use K8s RBAC to restrict who can create and modify Namespaces, and those users also have permission to set this Antrea-specific annotation on Namespaces.
  • Use an admission webhook like OPA Gatekeeper's to apply a finer-grained policy about annotations, e.g. whether this particular annotation is required or forbidden according to other rules.
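For the first option, a minimal RBAC sketch (the group and role names are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: namespace-admin            # illustrative name
rules:
  - apiGroups: [""]
    resources: ["namespaces"]
    verbs: ["create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: namespace-admin-binding
subjects:
  - kind: Group
    name: platform-admins          # illustrative group of cluster admins
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io

Developer roles would simply not be granted update/patch on Namespaces, so only the admins could set the Antrea-specific annotation.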

Thanks!

@tnqn
Member

tnqn commented Aug 9, 2022

Closed by #4047, audit logging support for K8s NetworkPolicy will be available in v1.8.0.
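Usage should be a per-Namespace annotation along these lines (the annotation key shown is based on my reading of the linked PR; please verify against the v1.8.0 documentation):

kubectl annotate namespace poc1-dev networkpolicy.antrea.io/enable-logging="true"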

@tnqn tnqn closed this as completed Aug 9, 2022