Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Not using hostNetwork=true #1392

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Shreyas220
Copy link
Member

Purpose of PR?:
Removing the dependency on hostNetwork=true

Fixes #

Does this PR introduce a breaking change?
Yes

If the changes in this PR are manually verified, list down the scenarios covered::

  • Tested AKS node with bpf Lsm enabled
2023-08-29 07:52:26.861054	INFO	OS Image: Ubuntu 22.04.2 LTS
2023-08-29 07:52:26.861088	INFO	Kernel Version: 5.15.0-1042-azure
2023-08-29 07:52:26.861118	INFO	Kubelet Version: v1.26.6
2023-08-29 07:52:26.861141	INFO	Container Runtime: containerd://1.7.1+azure-1
2023-08-29 07:52:26.861461	INFO	Initialized KubeArmor Logger
2023-08-29 07:52:26.862820	INFO	Detected mounted BPF filesystem at /sys/fs/bpf
2023-08-29 07:52:26.863968	INFO	Initializing eBPF system monitor
2023-08-29 07:52:26.880875	INFO	Successfully added visibility map with key={PidNS:0 MntNS:0} to the kernel
2023-08-29 07:52:26.880899	INFO	eBPF system monitor object file path: /opt/kubearmor/BPF/system_monitor.container.bpf.o
2023-08-29 07:52:27.163314	INFO	Initialized the eBPF system monitor
2023-08-29 07:52:28.874960	INFO	Initialized KubeArmor Monitor
2023-08-29 07:52:28.875007	INFO	Started to monitor system events
2023-08-29 07:52:28.876201	INFO	Supported LSMs: lockdown,capability,bpf
2023-08-29 07:52:32.145612	INFO	Initialized BPF-LSM Enforcer
2023-08-29 07:52:32.145661	INFO	Initialized KubeArmor Enforcer
2023-08-29 07:52:32.145668	INFO	Started to protect containers
2023-08-29 07:52:32.145674	INFO	CRI socket not set. Trying to detect.
2023-08-29 07:52:32.146174	INFO	Initialized Containerd Handler
2023-08-29 07:52:32.146199	INFO	Started to monitor Containerd events
  • tested on ubuntu pod
  • applied a policy to block curl from ubuntu
apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: ksp-ubuntu-1-allow-net-tcp-from-source
  namespace: multiubuntu
spec:
  severity: 8
  selector:
    matchLabels:
      container: ubuntu-1
  network:
    matchProtocols:
    - protocol: tcp
      fromSource:
      - path: /usr/bin/curl
  action: Block

The policy was enforced and was able to receive Block alerts and network logs

Additional information for reviewer? :

Alert

== Alert / 2023-08-29 07:52:51.570983 ==
ClusterName: default
HostName: kubearmor-g2rgb
NamespaceName: multiubuntu
PodName: ubuntu-1-deployment-565b9f669d-gmnwl
Labels: container=ubuntu-1,group=group-1
ContainerName: ubuntu-1-container
ContainerID: a2cedd7c6278110ed6adaaba94603e3bb6bb35e164342a71a75e2622c45293d1
ContainerImage: docker.io/kubearmor/ubuntu-w-utils:0.1@sha256:b4693b003ed1fbf7f5ef2c8b9b3f96fd853c30e1b39549cf98bd772fbd99e260
Type: MatchedPolicy
PolicyName: ksp-ubuntu-1-allow-net-tcp-from-source
Severity: 8
Source: /usr/bin/curl www.google.com
Resource: domain=AF_UNIX type=SOCK_STREAM|SOCK_NONBLOCK|SOCK_CLOEXEC protocol=0
Operation: Network
Action: Block
Data: syscall=SYS_SOCKET
Enforcer: BPFLSM
Result: Permission denied
HostPID: 29159
HostPPID: 29127
Owner: map[Name:ubuntu-1-deployment Namespace:multiubuntu Ref:Deployment]
PID: 123
PPID: 113
ParentProcessName: /bin/bash
ProcessName: /usr/bin/curl

log

== Log / 2023-08-29 07:57:56.498821 ==
ClusterName: default
HostName: kubearmor-g2rgb
NamespaceName: multiubuntu
PodName: ubuntu-1-deployment-565b9f669d-gmnwl
Labels: container=ubuntu-1,group=group-1
ContainerName: ubuntu-1-container
ContainerID: a2cedd7c6278110ed6adaaba94603e3bb6bb35e164342a71a75e2622c45293d1
ContainerImage: docker.io/kubearmor/ubuntu-w-utils:0.1@sha256:b4693b003ed1fbf7f5ef2c8b9b3f96fd853c30e1b39549cf98bd772fbd99e260
Type: ContainerLog
Source: /usr/bin/curl www.google.com
Resource: sa_family=AF_INET sin_port=80 sin_addr=172.253.63.105
Operation: Network
Data: syscall=SYS_CONNECT fd=3
Result: Passed
HostPID: 36382
HostPPID: 36236
Owner: map[Name:ubuntu-1-deployment Namespace:multiubuntu Ref:Deployment]
PID: 135
PPID: 125
ParentProcessName: /bin/bash
ProcessName: /usr/bin/curl

on pod

root@ubuntu-1-deployment-565b9f669d-gmnwl:/# curl www.google.com
curl: (7) Couldn't connect to server

Checklist:

  • Bug fix. Fixes #
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Commit has unit tests
  • Commit has integration tests

Copy link
Member

@achrefbensaad achrefbensaad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update deploygen and all test files with this change .

@Shreyas220 Shreyas220 force-pushed the remove_hostnet branch 3 times, most recently from 42ce885 to 0db961a Compare August 31, 2023 05:26
@daemon1024
Copy link
Member

include deployments/helm in ci trigger

Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>

fix

Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>

added

Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>

asdsa

Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>

i[dae

Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>
@Shreyas220
Copy link
Member Author

CI failing was because of the condition for grpc variable in util localhost:32676 was used

Now as hostNetwork is false it wont work hence now by turning grpc = "" we port forward relay server to get logs ( which should have been the case ( at least i assumed it was ) )

now we are port forwarding we get an error that socat was missing
hence now added that in gink test job and controller test jobs

deployments/generic/kubearmor.yaml Outdated Show resolved Hide resolved
Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>

test

Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>

asds

Signed-off-by: Shreyas220 <shreyas.ny@gmail.com>
@nyrahul
Copy link
Contributor

nyrahul commented Sep 14, 2023

Hey @kranurag7 , requesting your review here, since you had previously worked on removing hostNetwork requirement. Thanks

Copy link
Member

@kranurag7 kranurag7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is only modifying the spec.HostNetwork field, and I think this should break on some providers. Last time, I check only, setting the value to false will not work.
Please verify on all the providers and let me know if I'm missing the PR in which this was implemented. I ran this query and was not able to find any PR corresponding to the implementation.

@daemon1024
Copy link
Member

let me know if I'm missing the PR in which this was implemented. I ran this query and was not able to find any PR corresponding to the implementation.

@kranurag7 this has been part of KubeArmor since it's inception so no PRs for this :D

@daemon1024
Copy link
Member

@Shreyas220 afaik this has been verified on AKS already, can we verify it on GCP and AWS too?

@Shreyas220
Copy link
Member Author

Shreyas220 commented Sep 14, 2023

This PR is only modifying the spec.HostNetwork field, and I think this should break on some providers. Last time, I check only, setting the value to false will not work. Please verify on all the providers and let me know if I'm missing the PR in which this was implemented. I ran this query and was not able to find any PR corresponding to the implementation.

@kranurag7 afaik hostnetwork is true when we are trying to access something host node's network namespace, I looked thoroughly and in our case we are not so we should not be using hostnetwork=true. (if you see anything do let me know )

@Shreyas220 afaik this has been verified on AKS already, can we verify it on GCP and AWS too?

@daemon1024 yes i tested on local vm (k3s) and AKS and found it to working properly, as i mentioned above we are not trying to access anything from host network namespace so it should work in all cases

but i'll test it on AKS and GKE to be sure

@kranurag7
Copy link
Member

@Shreyas220 This was the area that I needed to handle during my mentorship
If only setting the hostNetwork to false was required, then We would have handled that during the mentorship period itself. (with approval from mentors)

I looked thoroughly and in our case we are not so we should not be using hostnetwork=true

We were aware of this during the mentorship time period as well that we don't require hostNetwork anywhere in the codebase, but since something was not working, we decided to keep it set to true.

I was mostly testing on GKE and k3s and something at least was breaking and it not then CI was not happy. I tested this very thoroughly with all permutations and combinations. I don't recall what was breaking at that time.

We were not able to do this directly, and this was the reason I was keeping an eye on this issue. I brought the same with Barun and Rahul during community call so that we can get rid of host parameters set to true with kubearmor deployment.

Having said that, we have evolved a lot in terms of CI flakiness and codebase itself. So, please test this out on different providers and if it works, then we are good to go with setting hostNetwork to false. It's always good to avoid hostNetwork so really looking forward to getting this merged. 💯

@Shreyas220
Copy link
Member Author

@kranurag7 yup the CI was breaking for me as well, not sure if this was same reason when you were trying

The reason the CI was breaking for me was that we were trying to connect to kubearmor using hostIP as now since IP is different

changes were required in both relay-server (link) to get PodIP instead of hostIP and tests tests/util/karmorlog.go (done in this PR) to connect to relay-server instread of kubearmor directly

I'll test on GKE and AWS to see if it works

@daemon1024
Copy link
Member

To be tested on KinD (Issue potentially only on KinD with AppArmor)

@daemon1024
Copy link
Member

Create a seperate PR for removing deployment files.

@Shreyas220
Copy link
Member Author

@daemon1024 has verified that the functionality works as intended on AWS EKS (Amazon Linux and Bottle Rocket.)

But, we have an issue when using Kind,

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x18b946c]

goroutine 1 [running]:
github.com/cilium/ebpf.(*Map).Update(0x1c7c7a0?, {0x1c7c7a0?, 0xc0001a6528?}, {0x1e4b900?, 0xc0000c22d0?}, 0x1c78060?)
	/go/pkg/mod/github.com/cilium/ebpf@v0.11.0/map.go:700 +0x2c
github.com/cilium/ebpf.(*Map).Put(...)
	/go/pkg/mod/github.com/cilium/ebpf@v0.11.0/map.go:695
github.com/kubearmor/KubeArmor/KubeArmor/monitor.(*SystemMonitor).UpdateNsKeyMap(0xc00036cc00, {0x1e883cd?, 0x18b6525?}, {0x0, 0x0}, {0x10?, 0x0?, 0x0?, 0x0?})
	/usr/src/KubeArmor/KubeArmor/monitor/systemMonitor.go:333 +0x548
github.com/kubearmor/KubeArmor/KubeArmor/monitor.(*SystemMonitor).UpdateHostVisibility(0xc00036cc00)
	/usr/src/KubeArmor/KubeArmor/monitor/systemMonitor.go:398 +0x18c
github.com/kubearmor/KubeArmor/KubeArmor/monitor.(*SystemMonitor).initBPFMaps(0xc00036cc00)
	/usr/src/KubeArmor/KubeArmor/monitor/systemMonitor.go:267 +0xff
github.com/kubearmor/KubeArmor/KubeArmor/monitor.(*SystemMonitor).InitBPF(0xc00036cc00)
	/usr/src/KubeArmor/KubeArmor/monitor/systemMonitor.go:438 +0x297
github.com/kubearmor/KubeArmor/KubeArmor/core.(*KubeArmorDaemon).InitSystemMonitor(0xc0000c6540)
	/usr/src/KubeArmor/KubeArmor/core/kubeArmor.go:231 +0x8e
github.com/kubearmor/KubeArmor/KubeArmor/core.KubeArmor()
	/usr/src/KubeArmor/KubeArmor/core/kubeArmor.go:435 +0x969
main.main()
	/usr/src/KubeArmor/KubeArmor/main.go:54 +0x1a8

the InitBPF function returns an empty map which causes this error in UpdateNsKeyMap function which is because when hostNetwork is set to false, KubeArmor is unable to mount the /sys/fs/bpf filepath .

Usually, such behavior is unexpected. there might be some difference in capabilities or the mounted path , or we need to look into the security policy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants