Skip to content

Latest commit

 

History

History
250 lines (218 loc) · 11.8 KB

development.md

File metadata and controls

250 lines (218 loc) · 11.8 KB

How it works

sudo ./bin/netkat
Usage: nk [options] [hostname] [port]

  -debug
        Debug
  -interface string
        Specify interface to use. Default interface with the route to the specified [hostname]
  -listen
        Bind and listen for incoming connections
  -source-port int
        Specify source port to use
  -udp
        Use UDP instead of default TCP

netkat works as the original netcat, but using raw sockets, so it is not affected by the netfilter hooks.

A RAW socket receives a copy of all packets, however, we don't need all of them, just the ones used by the netkat connection.

On the socket we filter the packets that we don't WANT On ingress we filter the packets that we WANT, so the host doesn't close our connection

Netkat can obtain the tuple of the connection in advance, so it can filter using BFP and the socket SO_ATTACH_FILTER option, and the traffic control eBPF filtering capability.

The packets received on the socket, are raw packets (with ethernet headers), but since they are bypassing the kernel TCP/IP stack, something need to reassemble the packets and obtain a data stream. This is achieved using an userspace TCP/IP stack https://pkg.go.dev/gvisor.dev/gvisor/pkg/tcpip/stack

At high level, the code does:

  • obtain the connection details: source and destination IP and port, and host interface
  • parametrize the eBPF code with the connection details, bpf2go generate the eBPF code
  • inject the BPF code to the raw socket and to the ingress interface

As result, we have a new version of netcat that bypass iptables (needs CAP_NET_RAW) It has a bonus, with the -d flag it has a sniffer XD

sudo ./netkat -d -l 127.0.0.1 80
2021/07/16 01:58:08 Creating raw socket
2021/07/16 01:58:08 Adding ebpf ingress filter on interface lo
2021/07/16 01:58:08 filter {LinkIndex: 1, Handle: 0:1, Parent: ffff:fff2, Priority: 0, Protocol: 3}
2021/07/16 01:58:08 Creating user TCP/IP stack
2021/07/16 01:58:08 Listening on 127.0.0.1:80
I0716 01:58:31.206864   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ee flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715804214 TSEcr:0 SACKPermitted:true}
I0716 01:58:31.206937   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ee flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715804214 TSEcr:0 SACKPermitted:true}
I0716 01:58:32.239938   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ef flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715805247 TSEcr:0 SACKPermitted:true}
I0716 01:58:32.240044   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5ef flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715805247 TSEcr:0 SACKPermitted:true}
I0716 01:58:34.287962   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5f0 flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715807295 TSEcr:0 SACKPermitted:true}
I0716 01:58:34.288096   22926 sniffer.go:418] recv tcp 127.0.0.1:44644 -> 127.0.0.1:80 len:0 id:a5f0 flags:  S     seqnum: 1321663292 ack: 0 win: 65495 xsum:0xfe30 options: {MSS:65495 WS:7 TS:true TSVal:2715807295 TSEcr:0 SACKPermitted:true}
^C2021/07/16 01:58:35 Exiting: received signal
2021/07/16 01:58:35 Done

Demo: Kubernetes LoadBalancer Services and ExternalTrafficPolicy

  1. Create a multinode cluster with KIND
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
  1. Run a fake loadbalancer controller
go get -v github.com/aojea/networking-controllers
cd $GOPATH/github.com/aojea/networking-controllers
cd cmd/loadbalancer
go build .
kind get kubeconfig > kind.conf
./loadbalancer --kubeconfig kind.conf --iprange "10.111.111.0/24"
I0716 02:58:43.535494   15454 controller.go:80] Starting controller fake-loadbalancer-controller
I0716 02:58:43.535567   15454 controller.go:84] Waiting for informer caches to sync
I0716 02:58:43.535589   15454 shared_informer.go:240] Waiting for caches to sync for fake-loadbalancer-controller
I0716 02:58:43.636251   15454 shared_informer.go:247] Caches are synced for fake-loadbalancer-controller 
I0716 02:58:43.636267   15454 controller.go:90] Starting workers
I0716 02:58:43.636285   15454 controller.go:149] Processing sync for service 
  1. Create a deployment with one pod and expose it with a Loadbalancer
kubectl apply -f https://gist.githubusercontent.com/aojea/369ad6a5d4cbb6b0fbcdd3dd909d9887/raw/0ecd0564c1db4c7103de07accce99da1c7bf91c3/loadbalancer.yaml
  1. Obtain the LoadBalancer assigned IP
kubectl get service lb-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
10.111.111.129

The loadbalancer has externalTrafficPolicy: Local so it should only be able to work if the traffic is directed to the nodes that contains pods belonging to that Service.

kubectl get pods -l app=MyApp -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
test-deployment-6f45696bd5-xlp96   1/1     Running   0          10m   10.244.2.2   kind-worker2   <none>           <none>
  1. Emulate the external loadbalancer installing a route in the host to the loadbalancer IP through the node with the pod

Get the nodes IPs

kubectl get nodes -o wide
NAME                 STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION            CONTAINER-RUNTIME
kind-control-plane   Ready    control-plane,master   34m   v1.21.2   172.18.0.4    <none>        Ubuntu 21.04   4.18.0-301.1.el8.x86_64   containerd://1.5.2
kind-worker          Ready    <none>                 33m   v1.21.2   172.18.0.3    <none>        Ubuntu 21.04   4.18.0-301.1.el8.x86_64   containerd://1.5.2
kind-worker2         Ready    <none>                 33m   v1.21.2   172.18.0.2    <none>        Ubuntu 21.04   4.18.0-301.1.el8.x86_64   containerd://1.5.2

example: pod is in kind-worker2 with ip 172.18.0.2 and loadbalancer IP 10.111.111.129

sudo ip route add 10.111.111.129 via 172.18.0.2
  1. Check it preserves the source ip ...
echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc 10.111.111.129 80
HTTP/1.1 200 OK
Date: Thu, 15 Jul 2021 23:30:35 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8

172.18.0.1:34028
  1. and that it doesn't work if we target a node without pods backing that service
sudo ip route del 10.111.111.129 via 172.18.0.2
sudo ip route add 10.111.111.129 via 172.18.0.3

echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.
  1. However, if we use externalTrafficPolicy: Cluster it works ... but the source IP is not preserved
kubectl patch service lb-service -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
service/lb-service patched
$ echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.111.111.129:80.
HTTP/1.1 200 OK
Date: Thu, 15 Jul 2021 23:34:22 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8

172.18.0.3:14569
Ncat: 36 bytes sent, 133 bytes received in 0.01 seconds.
  1. Set externalTrafficPolicy: Local again
kubectl patch service lb-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service/lb-service patched
$ echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out.
  1. Test the service from within node without the pod
echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | nc -v -w 3 10.111.111.129 80
Connection to 10.111.111.129 80 port [tcp/http] succeeded!
HTTP/1.1 200 OK
Date: Thu, 15 Jul 2021 23:36:50 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8

We think it shoudn't work because we are in a node without pods, however, since the traffic comes from a node within the cluster the traffic is considered internal and externalTrafficPolicy doesn't apply. The reason is that kube-proxy installs some iptables rules that capture the traffic from within the node.

This causes a problem to test LoadBalancers, because it requires traffic to be sent from an external node, something that is not easy: Kubernetes cluster use to be isolated, expose apiserver endpoint e2e should not assume direct connectivity from the e2e binary

Bypassing iptables with netkat

  1. Run netkat in a pod (using hostnetwork) we are going to simulate an external connecting from inside a node
 kubectl apply -f https://gist.githubusercontent.com/aojea/952c82a58da625fbd9b8aca35f0e63f1/raw/0c5c625e037eb6431947c93152a3182c47004aa3/netkat.yaml
pod/netkat created
  1. Check where is running
$ kubectl get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
netkat                             1/1     Running   0          9s    172.18.0.3   kind-worker    <none>           <none>
test-deployment-6f45696bd5-xlp96   1/1     Running   0          63m   10.244.2.2   kind-worker2   <none>           <none>
  1. Login to the container
kubectl exec -it netkat ash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl kubectl exec [POD] -- [COMMAND] instead.
/ # ip route add 10.111.111.15 via 172.18.0.2
echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | ./netkat 10.111.111.15 80
2021/07/16 01:06:54 routes {Ifindex: 18 Dst: 10.111.111.15/32 Src: 172.18.0.3 Gw: 172.18.0.2 Flags: [] Table: 254}
2021/07/16 01:06:54 Creating raw socket
2021/07/16 01:06:54 Adding ebpf ingress filter on interface eth0
2021/07/16 01:06:54 filter {LinkIndex: 18, Handle: 0:1, Parent: ffff:fff2, Priority: 0, Protocol: 3}
2021/07/16 01:06:54 Creating user TCP/IP stack
2021/07/16 01:06:54 Dialing ...
2021/07/16 01:06:54 Connection established
2021/07/16 01:06:54 Connection error: <nil>
HTTP/1.1 200 OK
Date: Fri, 16 Jul 2021 01:06:54 GMT
Content-Length: 16
Content-Type: text/plain; charset=utf-8
172.18.0.3:26913

we can see it preserves the source ip

  1. Check that if we target a different node it doesn't work as expected because the service has externalTrafficPolicy: local # and it considers the connection as external
/ # ip route del 10.111.111.15 via 172.18.0.3
/ # ip route add 10.111.111.15 via 172.18.0.4
/ # echo -e "GET /clientip HTTP/1.1\nhost:myhost\n" | ./netkat 10.111.111.15 80
2021/07/16 01:11:38 routes {Ifindex: 18 Dst: 10.111.111.15/32 Src: 172.18.0.2 Gw: 172.18.0.4 Flags: [] Table: 254}
2021/07/16 01:11:38 Creating raw socket
2021/07/16 01:11:38 Adding ebpf ingress filter on interface eth0
2021/07/16 01:11:38 filter {LinkIndex: 18, Handle: 0:1, Parent: ffff:fff2, Priority: 0, Protocol: 3}
2021/07/16 01:11:38 Creating user TCP/IP stack
2021/07/16 01:11:38 Dialing ...
2021/07/16 01:11:43 Dialing error: context deadline exceeded
2021/07/16 01:11:43 Can't connect to server: context deadline exceeded

References:

  1. https://en.wikipedia.org/wiki/Netcat
  2. https://github.com/aojea/netkat
  3. https://man7.org/linux/man-pages/man7/raw.7.html
  4. https://man7.org/linux/man-pages/man7/packet.7.html
  5. https://man7.org/linux/man-pages/man8/tc-bpf.8.html
  6. https://pkg.go.dev/github.com/cilium/ebpf/cmd/bpf2go
  7. https://docs.cilium.io/en/stable/bpf/
  8. https://developers.redhat.com/blog/2021/04/01/get-started-with-xdp
  9. https://blog.cloudflare.com/tag/ebpf/
  10. https://www.asykim.com/blog/deep-dive-into-kubernetes-external-traffic-policies