
ServiceLoadBalancer + externalTrafficPolicy: Local = Connection Refused most of time #3785

Closed
jsalatiel opened this issue May 13, 2022 · 30 comments · Fixed by #3816
Assignees
Labels
area/proxy Issues or PRs related to proxy functions in Antrea kind/documentation Categorizes issue or PR as related to a documentation.

Comments

@jsalatiel

Describe the bug
I have a 5-node cluster with a Deployment with 2 replicas. The Deployment uses the ServiceExternalIP feature of Antrea.
I can see that the Service got an IP from the same network as the Nodes:

whoami-headless   LoadBalancer   10.239.37.140   10.1.2.220   80:31591/TCP   5s

But if I try to curl 10.1.2.220, it works from some remote endpoints but not from others.
It works just fine if I set externalTrafficPolicy=Cluster, but that way I lose the client IP.

To Reproduce
Enable the ServiceExternalIP feature for a Service with a single replica deployed, and use the YAMLs at the end of this bug report.

Expected
I suppose the load balancer IP would be "acquired" by one of the Nodes running the Pod, and it should work.

Actual behavior
Most of the time I get connection refused, and other times it just works. From the masters it works using the ClusterIP, but it fails using the LoadBalancer IP.

# curl 10.239.37.140 -I
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 17:35:05 GMT
Content-Length: 203
Content-Type: text/plain; charset=utf-8

# curl  10.1.2.220 -I
curl: (7) Failed to connect to 10.1.2.220 port 80: Connection refused

Versions:

  • Antrea version (Docker image tag): 1.6.1
  • Kubernetes version (use kubectl version): 1.22.8
  • Container runtime: cri-o
  • Linux kernel version on the Kubernetes Nodes (uname -r): 4.18.0-348.23.1.el8_5.x86_64

apiVersion: crd.antrea.io/v1alpha2
kind: ExternalIPPool
metadata:
  name: service-external-ip-pool
spec:
  ipRanges:
  - start: 10.1.1.220 
    end: 10.1.2.250 
  nodeSelector: {}
---
apiVersion: v1
kind: Service
metadata:
  name: whoami-headless
  annotations:
    service.antrea.io/external-ip-pool: "service-external-ip-pool"
spec:
  type: LoadBalancer 
  externalTrafficPolicy: Local
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  selector:
    app: whoami

Which Node does the LoadBalancer IP get assigned to? I can see the LoadBalancer IP assigned to the kube-ipvs0 interface on all servers, but I suppose only one is really using it; otherwise it would be an IP conflict situation, wouldn't it?
I will check MetalLB to see if I get the same behaviour or not.

@jsalatiel jsalatiel added the kind/bug Categorizes issue or PR as related to a bug. label May 13, 2022
@antoninbas
Contributor

I suppose the load balancer IP would be "acquired" by one of the Nodes running the Pod, and it should work.

This is the correct expectation and this is how the implementation should work even today.

Since you are using Antrea v1.6.1, you can exec into antrea-agent Pods and run antctl get serviceexternalip. It will display information about how external IPs are distributed across Nodes. The command only connects to the Agent API (which is why you need to exec into an antrea-agent Pod to run it). It may be worth running it in all the antrea-agent Pods in your cluster to ensure that you get consistent results. After that, you can check if the Node responsible for the IP is indeed running the Service Pod.
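
For example (the namespace, label, and container name below assume a standard Antrea deployment; the Pod name is a placeholder):

# find the antrea-agent Pods, then exec into one to query the Agent API
kubectl get pods -n kube-system -l component=antrea-agent -o wide
kubectl exec -n kube-system antrea-agent-xxxxx -c antrea-agent -- antctl get serviceexternalip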

An alternative is to look at the Agent logs for entries like this one:

Select Node for IP...

BTW, did you have the same issue with Antrea v1.6.0, or is it new to Antrea v1.6.1?

@antoninbas antoninbas added the area/proxy Issues or PRs related to proxy functions in Antrea label May 13, 2022
@jsalatiel
Author

It appears to be acquired by worker5, which seems to be OK.

# antctl get serviceexternalip
NAMESPACE NAME            EXTERNAL-IP-POOL         EXTERNAL-IP  ASSIGNED-NODE
default   whoami-headless service-external-ip-pool 10.1.2.220 worker5   

Pods are running on worker3 and worker5

# kubectl  get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
whoami-app-68bf75b985-5b776   1/1     Running   0          2m21s   10.239.68.4   worker3   <none>           <none>
whoami-app-68bf75b985-rbxjr   1/1     Running   0          2m21s   10.239.69.3   worker5   <none>           <none>

Service is there

# kubectl get svc
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
kubernetes        ClusterIP      10.239.0.1      <none>         443/TCP        3h21m
whoami-headless   LoadBalancer   10.239.37.140   10.1.2.220    80:31591/TCP   110m

From master1:

# curl -I  10.1.2.220 ( lb ip )
curl: (7) Failed to connect to 10.1.2.220 port 80: Connection refused

# curl -I 10.239.68.4 (replica 1 cluster ip )
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:27:00 GMT
Content-Length: 212
Content-Type: text/plain; charset=utf-8

# curl -I 10.239.69.3 (replica 2 cluster ip)
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:27:07 GMT
Content-Length: 212
Content-Type: text/plain; charset=utf-8

From node4 ( not running the pod ):

# curl -I  10.1.2.220
curl: (7) Failed to connect to 10.1.2.220 port 80: Connection refused

From node3 ( running the pod )

# curl -I  10.1.2.220
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:31:28 GMT
Content-Length: 213
Content-Type: text/plain; charset=utf-8

From node5 ( running the pod )

# curl -I  10.1.2.220
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:31:50 GMT
Content-Length: 212
Content-Type: text/plain; charset=utf-8

@jsalatiel
Author

I have never tested 1.6.0.
I am starting to test this feature now, so I can replace MetalLB.

@antoninbas
Contributor

I assume you have the following default configuration, but let us know if this is not the case:
  • AntreaProxy enabled (feature gate set to true)
  • ProxyAll disabled (antreaProxy.proxyAll set to false)
  • kube-proxy running normally

@jsalatiel
Author

All default.
The only change I made to the antrea.yaml file was setting ServiceExternalIP to true in two places.
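
For reference, those are the ServiceExternalIP entries under featureGates in the antrea-agent.conf and antrea-controller.conf sections of the manifest; a minimal sketch (the surrounding config lines depend on the Antrea release):

featureGates:
  ServiceExternalIP: true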

@antoninbas
Contributor

@jsalatiel I gave it more thought and I think this is the expected behavior and not specific to the Antrea implementation.
The traffic "path" is as follows:

  1. you run the curl command from a Node (host) - curl <LB IP>
  2. kube-proxy is handling the traffic
  3. because Antrea has updated the Service with the correct LB external IP, kube-proxy can load-balance the traffic to that IP (as it would with any other LoadBalancer). There are 2 cases:
  • if there are local Endpoints, the request is forwarded to a local Endpoint
  • if there are no local Endpoints, the request will fail
This explains why you observe some successes (when curl is run on a Node with a Service Pod) and some failures (when curl is run on any other Node).

The "right" way to test this is to try to access the LB IP (10.1.2.220) from a machine which is NOT a K8s Node in your cluster.

When you access the Service from a Node (not common in real life) or a Pod, you should use the Cluster IP (or the Service DNS name when accessing from a Pod with Cluster DNS).
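
One quick way to check which Nodes have local Endpoints for the Service (plain kubectl; the label and Service name are taken from the manifests above):

kubectl get pods -l app=whoami -o wide   # shows which Nodes host the backend Pods
kubectl get endpoints whoami-headless    # the Endpoint IPs behind the Service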

@jianjuns
Contributor

If I understood the problem correctly, the ask is to select a Node with a backend Pod to assign the Service's LoadBalancer IP, when externalTrafficPolicy is Local. I think we do not implement this behavior today, but it should be a valid feature. I also saw MetalLB does respect externalTrafficPolicy.

@tnqn
Member

tnqn commented May 16, 2022

@jianjuns Antrea's implementation does respect externalTrafficPolicy:

if service.Spec.ExternalTrafficPolicy == corev1.ServiceExternalTrafficPolicyTypeLocal {
	nodes, err := c.nodesHasHealthyServiceEndpoint(service)
	if err != nil {
		return err
	}
	filters = append(filters, func(s string) bool {
		return nodes.Has(s)
	})
}

I think @antoninbas is correct. The access is supposed to fail if traffic towards a Service with Local externalTrafficPolicy reaches a Node that doesn't have any backends of the Service. I haven't tried it, but I suppose MetalLB behaves the same.

@jsalatiel
Author

I understand what @antoninbas and @tnqn are saying and it looks like the expected behaviour, but there is something still strange happening.
If I try to curl from any node outside the cluster but in the same network (10.1.2.0/24) it works, but if I try from any node outside the cluster in another network (172.16.0.0/24 for example), I am able to ping the service IP but I get connection refused trying to access the service. If I change externalTrafficPolicy to Cluster, I can curl just fine from another network.

Are there any extra requirements when running ServiceExternalIP on VM guests on VMware?

@jsalatiel
Author

jsalatiel commented May 16, 2022

More debugging here:
I have opened a tcpdump on all cluster nodes (including the masters), and I can see that when I curl from a server in another network, the requests were arriving at the master node, then after a while started arriving at another node not running the pod, and eventually reached a node that has the pod, so curl started answering for a while, until the IP moved again.
That's really odd.

  1. How can the IP be moving between the nodes? antctl get serviceexternalip still shows that the assigned node is worker5.
  2. Is this somehow related to my ExternalIPPool config that has nodeSelector: {} ?
apiVersion: crd.antrea.io/v1alpha2
kind: ExternalIPPool
metadata:
  name: service-external-ip-pool
spec:
  ipRanges:
  - start: 10.1.2.220 
    end: 10.1.2.250 
  nodeSelector: {}
  3. I can see inet 10.1.2.220/32 scope global kube-ipvs0 assigned on all nodes. Is that expected?

@xliuxu
Contributor

xliuxu commented May 16, 2022

I understand what @antoninbas and @tnqn are saying and it looks like the expected behaviour, but there is something still strange happening. If I try to curl from any node outside the cluster but in the same network ( 10.1.2.0/24 ) it will work, but If I try from any node outside the cluster but in another network ( 172.16.0.0/24 for example ) I will be able to ping the service IP but I will get connection refused trying to access the service. If I change the externalTrafficPolicy to clusterIP I will be able to curl just fine from another network.

ping will not work for the external IPs of Services managed by Antrea. The IP is virtually assigned on Nodes.

Is there any extra requirements when running ServiceExternalIP on VM guests on VMWare?
AFAIK, no extra requirements are needed.

More debugging here: I have opened a tcpdump on all cluster nodes ( including the masters ) and I can see that when I curl from one server in another network, that curl was arriving on the master node, and after a while started arriving on another node not running the pod, and then got to a node that has the pod and curl started answering for a while until that IP moved again. Thats really odd.

  1. how can the IP be moving between the nodes . antctl get serviceexternalip still shows that the assigned node is worker5

antctl get serviceexternalip should always return the same Node selected by Antrea on all antrea-agents. Could you help confirm the result of tcpdump when running curl from the same subnet?
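
For example, something along these lines on each Node should show where the traffic lands (the interface name is a placeholder for the Node's uplink):

tcpdump -ni eth0 'host 10.1.2.220 and (arp or tcp port 80)'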

@jsalatiel
Author

I understand what @antoninbas and @tnqn are saying and it looks like the expected behaviour, but there is something still strange happening. If I try to curl from any node outside the cluster but in the same network ( 10.1.2.0/24 ) it will work, but If I try from any node outside the cluster but in another network ( 172.16.0.0/24 for example ) I will be able to ping the service IP but I will get connection refused trying to access the service. If I change the externalTrafficPolicy to clusterIP I will be able to curl just fine from another network.

ping will not work for the external IPs of Services managed by Antrea. The IP is virtually assigned on Nodes.

Ping does work (from servers outside the cluster, at least)! This is a new subnet, so I am absolutely sure there is no IP conflict, if that's what you mean. As soon as I remove the Service (which releases the loadBalancerIP), ping stops.

PING 10.1.2.220 (10.1.2.220) 56(84) bytes of data.
64 bytes from 10.1.2.220: icmp_seq=1 ttl=63 time=1.11 ms
64 bytes from 10.1.2.220: icmp_seq=2 ttl=63 time=20.1 ms
64 bytes from 10.1.2.220: icmp_seq=3 ttl=63 time=22.3 ms
64 bytes from 10.1.2.220: icmp_seq=4 ttl=63 time=22.3 ms
64 bytes from 10.1.2.220: icmp_seq=5 ttl=63 time=26.6 ms
64 bytes from 10.1.2.220: icmp_seq=6 ttl=63 time=30.2 ms
64 bytes from 10.1.2.220: icmp_seq=7 ttl=63 time=27.4 ms
64 bytes from 10.1.2.220: icmp_seq=8 ttl=63 time=18.9 ms
64 bytes from 10.1.2.220: icmp_seq=9 ttl=63 time=19.6 ms
From 10.1.2.23 icmp_seq=10 Destination Host Unreachable
From 10.1.2.23 icmp_seq=11 Destination Host Unreachable
From 10.1.2.23 icmp_seq=12 Destination Host Unreachable
From 10.1.2.23 icmp_seq=13 Destination Host Unreachable
From 10.1.2.23 icmp_seq=14 Destination Host Unreachable
From 10.1.2.23 icmp_seq=15 Destination Host Unreachable
From 10.1.2.23 icmp_seq=16 Destination Host Unreachable
From 10.1.2.23 icmp_seq=17 Destination Host Unreachable
^C
--- 10.1.2.220 ping statistics ---
19 packets transmitted, 9 received, +8 errors, 52.6316% packet loss, time 18148ms
rtt min/avg/max/mdev = 1.108/20.945/30.169/7.908 ms, pipe 4

10.1.2.23 is node5's IP. Curious that when I remove the Service from the cluster and ping stops, the real IP of the node is exposed (good to know).
If I keep the Service and just shut down node5, I can see that the Service IP is reassigned to node3 and pings continue just fine. Interestingly, as soon as node5 is back, the external IP fails back to it again. Is this expected?

Is there any extra requirements when running ServiceExternalIP on VM guests on VMWare?
AFAIK no extra requirements are required.

More debugging here: I have opened a tcpdump on all cluster nodes ( including the masters ) and I can see that when I curl from one server in another network, that curl was arriving on the master node, and after a while started arriving on another node not running the pod, and then got to a node that has the pod and curl started answering for a while until that IP moved again. Thats really odd.

  1. how can the IP be moving between the nodes . antctl get serviceexternalip still shows that the assigned node is worker5

antctl get serviceexternalip should always return the same Node selected by Antrea on all antrea-agents. Could you help to confirm the result of tcpcump when running curl from the same subnet?

From the same subnet I can see the traffic going to the right node (node5) every time.

@jsalatiel
Author

I tried a quick deployment of MetalLB and got the same problem, so I decided to read the MetalLB documentation and found the fix there.
kube-proxy must have strictARP: true.
I could not find that mentioned in Antrea's ServiceExternalIP documentation, so it would be nice to put it there.

[screenshot attached in the original comment]
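
For reference, this is the change the MetalLB docs describe for kube-proxy in IPVS mode; a sketch assuming a standard kubeadm-style deployment (ConfigMap and DaemonSet names may differ):

kubectl get configmap kube-proxy -n kube-system -o yaml | \
  sed -e 's/strictARP: false/strictARP: true/' | \
  kubectl apply -f - -n kube-system
# restart kube-proxy so it picks up the change
kubectl rollout restart daemonset kube-proxy -n kube-system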

@jsalatiel
Author

Found a nice comment here that may explain what was happening to me.

This issue just hit us as well, using metallb in layer 2 mode with a trafficpolicy of local. We were finding all of our kubernetes nodes responding to ARPs for metallb service ips, resulting in traffic getting routed to places that couldn't handle it. The ARPs weren't coming from the metallb speakers. Eventually we stumbled across tickets mentioning the strictARP mode.

@jianjuns
Contributor

Antrea's implementation does respect externalTrafficPolicy

I missed this part! Yes, I agree it is expected that accessing a Service with externalTrafficPolicy Local from a Node without backend Pods should fail. I do not think MetalLB behaves any differently, as it just relies on kube-proxy for Service load balancing.

@jianjuns
Contributor

i tried a quick deploy of metalLB and got the same problem, so I decided to read metallb documentation and found the fix there.
The kube-proxy must have strictARP: true

Nice find! I was suspecting kube-proxy IPVS too when you mentioned kube-ipvs0. I also found an earlier K8s issue for this problem: kubernetes-sigs/kubespray#4788.

I could not find that on Antrea's serviceexternalIP, so it would be nice put there.

Yes. Would you create a PR to add it to docs/service-loadbalancer.md? Of course, no problem if you prefer to let us handle it. Thanks again for reporting and root-causing the issue!

@jsalatiel
Author

I do not know how to do PRs, sorry.

@jianjuns
Contributor

No problem. I can make the change.

In case you would like to learn how to create a PR, check here: https://github.com/antrea-io/antrea/blob/main/CONTRIBUTING.md.

@jsalatiel
Author

Thanks. Closing this.

@antoninbas
Contributor

We need to document this; something came up in the past already with kube-proxy IPVS: #3370

Although it may be a pain to identify when exactly it is required. And @tnqn pointed out in the past (#3370 (comment)) that strictARP mode may interfere with the Egress feature if we are not cautious.

@antoninbas
Contributor

I'll reopen this issue since there is a documentation change required. And I will assign @jianjuns since he volunteered :)

@antoninbas antoninbas reopened this May 16, 2022
@antoninbas antoninbas assigned jianjuns and unassigned xliuxu May 16, 2022
@antoninbas antoninbas added kind/documentation Categorizes issue or PR as related to a documentation. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 16, 2022
@jianjuns
Contributor

Although it may be a pain to identify when exactly it is required. And @tnqn pointed out in the past (#3370 (comment)) that strictARP mode may interfere with the Egress feature if we are not cautious.

That is actually a TODO (@xliuxu) with Egress. We can add that to the Service LB document too.

@jsalatiel
Author

What must be done to avoid breaking Egress?

@jianjuns
Contributor

jianjuns commented May 16, 2022

I think one workaround could be manually setting arp_ignore to 0 for antrea-gw0, e.g.:

echo 0 > /proc/sys/net/ipv4/conf/antrea-gw0/arp_ignore

@xliuxu : do you have an idea?

@jsalatiel
Author

Can that be set automatically by the node agent if it detects strictARP: true?

@jianjuns
Contributor

I think our plan is to remove the arp_ignore requirement for Egress. If you do have use cases that need both Egress and Service LB working with kube-proxy IPVS, we can prioritize the fix.

Another solution is to use AntreaProxy to replace kube-proxy: https://github.com/antrea-io/antrea/blob/main/docs/antrea-proxy.md#antreaproxy-with-proxyall. Not sure if that is what you want, though.
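
For reference, a rough sketch of that configuration in antrea-agent.conf, using the key names mentioned earlier in this thread (the exact layout may differ between Antrea releases, so check the antrea.yml shipped with your version; proxyAll also implies removing kube-proxy, as the linked doc explains):

featureGates:
  AntreaProxy: true
antreaProxy:
  proxyAll: true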

@jsalatiel
Author

Well, I do use Egress right now in production, and I would start using ServiceExternalIP now, but apparently I cannot use both at the same time yet =)
No problem, I will wait for the removal of the arp_ignore requirement for Egress.

Moving to AntreaProxy is not in my plans for the current clusters.

@jianjuns
Contributor

jianjuns commented May 17, 2022

Good to know you plan to use the feature in production! We can definitely prioritize the Egress fix. I created an issue to track that: #3804

@jsalatiel
Author

Any chance we could get this in 1.7?

@jianjuns
Contributor

jianjuns commented May 17, 2022

1.7 might be hard, as we plan to freeze this week. But we can consider a patch after 1.7.
