
ServiceLoadBalancer + externalTrafficPolicy: Local = Connection Refused most of time #3785

Closed
jsalatiel opened this issue May 13, 2022 · 30 comments · Fixed by #3816
Assignees
Labels
area/proxy Issues or PRs related to proxy functions in Antrea kind/documentation Categorizes issue or PR as related to a documentation.

Comments

@jsalatiel

Describe the bug
I have a 5-node cluster with a Deployment with 2 replicas. The Deployment uses the ServiceExternalIP feature of Antrea.
I can see that the Service got an IP from the same network as the Nodes:

whoami-headless   LoadBalancer   10.239.37.140   10.1.2.220   80:31591/TCP   5s

But if I try to curl 10.1.2.220, it works from some remote endpoints but not from others.
It works just fine if I set externalTrafficPolicy=Cluster, but that way I lose the client IP.

To Reproduce
Enable the ServiceExternalIP feature for a Service with a single replica deployed, and use the YAMLs at the end of this bug report.

Expected
I suppose the load balancer IP would be "acquired" by one of the Nodes running the Pod, and it should work.

Actual behavior
Most of the time I get connection refused, and other times it just works. From the masters it works using the ClusterIP, but it fails using the LoadBalancer IP.

# curl 10.239.37.140 -I
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 17:35:05 GMT
Content-Length: 203
Content-Type: text/plain; charset=utf-8

# curl  10.1.2.220 -I
curl: (7) Failed to connect to 10.1.2.220 port 80: Connection refused

Versions:

  • Antrea version (Docker image tag): 1.6.1
  • Kubernetes version (use kubectl version): 1.22.8
  • Container runtime: cri-o
  • Linux kernel version on the Kubernetes Nodes (uname -r): 4.18.0-348.23.1.el8_5.x86_64

apiVersion: crd.antrea.io/v1alpha2
kind: ExternalIPPool
metadata:
  name: service-external-ip-pool
spec:
  ipRanges:
  - start: 10.1.1.220 
    end: 10.1.2.250 
  nodeSelector: {}
---
apiVersion: v1
kind: Service
metadata:
  name: whoami-headless
  annotations:
    service.antrea.io/external-ip-pool: "service-external-ip-pool"
spec:
  type: LoadBalancer 
  externalTrafficPolicy: Local
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  selector:
    app: whoami

Which Node does the LoadBalancer IP get assigned to? I can see the LoadBalancer IP assigned to the kube-ipvs0 interface on all servers, but I suppose only one is really using it; otherwise it would be an IP conflict situation, wouldn't it?
I will check MetalLB to see if I get the same behaviour or not.

@jsalatiel jsalatiel added the kind/bug Categorizes issue or PR as related to a bug. label May 13, 2022
@antoninbas
Contributor

I suppose the load balancer IP would be "acquired" by one of the Nodes running the Pod, and it should work.

This is the correct expectation and this is how the implementation should work even today.

Since you are using Antrea v1.6.1, you can exec into antrea-agent Pods and run antctl get serviceexternalip. It will display information about how external IPs are distributed across Nodes. The command only connects to the Agent API (which is why you need to exec into an antrea-agent Pod to run it). It may be worth running it in all the antrea-agent Pods in your cluster to ensure that you get consistent results. After that, you can check if the Node responsible for the IP is indeed running the Service Pod.
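
For example (the namespace, label, and container name below assume a standard Antrea deployment; the Pod name is a placeholder):

# find the antrea-agent Pods, then exec into one to query the Agent API
kubectl get pods -n kube-system -l component=antrea-agent -o wide
kubectl exec -n kube-system antrea-agent-xxxxx -c antrea-agent -- antctl get serviceexternalip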

An alternative is to look at the Agent logs for entries like this one:

Select Node for IP...

BTW, did you have the same issue with Antrea v1.6.0, or is it new to Antrea v1.6.1?

@antoninbas antoninbas added the area/proxy Issues or PRs related to proxy functions in Antrea label May 13, 2022
@jsalatiel
Author

It appears to be acquired by worker5, which seems to be OK.

# antctl get serviceexternalip
NAMESPACE NAME            EXTERNAL-IP-POOL         EXTERNAL-IP  ASSIGNED-NODE
default   whoami-headless service-external-ip-pool 10.1.2.220 worker5   

Pods are running on worker3 and worker5

# kubectl  get pods -o wide
NAME                          READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
whoami-app-68bf75b985-5b776   1/1     Running   0          2m21s   10.239.68.4   worker3   <none>           <none>
whoami-app-68bf75b985-rbxjr   1/1     Running   0          2m21s   10.239.69.3   worker5   <none>           <none>

Service is there

# kubectl get svc
NAME              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
kubernetes        ClusterIP      10.239.0.1      <none>         443/TCP        3h21m
whoami-headless   LoadBalancer   10.239.37.140   10.1.2.220    80:31591/TCP   110m

From master1:

# curl -I  10.1.2.220 ( lb ip )
curl: (7) Failed to connect to 10.1.2.220 port 80: Connection refused

# curl -I 10.239.68.4 (replica 1 cluster ip )
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:27:00 GMT
Content-Length: 212
Content-Type: text/plain; charset=utf-8

# curl -I 10.239.69.3 (replica 2 cluster ip)
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:27:07 GMT
Content-Length: 212
Content-Type: text/plain; charset=utf-8

From node4 ( not running the pod ):

# curl -I  10.1.2.220
curl: (7) Failed to connect to 10.1.2.220 port 80: Connection refused

From node3 ( running the pod )

# curl -I  10.1.2.220
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:31:28 GMT
Content-Length: 213
Content-Type: text/plain; charset=utf-8

From node5 ( running the pod )

# curl -I  10.1.2.220
HTTP/1.1 200 OK
Date: Fri, 13 May 2022 19:31:50 GMT
Content-Length: 212
Content-Type: text/plain; charset=utf-8

@jsalatiel
Author

I have never tested 1.6.0.
I am starting to test this feature now, so I can replace MetalLB.

@antoninbas
Contributor

I assume you have the following default configuration, but let us know if this is not the case:
  • AntreaProxy enabled (feature gate set to true)
  • ProxyAll disabled (antreaProxy.proxyAll set to false)
  • kube-proxy running normally

@jsalatiel
Author

All default.
The only change I made to the antrea.yaml file was setting ServiceExternalIP to true in two places.
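
For reference, those are the ServiceExternalIP entries under featureGates in the antrea-agent.conf and antrea-controller.conf sections of the manifest; a minimal sketch (the surrounding config lines depend on the Antrea release):

featureGates:
  ServiceExternalIP: true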

@antoninbas
Contributor

@jsalatiel I gave it more thought and I think this is the expected behavior and not specific to the Antrea implementation.
The traffic "path" is as follows:

  1. you run the curl command from a Node (host) - curl <LB IP>
  2. kube-proxy is handling the traffic
  3. because Antrea has updated the Service with the correct LB external IP, kube-proxy can load-balance the traffic to that IP (as it would with any other LoadBalancer). There are 2 cases:
  • if there are local Endpoints, the request is forwarded to a local Endpoint
  • if there are no local Endpoints, the request will fail
This explains why you observe some successes (when curl is run on a Node with a Service Pod) and some failures (when curl is run on any other Node).

The "right" way to test this is to try to access the LB IP (10.1.2.220) from a machine which is NOT a K8s Node in your cluster.

When you access the Service from a Node (not common in real life) or a Pod, you should use the Cluster IP (or the Service DNS name when accessing from a Pod with Cluster DNS).
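
One quick way to check which Nodes have local Endpoints for the Service (plain kubectl; the label and Service name are taken from the manifests above):

kubectl get pods -l app=whoami -o wide   # shows which Nodes host the backend Pods
kubectl get endpoints whoami-headless    # the Endpoint IPs behind the Service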

@jianjuns
Contributor

If I understood the problem correctly, the ask is to select a Node with a backend Pod to assign the Service's LoadBalancer IP, when externalTrafficPolicy is Local. I think we do not implement this behavior today, but it should be a valid feature. I also saw MetalLB does respect externalTrafficPolicy.

@tnqn
Member

tnqn commented May 16, 2022

@jianjuns Antrea's implementation does respect externalTrafficPolicy:

if service.Spec.ExternalTrafficPolicy == corev1.ServiceExternalTrafficPolicyTypeLocal {
	nodes, err := c.nodesHasHealthyServiceEndpoint(service)
	if err != nil {
		return err
	}
	filters = append(filters, func(s string) bool {
		return nodes.Has(s)
	})
}

I think @antoninbas is correct. The access is supposed to fail if traffic towards a Service with Local externalTrafficPolicy reaches a Node that doesn't have any backends of the Service. I haven't tried it, but I suppose MetalLB behaves the same.

@jsalatiel
Author

I understand what @antoninbas and @tnqn are saying and it looks like the expected behaviour, but there is something still strange happening.
If I try to curl from any node outside the cluster but in the same network (10.1.2.0/24) it works, but if I try from any node outside the cluster in another network (172.16.0.0/24 for example), I am able to ping the service IP but I get connection refused trying to access the service. If I change externalTrafficPolicy to Cluster, I can curl just fine from another network.

Are there any extra requirements when running ServiceExternalIP on VM guests on VMware?

@jsalatiel
Author

jsalatiel commented May 16, 2022

More debugging here:
I have opened a tcpdump on all cluster nodes (including the masters), and I can see that when I curl from a server in another network, the requests were arriving at the master node, then after a while started arriving at another node not running the pod, and eventually reached a node that has the pod, so curl started answering for a while, until the IP moved again.
That's really odd.

  1. How can the IP be moving between the nodes? antctl get serviceexternalip still shows that the assigned node is worker5.
  2. Is this somehow related to my ExternalIPPool config that has nodeSelector: {} ?
apiVersion: crd.antrea.io/v1alpha2
kind: ExternalIPPool
metadata:
  name: service-external-ip-pool
spec:
  ipRanges:
  - start: 10.1.2.220 
    end: 10.1.2.250 
  nodeSelector: {}
  3. I can see inet 10.1.2.220/32 scope global kube-ipvs0 assigned on all nodes. Is that expected?

@xliuxu
Contributor

xliuxu commented May 16, 2022

I understand what @antoninbas and @tnqn are saying and it looks like the expected behaviour, but there is something still strange happening. If I try to curl from any node outside the cluster but in the same network ( 10.1.2.0/24 ) it will work, but If I try from any node outside the cluster but in another network ( 172.16.0.0/24 for example ) I will be able to ping the service IP but I will get connection refused trying to access the service. If I change the externalTrafficPolicy to clusterIP I will be able to curl just fine from another network.

ping will not work for the external IPs of Services managed by Antrea. The IP is virtually assigned on Nodes.

Is there any extra requirements when running ServiceExternalIP on VM guests on VMWare?
AFAIK, no extra requirements are needed.

More debugging here: I have opened a tcpdump on all cluster nodes ( including the masters ) and I can see that when I curl from one server in another network, that curl was arriving on the master node, and after a while started arriving on another node not running the pod, and then got to a node that has the pod and curl started answering for a while until that IP moved again. Thats really odd.

  1. how can the IP be moving between the nodes . antctl get serviceexternalip still shows that the assigned node is worker5

antctl get serviceexternalip should always return the same Node selected by Antrea on all antrea-agents. Could you help confirm the result of tcpdump when running curl from the same subnet?
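
For example, something along these lines on each Node should show where the traffic lands (the interface name is a placeholder for the Node's uplink):

tcpdump -ni eth0 'host 10.1.2.220 and (arp or tcp port 80)'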

@jsalatiel
Author

I understand what @antoninbas and @tnqn are saying and it looks like the expected behaviour, but there is something still strange happening. If I try to curl from any node outside the cluster but in the same network ( 10.1.2.0/24 ) it will work, but If I try from any node outside the cluster but in another network ( 172.16.0.0/24 for example ) I will be able to ping the service IP but I will get connection refused trying to access the service. If I change the externalTrafficPolicy to clusterIP I will be able to curl just fine from another network.

ping will not work for the external IPs of Services managed by Antrea. The IP is virtually assigned on Nodes.

Ping does work (from servers outside the cluster, at least)! This is a new subnet, so I am absolutely sure there is no IP conflict, if that's what you mean. As soon as I remove the Service (which releases the loadBalancerIP), ping stops.

PING 10.1.2.220 (10.1.2.220) 56(84) bytes of data.
64 bytes from 10.1.2.220: icmp_seq=1 ttl=63 time=1.11 ms
64 bytes from 10.1.2.220: icmp_seq=2 ttl=63 time=20.1 ms
64 bytes from 10.1.2.220: icmp_seq=3 ttl=63 time=22.3 ms
64 bytes from 10.1.2.220: icmp_seq=4 ttl=63 time=22.3 ms
64 bytes from 10.1.2.220: icmp_seq=5 ttl=63 time=26.6 ms
64 bytes from 10.1.2.220: icmp_seq=6 ttl=63 time=30.2 ms
64 bytes from 10.1.2.220: icmp_seq=7 ttl=63 time=27.4 ms
64 bytes from 10.1.2.220: icmp_seq=8 ttl=63 time=18.9 ms
64 bytes from 10.1.2.220: icmp_seq=9 ttl=63 time=19.6 ms
From 10.1.2.23 icmp_seq=10 Destination Host Unreachable
From 10.1.2.23 icmp_seq=11 Destination Host Unreachable
From 10.1.2.23 icmp_seq=12 Destination Host Unreachable
From 10.1.2.23 icmp_seq=13 Destination Host Unreachable
From 10.1.2.23 icmp_seq=14 Destination Host Unreachable
From 10.1.2.23 icmp_seq=15 Destination Host Unreachable
From 10.1.2.23 icmp_seq=16 Destination Host Unreachable
From 10.1.2.23 icmp_seq=17 Destination Host Unreachable
^C
--- 10.1.2.220 ping statistics ---
19 packets transmitted, 9 received, +8 errors, 52.6316% packet loss, time 18148ms
rtt min/avg/max/mdev = 1.108/20.945/30.169/7.908 ms, pipe 4

10.1.2.23 is node5's IP. Curious that when I remove the Service from the cluster and ping stops, the real IP of the node is exposed (good to know).
If I keep the Service and just shut down node5, I can see that the Service IP is reassigned to node3 and pings continue just fine. Interestingly, as soon as node5 is back, the external IP fails back to it again. Is this expected?

Is there any extra requirements when running ServiceExternalIP on VM guests on VMWare?
AFAIK no extra requirements are required.

More debugging here: I have opened a tcpdump on all cluster nodes ( including the masters ) and I can see that when I curl from one server in another network, that curl was arriving on the master node, and after a while started arriving on another node not running the pod, and then got to a node that has the pod and curl started answering for a while until that IP moved again. Thats really odd.

  1. how can the IP be moving between the nodes . antctl get serviceexternalip still shows that the assigned node is worker5

antctl get serviceexternalip should always return the same Node selected by Antrea on all antrea-agents. Could you help to confirm the result of tcpcump when running curl from the same subnet?

From the same subnet I can see the traffic going to the right node (node5) every time.

@jsalatiel
Author

I tried a quick deployment of MetalLB and got the same problem, so I decided to read the MetalLB documentation and found the fix there.
kube-proxy must have strictARP: true.
I could not find that mentioned in Antrea's ServiceExternalIP documentation, so it would be nice to put it there.

[screenshot attached in the original comment]
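
For reference, this is the change the MetalLB docs describe for kube-proxy in IPVS mode; a sketch assuming a standard kubeadm-style deployment (ConfigMap and DaemonSet names may differ):

kubectl get configmap kube-proxy -n kube-system -o yaml | \
  sed -e 's/strictARP: false/strictARP: true/' | \
  kubectl apply -f - -n kube-system
# restart kube-proxy so it picks up the change
kubectl rollout restart daemonset kube-proxy -n kube-system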

@jsalatiel
Author

Found a nice comment here that may explain what was happening to me.

This issue just hit us as well, using metallb in layer 2 mode with a trafficpolicy of local. We were finding all of our kubernetes nodes responding to ARPs for metallb service ips, resulting in traffic getting routed to places that couldn't handle it. The ARPs weren't coming from the metallb speakers. Eventually we stumbled across tickets mentioning the strictARP mode.

@jianjuns
Contributor

Antrea's implementation does respect externalTrafficPolicy

I missed this part! Yes, I agree it is expected that accessing a Service with externalTrafficPolicy Local from a Node without backend Pods should fail. I do not think MetalLB behaves any differently, as it just relies on kube-proxy for Service load balancing.

@jianjuns
Contributor

i tried a quick deploy of metalLB and got the same problem, so I decided to read metallb documentation and found the fix there.
The kube-proxy must have strictARP: true

Nice find! I was suspecting kube-proxy IPVS too when you mentioned kube-ipvs0. I also found an earlier K8s issue for this problem: kubernetes-sigs/kubespray#4788.

I could not find that on Antrea's serviceexternalIP, so it would be nice put there.

Yes. Would you create a PR to add it to docs/service-loadbalancer.md? Of course, no problem if you prefer to let us handle it. Thanks again for reporting and root-causing the issue!

@jsalatiel
Author

I do not know how to do PRs, sorry.

@jianjuns
Contributor

No problem. I can make the change.

In case you would like to learn how to create a PR, check here: https://github.com/antrea-io/antrea/blob/main/CONTRIBUTING.md.

@jsalatiel
Author

Thanks. Closing this.

@antoninbas
Contributor

We need to document this; something came up in the past already with kube-proxy IPVS: #3370

Although it may be a pain to identify when exactly it is required. And @tnqn pointed out in the past (#3370 (comment)) that strictARP mode may interfere with the Egress feature if we are not cautious.

@antoninbas
Contributor

I'll reopen this issue since there is a documentation change required. And I will assign @jianjuns since he volunteered :)

@antoninbas antoninbas reopened this May 16, 2022
@antoninbas antoninbas assigned jianjuns and unassigned xliuxu May 16, 2022
@antoninbas antoninbas added kind/documentation Categorizes issue or PR as related to a documentation. and removed kind/bug Categorizes issue or PR as related to a bug. labels May 16, 2022
@jianjuns
Contributor

Although it may be a pain to identify when exactly it is required. And @tnqn pointed out in the past (#3370 (comment)) that strictARP mode may interfere with the Egress feature if we are not cautious.

That is actually a TODO (@xliuxu) with Egress. We can add that to the Service LB document too.

@jsalatiel
Author

What must be done to avoid breaking Egress?

@jianjuns
Contributor

jianjuns commented May 16, 2022

I think one workaround could be manually setting arp_ignore to 0 for antrea-gw0, e.g.:

echo 0 > /proc/sys/net/ipv4/conf/antrea-gw0/arp_ignore

@xliuxu : do you have an idea?

@jsalatiel
Author

Can that be set automatically by the node agent if it detects strictARP: true?

@jianjuns
Contributor

I think our plan is to remove the arp_ignore requirement for Egress. If you do have use cases that need both Egress and Service LB working with kube-proxy IPVS, we can prioritize the fix.

Another solution is to use AntreaProxy to replace kube-proxy: https://github.com/antrea-io/antrea/blob/main/docs/antrea-proxy.md#antreaproxy-with-proxyall. Not sure if that is what you want, though.
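
For reference, a rough sketch of that configuration in antrea-agent.conf, using the key names mentioned earlier in this thread (the exact layout may differ between Antrea releases, so check the antrea.yml shipped with your version; proxyAll also implies removing kube-proxy, as the linked doc explains):

featureGates:
  AntreaProxy: true
antreaProxy:
  proxyAll: true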

@jsalatiel
Author

Well, I do use Egress right now in production, and I would start using ServiceExternalIP now, but apparently I cannot use both at the same time yet =)
No problem, I will wait for the removal of the arp_ignore requirement for Egress.

Moving to AntreaProxy is not in my plans for the current clusters.

@jianjuns
Contributor

jianjuns commented May 17, 2022

Good to know you plan to use the feature in production! We can definitely prioritize the Egress fix. I created an issue to track that: #3804

@jsalatiel
Author

Any chance we could get this in 1.7?

@jianjuns
Contributor

jianjuns commented May 17, 2022

1.7 might be hard, as we plan to freeze this week. But we can consider a patch after 1.7.
