Pods running on non-gateway nodes cannot communicate with each other. #3110
Thanks for reaching out @lgy1027.
A. Short background: communication from podA@non_gw_node@cluster1 to podB@non_gw_node@cluster2 consists of the following segments:
B. Tcpdumping the traffic on the relevant nodes in cluster1 and cluster2 can point us to the root cause.
C. Please share the content of the Calico default-ipv4-ippool from both clusters.
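For reference, a minimal sketch of commands that could gather this; the pod IP below is a placeholder, substitute the actual podB IP and run the capture on the relevant nodes:

```sh
# Dump Calico's default IPPool on each cluster (run in cluster1 and cluster2):
kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o yaml

# While curling podB from podA, capture traffic on the non-GW and GW nodes
# of both clusters (10.102.1.23 is a placeholder for podB's IP):
sudo tcpdump -i any -nn host 10.102.1.23
```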
I looked at the issue above and found that changing it to accept as described there does make the clusters interoperate, but I don't think that is the real root cause.
cluster1 default-ipv4-ippool:
cluster2 default-ipv4-ippool:
subctl gather:
Trying to understand the root cause of the packet drop, so can you elaborate?
To be honest, I'm fairly new to networking and not great at pinpointing problems in every area. I captured packets and tested earlier: traffic from cluster2's pod network can reach cluster2's gateway node, but there is no traffic between the two gateways. I entered the gateway container and queried the routing table, and the route to the remote cluster shows unreachable 10.102.0.0/16 proto bird.
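For reference, a rough sketch of how that route can be inspected from the gateway node's host network (the CIDR below is just the one quoted above; substitute your actual remote-cluster Pod CIDR):

```sh
# How the kernel would route traffic towards the remote Pod CIDR; an
# "unreachable ... proto bird" entry means Calico/BIRD is blackholing it:
ip route get 10.102.0.10
ip route show | grep "10.102.0.0/16"
```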
@lgy1027, [1]
I tried restarting the route-agent container and the gateway container, but the result was the same when running the commands inside the gateway container, as follows:
At the same time, I found that after a while pods in cluster1 could no longer reach pods in cluster2; requests would just hang. The gateway log reported a warning about the public IP. I don't know whether that is related, since I did not configure a public IP. After I restarted the gateways of both clusters, it worked again.
The gateway logs:
cluster2:
On the GW node, the host-networking routes to the remote cluster CIDRs are configured in table 150:
cluster1 GW node:
cluster2 GW node:
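A minimal sketch of how those routes can be listed on each gateway node (assuming shell access to the node's host network):

```sh
# Remote-cluster CIDR routes programmed by Submariner live in table 150:
ip route show table 150

# The policy rules; look for the one that points traffic at table 150:
ip rule show
```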
When using Submariner, Calico's VXLAN encapsulation mode has to be changed, so an environment whose nodes originally did not need cross-subnet encapsulation now has to run a full overlay, which affects the network performance of the existing pods.
Pod-to-pod across nodes, using CrossSubnet:
Using Always:
As you can see, with CrossSubnet there is basically no difference between the container and the host. However, after switching to Always, throughput dropped significantly across many test runs, roughly by 2/3. Is this normal?
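For context, a sketch of how a pod-to-pod throughput comparison like this can be run (the image, pod names, and server pod IP are placeholders, not taken from this issue):

```sh
# Server pod (example image; command given explicitly):
kubectl run iperf3-server --image=networkstatic/iperf3 --restart=Never --command -- iperf3 -s
kubectl get pod iperf3-server -o wide   # note the pod IP

# Client pod against the server pod IP (placeholder 10.101.2.15), run once
# with vxlanMode=CrossSubnet and once with vxlanMode=Always:
kubectl run iperf3-client --image=networkstatic/iperf3 --restart=Never --rm -it \
  --command -- iperf3 -c 10.101.2.15 -t 30
```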
That's correct; CrossSubnet has better performance compared to the VxLAN overlay. Let's go back to the Submariner inter-cluster datapath segments:
In step 4 (in cluster B), for example, when Calico is set to CrossSubnet, the source IP address is podA's IP and the destination IP is podB's IP. If Calico is configured for VxLAN overlay, the packet will be encapsulated in VxLAN (source IP = IP of the GW node, destination IP = IP of the node where podB is running). If in your environment inter-cluster traffic is fine with Calico configured to CrossSubnet, you can stick with that configuration.
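For anyone following along, a rough sketch of switching the default IPPool between the two modes (assuming calicoctl is available and the pool is named default-ipv4-ippool, as shown earlier):

```sh
# Full overlay: every pod-to-pod packet is VXLAN-encapsulated:
calicoctl patch ippool default-ipv4-ippool -p '{"spec":{"vxlanMode":"Always"}}'

# Encapsulate only when the source and destination nodes are on different subnets:
calicoctl patch ippool default-ipv4-ippool -p '{"spec":{"vxlanMode":"CrossSubnet"}}'
```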
@lgy1027 Any more discussion here or can we close this?
@lgy1027 closing the issue for now. Feel free to reopen if you still need any help. |
Hello, I currently have two clusters whose container networks need to communicate with each other. When using Submariner to connect them, if the application container is deployed on a non-gateway node, curl gets no response. The following are my deployment steps.
K8s version: 1.25.3
subctl version: 0.18.0
cni: calico --- vxlan
broker deploy:
cluster1 join:
cluster2 join:
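The exact commands aren't captured above; as an assumption, a typical sequence with subctl 0.18.0 would look roughly like this:

```sh
# On the broker cluster:
subctl deploy-broker

# On each participating cluster, using the generated broker-info.subm:
subctl join broker-info.subm --clusterid cluster1
subctl join broker-info.subm --clusterid cluster2
```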
Deploy nginx on a non-gateway node in cluster1 and use subctl export to export the Service (a sketch follows below).
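A minimal sketch of that step, assuming the nginx Service is in the default namespace (names are placeholders for whatever was actually used):

```sh
# In cluster1: create the nginx Deployment and Service (scheduling onto a
# non-gateway node would be done separately, e.g. with a nodeSelector):
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80

# Export the Service so it becomes discoverable from the other cluster:
subctl export service nginx --namespace default
```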
Deploy the nettool service in cluster2 to test access to cluster1's nginx service (a sketch of the check is below).
nslookup returns the resolved IP address, but when I use curl it always hangs with no response.
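How that check might look from the nettool pod (pod name, namespace, and the clusterset.local domain used by Submariner service discovery are assumptions):

```sh
# From cluster2, resolve and then curl the exported service:
kubectl exec -it nettool -- nslookup nginx.default.svc.clusterset.local
kubectl exec -it nettool -- curl --max-time 5 http://nginx.default.svc.clusterset.local
```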
Of course, if both pods are deployed on the nodes where the gateways are located, they can communicate with each other.
I don't know where my problem lies. subctl show all and subctl diagnose all both look normal. I also changed the kube-proxy mode and Calico's VXLAN settings according to the official documentation.
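For completeness, a sketch of those checks (the kube-proxy ConfigMap name and namespace can differ by distribution, so treat this as an assumption):

```sh
# Submariner status and built-in diagnostics:
subctl show all
subctl diagnose all

# Confirm the kube-proxy mode:
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -i mode
```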