
Pods running on non-gateway nodes cannot communicate with each other. #3110

Closed
lgy1027 opened this issue Aug 1, 2024 · 14 comments
Assignees
Labels
Calico, datapath, need-info, support

Comments


lgy1027 commented Aug 1, 2024

Hello, I currently have two clusters whose container networks need to communicate with each other. I am using Submariner to connect them, but if an application pod is deployed on a non-gateway node, curl gets no response. Below are my deployment steps.

K8s version: 1.25.3
subctl version: 0.18.0
cni: calico --- vxlan

broker deploy:

subctl deploy-broker

cluster1 join:

subctl join --kubeconfig ~/.kube/c1 broker-info.subm --clusterid c1

cluster2 join:

subctl join --kubeconfig ~/.kube/c2 broker-info.subm --clusterid c2

I deployed nginx on a non-gateway node in cluster1 and exported it with subctl export.
I deployed a nettool pod in cluster2 to test access to the nginx service in cluster1.
nslookup returns the resolved IP address, but curl always hangs with no response.
Of course, if both workloads are deployed on the nodes where the gateways are located, they can communicate with each other.
I don't know where my problem lies. subctl show all and subctl diagnose all both look normal, and I also adjusted the kube-proxy mode and Calico's VXLAN settings according to the official documentation.
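For reference, the test looked roughly like this (illustrative commands only; the demo namespace, service name, and nettest image are placeholders, not my exact manifests):

# cluster1: expose nginx and export it to the clusterset
kubectl --kubeconfig ~/.kube/c1 -n demo expose deployment nginx --port=80
subctl export service --kubeconfig ~/.kube/c1 --namespace demo nginx

# cluster2: resolve and curl the exported service via the clusterset domain
kubectl --kubeconfig ~/.kube/c2 -n demo run nettool --rm -it --image=quay.io/submariner/nettest -- \
  sh -c 'nslookup nginx.demo.svc.clusterset.local && curl -m 10 nginx.demo.svc.clusterset.local'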

lgy1027 added the support label Aug 1, 2024
Author

lgy1027 commented Aug 2, 2024

cluster1 ippool: (screenshot attached)
cluster2 ippool: (screenshot attached)

subctl show all

Cluster "kubernetes"
 ✓ Detecting broker(s)
NAMESPACE               NAME                COMPONENTS                        GLOBALNET   GLOBALNET CIDR   DEFAULT GLOBALNET SIZE   DEFAULT DOMAINS   
submariner-k8s-broker   submariner-broker   service-discovery, connectivity   no          242.0.0.0/8      65536                                      

 ✓ Showing Connections
GATEWAY            CLUSTER   REMOTE IP     NAT   CABLE DRIVER   SUBNETS                       STATUS      RTT avg.     
yigou-dev-102-64   dev       10.0.102.64   no    libreswan      10.96.0.0/16, 10.244.0.0/16   connected   711.052µs    

 ✓ Showing Endpoints
CLUSTER   ENDPOINT IP   PUBLIC IP        CABLE DRIVER   TYPE     
karmada   10.0.102.23   112.29.111.158   libreswan      local    
dev       10.0.102.64   112.29.111.158   libreswan      remote   

 ✓ Showing Gateways
NODE                    HA STATUS   SUMMARY                               
hero-cloud-dev-102-23   active      All connections (1) are established   

 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:  calico
        Service CIDRs:   [10.102.0.0/16]
        Cluster CIDRs:   [10.202.0.0/16]

 ✓ Showing versions 
COMPONENT                       REPOSITORY           CONFIGURED   RUNNING                     ARCH    
submariner-gateway              quay.io/submariner   0.18.0       release-0.18-e3f3e56b57fe   amd64   
submariner-routeagent           quay.io/submariner   0.18.0       release-0.18-e3f3e56b57fe   amd64   
submariner-metrics-proxy        quay.io/submariner   0.18.0       release-0.18-011349c6f17e   amd64   
submariner-operator             quay.io/submariner   0.18.0       release-0.18-68fefdd74105   amd64   
submariner-lighthouse-agent     quay.io/submariner   0.18.0       release-0.18-02b6a5b37266   amd64   
submariner-lighthouse-coredns   quay.io/submariner   0.18.0       release-0.18-02b6a5b37266   amd64

cluster2:

Cluster "kubernetes"
 ✓ Detecting broker(s)
 ✓ No brokers found

 ✓ Showing Connections
GATEWAY                 CLUSTER   REMOTE IP     NAT   CABLE DRIVER   SUBNETS                        STATUS      RTT avg.     
hero-cloud-dev-102-23   karmada   10.0.102.23   no    libreswan      10.102.0.0/16, 10.202.0.0/16   connected   526.891µs    

 ✓ Showing Endpoints
CLUSTER   ENDPOINT IP   PUBLIC IP        CABLE DRIVER   TYPE     
dev       10.0.102.64   112.29.111.158   libreswan      local    
karmada   10.0.102.23   112.29.111.158   libreswan      remote   

 ✓ Showing Gateways
NODE               HA STATUS   SUMMARY                               
yigou-dev-102-64   active      All connections (1) are established   

 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:  calico
        Service CIDRs:   [10.96.0.0/16]
        Cluster CIDRs:   [10.244.0.0/16]

 ✓ Showing versions 
COMPONENT                       REPOSITORY           CONFIGURED   RUNNING                     ARCH    
submariner-gateway              quay.io/submariner   0.18.0       release-0.18-e3f3e56b57fe   amd64   
submariner-routeagent           quay.io/submariner   0.18.0       release-0.18-e3f3e56b57fe   amd64   
submariner-metrics-proxy        quay.io/submariner   0.18.0       release-0.18-011349c6f17e   amd64   
submariner-operator             quay.io/submariner   0.18.0       release-0.18-68fefdd74105   amd64   
submariner-lighthouse-agent     quay.io/submariner   0.18.0       release-0.18-02b6a5b37266   amd64   
submariner-lighthouse-coredns   quay.io/submariner   0.18.0       release-0.18-02b6a5b37266   amd64

cluster1 subctl diagnose all:

[root@hero-cloud-dev-102-23 network]# subctl diagnose all
Cluster "kubernetes"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.25.14" is supported

 ✓ Non-Globalnet deployment detected - checking that cluster CIDRs do not overlap
 ✓ Checking DaemonSet "submariner-gateway"
 ✓ Checking DaemonSet "submariner-routeagent"
 ✓ Checking DaemonSet "submariner-metrics-proxy"
 ✓ Checking Deployment "submariner-lighthouse-agent"
 ✓ Checking Deployment "submariner-lighthouse-coredns"
 ✓ Checking the status of all Submariner pods
 ✓ Checking that gateway metrics are accessible from non-gateway nodes 

 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("calico") is supported
 ✓ Calico CNI detected, checking if the Submariner IPPool pre-requisites are configured
 ✓ Checking gateway connections
 ✓ Checking Submariner support for the kube-proxy mode 
 ✓ The kube-proxy mode is supported
 ✓ Checking that firewall configuration allows intra-cluster VXLAN traffic 

 ✓ Checking that services have been exported properly

cluster2 subctl diagnose all:

[root@hero-cloud-dev-102-23 network]# subctl diagnose all kubeconfig ./subctl/a12397116471111680373621 
Cluster "kubernetes"
 ✓ Checking Submariner support for the Kubernetes version
 ✓ Kubernetes version "v1.25.14" is supported

 ✓ Non-Globalnet deployment detected - checking that cluster CIDRs do not overlap
 ✓ Checking DaemonSet "submariner-gateway"
 ✓ Checking DaemonSet "submariner-routeagent"
 ✓ Checking DaemonSet "submariner-metrics-proxy"
 ✓ Checking Deployment "submariner-lighthouse-agent"
 ✓ Checking Deployment "submariner-lighthouse-coredns"
 ✓ Checking the status of all Submariner pods
 ✓ Checking that gateway metrics are accessible from non-gateway nodes 

 ✓ Checking Submariner support for the CNI network plugin
 ✓ The detected CNI network plugin ("calico") is supported
 ✓ Calico CNI detected, checking if the Submariner IPPool pre-requisites are configured
 ✓ Checking gateway connections
 ✓ Checking Submariner support for the kube-proxy mode 
 ✓ The kube-proxy mode is supported
 ✓ Checking that firewall configuration allows intra-cluster VXLAN traffic 

 ✓ Checking that services have been exported properly

yboaron added the Calico label Aug 4, 2024
Contributor

yboaron commented Aug 5, 2024

Thanks for reaching out @lgy1027,
This looks like a datapath issue that needs further investigation.

A. Short background:
Only the egress side of the inter-cluster datapath is handled by Submariner; ingress is handled by the CNI (Calico in your case).

So, communication from podA@non_gw_node@cluster1 to podB@non_gw_node@cluster2 consists of the following segments:

  1. Egress : podA@non_gw_node@cluster1 --> gw_node@cluster1 via vx-submariner
  2. Egress : gw_node@cluster1 -> gw_node@cluster2 via IPSec tunnel
  3. Ingress: Decrypt IPSec packet
  4. Ingress: CNI should forward the packet to podB IP

B. Tcpdumping the traffic on the relevant nodes in cluster1 and cluster2 can point us to the root cause (see the sketch below).

C. Please share the content of the Calico default-ipv4-ippool from both clusters.
D. Also upload the Submariner logs (using subctl gather) from both clusters.
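For item B, a minimal tcpdump sketch (the pod IPs and interfaces are placeholders, adjust to your environment):

# on the cluster1 gateway node: traffic arriving from the non-GW node over vx-submariner
tcpdump -ni vx-submariner host <podA_IP>

# on the cluster2 gateway node: decrypted traffic that should be forwarded towards podB
tcpdump -ni any host <podA_IP> or host <podB_IP>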

yboaron added this to Backlog Aug 5, 2024
github-project-automation bot moved this to Backlog Aug 5, 2024
yboaron self-assigned this Aug 5, 2024
yboaron added the datapath label Aug 5, 2024
Author

lgy1027 commented Aug 6, 2024

Recently we spent some time trying to locate the problem. We found that packets are dropped in the FORWARD chain of the filter table: the default policy of the FORWARD chain is DROP, and since the request packet does not match any rule, it is discarded. After changing the default policy of the FORWARD chain to ACCEPT (or adding rules that the packets do match), access works normally.

I looked at the issues above and found that changing the policy to ACCEPT this way does make the clusters interoperate, but I don't think this is the real root cause.
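For reference, this is roughly how the drop can be inspected and temporarily worked around (illustrative only; setting the policy to ACCEPT is just a workaround and weakens host filtering):

# show the FORWARD chain default policy and per-rule packet/byte counters
iptables -L FORWARD -n -v --line-numbers

# temporary workaround only: change the default policy to ACCEPT
iptables -P FORWARD ACCEPT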

cluster1 default-ipv4-ippool:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2024-07-31T08:52:01Z"
  name: default-ipv4-ippool
  resourceVersion: "246979"
  uid: 2df236cc-f21b-49f0-9cab-bbc1f4e4a785
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 10.202.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always

cluster2 default-ipv4-ippool:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2024-07-26T03:50:41Z"
  name: default-ipv4-ippool
  resourceVersion: "155237418"
  uid: 6343a566-5ac4-4cfe-bb5c-15ab689916b8
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 10.244.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always

subctl gather :
cluster1.zip
cluster2.zip

Contributor

yboaron commented Aug 6, 2024

Trying to understand the root cause of the packet drop. Communication from podA@non_gw_node@cluster1 to podB@non_gw_node@cluster2 consists of the following segments:

  1. Egress : podA@non_gw_node@cluster1 --> gw_node@cluster1 via vx-submariner
  2. Egress : gw_node@cluster1 -> gw_node@cluster2 via IPSec tunnel
  3. Ingress: decrypt IPSec packet
  4. Ingress: Calico should forward the packet to podB IP
  5. Egress : podB@non_gw_node@cluster2 --> gw_node@cluster2 via vx-submariner
  6. Egress : gw_node@cluster2 -> gw_node@cluster1 via IPSec tunnel
  7. Ingress: decrypt IPSec packet
  8. Calico should forward the packet to podA IP

Recently we spent some time trying to locate the problem. We found that packets are dropped in the FORWARD chain of the filter table: the default policy of the FORWARD chain is DROP, and since the request packet does not match any rule, it is discarded. After changing the default policy of the FORWARD chain to ACCEPT (or adding rules that the packets do match), access works normally.

Can you elaborate?
Is the packet being dropped in the ingress direction? At which step?
Could it be that Calico drops the packet because the source IP address doesn't belong to the cluster's CIDRs (it's the IP address of a pod from the other cluster)?
Do you have any NetworkPolicy defined in your clusters?
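For the last question, a quick way to check for policies (a sketch; the Calico CRD resource names assume a manifest-based Calico install):

# Kubernetes NetworkPolicies in all namespaces
kubectl get networkpolicy -A

# Calico-specific policies, if the Calico CRDs are installed
kubectl get networkpolicies.crd.projectcalico.org -A
kubectl get globalnetworkpolicies.crd.projectcalico.org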

Author

lgy1027 commented Aug 15, 2024

Trying to understand the root cause of the packet drop. Communication from podA@non_gw_node@cluster1 to podB@non_gw_node@cluster2 consists of the following segments:

  1. Egress : podA@non_gw_node@cluster1 --> gw_node@cluster1 via vx-submariner
  2. Egress : gw_node@cluster1 -> gw_node@cluster2 via IPSec tunnel
  3. Ingress: decrypt IPSec packet
  4. Ingress: Calico should forward the packet to podB IP
  5. Egress : podB@non_gw_node@cluster2 --> gw_node@cluster2 via vx-submariner
  6. Egress : gw_node@cluster2 -> gw_node@cluster1 via IPSec tunnel
  7. Ingress: decrypt IPSec packet
  8. Calico should forward the packet to podA IP

Recently we spent some time trying to locate the problem. We found that packets are dropped in the FORWARD chain of the filter table: the default policy of the FORWARD chain is DROP, and since the request packet does not match any rule, it is discarded. After changing the default policy of the FORWARD chain to ACCEPT (or adding rules that the packets do match), access works normally.

Can you elaborate? Is the packet being dropped in the ingress direction? At which step? Could it be that Calico drops the packet because the source IP address doesn't belong to the cluster's CIDRs (it's the IP address of a pod from the other cluster)? Do you have any NetworkPolicy defined in your clusters?

To be honest, I'm a networking newbie and not very good at tracking down problems in these areas. I previously captured packets and tested: traffic from a pod in cluster2 reaches cluster2's gateway node, but nothing passes between the two gateways. I entered the gateway container and checked the routing table; for the remote cluster it shows:

unreachable 10.102.0.0/16 proto bird
unreachable 10.202.0.0/16 proto bird

Is this normal?

Contributor

yboaron commented Aug 15, 2024

@lgy1027,
Can you try restarting all the Submariner RouteAgent pods on both clusters (you can use command [1]), and update if that helps?

[1]
kubectl delete pods -n submariner-operator -l app=submariner-routeagent

Author

lgy1027 commented Aug 21, 2024

@lgy1027, Can you try restarting all the Submariner RouteAgent pods on both clusters (you can use command [1]), and update if that helps?

[1] kubectl delete pods -n submariner-operator -l app=submariner-routeagent

I tried restarting the route-agent and gateway pods, but the result of running ip route inside the gateway container is still the same:

bash-5.2# ip route
default via 10.0.102.1 dev ens192 proto static metric 100 
10.0.102.0/24 dev ens192 proto kernel scope link src 10.0.102.64 metric 100 
unreachable 10.102.0.0/16 proto bird 
unreachable 10.202.0.0/16 proto bird 

At the same time, I found that after a while pods in cluster1 could no longer reach pods in cluster2 and requests would just hang. The gateway log reported a warning related to the public IP. I don't know whether it is related, since I did not set a public IP. After I restarted the gateways of both clusters, it worked again.

Author

lgy1027 commented Aug 21, 2024

@lgy1027, Can you try restarting all the Submariner RouteAgent pods on both clusters (you can use command [1]), and update if that helps?

[1] kubectl delete pods -n submariner-operator -l app=submariner-routeagent

These are the gateway logs:
cluster1

+ trap 'exit 1' SIGTERM SIGINT
+ export CHARON_PID_FILE=/var/run/charon.pid
+ CHARON_PID_FILE=/var/run/charon.pid
+ rm -f /var/run/charon.pid
+ SUBMARINER_VERBOSITY=2
+ '[' false == true ']'
+ DEBUG=-v=2
++ cat /proc/sys/net/ipv4/conf/all/send_redirects
+ [[ 0 = 0 ]]
+ exec submariner-gateway -v=2 -alsologtostderr
submariner-gateway version: release-0.18-e3f3e56b57fe
2024-08-21T01:20:51.218Z INF ../versions/version.go:35 main                 Go Version: go1.22.4
2024-08-21T01:20:51.218Z INF ../versions/version.go:36 main                 Go Arch: amd64
2024-08-21T01:20:51.218Z INF ../versions/version.go:37 main                 Git Commit Hash: e3f3e56b57fe22371cdcf885315804f67663006e
2024-08-21T01:20:51.218Z INF ../versions/version.go:38 main                 Git Commit Date: 2024-06-24T18:49:44+03:00
2024-08-21T01:20:51.219Z INF ..o/submariner/main.go:94 main                 Parsed env variables: types.SubmarinerSpecification{ClusterCidr:[]string{"10.202.0.0/16"}, GlobalCidr:[]string{}, ServiceCidr:[]string{"10.102.0.0/16"}, Broker:"", CableDriver:"libreswan", ClusterID:"c1", Namespace:"submariner-operator", PublicIP:"", Token:"", Debug:false, NATEnabled:true, HealthCheckEnabled:true, Uninstall:false, HaltOnCertError:false, HealthCheckInterval:0x1, HealthCheckMaxPacketLossCount:0x5, MetricsPort:32780}
2024-08-21T01:20:51.219Z INF ..o/submariner/main.go:97 main                 Proxy env variables: HTTP_PROXY: , HTTPS_PROXY: , NO_PROXY: 
W0821 01:20:51.220062       1 client_config.go:659] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2024-08-21T01:20:51.234Z INF ../gateway/gateway.go:108 Gateway              Initializing the gateway engine
2024-08-21T01:20:51.234Z INF ../gateway/gateway.go:134 Gateway              Creating the cable engine
2024-08-21T01:20:51.234Z INF ../gateway/gateway.go:145 Gateway              AIR_GAPPED_DEPLOYMENT is set to false
2024-08-21T01:20:51.702Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "lo" has "127.0.0.1" address
2024-08-21T01:20:51.702Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "ens192" has "10.0.102.23" address
2024-08-21T01:20:51.702Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "docker0" has "172.17.0.1" address
2024-08-21T01:20:51.702Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "vxlan.calico" has "10.202.44.192" address
2024-08-21T01:20:51.702Z DBG ..pkg/cni/cni_iface.go:75 CNI                  Found CNI Interface "vxlan.calico" that has IP "10.202.44.192" from ClusterCIDR "10.202.0.0/16"
2024-08-21T01:20:51.702Z INF ../gateway/gateway.go:161 Gateway              Creating the datastore syncer
2024-08-21T01:20:51.704Z INF ../gateway/gateway.go:188 Gateway              Starting the gateway engine
2024-08-21T01:20:51.704Z DBG ..ery/natdiscovery.go:102 NAT                  NAT discovery server starting on port 4490
2024-08-21T01:20:51.720Z INF ../gateway/gateway.go:247 Gateway              Starting leader election
I0821 01:20:51.720556       1 leaderelection.go:250] attempting to acquire leader lease submariner-operator/submariner-gateway-lock...
I0821 01:20:51.727152       1 leaderelection.go:260] successfully acquired lease submariner-operator/submariner-gateway-lock
2024-08-21T01:20:51.727Z DBG ..ols/record/event.go:377 Gateway              Event(v1.ObjectReference{Kind:"Lease", Namespace:"submariner-operator", Name:"submariner-gateway-lock", UID:"feb4d5cc-aa03-4352-ada9-87330dec3637", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"7554264", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' hero-cloud-dev-102-23-submariner-gateway became leader
2024-08-21T01:20:51.727Z INF ../gateway/gateway.go:280 Gateway              Leadership acquired - starting controllers
2024-08-21T01:20:51.727Z INF ..reswan/libreswan.go:136 libreswan            Using NATT UDP port 4500
2024-08-21T01:20:51.727Z INF ..gine/cableengine.go:111 CableEngine          CableEngine started with driver "libreswan"
2024-08-21T01:20:51.727Z INF ..public_ip_watcher.go:58 Endpoint             Starting the public IP watcher.
2024-08-21T01:20:51.727Z INF ..r/datastoresyncer.go:70 DSSyncer             Starting the datastore syncer
2024-08-21T01:20:51.727Z INF ..ers/tunnel/tunnel.go:40 Tunnel               Starting the tunnel controller
2024-08-21T01:20:51.727Z INF ../gateway/gateway.go:349 Gateway              Updating Gateway pod HA status to "active"
2024-08-21T01:20:51.742Z INF ../gateway/gateway.go:370 Gateway              Successfully updated Gateway pod HA status to "active"
I0821 01:20:51.743058       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
I0821 01:20:51.743426       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
I0821 01:20:51.744889       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Cluster from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
2024-08-21T01:20:51.841Z INF ..er/healthchecker.go:114 HealthChecker        CableEngine HealthChecker started with PingInterval: 1, MaxPacketLossCount: 5
2024-08-21T01:20:51.841Z INF ..ery/natdiscovery.go:144 NAT                  Starting NAT discovery for endpoint "submariner-cable-c2-10-0-102-64"
2024-08-21T01:20:51.841Z INF ..thchecker/pinger.go:106 HealthChecker        Starting pinger for IP "10.244.187.128"
2024-08-21T01:20:51.841Z INF ..er/healthchecker.go:176 HealthChecker        CableEngine HealthChecker started pinger for CableName: "submariner-cable-c2-10-0-102-64" with HealthCheckIP "10.244.187.128"
2024-08-21T01:20:51.843Z INF ..thchecker/pinger.go:177 HealthChecker        Ping to remote endpoint IP "10.244.187.128" is successful
I0821 01:20:51.846437       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
I0821 01:20:51.846436       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Cluster from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
2024-08-21T01:20:51.849Z INF .._update_federator.go:71 Federator            local -> broker: Updated Cluster "submariner-k8s-broker/c1" 
2024-08-21T01:20:51.902Z INF .._update_federator.go:71 Federator            broker -> local: Updated Cluster "submariner-operator/c2" 
2024-08-21T01:20:51.945Z INF ../datastoresyncer.go:208 DSSyncer             Ensuring we are the only endpoint active for this cluster
2024-08-21T01:20:51.945Z INF ../datastoresyncer.go:266 DSSyncer             Creating local submariner Cluster: {
  "id": "c1",
  "spec": {
    "cluster_id": "c1",
    "color_codes": [
      "blue"
    ],
    "service_cidr": [
      "10.102.0.0/16"
    ],
    "cluster_cidr": [
      "10.202.0.0/16"
    ],
    "global_cidr": []
  }
}
I0821 01:20:51.947979       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
2024-08-21T01:20:51.950Z INF .._update_federator.go:71 Federator            broker -> local: Updated Cluster "submariner-operator/c1" 
2024-08-21T01:20:51.950Z INF ../datastoresyncer.go:281 DSSyncer             Creating local submariner Endpoint: {
  "metadata": {
    "name": "c1-submariner-cable-c1-10-0-102-23",
    "creationTimestamp": null
  },
  "spec": {
    "cluster_id": "c1",
    "cable_name": "submariner-cable-c1-10-0-102-23",
    "healthCheckIP": "10.202.44.192",
    "hostname": "hero-cloud-dev-102-23",
    "subnets": [
      "10.102.0.0/16",
      "10.202.0.0/16"
    ],
    "private_ip": "10.0.102.23",
    "public_ip": "112.29.111.158",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
2024-08-21T01:20:51.951Z INF .._update_federator.go:71 Federator            local -> broker: Updated Endpoint "submariner-k8s-broker/c1-submariner-cable-c1-10-0-102-23" 
2024-08-21T01:20:51.955Z INF .._update_federator.go:71 Federator            broker -> local: Updated Endpoint "submariner-operator/c1-submariner-cable-c1-10-0-102-23" 
2024-08-21T01:20:51.955Z INF ../datastoresyncer.go:100 DSSyncer             Datastore syncer started
2024-08-21T01:20:52.004Z INF .._update_federator.go:71 Federator            broker -> local: Updated Endpoint "submariner-operator/c2-submariner-cable-c2-10-0-102-64" 
2024-08-21T01:20:52.705Z DBG ..ery/request_send.go:117 NAT                  Sending request - REQUEST_NUMBER: 0x68b3c66c31ad4c90, SENDER: "submariner-cable-c1-10-0-102-23", RECEIVER: "submariner-cable-c2-10-0-102-64", USING_SRC: 10.0.102.23:4490, USING_DST: 10.0.102.64:4490
2024-08-21T01:20:52.705Z DBG ..ery/request_send.go:117 NAT                  Sending request - REQUEST_NUMBER: 0x68b3c66c31ad4c91, SENDER: "submariner-cable-c1-10-0-102-23", RECEIVER: "submariner-cable-c2-10-0-102-64", USING_SRC: 10.0.102.23:4490, USING_DST: 112.29.111.158:4490
2024-08-21T01:20:52.706Z DBG ..y/response_handle.go:31 NAT                  Received response from 10.0.102.64:4490 - REQUEST_NUMBER: 0x68b3c66c31ad4c90, RESPONSE: OK, SENDER: "submariner-cable-c2-10-0-102-64", RECEIVER: "submariner-cable-c1-10-0-102-23"
2024-08-21T01:20:52.706Z DBG ../remote_endpoint.go:184 NAT                  selected private IP "10.0.102.64" for endpoint "submariner-cable-c2-10-0-102-64"
2024-08-21T01:20:52.706Z INF ..gine/cableengine.go:219 CableEngine          Installing Endpoint cable "submariner-cable-c2-10-0-102-64"
2024-08-21T01:20:52.706Z INF ..reswan/libreswan.go:596 libreswan            Starting Pluto
2024-08-21T01:20:52.707Z DBG ..y/response_handle.go:31 NAT                  Received response from 112.29.111.158:4490 - REQUEST_NUMBER: 0x68b3c66c31ad4c91, RESPONSE: UNKNOWN_DST_CLUSTER, SENDER: "submariner-cable-cluster-a-10-0-102-111", RECEIVER: "submariner-cable-c1-10-0-102-23"
2024-08-21T01:20:52.709Z ERR ..iscovery/listener.go:82 NAT                  Error handling message from address 112.29.111.158:4490:
00000000  1a 7c 08 91 99 b5 8d c3  cd f1 d9 68 10 02 1a 34  |.|.........h...4|
00000010  0a 09 63 6c 75 73 74 65  72 2d 61 12 27 73 75 62  |..cluster-a.'sub|
00000020  6d 61 72 69 6e 65 72 2d  63 61 62 6c 65 2d 63 6c  |mariner-cable-cl|
00000030  75 73 74 65 72 2d 61 2d  31 30 2d 30 2d 31 30 32  |uster-a-10-0-102|
00000040  2d 31 31 31 22 25 0a 02  63 31 12 1f 73 75 62 6d  |-111"%..c1..subm|
00000050  61 72 69 6e 65 72 2d 63  61 62 6c 65 2d 63 31 2d  |ariner-cable-c1-|
00000060  31 30 2d 30 2d 31 30 32  2d 32 33 42 11 0a 0b 31  |10-0-102-23B...1|
00000070  39 32 2e 31 36 38 2e 37  2e 31 10 df 88 03        |92.168.7.1....|
 error="remote endpoint \"submariner-cable-cluster-a-10-0-102-111\" responded with \"UNKNOWN_DST_CLUSTER\" : &proto.SubmarinerNATDiscoveryResponse{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), RequestNumber:0x68b3c66c31ad4c91, Response:2, Sender:(*proto.EndpointDetails)(0xc000488c80), Receiver:(*proto.EndpointDetails)(0xc000488ff0), SrcIpNatDetected:false, SrcPortNatDetected:false, DstIpNatDetected:false, ReceivedSrc:(*proto.IPPortPair)(0xc000168880)}"
Initializing NSS database

002 listening for IKE messages
002 Kernel supports NIC esp-hw-offload
002 adding UDP interface vx-submariner 240.0.102.23:500
002 adding UDP interface vx-submariner 240.0.102.23:4500
002 adding UDP interface vxlan.calico 10.202.44.192:500
002 adding UDP interface vxlan.calico 10.202.44.192:4500
002 adding UDP interface docker0 172.17.0.1:500
002 adding UDP interface docker0 172.17.0.1:4500
002 adding UDP interface ens192 10.0.102.23:500
002 adding UDP interface ens192 10.0.102.23:4500
002 adding UDP interface lo 127.0.0.1:500
002 adding UDP interface lo 127.0.0.1:4500
002 loading secrets from "/etc/ipsec.secrets"
002 loading secrets from "/etc/ipsec.d/submariner.secrets"
2024-08-21T01:20:53.423Z INF ..reswan/libreswan.go:384 libreswan            Creating connection(s) for {"metadata":{"name":"c2-submariner-cable-c2-10-0-102-64","namespace":"submariner-operator","uid":"bde7e753-8936-4c44-a88e-2c5d9c7350a0","resourceVersion":"5313411","generation":1,"creationTimestamp":"2024-08-15T01:40:13Z","labels":{"submariner-io/clusterID":"c2"}},"spec":{"cluster_id":"c2","cable_name":"submariner-cable-c2-10-0-102-64","healthCheckIP":"10.244.187.128","hostname":"yigou-dev-102-64","subnets":["10.96.0.0/16","10.244.0.0/16"],"private_ip":"10.0.102.64","public_ip":"112.29.111.158","nat_enabled":true,"backend":"libreswan","backend_config":{"natt-discovery-port":"4490","preferred-server":"false","udp-port":"4500"}}} in bi-directional mode
2024-08-21T01:20:53.423Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c2-10-0-102-64-0-0 --id 10.0.102.23 --host 10.0.102.23 --client 10.102.0.0/16 --ikeport 4500 --to --id 10.0.102.64 --host 10.0.102.64 --client 10.96.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c2-10-0-102-64-0-0": added IKEv2 connection
181 "submariner-cable-c2-10-0-102-64-0-0" #1: initiating IKEv2 connection
2024-08-21T01:20:53.468Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c2-10-0-102-64-0-1 --id 10.0.102.23 --host 10.0.102.23 --client 10.102.0.0/16 --ikeport 4500 --to --id 10.0.102.64 --host 10.0.102.64 --client 10.244.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c2-10-0-102-64-0-1": added IKEv2 connection
002 "submariner-cable-c2-10-0-102-64-0-1" #3: initiating Child SA using IKE SA #1
189 "submariner-cable-c2-10-0-102-64-0-1" #3: sent CREATE_CHILD_SA request for new IPsec SA
004 "submariner-cable-c2-10-0-102-64-0-1" #3: initiator established Child SA using #1; IPsec tunnel [10.102.0.0-10.102.255.255:0-65535 0] -> [10.244.0.0-10.244.255.255:0-65535 0] {ESP=>0xf81cf2a1 <0xdda7abc1 xfrm=AES_GCM_16_256-NONE DPD=active}
2024-08-21T01:20:53.526Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c2-10-0-102-64-1-0 --id 10.0.102.23 --host 10.0.102.23 --client 10.202.0.0/16 --ikeport 4500 --to --id 10.0.102.64 --host 10.0.102.64 --client 10.96.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c2-10-0-102-64-1-0": added IKEv2 connection
002 "submariner-cable-c2-10-0-102-64-1-0" #4: initiating Child SA using IKE SA #1
189 "submariner-cable-c2-10-0-102-64-1-0" #4: sent CREATE_CHILD_SA request for new IPsec SA
004 "submariner-cable-c2-10-0-102-64-1-0" #4: initiator established Child SA using #1; IPsec tunnel [10.202.0.0-10.202.255.255:0-65535 0] -> [10.96.0.0-10.96.255.255:0-65535 0] {ESP=>0x5ff63a4c <0xcfd84475 xfrm=AES_GCM_16_256-NONE DPD=active}
2024-08-21T01:20:53.553Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c2-10-0-102-64-1-1 --id 10.0.102.23 --host 10.0.102.23 --client 10.202.0.0/16 --ikeport 4500 --to --id 10.0.102.64 --host 10.0.102.64 --client 10.244.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c2-10-0-102-64-1-1": added IKEv2 connection
002 "submariner-cable-c2-10-0-102-64-1-1" #5: initiating Child SA using IKE SA #1
189 "submariner-cable-c2-10-0-102-64-1-1" #5: sent CREATE_CHILD_SA request for new IPsec SA
004 "submariner-cable-c2-10-0-102-64-1-1" #5: initiator established Child SA using #1; IPsec tunnel [10.202.0.0-10.202.255.255:0-65535 0] -> [10.244.0.0-10.244.255.255:0-65535 0] {ESP=>0x2a3dd195 <0x327d78af xfrm=AES_GCM_16_256-NONE DPD=active}
2024-08-21T01:20:53.579Z INF ..gine/cableengine.go:226 CableEngine          Successfully installed Endpoint cable "submariner-cable-c2-10-0-102-64" with remote IP 10.0.102.64
2024-08-21T01:21:09.451Z DBG ..ry/request_handle.go:54 NAT                  Received request from 10.0.102.64:4490 - REQUEST_NUMBER: 0x7c3dab443a12bf30, SENDER: "submariner-cable-c2-10-0-102-64", RECEIVER: "submariner-cable-c1-10-0-102-23"
2024-08-21T01:21:09.452Z DBG ..y/request_handle.go:119 NAT                  Sending response to 10.0.102.64:4490 - REQUEST_NUMBER: 0x7c3dab443a12bf30, RESPONSE: OK, SENDER: "submariner-cable-c1-10-0-102-23", RECEIVER: "submariner-cable-c2-10-0-102-64"

cluster2:

+ trap 'exit 1' SIGTERM SIGINT
+ export CHARON_PID_FILE=/var/run/charon.pid
+ CHARON_PID_FILE=/var/run/charon.pid
+ rm -f /var/run/charon.pid
+ SUBMARINER_VERBOSITY=2
+ '[' false == true ']'
+ DEBUG=-v=2
++ cat /proc/sys/net/ipv4/conf/all/send_redirects
+ [[ 0 = 0 ]]
+ exec submariner-gateway -v=2 -alsologtostderr
submariner-gateway version: release-0.18-e3f3e56b57fe
2024-08-21T01:21:07.719Z INF ../versions/version.go:35 main                 Go Version: go1.22.4
2024-08-21T01:21:07.719Z INF ../versions/version.go:36 main                 Go Arch: amd64
2024-08-21T01:21:07.719Z INF ../versions/version.go:37 main                 Git Commit Hash: e3f3e56b57fe22371cdcf885315804f67663006e
2024-08-21T01:21:07.719Z INF ../versions/version.go:38 main                 Git Commit Date: 2024-06-24T18:49:44+03:00
2024-08-21T01:21:07.720Z INF ..o/submariner/main.go:94 main                 Parsed env variables: types.SubmarinerSpecification{ClusterCidr:[]string{"10.244.0.0/16"}, GlobalCidr:[]string{}, ServiceCidr:[]string{"10.96.0.0/16"}, Broker:"", CableDriver:"libreswan", ClusterID:"c2", Namespace:"submariner-operator", PublicIP:"", Token:"", Debug:false, NATEnabled:true, HealthCheckEnabled:true, Uninstall:false, HaltOnCertError:false, HealthCheckInterval:0x1, HealthCheckMaxPacketLossCount:0x5, MetricsPort:32780}
2024-08-21T01:21:07.721Z INF ..o/submariner/main.go:97 main                 Proxy env variables: HTTP_PROXY: , HTTPS_PROXY: , NO_PROXY: 
W0821 01:21:07.721366       1 client_config.go:659] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2024-08-21T01:21:07.751Z INF ../gateway/gateway.go:108 Gateway              Initializing the gateway engine
2024-08-21T01:21:07.751Z INF ../gateway/gateway.go:134 Gateway              Creating the cable engine
2024-08-21T01:21:07.751Z INF ../gateway/gateway.go:145 Gateway              AIR_GAPPED_DEPLOYMENT is set to false
2024-08-21T01:21:08.452Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "lo" has "127.0.0.1" address
2024-08-21T01:21:08.452Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "ens192" has "10.0.102.64" address
2024-08-21T01:21:08.452Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "docker0" has "172.17.0.1" address
2024-08-21T01:21:08.452Z DBG ..pkg/cni/cni_iface.go:70 CNI                  Interface "vxlan.calico" has "10.244.187.128" address
2024-08-21T01:21:08.453Z DBG ..pkg/cni/cni_iface.go:75 CNI                  Found CNI Interface "vxlan.calico" that has IP "10.244.187.128" from ClusterCIDR "10.244.0.0/16"
2024-08-21T01:21:08.453Z INF ../gateway/gateway.go:161 Gateway              Creating the datastore syncer
2024-08-21T01:21:08.453Z INF ../gateway/gateway.go:188 Gateway              Starting the gateway engine
2024-08-21T01:21:08.453Z DBG ..ery/natdiscovery.go:102 NAT                  NAT discovery server starting on port 4490
2024-08-21T01:21:08.471Z INF ../gateway/gateway.go:247 Gateway              Starting leader election
I0821 01:21:08.471440       1 leaderelection.go:250] attempting to acquire leader lease submariner-operator/submariner-gateway-lock...
I0821 01:21:08.482354       1 leaderelection.go:260] successfully acquired lease submariner-operator/submariner-gateway-lock
2024-08-21T01:21:08.482Z DBG ..ols/record/event.go:377 Gateway              Event(v1.ObjectReference{Kind:"Lease", Namespace:"submariner-operator", Name:"submariner-gateway-lock", UID:"958a0848-5a70-48a9-ba3f-2cc6fac7f461", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"182027556", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' yigou-dev-102-64-submariner-gateway became leader
2024-08-21T01:21:08.482Z INF ../gateway/gateway.go:280 Gateway              Leadership acquired - starting controllers
2024-08-21T01:21:08.482Z INF ..reswan/libreswan.go:136 libreswan            Using NATT UDP port 4500
2024-08-21T01:21:08.482Z INF ..gine/cableengine.go:111 CableEngine          CableEngine started with driver "libreswan"
2024-08-21T01:21:08.482Z INF ..public_ip_watcher.go:58 Endpoint             Starting the public IP watcher.
2024-08-21T01:21:08.482Z INF ..r/datastoresyncer.go:70 DSSyncer             Starting the datastore syncer
2024-08-21T01:21:08.483Z INF ..ers/tunnel/tunnel.go:40 Tunnel               Starting the tunnel controller
2024-08-21T01:21:08.483Z INF ../gateway/gateway.go:349 Gateway              Updating Gateway pod HA status to "active"
2024-08-21T01:21:08.505Z INF ../gateway/gateway.go:370 Gateway              Successfully updated Gateway pod HA status to "active"
I0821 01:21:08.508266       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Cluster from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
I0821 01:21:08.509867       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
I0821 01:21:08.511608       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
I0821 01:21:08.601233       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Cluster from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
I0821 01:21:08.601343       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
2024-08-21T01:21:08.605Z INF .._update_federator.go:71 Federator            local -> broker: Updated Cluster "submariner-k8s-broker/c2" 
2024-08-21T01:21:08.608Z INF ..ery/natdiscovery.go:144 NAT                  Starting NAT discovery for endpoint "submariner-cable-c1-10-0-102-23"
2024-08-21T01:21:08.611Z INF ..er/healthchecker.go:114 HealthChecker        CableEngine HealthChecker started with PingInterval: 1, MaxPacketLossCount: 5
2024-08-21T01:21:08.611Z INF ..thchecker/pinger.go:106 HealthChecker        Starting pinger for IP "10.202.44.192"
2024-08-21T01:21:08.611Z INF ..er/healthchecker.go:176 HealthChecker        CableEngine HealthChecker started pinger for CableName: "submariner-cable-c1-10-0-102-23" with HealthCheckIP "10.202.44.192"
2024-08-21T01:21:08.612Z INF ..thchecker/pinger.go:177 HealthChecker        Ping to remote endpoint IP "10.202.44.192" is successful
2024-08-21T01:21:08.662Z INF .._update_federator.go:71 Federator            broker -> local: Updated Cluster "submariner-operator/c1" 
2024-08-21T01:21:08.699Z INF ../datastoresyncer.go:208 DSSyncer             Ensuring we are the only endpoint active for this cluster
2024-08-21T01:21:08.700Z INF ../datastoresyncer.go:266 DSSyncer             Creating local submariner Cluster: {
  "id": "c2",
  "spec": {
    "cluster_id": "c2",
    "color_codes": [
      "blue"
    ],
    "service_cidr": [
      "10.96.0.0/16"
    ],
    "cluster_cidr": [
      "10.244.0.0/16"
    ],
    "global_cidr": []
  }
}
I0821 01:21:08.701872       1 reflector.go:359] Caches populated for submariner.io/v1, Kind=Endpoint from k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232
2024-08-21T01:21:08.706Z INF .._update_federator.go:71 Federator            local -> broker: Updated Endpoint "submariner-k8s-broker/c2-submariner-cable-c2-10-0-102-64" 
2024-08-21T01:21:08.707Z INF .._update_federator.go:71 Federator            broker -> local: Updated Cluster "submariner-operator/c2" 
2024-08-21T01:21:08.707Z INF ../datastoresyncer.go:281 DSSyncer             Creating local submariner Endpoint: {
  "metadata": {
    "name": "c2-submariner-cable-c2-10-0-102-64",
    "creationTimestamp": null
  },
  "spec": {
    "cluster_id": "c2",
    "cable_name": "submariner-cable-c2-10-0-102-64",
    "healthCheckIP": "10.244.187.128",
    "hostname": "yigou-dev-102-64",
    "subnets": [
      "10.96.0.0/16",
      "10.244.0.0/16"
    ],
    "private_ip": "10.0.102.64",
    "public_ip": "112.29.111.158",
    "nat_enabled": true,
    "backend": "libreswan",
    "backend_config": {
      "natt-discovery-port": "4490",
      "preferred-server": "false",
      "udp-port": "4500"
    }
  }
}
2024-08-21T01:21:08.717Z INF .._update_federator.go:71 Federator            broker -> local: Updated Endpoint "submariner-operator/c2-submariner-cable-c2-10-0-102-64" 
2024-08-21T01:21:08.717Z INF ../datastoresyncer.go:100 DSSyncer             Datastore syncer started
2024-08-21T01:21:08.762Z INF .._update_federator.go:71 Federator            broker -> local: Updated Endpoint "submariner-operator/c1-submariner-cable-c1-10-0-102-23" 
2024-08-21T01:21:09.455Z DBG ..ery/request_send.go:117 NAT                  Sending request - REQUEST_NUMBER: 0x7c3dab443a12bf30, SENDER: "submariner-cable-c2-10-0-102-64", RECEIVER: "submariner-cable-c1-10-0-102-23", USING_SRC: 10.0.102.64:4490, USING_DST: 10.0.102.23:4490
2024-08-21T01:21:09.455Z DBG ..ery/request_send.go:117 NAT                  Sending request - REQUEST_NUMBER: 0x7c3dab443a12bf31, SENDER: "submariner-cable-c2-10-0-102-64", RECEIVER: "submariner-cable-c1-10-0-102-23", USING_SRC: 10.0.102.64:4490, USING_DST: 112.29.111.158:4490
2024-08-21T01:21:09.456Z DBG ..y/response_handle.go:31 NAT                  Received response from 10.0.102.23:4490 - REQUEST_NUMBER: 0x7c3dab443a12bf30, RESPONSE: OK, SENDER: "submariner-cable-c1-10-0-102-23", RECEIVER: "submariner-cable-c2-10-0-102-64"
2024-08-21T01:21:09.456Z DBG ../remote_endpoint.go:184 NAT                  selected private IP "10.0.102.23" for endpoint "submariner-cable-c1-10-0-102-23"
2024-08-21T01:21:09.456Z INF ..gine/cableengine.go:219 CableEngine          Installing Endpoint cable "submariner-cable-c1-10-0-102-23"
2024-08-21T01:21:09.456Z INF ..reswan/libreswan.go:596 libreswan            Starting Pluto
2024-08-21T01:21:09.456Z DBG ..y/response_handle.go:31 NAT                  Received response from 112.29.111.158:4490 - REQUEST_NUMBER: 0x7c3dab443a12bf31, RESPONSE: UNKNOWN_DST_CLUSTER, SENDER: "submariner-cable-cluster-a-10-0-102-111", RECEIVER: "submariner-cable-c2-10-0-102-64"
2024-08-21T01:21:09.457Z ERR ..iscovery/listener.go:82 NAT                  Error handling message from address 112.29.111.158:4490:
00000000  1a 7c 08 b1 fe ca d0 c3  e8 ea 9e 7c 10 02 1a 34  |.|.........|...4|
00000010  0a 09 63 6c 75 73 74 65  72 2d 61 12 27 73 75 62  |..cluster-a.'sub|
00000020  6d 61 72 69 6e 65 72 2d  63 61 62 6c 65 2d 63 6c  |mariner-cable-cl|
00000030  75 73 74 65 72 2d 61 2d  31 30 2d 30 2d 31 30 32  |uster-a-10-0-102|
00000040  2d 31 31 31 22 25 0a 02  63 32 12 1f 73 75 62 6d  |-111"%..c2..subm|
00000050  61 72 69 6e 65 72 2d 63  61 62 6c 65 2d 63 32 2d  |ariner-cable-c2-|
00000060  31 30 2d 30 2d 31 30 32  2d 36 34 42 11 0a 0b 31  |10-0-102-64B...1|
00000070  39 32 2e 31 36 38 2e 37  2e 31 10 e0 88 03        |92.168.7.1....|
 error="remote endpoint \"submariner-cable-cluster-a-10-0-102-111\" responded with \"UNKNOWN_DST_CLUSTER\" : &proto.SubmarinerNATDiscoveryResponse{state:impl.MessageState{NoUnkeyedLiterals:pragma.NoUnkeyedLiterals{}, DoNotCompare:pragma.DoNotCompare{}, DoNotCopy:pragma.DoNotCopy{}, atomicMessageInfo:(*impl.MessageInfo)(nil)}, sizeCache:0, unknownFields:[]uint8(nil), RequestNumber:0x7c3dab443a12bf31, Response:2, Sender:(*proto.EndpointDetails)(0xc000b18a50), Receiver:(*proto.EndpointDetails)(0xc000b18aa0), SrcIpNatDetected:false, SrcPortNatDetected:false, DstIpNatDetected:false, ReceivedSrc:(*proto.IPPortPair)(0xc000f48200)}"
Initializing NSS database

002 listening for IKE messages
002 forgetting secrets
002 loading secrets from "/etc/ipsec.secrets"
002 loading secrets from "/etc/ipsec.d/submariner.secrets"
2024-08-21T01:21:11.578Z INF ..reswan/libreswan.go:384 libreswan            Creating connection(s) for {"metadata":{"name":"c1-submariner-cable-c1-10-0-102-23","namespace":"submariner-operator","uid":"1b558961-0e45-4766-8745-43ed334f56f2","resourceVersion":"173768085","generation":1,"creationTimestamp":"2024-08-15T01:40:13Z","labels":{"submariner-io/clusterID":"c1"}},"spec":{"cluster_id":"c1","cable_name":"submariner-cable-c1-10-0-102-23","healthCheckIP":"10.202.44.192","hostname":"hero-cloud-dev-102-23","subnets":["10.102.0.0/16","10.202.0.0/16"],"private_ip":"10.0.102.23","public_ip":"112.29.111.158","nat_enabled":true,"backend":"libreswan","backend_config":{"natt-discovery-port":"4490","preferred-server":"false","udp-port":"4500"}}} in bi-directional mode
2024-08-21T01:21:11.578Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c1-10-0-102-23-0-0 --id 10.0.102.64 --host 10.0.102.64 --client 10.96.0.0/16 --ikeport 4500 --to --id 10.0.102.23 --host 10.0.102.23 --client 10.102.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c1-10-0-102-23-0-0": added IKEv2 connection
181 "submariner-cable-c1-10-0-102-23-0-0" #1: initiating IKEv2 connection
2024-08-21T01:21:11.627Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c1-10-0-102-23-0-1 --id 10.0.102.64 --host 10.0.102.64 --client 10.96.0.0/16 --ikeport 4500 --to --id 10.0.102.23 --host 10.0.102.23 --client 10.202.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c1-10-0-102-23-0-1": added IKEv2 connection
002 "submariner-cable-c1-10-0-102-23-0-1" #3: initiating Child SA using IKE SA #1
189 "submariner-cable-c1-10-0-102-23-0-1" #3: sent CREATE_CHILD_SA request for new IPsec SA
004 "submariner-cable-c1-10-0-102-23-0-1" #3: initiator established Child SA using #1; IPsec tunnel [10.96.0.0-10.96.255.255:0-65535 0] -> [10.202.0.0-10.202.255.255:0-65535 0] {ESP=>0x05ad8384 <0x6cd0fe3c xfrm=AES_GCM_16_256-NONE DPD=active}
2024-08-21T01:21:11.692Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c1-10-0-102-23-1-0 --id 10.0.102.64 --host 10.0.102.64 --client 10.244.0.0/16 --ikeport 4500 --to --id 10.0.102.23 --host 10.0.102.23 --client 10.102.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c1-10-0-102-23-1-0": added IKEv2 connection
002 "submariner-cable-c1-10-0-102-23-1-0" #4: initiating Child SA using IKE SA #1
189 "submariner-cable-c1-10-0-102-23-1-0" #4: sent CREATE_CHILD_SA request for new IPsec SA
004 "submariner-cable-c1-10-0-102-23-1-0" #4: initiator established Child SA using #1; IPsec tunnel [10.244.0.0-10.244.255.255:0-65535 0] -> [10.102.0.0-10.102.255.255:0-65535 0] {ESP=>0xe76d5edc <0x8894fd4a xfrm=AES_GCM_16_256-NONE DPD=active}
2024-08-21T01:21:11.719Z INF ..reswan/libreswan.go:448 libreswan            bidirectionalConnectToEndpoint: executing whack with args: [--psk --encrypt --name submariner-cable-c1-10-0-102-23-1-1 --id 10.0.102.64 --host 10.0.102.64 --client 10.244.0.0/16 --ikeport 4500 --to --id 10.0.102.23 --host 10.0.102.23 --client 10.202.0.0/16 --ikeport 4500 --dpdaction=hold --dpddelay 30]
002 "submariner-cable-c1-10-0-102-23-1-1": added IKEv2 connection
002 "submariner-cable-c1-10-0-102-23-1-1" #5: initiating Child SA using IKE SA #1
189 "submariner-cable-c1-10-0-102-23-1-1" #5: sent CREATE_CHILD_SA request for new IPsec SA
004 "submariner-cable-c1-10-0-102-23-1-1" #5: initiator established Child SA using #1; IPsec tunnel [10.244.0.0-10.244.255.255:0-65535 0] -> [10.202.0.0-10.202.255.255:0-65535 0] {ESP=>0xb5f5bbef <0xf4bd3087 xfrm=AES_GCM_16_256-NONE DPD=active}
2024-08-21T01:21:11.748Z INF ..gine/cableengine.go:226 CableEngine          Successfully installed Endpoint cable "submariner-cable-c1-10-0-102-23" with remote IP 10.0.102.23


Contributor

yboaron commented Aug 21, 2024

I tried restarting the route-agent and gateway pods, but the result of running ip route inside the gateway container is still the same:

bash-5.2# ip route
default via 10.0.102.1 dev ens192 proto static metric 100
10.0.102.0/24 dev ens192 proto kernel scope link src 10.0.102.64 metric 100
unreachable 10.102.0.0/16 proto bird
unreachable 10.202.0.0/16 proto bird

On the GW node, the host-networking routes to the remote cluster CIDRs are configured in table 150.
Try checking:
ip route show table 150

Author

lgy1027 commented Aug 23, 2024

I tried restarting the route-agent and gateway pods, but the result of running ip route inside the gateway container is still the same:

bash-5.2# ip route
default via 10.0.102.1 dev ens192 proto static metric 100
10.0.102.0/24 dev ens192 proto kernel scope link src 10.0.102.64 metric 100
unreachable 10.102.0.0/16 proto bird
unreachable 10.202.0.0/16 proto bird

On the GW node, the host-networking routes to the remote cluster CIDRs are configured in table 150. Try checking: ip route show table 150

cluster1 GW node:

10.96.0.0/16 dev ens192 proto static scope link src 10.202.44.192 
10.244.0.0/16 dev ens192 proto static scope link src 10.202.44.192 

cluster2 GW node:

10.102.0.0/16 dev ens192 proto static scope link src 10.244.187.128 
10.202.0.0/16 dev ens192 proto static scope link src 10.244.187.128

Author

lgy1027 commented Aug 28, 2024

When using Submariner, Calico's encapsulation mode has to be changed to VXLAN, so an environment that originally did not need an overlay across subnets now has to use one, which affects the network performance of existing pods.
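For reference, switching the default pool's encapsulation is roughly a one-line change (a sketch assuming calicoctl is installed; the same patch can also be applied to the Calico IPPool CRD with kubectl):

# full VXLAN overlay, as discussed above
calicoctl patch ippool default-ipv4-ippool --patch '{"spec":{"vxlanMode":"Always"}}'

# back to CrossSubnet (overlay only across subnet boundaries)
calicoctl patch ippool default-ipv4-ippool --patch '{"spec":{"vxlanMode":"CrossSubnet"}}'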
iperf bandwidth tests:
Between nodes (VMs):

iperf3 -s
[root@a-111 heros-deploy]# iperf3 -c 10.0.102.112 --bidir -f M
Connecting to host 10.0.102.112, port 5201
[  5] local 10.0.102.111 port 43702 connected to 10.0.102.112 port 5201
[  7] local 10.0.102.111 port 43706 connected to 10.0.102.112 port 5201
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-1.00   sec  1005 MBytes  1004 MBytes/sec  576   1.89 MBytes
[  7][RX-C]   0.00-1.00   sec  1.07 GBytes  1098 MBytes/sec
[  5][TX-C]   1.00-2.00   sec  1.02 GBytes  1046 MBytes/sec  445   1.69 MBytes
[  7][RX-C]   1.00-2.00   sec  1.09 GBytes  1116 MBytes/sec
[  5][TX-C]   2.00-3.00   sec  1.07 GBytes  1095 MBytes/sec   20   1.70 MBytes
[  7][RX-C]   2.00-3.00   sec  1.06 GBytes  1081 MBytes/sec
[  5][TX-C]   3.00-4.00   sec  1.02 GBytes  1041 MBytes/sec    0   2.00 MBytes
[  7][RX-C]   3.00-4.00   sec  1.08 GBytes  1108 MBytes/sec
[  5][TX-C]   4.00-5.00   sec   974 MBytes   974 MBytes/sec    0   2.03 MBytes
[  7][RX-C]   4.00-5.00   sec  1.09 GBytes  1114 MBytes/sec
[  5][TX-C]   5.00-6.00   sec   976 MBytes   975 MBytes/sec    0   2.09 MBytes
[  7][RX-C]   5.00-6.00   sec  1.09 GBytes  1118 MBytes/sec
[  5][TX-C]   6.00-7.00   sec   947 MBytes   947 MBytes/sec    0   2.34 MBytes
[  7][RX-C]   6.00-7.00   sec  1.07 GBytes  1099 MBytes/sec
[  5][TX-C]   7.00-8.00   sec   993 MBytes   993 MBytes/sec    0   2.63 MBytes
[  7][RX-C]   7.00-8.00   sec  1.06 GBytes  1088 MBytes/sec
[  5][TX-C]   8.00-9.00   sec  1010 MBytes  1010 MBytes/sec  492   1.72 MBytes
[  7][RX-C]   8.00-9.00   sec  1.09 GBytes  1117 MBytes/sec
[  5][TX-C]   9.00-10.00  sec  1.04 GBytes  1060 MBytes/sec    0   2.13 MBytes
[  7][RX-C]   9.00-10.00  sec  1.07 GBytes  1098 MBytes/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec  9.91 GBytes  1015 MBytes/sec  1533             sender
[  5][TX-C]   0.00-10.00  sec  9.91 GBytes  1014 MBytes/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  10.8 GBytes  1104 MBytes/sec  397             sender
[  7][RX-C]   0.00-10.00  sec  10.8 GBytes  1104 MBytes/sec                  receiver

Pod-to-pod across nodes, using CrossSubnet:

root@ws-a-111:/# iperf3 -c 10.248.242.3 --bidir -f M
Connecting to host 10.248.242.3, port 5201
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec  9.34 GBytes   956 MBytes/sec  2904             sender
[  5][TX-C]   0.00-10.04  sec  9.34 GBytes   952 MBytes/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  10.0 GBytes  1024 MBytes/sec  1408             sender
[  7][RX-C]   0.00-10.04  sec  10.0 GBytes  1020 MBytes/sec                  receiver

Using Always:

root@ws-a-111:/# iperf3 -c 10.248.242.4 --bidir -f M
Connecting to host 10.248.242.4, port 5201
...
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-10.00  sec  1.99 GBytes   204 MBytes/sec  154             sender
[  5][TX-C]   0.00-10.04  sec  1.98 GBytes   202 MBytes/sec                  receiver
[  7][RX-C]   0.00-10.00  sec  2.88 GBytes   295 MBytes/sec  255             sender
[  7][RX-C]   0.00-10.04  sec  2.88 GBytes   294 MBytes/sec                  receiver

As you can see, with CrossSubnet there is basically no difference between the pod and host results. However, after changing to Always, throughput drops significantly across many test runs, by roughly two thirds. Is this normal?

Contributor

yboaron commented Sep 2, 2024

That's correct, CrossSubnet has better performance than the full VXLAN overlay.
We recommend the VXLAN overlay for Submariner inter-cluster traffic because in some environments we noticed that infra/security-group rules blocked the incoming traffic, since the source IP address was not in the cluster's CIDR range.

Let's go back to the Submariner inter-cluster datapath segments:

  1. Egress : podA@non_gw_node@cluster1 --> gw_node@cluster1 via vx-submariner
  2. Egress : gw_node@cluster1 -> gw_node@cluster2 via IPSec tunnel
  3. Ingress: decrypt IPSec packet
  4. Ingress: Calico should forward the packet to podB IP
  5. Egress : podB@non_gw_node@cluster2 --> gw_node@cluster2 via vx-submariner
  6. Egress : gw_node@cluster2 -> gw_node@cluster1 via IPSec tunnel
  7. Ingress: decrypt IPSec packet
  8. Calico should forward the packet to podA IP

In step 4 (in cluster2), for example, when Calico is set to CrossSubnet, the source IP address is podA's IP and the destination IP is podB's IP.

If Calico is configured for the VXLAN overlay, the packet will be encapsulated in VXLAN (source IP = IP of the GW node, destination IP = IP of the node where podB is running).

If inter-cluster traffic works fine in your environment with Calico configured for CrossSubnet, you can stick with that configuration.
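A quick way to confirm which encapsulation the default pool is currently using (sketch; assumes calicoctl):

calicoctl get ippool default-ipv4-ippool -o yaml | grep -E 'vxlanMode|ipipMode'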

@dfarrell07
Member

@lgy1027 Any more discussion here or can we close this?

@maayanf24
Contributor

@lgy1027 Closing the issue for now. Feel free to reopen if you still need any help.

github-project-automation bot moved this from In Progress to Done in Submariner 0.19 Oct 1, 2024