[Calico] TestKernel2Wireguard2Kernel_dual_stack is not stable #671

Open
glazychev-art opened this issue May 26, 2022 · 2 comments


glazychev-art commented May 26, 2022

Description

TestKernel2Wireguard2Kernel_dual_stack is unstable with Calico.
The logs show that IPv6 ping starts working, but only intermittently. The client's logs also show that datapath healing is active.
Given the order in "dst_ip_addrs":["172.16.1.100/32","2001:db8::/128"], the IPv4 address is checked first, and that check fails.
This indirectly points to networkservicemesh/sdk-vpp#527.
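
To illustrate why the order of dst_ip_addrs matters, here is a minimal Python sketch. It assumes the check walks the list in order and stops at the first failure, which is how the description above reads; it is not the actual NSM healing code. The function name, the probe callback, and the reachability table are hypothetical.

```python
import ipaddress
from typing import Callable, List, Optional, Sequence


def first_failing_address(dst_ip_addrs: Sequence[str],
                          probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first address whose probe fails, or None if all succeed."""
    for entry in dst_ip_addrs:
        # "172.16.1.100/32" -> "172.16.1.100"; works for IPv4 and IPv6 alike.
        addr = str(ipaddress.ip_interface(entry).ip)
        if not probe(addr):
            # Stop at the first failure: later addresses are never probed.
            return addr
    return None


# Hypothetical probe results: IPv4 unreachable, IPv6 reachable.
reachable = {"172.16.1.100": False, "2001:db8::": True}
probed: List[str] = []


def fake_probe(addr: str) -> bool:
    probed.append(addr)
    return reachable[addr]


failed = first_failing_address(["172.16.1.100/32", "2001:db8::/128"], fake_probe)
print(failed, probed)
```

Under that assumption, a failing IPv4 path masks the state of the IPv6 path, because the IPv6 address is never reached by the check.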

Build:
https://github.com/networkservicemesh/integration-k8s-kind/actions/runs/2376683993

Logs:

...
time=2022-05-24T11:57:40Z level=info msg=NSC=$(kubectl get pods -l app=alpine -n ${NAMESPACE} --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}') TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdin
time=2022-05-24T11:57:40Z level=info msg=NSE=$(kubectl get pods -l app=nse-kernel -n ${NAMESPACE} --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}') TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdin
time=2022-05-24T11:57:40Z level=info msg=kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 2001:db8:: TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdin
time=2022-05-24T11:57:45Z level=info msg=PING 2001:db8:: (2001:db8::): 56 data bytes
64 bytes from 2001:db8::: seq=0 ttl=62 time=2.332 ms
64 bytes from 2001:db8::: seq=3 ttl=62 time=0.367 ms

--- 2001:db8:: ping statistics ---
4 packets transmitted, 2 packets received, 50% packet loss
round-trip min/avg/max = 0.367/1.349/2.332 ms TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdout
time=2022-05-24T11:57:45Z level=info msg=Defaulted container "alpine" out of: alpine, cmd-nsc, coredns, cmd-nsc-init (init) TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stderr
time=2022-05-24T11:57:45Z level=info msg=kubectl exec ${NSE} -n ${NAMESPACE} -- ping -c 4 2001:db8::1 TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdin
time=2022-05-24T11:57:49Z level=info msg=PING 2001:db8::1 (2001:db8::1): 56 data bytes
64 bytes from 2001:db8::1: seq=1 ttl=62 time=0.858 ms
64 bytes from 2001:db8::1: seq=3 ttl=62 time=1.187 ms

--- 2001:db8::1 ping statistics ---
4 packets transmitted, 2 packets received, 50% packet loss
round-trip min/avg/max = 0.858/1.022/1.187 ms TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdout
time=2022-05-24T11:57:49Z level=info msg=kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100 TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdin
time=2022-05-24T11:58:03Z level=info msg=PING 172.16.1.100 (172.16.1.100): 56 data bytes

--- 172.16.1.100 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdout
time=2022-05-24T11:58:03Z level=info msg=Defaulted container "alpine" out of: alpine, cmd-nsc, coredns, cmd-nsc-init (init)
command terminated with exit code 1 TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stderr
time=2022-05-24T11:58:03Z level=info msg=1 TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=exitCode
time=2022-05-24T11:58:03Z level=info msg=kubectl exec ${NSC} -n ${NAMESPACE} -- ping -c 4 172.16.1.100 TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdin
time=2022-05-24T11:58:17Z level=info msg=PING 172.16.1.100 (172.16.1.100): 56 data bytes
...
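
When comparing several runs, the per-target packet loss can be pulled out of a log like the one above. The following Python sketch assumes the BusyBox-style ping summary format seen in this log; the names PingStats and parse_ping_stats are made up for illustration, and the sample text is copied from the log lines above.

```python
import re
from typing import Dict, NamedTuple, Optional


class PingStats(NamedTuple):
    transmitted: int
    received: int
    loss_percent: int


HEADER = re.compile(r"^--- (\S+) ping statistics ---")
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) packets received, (\d+)% packet loss"
)


def parse_ping_stats(log: str) -> Dict[str, PingStats]:
    """Map each ping target to its summary; a later run overwrites an earlier one."""
    stats: Dict[str, PingStats] = {}
    target: Optional[str] = None
    for line in log.splitlines():
        header = HEADER.match(line)
        if header:
            target = header.group(1)
            continue
        summary = SUMMARY.search(line)
        if summary and target is not None:
            stats[target] = PingStats(*map(int, summary.groups()))
            target = None
    return stats


SAMPLE = """\
--- 2001:db8:: ping statistics ---
4 packets transmitted, 2 packets received, 50% packet loss
--- 172.16.1.100 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss TestRunFeatureSuiteCalico/TestKernel2Wireguard2Kernel_dual_stack=stdout
"""

stats = parse_ping_stats(SAMPLE)
for target, s in stats.items():
    print(f"{target}: {s.received}/{s.transmitted} received, {s.loss_percent}% loss")
```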

TestRunFeatureSuiteCalico.zip

@glazychev-art
Contributor Author

Update:

Confirmed that this is the same problem as networkservicemesh/sdk-vpp#527.

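Trace from the client forwarder. The decrypted ICMP echo reply (172.16.1.100 -> 172.16.1.101) is cross-connected to sw_if_index 22, which is marked DELETED, so the packet is dropped: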

------------------- Start of thread 0 vpp_main -------------------
Packet 1

03:24:26:630005: af-packet-input
  af_packet: hw_if_index 1 next-index 4
    tpacket2_hdr:
      status 0x20000001 len 118 snaplen 118 mac 66 net 80
      sec 0x628f3326 nsec 0x223534b9 vlan 0 vlan_tpid 0
03:24:26:630009: ethernet-input
  IP4: 02:42:ac:12:00:06 -> 02:42:ac:12:00:05
03:24:26:630010: ip4-input
  UDP: 172.18.0.6 -> 172.18.0.5
    tos 0x00, ttl 63, length 104, checksum 0x2356 dscp CS0 ecn NON_ECN
    fragment id 0x0000
  UDP: 51820 -> 51820
    length 84, checksum 0x0000
03:24:26:630011: cnat-input-ip4
  session not found
  in:host-eth0 out:DELETED 
03:24:26:630014: ip4-lookup
  fib 0 dpo-idx 20 flow hash: 0x00000000
  UDP: 172.18.0.6 -> 172.18.0.5
    tos 0x00, ttl 63, length 104, checksum 0x2356 dscp CS0 ecn NON_ECN
    fragment id 0x0000
  UDP: 51820 -> 51820
    length 84, checksum 0x0000
03:24:26:630015: ip4-receive
    UDP: 172.18.0.6 -> 172.18.0.5
      tos 0x00, ttl 63, length 104, checksum 0x2356 dscp CS0 ecn NON_ECN
      fragment id 0x0000
    UDP: 51820 -> 51820
      length 84, checksum 0x0000
03:24:26:630016: ip4-udp-lookup
  UDP: src-port 51820 dst-port 51820
03:24:26:630017: wg4-input
  Wireguard input: 
    Type: Data
    Peer: 0
    Length: 44
    Keepalive: false
03:24:26:630021: ip4-input-no-checksum
  ICMP: 172.16.1.100 -> 172.16.1.101
    tos 0x00, ttl 63, length 44, checksum 0x6e9f dscp CS0 ecn NON_ECN
    fragment id 0xb248
  ICMP echo_reply checksum 0x73d7 id 45066
03:24:26:630022: l3xc-input-ip4
  l3xc-index:4 lb-index:48
03:24:26:630023: ip4-rewrite
  tx_sw_if_index 22 dpo-idx 48 : ipv4 via 0.0.0.0 DELETED:22 mtu:1440 next:27 flags:[] flow hash: 0x00000000
  00000000: 4500002cb24800003e016f9fac100164ac100165000073d7b00a000116f298b3
  00000020: a76616374567e3539f28a6f5aedf33de85dc0ac9653a3b88fc47f1c5
03:24:26:630024: interface-17-output-deleted
  sw_if_index: 22 
  00000000: 4500002cb24800003e016f9fac100164ac100165000073d7b00a000116f298b3
  00000020: a76616374567e3539f28a6f5aedf33de85dc0ac9653a3b88fc47f1c598a9471e
  00000040: 86a5ac05e9b331922266f2ccdd37dfa99116fa4f4b19d0b8918bc3db141a09fe
  00000060: de40b87aa9e94544f694ccf1df55c2097ba25b61e2d3a7b6
03:24:26:630026: error-drop
  rx:DELETED
03:24:26:630027: drop
  interface-17-output-deleted: interface is deleted
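
As a cross-check of the trace above, the hex dump printed under ip4-rewrite can be decoded by hand. The Python sketch below unpacks the first 28 bytes of that dump (a 20-byte IPv4 header followed by the 8-byte ICMP header) and verifies the IPv4 header checksum. It only assumes the standard IPv4 and ICMP header layouts; the helper name ones_complement_sum is illustrative.

```python
import ipaddress
import struct

# First 28 bytes of the dump under ip4-rewrite, copied from the trace.
HEX = "4500002cb24800003e016f9fac100164ac100165000073d7b00a0001"
raw = bytes.fromhex(HEX)

# IPv4 header: version/IHL, TOS, total length, ID, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address.
(ver_ihl, tos, total_len, ident, flags_frag,
 ttl, proto, ip_csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])

# ICMP echo header: type, code, checksum, identifier, sequence number.
icmp_type, icmp_code, icmp_csum, icmp_id, icmp_seq = struct.unpack(
    "!BBHHH", raw[20:28]
)


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum; 0xFFFF over a header means a valid checksum."""
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


src_ip = str(ipaddress.IPv4Address(src))
dst_ip = str(ipaddress.IPv4Address(dst))
print(f"IPv{ver_ihl >> 4} {src_ip} -> {dst_ip} ttl={ttl} proto={proto} len={total_len}")
print(f"ICMP type={icmp_type} code={icmp_code} id={icmp_id} seq={icmp_seq}")
print("header checksum valid:", ones_complement_sum(raw[:20]) == 0xFFFF)
```

The decoded fields agree with the trace: an ICMP echo reply (type 0) with id 45066 from 172.16.1.100 to 172.16.1.101, with the TTL reduced from 63 to 62 by ip4-rewrite. The packet itself is well-formed, so the drop is caused by the deleted output interface rather than by the packet contents.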

@glazychev-art
Contributor Author

Blocked by: networkservicemesh/sdk-vpp#527
