ROKS(Calico): Submariner RA failed to discover CNI interface after node reboot #3120

Closed
yboaron opened this issue Aug 14, 2024 · 1 comment · Fixed by #3121
Assignees
Labels: bug, Calico, datapath, release-note-needed

Comments

@yboaron
Contributor

yboaron commented Aug 14, 2024

I deployed Submariner (0.18) on ROKS. After a node reboot, the Submariner RouteAgent (RA) failed to discover the CNI interface.

According to the logs, the Calico CNI interface (tunl0) does not yet exist when the RA pod starts running after the reboot.

This is the order of events, based on the logs:

A. After the node reboots, the pods running on it are restarted
B. The node's status changes to NotReady
C. The Submariner RouteAgent pod starts running on the node

  • C.1 RouteAgent submariner-routeagent-init waits until the node status is Ready
  • C.2 RouteAgent submariner-routeagent fails to discover the Calico CNI interface on the node

D. The calico-node pod creates the Calico CNI interface
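
The sequence above is a startup race: step C.2 looks for the interface before step D has created it. As a minimal sketch of how the condition could be observed on a node, the Go snippet below checks whether an interface with a given name is present. The helper name `interfaceExists` is hypothetical, and checking by name is an assumption made for illustration; it is not taken from Submariner's discovery code.

```go
package main

import (
	"fmt"
	"net"
)

// interfaceExists reports whether a network interface with the given name
// is currently present on the host. (Hypothetical helper for illustration.)
func interfaceExists(name string) bool {
	_, err := net.InterfaceByName(name)
	return err == nil
}

func main() {
	// "tunl0" is the Calico interface named in the report above.
	// Right after a reboot this may print false until calico-node creates it.
	fmt.Printf("tunl0 present: %t\n", interfaceExists("tunl0"))
}
```

Running this shortly after a reboot and again once calico-node is up would show the window in which discovery can fail.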

yboaron added the bug label Aug 14, 2024
yboaron self-assigned this Aug 14, 2024
yboaron added a commit to yboaron/submariner that referenced this issue Aug 14, 2024
We noticed that even though RA pod starts running only after the node is
ready, sometimes the kube-proxy handler fails to discover the CNI interface
after the node is rebooted.

This PR adds a retry to CNI discovery.

Fixes: submariner-io#3120

Signed-off-by: Yossi Boaron <yboaron@redhat.com>
yboaron added the Calico and datapath labels Aug 15, 2024
maayanf24 moved this to In Review in Submariner 0.19 Aug 18, 2024
skitt closed this as completed in 47a302c Aug 19, 2024
github-project-automation bot moved this from In Review to Done in Submariner 0.19 Aug 19, 2024
@dfarrell07
Member

We think we'd like to have RelNotes for this but we're interested in how #3110 resolves before writing it up.

dfarrell07 added the release-note-needed label Aug 20, 2024
yboaron added a commit to yboaron/submariner that referenced this issue Aug 21, 2024
tpantelis pushed a commit that referenced this issue Aug 21, 2024
yboaron added a commit that referenced this issue Aug 28, 2024