[Windows] Use IP and MAC to find virtual management adapter #3641
/test-windows-all
Codecov Report
@@ Coverage Diff @@
## main #3641 +/- ##
===========================================
- Coverage 63.35% 50.18% -13.17%
===========================================
Files 278 248 -30
Lines 39367 35664 -3703
===========================================
- Hits 24941 17899 -7042
- Misses 12472 15968 +3496
+ Partials 1954 1797 -157
1. After an HNSNetwork is created, the Windows host creates a virtual management network adapter which takes over the uplink's IP and MAC. Previously, the name format "vEthernet ($uplink_name)" was used to look up the virtual adapter, but the lookup could fail when that name was already taken by another adapter. This change uses the uplink's IP and MAC to find the adapter, and uses the prefix "vEthernet" as a filter.
2. Remove the virtual adapter name from the list of names searched for the Windows Node transport interface's IP configuration when the agent restarts. The IP is eventually moved to the OVS bridge interface, which is renamed from the virtual network adapter, so on restart a virtual network adapter with the name format "vEthernet ($uplink_name)" should not exist.
Signed-off-by: wenyingd <wenyingd@vmware.com>
/test-windows-all
Thanks for the quick fix @wenyingd. This LGTM.
I tried it on my existing Windows Node first. I got the following logs from the agent:
ubuntu@ip-10-0-0-25:~$ kubectl -n kube-system logs antrea-agent-windows-fbx9l -f
Directory: C:\host\k\antrea
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 4/14/2022 8:33 PM bin
I0414 20:34:34.151157 6448 log_file.go:99] Set log file max size to 104857600
I0414 20:34:34.225900 6448 agent.go:84] Starting Antrea agent (version v1.7.0-dev-185e0c2.dirty)
I0414 20:34:34.225900 6448 client.go:81] No kubeconfig file was specified. Falling back to in-cluster config
W0414 20:34:34.233898 6448 env.go:83] Environment variable POD_NAMESPACE not found
W0414 20:34:34.235898 6448 env.go:121] Failed to get Pod Namespace from environment. Using "kube-system" as the Antrea Service Namespace
I0414 20:34:34.235898 6448 prometheus.go:171] Initializing prometheus metrics
I0414 20:34:34.235898 6448 ovs_client.go:68] Connecting to OVSDB at address \\.\pipe\C:openvswitchvarrunopenvswitchdb.sock
I0414 20:34:34.236899 6448 agent.go:331] Setting up node network
I0414 20:34:43.286042 6448 agent.go:837] "Setting Node MTU" MTU=8951
I0414 20:34:48.793827 6448 net_windows.go:386] "Creating HNSNetwork" name="antrea-hnsnetwork" subnet="192.168.3.0/24" nodeIP="10.0.0.189/24" adapter=&{Index:11 MTU:9001 Name:Ethernet HardwareAddr:06:5e:47:7f:7f:93 Flags:up|broadcast|multicast}
I0414 20:34:50.430779 6448 net_windows.go:408] "Moving uplink configuration to the management virtual network adapter" adapter="vEthernet (Ethernet) 3"
I0414 20:35:02.896840 6448 net_windows.go:431] "Moved uplink configuration to the management virtual network adapter" adapter="vEthernet (Ethernet) 3"
^C
ubuntu@ip-10-0-0-25:~$ ping 10.0.0.189
PING 10.0.0.189 (10.0.0.189) 56(84) bytes of data.
^C
--- 10.0.0.189 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4084ms
ubuntu@ip-10-0-0-25:~$
After that, connectivity was lost to the instance and I had to force reboot from the AWS console to recover connectivity. I had the same issue after rebooting the Antrea Agent.
However, I tried on a fresh Windows instance, and I didn't observe the issue:
ubuntu@ip-10-0-0-25:~$ kubectl -n kube-system logs antrea-agent-windows-g62fc -f
Directory: C:\host\k\antrea
Mode LastWriteTime Length Name
---- ------------- ------ ----
d---- 4/14/2022 8:49 PM bin
I0414 20:49:37.801997 7584 log_file.go:99] Set log file max size to 104857600
I0414 20:49:37.866647 7584 agent.go:84] Starting Antrea agent (version v1.7.0-dev-185e0c2.dirty)
I0414 20:49:37.867650 7584 client.go:81] No kubeconfig file was specified. Falling back to in-cluster config
W0414 20:49:37.875656 7584 env.go:83] Environment variable POD_NAMESPACE not found
W0414 20:49:37.877655 7584 env.go:121] Failed to get Pod Namespace from environment. Using "kube-system" as the Antrea Service Namespace
I0414 20:49:37.878663 7584 prometheus.go:171] Initializing prometheus metrics
I0414 20:49:37.878663 7584 ovs_client.go:68] Connecting to OVSDB at address \\.\pipe\C:openvswitchvarrunopenvswitchdb.sock
I0414 20:49:37.879668 7584 agent.go:331] Setting up node network
I0414 20:49:37.920852 7584 agent.go:837] "Setting Node MTU" MTU=8951
I0414 20:49:43.122357 7584 net_windows.go:386] "Creating HNSNetwork" name="antrea-hnsnetwork" subnet="192.168.4.0/24" nodeIP="10.0.0.10/24" adapter=&{Index:9 MTU:9001 Name:Ethernet HardwareAddr:06:37:40:9b:4a:09 Flags:up|broadcast|multicast}
I0414 20:49:58.292344 7584 net_windows.go:514] Enabled Receive Segment Coalescing (RSC) for vSwitch antrea-hnsnetwork
I0414 20:49:58.292480 7584 net_windows.go:453] "Created HNSNetwork" name="antrea-hnsnetwork" id="8918EBD5-E86A-4B3F-B6F6-46C485DB0806"
I0414 20:49:58.293621 7584 ovs_client.go:119] Created bridge: 0d1b7b88-5b32-4db1-8d36-1e9e98e73819
...
I don't know if the error in the first instance is something we need to worry about. I know that it corresponds to a different code path in PrepareHNSNetwork, but I don't know enough about it.
/test-windows-conformance
I think we should focus on the issue in the first instance. @antoninbas Could you help dump the IP and route configurations from the console after the network is lost? This logic runs when the HNS network doesn't move the IP to the virtual management adapter (although that is not expected), in which case the agent tries to move the configuration itself.
@antoninbas I have another question: is OVS working correctly on the Windows Node in your first instance? To verify it, maybe you can try with Antrea 1.4?
@wenyingd Unfortunately I deleted that instance yesterday after I got the new instance working, so I can't collect the information you are asking for :/
I only have RDP access to the instance, so I don't think I could have done that... Once the network goes down, I don't have any access to the instance anymore. Let me know if we can merge this PR.
Then maybe we could merge this PR first? In my opinion, the loss of network connectivity is a different issue, and we can address it once it is reproduced and we have collected enough information. What do you think @antoninbas?
@wenyingd sounds good to me
@wenyingd could you backport this as needed?
Fixes #3636