Unterminated ixgbe NICs
**10Gb Network Performance issues with unterminated ixgbe NICs**
Date: 04/20/2015
Intel ixgbe-based network interface cards (NICs) running driver revision 3.22.0.2 or lower, with no SFP adapter connected, can cause system-wide network delays in the 100-1000ms range whenever commands and utilities attempt to detect the SFP capabilities of the NIC. Other symptoms include elevated load averages and processes stuck in an uninterruptible sleep state. A utility that attempts to identify the SFP features waits through multiple 100ms timeouts when no SFP is connected, and while waiting it holds the global rtnl_lock, the mutex that serializes access to the kernel routing and device tables. As a result, many unrelated network operations block waiting for the lock. Common services such as the snmpd daemon query the NIC frequently, taking the rtnl_lock each time.
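The queries that trigger the delay typically read the NIC's link settings through the ETHTOOL_GSET ioctl, the same path the stack traces below show entering dev_ethtool() and ixgbe_get_settings(). The following is a minimal userspace sketch of that kind of query, not taken from snmpd or ethtool; the interface name and error handling are assumptions. On an affected, unterminated ixgbe port this single ioctl can take on the order of a second.

```c
/* Minimal sketch: query link settings the way ethtool/snmpd-style tools do.
 * On an unterminated ixgbe port with an affected driver, this ioctl can
 * stall ~1 second while the driver probes the absent SFP under rtnl_lock. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
    const char *ifname = (argc > 1) ? argv[1] : "eth0";   /* assumed name */
    struct ifreq ifr;
    struct ethtool_cmd ecmd;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    memset(&ifr, 0, sizeof(ifr));
    memset(&ecmd, 0, sizeof(ecmd));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    ecmd.cmd = ETHTOOL_GSET;            /* "get settings" -> dev_ethtool -> ixgbe_get_settings */
    ifr.ifr_data = (char *)&ecmd;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        perror("SIOCETHTOOL");
    else
        printf("%s: speed %u Mb/s\n", ifname, ethtool_cmd_speed(&ecmd));

    close(fd);
    return 0;
}
```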
During a period where network commands such as 'ifconfig' appeared to hang, LinuxKI data was collected and analyzed. Analysis of the hanging 'ifconfig' command shows that the task spent most of its time sleeping in stack traces related to acquisition of the rtnl_lock:
```
Process Sleep stack traces (sort by % of total wait time) - Top 20 stack traces
   count    wpct      avg   Stack trace
             %      msecs
   ===============================================================
       6   50.12  1135.401  __mutex_lock_slowpath  mutex_lock  rtnl_lock  devinet_ioctl  inet_ioctl  sock_ioctl
                            vfs_ioctl  do_vfs_ioctl  sys_ioctl  tracesys
       4   33.76  1146.952  __mutex_lock_slowpath  mutex_lock  rtnl_lock  rtnetlink_rcv  netlink_unicast
                            netlink_sendmsg  sock_sendmsg  sys_sendto  tracesys
       1    8.36  1136.040  __mutex_lock_slowpath  mutex_lock  netlink_dump  netlink_recvmsg  sock_recvmsg
                            __sys_recvmsg  sys_recvmsg  tracesys
       1    7.76  1055.000  __mutex_lock_slowpath  mutex_lock  rtnl_lock  dev_ioctl  sock_ioctl  vfs_ioctl
                            do_vfs_ioctl  sys_ioctl  tracesys
```
The “COOPERATING/COMPETING TASKS REPORT” section for this task shows the other tasks that woke it from these mutex-lock sleeps:
```
******** COOPERATING/COMPETING TASKS REPORT ********

Tasks woken up by this task (Top 10)
     PID    Count   SlpPcnt     Slptime  Command
  138108        1    36.20%   13.720052  -bash
   71219        1    14.93%    4.515843  kiinfo (kiinfo)
     231        1     7.54%    2.281812  [migration/76]
   93828        4     6.43%    2.270062  cmathreshd

Tasks that have woken up this task (Top 10)
     PID    Count   SlpPcnt     Slptime  Command
  144390       11    91.65%   12.456243  /usr/sbin/snmpd
   93806        1     8.35%    1.135013  cmahostd
      -1        0     0.00%    0.000000  ICS
```
Since the snmpd task accounts for the vast majority of the sleep time, we take a closer look at how it spends its own time. Note that snmpd itself also sleeps most of the time in paths related to querying the ixgbe NIC:
```
PID  144390  /usr/sbin/snmpd
  PPID      1  /sbin/init

******** SCHEDULER ACTIVITY REPORT ********
RunTime    :  4.518001  SysTime   :  4.511822  UserTime   :  0.006178
SleepTime  : 35.032811  Sleep Cnt :       325  Wakeup Cnt :        72
RunQTime   :  0.001950  Switch Cnt:       326  PreemptCnt :         1
Last CPU   :         0  CPU Migrs :         3  NODE Migrs :         0
Policy     : SCHED_NORMAL     vss :     27542  rss        :      2583

busy    :  11.42%
  sys   :  11.41%
  user  :   0.02%
runQ    :   0.00%
sleep   :  88.57%

******** COOPERATING/COMPETING TASKS REPORT ********

Tasks woken up by this task (Top 10)
     PID    Count   SlpPcnt     Slptime  Command
    8114       40   100.00%   30.640394  /sbin/rsyslogd (rsyslogd)
   71606       12   100.00%   13.496665  /sbin/ifconfig
   71591       11    91.65%   12.456243  /sbin/ifconfig
   71314        5   100.00%    5.417794  /sbin/ifconfig
   71310        1    14.70%    5.311666  kiinfo (kiinfo)
   93806        2     2.89%    1.136224  cmahostd
   93828        1     0.00%    0.000000  cmathreshd

Tasks that have woken up this task (Top 10)
     PID    Count   SlpPcnt     Slptime  Command
      -1      325   100.00%   35.032811  ICS
   93828        1     0.00%    0.000000  cmathreshd
```
```
******** SLEEP REPORT ********

Kernel Functions calling sleep() - Top 20 Functions
   Count     Pct    SlpTime    Slp%  TotalTime%  Msec/Slp  MaxMsecs  Func
     317  97.54%    31.9682  91.25%      80.82%   100.846   101.647  msleep
       7   2.15%     3.0646   8.75%       7.75%   437.799  1002.136  poll_schedule_timeout

Process Sleep stack traces (sort by % of total wait time) - Top 20 stack traces
   count    wpct      avg   Stack trace
             %      msecs
   ===============================================================
     310   89.23   100.842  msleep  ixgbe_read_i2c_byte_generic  ixgbe_read_i2c_eeprom_generic
                            ixgbe_identify_sfp_module_generic  ixgbe_identify_module_generic  ixgbe_get_settings
                            dev_ethtool  dev_ioctl  sock_ioctl  vfs_ioctl  do_vfs_ioctl  sys_ioctl  tracesys  | ioctl
       7    8.75   437.799  poll_schedule_timeout  do_select  core_sys_select  sys_select  tracesys  |
                            __select_nocancel  0x702e64706d6e732f
       7    2.02   101.032  msleep  ixgbe_read_i2c_byte_generic  ixgbe_read_i2c_eeprom_generic
                            ixgbe_identify_sfp_module_generic  ixgbe_identify_module_generic  ixgbe_get_settings
                            dev_ethtool  dev_ioctl  sock_ioctl  vfs_ioctl  do_vfs_ioctl  sys_ioctl  system_call_fastpath  | ioctl
```
Other symptoms can be observed without LinuxKI data, including elevated load averages caused by processes sleeping in an uninterruptible state. The 'ps -ax' command shows these sleepers with a state flag of "D", and they will typically be the processes making the ioctl() syscall into the driver; a minimal way to spot such tasks is sketched below.
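The sketch below is not part of LinuxKI; it simply scans /proc/<pid>/stat (with simplified field parsing) and prints tasks currently in the "D" state, the same information visible in the STAT column of 'ps -ax'.

```c
/* Minimal sketch: list tasks currently in uninterruptible ("D") sleep
 * by scanning /proc/<pid>/stat. Not part of LinuxKI. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;
    char path[64], buf[512];

    if (!proc) { perror("/proc"); return 1; }

    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                        /* only numeric (PID) entries */

        snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                        /* task may have already exited */

        if (fgets(buf, sizeof(buf), f)) {
            /* Format: pid (comm) state ... ; comm may contain spaces,
             * so locate the last ')' and read the state character after it. */
            char *p = strrchr(buf, ')');
            if (p && p[1] == ' ' && p[2] == 'D')
                printf("PID %-8s uninterruptible: %s", de->d_name, buf);
        }
        fclose(f);
    }
    closedir(proc);
    return 0;
}
```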
Source code review shows that the rtnl_lock is acquired in dev_ioctl() and held for the duration of the subsequent calls in the stack. The attempt to read the SFP ‘type’ via ixgbe_read_i2c_byte_generic() retries for 10 iterations, each with a 100ms timeout, so the rtnl_lock is held for roughly 1 second. Other commands and network code paths requiring the lock block until it is released. ixgbe driver revisions 3.22.0.2 and older can exhibit this problem; a simplified sketch of the retry pattern follows.
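The sketch below is not the ixgbe driver source. It is a small userspace illustration of the timing arithmetic described above: a global lock standing in for rtnl_lock is taken, the failed SFP read retries 10 times with a 100ms wait each, and the lock is therefore held for about 10 × 100ms ≈ 1 second. The names fake_rtnl_lock, sfp_i2c_read(), and query_sfp_settings() are illustrative only.

```c
/* Illustrative sketch only -- NOT the actual ixgbe driver code.
 * Mirrors the pattern described above: the lock is taken (rtnl_lock in the
 * real dev_ioctl() path), then the SFP read retries 10 times with a 100ms
 * timeout each when no module answers, holding the lock for ~1 second. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t fake_rtnl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the I2C read: with no SFP connected it never succeeds. */
static bool sfp_i2c_read(void) { return false; }

static void query_sfp_settings(void)
{
    pthread_mutex_lock(&fake_rtnl_lock);        /* dev_ioctl() takes rtnl_lock */

    for (int retry = 0; retry < 10; retry++) {  /* 10 retries ...            */
        if (sfp_i2c_read())
            break;
        usleep(100 * 1000);                     /* ... each a 100ms timeout   */
    }

    pthread_mutex_unlock(&fake_rtnl_lock);      /* released only after ~1s    */
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    query_sfp_settings();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double held_ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
                     (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("lock held for ~%.0f ms\n", held_ms); /* prints roughly 1000 ms */
    return 0;
}
```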
A fix for the SFP data retrieval algorithm is available in driver versions 3.22.0.3 and newer.
The latest “HP Service Pack for ProLiant”, version 2015.04.0, contains the 3.23.0.79-1 version of the ixgbe driver. This Service Pack is available at http://h17007.www1.hpe.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx.
If updating the driver is not immediately possible, the repeated SFP probes can be avoided by either of the following:
- Connect an SFP termination to the affected NIC(s).
- Identify and stop the processes querying the NIC (such as the snmpd daemon).