EF_AF_XDP_ZEROCOPY env var not working, and hence performance of app degraded or on par #5
Looking at https://elixir.bootlin.com/linux/v4.18/source/net/xdp/xdp_umem.c#L43, the vanilla 4.18 kernel requires explicit driver support. It kind of indicates that ixgbe's support for AF_XDP zero-copy is not compatible with 4.18 (I have not checked RHEL kernel backports though).
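For reference, a quick way to confirm which kernel and which ixgbe driver build a box is actually running (the interface name enp1s0 is an assumption; substitute your own):

```sh
# Kernel version (AF_XDP zero-copy behaviour differs between 4.18 and 5.x)
uname -r
# Driver name and version bound to the interface (interface name is an example)
ethtool -i enp1s0
```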
So @maciejj-xilinx, do you suggest that I must bump the kernel version to 5.3+, or downgrade the ixgbe driver? This might be a problem in general, since CentOS 8/RHEL 8 is usually the deployed OS (given that we are dealing with stability issues etc.). Any advice is appreciated.
So @maciejj-xilinx, as mentioned by you, a patch was indeed applied to remove the XSK_QUERY_XSK_UMEM support. Here is the patch which killed this: https://patchwork.ozlabs.org/project/netdev/patch/20190213170729.13845-1-bjorn.topel@gmail.com/ So it does look like I need to find the right driver version for triggering this command.
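When hunting for a compatible driver version, one way to see which ixgbe build is currently installed is a modinfo check (a sketch; the module name ixgbe is the one discussed in this thread):

```sh
# Show the version and path of the ixgbe module that modprobe would load,
# to confirm whether an out-of-tree build or the in-kernel one is in use.
modinfo ixgbe | grep -iE '^(version|filename):'
```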
@maciejj-xilinx the latest driver is not compiling with the patch.
Sorry, just saw your message. Yes, you are right.
The minimum driver version with AF_XDP ZC support for ixgbe is:
Currently, Onload with AF_XDP zero-copy is most extensively tested on Ubuntu 20.04 with the 5.4.0-42-generic kernel.
OK, thanks @maciejj-xilinx, let me test with that, get some validation numbers, and revert. Wanted to let you know that I will start a big compatibility test once the numbers look good on one platform. The VM and container combinations will be the focus in the coming weeks, based on virtio and virtio hardware offload.
My understanding now is that the RHEL 8.2 kernel 4.18.0-193.el8.x86_64, with its in-distro ixgbe driver version 5.1.0-k-rh8.2.0, supports AF_XDP zero-copy out of the box.
Hello @maciejj-xilinx, I have been able to validate the above claim. The stack is being created.
I have also verified this. Let me run some benchmarks now.
Hello @maciejj-xilinx, I tried to run some benchmarks on memcached with the stock Intel driver, and performance is completely degraded with EF_AF_XDP_ZEROCOPY enabled. Please see the details below:

Driver version:
EF_AF_XDP_ZEROCOPY enabled:

The profile enabled is the same as above, stacks are being created, and I have started memcached. The TPS for a 16-byte key and 64-byte value size, as compared to 362K for the same payload profile, is a degradation of 50X :D

When I revert back to the 5.9.4 driver, I again get the same performance by just starting Onload with ZC mode disabled, with a 5% enhancement. I am not sure what's going on. I will check with Ubuntu for now, since you had mentioned it is well tested there.
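For completeness, a hedged sketch of how such a run could be invoked; the memcached flags, user, and ports are illustrative, only EF_AF_XDP_ZEROCOPY and the latency profile come from this thread:

```sh
# Zero-copy run (the configuration that showed the degradation above)
EF_AF_XDP_ZEROCOPY=1 onload --profile=latency memcached -u memcache -p 11211 -t 4 &

# Baseline run with the variable unset, for comparison
onload --profile=latency memcached -u memcache -p 11212 -t 4 &
```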
Hi @maciejj-xilinx, I have been testing the Ubuntu 20.04 release with Onload. It does look relatively stable, and the numbers in terms of TPS (Transactions Per Second) are around 380K with the latency profile on Intel 82599 with the ixgbe driver upgraded to 5.9.4. But when I enable the AF_XDP mode in latency.opf, I keep getting an error stack; please see the image below. The server, once onloaded, never returns any response. Any help is appreciated.
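When the server stops responding like this, it may help to attach the stack state and kernel log to the report. A possible set of commands, assuming the standard Onload tools are installed:

```sh
# Dump detailed per-stack state for all Onload stacks on the box
onload_stackdump lots > stackdump.txt
# Recent kernel messages, in case the driver or Onload modules complained
dmesg | tail -n 100 > dmesg.txt
```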
Hi @maciejj-xilinx, I decided to dig a bit further on Ubuntu 20.04. We see an exorbitant amount of time, roughly 75%, being spent in Onload epoll routines, and just 1% of the time being spent in the actual memcached function for retrieving the values. As a comparison, when memcached runs in kernel mode (please see the stack frame snapshot below), barring do_syscall_64(), which wraps a perf call, the memcached function is being exercised 2.10% of the time, exactly double! And the poll mode driver of ixgbe is also roughly at 2%. Is there anything we can do to reduce the massive Onload overhead? In ef_vi mode with Solarflare NICs, this function normally accounted for 20% overhead at most.
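The profiles above were presumably collected with perf; for reproducibility, a typical way to capture an equivalent on-CPU profile (the PID lookup and sampling duration are illustrative):

```sh
# Sample call stacks of the running memcached for 30 seconds
perf record -g -p "$(pidof memcached)" -- sleep 30
# Inspect where time is spent (e.g. Onload epoll routines vs. memcached itself)
perf report --stdio | head -50
```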
This looks like a crash during stack clean-up. We have raised internal issue ON-12824.
With regard to overhead: with the latency profile, Onload tends to spin, that is, busy-loop on the network device queues. The test shows a throughput of 380K TPS @ 1 kB payload. That is roughly 3 Gbps, far from saturating the link, and it gives an indication that the link should get saturated at around 1M TPS. Ideally, your client would then have at least 1M TPS capacity. If you are not certain the client has enough oomph, you can use 3 clients in parallel.
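As a back-of-the-envelope check of those figures (assuming a 10 Gbps link and ignoring protocol overhead):

```sh
# 380,000 transactions/s * 1 kB * 8 bits/byte ~= 3.1 Gbps on the wire
echo "$((380000 * 1024 * 8)) bits/s"
# Scaling to a 10 Gbps link suggests saturation somewhere around 1.2M TPS
echo "$((10 * 1000 * 1000 * 1000 / (1024 * 8))) TPS at 10 Gbps"
```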
@maciejj-xilinx is EF_STACK_PER_THREAD a valid option for Onload in AF_XDP mode?
The trouble with memcached is that it creates a single listen socket (unless it has changed recently), and by that fact alone only a single stack is created, as all the accepted sockets end up in the same stack. In our recent whitepaper on memcached with Onload (https://china.xilinx.com/publications/resutls/onload-memcached-performance-results.pdf) we used a separate memcached instance per core. In the past we have found multiple points of contention in memcached with multiple threads, not just the listening socket.
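A minimal sketch of the "one memcached instance per core" arrangement described above; the core list, ports, and cache size are illustrative and not taken from the whitepaper:

```sh
# Start one single-threaded memcached per core, each pinned to its core and
# accelerated by Onload, so each instance gets its own stack and listen socket.
for core in 0 1 2 3; do
  port=$((11211 + core))
  taskset -c "$core" onload --profile=latency \
    memcached -u memcache -p "$port" -t 1 -m 1024 &
done
```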
I think the above fact is validated by this study. I will also post some numbers shortly for Key = 16 bytes and Value = 64 bytes.
Yes, and I also checked the detailed work on memcached benchmarking here. It's a little on the older side.
I'd expect this to work in general, as long as there are enough hardware queues set up on the NIC to allow creation of the stacks.
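Checking and, if needed, raising the number of hardware queues can be done with ethtool; the interface name and channel count below are examples:

```sh
# Show how many combined RX/TX channels the NIC currently exposes
ethtool -l enp1s0
# Raise the channel count so each Onload stack can get its own queue pair
ethtool -L enp1s0 combined 8
```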
Any plans on supporting SO_REUSEPORT on AF_XDP for Onload?
We are currently working on it.
Hello @maciejj-xilinx, any status on SO_REUSEPORT on AF_XDP for Onload?
Hi, we are still figuring out internally how to best share our roadmap.
Sounds good, thanks for the update @maciejj-xilinx.
@shirshen12, @maciejj-xilinx
NIC:
OS:
Thank you very much.
@h2cw2l I think it can. Please see below the instructions for Onload with AF_XDP for Intel 82599 on RHEL 8.x:
- Remove unused kernels
- Install latest Intel ixgbe driver
- Enable hugepages (see the example after this list): https://www.golinuxcloud.com/configure-hugepages-vm-nr-hugepages-red-hat-7/
- Optional till Onload fixes it in the master branch: git reset --hard e9d90b2
- Install XDP tools: yum install clang llvm
- Enable the flow director: ethtool --features enp1s0 ntuple on
- Enable port 11211: firewall-cmd --zone=public --permanent --add-service=memcache
- Install python2 for our benchmarking script: yum install python2
- Install screen: dnf install epel-release -y
- Optional stuff: adduser s.chakrabarti
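A hedged example of the hugepages and flow-director steps above; the page count and interface name are illustrative, and the right hugepage reservation depends on the stack configuration:

```sh
# Reserve 2 MB hugepages for Onload packet buffers (1024 is an example value)
sysctl -w vm.nr_hugepages=1024
grep -i hugepages_ /proc/meminfo        # confirm the reservation took effect

# Confirm the ntuple/flow-director feature enabled above is actually on
ethtool --show-features enp1s0 | grep ntuple
```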
Deferring oo_exit_hook() fixes a stuck C++ application:

#0  0x00007fd2d7afb87b in ioctl () from /lib64/libc.so.6
#1  0x00007fd2d80c0621 in oo_resource_op (cmd=3221510722, io=0x7ffd15be696c, fp=<optimized out>) at /home/iteterev/lab/onload_internal/src/include/onload/mmap.h:104
#2  __oo_eplock_lock (timeout=<synthetic pointer>, maybe_wedged=0, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:35
#3  __ef_eplock_lock_slow (ni=ni@entry=0x20c8480, timeout=timeout@entry=-1, maybe_wedged=maybe_wedged@entry=0) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/eplock_slow.c:72
#4  0x00007fd2d80d7dbf in ef_eplock_lock (ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/onload/eplock.h:61
#5  __ci_netif_lock_count (stat=0x7fd2d5c5b62c, ni=0x20c8480) at /home/iteterev/lab/onload_internal/src/include/ci/internal/ip_shared_ops.h:79
#6  ci_tcp_setsockopt (ep=ep@entry=0x20c8460, fd=6, level=level@entry=1, optname=optname@entry=9, optval=optval@entry=0x7ffd15be6acc, optlen=optlen@entry=4) at /home/iteterev/lab/onload_internal/src/lib/transport/ip/tcp_sockopts.c:580
#7  0x00007fd2d8010da7 in citp_tcp_setsockopt (fdinfo=0x20c8420, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/tcp_fd.c:1594
#8  0x00007fd2d7fde088 in onload_setsockopt (fd=6, level=1, optname=9, optval=0x7ffd15be6acc, optlen=4) at /home/iteterev/lab/onload_internal/src/lib/transport/unix/sockcall_intercept.c:737
#9  0x00007fd2d7dcb7dd in ?? ()
#10 0x00007fd2d83392e0 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
#11 0x000000000060102c in data_start ()
#12 0x00007fd2d8339540 in ?? () from /home/iteterev/lab/onload_internal/build/gnu_x86_64/lib/transport/unix/libcitransport0.so
#13 0x00000001d85426c0 in ?? ()
#14 0x00007fd2d7fcbe08 in ?? ()
#15 0x00007fd2d7a433c7 in __cxa_finalize () from /lib64/libc.so.6
#16 0x00007fd2d7dcb757 in ?? ()
#17 0x00007ffd15be6be0 in ?? ()
#18 0x00007fd2d834f2a6 in _dl_fini () from /lib64/ld-linux-x86-64.so.2

Here, _fini() is a function that calls all library destructors. The problem is that _fini() decides to run the C++ library destructor *after* Onload and makes it operate on an invalid Onload state.

The patch leverages the fact that Glibc sets up _fini() after running the last library constructor, so by manually installing the exit handler (instead of providing a library destructor), Onload wins the race with _fini().

There's still an issue if the user library sets a custom exit handler with atexit() or on_exit() and makes intercepted system calls from there.

Tested:
* RHEL 7.9/glibc 2.17
* RHEL 8.2/glibc 2.28
* RHEL 9.1/glibc 2.34

Thanks-to: Richard Hughes <rhughes@xilinx.com>
Thanks-to: Siân James <sian.james@xilinx.com>
Hello Onload Team,
I have been testing Onload with AF_XDP support on a 10 GbE Intel 82599. I have updated the NIC driver to 5.9.4. This version has support for the zero-copy primitive of AF_XDP, as referenced here:
But when I start Onload after editing the latency profile (made a new copy, latency-af-xdp.opf; please see below)
and start the application, I see no stacks being created.
Also, as a result, the config var is not set either.
Requesting your team to look at this.
Please see: when EF_AF_XDP_ZEROCOPY is not enabled, the stacks are created fine, and with ixgbe ZC enabled, for payloads under 3 KB, I do see a marginal increase of 5% in throughput.
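For reference, a hypothetical sketch of what a latency-af-xdp.opf copy could contain, assuming the onload_set directive used by the stock profiles; the only variable taken from this issue is EF_AF_XDP_ZEROCOPY, the rest mirrors typical latency-profile settings and may differ from the actual file:

```sh
# latency-af-xdp.opf (hypothetical copy of the stock latency profile)
onload_set EF_POLL_USEC 100000       # spin on the net queues, as in the latency profile
onload_set EF_AF_XDP_ZEROCOPY 1      # the variable this issue is about
```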