All kinds of weird problems when using more than one or two Atheros cards #114
Comments
All REG-related commands go over EP3 and EP4. These EPs were configured as Bulk for some time; then I added patches to revert this behavior back to Interrupt mode. The latest kernel and FW should include some optimizations for these EPs.
I assume it should be possible to reduce these problems by moving as much functionality as possible to the FW, so that we avoid direct REG read/write operations. Currently, without FW changes, you can reduce EP3/EP4 load by disabling LED and bgscan, if these are enabled.
Kernel is 4.4.11, ath9k_htc-firmware is version 1.4. LED blink is already disabled. It is definitely not a power or cable issue (see my initial post).
How do I disable bgscan? I couldn't find anything about it with a quick search. Or do you mean scans that get initiated by network-manager, wpa-supplicant, etc.? Network-manager, wpa-supplicant and everything else that might mess with the cards is disabled. The only thing I do is bring up the interface and set monitor mode and channel (all via plain ifconfig/iwconfig); then I start the rx program that puts the cards into promiscuous mode.
Regarding not enough USB bandwidth: I have never really measured USB bandwidth on the Pi, but it works flawlessly with four Ralink cards. So far I haven't noticed it being slow when copying files over the USB ethernet connection or to a USB memory stick. The incoming data is around 500 packets of 1024 bytes per second per dongle, so not too much.
Regarding the packet size: I read somewhere that the USB max. packet size is 512 bytes for bulk transfers. So, assuming some overhead, I figured 1024-byte ethernet packets will probably lead to 2x 512-byte packets plus 1x 20-byte (or whatever low number) packet on the USB bus, i.e. something similar to fragmentation in the IP world, which is best avoided. But lowering the packet size to 900 didn't seem to make a difference.
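The arithmetic in the comment above can be checked with a small sketch. It assumes the 512-byte high-speed bulk maximum packet size mentioned in the comment and treats the 20-byte per-frame overhead as a placeholder guess, exactly as the comment does. The function names are made up for illustration, and the real driver may aggregate several frames into one USB transfer, so this is only a rough model.

```python
BULK_MAX_PACKET = 512  # bytes, USB 2.0 high-speed bulk endpoint


def usb_packet_sizes(payload_bytes, overhead_bytes=20):
    """Split one frame (payload + assumed overhead) into USB bulk packets."""
    total = payload_bytes + overhead_bytes
    full, rest = divmod(total, BULK_MAX_PACKET)
    sizes = [BULK_MAX_PACKET] * full
    if rest:
        sizes.append(rest)  # trailing short packet
    return sizes


def data_rate_mbit(packets_per_second, payload_bytes, dongles=1):
    """Payload data rate in Mbit/s, ignoring all protocol overhead."""
    return packets_per_second * payload_bytes * 8 * dongles / 1e6


if __name__ == "__main__":
    print(usb_packet_sizes(1024))           # two full packets plus a short one
    print(usb_packet_sizes(900))            # one full packet plus a short one
    print(data_rate_mbit(500, 1024))        # one dongle
    print(data_rate_mbit(500, 1024, 4))     # four dongles
```

Under these assumptions, four dongles carry roughly 16 Mbit/s of payload, far below the nominal 480 Mbit/s of a high-speed bus, which supports the point that raw bandwidth is unlikely to be the bottleneck.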
Hm... so, what do I need to reproduce this issue?
I just downloaded the latest Raspbian lite image to have a fresh clean install. I could not reproduce the issue with putting the cards in promiscuous mode when I gave it a quick try. However, as soon as I plug in the 3rd AR9271 stick, my USB keyboard does strange things (like repeating characters that I did not press and being laggy) and stops working completely shortly after. When starting the Pi3 with the three sticks already attached, the keyboard keeps working, but I end up with only one card being shown with iwconfig. Not sure if that is something easy for you to reproduce. If I can do something to narrow it down some more, I'll gladly do that.
Did some more testing. One issue really seems to be switching the cards to promiscuous mode. I have now changed this line in the code of the rx program that receives the packets: to this: to disable putting the cards into promiscuous mode, and the problem that the cards sometimes stop receiving traffic shortly after the rx program has been started is gone. But the problem that cards don't get initialized if more than two or three cards are used is still there.
In promiscuous mode the card will dump everything it gets from the air over USB. This means that, depending on the environment, you will get different results.
The rx program still works as intended with promisc mode disabled, I guess this is because wireless cards in monitor mode forward all traffic anyway (?) As far as I see it in the code, it sets a pcap filter and uses a special mac address to filter traffic:
I will try to see what happens on a linux notebook. BTW (little offtopic): To optimize that further, would it in theory be possible to filter out traffic based on mac addresses already in the atheros firmware, so that it doesn't even reach the host? And basically don't listen to other packets at all, i.e. abort reception and listen again as soon as the header with macs has been read? |
s/promiscuous/monitor/ You are right... where are my thoughts. IMO filtering in the firmware would be a good point. @erikarn, what do you think? I think this should help some other users too.
Compiled a kernel now with ath9k_htc debugging enabled on the fresh Raspbian lite image. Kernel has stock Raspbian config (just some un-needed unrelated drivers removed to speed up compilation on the Pi) and is the same version as the stock Raspbian one. two sticks already plugged before power-on, then plugged the third stick:
After some tries, three sticks connected before power-up worked, but when plugging the 4th one:
In both cases "modprobe -r ath9k_htc" made at least the USB subsystem crash (ssh connection and USB keyboard didn't work anymore). I'll see that I configure the serial port to see what is going on then.
That's weird, seeing transfer timeouts is a big problem. Can you enable -a |
The dmesg log above was with ath9k_htc module parameter debug=0xffffffff. Can I enable more debugging somewhere? Here is a .pcap made with the usbmon module while plugging in a third card (two were already connected during boot-up). The corresponding dmesg output: Please note, this was not made on the fresh Raspbian image, but on my EZ-Wifibroadcast image, which has a different kernel with 1000Hz timer frequency. Not sure if that makes a difference; I can do it again with the other clean Raspbian image with stock kernel if needed. If you are wondering about the strange firmware version, it's a 1.4 firmware; the only change is that it has an 18mbit fixed bitrate. I made different firmware files for different fixed bitrates.
Can you recompile the kernel with this change?
#define MAX_REG_OUT_URB_NUM 1
#define MAX_REG_IN_BUF_SIZE 64
Tried the patch on the stock raspbian 4.4.32 kernel on the clean image now, I think it makes no difference. This is a complete dmesg log from bootup with four adapters installed. Will do more testing ...
Did another test on the clean image with the above patch. Three cards were plugged in at bootup already, then plugged in the 4th card:
usbmon capture: |
But 3 cards work stable now? |
I would like to share my experiences in trying to use multiple AR9271 USB Wi-Fi dongles (TP-Link WN722N) with the Olimex Olinuxino Lime 2 (OSHW). When I tried to connect more than one USB dongle to one USB port (or even two USB ports) on the Lime2, I had issues with devices showing up in lsusb but not in ifconfig. Then I used an unpowered USB hub with mostly similar results. Suspecting a power issue, I switched to a powered USB hub. I was able to work with at least 4 USB devices. When I insert the USB devices one after the other, everything works. I could even say that they work reliably. However, after a reboot they don't get detected. After a reboot, sometimes only one device shows up. Removing one device and inserting it back again gets all devices working again. There are no messages about the device detection in dmesg (such as "new high-speed USB device..."). Unfortunately, I am unable to provide logs at this moment. Please let me know if they would be useful so I can put in some time for that. By the way, a huge thank you to @olerem and everyone else working on this project and Debian inclusion. AR9271-based dongles are the only solution I know of for free software Wi-Fi on single board computers. This means a lot to the FreedomBox project.
Hm... currently we start all adapters almost at once. According to the wireshark capture, this generates lots of traffic, especially on the interrupt endpoint. By reducing the number of Interrupt pipes, it looks like we can start at least 3 adapters at the same time. @rodizio1 correct?
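To put numbers on the traffic per endpoint type, one option is to count URB submissions in a usbmon text trace. The sketch below assumes the text format the kernel exposes under /sys/kernel/debug/usb/usbmon/ (fields: URB tag, timestamp, event type, address word such as `Ii:1:004:3`), not the .pcap capture posted above. The function name and the grouping key are made up for illustration.

```python
from collections import Counter

# First letter of the usbmon address word encodes the transfer type.
TRANSFER_TYPES = {"C": "control", "Z": "isochronous", "I": "interrupt", "B": "bulk"}


def count_submissions(lines):
    """Count URB submissions per (device address, transfer type, direction)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # fields[2] is the event type: S = submission, C = callback, E = error
        if len(fields) < 4 or fields[2] != "S":
            continue
        parts = fields[3].split(":")  # e.g. "Ii:1:004:3" -> type+dir, bus, dev, ep
        if len(parts) != 4 or len(parts[0]) != 2:
            continue
        kind = TRANSFER_TYPES.get(parts[0][0])
        if kind is None or not parts[2].isdigit():
            continue
        direction = "in" if parts[0][1] == "i" else "out"
        counts[(int(parts[2]), kind, direction)] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (as root, with usbmon loaded and debugfs mounted):
    #   cat /sys/kernel/debug/usb/usbmon/1u | python3 this_script.py
    for key, n in count_submissions(sys.stdin).most_common():
        print(key, n)
```

Comparing the interrupt counts per device while a third or fourth adapter initializes would show whether the interrupt endpoints really dominate the bus during start-up.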
Yes, three cards connected during boot-up worked. But I did not test very often, give me a day or two for more testing (also on the other image with the 1000Hz kernel) to really make sure it wasn't just coincidence. |
Did a lot more testing with both the clean image and the other with the 1000Hz Kernel. Reducing the interrupt pipes didn't help. It was just that I was logged in over serialport and did not use the ethernet port (which is connected over usb on the pi ...). Three cards work as long as there is nothing else happening on the USB bus. Four cards never work. |
Uff.. ok. |
@rodizio1 , is it your patch http://www.spinics.net/lists/linux-wireless/msg156697.html ? |
Not using any patches, just changed the kernel .config to use 1000Hz. So using 1000Hz should actually make it better? Have tried soldering some wires to the Atheros Chip for serial debugging. Failed miserably, card is toast now :( |
Yes and no. 1000Hz would increase system/interrupt load, but it would also change the timeout behavior for this driver...
Hey @rodizio1, did you ever find a solution to this? I'm having the same troubles with the 3 AR9271 cards + 1 ethernet adapter on the same hub. Was previously working with 2 AR9271 cards + 1 non-AR9271 card + 1 ethernet adapter, just not with 3 of AR9271 ones. Getting the same errors as you. And in the same way as you, if I don't have the 4th device plugged in (the ethernet adapter), the 3 cards work fine, but as soon as there are 4 devices on the hub, one of them stops working. Sometimes, the ethernet adapter is one of those things. But if AR9271 cards make up 2 or fewer of the 4 devices, then there are no problems, even if all 4 ports are being used. Apart from what you guys have discussed above, I only found this VirtualBox ticket, where someone attributes this to:
which you seem to have mentioned above already. |
@rodizio1 Disregard that, I hadn't read carefully once again. I'll apply the patch and see what happens. Have you had luck with it? |
I applied the patch from the link and confirmed it was patched, but the problem persists in the exact same way when using more than 2 AR9271 cards. Which is weird because it was pulled into the upstream already...
The patch is supposed to make it time out after 1 second instead of 0.1 seconds, but judging by the log, there's no change in the time before the "Service connection timeout" occurs, which is around 300 ms. I changed the code to 3 seconds, 5 seconds, and 10 seconds, and it has no effect at all; it still times out after less than a second.
I went back to
I wonder why this is. Isn't 4.9.13+ supposed to be ahead of 4.4.50+? I hope this issue won't seep into a stable kernel in the future. EDIT: After some repositioning of the USB plugs in their respective ports, we're back to not initialization failure on more than 3 AR9271 devices, even on this kernel. EDIT2: I seem to have found a tenuous workaround (after applying the patch) for the 3 cards, at least for the last 2 boots. On a fresh install of 4.4.50 kernel Raspbian Lite, I do the following:
Sorry for my late reply melikyuksel, didn't realize you were posting here. Well, what you describe is exactly what I'm seeing. It appears to be more or less random and also being dependent on other stuff using the USB bus. Three cards is 99% now for me, as long as I make sure no other stuff is running during card initialization and no network cable is plugged. Oh and LED blinking disabled. Maybe we should open another issue on the Raspberry Linux kernel Github. |
@rodizio1 Sounds like a good idea. This does seem to be limited to Raspberry Pi kernels; I assume Kali Linux for Raspberry Pi also uses the same kernel as Raspbian, since it has the problem as well. According to the mailing list, it used to be a problem for all of the Linux kernel, but was fixed with that patch above; but applying that patch doesn't fix it for Raspberry Pi, so it must depend on some other new development that the Pi kernel doesn't have yet... or it's a hardware thing. I'll try it out with Arch Linux on the Pi and see what happens. If not, I'll submit an issue on raspberrypi/linux, unless you want to be proactive. :) |
I am also struggling with 4X antenna setups using TPLink W722N. I am trying to create a device that I can reboot and have all the cards come up reliably on kernel 4.x (3.x works, oddly enough). I am simply at a loss as to what is happening, so I started collecting details about it.
I am not seeing the exact same errors as OP, but rather am having trouble initializing the devices at startup altogether. In the scenarios where all 4 cards work, we haven't noticed any problems (yet) during monitor mode. Perhaps it could be related?
I can confirm that power hasn't been an issue at all: running powered USB hubs, using multimeters and myriad hub setups has had no hugely positive results, and many times I have had the four cards all working on a single hub from a single USB port without additional power.
I would note here that our experience with OTG USB such as the one on the Raspberry Pi (both 1 and 2) provoked the issue so badly that we moved to devices that do not use OTG, such as the OrangePi Lite, as we suspected the OTG just can't handle so many USB devices, especially in this manner. I would also note that we have had similar issues on other devices, even with a proper USB hub, such as the new ASUS Tinkerboard, OrangePi Lite, and OrangePi Zero. Funnily enough, we have no issues with the x86-based Intel Compute Stick, but it is an expensive device. We have had fewer issues on ARM-based systems with the Ralink-based Mediatek (mt7601u) setup, but it is still problematic and unrelated.
So using the OrangePi Lite we do not have issues with a 3.x-based Armbian or DietPi distro. I am including information on my current setup in development, the OrangePiLite / Armbian nightly Kernel 4.11:
I have not yet tried compiling the driver with debug enabled. I also have 6+ dongles available to further provoke or test the limitations as needed. I have tried blacklisting ath9k_htc in a modprobe conf, and then modprobing to restore them to simulate the delay, but I get the same problem as with the booting behavior, where only 2 or so cards become available. Output and Logs: ath9k_htc x4.tar.gz Update 1: I tried compiling the latest driver on an x86 system, transferring the firmware files and rebooting; it doesn't seem to have made any difference.
I am trying to interface TL-WN722N with Angstrom Image on a Cyclone V SoC. I have built the module as static and initialized the firmware. I am using the following kernel version : 4.1.33-ltsi-altera .
I had a read over the linux drivers for ath9k_htc on the official website for TP-Link (the manufacturer of the wifi dongle), and it says that the Has anyone faced such an issue?
I think this can tentatively be closed, now that this was fixed at the end of raspberrypi/linux#2023. Or we could keep it open until the kernel that fixes it becomes stable so
Do I see it correctly that the fix is related to the RPi (or dw) USB host controller and not to the atheros driver or firmware?
config ATH9K_AHB This option enables the AHB bus support in ath9k.
regardless of the USB bus in htc case, AR9271 has built-in wifi mac. maybe some AHB support code could handle these issues properly? |
I found some sort of workaround for my multiple adapter problem. Using kernel 4.19 with two adapters plugged in, I had one of the two adapters failing maybe once every 5-10 reboots. Once in that state, I tried many things to completely restart the card or the PC or to turn off USB power, without success, but I finally managed to do it using a real-time clock wake alarm. It may not be implemented on every system, but on my NUC PC I do the following: The system will shut down and, in my case, TURN OFF THE USB POWER, and then restart after 10 seconds. I know this is not ideal, but in my case it is an acceptable workaround to make the device boot in a working state.
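The exact commands from the comment above are not preserved here. As a rough illustration of the general idea only, the following sketch arms an RTC wake alarm through the Linux sysfs file /sys/class/rtc/rtc0/wakealarm, which accepts a wake time in seconds since the epoch. The function name, the rtc0 device and the 10-second delay are assumptions; whether USB power is actually cut while the machine is off depends on the board.

```python
import time
from pathlib import Path

# Assumed RTC device; other systems may expose rtc1 or none at all.
DEFAULT_WAKEALARM = Path("/sys/class/rtc/rtc0/wakealarm")


def schedule_wake(seconds_from_now, wakealarm=DEFAULT_WAKEALARM, now=None):
    """Arm the RTC alarm and return the wake time (seconds since epoch)."""
    now = int(time.time()) if now is None else int(now)
    wake_at = now + int(seconds_from_now)
    wakealarm.write_text("0\n")           # clear any alarm that is already set
    wakealarm.write_text(f"{wake_at}\n")  # arm the new alarm
    return wake_at


if __name__ == "__main__":
    # Needs root. After arming the alarm, power the machine off (for example
    # with "poweroff"); the RTC should switch it back on at the wake time.
    print("wake scheduled at", schedule_wake(10))
```

This assumes the hardware clock runs in UTC; with an RTC kept in local time the computed wake time would be off by the time zone offset.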
I've been trying to narrow this issue down for months now, but it's so weird that I really don't know what to do further. I already tried disabling LED blinking, but that didn't seem to help much.
Sometimes cards don't get initialized, sometimes they do, but don't receive packets, sometimes they do and receive packets only for a second or two then they crash.
When this occurs, packet reception stops, ifconfig RX bytes/packets does not increase anymore.
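The "RX counters stop increasing" symptom can be detected automatically rather than by watching ifconfig. A minimal sketch, assuming the standard Linux per-interface counters under /sys/class/net/<iface>/statistics/rx_bytes; the function names, the interface names and the polling interval are made up for illustration.

```python
import time
from pathlib import Path


def read_rx_bytes(iface, sysfs_root=Path("/sys/class/net")):
    """Read the received-bytes counter of one network interface."""
    return int((Path(sysfs_root) / iface / "statistics" / "rx_bytes").read_text())


def stalled_interfaces(before, after):
    """Return interfaces whose RX counter did not grow between two samples."""
    return sorted(i for i in before if i in after and after[i] <= before[i])


if __name__ == "__main__":
    ifaces = ["wlan0", "wlan1", "wlan2"]  # hypothetical interface names
    first = {i: read_rx_bytes(i) for i in ifaces}
    time.sleep(5)  # arbitrary sampling interval
    second = {i: read_rx_bytes(i) for i in ifaces}
    print("stalled:", stalled_interfaces(first, second))
```

Run in a loop, this would log which card stopped receiving and when, which could then be matched against the dmesg timestamps.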
Tried to unload the ath9k module, this seems to have crashed the whole USB subsystem, my ssh connection is gone and keyboard also doesn't work anymore.
It doesn't always occur, and sometimes the REGISTER READ FAILED message doesn't seem to cause issues. Here it looks like two of the three cards don't work:
But actually, only the first card doesn't work (5.3MiB received vs. 200MiB received):
The next test I wanted to do was to check whether setting the cards to promiscuous mode has something to do with it. I rebooted and tried:
iwconfig wlan0 mode monitor
ifconfig wlan0 up
Now the USB subsystem seems frozen again.
Next try was to only disable the packet reception in my startup script. Rebooted, now I cannot get three cards to work anymore at all. It always hangs while initializing the cards (bringing them up and into monitor mode on the desired channel).
The weird thing is, nothing regarding the wifi cards was changed; I just commented out a line in a script that runs way after the initialization of the wifi cards. I know this doesn't sound logical at all, but I've had such weird behaviour on different occasions. It seems that just changing something completely unrelated makes a difference. Not sure why, maybe because some stuff gets loaded or executed a few nano- or milliseconds differently when files have changed on the sdcard?
Users of my EZ-Wifibroadcast image also reported similar things, some had problems with a newer version although they did not change anything with their hardware or setup and I did just some scripting cleanups and cosmetic stuff, i.e. nothing that would affect the wifi cards at all.
One user reported he could trigger that behaviour just by choosing a different channel, which also seems to make no sense at all.
I also could find forum posts from users trying to use more than two Atheros sticks from more than a year ago, they suffered the same problems.
Next test, again with two cards. This time I can successfully bring them up and issue "ifconfig wlan0 promisc".
Now the weird thing is, when I set the cards to promiscuous mode via ifconfig, it seems to always trigger that REGISTER_READ_TIMEOUT message. However, sometimes the card is still able to receive packets, even though the message appeared. After running "ifconfig wlan0 promisc" a few times in a row though, it certainly stops working.
I can also trigger this with only one card. I.e. start-up the system, configure card, run "ifconfig promisc" for maybe 5-10 times and the card crashes.
Some general infos:
It seems to get worse with more cards: Two cards seem to work in 99.9% of cases, three cards in maybe 10%.
Can be reproduced with TPLink 722N/Alfa AWUS036NHA (both AR9271) and TPLINK 822 V2 (AR9287)
It seems related to other stuff using the USB bus. If the ethernet connection on the Pi is in use (which is also connected via USB internally), three cards almost never work.
The same happens with other wifi cards (rt2800usb). 1x ath9k_htc plus 3x rt2800usb works, 2x ath9k_htc plus 1x rt2800usb is flaky. 2x ath9k_htc plus 2x rt2800usb is even more flaky.
However, 4x rt2800usb without any Atheros cards is absolutely stable and works 100%.
4x Atheros cards never work, not at all. Even when plugging them each after another, the 4th one always causes all kinds of crashes and issues
When plugging the cards slowly each after another (instead of having all three connected during boot-up), chances to get them working seem to be higher, but it doesn't always work.
It seems to have gotten worse since I changed Kernel timer frequency to 1000Hz (in order to fix a different problem with packets being delayed on the USB-Bus or subsystem somewhere).
It is not power-related or bad cables or something, I have even gone so far as soldering the adapters directly to the Raspberry USB Ports and powering the Pi and the cards directly from a 7.5 Ampere power supply connected to a large Lipo battery. Problem remains.
The problems only occur during initialization of the cards or a few seconds after the startup of the rx program which puts the cards into promiscuous mode. If the cards have "survived" past that, they run stable for hours or even days. But just killing and re-starting the rx program (or issuing "ifconfig wlan0 promisc") can make them crash again anytime.