Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: USART IRQ priority for external module #4047

Merged
merged 1 commit into from
Sep 27, 2023
Merged

fix: USART IRQ priority for external module #4047

merged 1 commit into from
Sep 27, 2023

Conversation

raphaelcoeffic
Copy link
Member

Should fix missing bytes in CRSF telemetry stream.

@raphaelcoeffic raphaelcoeffic added this to the 2.9.1 milestone Sep 15, 2023
@pfeerick
Copy link
Member

pfeerick commented Sep 23, 2023

Was this the real reason you hid the error counter from me? 🤪 Anything particular should look out for other than "yay! ELRS still works!!":tm:?

@raphaelcoeffic
Copy link
Member Author

Anything particular should look out for other than "yay! ELRS still works!!"™️?

Well, you should not have the "telemetry lost / recovered" anymore. You remember that periodic thingy?

@pfeerick
Copy link
Member

Strangely I don't seem to have experienced them AFAICT... maybe because I don't use crazy fast baud rates and packet rates, and expect everything to work perfectly? 🤪

Should fix missing bytes in CRSF telemetry stream.
@pfeerick pfeerick added bug 🪲 Something isn't working telemetry 📶 labels Sep 25, 2023
@raphaelcoeffic
Copy link
Member Author

Strangely I don't seem to have experienced them AFAICT... maybe because I don't use crazy fast baud rates and packet rates, and expect everything to work perfectly? 🤪

It happened for me at anything above 400k. I use "100Hz Full" (might be important somehow). @3djc would you like to comment on it as well?

@pfeerick
Copy link
Member

pfeerick commented Sep 27, 2023

The above 400k baud is probably the important factor, as I hinted at prior ;) PR was rebased so it could be a testing ground for 2.9.1 release, nothing stands out as a problem so far. Have since flight tested this with TX16S for several flights now and no issues. TX12MKII, iFlight Commando8 and TLite all seem fine also. Tried "100hz Full" on X9D+2019 and TX16S with HM ES24TX since you mentioned the PWM mode, and bar a lone warning on changing the mode on the X9D+, it has behaved perfectly.

@pfeerick pfeerick merged commit 5c202da into 2.9 Sep 27, 2023
@pfeerick pfeerick deleted the 2.9-crsf-frames branch September 27, 2023 02:18
@JBKingdon
Copy link

Just wanted to give a shout and say thank you for this work. The telemetry lost/recovered messages always made me nervous and so far things are looking much better.

@msoos
Copy link

msoos commented Oct 15, 2023

Hi All,

This will be a bit long-winded, but stay with me. Basically, this PR seems to have fixed a genuine, full loss of control (not just telemetry loss) bug.

Issue Description

2.9.0 EdgeTX on RadioMaster TX16s, 3.3.0 ELRS FW on both TX+RX, JHEMCU RX24T recevier, BetaFPV 2.4Ghz Micro TX module (non-1W), EU regulations, LBT, 100mw Dynamic, 100Hz 8ch full, 400k baud rate, 25mW telemetry baud rate on the JHEMCU. It 100% reliably disconnects after about 15-20 minutes (reproduced 3x). All systems on PSU power, so full, good power source. If I reboot the RX, no luck, still disconnected. I can reboot multiple times the RX, still disconnected. If I reboot the TX, immediately reconnects. Radio quality is 100% throughout (radio is few meters from receiver). Baud rate is 400k for the (external) TX, Mode CRSF. Internal TX turned off.

If I set the ELRS channel rate to 333Hz 8ch full then I don't disconnect after 40+mins. So that seems to fix it. Thanks to Malte (on rxvideoreviews Discord), I now upgraded to 2.9.1 EdgeTX as he said that that EdgeTX 2.9.0 + LBT + 3.3.0 ELRS may have been a bad combo due to this bug. I upgraded to 2.9.1 EdgeTX (everything else stays the same, i.e. 100Hz 8ch full), and now it's stable.

More Info: another TX (but I can test if you like)

This is almost surely not an RX bug. I started investigating because the exact same behaviour happened in the field, on a plane, in the air, that had a Matek 24-V RX (i.e. combined ELRS RX + CRSF converter to PWM). Same thing, i.e. disconnect mid-flight, I had about 10-20s intermittent control, though, so I managed to land. After landing, zero control, even after RX reboot (multiple times). Immediate re-connect after TX reboot. That was the first time I started using 100Hz 8ch full instead of 333Hz. So this seems to be reproducible on 2 different RX-es (but same TX).

How can I help?

Would you like me to get some debug info? I am happy to build new FW, put it on the TX&RX, reproduce the bug, and post the debug data. I think it'd be good to be able to debug. I can also try to reproduce with another TX and another RX. My hunch is that this is 100% reproducible with different TX&RX. Almost surely another RX, but I think this is a SW bug so likely easy to repro both TX&RX.

Conclusions

This seems to fix a SERIOUS bug of full loss of control at 100Hz 8ch full after 10-20 mins. If indeed you are convinced this is a serious bug fix, should we mark 2.9.1 as an "IMPORTANT FIX" or something? I fear people may not realize what kind of big mess they may be in if they use 100Hz 8ch full.

@pfeerick
Copy link
Member

If you get a chance, instead of rebooting the transmitter, try turning the RF module OFF and ON again (or just pulling the module out and plugging it back in again), to see if it is actually the handset that needed power cycling, or the TX module. You mention the BetaFPV 2.4Ghz Micro TX - does that one still have an OLED screen on it? If so, does it say the link is still connected or not?

@mha1
Copy link
Contributor

mha1 commented Oct 15, 2023

It's a genuine yet very sporadic ELRS or BetaFPV problem. I can reproduce this occasionally, a friend of mine too on a none-ETX/OTX radio. The only thing that brings back the link is power cycling the module (pull it out and plug it in again while the radio is running).

So far this was only reproduced with BetaFPV 500mW and 1W modules in EU_LBT configuration. Both modules are known for not having a TCXO which might contribute to the very sporadic nature of the link loss.

I reported and (hopefully) fixed this in ELRS 3.2. The fix was accidentally reverted in 3.3 and was reintroduced in 3.x.x-maintenance a few days again.

ExpressLRS/ExpressLRS#2028

@pfeerick
Copy link
Member

Both modules are known for not having a TCXO which might contribute to the very sporadic nature of the link loss.

That was what came to mind at first, except for this being oddly more specific to 100hz packet mode. And nasty, didn't see that the LBT collision issue got re-introduced... hopefully it shouldn't be long until the next ELRS 3.3.x release so that can get put back to rest. 🤞

@msoos
Copy link

msoos commented Oct 15, 2023

Thanks all! I am going to downgrade to 2.9.0 EdgeTX, reproduce the bug, then upgrade ELRS to 3.x.x-maintenance on the TX, to see if the now re-fixed bug actually fixes the issue. Just to be 100% sure. I'll let you know how it goes.

PS: (1) Thanks for confirming that I am not completely imagining things, and (2) I was concerned about XO in the JHEMCU, which is very bad (see: https://www.expresslrs.org/hardware/crystal-frequency-error/), but didn't realize the BetaFPV also has missing TCXO -- I will be buying higher quality HW in the future.

@mha1
Copy link
Contributor

mha1 commented Oct 15, 2023

Both modules are known for not having a TCXO which might contribute to the very sporadic nature of the link loss.

That was what came to mind at first, except for this being oddly more specific to 100hz packet mode. And nasty, didn't see that the LBT collision issue got re-introduced... hopefully it shouldn't be long until the next ELRS 3.3.x release so that can get put back to rest. 🤞

It's not specific to 100Hz, we also saw it on 50Hz and on both BetaFPV modules (500mW and 1W). There doesn't seem to be any correlation to which receivers are used. I had it on RM ER6 and BetaFPV nano. Especially the ER6 is a very stable one with TCXO. I did rule out receivers. It's a problem that only showed on BetaFPV modules (unless proven otherwise). There is btw no indication of the link loss on the display and even LUA responds as expected.

I actually had this in flight as I was dumb enough to assume my 2.4GHz polluted bench environment was the culprit. Fortunately I knew what I bargained for and remained cool enough to unplug/plug the module while the plane was still in the air.

@pfeerick
Copy link
Member

pfeerick commented Oct 15, 2023

There is btw no indication of the link loss on the display and even LUA responds as expected.

insert four-letter swear word of choice here

Yeah, as long as either the TX or the RX has a TXCO, drift shouldn't be that much of an issue (since it's more when both the TX AND the RX drifting in opposite directions that things go horribly wrong... don't ask me how I know that 😖). Damn, just when I was thinking the BetaFPV modules were actually pretty decent. Now you have me wondering if my BetaFPV Nano module is going to play up at some point.

@mha1
Copy link
Contributor

mha1 commented Oct 15, 2023

So far I couldn't reproduce this in ISM configuration and there was no report so far from anyone else. I am very confident EU_LBT is a piece of the puzzle that's required for failure. Lucky you - you don't have to fly according to shitty EU regulations

@3djc
Copy link
Collaborator

3djc commented Oct 15, 2023

Flying 100Hz 8ch full on EU_LBT with RM module and receiver. Haven't seen any issue yet, despite lengthy flight times as those are on gliders. Keeping finger crossed

@mha1
Copy link
Contributor

mha1 commented Oct 15, 2023

Hope I didn't make everyone crazy about this. I am flying various models with my Jumper Nano and RM ERx and BetaFPV SuperP (with TCXO!) with no issues. I am sure BetaFPV in EU_LBT configuration have a risk with 3.3.0. 3.3.1 should resolve this again.

Off topic: Jumper AION Nano in EU_LBT configuration with a SuperP onboard a 4m glider at 100mW and HoTT GPS/Vario connected to the SuperP. No issues with pretty good LQI at 900+ meters distance and other pilots next to me (lots of LBT collisions)

Untitled

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🪲 Something isn't working telemetry 📶
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants