
Orange Pi 5+; high power consumption and thermals; load average >= 1 #606

Closed
bbklopfer opened this issue Feb 2, 2024 · 76 comments
Labels
bug (Something isn't working), stale (Issues with a lack of recent activity)

Comments

@bbklopfer

Hi,

Booted ubuntu-22.04.3-preinstalled-server-arm64-orangepi-5-plus on my 5 Plus. Noticed a few things (in comparison to the Orange Pi-issued Debian image):

  • Power consumption increased by around 3-4W (i.e., about 2x the power consumption).
  • Reported temps (sensors) are about 8 °C higher, which qualitatively makes sense given the power consumption
    • I am NOT using a PWM fan, so the cooling system shouldn't be affected
  • Load average never goes below 1.00 on a fresh system at idle. Booting up and waiting 20m I get load average: 1.00, 1.00, 0.80 (with the 15m average slowly creeping up)
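Why a fully idle board can sit at exactly 1.00: on Linux the load average counts tasks that are runnable and also tasks in uninterruptible sleep (D state), so a single task stuck in D state adds a constant 1 even when the CPU is idle. The sketch below is an illustration, not the kernel code: it models the kernel's exponentially damped averages in floating point, assuming a 5-second sampling period and exactly one such task from boot onward (simulate_loadavg is a hypothetical helper).

```python
import math

SAMPLE_INTERVAL_S = 5.0            # assumed kernel sampling period (~5 s)
WINDOWS_S = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute averaging windows


def simulate_loadavg(active_tasks: int, elapsed_s: float) -> tuple:
    """Idealized float model of the 1/5/15-minute load averages, starting
    from 0 at boot, with a constant count of runnable + D-state tasks."""
    loads = [0.0, 0.0, 0.0]
    for _ in range(int(elapsed_s // SAMPLE_INTERVAL_S)):
        for i, window in enumerate(WINDOWS_S):
            decay = math.exp(-SAMPLE_INTERVAL_S / window)
            # Same recurrence as the kernel's calc_load(), minus fixed-point math
            loads[i] = loads[i] * decay + active_tasks * (1.0 - decay)
    return tuple(round(x, 2) for x in loads)


# One task stuck in D state for 20 minutes after boot
print(simulate_loadavg(active_tasks=1, elapsed_s=20 * 60))  # → (1.0, 0.98, 0.74)
```

Under this idealized model the 1-minute value pins at 1.00 while the 15-minute value is still climbing after 20 minutes, which is the pattern described above; real readings also include boot-time activity, so they will not match exactly.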

I am booting from the SD card, eMMC is connected to system and has the Orange Pi Debian image. I tried running the ubuntu-rockchip kernel with the Orange Pi Debian userspace, and I get the same power/thermal/load average results, so it seems (?) like it's a kernel issue.

System has an NVME SSD installed, no wifi card.

Happy to provide any additional info. Thanks!

@JFLim1

JFLim1 commented Feb 3, 2024

I have a similar experience with v1.29 and the current v1.33 Desktop on the OPi5-Plus. The load average stays above 1 even at idle for a significant time (>20 min); I'm not sure whether it would drop below 1 if the system were left idle longer.

After installing Joshua's kernel 5.10.160-28 or 5.10.160-30 on Arch Linux, the same high idle load average (>1) exists. With the vendor's kernel 5.10.110-2 or 5.10.160, the idle load average is much lower.

@EvilOlaf
Contributor

EvilOlaf commented Feb 3, 2024

Kind of funny to see the very same bug over the years across various SoCs (not only Rockchip). And yes, I experience this as well. However, I did not test against vendor images, only against mainline 6.8-rc, which seems fine.

Anyway, just a guess. Does the load disappear when the NVMe is removed and booted from SD or eMMC?

@JFLim1

JFLim1 commented Feb 3, 2024

Does the load disappear when the NVMe is removed and booted from SD or eMMC?

Booting from NVMe or from the SD card gives the same high load average with Joshua's kernel. In both cases the NVMe and the eMMC (256 GB, but empty) were installed and were not removed when booting from the SD card.

Joshua's image is stable, and so far it has been a very good experience on the OPi5-Plus.

@EvilOlaf
Contributor

EvilOlaf commented Feb 4, 2024

Did some tests myself. It doesn't seem to be related to the type of storage used. I tried NVMe, eMMC and SD card in all combinations, either as rootfs or just plainly installed. Load always rises to >=1. Bummer...

@bbklopfer
Author

@EvilOlaf how was your experience with the 6.8-rc kernels, and where did you grab them from (looks like Armbian offers 6.8-rc1, or did you just roll your own)?

My whole reason for wanting to try a new kernel was due to this issue I was experiencing.

@EvilOlaf
Contributor

EvilOlaf commented Feb 5, 2024

Was from Armbian.

@Joshua-Riek
Owner

@EvilOlaf how was your experience with the 6.8-rc kernels, and where did you grab them from (looks like Armbian offers 6.8-rc1, or did you just roll your own)?

My whole reason for wanting to try a new kernel was due to this issue I was experiencing.

Mainline Linux 6.8 has only just gained HDMI support, and not all of the hardware is working, specifically the GPU and VPU. You will not be able to run Jellyfin with hardware acceleration if you plan on going down this road. Even when GPU or VPU support comes into mainline Linux, I would expect there to be many issues, as this is bleeding-edge software.

This forces most users to use the crappy 5.10 Android kernel. I likely will not look into the load average issue, as the kernel is a mess and it's way too much work on a kernel that will likely be dead a year from now.

@bbklopfer
Author

Got it --- thanks for chiming in!

@bbklopfer bbklopfer closed this as not planned Feb 5, 2024
@Joshua-Riek
Owner

I will keep this open but add a won't-fix tag, as it's a valid issue.

@Joshua-Riek Joshua-Riek reopened this Feb 5, 2024
@Joshua-Riek Joshua-Riek self-assigned this Feb 5, 2024
@Joshua-Riek Joshua-Riek added the bug and wontfix labels Feb 5, 2024
@EvilOlaf
Contributor

EvilOlaf commented Feb 6, 2024

@Joshua-Riek just curious: what's your opinion of rkr7.1 (5.10.198, I think?) or the 6.1 BSP? I noticed you played with the former just a bit and abandoned it.

@Joshua-Riek
Owner

I think rkr 7.1 is fine and see no breaking changes; I may bump to this kernel in the future for legacy reasons. As for 6.1, I still do not have the release tag for it. I've started to do some work on the 6.1 kernel from an old snapshot I got back in late October, but I really want a release tag before spending a lot of time on it.

@EvilOlaf
Contributor

EvilOlaf commented Feb 6, 2024

Gotcha.

@nyanmisaka

I think rkr 7.1 is fine and see no breaking changes; I may bump to this kernel in the future for legacy reasons. As for 6.1, I still do not have the release tag for it. I've started to do some work on the 6.1 kernel from an old snapshot I got back in late October, but I really want a release tag before spending a lot of time on it.

As far as I know, JeffyCN's kernel-6.1-2024_01_02 tag is the first release of the 6.1 BSP. Orange Pi also updated their kernel tree not long ago, which also confirms this.

kernel-6.1-2024_01_02

https://github.com/orangepi-xunlong/linux-orangepi/tree/orange-pi-6.1-rk35xx

@Joshua-Riek
Owner

I would still like to see a release tag, but this looks good. I will likely create a fork from this point and start to rebase stuff.

@Joshua-Riek
Owner

I dropped WiFi patches, LCD panel patches, and some changes for the Khadas Edge. Because I went through about 200 patches with a ton of merge conflicts, I could have made a few mistakes. But here is the current progress, should be an OK starting point.

https://github.com/Joshua-Riek/linux-rockchip/commits/rockchip-6.1/

@nyanmisaka

Some non-essential peripherals should have lower priority if they cannot be easily ported to 6.1.

Btw I dropped the r8125 out-of-tree driver. The original one is a bit outdated.

@Joshua-Riek
Owner

Hey @nyanmisaka, do you have gnome wayland working with the 6.1 kernel? I just finished some testing and only X11 would start 🤔

@nyanmisaka

Hey @nyanmisaka, do you have gnome wayland working with the 6.1 kernel? I just finished some testing and only X11 would start 🤔

I haven't tried panfork on the 6.1 kernel. But I know that libmali can provide Wayland support for Gnome on Ubuntu 23.10 mantic.

@EvilOlaf
Contributor

EvilOlaf commented Feb 8, 2024

So it might be worth going the Noble route directly?

@nyanmisaka

The problem may be whether panfork itself is compatible with the updated panfrost kernel mode driver in 6.1 and the new mali csf firmware, rather than the distro version.

@Joshua-Riek
Owner

I just tested Noble and it seems to use llvmpipe sadly, I'll need to try with your 6.1 fork directly with Armbian mantic. Does glmark2 use hw accel in your OS?
[Screenshot from 2024-02-08 06-27-41]

@nyanmisaka

glmark2-wayland requires full OpenGL, but libmali only provides GLES; glmark2-es2-wayland works. The desktop is still accelerated, as the active kworker/u17:1-mali_kbase_csf_sync_upd thread shows. Applications requiring full OpenGL will not be accelerated.

[Screenshot from 2024-02-08 19-55-02]

https://github.com/tsukumijima/libmali-rockchip/releases/tag/v1.9-1-b5d7972

@Joshua-Riek
Owner

I did test panfork and wayland did not work as mentioned before, then crashed a bit later with the below logs, I've not done much debugging yet:

Feb  7 21:27:53 ubuntu-desktop kernel: [   24.302826] mali fb000000.gpu: Loading Mali firmware 0x1010000
Feb  7 21:27:53 ubuntu-desktop kernel: [   24.305300] mali fb000000.gpu: Mali firmware git_sha: ee476db42870778306fa8d559a605a73f13e455c 
Feb  7 21:27:53 ubuntu-desktop kernel: [   24.737056] mali fb000000.gpu: Invalid CPU access to UMM memory for ctx 1227_0
Feb  7 21:31:20 ubuntu-desktop kernel: [  232.566709] mali fb000000.gpu: Invalid CPU access to UMM memory for ctx 1272_1
Feb  7 21:31:21 ubuntu-desktop kernel: [  234.137685] mali fb000000.gpu: Invalid CPU access to UMM memory for ctx 3244_19

@nyanmisaka

Apparently this is the Mali bifrost kernel driver complaining, and panfork doesn't work well with it. You can try downgrading it from g21p0 to g18p0.

https://github.com/JeffyCN/mirrors/commits/kernel-6.1-2024_01_02/drivers/gpu/arm/bifrost

@Calvario

Calvario commented Apr 21, 2024

I got the same issue on Armbian (all kernels).

Using armbian-config, I disabled the "hdmirx_ctrler" (HDMI input) node in the DTS, and the load is now around 0 at idle (CPU temp: 42°C).

Troubleshoot information:

  • In htop, kworker/4:2+events was in the "D" (uninterruptible sleep) state
  • With trace-cmd, I found that "function=pm_runtime_work" was invoked a lot:
    trace-cmd record -e workqueue:workqueue_queue_work
    trace-cmd report > trace.log
    grep -o -e "function=[_a-zA-Z][_a-zA-Z0-9]*" trace.log | sort | uniq -c | sort -rn
  • After checking /proc/interrupts (cat /proc/interrupts), I found a lot of interrupts from "rk_hdmirx-hdmi" on the same core as the kworker.
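For anyone repeating this diagnosis, the last two checks can also be scripted. A minimal Python sketch, assuming the trace-cmd report text contains "function=NAME" fields (as the grep above relies on) and that /proc/interrupts has its usual layout (a CPU header row, then one row per IRQ with the per-CPU counts first); count_work_functions and irq_counts are hypothetical helper names:

```python
import re
from collections import Counter

# Same pattern as the grep above: "function=" followed by a C identifier
FUNC_RE = re.compile(r"function=([_a-zA-Z][_a-zA-Z0-9]*)")


def count_work_functions(trace_report: str) -> list:
    """Count workqueue callbacks in `trace-cmd report` output, most frequent
    first (the equivalent of grep -o | sort | uniq -c | sort -rn)."""
    return Counter(FUNC_RE.findall(trace_report)).most_common()


def irq_counts(interrupts_text: str, name: str) -> dict:
    """Return {irq: [count per CPU]} for /proc/interrupts rows containing `name`."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row lists CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        if name not in line:
            continue
        irq, _, rest = line.partition(":")
        # Per-CPU counters come first, before the chip/hwirq/type/device columns
        result[irq.strip()] = [int(f) for f in rest.split()[:n_cpus]]
    return result
```

For example, calling irq_counts(open("/proc/interrupts").read(), "rk_hdmirx") twice, a few seconds apart, and comparing the per-CPU numbers shows which core the interrupts land on and roughly how fast they arrive.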

@EvilOlaf
Contributor

Confirmed! Oh my god, this is awesome. Load seems now normal. Thank you!
@Joshua-Riek

@Joshua-Riek
Owner

I did notice the Orange Pi 5+ hdmirx spams udev to no end, so this tracks. However, this does raise the question of a proper solution. There is likely a driver problem specific to this board and hdmirx that should be addressed.

@EvilOlaf
Contributor

EvilOlaf commented Apr 21, 2024

Well, the proper solution would be to dive into the driver and fix it, but who would do that?

A workaround, for example, could be to provide a simple dtbo to disable hdmirx,
along with a hint (wiki, download page, maybe even at first login?) telling users how to apply it and why it is necessary.
Bottom line, the majority of users don't even notice or don't care about this issue, so hdmirx should stay enabled by default IMHO.
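If such an overlay is applied, one way to confirm it took effect is to read the node's status property from the live device tree under /proc/device-tree, where a disabled node reads "disabled" and a node without a status property defaults to "okay". A minimal sketch; matching node names on the substring "hdmirx" is an assumption, since the exact node name depends on the board's DTS, and find_node_status is a hypothetical helper:

```python
import os


def find_node_status(dt_root: str, name_fragment: str) -> dict:
    """Walk a device-tree directory (e.g. /proc/device-tree) and return the
    status of every node whose directory name contains `name_fragment`.
    Nodes without a status property are reported as "okay" (the DT default)."""
    results = {}
    for dirpath, dirnames, _files in os.walk(dt_root):
        for d in dirnames:
            if name_fragment not in d:
                continue
            node = os.path.join(dirpath, d)
            status_file = os.path.join(node, "status")
            status = "okay"
            if os.path.isfile(status_file):
                with open(status_file, "rb") as f:
                    # DT string properties are NUL-terminated
                    status = f.read().rstrip(b"\0").decode()
            results[os.path.relpath(node, dt_root)] = status
    return results
```

On a running board this would be called as find_node_status("/proc/device-tree", "hdmirx"); every matching node should report "disabled" once the overlay is active.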

@Joshua-Riek
Owner

I recall seeing that hdmirx was disabled when I first looked at the OPi5+ device tree from Xunlong's kernel tree. This may well be the reason why. I will see if I can make an overlay to disable hdmirx and send a PR.

@nilo85

nilo85 commented Apr 21, 2024

I would suggest a two-stage rocket: first, disable HDMI input by default and add it as a known issue to the wiki, similar to the built-in microphone.

Second stage: address the driver issue separately.

This way a lot of users can use this board for most use cases.

@Joshua-Riek
Owner

I'm worried about disabling HDMI input by default, as users who update their system will no longer be able to use HDMI input, which may cause some confusion. But I'd imagine the number of users who use HDMI input on the OPi5+ is limited.

@nilo85

nilo85 commented Apr 21, 2024

On the bottom line the majority of users don't even notice or don't care about this issue so hdmirx should stay enabled by default IMHO.

I suspect users who don't notice the issue might just not know what to expect and have accepted poor performance as normal.

@Joshua-Riek
Owner

On the bottom line the majority of users don't even notice or don't care about this issue so hdmirx should stay enabled by default IMHO.

I suspect users who don't notice the issue might just not know what to expect and have accepted poor performance as normal.

True, I will gather some information and send a kernel PR to disable HDMI input by default.

@EvilOlaf
Contributor

EvilOlaf commented Apr 21, 2024

Since Joshua's images are about having the best possible support for everything OOB, I assumed leaving it enabled is the best solution here. Might be different for Armbian, but that is a different story.
I mean yes, there is some performance impact, but, as guessed, users don't know or don't care. And for the others, from what I've read, HDMI input itself works; it just has this load average side effect.

@nilo85

nilo85 commented Apr 21, 2024

Since Joshua's images are about having the best possible support for everything OOB, I assumed leaving it enabled is the best solution here. Might be different for Armbian, but that is a different story. I mean yes, there is some performance impact, but, as guessed, users don't know or don't care. And for the others, from what I've read, HDMI input itself works; it just has this load average side effect.

I get that; however, I would argue this issue is severe enough to degrade the whole board to a nearly unusable state. In my SCP example above, with IO and encryption, performance was 5 times worse than with the vendor image. Therefore, I think in this case the better way to provide broad support is to disable this feature while the issue persists.

I will try to confirm on my setup later tonight. I hope the SSD heat is also gone; if it is, that could hint that whatever this interrupt is doing might reset or ripple to other devices too.

@nilo85

nilo85 commented Apr 22, 2024

I finally managed to compile a kernel with this patch and build a working Jammy image.

Load on my two devices at the moment:

 ubuntu@k3s-1:~$ uptime
 09:12:22 up 11 min,  1 user,  load average: 0.00, 0.02, 0.02
 
 ubuntu@k3s-2:~$ uptime
 09:12:25 up 13 min,  1 user,  load average: 0.00, 0.00, 0.00

Processor and SSD are slightly warm but not hot, so I think this is a great success! Thanks everyone for jumping in =)

@JFLim1

JFLim1 commented Apr 24, 2024

Hi @Joshua-Riek,

Upgraded to kernel-5.10.16-36 on Ubuntu 22.04.4. The CPU load average is now able to go below 1, currently at 0.18.

Edit: Same with kernel-6.1.0.1009.9, CPU Load Avg can now drop below 1 on Opi5-Plus. Thank you.

@abasu0713

abasu0713 commented May 11, 2024

@Joshua-Riek is there a prebuilt image with a 6.1.* kernel for Ubuntu Server image for OPi 3B with this fix? Thank you for your time and the amazing work here! :)

@Joshua-Riek
Owner

The Orange Pi 3B does not have HDMIRX, that is another issue entirely.


This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 14 days. Thank you for your contribution!

@github-actions github-actions bot added the stale label Nov 29, 2024
@Joshua-Riek Joshua-Riek removed their assignment Nov 29, 2024
@github-actions github-actions bot removed the stale label Nov 29, 2024

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 14 days. Thank you for your contribution!

@github-actions github-actions bot added the stale label Jan 20, 2025

This issue has been closed due to inactivity. If you feel this is in error, please reopen the issue or file a new issue with the relevant details.

@github-actions github-actions bot closed this as not planned Feb 17, 2025