"vc4-drm gpu: [drm] ERROR Failed to allocate DLIST entry: -28" after running for some period #5674
Comments
For anyone watching this issue, #5684 addresses a couple of issues but I haven't been able to reproduce exhausting the DLIST memory. If anyone wants to test, then |
Same for me. See below: Oct 27 00:29:54 kernel: vc4-drm gpu: [drm] ERROR Failed to allocate DLIST entry: -28 |
Thanks for testing. PR updated to add a debugfs entry to print out the allocations. You should be able to do
0x20 is the scaling kernel. Entries that are 0x11 bytes long have a primary and a cursor plane, whilst those 0x9 bytes long only have a primary plane. This can also be confirmed via Knowing what it thinks are allocated vs active when it goes wrong would be very useful. |
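The size breakdown above can be kept as a small lookup when reading an allocation dump. This is only a sketch in Python: the helper name `describe_dlist_alloc` is hypothetical, the mapping covers just the three sizes mentioned in the comment, and it makes no assumption about the actual debugfs output format, which is not shown here.

```python
# Hypothetical helper: label a DLIST allocation by its size, using only the
# sizes described in the comment above (0x20, 0x11 and 0x9).
KNOWN_SIZES = {
    0x20: "scaling kernel",
    0x11: "primary plane + cursor plane",
    0x09: "primary plane only",
}


def describe_dlist_alloc(size):
    """Return a human-readable label for an allocation size."""
    return KNOWN_SIZES.get(size, f"unknown layout (size {size:#x})")


if __name__ == "__main__":
    for size in (0x20, 0x11, 0x9, 0x8):
        print(f"{size:#04x}: {describe_dlist_alloc(size)}")
```

Sizes outside the three listed ones are reported as unknown rather than guessed at.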
Sorry, I think I wrote something misleading. I did not mean that I had the issue after applying your patch. The kernel dump I provided is from before applying it. |
OK, you had me quite confused that we still appeared to have a leak. Having the debugfs hook to dump out the allocations doesn't hurt anyway. |
I am the one that initially reported the issue on Ubuntu 23.10 in the forum thread. Since there's no easy way of applying the fix in Ubuntu (right?), I have applied it to a Raspberry Pi OS installation instead and so far so good. However, I don't think I've been able to reproduce the issue on RPi OS before the fix, so the fact that it works may not mean that much. I'll continue using the fix and let you know if I run into any issues. |
I have a guy on our forums who keeps having this issue, so I compiled the 6.1.58 kernel with your commits using commit 36ed2f9. He only had time for a short test, but he appears to still be having the issue. He is using Gnome with Wayland. I do not seem to have the issue here on XFCE, and unlike him I get no related output in journalctl while playing. My terminal output in XFCE:
Him playing with mpv compiled with jc-kynesim's ffmpeg6:
Requested info on his pi4:
|
@Dark-Sky That log doesn't include the
|
I've been seeing that a lot on my Pi 4 as well. I believe on both RPi OS Bookworm and Ubuntu 23.10. Haven't made a report on it yet. |
I had questioned him about that earlier, and he said he was getting the same error when he managed to open a TTY. He later said that most of the time his system would just lock up. |
Our forum member had some more time to test today and reported this:
|
So does that mean it is working? If not, what kernel version are they actually running? And what client software? Some of the reports providing the WARN backtrace including I'm wondering if this is on some app being killed, hence cleanup coming in and freeing all resources in one hit, thereby generating many commits. Even so, if not directly driving DRM to add lots of planes, then I wouldn't expect there to be lots of active FBs needing actually removing, but the accounting may not be that finely grained. |
I asked again today, and this was his response while still testing with the kernel I compiled with the commits you made above: I have enabled a maximum of gnome-shell extension and apart from vc4-drm gpu: swiotlb buffer is full, I didn’t have any crash. But I think it takes a few days to be sure. Seems to me it should have locked up by now. |
That sounds positive. I've updated the PR, and will look to get it merged now. |
Any news on this? |
It was merged on Nov 6th. |
rpi-update should now include this fix. |
Here, the issue persists after
|
What does |
|
You're not running the rpi-update kernel, so you won't have the fix. |
Are you on Ubuntu? rpi-update is designed for RPiOS. If you are on another OS, you'll have to check with their developers on how to get testing kernel updates. |
I got this just now on a RPi3B+ with Arch Linux ARM. The used kernel is The problem occurred after running Kodi for a couple of days. dmesg:
kodi.log:
|
Yes, I'm on Ubuntu. What should I do? |
You'll need to report it to Ubuntu devs, and see if the kernel you are running includes #5684 |
Hi, same problem here: CM3+ [300517.627445] vc4-drm soc:gpu: [drm] ERROR Failed to allocate DLIST entry. Requested size=8. ret=-28 |
Same problem, using the latest kernel with 5684, on Gnome-Wayland. Here when playing a video: Sometimes I have no TTY available and need to reboot with a Ctrl+Alt key combination. I don't have another PC to SSH in from and run sudo cat /sys/kernel/debug/dri/1/hvs_dlist_allocs when the crash occurs. (Edit: I have configured my smartphone for SSH and am waiting for another crash now.) It seems the crash only happens with Wayland. One week with Gnome on Xorg and no crash. |
Had the issue again a couple of times, here the information from the debugfs: /sys/kernel/debug/dri/128/hvs_dlist_allocs
/sys/kernel/debug/dri/128/hvs_dlists
|
Hm, I still get a lot of |
|
These errors occur while displaying a QML file executed with qmlscene and fullscreen EGLFS that shows a few MJPEG CCTV streams with gstreamer. Kernel is 6.1.63. |
If this is the only output from |
If you have a way of reproducing, please give details - I'm still at a loss. The DISPCTRL value in the above logs is weird if it's from a Pi4/CM4. None of the EOF interrupts are enabled (bits 7, 11, and 15), yet we have entries in the stale queue. That shouldn't be possible as they get enabled when any entries get marked as stale via |
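The check described above (are any of the EOF interrupt enables set?) can be sketched as a small bit test. This is a Python illustration, not driver code: the function names are hypothetical, the bit positions 7, 11 and 15 are taken from the comment, and the sample value is the DISPCTRL reported in a Pi 4B log elsewhere in this thread.

```python
# Bit positions taken from the comment above: the EOF interrupt enables are
# described as bits 7, 11 and 15 of DISPCTRL. Names here are illustrative.
EOF_ENABLE_BITS = (7, 11, 15)


def eof_enables(dispctrl):
    """Map each EOF enable bit position to whether it is set."""
    return {bit: bool(dispctrl & (1 << bit)) for bit in EOF_ENABLE_BITS}


def any_eof_enabled(dispctrl):
    """True if at least one EOF interrupt enable bit is set."""
    return any(eof_enables(dispctrl).values())


if __name__ == "__main__":
    value = 0xAA0C020E  # DISPCTRL value quoted in a log in this thread
    print(eof_enables(value))
    print("any EOF enabled:", any_eof_enabled(value))
```

For that sample value none of the three bits are set, which matches the "stale entries but no EOF interrupt enabled" situation being described.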
Sorry, this was on a RPi3 in my case: There is not much more I can tell. The app is displaying the images. It also has a user interface but nobody has touched them. I can provide you the QML file but it is simply displaying MJPEG streams that you cannot access. In the case above it only took a few minutes to trigger the bug. |
Ah one thing, I switched back to an old kernel (5.15.84) for now but everything else is the same and in this case the bug does not occur. |
I can synthesize MJPEG streams if needed. Currently I have no way to reproduce.
Earlier kernels had an issue where DLIST memory could get reused whilst it was still in use, hence 013f247 was introduced. |
I sent you the email. |
Linux testTerminal 6.1.74-v8+ #1725 SMP PREEMPT Mon Jan 22 13:35:32 GMT 2024 aarch64 GNU/Linux Pi 4B Jan 24 12:57:46 testTerminal kernel: vc4-drm gpu: [drm] ERROR Failed to allocate DLIST entry. Requested size=17. ret=-28. DISPCTRL is aa0c020e |
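For anyone collecting these reports from the journal, the error line can be picked apart mechanically. The following Python sketch is an assumption-laden helper, not part of any tool: `parse_dlist_error` is a hypothetical name, the regular expression is written against the two message variants quoted in this thread (with and without the DISPCTRL suffix), and the return code is treated as a negated Linux errno, which is the usual kernel convention.

```python
import errno
import re

# Pattern written against the message variants quoted in this thread.
LOG_RE = re.compile(
    r"Failed to allocate DLIST entry\. Requested size=(?P<size>\d+)\. "
    r"ret=(?P<ret>-?\d+)(?:\. DISPCTRL is (?P<dispctrl>[0-9a-fA-F]+))?"
)


def parse_dlist_error(line):
    """Extract size, return code and DISPCTRL (if present) from a log line."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    ret = int(match.group("ret"))
    dispctrl = match.group("dispctrl")
    return {
        "size": int(match.group("size")),
        "ret": ret,
        # Kernel return codes are negated errno values, so -28 maps to 28.
        "errno_name": errno.errorcode.get(-ret, "unknown"),
        "dispctrl": int(dispctrl, 16) if dispctrl else None,
    }


if __name__ == "__main__":
    line = (
        "vc4-drm gpu: [drm] ERROR Failed to allocate DLIST entry. "
        "Requested size=17. ret=-28. DISPCTRL is aa0c020e"
    )
    print(parse_dlist_error(line))
```

On Linux, errno 28 is `ENOSPC`, so the -28 in these messages means the DLIST allocator reported that it had no space left.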
Thanks @anyc - I'll try running that up. And thanks also to @WilkuAgresor. Was that a single log entry, or was it repeating? The system shouldn't lock up at that point, but I want to understand why it is triggering at all. |
It repeated once before my script rebooted the pi. I'll disable the workaround and report when I catch it again. |
I think we may have a read-modify-write issue between setting the DSPEISLUR and EOF enables - I'm seeing both bits 7 (0x80) and 9 (0x200) being enabled and disabled in SCALER_DISPCTRL via /sys/kernel/debug/dri/1/hvs_regs over updates, and they're done from different contexts. I've created #5891 as a test which disables the underrun interrupt, so it should remove the race condition. It's a hack, but it would be useful if people could test to see if it does solve their issues. Once CI has completed, |
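The suspected read-modify-write problem can be illustrated with a deterministic model. This is a Python sketch of the general lost-update pattern, not the driver's code: the function names are hypothetical, and only the two mask values (0x80 and 0x200) come from the comment above.

```python
EOF_ENABLE = 0x80        # bit 7, as quoted above
UNDERRUN_ENABLE = 0x200  # bit 9, as quoted above


def interleaved_updates(initial):
    """Two contexts update the same register without serialization."""
    a_read = initial                  # context A reads the register
    b_read = initial                  # context B reads before A writes back
    reg = a_read | EOF_ENABLE         # A writes: EOF enable set
    reg = b_read & ~UNDERRUN_ENABLE   # B writes its stale copy: A's bit is lost
    return reg


def serialized_updates(initial):
    """The same two updates, each one reading the latest register value."""
    reg = initial
    reg = reg | EOF_ENABLE
    reg = reg & ~UNDERRUN_ENABLE
    return reg


if __name__ == "__main__":
    start = UNDERRUN_ENABLE
    print(f"interleaved: {interleaved_updates(start):#x}")
    print(f"serialized:  {serialized_updates(start):#x}")
```

In the interleaved case the EOF enable written by the first context is overwritten by the second context's stale value, which is one way stale entries could end up queued with no EOF interrupt enabled to clear them.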
The builds are done - |
Right now uptime is 5 days after the 'sudo rpi-update pulls/5891' and a reboot. |
Thanks for testing - I was beginning to think no one had tested it. I'll have a think about a clean way of implementing both underrun and EOF interrupt handling. It's tempting to just enable EOF permanently, but it adds some overhead triggering interrupts at 60Hz for each HVS channel. |
I can also confirm that it works for me for over one day now. |
Hello, it has been running for 7 days on a CM3+, and it works. Thanks |
Thanks @WilkuAgresor, @TRyan84 and @anyc. I don't need any other confirmations that it works now. |
Can someone please tell me which kernel version has the relevant fixes for this issue? I'm running Debian 12 Bookworm (arm64) on a RPi4 (8GB). |
I've dropped underrun detection via #5935 which was merged 7:52PM Fri 8th Feb. AIUI using |
Hello, I updated the system with sudo rpi-update rpi-6.6.y. Now I get this error: And the display is kind of fuzzy... Thanks. |
What DSI display are you using? Presumably the Pi 7" DSI panel as it's loading Which kernel version were you on previously? |
Hi 6by9, thanks for your answer. Yes, I am using the Pi 7" DSI panel. The kernel version before was 6.1.63-v7, and I never saw this behavior. After a few resets, the error is gone. I have 10 of them in my office, and after a power cycle 2 of them have the same error. Thanks. Greetings |
Hi 6by9, sorry, I think my setup is the problem... Greetings |
Describe the bug
Noted from #5649, https://forums.raspberrypi.com/viewtopic.php?t=357826, and https://forums.raspberrypi.com/viewtopic.php?t=358177
It looks like we have a memory leak of dlist entries somewhere.
Steps to reproduce the behaviour
See the other threads
Device(s)
Raspberry Pi 4 Mod. B
System
Bookworm or Ubuntu 23.10
Logs
No response
Additional context
No response