
[Config Support]: Frigate V0.12 crashes and freezes the complete server! #6477

Closed
HAuser1234 opened this issue May 13, 2023 · 69 comments

@HAuser1234

HAuser1234 commented May 13, 2023

Describe the problem you are having

Frigate v0.12 crashes the whole server every 1-2 days!
I have not been able to record anything, because the Home Assistant server dies completely.

No memory or RAM problems can be observed right before the crash. The only pattern is that the crashes seem to occur in the early morning every time, but maybe this is just bad luck!

The PC has an Intel Core i5 CPU with 8 GB of RAM.

Version

v0.12

Frigate config file

mqtt:
  # Required: host name
  host: xxx
  user: xxx
  password: xxx
  # Optional: port (default: shown below)
  port: 1883


detectors:
  cpu1:
    type: cpu
    num_threads: 1
  cpu2:
    type: cpu
    num_threads: 1
  cpu3:
    type: cpu
    num_threads: 1

birdseye:
  enabled: True
  mode: continuous
  quality: 3
live:
  # Optional: Set the height of the live stream. (default: 720)
  # This must be less than or equal to the height of the detect stream. Lower resolutions
  # reduce bandwidth required for viewing the live stream. Width is computed to match known aspect ratio.
  height: 1080
  # Optional: Set the encode quality of the live stream (default: shown below)
  # 1 is the highest quality, and 31 is the lowest. Lower quality feeds utilize less CPU resources.
  quality: 2



record:
  enabled: True
  expire_interval: 1
  # Optional: Retention settings for recording
  retain:
    # Optional: Number of days to retain recordings regardless of events (default: shown below)
    # NOTE: This should be set to 0 and retention should be defined in events section below
    #       if you only want to retain recordings of events.
    days: 0
    # Optional: Mode for retention. Available options are: all, motion, and active_objects
    #   all - save all recording segments regardless of activity
    #   motion - save all recordings segments with any detected motion
    #   active_objects - save all recording segments with active/moving objects
    # NOTE: this mode only applies when the days setting above is greater than 0
    mode: all
  events:
    pre_capture: 20 # Optional: Number of seconds before the event to include (default: shown below)
    # Optional: Number of seconds after the event to include (default: shown below)
    post_capture: 10
    # Optional: Objects to save recordings for. (default: all tracked objects)
    objects:
      - person
      - car
    # Optional: Retention settings for recordings of events
    retain:
      # Required: Default retention days (default: shown below)
      default: 5
      # Optional: Mode for retention. (default: shown below)
      #   all - save all recording segments for events regardless of activity
      #   motion - save all recordings segments for events with any detected motion
      #   active_objects - save all recording segments for event with active/moving objects
      mode: active_objects
      # Optional: Per object retention days
      objects:
        person: 5
        car: 1


snapshots:
  # Optional: Enable writing jpg snapshot to /media/frigate/clips (default: shown below)
  # This value can be set via MQTT and will be updated in startup based on retained value
  enabled: True
  # Optional: save a clean PNG copy of the snapshot image (default: shown below)
  clean_copy: True
  # Optional: draw bounding box on the snapshots (default: shown below)
  bounding_box: True
  # Optional: crop the snapshot (default: shown below)
  crop: False
  # Optional: Restrict snapshots to objects that entered any of the listed zones (default: no required zones)
  required_zones: []
  # Optional: Camera override for retention settings (default: global values)
  retain:
    # Required: Default retention days (default: shown below)
    default: 5
    # Optional: Per object retention days
    objects:
      person: 5
      car: 1


cameras:
  Cam0:
    rtmp:
      enabled: False
    ffmpeg:
      inputs:
        - path: rtsp://???/stream2 # <-- 102 substream
          roles:
            - detect
        - path: rtsp://???/stream1 # <-- 101 mainstream
          roles:
            - record
    detect:
      width: 1920
      height: 1080
      fps: 5
    objects:
      track:
        - person
      filters:
        person:
          min_area: 10500
          max_area: 130000
          max_ratio: 0.58
          min_ratio: 0.2
    zones:  
      frigate_Sitzplatz:
        coordinates: 1134,1080,1920,1080,1920,292,1729,265,1712,0,1388,0,1387,280,1458,281,1505,335,1501,407,1494,478,1383,476,1299,466,1308,334,1363,286,1369,0,1122,0
      frigate_Sudseite:
        coordinates: 1168,1080,484,1080,412,0,1141,0
      frigate_Westseite:
        coordinates: 0,1080,504,1080,414,42,0,0
    motion:
      mask: 
        - 707,53,707,0,0,0,0,54
        - 1320,268,1432,267,1531,278,1574,314,1564,445,1489,515,1405,515,1266,487,1282,286
        - 1228,350,1238,491,1053,548,1003,373


  Cam1:
    rtmp:
      enabled: False
    ffmpeg:
      inputs:
        - path: rtsp://???/stream2 # <-- 102 substream
          roles:
            - detect
        - path: rtsp://???/stream1 # <-- 101 mainstream
          roles:
            - record
    detect:
      width: 1920
      height: 1080
      fps: 5
    motion:
      mask: 
        - 706,0,710,84,0,113,0,0

  Cam2:
    rtmp:
      enabled: False
    ffmpeg:
      inputs:
        - path: rtsp://??? # <-- 102 substream
          roles:
            - detect
        - path: rtsp://??? # <-- 101 mainstream
          roles:
            - record
    detect:
      width: 2592
      height: 1944
      fps: 5
    motion:
      mask:
        - 2554,1846,2560,1899,1904,1897,1904,1829
        - 0,392,156,375,279,350,386,320,576,269,595,220,605,41,0,90
        - 1392,0,1398,168,915,164,900,0
        - 2445,51,2426,264,2123,207,2057,0
        - 1963,105,1955,318,1405,292,1403,79
    objects:
      track:
        - person
        - car
    zones:
      frigate_Eingang_vorne:
        coordinates: 0,439,228,352,328,369,443,360,563,341,428,533,0,799,0,795
        objects:
          - car
          - person
      frigate_Garage_Abfahrt_vorne:
        coordinates: 546,362,0,833,0,1420,0,1944,222,1944,929,347
        objects:
          - car
          - person
      frigate_Garten_vorne:
        coordinates: 249,1417,2571,1944,2592,786,2560,616,2407,446,1985,379,848,339
        objects:
          - car
          - person
      frigate_Strasse_vorne:
        coordinates: 0,388,192,345,397,301,625,262,980,224,1522,241,2373,328,2592,390,2592,0,0,0
        objects:
          - car
          - person
        stationary:
          max_frames:
            objects:
              car: 1000





ffmpeg:
  hwaccel_args: -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format yuv420p
  output_args:
    record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac

Relevant log output

none available

Frigate stats

No response

Operating system

Debian with Homeassistant Supervised / Docker

Install method

HassOS Addon

Coral version

CPU (no coral)

Any other information that may be helpful

v0.11 works with no problems.

@blakeblackshear
Owner

Can you run without your hwaccel args to see if it still crashes?

@HAuser1234
Author

Thank you!
Yes, I will try removing that as a test next week! Will that increase the CPU load massively?

@blakeblackshear
Owner

It probably will increase significantly. You could try setting enabled: False under some cameras to temporarily disable some if your server can't handle it.
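As a rough sketch of that test, the fragment below comments out the hwaccel_args from the config posted in this issue and temporarily disables one camera with the camera-level `enabled` option. Picking Cam2 (the 2592x1944 camera) is an arbitrary choice for illustration. All other settings stay as in the full config above.

```yaml
ffmpeg:
  # hwaccel_args temporarily commented out so ffmpeg decodes on the CPU only
  # hwaccel_args: -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format yuv420p
  output_args:
    record: -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c:v copy -c:a aac

cameras:
  Cam2:
    # Temporarily disabled to reduce CPU load during the test
    # (the rest of the Cam2 settings stay unchanged)
    enabled: False
```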

@HAuser1234
Author

OK, good to know. I will report back when I have the results. :)
Strange though, as hwaccel worked without problems in v0.11.

@blakeblackshear
Owner

There is an updated driver in 0.12, so I would like to see if we can eliminate that as the source of the problem.

@monsieurlatte

Going to add that I recently migrated my HA install from my Windows box (VMware) to a Dell SFF with an i5-6500T. I enabled hwaccel to bring the CPU usage down a bit, and it has hard-locked the machine on me twice over the past 4 days. After some googling I found this issue, so I will turn off hwaccel and see if that fixes mine as well. Just wanted to chime in that you're not the only one with this issue!

@NickM-27
Collaborator

@monsieurlatte if it does not exhibit this behavior without hwaccel, you may try updating the host driver and host kernel as that has fixed this for some users

@monsieurlatte

Thanks for the info! I'm just now getting into some of this Linux/Docker stuff, so I'm a bit of a noob at it. If you know of a link that gives step-by-step instructions on how to do that, I'd be grateful and would happily send you a coffee :D

@HAuser1234
Author

HAuser1234 commented May 16, 2023

I have now tested it for some time without hwaccel, and no crash has happened since then (so that seems to be the issue).
(If it crashes in the next few days, I will report back!)
Interestingly, the CPU usage didn't increase at all compared to v0.11 or v0.12! The RAM usage also seems to climb much more slowly until it reaches a maximum.
It seems like hwaccel didn't work from the beginning! Is there an alternative driver available?
@NickM-27 it sounds like I have the issue you described. Do you have instructions on how to do that?

@NickM-27
Collaborator

It has to be done on the host, so it depends on what host you have. #5799 has the related discussion.

@monsieurlatte

That's going to be tough for me, since I use HAOS and Frigate runs as an add-on there. I think I'll just not use hwaccel for now; the CPU runs at about 65% on my i5 6600T.

@NickM-27
Collaborator

If you are running the latest HA OS, it should not be a problem; I've not heard of HA OS users having this issue.

@monsieurlatte

Well, I'm keeping an eye on it, and so far it's been over 24 hours with no hard lockup yet. I was using the Intel QSV version of hwaccel rather than VAAPI, if that matters.

@NickM-27
Collaborator

I'd definitely recommend using vaapi not qsv
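Since 0.12, Frigate lets you pick hardware acceleration with a preset instead of raw ffmpeg arguments. The sketch below assumes an Intel iGPU and H.264 camera streams:

```yaml
ffmpeg:
  # VAAPI preset, the option recommended here for Intel iGPUs
  hwaccel_args: preset-vaapi
  # QSV counterpart for H.264 streams, shown only for comparison:
  # hwaccel_args: preset-intel-qsv-h264
```

Only one `hwaccel_args` value can be active at a time. The commented line is there so the two presets can be compared side by side.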

@monsieurlatte

I got better results for cpu usage with the qsv, but I'll give vaapi a go and see what happens!

@monsieurlatte

vainfo output (return_code 0, empty stderr):

vainfo: VA-API version: 1.17 (libva 2.10.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.1 ()
vainfo: Supported profile and entrypoints
      VAProfileNone: VAEntrypointVideoProc
      VAProfileNone: VAEntrypointStats
      VAProfileMPEG2Simple: VAEntrypointVLD
      VAProfileMPEG2Simple: VAEntrypointEncSlice
      VAProfileMPEG2Main: VAEntrypointVLD
      VAProfileMPEG2Main: VAEntrypointEncSlice
      VAProfileH264Main: VAEntrypointVLD
      VAProfileH264Main: VAEntrypointEncSlice
      VAProfileH264Main: VAEntrypointFEI
      VAProfileH264Main: VAEntrypointEncSliceLP
      VAProfileH264High: VAEntrypointVLD
      VAProfileH264High: VAEntrypointEncSlice
      VAProfileH264High: VAEntrypointFEI
      VAProfileH264High: VAEntrypointEncSliceLP
      VAProfileVC1Simple: VAEntrypointVLD
      VAProfileVC1Main: VAEntrypointVLD
      VAProfileVC1Advanced: VAEntrypointVLD
      VAProfileJPEGBaseline: VAEntrypointVLD
      VAProfileJPEGBaseline: VAEntrypointEncPicture
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointFEI
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileVP8Version0_3: VAEntrypointVLD
      VAProfileVP8Version0_3: VAEntrypointEncSlice
      VAProfileHEVCMain: VAEntrypointVLD
      VAProfileHEVCMain: VAEntrypointEncSlice
      VAProfileHEVCMain: VAEntrypointFEI

@NickM-27
Collaborator

NickM-27 commented May 16, 2023

They use the same hardware, and QSV is a wrapper on VAAPI. In my experience they have always performed identically or very similarly.

@xbmcnut

xbmcnut commented May 16, 2023

If you are running the latest HA OS it should not be a problem, I've not heard of HA OS users having this issue

I'd have to disagree with that; see #6485.

@NickM-27
Collaborator

Fair enough, forgot about that. That's the only case I've seen that HA OS has been associated with this type of issue.

@yannpub

yannpub commented May 30, 2023

I have the same hardware (i5-6500T) and the same library and driver versions (1.17 and 23.1.1), and I am also experiencing freezes. I use OpenVINO detectors.
Following the comments above, I turned hwaccel (vaapi) off. I was expecting a spike in CPU usage, but it is nearly invisible, which is very surprising.
Over the coming days/weeks I will check whether the freezes still occur.

@iMiMx

iMiMx commented May 30, 2023

Adding myself to this (and the various other posts) running on a Gigabyte Brix, i7-3537U, Debian 11, kernel 5.10.0-23-amd64, Supervised install.

I disabled (commented out) hwaccel in the Frigate config this morning. The last line of the log before the hard crash/lockup was the 'api/stats' line.

@monsieurlatte

I have found that VAAPI is stable for hwaccel on my i5 6500t and i5-8500t, while QSV is not. I also can't use OpenVINO detectors, as they cause the same lockups (assuming they use the same driver model); it works with a TPU, however. Not sure if anyone else noticed, but the docs clearly state that QSV requires Intel gen 10 or newer, so maybe that's the reason?

@NickM-27
Collaborator

In general vaapi is recommended over qsv for any generation. Some older CPUs do support qsv but it provides no benefit over vaapi

@monsieurlatte

I saw no difference in hwaccel between QSV and VAAPI. However, it appears OpenVINO uses the same driver as QSV or something similar, because with VAAPI and OpenVINO I got the same hard lockups. The TPU is fine.

@yannpub

yannpub commented Jun 9, 2023

So, after a couple of weeks running without any VAAPI acceleration on my 6th-gen i5-6500T on a supervised install, I have not faced any freezes. Unless it is a surprising coincidence, I would say hwaccel was the problem.
Now, for the supervised install, there will be a big update in the coming days, with Debian Bookworm becoming available as the new platform. Once migrated, I'll try acceleration again to see whether the new kernel and drivers fix the issue.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ithesk

ithesk commented Oct 5, 2023

I have the same problem; it happens to us after 3 days. Setup: i5-4590T, Proxmox, LXC, hardware acceleration, Coral M.2 mini PCIe.

@markuznw

markuznw commented Oct 5, 2023

Seems to be fixed in my case by upgrading machine libva to 2.19 (from 2.18) and setting max memory in the docker container
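On a Docker Compose install, the memory cap mentioned in this comment can be set with Compose's `mem_limit` key. In the sketch below, the service name and the 4g value are hypothetical and chosen only for illustration. They are not taken from this comment.

```yaml
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    # Hypothetical cap: the container may use at most 4 GB of RAM
    mem_limit: 4g
    devices:
      # Intel iGPU render node used by the VAAPI hwaccel args
      - /dev/dri/renderD128
```

With a limit in place, runaway memory use is confined to the Frigate container's cgroup and cannot consume all of the host's RAM.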

@ithesk

ithesk commented Oct 5, 2023

Seems to be fixed in my case by upgrading machine libva to 2.19 (from 2.18) and setting max memory in the docker container

Sorry, how did you manage to update, and what was the value you increased in docker?

@carloda

carloda commented Nov 10, 2023

@ithesk Identical machine here, and it crashes after exactly 3 days. I'll now try deactivating "hwaccel_args" and see... The whole machine actually crashed for me, and I had to restart it with the power button. Same thing for you?

@homeassistant7

Also having similar issues with my i5-6500T on 0.12. I'm running Frigate with Docker in a Linux VM on Proxmox. I've moved Home Assistant off the machine, so it's just Frigate, and I'm still having hard machine lockups every other day.
I have tried updating libva and restricting RAM on the Docker container. I also tried 0.13 beta 4 for a newer go2rtc, and it's no better.

Considering ditching proxmox to try without that.

@NickM-27
Collaborator

I will say there is a trend that these types of issues have occurred for proxmox users (though not exclusively) and some users have found without proxmox it did not occur.

@homeassistant7

Thanks Nick, maybe I can be a test dummy.

What are your thoughts on ESXi? I'd like to run HASSOS and Frigate on the same hardware. Or am I better off ditching that virtualisation layer and running HA in a Docker container too?

@NickM-27
Collaborator

I can only speak from personal experience, which is that it's simpler and easier to run everything in Docker. ESXi has had some frustrating limitations for other users.

@ithesk

ithesk commented Nov 11, 2023

I think increasing this value has improved the problem a lot: it has now been more than 15 days without it. [screenshot]

@carloda

carloda commented Nov 11, 2023

shm_size?

@ithesk

ithesk commented Nov 11, 2023

Yes

@carloda

carloda commented Nov 11, 2023

Thanks, I set it to 1024 MB; let's see if anything changes...
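On a Docker Compose install, the shared memory size discussed here is set per container with `shm_size`. The sketch below uses the 1024 MB value from this comment; the service name is hypothetical.

```yaml
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    # Size of /dev/shm inside the container (Docker's default is 64 MB)
    shm_size: "1024mb"
```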

@carloda

carloda commented Nov 25, 2023

Guys, after 15 days of testing I have concluded that the only way to keep Frigate from crashing Proxmox is to not use the GPU. I don't like this, though, as the whole load then falls on the CPU. Has anyone found a solution?

@sodennis

sodennis commented Nov 25, 2023

My Unraid server is using a J6413 and a PCI Coral. Frigate is running in Docker with recording enabled and detectors enabled.

Two weeks ago, when I started running Frigate, it would crash daily due to OOM. After I turned off the GPU the OOM crashes stopped, but I still wanted hardware acceleration, so I started tweaking the server a couple of days ago.

  • Enabled the iHD driver instead of the i965 driver in Frigate
  • Switched to using the qsv preset instead of the vaapi preset
  • Changed the Disk Cache 'vm.dirty_background_ratio' (%) and the Disk Cache 'vm.dirty_ratio' (%) to 1 and 2.
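This comment concerns Unraid's template UI. On a plain Docker Compose setup, the driver switch in the first bullet is typically made with libva's `LIBVA_DRIVER_NAME` environment variable. The sketch below assumes that approach; the service name is hypothetical.

```yaml
services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    environment:
      # Ask libva to load the Intel media driver (iHD) instead of i965
      LIBVA_DRIVER_NAME: iHD
    devices:
      - /dev/dri/renderD128
```

The second bullet would then correspond to `hwaccel_args: preset-intel-qsv-h264` in the Frigate config, assuming H.264 streams. The `vm.dirty_*` values from the third bullet are host kernel settings and are not set from inside the container.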

Memory usage is now stable, and it hasn't crashed for 5 days. The last setting I changed was the qsv preset.

I think there is a memory leak with ffmpeg or the Intel media drivers but I haven't had the time to open up Valgrind and ffmpeg to reproduce the issue.

I also had a 6600k running in another server but I couldn't figure out a setting besides turning off the hardware acceleration to prevent OOM.

@iMiMx

iMiMx commented Dec 5, 2023

Is anyone running a supervised install that has been upgraded to Debian 12? If so, any improvement/change?

EDIT: Updated to Debian 12 this morning, also installed the latest libva2 from Debian testing on the host:

vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: Intel i965 driver for Intel(R) Ivybridge Mobile - 2.4.1

.... however obviously the Frigate add-on uses its own version/package, so I'm a little sceptical that this will make any difference:

vainfo: VA-API version: 1.17 (libva 2.10.0) 
vainfo: Driver version: Intel i965 driver for Intel(R) Ivybridge Mobile - 2.4.1

Re-enabled hw_accel, now we wait....

ffmpeg:
  hwaccel_args: preset-vaapi

Is it possible to increase SHM_SIZE on the HA Frigate add-on yet? I seem to recall previously, at one point, it wasn't.

@peterjonesk

peterjonesk commented Dec 11, 2023

I have this issue and am just adding another data point in case it's helpful. I'm using a Dell 7050 Micro with an i5-6500T CPU, the OpenVINO detector, and VAAPI hardware acceleration.

Initially I had HAOS installed on bare metal with Frigate add-on and was seeing a complete host reboot at least once every 24 hours. Switched to proxmox with Frigate docker in an LXC (Debian 12). Continued seeing the issue (except that the host would completely hang and require a power cycle). I am new to docker and proxmox, so may have not been looking in the right place, but couldn't find anything relevant in logs (in frigate, proxmox host syslog, HAOS via journalctl).

  • Tried on a different hardware setup (an identical Dell machine), didn't help
  • Same experience with Frigate 0.12 and 0.13 beta 6.
  • Tried switching to qsv preset instead of vaapi preset, didn't help
  • Tried both the iHD and i965 drivers in Frigate, didn't help

In an effort to try anything to see what would help, I switched to the yolov8n model as per these instructions. I don't understand the significance of using this model, but it seems to be stable now (at seemingly higher CPU usage reported by the proxmox host). I am expecting delivery of a Coral TPU (M.2) soon, so will try that as well...

@iMiMx

iMiMx commented Dec 16, 2023

Update on the Debian 12 upgrade: it still crashed the entire box for me, so I'm back to hw_accel disabled.

@iMiMx

iMiMx commented Jan 31, 2024

Updated to 0.13 this morning and have subsequently re-enabled hw_accel as a test.

Still seems to be using the same libva version in the Home Assistant addon:

vainfo: VA-API version: 1.17 (libva 2.10.0)
vainfo: Driver version: Intel i965 driver for Intel(R) Ivybridge Mobile - 2.4.1

To answer my previous query about SHM size when using the Home Assistant add-on:

The shm size cannot be set per container for Home Assistant add-ons. However, this is probably not required since by default Home Assistant Supervisor allocates /dev/shm with half the size of your total memory. If your machine has 8GB of memory, chances are that Frigate will have access to up to 4GB without any additional configuration.

Box has 16GB of RAM, so we assume the SHM size is 8GB - with plenty for Frigate.

@carloda

carloda commented Feb 7, 2024

@iMiMx Is this working with Frigate 0.13, or does it still cause the usual Proxmox lockups?

@homeassistant7

Here's my update: I have seen a significant improvement.

I believe my server was overheating, causing the lockups.

I previously had my mini PC in a cupboard and noticed a correlation between hot, windy (lots of motion) days and lockups.
For the heat, I removed the server from the cupboard, which started to help.
Next, I tweaked my config to set up more motion masks so that moving trees no longer trigger motion, which really did help.

Doing this I dropped from sometimes daily lockups to weeks without lockups.

I've since added a USB Coral in the last few weeks to remove some of the processing load from my poor 6500T.

This is with 2 cameras, one at 2k and one at 4k. 12-15fps. Unfortunately my cameras don't support good substream options so decoding and processing those at full res.

Hope this helps someone out!

@wiredolphin

wiredolphin commented Mar 13, 2024

Updating the kernel from 6.1 to 6.5 solved the issue for me on Debian. The complete server froze as soon as hardware acceleration was enabled through the preset-vaapi configuration parameter. This is on an Intel N100 NUC running Debian 12 with the Frigate container.
@NickM-27 thanks for sharing the tip about updating the host driver and kernel.

@rohit267

So after debugging for three whole days, I found this.

I have Ubuntu 22.04 with an i5-6500, kernel 5.15.0-100-generic.
This:

ffmpeg:
  hwaccel_args: preset-vaapi

causes it to freeze after 2-3 hours of uptime.

@wiredolphin please help.

@wiredolphin

wiredolphin commented Mar 14, 2024

@rohit267 These seem to be the same symptoms I faced on Debian 12 Bookworm with its official 6.1 kernel, with one difference: in my case the host used to freeze immediately when the Frigate container restarted.

If your host is able to run for 2-3 hours before freezing, search the Frigate logs for any useful hints about VAAPI, the Intel graphics driver (iHD or i965), or ffmpeg malfunctioning.

Look up instructions for updating your distro's kernel or, if possible, upgrade to a newer distro release.
So far, kernel 6.5 seems to work fine for me, over 3-4 hours of testing across 2 days with only 2 IP cameras configured. I'll be able to tell more once my system is production ready.

@aav7fl
Contributor

aav7fl commented Apr 6, 2024

@wiredolphin

Thanks for that bit of information! I think your mention of upgrading the kernel is what finally solved my stability issue, which was introduced 2+ years ago. It used to be worse when I was using USB 3.0. For whatever reason, with the sunny days lately it was crashing much faster, even after I slowed it down to USB 2.0 speeds. But I do think that preset-vaapi with my older kernel and Intel NUC was the problem.

I've upgraded the kernel and it might actually be working now.

@nacree

nacree commented Sep 23, 2024

Hi,

Has everybody resolved their crashing issue already?

I have the same hardware (Intel Core i5 CPU with 8 GB of RAM):

https://www.reddit.com/r/debian/comments/1fi2ye6/nvr_minipc_with_debian_freezing_constantly/

I tried updating the kernel to 6.10 and increasing the Docker shm-size to 1 GB. I think increasing the SHM size actually extended the period without crashing by one day, up to 3-4 days! But it still crashes.

Does anybody know whether the memory leaks with Frigate/ffmpeg have been resolved by now?

Thanks!

Edit: 6 HD cameras with the following config:

ffmpeg:
#  hwaccel_args: -c:v h264_qsv
  hwaccel_args: preset-vaapi

detectors:
  ov:
    type: openvino
    device: GPU

model:
  width: 300
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  path: /openvino-model/ssdlite_mobilenet_v2.xml
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt

@xbmcnut

xbmcnut commented Sep 23, 2024

At least for me, @nacree, I've not had any issues under v0.14.x with hardware acceleration enabled. It has been running for 3+ weeks now.

@luzidchris

luzidchris commented Oct 26, 2024

@nacree, I know it's kind of late now, but we've experienced a similar issue with v0.14.0 running in a Docker environment. After a few minutes (sometimes 5 minutes, sometimes 1 hour), Frigate would crash, meaning the container would freeze without any error messages. Neither the Frigate nor the Linux logs contained any information that would have helped identify the root cause. We monitored memory consumption, increased the size of tmpfs, etc., but none of that helped.
As it required a restart of the host system rather than just the Docker container, we suspected the issue was caused by some low-level problem. We might have missed some information in one of the various threads about similar issues, but for us, the only way to get rid of the crash was a downgrade to v0.11.0. We haven't had a single crash since Oct 5, so Frigate has been running for almost 4 weeks now without a single problem.

I don't know how complex your configuration is, but in our case it didn't take a lot of effort to give 0.11.0 a try. We basically only had to manually add an MQTT broker (we decided to run mosquitto in another container) and change a few parameters. That's all. The downgrade was supposed to:

  1. rule out a hardware defect
  2. rule out (or at least make less likely) a problem in our Docker configuration

We could in fact rule out a hardware defect and it seems less likely that the problem was caused by some configuration issue on our side (we're really not relying on any special setup, it's basically identical to what can be found in the documentation). We decided not to make any move towards an upgrade until there's a clear reason for us to do that, like a bug fix or any new feature we would miss in 0.11.0.

@ALL: Is there any thread describing a similar issue which hasn't been closed? We've not been able to fix our freeze/crash by adjusting hwaccel configuration and we could rule out a memory leak. It required the downgrade to 0.11.0 to "fix" the issue.

@nacree

nacree commented Oct 28, 2024

@luzidchris Thanks for sharing the tip.

I had been experiencing constant crashes every 1-3 days, so I was actually a bit confused when I came back from a work trip and found my system uptime was over 14 days. And again, this happened without me doing anything. I just hope that some component (kernel?) was updated before the last crash and that it now simply works. But I cannot say what fixed my setup.

I will keep monitoring and hope that it keeps running.

Currently running 0.14.0-da913d8

@ouafnico

I have had exactly the same problem since 0.14.

Frigate is running in Docker in an LXC on Proxmox.
It can't stay up for more than 1 day.

I've tested different hardware and different kernels: same result.

I've increased shm_size to 1 GB to see if it makes a difference.
Testing the 0.15 beta gave the same result.

@nacree

nacree commented Jan 13, 2025

Mine is now finally working.

A while ago I noticed huge peaks in CPU usage from one camera's detection. They weren't constant, but happened daily. Whatever was happening, the system apparently did not always recover from them and ended up just freezing.

I have now disabled detection on all but 2 of my 6 cameras, and the system has been running perfectly for two weeks already!
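Turning off detection per camera while keeping the camera itself configured can be done with the camera-level `detect.enabled` option. In the sketch below, the camera names are hypothetical, and each camera's other settings (ffmpeg inputs, record, and so on) are omitted.

```yaml
cameras:
  front_door:
    detect:
      enabled: True   # one of the 2 cameras that still run object detection
  garden:
    detect:
      enabled: False  # object detection turned off for this camera
```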
