Thread leak in netavark-dhcp-proxy #811

jsonn · 2023-09-18T15:39:58Z

Using SuSE MicroOS with a bunch of macvlan-using containers, I see netvark-dhcp-proxy hanging every few days. From journalctl:

netavark[14606]: thread 'tokio-runtime-worker' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /home/abuild/rpmbuild/BUILD/rustc-1.71.1-src/library/std/src/thread/mod.rs:686:29

Even with RUST_BACKTRACE=1 set, it doesn't give a backtrace. Last time this happened, ps reported over 4000 threads for the PID.

The text was updated successfully, but these errors were encountered:

Luap99 · 2023-09-18T15:48:53Z

How many macvlan containers are we talking about? Do you know how long your DHCP lease time is?

jsonn · 2023-09-18T15:51:13Z

16 container ATM, 10 minutes.

Luap99 · 2023-09-18T15:53:09Z

Ok I think that explains why it leaks so fast then. I think we spawn a new thread for each lease but somehow the code does not cleanup the old one so we leak the old thread.
I take a look.

jsonn · 2024-03-25T10:16:24Z

Any news?

Luap99 · 2024-04-02T17:12:22Z

No, I haven't found the time to reproduce this issue.

Jackbaude · 2024-05-07T20:15:52Z

I can take a look at this issue. Can someone point me in the right direction to reproduce this?

jsonn · 2024-05-07T20:41:51Z

Use macvlan and a DHCP server with as short a lease as reasonable, e.g. a minute. Observe the number of threads?

Luap99 · 2024-05-08T11:41:56Z

yes checking ls /proc/$pidOfProxy/task/ over time should show the leak I guess

baude · 2024-06-24T19:20:10Z

I am now able to replicate. I started 10 containers on a network where the lease is only 60 seconds. In my case, the nv dhcp-proxy PID is 6808 and after a short while:

Threads:	552

jjzazuet · 2024-07-12T04:30:50Z

Ah, just noticed this issue. Could this be related? My DHCP lease time is 30 mins.

#1024

Thanks!

thecubic · 2024-07-13T19:56:51Z

I definitely have this thread leak, there were 13708 threads for ~15 containers after 3 days of running - and I was also seeing #618 as a symptom (I assume, of thread starvation). I have the underlying pattern (IPv6 multicast on IPv4 network)

I updated past the fix for that specific symptom and I'm watching how many threads it creates long-term

thecubic · 2024-07-19T18:41:16Z

My thread leak seems "better, but not totally fixed". I have 1497 threads after 6 days (post #1022) versus the 13708 after 3 days.

Importantly the dhcp-proxy is not spinning CPU right now and my core symptom (restarting containers sometimes had dhcp task aborts) is gone

Luap99 mentioned this issue Jul 12, 2024

DHCP proxy - no available capacity / crash when using external DHCP service #1024

Open

thecubic mentioned this issue Jul 13, 2024

flake in dhcp-proxy test: parser ran out of data-- not enough byte #618

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread leak in netavark-dhcp-proxy #811

Thread leak in netavark-dhcp-proxy #811

jsonn commented Sep 18, 2023

Luap99 commented Sep 18, 2023

jsonn commented Sep 18, 2023

Luap99 commented Sep 18, 2023

jsonn commented Mar 25, 2024

Luap99 commented Apr 2, 2024

Jackbaude commented May 7, 2024

jsonn commented May 7, 2024

Luap99 commented May 8, 2024

baude commented Jun 24, 2024

jjzazuet commented Jul 12, 2024

thecubic commented Jul 13, 2024

thecubic commented Jul 19, 2024

Thread leak in netavark-dhcp-proxy #811

Thread leak in netavark-dhcp-proxy #811

Comments

jsonn commented Sep 18, 2023

Luap99 commented Sep 18, 2023

jsonn commented Sep 18, 2023

Luap99 commented Sep 18, 2023

jsonn commented Mar 25, 2024

Luap99 commented Apr 2, 2024

Jackbaude commented May 7, 2024

jsonn commented May 7, 2024

Luap99 commented May 8, 2024

baude commented Jun 24, 2024

jjzazuet commented Jul 12, 2024

thecubic commented Jul 13, 2024

thecubic commented Jul 19, 2024