watchdog: fix another possible watchdog timeout #619

Merged: 1 commit, Apr 30, 2024


@alexmohr (Contributor) commented Apr 9, 2024

Prior to this commit, dlt_daemon_process_user_message_log ran without any time limit. When a lot of messages have to be transmitted, dlt-daemon could be so busy sending that it no longer updated the watchdog, and systemd would kill it. This commit introduces a timeout in dlt_daemon_process_user_message_log so that the function runs for at most watchdog_trigger_interval seconds.
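
To illustrate the idea of bounding a processing loop by a time budget, here is a minimal sketch. It is not the code from this commit: process_with_budget, process_one_fn and consume_pending are hypothetical names chosen for the example, and the budget parameter only stands in for watchdog_trigger_interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: handles one pending message and returns false
 * once nothing is left to process. */
typedef bool (*process_one_fn)(void *ctx);

/* Monotonic clock in whole seconds; not affected by wall-clock changes. */
time_t monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec;
}

/* Process messages until none are left or budget_seconds have elapsed.
 * At least one message is attempted per call so the caller always makes
 * progress. Returns the number of messages handled; anything still
 * pending is left for the next call. */
size_t process_with_budget(process_one_fn process_one, void *ctx,
                           unsigned int budget_seconds)
{
    assert(process_one != NULL);

    const time_t start = monotonic_seconds();
    size_t handled = 0;

    while (process_one(ctx)) {
        handled++;
        if (monotonic_seconds() - start >= (time_t)budget_seconds) {
            break; /* budget used up: return so the watchdog can be served */
        }
    }
    return handled;
}

/* Example callback (assumption): ctx points at a counter of pending
 * messages, and each call consumes one of them. */
bool consume_pending(void *ctx)
{
    unsigned int *pending = ctx;
    if (*pending == 0) {
        return false;
    }
    (*pending)--;
    return true;
}
```

A caller would invoke process_with_budget from its main loop and update the watchdog between calls, so a large backlog is drained over several iterations instead of in one unbounded run.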

This MR extends the logic introduced in #595, as the original commit still allowed systemd to kill dlt-daemon.

To test, we need the following:

  • dlt-daemon started via systemd with enabled watchdog
  • a lot of logging going into the daemon
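
For context on the first prerequisite: when a unit sets WatchdogSec=, systemd exports the timeout to the service in the WATCHDOG_USEC environment variable, and the sd_watchdog_enabled documentation recommends sending a keep-alive every half of that time. The sketch below shows one way a daemon could derive such an interval in seconds. The function name watchdog_interval_from_env is hypothetical, and whether dlt-daemon computes watchdog_trigger_interval exactly like this is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse the value of WATCHDOG_USEC (microseconds, as a decimal string) and
 * store half of the timeout, in whole seconds, in *out_seconds.
 * Returns false if the watchdog is not enabled or the value is invalid. */
bool watchdog_interval_from_env(const char *watchdog_usec,
                                unsigned int *out_seconds)
{
    assert(out_seconds != NULL);

    /* Unset or not starting with a digit (also rejects signs and spaces). */
    if (watchdog_usec == NULL ||
        watchdog_usec[0] < '0' || watchdog_usec[0] > '9') {
        return false;
    }

    errno = 0;
    char *end = NULL;
    unsigned long long usec = strtoull(watchdog_usec, &end, 10);
    if (errno != 0 || *end != '\0' || usec == 0) {
        return false;
    }

    /* Microseconds to seconds, then half the timeout as safety margin. */
    unsigned long long seconds = usec / 1000000ULL / 2ULL;
    *out_seconds = seconds > UINT_MAX ? UINT_MAX : (unsigned int)seconds;
    return true;
}
```

A caller would pass getenv("WATCHDOG_USEC") as the first argument. With WatchdogSec=30 the result is 15 seconds.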

Execute the script below on your test system.
It spawns a lot of dlt-receive processes; the goal is for dlt-daemon to survive.

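#!/usr/bin/env bash
# Spawn dlt-receive clients in the background for 180 seconds.
end_time=$((SECONDS + 180))

while [ "$SECONDS" -lt "$end_time" ]; do
  dlt-receive -a 127.1 &
done

# Stop all spawned clients again.
killall -9 dlt-receive

if [[ $(uname) == "Linux" ]]; then
  systemctl status dlt.service
  # Quoted so that ">" is printed instead of redirecting into a file named "3".
  echo "Service should be running for > 3 minutes"
else
  # Systems without systemd that provide pidin (e.g. QNX).
  pidin -faAt | grep dlt-daemon | grep -v grep
  echo "Service should be running at least since system boot"
fi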

The program was tested solely for our own use cases, which might differ from yours.
Licensed under Mozilla Public License Version 2.0

Alexander Mohr, alexander.m.mohr@mercedes-benz.com, Mercedes-Benz Tech Innovation GmbH, imprint

dlt_daemon_process_user_message_log prior to this
commit was running without limit, meaning when a lot of
messages have to be transmitted dlt-daemon might
be killed by systemd watchdog due to being busy
with sending and not updating the watchdog anymore.
This commit introduces a timeout in dlt_daemon_process_user_message_log
which allows this function to run at most watchdog_trigger_interval
seconds.

Signed-off-by: Alexander Mohr <alexander.m.mohr@mercedes-benz.com>
@minminlittleshrimp self-assigned this Apr 29, 2024

@minminlittleshrimp (Collaborator) commented

IMHO, I am not sure about the fix.
Honestly, I have not touched the watchdog feature in dlt yet 😁
I would love to spend some time checking this option's functionality locally.
Thank you for the patch!

@michael-methner merged commit efd8c16 into COVESA:master Apr 30, 2024
3 checks passed