DUA reports reboot failure, but reboot occurs #679

Open
D-r-P-3-p-p-3-r opened this issue Dec 17, 2024 · 1 comment

Expected Behavior

Reboot without error in the log.

Current Behavior

The reboot succeeds, but an error is still logged.

Steps to Reproduce

  1. Run an update handler that requests a reboot via workflow_request_immediate_reboot().
  2. In some runs (almost 50% on our system), the "Reboot failed." error is logged.

Device Information

  • Architecture: arm32
  • DU Agent Version: 1.1

Logs

2024-10-28T12:16:01.4509Z 926[1387] [I] Action 'Install' complete. Result: 705 (succeeded), 0 (0x0) [ADUC_Workflow_WorkCompletionCallback:890]
2024-10-28T12:16:01.4512Z 926[1387] [I] Install indicated success with RebootRequired - rebooting system now [ADUC_Workflow_MethodCall_Install_Complete:1436]
2024-10-28T12:16:01.4513Z 926[1387] [I] Calling ADUC_RebootSystem [ADUC_MethodCall_RebootSystem:104]
2024-10-28T12:16:01.4513Z 926[1387] [I] ADUC_RebootSystem called. Rebooting system. [ADUC_RebootSystem:74]
2024-10-28T12:16:01.7114Z 926[926] [W] Shutdown signal detected: 15 [OnShutdownSignal:834]
2024-10-28T12:16:01.7117Z 926[926] [I] Agent exited with code 0 [main:1104]
2024-10-28T12:16:01.7118Z 926[926] [W] Agent is shutting down. [ShutdownAgent:815]
2024-10-28T12:16:01.7118Z 926[926] [I] De-initializing command listener thread [UninitializeCommandListenerThread:396]
2024-10-28T12:16:01.7118Z 926[926] [I] ADUC agent stopping [AzureDeviceUpdateCoreInterface_Destroy:314]
2024-10-28T12:16:01.7118Z 926[926] [I] Calling ADUC_Unregister [ADUC_MethodCall_Unregister:90]
2024-10-28T12:16:01.7153Z 926[1387] [I] Child process terminated, signal 15 [ADUC_LaunchChildProcessHelper:179]
2024-10-28T12:16:01.7154Z 926[1387] [E] Reboot failed. Process exit with code: 15 [ADUC_RebootSystem:95]
2024-10-28T12:16:01.7156Z 926[1387] [E] Reboot attempt failed. [ADUC_Workflow_MethodCall_Install_Complete:1446]

Additional Information

Our Linux is systemd-based. When /sbin/reboot is called, it ultimately activates reboot.target.
This leads to SIGTERM (15) being sent to all user-context processes.
Depending on whether the reboot process finishes before or after it receives SIGTERM itself, it either exits with code 0 or is terminated by signal 15.
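
To illustrate the two outcomes, here is a minimal sketch (not taken from the agent or adu-shell sources) of how a parent sees a child that either exits normally or is killed by SIGTERM. The helper name run_child is hypothetical; only standard POSIX calls are used.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration only.
 * Forks a child that either exits with code 0 or is terminated by SIGTERM,
 * then returns the raw wait status reported by waitpid() (-1 on failure). */
int run_child(int terminate_with_sigterm)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }

    if (pid == 0) {
        if (terminate_with_sigterm) {
            /* Make sure the default action (terminate) is in effect,
             * even if the parent had SIGTERM ignored. */
            signal(SIGTERM, SIG_DFL);
            raise(SIGTERM);
        }
        _exit(0);
    }

    int wstatus = 0;
    while (waitpid(pid, &wstatus, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return wstatus;
}
```

With run_child(0) the status satisfies WIFEXITED() with WEXITSTATUS() equal to 0; with run_child(1) it satisfies WIFSIGNALED() with WTERMSIG() equal to SIGTERM, which corresponds to the "signal 15" seen in the log above.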

See also
https://systemd-devel.freedesktop.narkive.com/JafJo2Lj/systemctl-reboot-get-terminated-by-signal-15

jw-msft commented Jan 7, 2025

This looks like a race condition that is not handled correctly in the adu-shell child process that issues the reboot, because adu-shell does not register and handle SIGTERM.

The agent execs a seteuid'ed adu-shell child process so that it can issue a shutdown as the root user. The adu-shell child process receives SIGTERM after it has invoked /sbin/reboot and, because it has no registered signal handler for SIGTERM, it is terminated.

Therefore, in the agent parent process, waitpid() returns a wait status for which WIFSIGNALED(wstatus) is true, which is logged here:

Log_Info("Child process terminated, signal %d", childExitStatus);

I think the following is needed in a fix:

  • adu-shell must register a handler function for SIGTERM.
  • The SIGTERM signal handler should ignore the signal if the reboot command-line argument has been parsed as the action.
  • After waitpid() returns a child wait status of 0 (success), the parent AducIotAgent process should perhaps proactively call ADUC_ShutdownService_RequestShutdown() to set s_isShuttingDown = true, instead of waiting for SIGTERM to arrive and doing so here:
    ADUC_ShutdownService_RequestShutdown();
  • Ideally, adu-shell could also switch to int syscall(SYS_reboot, int magic, int magic2, int op, void *arg); instead of invoking system("/sbin/reboot"), but that involves passing the magic-number arguments (the accepted magic2 values are Linus Torvalds' birthday and his 3 daughters' birthdays in hex).
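
A rough sketch of what the adu-shell side of this could look like, under stated assumptions: the function names ignore_sigterm_for_reboot and reboot_via_syscall are hypothetical, the action is assumed to arrive as the string "reboot", and SIG_IGN is used as a simpler stand-in for a dedicated handler function. This is not the actual adu-shell code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() and sigaction() under strict modes */
#endif
#include <linux/reboot.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: if the parsed action is "reboot", ignore SIGTERM so
 * that the shutdown sequence started by the reboot does not kill this
 * process before it can exit with code 0.
 * Returns 0 on success (or when nothing had to be changed), -1 on error. */
int ignore_sigterm_for_reboot(const char *action)
{
    if (action == NULL || strcmp(action, "reboot") != 0) {
        return 0; /* other actions keep the default SIGTERM behavior */
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical helper: request a restart directly from the kernel instead
 * of running system("/sbin/reboot"). Requires root privileges.
 * WARNING: on success this restarts the machine immediately and bypasses
 * the init system's orderly shutdown, so nothing in this sketch calls it. */
long reboot_via_syscall(void)
{
    sync(); /* flush filesystem buffers first */
    return syscall(SYS_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
                   LINUX_REBOOT_CMD_RESTART, NULL);
}
```

One design point to keep in mind: an ignored signal disposition is inherited across fork() and exec, so a /sbin/reboot started via system() after ignore_sigterm_for_reboot("reboot") would also ignore SIGTERM. The constants come from <linux/reboot.h>; LINUX_REBOOT_MAGIC2 is 672274793, which is 0x28121969 in hex.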

@jw-msft jw-msft self-assigned this Jan 7, 2025