Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: uninstall custom signal handlers before shutdown #5913

Merged
merged 13 commits into from
Jan 13, 2025

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Dec 5, 2024

Proposed Commit Message

fix: uninstall custom signal handlers before shutdown

Fixes GH-5849

Additional Context

This might address the following bug reports:

https://bugs.launchpad.net/maas/+bug/2089185
#5849

Test Steps

I haven't been able to reproduce either bug, so testing is needed.

Here is are debs for testing (funny name because Github is unaware of the existence of debian files):

noble: cloud-init_24.3.1-1174-gf7bbd23d-1~bddeb_all.tar.gz
jammy: cloud-init_24.3.1-1016-ge3d0bcd4-1~bddeb_all.tar.gz

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@TheRealFalcon
Copy link
Member

@holmanb , I'm assuming we're looking for some confirmation that this approach works before reviewing?

@r00ta
Copy link
Contributor

r00ta commented Dec 17, 2024

We tested this patch in MAAS but it does not work. As per chat we decided to release a fix in MAAS directly.

So, if this patch was supposed to be used only for MAAS we can close it. Thanks!

Copy link

github-actions bot commented Jan 7, 2025

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Jan 7, 2025
@holmanb holmanb removed the stale-pr Pull request is stale; will be auto-closed soon label Jan 7, 2025
@@ -64,7 +64,7 @@ class TestPowerChange:
[
("poweroff", "now", "10", "will execute: shutdown -P now msg"),
("reboot", "now", "0", "will execute: shutdown -r now msg"),
("halt", "+1", "0", "will execute: shutdown -H +1 msg"),
("halt", "+1", "0", re.escape("will execute: shutdown -H +1 msg")),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this ever pass?

Perhaps something changed in the verify ordering utility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a good likelihood it never passed

return None


def inspect_handler(sig: Union[int, Callable, None]) -> None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gather some data that would be useful for future debugging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we won't actually see these logs unless we manually call cloud-init with cloud-init --debug. This is currently getting called pre-log setup.

@@ -106,8 +109,9 @@ def handle(name: str, cfg: Config, cloud: Cloud, args: list) -> None:
break
if (upgrade or pkglist) and reboot_if_required and reboot_fn_exists:
try:
LOG.warning(
"Rebooting after upgrade or install per %s", reboot_marker
LOG.info(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is expected, so the warnings log level doesn't make sense.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General approach LGTM.

Can we get some unittests of the new context manager? We obviously need to mock the exit, but I think we could use the context manager and then just call the handler directly and check the logs.

Also, if we're concerned about all the places this would need to be plumbed, I'm wondering if it'd be possible to check the process tree and determine if whatever invoked the reboot was initiated by cloud-init and just ignore those. I'm not saying we should do that here, but it's an alternative to think about.

@TheRealFalcon TheRealFalcon self-assigned this Jan 8, 2025
@holmanb
Copy link
Member Author

holmanb commented Jan 8, 2025

determine if whatever invoked the reboot

Is that possible? Does systemd expose the PID/GID of the process that set a specific boot target? I can't think of a way to do this that wouldn't be completely broken.

@TheRealFalcon
Copy link
Member

Is that possible?

It appears so, but none of the options presented there seem reasonable for this use case, so...just ignore me 😄

@holmanb
Copy link
Member Author

holmanb commented Jan 8, 2025

Is that possible?

It appears so

I still don't think so.

If a user passes a shutdown command through a script, cloud-init doesn't receive the signal from its own child process. The process that sends the signal on reboot/shutdown is systemd. What we would need for this to be possible is to know which process set the systemd boot target to shutdown / reboot / etc.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Not something I'm asking to change, but I did find the new tests a bit hard to grok. Since the test body contains an if condition for each of the 2nd parametrizations, I think that 3 separate tests might have been more appropriate, but that could just be personal preference.

@TheRealFalcon
Copy link
Member

TheRealFalcon commented Jan 9, 2025

Hrm...my approval didn't see the CI failures. There appears to be some legitimate ones there, so let's not merge quite yet.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments inline

return None


def inspect_handler(sig: Union[int, Callable, None]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we won't actually see these logs unless we manually call cloud-init with cloud-init --debug. This is currently getting called pre-log setup.

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@holmanb holmanb merged commit 6cdf51a into canonical:main Jan 13, 2025
22 checks passed
TheRealFalcon pushed a commit to TheRealFalcon/cloud-init that referenced this pull request Jan 13, 2025
holmanb added a commit to holmanb/cloud-init that referenced this pull request Feb 11, 2025
holmanb added a commit to holmanb/cloud-init that referenced this pull request Feb 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants