switch-to-configuration.pl: Fix timers resulting in old unit running #307562
base: master
```perl
}
print STDERR "ensuring the following units and targets are started: ", join(", ", sort(keys(%units_to_start))), "\n";
```
Isn't this potentially quite a long list? I thought this is what the filter above was for.
It's a medium-long list on my servers, e.g. 3-4 lines in my 100-char terminal.
I believe it is worth showing, because:
- It's useful for debugging what's going on. The `systemctl start` done here is one of the 3 key actions in configuration switching (the others being `stop` and `restart`), and depending on what your units/targets do, it may be the last line you see before your server loses the network or your desktop loses GPU output. In those cases, it can be extremely helpful to see what's being started.
- Not making this `start` invocation invisible will hopefully reduce the number of issues people can't diagnose or solve by themselves, and thus the number of issues they file. Most NixOS users understand `systemctl` and can reproduce issues when told which `systemctl` commands to run, but far fewer users are able or willing to read or modify Perl to see the invocation.
I added it as a separate log line in addition to the previous one that lists only the stopped units, so that the short, most important line stays easy to read for the user. Then the longer line follows, which the user can ignore if their system didn't go down.
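As a rough illustration of what the two log lines look like, here is a hypothetical Python rendering of the Perl `print STDERR` statements quoted in this conversation. It is a sketch, not the script's code: it assumes the stop line is only printed when the filtered list is non-empty (the `}` following it in the diff suggests an enclosing `if`), keeps the stop list in the order it was given, sorts the start keys like Perl's default `sort`, and returns strings instead of writing newline-terminated lines to stderr.

```python
def format_switch_log(units_to_stop_filtered, units_to_start):
    """Hypothetical mirror of the two log lines in switch-to-configuration.pl.

    units_to_stop_filtered: list of unit names already passed through filter_units.
    units_to_start: dict keyed by unit name, standing in for %units_to_start.
    """
    lines = []
    # Short, most important line: only the stopped units (assumed guarded by an `if`).
    if units_to_stop_filtered:
        lines.append("stopping the following units: " + ", ".join(units_to_stop_filtered))
    # Longer line: every unit/target passed to `systemctl start`, sorted by name.
    lines.append(
        "ensuring the following units and targets are started: "
        + ", ".join(sorted(units_to_start))
    )
    return lines


for line in format_switch_log(["foo.service"], {"sshd.service": True, "dostuff.timer": True}):
    print(line)
```

Keeping the two messages separate means the short stop line stays readable on its own, while the sorted start list can be skimmed or ignored.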
```diff
@@ -806,6 +806,7 @@ sub filter_units {
         print STDERR "stopping the following units: ", join(", ", @units_to_stop_filtered), "\n";
     }
     # Use current version of systemctl binary before daemon is reexeced.
+    # The stop may be cancelled by a timer, see note [systemd-timers-cancelling-stop-commands].
```
But this is stopping units that are being removed from the NixOS configuration, right? So this specific case is probably very rare (I hope).
Wouldn't changing
Oh, but if it's already stopped, I guess it isn't stopped a second time by the restart... Okay, perhaps this approach actually works.
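To make the failure in the PR title concrete, here is a toy Python model of one simplified way an old unit can end up running after a switch. Everything in it (`ToyUnit`, `switch`, the `"old"`/`"new"` labels) is hypothetical and is neither systemd's API nor the PR's code. It assumes the unit is stopped, a timer then triggers it before `daemon-reload` (so the old definition is still loaded), and the final `systemctl start` is a no-op on an already-active unit. The note in the diff also mentions a timer cancelling the stop itself; this model does not cover that path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyUnit:
    """Hypothetical stand-in for one systemd unit plus the manager's view of it.

    Only tracks which unit definition is loaded and which definition the
    currently running process (if any) was started from.
    """
    loaded: str = "old"             # definition the manager would use for a start
    running: Optional[str] = "old"  # definition of the running process, None if stopped

    def stop(self) -> None:
        self.running = None

    def start(self) -> None:
        # Like `systemctl start`: does nothing if the unit is already active.
        if self.running is None:
            self.running = self.loaded

    def restart(self) -> None:
        # Always replaces the running process with the loaded definition.
        self.running = self.loaded

    def daemon_reload(self, new_definition: str) -> None:
        # Changes what future starts use; a running process keeps its old code.
        self.loaded = new_definition


def switch(timer_fires_before_reload: bool, final_action: str = "start") -> Optional[str]:
    """Simplified switch: stop, maybe a timer trigger, daemon-reload, then start/restart."""
    unit = ToyUnit()
    unit.stop()                    # stop phase for a changed unit
    if timer_fires_before_reload:
        unit.start()               # timer triggers the unit while the OLD definition is loaded
    unit.daemon_reload("new")
    getattr(unit, final_action)()  # final phase: "start" or "restart"
    return unit.running


print(switch(False), switch(True), switch(True, "restart"))
```

In this model `switch(False)` ends with the new definition running, `switch(True)` leaves the old one running, and the `restart` variant picks up the new one again. The `restart` case only reflects the musing above; it is not a claim about what the PR changes.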
Since this is a tricky problem, I wrote a regression test; feel free to include it:

```nix
import ./make-test-python.nix ({ pkgs, ... }: {
  name = "nixos-rebuild-timers";
  nodes = {
    machine = { lib, pkgs, ... }: {
      imports = [
        ../modules/profiles/installation-device.nix
        ../modules/profiles/base.nix
      ];
      nix.settings = {
        substituters = lib.mkForce [ ];
        hashed-mirrors = null;
        connect-timeout = 1;
      };
      system.includeBuildDependencies = true;
      system.extraDependencies = [
        # Not part of the initial build apparently?
        pkgs.grub2
      ];
      virtualisation = {
        cores = 2;
        memorySize = 4096;
      };
      systemd.services.dostuff = {
        script = "while true; do echo a > /tmp/a; sleep 1; done";
        startAt = "*:*"; # calendar spec "*:*" is HH:MM, i.e. fires once per minute
        serviceConfig.Type = "simple";
      };
      # this service ensures that switching takes at least 60s, enough time for
      # dostuff.timer to fire
      systemd.services.delaySwitch = {
        script = "echo delaySwitch started";
        postStop = "echo start sleeping; sleep 60; echo stop sleeping";
        serviceConfig.Type = "oneshot";
        serviceConfig.RemainAfterExit = true;
        wantedBy = [ "multi-user.target" ];
      };
    };
  };
  testScript =
    let
      configFile = pkgs.writeText "configuration.nix" ''
        { lib, pkgs, ... }: {
          imports = [
            ./hardware-configuration.nix
            <nixpkgs/nixos/modules/testing/test-instrumentation.nix>
          ];
          boot.loader.grub = {
            enable = true;
            device = "/dev/vda";
            forceInstall = true;
          };
          documentation.enable = false;
          systemd.services.dostuff = {
            script = "echo b > /tmp/a";
            startAt = "*:*"; # once per minute
            serviceConfig.Type = "oneshot";
          };
          systemd.services.delaySwitch = {
            script = "echo delaySwitch changed";
            postStop = "echo start sleeping; sleep 60; echo stop sleeping";
            serviceConfig.Type = "oneshot";
            serviceConfig.RemainAfterExit = true;
            wantedBy = [ "multi-user.target" ];
          };
        }
      '';
    in
    ''
      machine.start()
      machine.succeed("udevadm settle")
      machine.wait_for_unit("multi-user.target")
      machine.succeed("nixos-generate-config")
      machine.copy_from_host(
          "${configFile}",
          "/etc/nixos/configuration.nix",
      )
      with subtest("Switch to the base system"):
          machine.succeed("nixos-rebuild switch")
          machine.wait_until_succeeds("grep b /tmp/a")
    '';
})
```
Fixes #49415.
See the added comment for a detailed description.
See also the related #49528.
CC: `restartIfChanged`, `reloadIfChanged`, `stopIfChanged` is fragile #49528 (comment)