Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/switch-to-configuration: Document and test handling of socket-activated services #161838

Merged
merged 2 commits into from
Mar 4, 2022

Conversation

dasJ
Copy link
Member

@dasJ dasJ commented Feb 25, 2022

Motivation for this change

More tests, more docs

The manual should explain what this is doing exactly.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` labels Feb 25, 2022
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels Feb 25, 2022
@ncfavier
Copy link
Member

@ofborg test switchTest

@dasJ
Copy link
Member Author

dasJ commented Feb 25, 2022

Also cc @aanderse to remind people that you want the script rewritten ;)

Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a flaw 😢 This will break services where ExecStop relies on still having the old etc.

I'm not confident that this can be fixed without fixing systemd or mutating the old units. We really need to inhibit the old service start; otherwise it seems we'll have to introduce new corner cases.

Comment on lines 834 to 835
out = switch_to_specialisation("${machine}", "simple-socket-stop-if-changed-and-execstop")
assert_contains(out, "stopping the following units: socket-activated.service, socket-activated.socket\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This switches from a non-ExecStop service to an ExecStop one, so it doesn't need to stop the socket yet, but does need to do so on the next switch, even when ExecStop is removed.

default), the behavior depends on whether `ExecStop=` is set. If set, the
`.socket` unit and the `.service` unit are **stop**ped and the `.socket`
unit is **start**ed. This is required to prevent races. If `ExecStop=` is
not set, there is no need to touch the `.socket` unit. Instead, the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that despite stopIfChanged being enabled, the old service will run until after etc has been switched.
I don't think that's acceptable.

@dasJ
Copy link
Member Author

dasJ commented Feb 27, 2022

@roberth thank you for reviewing the change.
The whole "ExecStop is taken from the new generation" is fixable but I won't do that until we agreed what to do with the other points you mentioned.
Running the service in the new environment is imo acceptable because it's only very short and it increases general service availability.
I do however not know each and every service in NixOS and I don't know if services could suffer from this behavior.

Do you have an alternative suggestion I can implement? If we don't come to an agreement I can redo this PR to only clean the code, add test cases, and add docs to at least have the current behavior properly specified and tested.

@dasJ dasJ force-pushed the feat/stc-less-socket-restarts branch from 79349fe to df75fef Compare February 27, 2022 16:34
@dasJ
Copy link
Member Author

dasJ commented Feb 27, 2022

I removed most of the logic after a longer discussion with Robert. We came to the conclusion that there was no way we can keep up with the race-free nature of the current behavior (please correct me if I got your points wrong).
This would be possible if systemd blessed us with some magic way, but as of writing this comment, it's not.

So this PR mostly adds test cases and proper docs for handling of socket-activated units and leaves the current logic intact.

@dasJ dasJ changed the title nixos/switch-to-configuration: Improve handling of socket-activated services nixos/switch-to-configuration: Document and test handling of socket-activated services Feb 27, 2022
nixos/doc/manual/development/unit-handling.section.md Outdated Show resolved Hide resolved
@@ -339,21 +340,23 @@ sub handleModifiedUnit {
# If this unit is socket-activated, then stop the
# socket unit(s) as well, and restart the
# socket(s) instead of the service.
my $socketActivated = 0;
my $socket_activated = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this snake case when all the other variables are camel case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, but what is unitsToReload?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently a mix between camel case (not yet refactored) and snake case (already refactored)

# Socket-activated services don't get started, just the socket
machine.fail("[ -S /run/test.sock ]")
out = switch_to_specialisation("${machine}", "simple-socket")
# assert_lacks(out, "stopping the following units:") nobody cares
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not throw it out then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it clear that this check was left out explicitly and not just forgotten

assert_lacks(out, "reloading the following units:")
assert_contains(out, "restarting the following units: test-timer.timer\n")
assert_contains(out, "\nrestarting the following units: test-timer.timer\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why aren't we stripping away the spaces from each out variable (or rather in the switch_to_specialisation function)? is it important to check the newline everywhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because restarting the following units and starting the following units must somehow be differentiated and using regex would be pretty expensive to run hundreds of times

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i didn't know out was a string, thought it was an array, my bad

@dasJ dasJ force-pushed the feat/stc-less-socket-restarts branch from df75fef to d56d805 Compare February 27, 2022 20:45
@dasJ
Copy link
Member Author

dasJ commented Mar 2, 2022

If there are no more strong opinions, I'd merge this today or so. It's mostly docs and tests and doesn't touch the logic

nixos/tests/switch-test.nix Outdated Show resolved Hide resolved
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the suggestions from my comment just now. Lgtm!

Copy link
Member

@stigtsp stigtsp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dasJ dasJ force-pushed the feat/stc-less-socket-restarts branch from d56d805 to f6ad15f Compare March 3, 2022 19:56
@dasJ dasJ merged commit 803f7d4 into NixOS:master Mar 4, 2022
@dasJ dasJ deleted the feat/stc-less-socket-restarts branch March 4, 2022 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants