upgrade watchdog service to scheduled task (kolide#1951)
zackattack01 authored Nov 12, 2024
1 parent 9fc64eb commit bc44bcb
Showing 14 changed files with 633 additions and 496 deletions.
2 changes: 1 addition & 1 deletion cmd/launcher/launcher.go
@@ -302,7 +302,7 @@ func runLauncher(ctx context.Context, cancel func(), multiSlogger, systemMultiSl
go checkpointer.Once(ctx)
runGroup.Add("logcheckpoint", checkpointer.Run, checkpointer.Interrupt)

-watchdogController, err := watchdog.NewController(ctx, k)
+watchdogController, err := watchdog.NewController(ctx, k, opts.ConfigFilePath)
if err != nil { // log any issues here but move on, watchdog is not critical path
slogger.Log(ctx, slog.LevelError,
"could not init watchdog controller",
2 changes: 1 addition & 1 deletion cmd/launcher/main.go
@@ -191,7 +191,7 @@ func runSubcommands(systemMultiSlogger *multislogger.MultiSlogger) error {
case "secure-enclave":
run = runSecureEnclave
case "watchdog": // note: this is currently only implemented for windows
-run = watchdog.RunWatchdogService
+run = watchdog.RunWatchdogTask
default:
return fmt.Errorf("unknown subcommand %s", os.Args[1])
}
31 changes: 15 additions & 16 deletions docs/architecture/launcher_watchdog.md
@@ -3,31 +3,30 @@
Note that for the initial implementation, this service is windows only. It is intentionally designed to give room for alternate OS implementations if needed in the future.
Most of the relevant code can be found in [ee/watchdog](../../ee/watchdog/)

-Here is a basic sequence diagram displaying the enable path for the windows watchdog service. The `launcher_watchdog_enabled` control flag will trigger the initial configuration and installation, and removal of the flag will trigger removal of the service.
+Here is a basic sequence diagram displaying the enable path for the windows watchdog task. The `launcher_watchdog_enabled` control flag will trigger the initial configuration and installation, and removal of the flag will trigger removal of the task.

+You can alternatively install or remove the task for testing/troubleshooting using the `--install-task` and `--remove-task` options for the watchdog subcommand. Note this is intended for developer convenience or emergency usage - the `launcher_watchdog_enabled` flag sent from cloud will eventually override any manual actions.
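
As a rough illustration of the two manual options described above, the sketch below parses `--install-task` and `--remove-task` with Go's standard `flag` package. The helper name `parseWatchdogArgs`, the `watchdogArgs` struct, and the mutual-exclusion check are assumptions made for this example. The commit excerpt does not show how launcher actually parses these options.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// watchdogArgs holds the two manual options named in the docs.
// The struct itself is hypothetical and exists only for this sketch.
type watchdogArgs struct {
	installTask bool
	removeTask  bool
}

// parseWatchdogArgs is a hypothetical helper; launcher's real parsing may differ.
func parseWatchdogArgs(args []string) (watchdogArgs, error) {
	var parsed watchdogArgs

	fs := flag.NewFlagSet("watchdog", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this sketch
	fs.BoolVar(&parsed.installTask, "install-task", false, "install the watchdog scheduled task")
	fs.BoolVar(&parsed.removeTask, "remove-task", false, "remove the watchdog scheduled task")

	if err := fs.Parse(args); err != nil {
		return watchdogArgs{}, err
	}

	// Assumption of this sketch: asking for both actions at once is an error.
	if parsed.installTask && parsed.removeTask {
		return watchdogArgs{}, fmt.Errorf("--install-task and --remove-task are mutually exclusive")
	}

	return parsed, nil
}

func main() {
	parsed, err := parseWatchdogArgs([]string{"--install-task"})
	fmt.Println(parsed.installTask, parsed.removeTask, err)
}
```

The `flag` package treats `-install-task` and `--install-task` as equivalent, so either spelling would be accepted by this sketch.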

```mermaid
sequenceDiagram
participant LauncherKolideK2Svc
Note right of LauncherKolideK2Svc: ./launcher.exe svc ...
-create participant WindowsServiceManager
-LauncherKolideK2Svc->>WindowsServiceManager: if launcher_watchdog_enabled
-create participant LauncherKolideWatchdogSvc
-WindowsServiceManager->>LauncherKolideWatchdogSvc: have we installed the watchdog?
-Note left of LauncherKolideWatchdogSvc: ./launcher.exe watchdog
+participant WindowsSchedulerService
+participant LauncherKolideK2WatchdogTask
+Note right of LauncherKolideK2WatchdogTask: ./launcher.exe watchdog
-alt yes the service already exists
-LauncherKolideK2Svc->>LauncherKolideWatchdogSvc: Restart to ensure latest
-else no the service does not exist
-LauncherKolideK2Svc->>WindowsServiceManager: 1 - create, configure, etc
-LauncherKolideK2Svc->>LauncherKolideWatchdogSvc: 2 - Start
-activate LauncherKolideWatchdogSvc
+alt launcher_watchdog_enabled
+LauncherKolideK2Svc->>WindowsSchedulerService: create, configure, and install watchdog task
+activate LauncherKolideK2WatchdogTask
+else flag is not enabled
+LauncherKolideK2Svc->>WindowsSchedulerService: remove watchdog task
end
-loop every n minutes
-LauncherKolideWatchdogSvc->>WindowsServiceManager: Query LauncherKolideK2Svc status
-LauncherKolideWatchdogSvc->>LauncherKolideK2Svc: Start if Stopped
+loop every 30 minutes, or 1 minute after wake event
+WindowsSchedulerService->>LauncherKolideK2WatchdogTask: triggers scheduled task
+LauncherKolideK2WatchdogTask->>LauncherKolideK2Svc: performs healthcheck, restarts if stopped
end
```

The restart functionality is currently limited to detecting a stopped state, but the idea here is to lay out the foundation for more advanced healthchecking.
-The watchdog service itself runs as a separate invocation of launcher, writing all logs to sqlite. The main invocation of launcher runs a watchdog controller, which responds to the `launcher_watchdog_enabled` flag, and publishes all sqlite logs to debug.json.
+The watchdog task itself runs as a launcher subcommand, performing any required checks/actions and writing all logs to sqlite before exiting. The main invocation of launcher runs a watchdog controller, which responds to the `launcher_watchdog_enabled` flag, and publishes all sqlite logs to debug.json.
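
The single pass described above (query the launcher service, restart it only if it is stopped, then exit) could be sketched roughly as follows. The `serviceController` interface, `checkAndRestart`, and the in-memory `fakeController` are all hypothetical names for this illustration. The real task talks to the Windows service manager and writes its logs to sqlite, neither of which is modelled here.

```go
package main

import (
	"errors"
	"fmt"
)

type serviceState int

const (
	stateStopped serviceState = iota
	stateRunning
)

// serviceController is a hypothetical abstraction over the OS service manager.
type serviceController interface {
	State(name string) (serviceState, error)
	Start(name string) error
}

// checkAndRestart performs one watchdog pass and reports whether it issued a start.
// It only acts on a stopped service, matching the limitation noted in the docs.
func checkAndRestart(sc serviceController, name string) (bool, error) {
	state, err := sc.State(name)
	if err != nil {
		return false, fmt.Errorf("querying %s: %w", name, err)
	}
	if state != stateStopped {
		return false, nil
	}
	if err := sc.Start(name); err != nil {
		return false, fmt.Errorf("starting %s: %w", name, err)
	}
	return true, nil
}

// fakeController is an in-memory stand-in used only to exercise the sketch.
type fakeController struct {
	states map[string]serviceState
}

func (f *fakeController) State(name string) (serviceState, error) {
	s, ok := f.states[name]
	if !ok {
		return stateStopped, errors.New("service not found")
	}
	return s, nil
}

func (f *fakeController) Start(name string) error {
	f.states[name] = stateRunning
	return nil
}

func main() {
	fc := &fakeController{states: map[string]serviceState{"LauncherKolideK2Svc": stateStopped}}
	restarted, err := checkAndRestart(fc, "LauncherKolideK2Svc")
	fmt.Println(restarted, err)
}
```

Keeping the check behind a small interface is one way to leave room for the richer healthchecks the docs anticipate, since a later pass could add probes without changing the scheduling side.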
12 changes: 6 additions & 6 deletions ee/powereventwatcher/power_event_watcher_windows.go
@@ -66,9 +66,9 @@ type (

const (
eventIdEnteringModernStandby = 506
-eventIdExitingModernStandby = 507
+EventIdExitingModernStandby = 507
eventIdEnteringSleep = 42
-eventIdResumedFromSleep = 107
+EventIdResumedFromSleep = 107

operationSuccessfulMsg = "The operation completed successfully."
)
@@ -108,7 +108,7 @@ func (ims *InMemorySleepStateUpdater) OnPowerEvent(eventID int) error {
switch eventID {
case eventIdEnteringModernStandby, eventIdEnteringSleep:
ims.inModernStandby = true
-case eventIdExitingModernStandby, eventIdResumedFromSleep:
+case EventIdExitingModernStandby, EventIdResumedFromSleep:
ims.inModernStandby = false
default:
ims.slogger.Log(context.TODO(), slog.LevelWarn,
Expand All @@ -134,7 +134,7 @@ func (ks *knapsackSleepStateUpdater) OnPowerEvent(eventID int) error {
"err", err,
)
}
-case eventIdExitingModernStandby, eventIdResumedFromSleep:
+case EventIdExitingModernStandby, EventIdResumedFromSleep:
ks.slogger.Log(context.TODO(), slog.LevelDebug,
"system is waking",
"event_id", eventID,
@@ -187,9 +187,9 @@ func New(ctx context.Context, slogger *slog.Logger, pes powerEventSubscriber) (*

queryStr := fmt.Sprintf("*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=%d or EventID=%d or EventID=%d or EventID=%d)]]",
eventIdEnteringModernStandby,
-eventIdExitingModernStandby,
+EventIdExitingModernStandby,
eventIdEnteringSleep,
-eventIdResumedFromSleep,
+EventIdResumedFromSleep,
)
query, err := syscall.UTF16PtrFromString(queryStr)
if err != nil {
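
The hunk above formats four Kernel-Power event IDs into one event log query. A minimal standalone sketch of that formatting step is shown below, reusing the constant values and format string from the diff. `buildPowerEventQuery` and `isWakeEvent` are hypothetical helpers added for illustration, and the suggestion that the two wake IDs were exported so other packages can reference them is an inference from the rename, not something the excerpt states.

```go
package main

import "fmt"

// Event IDs copied from the diff; the two wake-related IDs are exported there.
const (
	eventIdEnteringModernStandby = 506
	EventIdExitingModernStandby  = 507
	eventIdEnteringSleep         = 42
	EventIdResumedFromSleep      = 107
)

// buildPowerEventQuery is a hypothetical helper wrapping the Sprintf call from the hunk.
func buildPowerEventQuery() string {
	return fmt.Sprintf("*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=%d or EventID=%d or EventID=%d or EventID=%d)]]",
		eventIdEnteringModernStandby,
		EventIdExitingModernStandby,
		eventIdEnteringSleep,
		EventIdResumedFromSleep,
	)
}

// isWakeEvent is a hypothetical helper mirroring the wake branch of OnPowerEvent.
func isWakeEvent(eventID int) bool {
	switch eventID {
	case EventIdExitingModernStandby, EventIdResumedFromSleep:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(buildPowerEventQuery())
	fmt.Println(isWakeEvent(EventIdResumedFromSleep))
}
```

In the real file, the resulting string is then converted with `syscall.UTF16PtrFromString`, as the final lines of the hunk show.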
4 changes: 2 additions & 2 deletions ee/uninstall/uninstall_windows.go
@@ -35,8 +35,8 @@ func disableAutoStart(ctx context.Context, k types.Knapsack) error {
}

// attempt to remove watchdog service in case it is installed to prevent startups later on
-if err := watchdog.RemoveService(svcMgr); err != nil {
-return fmt.Errorf("removing watchdog service, error may be expected if not enabled: %w", err)
+if err := watchdog.RemoveWatchdogTask(k.Identifier()); err != nil {
+return fmt.Errorf("removing watchdog task, error may be expected if not installed: %w", err)
}

return nil
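
Because the new error is wrapped with `%w`, a caller can still inspect the underlying cause with `errors.Is`. The sketch below shows that pattern using the message from the diff. The sentinel `errTaskNotFound`, the stand-in `removeWatchdogTask`, and the identifier `"example-identifier"` are hypothetical, since the excerpt does not show which errors `watchdog.RemoveWatchdogTask` actually returns.

```go
package main

import (
	"errors"
	"fmt"
)

// errTaskNotFound is a hypothetical sentinel; the real package may report this differently.
var errTaskNotFound = errors.New("task not found")

// removeWatchdogTask is a stand-in for watchdog.RemoveWatchdogTask.
// The installed parameter exists only so the sketch can simulate both outcomes.
func removeWatchdogTask(identifier string, installed bool) error {
	if !installed {
		return errTaskNotFound
	}
	return nil
}

// disableWatchdog wraps the removal error the same way the diff does.
func disableWatchdog(identifier string, installed bool) error {
	if err := removeWatchdogTask(identifier, installed); err != nil {
		return fmt.Errorf("removing watchdog task, error may be expected if not installed: %w", err)
	}
	return nil
}

// isNotInstalled reports whether err wraps the hypothetical not-found sentinel.
func isNotInstalled(err error) bool {
	return errors.Is(err, errTaskNotFound)
}

func main() {
	err := disableWatchdog("example-identifier", false)
	fmt.Println(err)
	fmt.Println(isNotInstalled(err))
}
```

A caller that wanted to treat "not installed" as success during uninstall could branch on a check like `isNotInstalled` instead of matching on the message text.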
2 changes: 1 addition & 1 deletion ee/watchdog/controller_other.go
@@ -12,7 +12,7 @@ import (

type WatchdogController struct{}

-func NewController(_ context.Context, _ types.Knapsack) (*WatchdogController, error) {
+func NewController(_ context.Context, _ types.Knapsack, _ string) (*WatchdogController, error) {
return nil, nil
}
