Merge pull request #80 from tmknight/develop
v0.9.0
tmknight authored Feb 18, 2024
2 parents e3e363e + a8f6c2a commit d8a43fd
Showing 9 changed files with 190 additions and 46 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

## 0.9.0

### Added

- `post-action` to execute a task post-restart attempt
- `autoheal.restart.exclude` container label as override when `AUTOHEAL_CONTAINER_LABEL` set to `all`
- `log-excluded` as a switch to allow logging of containers excluded from restart

## 0.8.3

### Changed
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "docker-autoheal"
version = "0.8.3"
version = "0.9.0"
authors = ["Travis M Knight"]
license = "GPL-3.0"
description = "A cross-platform tool to monitor and remediate unhealthy Docker containers"
35 changes: 24 additions & 11 deletions README.md
@@ -1,4 +1,4 @@
# Docker Autoheal
# Docker-Autoheal

[![GitHubRelease][GitHubReleaseBadge]][GitHubReleaseLink]
[![DockerPublishing][DockerPublishingBadge]][DockerLink]
@@ -21,24 +21,27 @@ The `docker-autoheal` binary may be executed in a native OS or from a Docker con

| Variable | Default | Description |
|:----------------------------:|:------------------------:|:-----------------------------------------------------:|
| **AUTOHEAL_CONNECTON_TYPE** | local | This determines how `docker-autoheal` connects to Docker (One of: local, socket, http, ssl |
| **AUTOHEAL_CONTAINER_LABEL** | autoheal | This is the container label that `docker-autoheal` will use as filter criteria for monitoring - or set to `all` to simply monitor all containers on the host |
| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts (override via label; see below) |
| **AUTOHEAL_CONNECTION_TYPE** | local | This determines how `docker-autoheal` connects to Docker (One of: local, socket, http, ssl) |
| **AUTOHEAL_CONTAINER_LABEL** | autoheal | This is the container label that `docker-autoheal` will use as filter criteria for monitoring - or set to `all` to simply monitor all containers on the host |
| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts (override via label; see below) |
| **AUTOHEAL_INTERVAL** | 5 | Check container health every `n` seconds |
| **AUTOHEAL_START_DELAY** | 0 | Wait `n` seconds before first health check |
| **AUTOHEAL_POST_ACTION** | | The absolute path of an executable to be run after restart attempts; container `name`, `id` and `stop-timeout` are passed as arguments |
| **AUTOHEAL_LOG_EXCLUDED** | FALSE | Allow (`TRUE`/`FALSE`) logging (and webhook/apprise if set) for containers with `autoheal.restart.exclude=true` |
| **AUTOHEAL_TCP_HOST** | localhost | Address of Docker host |
| **AUTOHEAL_TCP_PORT** | 2375 (ssl: 2376) | Port on which to connect to the Docker host |
| **AUTOHEAL_TCP_TIMEOUT** | 10 | Time in `n` seconds before failing connection attempt |
| **AUTOHEAL_PEM_PATH** | /opt/docker-autoheal/tls | Fully qualified path to requisite ssl certificate files (key.pem, cert.pem, ca.pem) when `AUTOHEAL_CONNECTION_TYPE=ssl` |
| **AUTOHEAL_APPRISE_URL** | |URL to post messages to the apprise following actions on unhealthy container |
| **AUTOHEAL_WEBHOOK_KEY** | |KEY to post messages to the webhook following actions on unhealthy container |
| **AUTOHEAL_WEBHOOK_URL** | |URL to post messages to the webhook following actions on unhealthy container |
| **AUTOHEAL_PEM_PATH** | /opt/docker-autoheal/tls | Fully qualified path to requisite ssl certificate files (key.pem, cert.pem, ca.pem) when `AUTOHEAL_CONNECTION_TYPE=ssl` |
| **AUTOHEAL_APPRISE_URL** | |URL to post messages to the apprise following actions on unhealthy container |
| **AUTOHEAL_WEBHOOK_KEY** | |KEY to post messages to the webhook following actions on unhealthy container |
| **AUTOHEAL_WEBHOOK_URL** | |URL to post messages to the webhook following actions on unhealthy container |

### Optional Container Labels

| Label | Description |
|:----------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------:|
| **autoheal.stop.timeout** | Per container override (in seconds) of `AUTOHEAL_STOP_TIMEOUT` during restart (e.g. some container routinely takes longer to cleanly exit) |
| Label | Description | Example |
|:----------------------------:|:-------------------------------------------------------------------------:|:---:|
| **autoheal.stop.timeout** | Per container override (in seconds) of `AUTOHEAL_STOP_TIMEOUT` during restart | Some container routinely takes longer to cleanly exit |
| **autoheal.restart.exclude** | Per container override (true/false) to `AUTOHEAL_CONTAINER_LABEL` | If you have a large number of containers that you wish to monitor and restart, apply this label as `true` to the few that you do not wish to restart and set `AUTOHEAL_CONTAINER_LABEL` to `all` |
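
A note on label values: Rust's `str::parse::<bool>()` accepts only the exact strings `true` and `false`, so a label written as `TRUE` does not parse as a boolean. The following is a minimal sketch of a case-insensitive lookup, assuming the label name `autoheal.restart.exclude` used in `looper.rs` and the changelog. `label_flag` is a hypothetical helper for illustration and is not part of `docker-autoheal`.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of docker-autoheal): reads a boolean
/// container label, accepting any letter case and surrounding whitespace.
/// Missing labels, missing keys and unrecognised values all yield `false`.
fn label_flag(labels: Option<&HashMap<String, String>>, key: &str) -> bool {
    labels
        .and_then(|l| l.get(key))
        .map(|v| v.trim().eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}

fn main() {
    let mut labels = HashMap::new();
    labels.insert("autoheal.restart.exclude".to_string(), "TRUE".to_string());

    // A plain `parse::<bool>()` rejects upper-case input.
    println!("plain parse ok: {}", "TRUE".parse::<bool>().is_ok());
    // The helper treats it as true.
    println!(
        "excluded: {}",
        label_flag(Some(&labels), "autoheal.restart.exclude")
    );
}
```

Whether to accept mixed case is a design choice; the stricter alternative is to document lower-case `true`/`false` as the only valid values.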

### Binary Options

@@ -72,6 +75,11 @@ Options:
The webhook json key string
-w, --webhook-url <WEBHOOK_URL>
The webhook url
--post-action <SCRIPT_PATH>
The fully qualified path to a script that should be
executed after container restart
--log-excluded Log unhealthy, but restart excluded containers
(WARNING, this could be chatty)
-h, --help Print help
-v, --version Print version information
```
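
The post-action executable receives the container `name`, `id` and `stop-timeout` as positional arguments, in that order. As a rough sketch of what such an executable could look like when written in Rust (the struct and function names here are hypothetical, chosen only for this example):

```rust
use std::env;

/// The three positional arguments passed to the post-action executable:
/// container name, container id, stop timeout in seconds.
#[derive(Debug, PartialEq)]
struct PostActionArgs {
    name: String,
    id: String,
    stop_timeout: i64,
}

/// Hypothetical parser for this sketch; returns `None` if an argument is
/// missing or the timeout is not an integer.
fn parse_post_action_args(args: &[String]) -> Option<PostActionArgs> {
    match args {
        [name, id, timeout, ..] => Some(PostActionArgs {
            name: name.clone(),
            id: id.clone(),
            stop_timeout: timeout.parse().ok()?,
        }),
        _ => None,
    }
}

fn main() {
    // Skip argv[0], the path of this executable.
    let args: Vec<String> = env::args().skip(1).collect();
    match parse_post_action_args(&args) {
        Some(a) => println!(
            "restart attempted for {} ({}), stop timeout {}s",
            a.name, a.id, a.stop_timeout
        ),
        None => eprintln!("usage: post-action <name> <id> <stop-timeout>"),
    }
}
```

A shell script works equally well; the only contract taken from the source is the argument order.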
@@ -169,6 +177,11 @@ If you need the `docker-autoheal` container timezone to match the local machine,
docker run ... -v /etc/localtime:/etc/localtime:ro
```

### A Word of Caution about Excluding from Restart and Logging Exclusions

- If you exclude a container from restart and enable logging of excluded containers, an unhealthy excluded container produces a log message at every health check for as long as it stays unhealthy
- Additionally, if you have set a webhook or apprise URL in this scenario, those notifications are sent at the same interval as monitoring
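
To give a sense of scale, here is a small sketch of the arithmetic behind this caution. `should_notify` mirrors the `(!autoheal_restart_exclude || log_excluded)` expression from `looper.rs`; `messages_per_hour` is a hypothetical helper that assumes one message per health check for a single container that stays unhealthy.

```rust
/// Mirrors the gating expression used for notifications in `looper.rs`:
/// notify unless the container is excluded and excluded-logging is off.
fn should_notify(restart_excluded: bool, log_excluded: bool) -> bool {
    !restart_excluded || log_excluded
}

/// Hypothetical helper: upper bound on messages per hour for one container
/// that stays unhealthy, assuming one message per health check.
fn messages_per_hour(interval_secs: u64) -> u64 {
    // Guard against a zero interval to avoid dividing by zero.
    3600 / interval_secs.max(1)
}

fn main() {
    // Excluded container with logging of excluded containers enabled.
    if should_notify(true, true) {
        // 5 seconds is the documented default for AUTOHEAL_INTERVAL.
        println!("up to {} messages per hour", messages_per_hour(5));
    }
}
```

With the default interval of 5 seconds this prints `up to 720 messages per hour`, per container and per notification channel.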

## Credits

- [willfarrell](https://github.com/willfarrell)
72 changes: 53 additions & 19 deletions src/execute/looper.rs
@@ -1,42 +1,51 @@
use crate::{
inquire::inspect::inspect_container, inquire::list::containers_list,
report::logging::log_message, report::webhook::notify_webhook, ERROR, INFO, WARNING,
execute::postaction::execute_action,
inquire::{inspect::inspect_container, list::containers_list},
report::{logging::log_message, webhook::notify_webhook},
LoopVariablesList, ERROR, INFO, WARNING,
};
use bollard::{container::RestartContainerOptions, Docker};
use std::time::Duration;

pub async fn start_loop(
autoheal_interval: u64,
autoheal_container_label: String,
autoheal_stop_timeout: isize,
autoheal_apprise_url: String,
autoheal_webhook_key: String,
autoheal_webhook_url: String,
var: LoopVariablesList,
docker: Docker,
) -> Result<(), Box<dyn std::error::Error>> {
// Establish loop interval
let mut interval = tokio::time::interval(Duration::from_secs(autoheal_interval));
let mut interval = tokio::time::interval(Duration::from_secs(var.interval));
loop {
// Gather all unhealthy containers
let containers = containers_list(&autoheal_container_label, docker.clone()).await;
let containers = containers_list(&var.container_label, docker.clone()).await;
// Prepare for concurrent execution
let mut handles = vec![];
// Iterate through suspected unhealthy
for container in containers {
// Prepare reusable objects
let docker_clone = docker.clone();
let apprise_url = autoheal_apprise_url.clone();
let webhook_key = autoheal_webhook_key.clone();
let webhook_url = autoheal_webhook_url.clone();
let apprise_url = var.apprise_url.clone();
let webhook_key = var.webhook_key.clone();
let webhook_url = var.webhook_url.clone();
let post_action = var.post_action.clone();
let log_excluded = var.log_excluded;

// Determine if stop override label
let s = "autoheal.stop.timeout".to_string();
let autoheal_stop_timeout = match container.labels {
Some(label) => match label.get(&s) {
Some(v) => v.parse().unwrap_or(autoheal_stop_timeout),
None => autoheal_stop_timeout,
Some(ref label) => match label.get(&s) {
Some(v) => v.parse().unwrap_or(var.stop_timeout),
None => var.stop_timeout,
},
None => autoheal_stop_timeout,
None => var.stop_timeout,
};

// Determine if excluded
let s = "autoheal.restart.exclude".to_string();
let autoheal_restart_exclude = match container.labels {
Some(ref label) => match label.get(&s) {
Some(v) => v.to_lowercase().parse().unwrap_or(false),
None => false,
},
None => false,
};

// Execute concurrently
@@ -69,6 +78,14 @@ pub async fn start_loop(
name, id
);
log_message(&msg0, ERROR).await;
} else if autoheal_restart_exclude {
if log_excluded {
let msg0 = format!(
"[{}] Container ({}) is unhealthy, however is excluded from restart on request",
name, id
);
log_message(&msg0, WARNING).await;
};
} else {
// Determine failing streak of the unhealthy container
let inspection = inspect_container(docker_clone.clone(), name, &id).await;
@@ -116,16 +133,33 @@ pub async fn start_loop(
};

// Send webhook
if !(webhook_url.is_empty() && webhook_key.is_empty()) {
if !(webhook_url.is_empty() || webhook_key.is_empty())
&& (!autoheal_restart_exclude || log_excluded)
{
let payload = format!("{{\"{}\":\"{}\"}}", &webhook_key, &msg);
notify_webhook(&webhook_url, &payload).await;
}
// Send apprise
if !apprise_url.is_empty() {
if !apprise_url.is_empty() && (!autoheal_restart_exclude || log_excluded) {
let payload =
format!("{{\"title\":\"Docker-Autoheal\",\"body\":\"{}\"}}", &msg);
notify_webhook(&apprise_url, &payload).await;
}
// Execute post-action if not excluded
if !post_action.is_empty() && !autoheal_restart_exclude {
execute_action(post_action, name, id, autoheal_stop_timeout.to_string()).await;
}
}
}
});
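
The webhook and apprise payloads in this file are assembled with `format!`, so a key or message containing `"` or `\` would produce invalid JSON. Below is a minimal std-only sketch of escaping the values first. `json_escape` and `webhook_payload` are hypothetical names for this example; a JSON library such as `serde_json` would be the more robust option.

```rust
/// Hypothetical helper: escapes a string for use inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the same single-key object shape as the webhook payload,
/// with both key and message escaped.
fn webhook_payload(key: &str, msg: &str) -> String {
    format!("{{\"{}\":\"{}\"}}", json_escape(key), json_escape(msg))
}

fn main() {
    let payload = webhook_payload("text", "Container \"web\" was restarted");
    println!("{}", payload);
}
```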
42 changes: 42 additions & 0 deletions src/execute/postaction.rs
@@ -0,0 +1,42 @@
use crate::{report::logging::log_message, ERROR, INFO};
use std::fs;
use std::process::Command;

pub async fn execute_action(post_action: String, name: &str, id: String, timeout: String) {
// Check if the script exists
if fs::metadata(post_action.clone()).is_ok() {
// Execute using Command
let mut command = Command::new(post_action.clone());

// Arguments to the command
command.args([name, &id, &timeout]);

// Execute the command and handle the result
let msg0 = match command.spawn() {
Ok(mut child) => {
// Wait for the child process to finish
match child.wait() {
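// A completed wait only means the process ended; check its exit status
Ok(s) if s.success() => format!(
"[{}] Post-action ({}) for container ({}) was successful",
name, post_action, id
),
Ok(s) => format!(
"[{}] Post-action ({}) for container ({}) exited unsuccessfully: {}",
name, post_action, id, s
),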
Err(e) => format!(
"[{}] Post-action ({}) for container ({}) failed to complete: {}",
name, post_action, id, e
),
}
}
Err(e) => format!(
"[{}] Post-action ({}) for container ({}) failed to start: {}",
name, post_action, id, e
),
};
log_message(&msg0, INFO).await;
} else {
let msg0 = format!(
"[{}] Post-action ({}) for container ({}) not found",
name, post_action, id
);
log_message(&msg0, ERROR).await;
}
}
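
One point worth illustrating: `child.wait()` returning `Ok` only means the process was waited on, not that it exited with code 0. The sketch below shows one way to tell the cases apart with `ExitStatus::code()`. `describe_exit` and `run_post_action` are hypothetical helpers for this example, they use the blocking `Command::status()` call, and the script path in `main` is a placeholder.

```rust
use std::process::Command;

/// Hypothetical helper: turns an exit code (`None` means no exit code,
/// e.g. terminated by a signal on Unix) into a log line and a flag that
/// says whether it should be logged as an error.
fn describe_exit(script: &str, code: Option<i32>) -> (String, bool) {
    match code {
        Some(0) => (format!("Post-action ({}) was successful", script), false),
        Some(c) => (format!("Post-action ({}) exited with code {}", script, c), true),
        None => (format!("Post-action ({}) ended without an exit code", script), true),
    }
}

/// Runs the script with the three positional arguments described in the README.
fn run_post_action(script: &str, name: &str, id: &str, timeout: &str) -> (String, bool) {
    match Command::new(script).args([name, id, timeout]).status() {
        Ok(status) => describe_exit(script, status.code()),
        Err(e) => (format!("Post-action ({}) failed to start: {}", script, e), true),
    }
}

fn main() {
    // Placeholder path; a missing file is reported as "failed to start".
    let (msg, is_error) = run_post_action("/nonexistent/post-action.sh", "web", "abc123", "10");
    println!("{} (error: {})", msg, is_error);
}
```

The returned flag could be used to choose between the `INFO` and `ERROR` log levels instead of logging every outcome at `INFO`.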
16 changes: 16 additions & 0 deletions src/inquire/environment.rs
@@ -15,6 +15,8 @@ pub struct VariablesList {
pub apprise_url: String,
pub webhook_key: String,
pub webhook_url: String,
pub post_action: String,
pub log_excluded: bool,
}

// Get environment variable
@@ -84,6 +86,18 @@ pub async fn get_var(opt: OptionsList) -> VariablesList {
}
},
};
let autoheal_post_action: String = match opt.post_action {
None => get_env("AUTOHEAL_POST_ACTION", ""),
Some(o) => o,
};
// The CLI flag wins; otherwise any value other than "false" (in any letter case) enables logging
let autoheal_log_excluded = opt.log_excluded
|| !get_env("AUTOHEAL_LOG_EXCLUDED", "false").eq_ignore_ascii_case("false");

// Autoheal tcp variables
let autoheal_tcp_host: String = match opt.tcp_host {
@@ -156,6 +170,8 @@ pub async fn get_var(opt: OptionsList) -> VariablesList {
stop_timeout: autoheal_stop_timeout,
interval: autoheal_interval,
start_delay: autoheal_start_delay,
post_action: autoheal_post_action,
log_excluded: autoheal_log_excluded,
tcp_address: autoheal_tcp_address,
tcp_timeout: autoheal_tcp_timeout,
key_path: autoheal_key_path,
15 changes: 15 additions & 0 deletions src/inquire/options.rs
@@ -14,6 +14,8 @@ pub struct OptionsList {
pub apprise_url: Option<String>,
pub webhook_key: Option<String>,
pub webhook_url: Option<String>,
pub post_action: Option<String>,
pub log_excluded: bool,
}

pub fn get_opts(args: Vec<String>) -> OptionsList {
@@ -83,6 +85,17 @@ pub fn get_opts(args: Vec<String>) -> OptionsList {
"<WEBHOOK_KEY>",
);
opts.optopt("w", "webhook-url", "The webhook url", "<WEBHOOK_URL>");
opts.optopt(
"",
"post-action",
"The absolute path to a script that should be executed after container restart",
"<SCRIPT_PATH>",
);
opts.optflag(
"",
"log-excluded",
"Log unhealthy, but restart excluded containers (WARNING, this could be chatty)",
);
opts.optflag("h", "help", "Print help");
opts.optflag("v", "version", "Print version information");

@@ -133,5 +146,7 @@ pub fn get_opts(args: Vec<String>) -> OptionsList {
apprise_url: matches.opt_str("a"),
webhook_key: matches.opt_str("j"),
webhook_url: matches.opt_str("w"),
post_action: matches.opt_str("post-action"),
log_excluded: matches.opt_present("log-excluded"),
}
}
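
The `post-action` and `log-excluded` options parsed here are later merged with the `AUTOHEAL_POST_ACTION` and `AUTOHEAL_LOG_EXCLUDED` environment variables, with the command line taking precedence. A minimal sketch of that precedence as pure functions follows. Both helper names are hypothetical, and the case-insensitive comparison is an assumption of this sketch, chosen so that the documented default `FALSE` disables logging.

```rust
/// Hypothetical helper: the CLI flag wins; otherwise the environment value
/// enables logging unless it is unset or equals "false" in any letter case.
fn resolve_log_excluded(cli_flag: bool, env_value: Option<&str>) -> bool {
    cli_flag || env_value.map_or(false, |v| !v.trim().eq_ignore_ascii_case("false"))
}

/// Hypothetical helper: the CLI option wins over the environment;
/// an empty string means that no post-action is configured.
fn resolve_post_action(cli: Option<String>, env_value: Option<String>) -> String {
    cli.or(env_value).unwrap_or_default()
}

fn main() {
    let env_flag = std::env::var("AUTOHEAL_LOG_EXCLUDED").ok();
    let env_action = std::env::var("AUTOHEAL_POST_ACTION").ok();
    println!(
        "log_excluded = {}",
        resolve_log_excluded(false, env_flag.as_deref())
    );
    println!("post_action = {:?}", resolve_post_action(None, env_action));
}
```

Keeping the precedence rules in pure functions like these makes them easy to unit test without touching the process environment.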