v0.9.0 #80

Merged
merged 10 commits into from
Feb 18, 2024
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

## 0.9.0

### Added

- `post-action` to execute a task post-restart attempt
- `autoheal.restart.exclude` container label as override when `AUTOHEAL_CONTAINER_LABEL` set to `all`
- `log-excluded` as a switch to allow logging of containers excluded from restart

## 0.8.3

### Changed
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "docker-autoheal"
-version = "0.8.3"
+version = "0.9.0"
authors = ["Travis M Knight"]
license = "GPL-3.0"
description = "A cross-platform tool to monitor and remediate unhealthy Docker containers"
35 changes: 24 additions & 11 deletions README.md
@@ -1,4 +1,4 @@
-# Docker Autoheal
+# Docker-Autoheal

[![GitHubRelease][GitHubReleaseBadge]][GitHubReleaseLink]
[![DockerPublishing][DockerPublishingBadge]][DockerLink]
@@ -21,24 +21,27 @@ The `docker-autoheal` binary may be executed in a native OS or from a Docker con

| Variable | Default | Description |
|:----------------------------:|:------------------------:|:-----------------------------------------------------:|
-| **AUTOHEAL_CONNECTON_TYPE** | local | This determines how `docker-autoheal` connects to Docker (One of: local, socket, http, ssl |
-| **AUTOHEAL_CONTAINER_LABEL** | autoheal | This is the container label that `docker-autoheal` will use as filter criteria for monitoring - or set to `all` to simply monitor all containers on the host |
-| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts (override via label; see below) |
+| **AUTOHEAL_CONNECTION_TYPE** | local | This determines how `docker-autoheal` connects to Docker (One of: local, socket, http, ssl) |
+| **AUTOHEAL_CONTAINER_LABEL** | autoheal | This is the container label that `docker-autoheal` will use as filter criteria for monitoring - or set to `all` to simply monitor all containers on the host |
+| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts (override via label; see below) |
| **AUTOHEAL_INTERVAL** | 5 | Check container health every `n` seconds |
| **AUTOHEAL_START_DELAY** | 0 | Wait `n` seconds before first health check |
+| **AUTOHEAL_POST_ACTION** | | The absolute path of an executable to be run after restart attempts; container `name`, `id` and `stop-timeout` are passed as arguments |
+| **AUTOHEAL_LOG_EXCLUDED** | FALSE | Allow (`TRUE`/`FALSE`) logging (and webhook/apprise if set) for containers with `autoheal.restart.exclude=true` |
| **AUTOHEAL_TCP_HOST** | localhost | Address of Docker host |
| **AUTOHEAL_TCP_PORT** | 2375 (ssl: 2376) | Port on which to connect to the Docker host |
| **AUTOHEAL_TCP_TIMEOUT** | 10 | Time in `n` seconds before failing connection attempt |
-| **AUTOHEAL_PEM_PATH** | /opt/docker-autoheal/tls | Fully qualified path to requisite ssl certificate files (key.pem, cert.pem, ca.pem) when `AUTOHEAL_CONNECTION_TYPE=ssl` |
-| **AUTOHEAL_APPRISE_URL** | |URL to post messages to the apprise following actions on unhealthy container |
-| **AUTOHEAL_WEBHOOK_KEY** | |KEY to post messages to the webhook following actions on unhealthy container |
-| **AUTOHEAL_WEBHOOK_URL** | |URL to post messages to the webhook following actions on unhealthy container |
+| **AUTOHEAL_PEM_PATH** | /opt/docker-autoheal/tls | Fully qualified path to requisite ssl certificate files (key.pem, cert.pem, ca.pem) when `AUTOHEAL_CONNECTION_TYPE=ssl` |
+| **AUTOHEAL_APPRISE_URL** | |URL to post messages to the apprise following actions on unhealthy container |
+| **AUTOHEAL_WEBHOOK_KEY** | |KEY to post messages to the webhook following actions on unhealthy container |
+| **AUTOHEAL_WEBHOOK_URL** | |URL to post messages to the webhook following actions on unhealthy container |

### Optional Container Labels

-| Label | Description |
-|:----------------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------:|
-| **autoheal.stop.timeout** | Per container override (in seconds) of `AUTOHEAL_STOP_TIMEOUT` during restart (e.g. some container routinely takes longer to cleanly exit) |
+| Label | Description | Example |
+|:----------------------------:|:-------------------------------------------------------------------------:|:---:|
+| **autoheal.stop.timeout** | Per container override (in seconds) of `AUTOHEAL_STOP_TIMEOUT` during restart | A container that routinely takes longer to exit cleanly |
+| **autoheal.restart.exclude** | Per container override (true/false) to `AUTOHEAL_CONTAINER_LABEL` | If you have a large number of containers that you wish to monitor and restart, apply this label as `true` to the few that you do not wish to restart and set `AUTOHEAL_CONTAINER_LABEL` to `all` |
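
The two label overrides can be sketched as small lookups over a container's label map. This is a minimal illustration and not the code merged in this PR: `stop_timeout_for` and `is_restart_excluded` are hypothetical helper names, and the label key `autoheal.restart.exclude` is taken from `src/execute/looper.rs` and the CHANGELOG entry. The case-insensitive comparison is an assumption added here, because Rust's `str::parse::<bool>()` (which the merged code uses) accepts only lowercase `true` and `false`.

```rust
use std::collections::HashMap;

/// Returns the per-container stop timeout, falling back to the global default
/// when the label is missing or is not a valid integer.
/// (Hypothetical helper, not part of docker-autoheal.)
fn stop_timeout_for(labels: Option<&HashMap<String, String>>, default: isize) -> isize {
    labels
        .and_then(|l| l.get("autoheal.stop.timeout"))
        .and_then(|v| v.trim().parse().ok())
        .unwrap_or(default)
}

/// Returns true when the container opts out of restarts.
/// Assumption: letter case is ignored, so `TRUE` and `true` both count.
fn is_restart_excluded(labels: Option<&HashMap<String, String>>) -> bool {
    labels
        .and_then(|l| l.get("autoheal.restart.exclude"))
        .map(|v| v.trim().eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}
```

With a plain `parse::<bool>()`, a label value of `TRUE` fails to parse and falls back to `false`, so the container would still be restarted; normalising case is one way to avoid that surprise.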

### Binary Options

@@ -72,6 +75,11 @@ Options:
The webhook json key string
-w, --webhook-url <WEBHOOK_URL>
The webhook url
+--post-action <SCRIPT_PATH>
+The fully qualified path to a script that should be
+executed after container restart
+--log-excluded Log unhealthy, but restart excluded containers
+(WARNING, this could be chatty)
-h, --help Print help
-v, --version Print version information
```
@@ -169,6 +177,11 @@ If you need the `docker-autoheal` container timezone to match the local machine,
docker run ... -v /etc/localtime:/etc/localtime:ro
```

+### A Word of Caution about Excluding from Restart and Logging Exclusions

+- If you exclude a container from restart and enable logging of excluded containers (`--log-excluded` or `AUTOHEAL_LOG_EXCLUDED`), a log message is written on every monitoring cycle for as long as that container stays unhealthy
+- If a webhook or apprise URL is also set in this scenario, those notifications are sent at the same monitoring interval
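
To put a number on "chatty", the message rate follows directly from the monitoring interval. The sketch below is only arithmetic under the assumption of one message per monitoring cycle per unhealthy excluded container; `messages_per_hour` is a hypothetical helper, not part of `docker-autoheal`.

```rust
/// Messages emitted per hour for one unhealthy, restart-excluded container,
/// assuming exactly one message per monitoring cycle.
fn messages_per_hour(interval_secs: u64) -> Option<u64> {
    // A zero interval has no meaningful rate, so return None instead of dividing by zero.
    (interval_secs > 0).then(|| 3600 / interval_secs)
}
```

With the default `AUTOHEAL_INTERVAL` of 5 seconds this gives 720 messages per hour for each such container, and the same count again for every webhook or apprise target that is configured.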

## Credits

- [willfarrell](https://github.com/willfarrell)
72 changes: 53 additions & 19 deletions src/execute/looper.rs
@@ -1,42 +1,51 @@
use crate::{
-inquire::inspect::inspect_container, inquire::list::containers_list,
-report::logging::log_message, report::webhook::notify_webhook, ERROR, INFO, WARNING,
+execute::postaction::execute_action,
+inquire::{inspect::inspect_container, list::containers_list},
+report::{logging::log_message, webhook::notify_webhook},
+LoopVariablesList, ERROR, INFO, WARNING,
};
use bollard::{container::RestartContainerOptions, Docker};
use std::time::Duration;

pub async fn start_loop(
-autoheal_interval: u64,
-autoheal_container_label: String,
-autoheal_stop_timeout: isize,
-autoheal_apprise_url: String,
-autoheal_webhook_key: String,
-autoheal_webhook_url: String,
+var: LoopVariablesList,
docker: Docker,
) -> Result<(), Box<dyn std::error::Error>> {
// Establish loop interval
-let mut interval = tokio::time::interval(Duration::from_secs(autoheal_interval));
+let mut interval = tokio::time::interval(Duration::from_secs(var.interval));
loop {
// Gather all unhealthy containers
-let containers = containers_list(&autoheal_container_label, docker.clone()).await;
+let containers = containers_list(&var.container_label, docker.clone()).await;
// Prepare for concurrent execution
let mut handles = vec![];
// Iterate through suspected unhealthy
for container in containers {
// Prepare reusable objects
let docker_clone = docker.clone();
-let apprise_url = autoheal_apprise_url.clone();
-let webhook_key = autoheal_webhook_key.clone();
-let webhook_url = autoheal_webhook_url.clone();
+let apprise_url = var.apprise_url.clone();
+let webhook_key = var.webhook_key.clone();
+let webhook_url = var.webhook_url.clone();
+let post_action = var.post_action.clone();
+let log_excluded = var.log_excluded;

// Determine if stop override label
let s = "autoheal.stop.timeout".to_string();
let autoheal_stop_timeout = match container.labels {
-Some(label) => match label.get(&s) {
-Some(v) => v.parse().unwrap_or(autoheal_stop_timeout),
-None => autoheal_stop_timeout,
+Some(ref label) => match label.get(&s) {
+Some(v) => v.parse().unwrap_or(var.stop_timeout),
+None => var.stop_timeout,
},
-None => autoheal_stop_timeout,
+None => var.stop_timeout,
};

+// Determine if excluded
+let s = "autoheal.restart.exclude".to_string();
+let autoheal_restart_exclude = match container.labels {
+Some(ref label) => match label.get(&s) {
+Some(v) => v.parse().unwrap_or(false),
+None => false,
+},
+None => false,
+};

// Execute concurrently
@@ -69,6 +78,14 @@ pub async fn start_loop(
name, id
);
log_message(&msg0, ERROR).await;
+} else if autoheal_restart_exclude {
+if log_excluded {
+let msg0 = format!(
+"[{}] Container ({}) is unhealthy, however is excluded from restart on request",
+name, id
+);
+log_message(&msg0, WARNING).await;
+};
} else {
// Determine failing streak of the unhealthy container
let inspection = inspect_container(docker_clone.clone(), name, &id).await;
@@ -116,16 +133,33 @@ pub async fn start_loop(
};

// Send webhook
-if !(webhook_url.is_empty() && webhook_key.is_empty()) {
+if !(webhook_url.is_empty() || webhook_key.is_empty())
+&& (!autoheal_restart_exclude || log_excluded)
+{
let payload = format!("{{\"{}\":\"{}\"}}", &webhook_key, &msg);
notify_webhook(&webhook_url, &payload).await;
}
// Send apprise
-if !apprise_url.is_empty() {
+if !apprise_url.is_empty() && (!autoheal_restart_exclude || log_excluded) {
let payload =
format!("{{\"title\":\"Docker-Autoheal\",\"body\":\"{}\"}}", &msg);
notify_webhook(&apprise_url, &payload).await;
}
+// Execute post-action if not excluded
+match post_action.is_empty() {
+false => {
+if !autoheal_restart_exclude {
+execute_action(
+post_action,
+name,
+id,
+autoheal_stop_timeout.to_string(),
+)
+.await;
+}
+}
+true => {}
+}
}
}
});
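
The revised notification conditions in this hunk can be read as two small predicates. The following is a minimal sketch rather than the merged code: `should_notify_webhook` and `should_notify_apprise` are hypothetical helper names introduced for illustration. It shows that `!(url.is_empty() || key.is_empty())` requires both webhook settings to be present, whereas the earlier `&&` form also fired when only one of the two was set.

```rust
/// True when a webhook message should be sent.
/// Both the URL and the JSON key must be configured, and the container must
/// either be eligible for restart or have excluded-logging switched on.
fn should_notify_webhook(url: &str, key: &str, excluded: bool, log_excluded: bool) -> bool {
    // By De Morgan, !(a.is_empty() || b.is_empty()) == !a.is_empty() && !b.is_empty().
    let configured = !url.is_empty() && !key.is_empty();
    configured && (!excluded || log_excluded)
}

/// True when an apprise message should be sent; apprise needs only a URL.
fn should_notify_apprise(url: &str, excluded: bool, log_excluded: bool) -> bool {
    !url.is_empty() && (!excluded || log_excluded)
}
```

Pulling the conditions into named functions like this would also make them unit-testable without a Docker connection, which is one possible design choice and not something this PR does.
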
42 changes: 42 additions & 0 deletions src/execute/postaction.rs
@@ -0,0 +1,42 @@
use crate::{report::logging::log_message, ERROR, INFO};
use std::fs;
use std::process::Command;

pub async fn execute_action(post_action: String, name: &str, id: String, timeout: String) {
// Check if the script exists
if fs::metadata(post_action.clone()).is_ok() {
// Execute using Command
let mut command = Command::new(post_action.clone());

// Arguments to the command
command.args([name, &id, &timeout]);

// Execute the command and handle the result
let msg0 = match command.spawn() {
Ok(mut child) => {
// Wait for the child process to finish
match child.wait() {
Ok(_s) => format!(
"[{}] Post-action ({}) for container ({}) was successful",
name, post_action, id
),
Err(e) => format!(
"[{}] Post-action ({}) for container ({}) failed to complete: {}",
name, post_action, id, e
),
}
}
Err(e) => format!(
"[{}] Post-action ({}) for container ({}) failed to start: {}",
name, post_action, id, e
),
};
log_message(&msg0, INFO).await;
} else {
let msg0 = format!(
"[{}] Post-action ({}) for container ({}) not found",
name, post_action, id
);
log_message(&msg0, ERROR).await;
}
}
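
In the file above, any `Ok` result from `child.wait()` is reported as successful without looking at the exit status, and every outcome, including a failure to start, is logged at `INFO`. The sketch below shows one possible way to separate the three cases using only the standard library. It is an illustration under assumptions, not the project's implementation: `Outcome`, `run_post_action` and `level_for` are hypothetical names, and the level strings stand in for the crate's `INFO` and `ERROR` constants.

```rust
use std::process::Command;

/// Possible results of a post-action run (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    Succeeded,
    /// The script ran but exited unsuccessfully; on Unix, `None` means it was
    /// terminated by a signal and therefore has no exit code.
    Failed(Option<i32>),
    /// The script could not be started (missing file, not executable, ...).
    NotStarted(String),
}

fn run_post_action(path: &str, args: &[&str]) -> Outcome {
    // `status()` spawns the child and waits for it, like `spawn()` followed by `wait()`.
    match Command::new(path).args(args).status() {
        Ok(status) if status.success() => Outcome::Succeeded,
        Ok(status) => Outcome::Failed(status.code()),
        Err(e) => Outcome::NotStarted(e.to_string()),
    }
}

/// Maps an outcome to a log level name; only success is reported as INFO.
fn level_for(outcome: &Outcome) -> &'static str {
    match outcome {
        Outcome::Succeeded => "INFO",
        _ => "ERROR",
    }
}
```

Note that `std::process::Command` blocks the calling thread while it waits, both here and in the merged code. Inside an async function, `tokio::process::Command` (behind tokio's `process` feature) is an alternative worth considering for long-running scripts.
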
16 changes: 16 additions & 0 deletions src/inquire/environment.rs
@@ -15,6 +15,8 @@ pub struct VariablesList {
pub apprise_url: String,
pub webhook_key: String,
pub webhook_url: String,
pub post_action: String,
pub log_excluded: bool,
}

// Get environment variable
@@ -84,6 +86,18 @@ pub async fn get_var(opt: OptionsList) -> VariablesList {
}
},
};
let autoheal_post_action: String = match opt.post_action {
None => get_env("AUTOHEAL_POST_ACTION", ""),
Some(o) => o,
};
let mut autoheal_log_excluded = false;
if !opt.log_excluded {
if get_env("AUTOHEAL_LOG_EXCLUDED", "false") != "false" {
autoheal_log_excluded = true
}
} else {
autoheal_log_excluded = true
}

// Autoheal tcp variables
let autoheal_tcp_host: String = match opt.tcp_host {
@@ -156,6 +170,8 @@ pub async fn get_var(opt: OptionsList) -> VariablesList {
stop_timeout: autoheal_stop_timeout,
interval: autoheal_interval,
start_delay: autoheal_start_delay,
post_action: autoheal_post_action,
log_excluded: autoheal_log_excluded,
tcp_address: autoheal_tcp_address,
tcp_timeout: autoheal_tcp_timeout,
key_path: autoheal_key_path,
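
The precedence applied to the two new settings in this file (command-line option first, then environment variable, then default) can be sketched with two small functions. This is a hedged illustration, not the merged code: `resolve_string` and `resolve_flag` are hypothetical names, and the case-insensitive comparison is an assumption. The hunk above compares the environment value with the literal `"false"`, so whether an uppercase `FALSE` (the default shown in the README table) is treated as off depends on how `get_env` normalises its result, which is not visible in this diff.

```rust
/// A command-line value wins over the environment, which wins over the default.
fn resolve_string(cli: Option<String>, env: Option<String>, default: &str) -> String {
    cli.or(env).unwrap_or_else(|| default.to_string())
}

/// The flag is on if it was passed on the command line, or if the environment
/// value is "true" in any letter case. Anything else, including "FALSE", is off.
fn resolve_flag(cli_flag: bool, env: Option<&str>) -> bool {
    cli_flag || env.map_or(false, |v| v.trim().eq_ignore_ascii_case("true"))
}
```

A caller could pass `std::env::var("AUTOHEAL_LOG_EXCLUDED").ok().as_deref()` as the `env` argument of `resolve_flag`, and `std::env::var("AUTOHEAL_POST_ACTION").ok()` as the `env` argument of `resolve_string`.
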
15 changes: 15 additions & 0 deletions src/inquire/options.rs
@@ -14,6 +14,8 @@ pub struct OptionsList {
pub apprise_url: Option<String>,
pub webhook_key: Option<String>,
pub webhook_url: Option<String>,
pub post_action: Option<String>,
pub log_excluded: bool,
}

pub fn get_opts(args: Vec<String>) -> OptionsList {
@@ -83,6 +85,17 @@ pub fn get_opts(args: Vec<String>) -> OptionsList {
"<WEBHOOK_KEY>",
);
opts.optopt("w", "webhook-url", "The webhook url", "<WEBHOOK_URL>");
opts.optopt(
"",
"post-action",
"The absolute path to a script that should be executed after container restart",
"<SCRIPT_PATH>",
);
opts.optflag(
"",
"log-excluded",
"Log unhealthy, but restart excluded containers (WARNING, this could be chatty)",
);
opts.optflag("h", "help", "Print help");
opts.optflag("v", "version", "Print version information");

@@ -133,5 +146,7 @@ pub fn get_opts(args: Vec<String>) -> OptionsList {
apprise_url: matches.opt_str("a"),
webhook_key: matches.opt_str("j"),
webhook_url: matches.opt_str("w"),
post_action: matches.opt_str("post-action"),
log_excluded: matches.opt_present("log-excluded"),
}
}