v0.3.0 #18

Merged 4 commits on Jan 14, 2024
18 changes: 17 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

45 changes: 23 additions & 22 deletions Cargo.toml
@@ -1,22 +1,23 @@
[package]
name = "docker-autoheal"
version = "0.2.7"
authors = ["Travis M Knight <[email protected]>"]
license = "MIT"
description = "Monitor and restart unhealthy docker containers"
readme = "README.md"
homepage = "https://github.com/tmknight/docker-autoheal"
edition = "2021"
rust-version = "1.74.1"

[dependencies]
bollard = "*"
chrono = "0.4.*"
futures = "0.3.*"
rustls = "0.22.*"
tokio = { version = "1.*", features = ["full"] }

[[bin]]
name = "docker-autoheal"
bench = true
test = true
[package]
name = "docker-autoheal"
version = "0.3.0"
authors = ["Travis M Knight <[email protected]>"]
license = "MIT"
description = "Monitor and restart unhealthy docker containers"
readme = "README.md"
homepage = "https://github.com/tmknight/docker-autoheal"
edition = "2021"
rust-version = "1.74.1"

[dependencies]
bollard = "*"
chrono = "0.4.*"
futures = "0.3.*"
rustls = "0.22.*"
tokio = { version = "1.*", features = ["full"] }
getopts = "0.2.*"

[[bin]]
name = "docker-autoheal"
bench = true
test = true
65 changes: 45 additions & 20 deletions README.md
@@ -8,21 +8,20 @@ The `docker-autoheal` binary may be executed via a native OS or via a Docker con

## ENV Defaults

| Variable | Default | Description |
|:----------------------------:|:---------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------:|
| **AUTOHEAL_CONNECTION_TYPE** | local | This determines how `docker-autoheal` connects to Docker (One of: local, socket, http) |
| Variable | Default | Description |
|:----------------------------:|:---------------------:|:-----------------------------------------------------:|
| **AUTOHEAL_CONNECTION_TYPE** | local | This determines how `docker-autoheal` connects to Docker (One of: local, socket, http, ssl) |
| **AUTOHEAL_CONTAINER_LABEL** | autoheal | This is the container label that `docker-autoheal` will use as filter criteria for monitoring - or set to `all` to simply monitor all containers on the host |
| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts <!-- (overridable via label; see below) --> |
| **AUTOHEAL_INTERVAL** | 5 | Check container health every`n` seconds** |
| **AUTOHEAL_START_DELAY** | 0 | Wait `n` seconds before first health check |
| **AUTOHEAL_TCP_HOST** | localhost | Address of Docker host |
| **AUTOHEAL_TCP_PORT** | 2375 | Port on which to connect to the Docker host |
| **AUTOHEAL_TCP_TIMEOUT** | 10 | Time in `n` seconds before failing connection attempt |
<!-- | **AUTOHEAL_KEY_PATH** | /opt/docker-autoheal/tls/key.pem | Fully qualified path to key.pem |
<!-- | **AUTOHEAL_KEY_PATH** | /opt/docker-autoheal/tls/key.pem | Fully qualified path to key.pem |
| **AUTOHEAL_CERT_PATH** | /opt/docker-autoheal/tls/cert.pem | Fully qualified path to cert.pem |
| **AUTOHEAL_CA_PATH** | /opt/docker-autoheal/tls/ca.pem | Fully qualified path to ca.pem | -->
<!-- |WEBHOOK_URL | |Post messages to the webhook following actions on unhealthy container | -->
| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts <!-- (overridable via label; see below) --> |
| **AUTOHEAL_INTERVAL** | 5 | Check container health every `n` seconds |
| **AUTOHEAL_START_DELAY** | 0 | Wait `n` seconds before first health check |
| **AUTOHEAL_TCP_HOST** | localhost | Address of Docker host |
| **AUTOHEAL_TCP_PORT** | 2375 (ssl: 2376) | Port on which to connect to the Docker host |
| **AUTOHEAL_TCP_TIMEOUT** | 10 | Time in `n` seconds before failing connection attempt |
| **AUTOHEAL_CERT_PATH** | /opt/docker-autoheal/tls | Fully qualified path to requisite ssl certificate files (key.pem, cert.pem, ca.pem) when `AUTOHEAL_CONNECTION_TYPE=ssl` |
<!--
|**WEBHOOK_URL** | |Post messages to the webhook following actions on unhealthy container |
-->
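
The table lists a default `AUTOHEAL_TCP_PORT` of 2375, or 2376 when the connection type is `ssl`. A minimal sketch of that rule follows. The helper names `default_tcp_port` and `resolve_tcp_port` are hypothetical and do not appear in this PR, and falling back to the default on an unparsable port is an assumption rather than documented behavior.

```rust
/// Hypothetical helper (not part of this PR): pick the default Docker TCP
/// port from the connection type, following the README table.
fn default_tcp_port(connection_type: &str) -> u16 {
    match connection_type.to_lowercase().as_str() {
        "ssl" => 2376,
        _ => 2375,
    }
}

/// Hypothetical helper: use the configured port when it parses as a u16,
/// otherwise fall back to the default for the connection type.
/// (Silently falling back on bad input is an assumption.)
fn resolve_tcp_port(connection_type: &str, configured: Option<&str>) -> u16 {
    configured
        .and_then(|raw| raw.trim().parse::<u16>().ok())
        .unwrap_or_else(|| default_tcp_port(connection_type))
}
```

For example, `resolve_tcp_port("ssl", None)` would select 2376, while an explicit `Some("2375")` is honored regardless of the connection type.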

<!--
### Optional Container Labels
@@ -38,12 +37,36 @@ The `docker-autoheal` binary may be executed via a native OS or via a Docker con

- See <https://docs.docker.com/engine/reference/builder/#healthcheck> for details

```bash
Options:
    -c, --connection-type <CONNECTION_TYPE>
                        One of local, socket, http, or ssl
    -l, --container-label <CONTAINER_LABEL>
                        Container label to monitor (e.g. autoheal)
    -t, --stop-timeout <STOP_TIMEOUT>
                        Time in seconds to wait for action to complete
    -i, --interval <INTERVAL>
                        Time in seconds to check health
    -d, --start-delay <START_DELAY>
                        Time in seconds to wait for first check
    -n, --tcp-host <TCP_HOST>
                        The hostname or IP address of the Docker host (when -c
                        http or ssl)
    -p, --tcp-port <TCP_PORT>
                        The tcp port number of the Docker host (when -c http
                        or ssl)
    -k, --cert-path <CERT_PATH>
                        The fully qualified path to requisite ssl PEM files
    -h, --help          Print help
    -v, --version       Print version information
```
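
The README now documents both environment variables and command-line options for the same settings, but this diff does not show which one wins when both are set. The sketch below assumes a common precedence (flag, then environment variable, then default). The function `resolve_setting` is hypothetical, and the lowercasing mirrors what `get_env` does to values.

```rust
/// Hypothetical resolver (not taken from this PR): prefer a command-line
/// value, then an environment value, then the documented default.
/// The precedence order is an assumption; the result is lowercased
/// to match the behavior of `get_env`.
fn resolve_setting(cli: Option<&str>, env: Option<&str>, default: &str) -> String {
    cli.or(env).unwrap_or(default).to_lowercase()
}
```

Under that assumption, running with `--container-label all` while `AUTOHEAL_CONTAINER_LABEL=autoheal` is exported would resolve to `all`.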

### Local

```bash
export AUTOHEAL_CONTAINER_LABEL=all
/usr/local/bin/docker-autoheal > /var/log/docker-autoheal.log &
/usr/local/bin/docker-autoheal --container-label all > /var/log/docker-autoheal.log &
```

Will connect to the local Docker host and monitor all containers

### Socket
@@ -57,6 +80,7 @@ docker run -d \
-v /var/run/docker.sock:/var/run/docker.sock \
tmknight/docker-autoheal
```

Will connect to the Docker host via the Unix socket at `/var/run/docker.sock` (or the Windows named pipe at `//./pipe/docker_engine`) and monitor only containers with a label named `autoheal`

### Http
@@ -71,6 +95,7 @@ docker run -d \
-v /path/to/certs/:/certs/:ro \
tmknight/docker-autoheal
```

Will connect to the Docker host via hostname or IP and the specified port and monitor only containers with a label named `watch-me`

## Other info
@@ -85,15 +110,15 @@ OR

c) Set ENV `AUTOHEAL_CONTAINER_LABEL=all` to watch all running containers

<!--
### SSL connection type

See <https://docs.docker.com/engine/security/https/> for how to configure TCP with mTLS

The certificates and keys need these names:

- ca.pem
- client-cert.pem
- client-key.pem
-->
- cert.pem
- key.pem

### Docker timezone

7 changes: 7 additions & 0 deletions src/environment.rs
@@ -0,0 +1,7 @@
// Get environment variable, falling back to the default (result is lowercased)
pub fn get_env(key: &str, default: &str) -> String {
    match std::env::var(key) {
        Ok(val) => val.to_lowercase(),
        Err(_e) => default.to_lowercase(),
    }
}
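
Note that `get_env` lowercases every value it returns, including the default, so a case-sensitive value (for example a file system path) would be altered if it were read through this helper. The sketch below restates the same fallback logic with the raw value passed in as a parameter, which makes it easy to exercise without touching the process environment, and adds a numeric variant. Both function names are hypothetical and are not part of this PR.

```rust
/// Same fallback and lowercasing as `get_env`, but the raw lookup result is
/// passed in (hypothetical variant, not part of this PR).
fn value_or_default(raw: Option<String>, default: &str) -> String {
    match raw {
        Some(val) => val.to_lowercase(),
        None => default.to_lowercase(),
    }
}

/// Hypothetical helper: parse a value given in seconds, falling back to the
/// default when the value is missing or is not a non-negative integer.
fn seconds_or_default(raw: Option<String>, default: u64) -> u64 {
    raw.and_then(|val| val.trim().parse::<u64>().ok())
        .unwrap_or(default)
}
```

A caller could write `seconds_or_default(std::env::var("AUTOHEAL_INTERVAL").ok(), 5)` to obtain the interval with the README default of 5 seconds.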
36 changes: 36 additions & 0 deletions src/logging.rs
@@ -0,0 +1,36 @@
use chrono::Local;
use std::io::{stdout, Write};

// Return binary information
pub const NAME: &str = env!("CARGO_PKG_NAME");
pub const VERSION: &str = env!("CARGO_PKG_VERSION");
pub const AUTHORS: &str = env!("CARGO_PKG_AUTHORS");
pub const LICENSE: &str = env!("CARGO_PKG_LICENSE");
pub const DESCRIPTION: &str = env!("CARGO_PKG_DESCRIPTION");
pub const HOMEPAGE: &str = env!("CARGO_PKG_HOMEPAGE");

pub fn print_version() {
    println!("Name: {}", NAME);
    println!("Version: {}", VERSION);
    println!("Authors: {}", AUTHORS);
    println!("License: {}", LICENSE);
    println!("Description: {}", DESCRIPTION);
    println!("Homepage: {}", HOMEPAGE);
    println!();
    println!("This is free software; you are free to change and redistribute it.");
    println!("There is NO WARRANTY, to the extent permitted by law.");
}

// Logging
pub async fn log_message(msg: &str) {
    let date = Local::now().format("%Y-%m-%d %H:%M:%S%z").to_string();
    let mut lock = stdout().lock();
    writeln!(lock, "{} {}", date, msg).unwrap();
}

// todo
// Webhook
// pub async fn webhook (msg: &str) {
//     let date = Local::now().format("%Y-%m-%d %H:%M:%S%z").to_string();
//     msg;
// }
104 changes: 104 additions & 0 deletions src/looper.rs
@@ -0,0 +1,104 @@
use bollard::container::{ListContainersOptions, RestartContainerOptions};
use bollard::Docker;
use std::collections::HashMap;
use std::time::Duration;

use crate::logging::log_message;

pub async fn start_loop(
    autoheal_interval: u64,
    autoheal_container_label: String,
    autoheal_stop_timeout: isize,
    docker: Docker,
) -> Result<(), Box<dyn std::error::Error>> {
    // Establish loop interval
    let mut interval = tokio::time::interval(Duration::from_secs(autoheal_interval));
    loop {
        // Build container assessment criteria
        let mut filters = HashMap::new();
        filters.insert("health", vec!["unhealthy"]);
        filters.insert("status", vec!["running", "exited", "dead"]);
        if autoheal_container_label != "all" {
            filters.insert("label", vec![autoheal_container_label.as_str()]);
        }

        // Gather all containers that are unhealthy
        let container_options = Some(ListContainersOptions {
            all: true,
            filters,
            ..Default::default()
        });
        let containers = docker.list_containers(container_options).await?;

        // Spawn one task per container so the restarts run concurrently
        let mut handles = Vec::new();
        for container in containers {
            let docker_clone = docker.clone();
            handles.push(tokio::task::spawn(async move {
                // Get name of container (first entry, if any; avoids a panic on an empty list)
                let name_tmp = match container.names.as_ref().and_then(|names| names.first()) {
                    Some(first) => first.as_str(),
                    None => {
                        log_message("[ERROR] Could not reliably determine container name").await;
                        ""
                    }
                };
                let name = name_tmp.trim_matches('/').trim();

                // Get id of container (first 12 characters)
                let id: String = match container.id.as_deref() {
                    Some(id) => id.chars().take(12).collect(),
                    None => {
                        log_message("[ERROR] Could not reliably determine container id").await;
                        String::new()
                    }
                };

                // A restart requires an id; the name is only used in log messages
                if !id.is_empty() {
                    // Report unhealthy container
                    let msg0 = format!("[WARNING] [{}] Container ({}) unhealthy", name, id);
                    log_message(&msg0).await;

                    // Build restart options
                    let restart_options = Some(RestartContainerOptions {
                        t: autoheal_stop_timeout,
                    });

                    // Report container restart
                    let msg1 = format!(
                        "[WARNING] [{}] Restarting container ({}) with {}s timeout",
                        name, id, autoheal_stop_timeout
                    );
                    log_message(&msg1).await;

                    // Restart unhealthy container
                    let rslt = docker_clone.restart_container(&id, restart_options).await;
                    match rslt {
                        Ok(()) => {
                            let msg0 = format!(
                                "[INFO] [{}] Restart of container ({}) was successful",
                                name, id
                            );
                            log_message(&msg0).await;
                        }
                        Err(e) => {
                            let msg0 = format!(
                                "[ERROR] [{}] Restart of container ({}) failed: {}",
                                name, id, e
                            );
                            log_message(&msg0).await;
                        }
                    }
                } else {
                    log_message("[ERROR] Could not reliably identify the container").await;
                }
            }));
        }

        // Wait for every restart attempt to finish before the next cycle
        for handle in handles {
            handle.await?;
        }

        // Loop interval
        interval.tick().await;
    }
}
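
The selection logic in `start_loop` can be illustrated in isolation. The sketch below extracts the filter construction and the name and id normalization into pure functions using only the standard library. The function names are hypothetical and owned `String` values are used for simplicity, so this is an illustration of the logic rather than the code in this PR.

```rust
use std::collections::HashMap;

/// Build the filter map described in `start_loop`: unhealthy containers in the
/// running, exited, or dead state, optionally restricted to one label.
/// (Hypothetical extraction; the PR builds this map inline.)
fn build_filters(label: &str) -> HashMap<&'static str, Vec<String>> {
    let mut filters = HashMap::new();
    filters.insert("health", vec!["unhealthy".to_string()]);
    filters.insert(
        "status",
        vec!["running".to_string(), "exited".to_string(), "dead".to_string()],
    );
    // The special label `all` means "do not filter by label"
    if label != "all" {
        filters.insert("label", vec![label.to_string()]);
    }
    filters
}

/// Strip surrounding slashes and whitespace from a container name for logging.
fn display_name(raw: &str) -> &str {
    raw.trim_matches('/').trim()
}

/// Shorten a container id to its first 12 characters, as in the log messages.
fn short_id(id: &str) -> String {
    id.chars().take(12).collect()
}
```

Keeping these steps as pure functions makes the `all` special case and the id truncation testable without a running Docker daemon.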