Merge pull request #18 from tmknight/develop
v0.3.0
tmknight authored Jan 14, 2024
2 parents 437cd3e + a3fb45c commit 6683464
Showing 7 changed files with 411 additions and 202 deletions.
18 changes: 17 additions & 1 deletion Cargo.lock

(Generated file; diff not rendered.)

45 changes: 23 additions & 22 deletions Cargo.toml
@@ -1,22 +1,23 @@
[package]
name = "docker-autoheal"
version = "0.2.7"
authors = ["Travis M Knight <[email protected]>"]
license = "MIT"
description = "Monitor and restart unhealthy docker containers"
readme = "README.md"
homepage = "https://github.com/tmknight/docker-autoheal"
edition = "2021"
rust-version = "1.74.1"

[dependencies]
bollard = "*"
chrono = "0.4.*"
futures = "0.3.*"
rustls = "0.22.*"
tokio = { version = "1.*", features = ["full"] }

[[bin]]
name = "docker-autoheal"
bench = true
test = true
[package]
name = "docker-autoheal"
version = "0.3.0"
authors = ["Travis M Knight <[email protected]>"]
license = "MIT"
description = "Monitor and restart unhealthy docker containers"
readme = "README.md"
homepage = "https://github.com/tmknight/docker-autoheal"
edition = "2021"
rust-version = "1.74.1"

[dependencies]
bollard = "*"
chrono = "0.4.*"
futures = "0.3.*"
rustls = "0.22.*"
tokio = { version = "1.*", features = ["full"] }
getopts = "0.2.*"

[[bin]]
name = "docker-autoheal"
bench = true
test = true
65 changes: 45 additions & 20 deletions README.md
@@ -8,21 +8,20 @@ The `docker-autoheal` binary may be executed via a native OS or via a Docker con

## ENV Defaults

| Variable | Default | Description |
|:----------------------------:|:---------------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------:|
| **AUTOHEAL_CONNECTON_TYPE** | local | This determines how `docker-autheal` connects to Docker (One of: local, socket, http |
| Variable | Default | Description |
|:----------------------------:|:---------------------:|:-----------------------------------------------------:|
| **AUTOHEAL_CONNECTION_TYPE** | local | This determines how `docker-autoheal` connects to Docker (one of: local, socket, http, ssl) |
| **AUTOHEAL_CONTAINER_LABEL** | autoheal | This is the container label that `docker-autoheal` will use as filter criteria for monitoring - or set to `all` to simply monitor all containers on the host |
| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts <!-- (overridable via label; see below) --> |
| **AUTOHEAL_INTERVAL** | 5 | Check container health every`n` seconds** |
| **AUTOHEAL_START_DELAY** | 0 | Wait `n` seconds before first health check |
| **AUTOHEAL_TCP_HOST** | localhost | Address of Docker host |
| **AUTOHEAL_TCP_PORT** | 2375 | Port on which to connect to the Docker host |
| **AUTOHEAL_TCP_TIMEOUT** | 10 | Time in `n` seconds before failing connection attempt |
<!-- | **AUTOHEAL_KEY_PATH** | /opt/docker-autoheal/tls/key.pem | Fully qualified path to key.pem |
<!-- | **AUTOHEAL_KEY_PATH** | /opt/docker-autoheal/tls/key.pem | Fully qualified path to key.pem |
| **AUTOHEAL_CERT_PATH** | /opt/docker-autoheal/tls/cert.pem | Fully qualified path to cert.pem |
| **AUTOHEAL_CA_PATH** | /opt/docker-autoheal/tls/ca.pem | Fully qualified path to ca.pem | -->
<!-- |WEBHOOK_URL | |Post messages to the webhook following actions on unhealthy container | -->
| **AUTOHEAL_STOP_TIMEOUT** | 10 | Docker waits `n` seconds for a container to stop before killing it during restarts <!-- (overridable via label; see below) --> |
| **AUTOHEAL_INTERVAL** | 5 | Check container health every `n` seconds |
| **AUTOHEAL_START_DELAY** | 0 | Wait `n` seconds before first health check |
| **AUTOHEAL_TCP_HOST** | localhost | Address of Docker host |
| **AUTOHEAL_TCP_PORT** | 2375 (ssl: 2376) | Port on which to connect to the Docker host |
| **AUTOHEAL_TCP_TIMEOUT** | 10 | Time in `n` seconds before failing connection attempt |
| **AUTOHEAL_CERT_PATH** | /opt/docker-autoheal/tls | Fully qualified path to requisite ssl certificate files (key.pem, cert.pem, ca.pem) when `AUTOHEAL_CONNECTION_TYPE=ssl` |
<!--
|**WEBHOOK_URL** | |Post messages to the webhook following actions on unhealthy container |
-->

<!--
### Optional Container Labels
@@ -38,12 +37,36 @@ The `docker-autoheal` binary may be executed via a native OS or via a Docker con

- See <https://docs.docker.com/engine/reference/builder/#healthcheck> for details

```bash
Options:
    -c, --connection-type <CONNECTION_TYPE>
                        One of local, socket, http, or ssl
    -l, --container-label <CONTAINER_LABEL>
                        Container label to monitor (e.g. autoheal)
    -t, --stop-timeout <STOP_TIMEOUT>
                        Time in seconds to wait for action to complete
    -i, --interval <INTERVAL>
                        Time in seconds to check health
    -d, --start-delay <START_DELAY>
                        Time in seconds to wait for first check
    -n, --tcp-host <TCP_HOST>
                        The hostname or IP address of the Docker host (when -c
                        http or ssl)
    -p, --tcp-port <TCP_PORT>
                        The tcp port number of the Docker host (when -c http
                        or ssl)
    -k, --cert-path <CERT_PATH>
                        The fully qualified path to requisite ssl PEM files
    -h, --help          Print help
    -v, --version       Print version information
```
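
The options above mirror the `AUTOHEAL_*` variables in the table, but this excerpt does not show how the two sources are combined. The sketch below illustrates one common arrangement, assumed here rather than taken from the commit: a command-line value wins, then the environment value, then the built-in default. `resolve_setting` is a hypothetical name.

```rust
// Hypothetical resolver (not part of the commit): prefer the
// command-line value, then the environment value, then the default.
// The precedence order is an assumption made for illustration.
pub fn resolve_setting(cli: Option<String>, env: Option<String>, default: &str) -> String {
    cli.or(env).unwrap_or_else(|| default.to_string())
}
```

Under that assumption, passing `--container-label all` would override an exported `AUTOHEAL_CONTAINER_LABEL`, and omitting both would fall back to `autoheal`.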

### Local

```bash
export AUTOHEAL_CONTAINER_LABEL=all
/usr/local/bin/docker-autoheal > /var/log/docker-autoheal.log &
/usr/local/bin/docker-autoheal --container-label all > /var/log/docker-autoheal.log &
```

Will connect to the local Docker host and monitor all containers

### Socket
@@ -57,6 +80,7 @@ docker run -d \
-v /var/run/docker.sock:/var/run/docker.sock \
tmknight/docker-autoheal
```

Will connect to the Docker host via unix socket location /var/run/docker.sock or Windows named pipe location //./pipe/docker_engine and monitor only containers with a label named `autoheal`

### Http
@@ -71,6 +95,7 @@ docker run -d \
-v /path/to/certs/:/certs/:ro \
tmknight/docker-autoheal
```

Will connect to the Docker host via hostname or IP and the specified port and monitor only containers with a label named `watch-me`

## Other info
@@ -85,15 +110,15 @@ OR

c) Set ENV `AUTOHEAL_CONTAINER_LABEL=all` to watch all running containers

<!--
### SSL connection type

See <https://docs.docker.com/engine/security/https/> for how to configure TCP with mTLS

The certificates and keys need these names:

- ca.pem
- client-cert.pem
- client-key.pem
-->
- cert.pem
- key.pem

### Docker timezone

7 changes: 7 additions & 0 deletions src/environment.rs
@@ -0,0 +1,7 @@
// Get environment variable
pub fn get_env(key: &str, default: &str) -> String {
    match std::env::var(key) {
        Ok(val) => val.to_lowercase(),
        Err(_e) => default.to_string().to_lowercase(),
    }
}
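
`get_env` returns every setting as a lowercased `String`, so numeric values such as the interval or stop timeout still have to be parsed by the caller. A minimal sketch of that step follows. `parse_or` and `interval_secs` are hypothetical helpers, not part of this commit, and falling back to the default on unparsable input is an assumption about the desired behaviour.

```rust
use std::str::FromStr;

// Same shape as `get_env` in src/environment.rs: read a variable,
// fall back to a default, and lowercase the result either way.
pub fn get_env(key: &str, default: &str) -> String {
    match std::env::var(key) {
        Ok(val) => val.to_lowercase(),
        Err(_e) => default.to_lowercase(),
    }
}

// Hypothetical helper (not in the commit): parse a raw setting,
// returning `fallback` when the text is not a valid `T`.
pub fn parse_or<T: FromStr>(raw: &str, fallback: T) -> T {
    raw.trim().parse().unwrap_or(fallback)
}

// Hypothetical call site: resolve the check interval in seconds.
pub fn interval_secs() -> u64 {
    parse_or(&get_env("AUTOHEAL_INTERVAL", "5"), 5)
}
```

Because `get_env` lowercases unconditionally, a path-like value such as `AUTOHEAL_CERT_PATH` would also be lowercased, which may matter on case-sensitive filesystems.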
36 changes: 36 additions & 0 deletions src/logging.rs
@@ -0,0 +1,36 @@
use chrono::Local;
use std::io::{stdout, Write};

// Return binary information
pub const NAME: &str = env!("CARGO_PKG_NAME");
pub const VERSION: &str = env!("CARGO_PKG_VERSION");
pub const AUTHORS: &str = env!("CARGO_PKG_AUTHORS");
pub const LICENSE: &str = env!("CARGO_PKG_LICENSE");
pub const DESCRIPTION: &str = env!("CARGO_PKG_DESCRIPTION");
pub const HOMEPAGE: &str = env!("CARGO_PKG_HOMEPAGE");

pub fn print_version() {
    println!("Name: {}", NAME);
    println!("Version: {}", VERSION);
    println!("Authors: {}", AUTHORS);
    println!("License: {}", LICENSE);
    println!("Description: {}", DESCRIPTION);
    println!("Homepage: {}", HOMEPAGE);
    println!();
    println!("This is free software; you are free to change and redistribute it.");
    println!("There is NO WARRANTY, to the extent permitted by law.");
}

// Logging
pub async fn log_message(msg: &str) {
    let date = Local::now().format("%Y-%m-%d %H:%M:%S%z").to_string();
    let mut lock = stdout().lock();
    writeln!(lock, "{} {}", date, msg).unwrap();
}

// todo
// Webhook
// pub async fn webhook (msg: &str) {
// let date = Local::now().format("%Y-%m-%d %H:%M:%S%z").to_string();
// msg;
// }
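
`log_message` formats a line, writes it to a locked stdout, and unwraps the result, so a failed write panics. The sketch below separates the formatting from the destination, which makes the line format testable and lets the caller decide what to do with a write error. `write_log` and `log_to_stdout` are hypothetical names, and the timestamp is passed in as a string so the block needs nothing beyond the standard library (the commit itself uses `chrono`).

```rust
use std::io::{self, Write};

// Hypothetical variant of `log_message`: the caller supplies the
// destination and the already-formatted timestamp, and write errors
// are returned instead of unwrapped.
pub fn write_log<W: Write>(out: &mut W, timestamp: &str, msg: &str) -> io::Result<()> {
    writeln!(out, "{} {}", timestamp, msg)
}

// Hypothetical wrapper: log to a locked stdout, as the commit does.
pub fn log_to_stdout(timestamp: &str, msg: &str) -> io::Result<()> {
    let mut lock = io::stdout().lock();
    write_log(&mut lock, timestamp, msg)
}
```

Writing into a `Vec<u8>` instead of stdout is enough to check the exact line format in a unit test.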
104 changes: 104 additions & 0 deletions src/looper.rs
@@ -0,0 +1,104 @@
use bollard::container::{ListContainersOptions, RestartContainerOptions};
use bollard::Docker;
use std::collections::HashMap;
use std::time::Duration;

use crate::logging::log_message;

pub async fn start_loop(
    autoheal_interval: u64,
    autoheal_container_label: String,
    autoheal_stop_timeout: isize,
    docker: Docker,
) -> Result<(), Box<dyn std::error::Error>> {
    // Establish loop interval
    let mut interval = tokio::time::interval(Duration::from_secs(autoheal_interval));
    loop {
        // Build container assessment criteria
        let mut filters = HashMap::new();
        filters.insert("health", vec!["unhealthy"]);
        filters.insert("status", vec!["running", "exited", "dead"]);
        if autoheal_container_label != "all" {
            filters.insert("label", vec![&autoheal_container_label]);
        }

        // Gather all containers that are unhealthy
        let container_options = Some(ListContainersOptions {
            all: true,
            filters,
            ..Default::default()
        });
        let containers = docker.list_containers(container_options).await?;
        for container in containers {
            // Execute concurrently
            let docker_clone = docker.clone();
            let join = tokio::task::spawn(async move {
                // Get name of container
                let name_tmp = match &container.names {
                    Some(names) => &names[0],
                    None => {
                        let msg0 =
                            String::from("[ERROR] Could not reliably determine container name");
                        log_message(&msg0).await;
                        ""
                    }
                };
                let name = name_tmp.trim_matches('/').trim();

                // Get id of container
                let id: String = match container.id {
                    Some(id) => id.chars().take(12).collect(),
                    None => {
                        let msg0 =
                            String::from("[ERROR] Could not reliably determine container id");
                        log_message(&msg0).await;
                        "".to_string()
                    }
                };

                if !(name.is_empty() && id.is_empty()) {
                    // Report unhealthy container
                    let msg0 = format!("[WARNING] [{}] Container ({}) unhealthy", name, id);
                    log_message(&msg0).await;

                    // Build restart options
                    let restart_options = Some(RestartContainerOptions {
                        t: autoheal_stop_timeout,
                    });

                    // Report container restart
                    let msg1 = format!(
                        "[WARNING] [{}] Restarting container ({}) with {}s timeout",
                        name, id, autoheal_stop_timeout
                    );
                    log_message(&msg1).await;

                    // Restart unhealthy container
                    let rslt = docker_clone.restart_container(&id, restart_options).await;
                    match rslt {
                        Ok(()) => {
                            let msg0 = format!(
                                "[INFO] [{}] Restart of container ({}) was successful",
                                name, id
                            );
                            log_message(&msg0).await;
                        }
                        Err(e) => {
                            let msg0 = format!(
                                "[ERROR] [{}] Restart of container ({}) failed: {}",
                                name, id, e
                            );
                            log_message(&msg0).await;
                        }
                    }
                } else {
                    let msg0 = String::from("[ERROR] Could not reliably identify the container");
                    log_message(&msg0).await;
                }
            });
            join.await?;
        }
        // Loop interval
        interval.tick().await;
    }
}
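
The guard `!(name.is_empty() && id.is_empty())` in the loop is equivalent to `!name.is_empty() || !id.is_empty()`, so it also passes when only one of the two values was found. In particular, a container with a name but no id would still reach `restart_container` with an empty id string. The sketch below isolates the name and id clean-up as pure functions and contrasts the guard as written with a stricter one. All four function names are hypothetical, and requiring a non-empty id is an assumption about the intended behaviour, not something the commit states.

```rust
// Take the first reported name and strip '/' from both ends
// (the Docker API prefixes container names with '/').
pub fn clean_name(names: Option<&[String]>) -> String {
    names
        .and_then(|n| n.first())
        .map(|n| n.trim_matches('/').trim().to_string())
        .unwrap_or_default()
}

// Short (12-character) form of a container id; empty when missing.
pub fn short_id(id: Option<&str>) -> String {
    id.map(|i| i.chars().take(12).collect::<String>())
        .unwrap_or_default()
}

// Guard as written in the commit: true unless BOTH values are empty.
pub fn guard_as_written(name: &str, id: &str) -> bool {
    !(name.is_empty() && id.is_empty())
}

// Stricter guard (assumption): a restart needs a usable id.
pub fn can_restart(id: &str) -> bool {
    !id.is_empty()
}
```

Unlike `&names[0]` in the loop, `first()` also tolerates an empty name list instead of panicking on the index.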