Add smart input plugin for collecting S.M.A.R.T. data (#2449)
rickard-von-essen authored and danielnelson committed Oct 4, 2017
1 parent 002ccf3 commit e69c3f9
Showing 4 changed files with 901 additions and 0 deletions.
1 change: 1 addition & 0 deletions plugins/inputs/all/all.go
@@ -76,6 +76,7 @@ import (
_ "github.com/influxdata/telegraf/plugins/inputs/riak"
_ "github.com/influxdata/telegraf/plugins/inputs/salesforce"
_ "github.com/influxdata/telegraf/plugins/inputs/sensors"
_ "github.com/influxdata/telegraf/plugins/inputs/smart"
_ "github.com/influxdata/telegraf/plugins/inputs/snmp"
_ "github.com/influxdata/telegraf/plugins/inputs/snmp_legacy"
_ "github.com/influxdata/telegraf/plugins/inputs/socket_listener"
135 changes: 135 additions & 0 deletions plugins/inputs/smart/README.md
@@ -0,0 +1,135 @@
# Telegraf S.M.A.R.T. plugin

Get metrics using the command line utility `smartctl` for S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) storage devices. SMART is a monitoring system built into hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports indicators of drive reliability, with the aim of anticipating hardware failures.
See smartmontools (https://www.smartmontools.org/).

If no devices are specified, the plugin will scan for SMART devices via the following command:

```
smartctl --scan
```
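As a rough illustration of how scan results could be turned into a device list, here is a minimal Go sketch. It assumes each line of `smartctl --scan` output has the form `<device> -d <type> # <comment>`, and that entries in `excludes` are matched against the device name only. The function name `parseScan` and the sample input are hypothetical and are not taken from the plugin source.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseScan extracts device specifications from `smartctl --scan` output.
// Assumption: each line looks like "/dev/sda -d scsi # /dev/sda, SCSI device";
// everything after '#' is treated as a comment and dropped.
func parseScan(out string, excludes []string) []string {
	skip := make(map[string]bool, len(excludes))
	for _, e := range excludes {
		skip[e] = true
	}

	var devices []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i]
		}
		spec := strings.TrimSpace(line)
		if spec == "" {
			continue
		}
		// The first field is the device name; the rest are smartctl options.
		if skip[strings.Fields(spec)[0]] {
			continue
		}
		devices = append(devices, spec)
	}
	return devices
}

func main() {
	// Illustrative input only, not captured from a real system.
	sample := "/dev/ada0 -d atacam # /dev/ada0, ATA device\n" +
		"/dev/pass6 -d scsi # /dev/pass6, SCSI device\n"
	for _, d := range parseScan(sample, []string{"/dev/pass6"}) {
		fmt.Println(d)
	}
}
```

Keeping the `-d <type>` part with the device name means each entry can be passed on to `smartctl` in the same form as an entry in the `devices` setting.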

Metrics will be reported from the following `smartctl` command:

```
smartctl --info --attributes --health -n <nocheck> --format=brief <device>
```
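The placeholders in the command above can be filled in programmatically. The following Go sketch builds the argument list for one device. The helper name `smartctlArgs` is hypothetical, and splitting the device entry on whitespace (so that an entry such as `/dev/ada0 -d atacam` becomes three arguments) is an assumption about how such entries would be handled.

```go
package main

import (
	"fmt"
	"strings"
)

// smartctlArgs returns the arguments for the metrics command shown above.
// nocheck is the power mode passed to -n (for example "standby");
// device may carry extra options, e.g. "/dev/ada0 -d atacam".
func smartctlArgs(nocheck, device string) []string {
	args := []string{"--info", "--attributes", "--health", "-n", nocheck, "--format=brief"}
	return append(args, strings.Fields(device)...)
}

func main() {
	args := smartctlArgs("standby", "/dev/ada0 -d atacam")
	fmt.Println("smartctl " + strings.Join(args, " "))
}
```

The resulting slice can be handed to `exec.Command` from the standard `os/exec` package, which avoids shell quoting issues.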

This plugin supports _smartmontools_ version 5.41 and above. Versions 5.41 and 5.42
might require changing the `nocheck` setting; see the comment in the sample configuration.

To enable SMART on a storage device run:

```
smartctl -s on <device>
```

## Measurements

- smart_device:

    * Tags:
      - `capacity`
      - `device`
      - `device_model`
      - `enabled`
      - `health`
      - `serial_no`
      - `wwn`
    * Fields:
      - `exit_status`
      - `health_ok`
      - `read_error_rate`
      - `seek_error`
      - `temp_c`
      - `udma_crc_errors`

- smart_attribute:

    * Tags:
      - `device`
      - `fail`
      - `flags`
      - `id`
      - `name`
      - `serial_no`
      - `wwn`
    * Fields:
      - `exit_status`
      - `raw_value`
      - `threshold`
      - `value`
      - `worst`

### Flags

The interpretation of the tag `flags` is:
- *K* auto-keep
- *C* event count
- *R* error rate
- *S* speed/performance
- *O* updated online
- *P* prefailure warning
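
A `flags` value is a six-character string in which `-` marks an unset flag. The Go sketch below decodes such a string into the meanings listed above. It assumes the positional order `POSRCK`, which matches values such as `-O-RC-` and `-O---K`; the function name `decodeFlags` is hypothetical.

```go
package main

import "fmt"

// flagMeanings lists the flag meanings by position.
// Assumption: positions follow the order P, O, S, R, C, K.
var flagMeanings = []string{
	"prefailure warning", // P
	"updated online",     // O
	"speed/performance",  // S
	"error rate",         // R
	"event count",        // C
	"auto-keep",          // K
}

// decodeFlags returns the meanings of all flags set in a string like "-O-RC-".
func decodeFlags(flags string) []string {
	var set []string
	for i, c := range flags {
		if i < len(flagMeanings) && c != '-' {
			set = append(set, flagMeanings[i])
		}
	}
	return set
}

func main() {
	fmt.Println(decodeFlags("-O-RC-")) // prints [updated online error rate event count]
}
```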

### Exit Status

The `exit_status` field captures the exit status of the smartctl command which
is defined by a bitmask. For the interpretation of the bitmask see the man page for
smartctl.
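
As a sketch of how the bitmask might be decoded, the Go example below maps each set bit to a short description. The descriptions are paraphrased from the smartctl man page and should be checked against the version installed on your system; the names `exitBits` and `decodeExitStatus` are hypothetical.

```go
package main

import "fmt"

// exitBits holds a short description for bits 0 through 7 of the exit status.
// Paraphrased from the smartctl man page; verify against your installed version.
var exitBits = []string{
	"command line did not parse",
	"device open failed, no IDENTIFY DEVICE data, or device in low-power mode",
	"a SMART or other ATA command failed, or a SMART checksum error occurred",
	"SMART status check returned DISK FAILING",
	"prefail attributes found at or below threshold",
	"attributes were at or below threshold at some time in the past",
	"device error log contains records of errors",
	"device self-test log contains records of errors",
}

// decodeExitStatus returns the description of every bit set in status.
func decodeExitStatus(status int) []string {
	var msgs []string
	for bit, msg := range exitBits {
		if status&(1<<uint(bit)) != 0 {
			msgs = append(msgs, msg)
		}
	}
	return msgs
}

func main() {
	// 0x48 has bits 3 and 6 set.
	for _, m := range decodeExitStatus(0x48) {
		fmt.Println(m)
	}
}
```

An `exit_status` of `0` means no bits are set, so the function returns an empty slice.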

### Device Names

Device names, e.g., `/dev/sda`, are *not persistent* and may
change across reboots or system changes. Instead, you can use the
*World Wide Name* (WWN) or serial number to identify devices. On Linux, block
devices can be referenced by their WWN in the following location:
`/dev/disk/by-id/`.

## Configuration

```toml
# Read metrics from storage devices supporting S.M.A.R.T.
[[inputs.smart]]
## Optionally specify the path to the smartctl executable
# path = "/usr/bin/smartctl"
#
## On most platforms smartctl requires root access.
## Setting 'use_sudo' to true will make use of sudo to run smartctl.
## Sudo must be configured to allow the telegraf user to run smartctl
## without a password.
# use_sudo = false
#
## Skip checking disks in this power mode. Defaults to
## "standby" to not wake up disks that have stopped rotating.
## See --nocheck in the man pages for smartctl.
## smartctl version 5.41 and 5.42 have faulty detection of
## power mode and might require changing this value to
## "never" depending on your storage device.
# nocheck = "standby"
#
## Gather detailed metrics for each SMART Attribute.
## Defaults to "false"
##
# attributes = false
#
## Optionally specify devices to exclude from reporting.
# excludes = [ "/dev/pass6" ]
#
## Optionally specify devices and device type. If unset,
## a scan (smartctl --scan) for S.M.A.R.T. devices will be
## done and all devices found will be included, except for
## those listed in excludes.
# devices = [ "/dev/ada0 -d atacam" ]
```

To run `smartctl` with `sudo`, create a wrapper script and set `path` in
the configuration to point to it.

## Output

Example output from an _Apple SSD_:
```
> smart_attribute,serial_no=S1K5NYCD964433,wwn=5002538655584d30,id=199,name=UDMA_CRC_Error_Count,flags=-O-RC-,fail=-,host=mbpro.local,device=/dev/rdisk0 threshold=0i,raw_value=0i,exit_status=0i,value=200i,worst=200i 1502536854000000000
> smart_attribute,device=/dev/rdisk0,serial_no=S1K5NYCD964433,wwn=5002538655584d30,id=240,name=Unknown_SSD_Attribute,flags=-O---K,fail=-,host=mbpro.local exit_status=0i,value=100i,worst=100i,threshold=0i,raw_value=0i 1502536854000000000
> smart_device,enabled=Enabled,host=mbpro.local,device=/dev/rdisk0,model=APPLE\ SSD\ SM0512F,serial_no=S1K5NYCD964433,wwn=5002538655584d30,capacity=500277790720 udma_crc_errors=0i,exit_status=0i,health_ok=true,read_error_rate=0i,temp_c=40i 1502536854000000000
```