smart: Gather S.M.A.R.T. information from storage devices #2449
@@ -0,0 +1,135 @@
# Telegraf S.M.A.R.T. plugin

Get metrics using the command line utility `smartctl` for S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) storage devices. SMART is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs) that detects and reports on various indicators of drive reliability, with the intent of enabling the anticipation of hardware failures.
See smartmontools (https://www.smartmontools.org/).

If no devices are specified, the plugin will scan for SMART devices via the following command:

```
smartctl --scan
```
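
As a rough illustration of what consuming the scan result could look like, the sketch below extracts one device spec per line. It assumes that each scan line has the form `<device> -d <type> # <comment>`, and that an `excludes` entry is matched against the device path only. The function name `parse_scan_output` and the sample text are hypothetical and not taken from the plugin.

```python
def parse_scan_output(text, excludes=()):
    """Return device specs such as '/dev/sda -d scsi' from scan output."""
    devices = []
    for line in text.splitlines():
        # Assumption: everything after '#' is a human-readable comment.
        spec = line.split("#", 1)[0].strip()
        if not spec:
            continue
        # Assumption: an exclude entry is compared with the device path only.
        if spec.split()[0] in excludes:
            continue
        devices.append(spec)
    return devices


# Illustrative sample, not captured from a real system.
sample = (
    "/dev/sda -d scsi # /dev/sda, SCSI device\n"
    "/dev/pass6 -d scsi # /dev/pass6, SCSI device\n"
)
print(parse_scan_output(sample, excludes=["/dev/pass6"]))
```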

Metrics will be reported from the following `smartctl` command:

```
smartctl --info --attributes --health -n <nocheck> --format=brief <device>
```

This plugin supports _smartmontools_ version 5.41 and above. Versions 5.41 and 5.42
might require changing `nocheck`; see the comment in the sample configuration.
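
The command line above can be assembled from the configuration values roughly as follows. This is a minimal sketch, not the plugin's implementation: `build_smartctl_args` is a hypothetical helper, and both the splitting of a device entry on whitespace and the use of `sudo -n` (non-interactive) are assumptions.

```python
def build_smartctl_args(device, nocheck="standby", path="smartctl", use_sudo=False):
    """Build the argument list for one smartctl invocation."""
    args = [path, "--info", "--attributes", "--health",
            "-n", nocheck, "--format=brief"]
    # Assumption: a device entry may carry its own options,
    # e.g. "/dev/ada0 -d atacam", so it is split on whitespace.
    args.extend(device.split())
    if use_sudo:
        # Assumption: -n makes sudo fail instead of prompting for a password.
        args = ["sudo", "-n"] + args
    return args


print(" ".join(build_smartctl_args("/dev/ada0 -d atacam", nocheck="never")))
```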

To enable SMART on a storage device, run:

```
smartctl -s on <device>
```

## Measurements

- smart_device:

  * Tags:
    - `capacity`
    - `device`
    - `device_model`
      Review comment: I think just
    - `enabled`
    - `health`
    - `serial_no`
    - `wwn`
  * Fields:
    - `exit_status`
    - `health_ok`
    - `read_error_rate`
    - `seek_error`
    - `temp_c`
    - `udma_crc_errors`

- smart_attribute:

  * Tags:
    - `device`
    - `fail`
    - `flags`
    - `id`
    - `name`
    - `serial_no`
    - `wwn`
  * Fields:
    - `exit_status`
    - `raw_value`
    - `threshold`
    - `value`
    - `worst`

### Flags

The interpretation of the tag `flags` is:
- *K* auto-keep
- *C* event count
- *R* error rate
- *S* speed/performance
- *O* updated online
- *P* prefailure warning
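
The flag letters listed above can be expanded into their meanings with a small lookup. The sketch below assumes that a `-` in the tag value means the flag is not set, as in the `flags=-O-RC-` example from the sample output; `decode_flags` is a hypothetical helper name.

```python
FLAG_MEANINGS = {
    "K": "auto-keep",
    "C": "event count",
    "R": "error rate",
    "S": "speed/performance",
    "O": "updated online",
    "P": "prefailure warning",
}


def decode_flags(flags):
    """Return the meanings of the flags set in a value such as '-O-RC-'."""
    # Assumption: '-' marks an unset flag and is skipped.
    return [FLAG_MEANINGS[c] for c in flags if c in FLAG_MEANINGS]


print(decode_flags("-O-RC-"))  # ['updated online', 'error rate', 'event count']
```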

### Exit Status

The `exit_status` field captures the exit status of the `smartctl` command, which
is a bitmask. For the interpretation of the bitmask, see the man page for
smartctl.
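
As an illustration, the bitmask can be decoded bit by bit. The descriptions below are paraphrased summaries of the eight bits documented in the smartctl man page and should be checked against the man page of the installed version; `decode_exit_status` is a hypothetical helper.

```python
# Paraphrased from the smartctl man page; index in the list = bit number.
EXIT_STATUS_BITS = [
    "command line did not parse",
    "device open failed or device is in a low-power mode",
    "a SMART or ATA command failed, or a checksum error occurred",
    "SMART status check returned DISK FAILING",
    "prefail attributes found at or below threshold",
    "attributes were at or below threshold at some time in the past",
    "device error log contains records of errors",
    "device self-test log contains records of errors",
]


def decode_exit_status(status):
    """Return the descriptions of all bits set in an exit_status value."""
    return [desc for bit, desc in enumerate(EXIT_STATUS_BITS)
            if status & (1 << bit)]


print(decode_exit_status(0))   # []
print(decode_exit_status(64))  # ['device error log contains records of errors']
```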

### Device Names

Device names, e.g., `/dev/sda`, are *not persistent* and may
change across reboots or system changes. Instead, you can use the
*World Wide Name* (WWN) or serial number to identify devices. On Linux, block
devices can be referenced by WWN under
`/dev/disk/by-id/`.
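
To see which device name a WWN currently points to, the symlinks in that directory can be resolved. The sketch below assumes the usual Linux naming of `wwn-<id>` for whole disks and `wwn-<id>-partN` for partitions; `wwn_to_device` is a hypothetical helper and returns an empty mapping on systems without that directory.

```python
import glob
import os


def wwn_to_device(by_id_dir="/dev/disk/by-id"):
    """Map each WWN found in by_id_dir to the device node it resolves to."""
    mapping = {}
    for link in sorted(glob.glob(os.path.join(by_id_dir, "wwn-*"))):
        name = os.path.basename(link)
        # Assumption: partition links carry a '-partN' suffix; skip them.
        if "-part" in name:
            continue
        mapping[name[len("wwn-"):]] = os.path.realpath(link)
    return mapping


for wwn, device in wwn_to_device().items():
    print(wwn, "->", device)
```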

## Configuration

```toml
# Read metrics from storage devices supporting S.M.A.R.T.
[[inputs.smart]]
  ## Optionally specify the path to the smartctl executable
  # path = "/usr/bin/smartctl"
  #
  ## On most platforms smartctl requires root access.
  ## Setting 'use_sudo' to true will make use of sudo to run smartctl.
  ## Sudo must be configured to allow the telegraf user to run smartctl
  ## without a password.
  # use_sudo = false
  #
  ## Skip checking disks in this power mode. Defaults to
  ## "standby" to not wake up disks that have stopped rotating.
  ## See --nocheck in the man pages for smartctl.
  ## smartctl versions 5.41 and 5.42 have faulty detection of
  ## power mode and might require changing this value to
  ## "never" depending on your storage device.
  # nocheck = "standby"
  #
  ## Gather detailed metrics for each SMART Attribute.
  ## Defaults to "false"
  ##
  # attributes = false
  #
  ## Optionally specify devices to exclude from reporting.
  # excludes = [ "/dev/pass6" ]
  #
  ## Optionally specify devices and device type. If unset,
  ## a scan (smartctl --scan) for S.M.A.R.T. devices will be
  ## done and all devices found will be included except for
  ## those listed in excludes.
  # devices = [ "/dev/ada0 -d atacam" ]
```

To run `smartctl` with `sudo` create a wrapper script and use `path` in
the configuration to execute that.

Review comment: Can't we make this into a config file bool? Here's my wrapper so far, yields

Review comment: Your wrapper has to pass all arguments, so:

    #!/usr/bin/env bash
    sudo /usr/sbin/smartctl $@

Exit code 1 means command line parsing failed for

If the maintainers like to have that I can add it but IMHO it's unnecessary when you have

Review comment: I think it would be a nice touch to have sudo support, it's just a little more convenient. You can add a `use_sudo` field like we did in [fail2ban](https://github.com/influxdata/telegraf/blob/ca9cec2c84e7c8796c2e8a747d17d1ad86ce1ae6/plugins/inputs/fail2ban/README.md#configuration), or it might be more readable and extensible to have something like ansible: `become_method = "sudo"`

Review comment: Added

## Output

Example output from an _Apple SSD_:
```
> smart_attribute,serial_no=S1K5NYCD964433,wwn=5002538655584d30,id=199,name=UDMA_CRC_Error_Count,flags=-O-RC-,fail=-,host=mbpro.local,device=/dev/rdisk0 threshold=0i,raw_value=0i,exit_status=0i,value=200i,worst=200i 1502536854000000000
> smart_attribute,device=/dev/rdisk0,serial_no=S1K5NYCD964433,wwn=5002538655584d30,id=240,name=Unknown_SSD_Attribute,flags=-O---K,fail=-,host=mbpro.local exit_status=0i,value=100i,worst=100i,threshold=0i,raw_value=0i 1502536854000000000
> smart_device,enabled=Enabled,host=mbpro.local,device=/dev/rdisk0,model=APPLE\ SSD\ SM0512F,serial_no=S1K5NYCD964433,wwn=5002538655584d30,capacity=500277790720 udma_crc_errors=0i,exit_status=0i,health_ok=true,read_error_rate=0i,temp_c=40i 1502536854000000000
```
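
For readers who want to inspect such lines programmatically, the sketch below splits one line into measurement name, tags, fields and timestamp. It is a simplified reader rather than a full line protocol parser: it assumes backslash escapes for spaces, commas and equals signs, handles only integer (`0i`), boolean and float field values, and does not support quoted string fields. All function names are hypothetical.

```python
def split_unescaped(text, sep):
    """Split text on sep, ignoring separators preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair intact
            i += 2
            continue
        if ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unescape(text):
    return text.replace("\\ ", " ").replace("\\,", ",").replace("\\=", "=")


def parse_value(raw):
    # Assumption: only booleans, integers ("40i") and floats occur.
    if raw in ("true", "false"):
        return raw == "true"
    if raw.endswith("i"):
        return int(raw[:-1])
    return float(raw)


def parse_line(line):
    line = line.strip()
    if line.startswith("> "):
        line = line[2:]  # drop the prompt marker shown in the example output
    head, fields, timestamp = split_unescaped(line, " ")
    name, *tag_pairs = split_unescaped(head, ",")
    tags = {}
    for pair in tag_pairs:
        key, value = split_unescaped(pair, "=")
        tags[unescape(key)] = unescape(value)
    values = {}
    for pair in split_unescaped(fields, ","):
        key, value = pair.split("=", 1)
        values[key] = parse_value(value)
    return name, tags, values, int(timestamp)


sample = (r"> smart_device,enabled=Enabled,host=mbpro.local,device=/dev/rdisk0,"
          r"model=APPLE\ SSD\ SM0512F,serial_no=S1K5NYCD964433,"
          r"wwn=5002538655584d30,capacity=500277790720 "
          r"udma_crc_errors=0i,exit_status=0i,health_ok=true,"
          r"read_error_rate=0i,temp_c=40i 1502536854000000000")
name, tags, values, ts = parse_line(sample)
print(name, tags["model"], values["temp_c"], values["health_ok"])
```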

Review comment: This is actually not generating any output for the attribute checker on RHEL7 using smartctl 6.2. See below...

Review comment: This is probably the most important thing to figure out. Did the format change or does this drive just not have any attributes?