Fallback to smartctl temp if hddtemp not available
reefland committed Jul 30, 2022
1 parent 0e8da5f commit 888ef40
Showing 2 changed files with 43 additions and 30 deletions.
53 changes: 31 additions & 22 deletions 36-diskstatus
@@ -3,11 +3,13 @@
#
# DESCRIPTION: Report status of each drive installed, including temperature as
# reported by HDDtemp and drive testing status from smartctl.
# If HDDtemp is not available or no temp is found, the script will
# fall back to scraping smartctl for temperature information.
#
# Originally Based on: https://github.com/yboetz/motd
#
# AUTHOR : Richard J. DURSO
# DATE : 07/29/2022
# DATE : 07/30/2022
# VERSION : 1.8.0
##############################################################################

@@ -97,6 +99,7 @@ if [[ ! -z $fetch_hddtemp ]]; then
fetch_hddtemp=$(echo -n | nc $hddtemp_host $hddtemp_port |sed 's/|//m' | sed 's/||/ \n/g')
done
else
# This value is not tested for, might be handy in debugging
hddtemp="none"
fi

@@ -125,42 +128,48 @@ for i in "${!disksalias[@]}"; do #for every /dev/sdX device name
break
fi

# Uncomment this to discover other reasons for no test results
# If this shows nothing, then data has rotated out of where "$logfiles" is checking.
# tac $logfiles 2>/dev/null | grep -m 1 -HiP "${name}.*self-test"
# Uncomment this to discover other reasons for no test results
# If this shows nothing, then data has rotated out of where "$logfiles" is checking.
# tac $logfiles 2>/dev/null | grep -m 1 -HiP "${name}.*self-test"

# Still no result, device not being monitored? See if smartctl has a status we can show
result=$(smartctl -a /dev/${disksalias[$i]} | awk '/^SMART.*result:/{print $(NF)}')
fi
# Still no result, device not being monitored? See if smartctl has a status we can show
result=$(smartctl -a /dev/${disksalias[$i]} | awk '/^SMART.*result:/{print $(NF)}')
fi
done

# See if no hddtemp data
if [ "$hddtemp" == "none" ]; then
# See if NVMe device
if [ $(echo ${disksalias[$i]} | grep -ci "nvme") -eq 1 ]; then
# Assume something like: "Temperature: 37 Celsius" is returned
devicetmp=$(smartctl -A /dev/${disksalias[$i]} | awk '/^Temperature:/{print $2, $3}')
temp=$(echo $devicetmp | awk '{print $1}')
unit=$(echo $devicetmp | awk '{print substr($2,1,1)}') #Only need 1st character
else
# if NVMe device get temp and unit from smartctl
if [ $(echo ${disksalias[$i]} | grep -ci "nvme") -eq 1 ]; then
# Assume something like: "Temperature: 37 Celsius" is returned
devicetmp=$(smartctl -A /dev/${disksalias[$i]} | awk '/^Temperature:/{print $2, $3}')
temp=$(echo $devicetmp | awk '{print $1}')
unit=$(echo $devicetmp | awk '{print substr($2,1,1)}') #Only need 1st character

# Not nvme see if we have hddtemp data
else
# Get Temperature and Unit
temp=$( (grep "${disksalias[$i]}" <<< "${hddtemp}") | cut -d "|" -f3)
# Get The Unit from the temperature
unit=$( (grep "${disksalias[$i]}" <<< "${hddtemp}") | cut -d "|" -f4)
unit=${unit% } # Trim trailing space if it has it

# If no temp data collected from hddtemp, fallback to smartctl
if [ -z "$temp" ]; then
# Assume something like:
# "194 Temperature_Celsius 0x0022 025 055 000 Old_age Always - 25 (Min/Max 22/44)"
# "190 Airflow_Temperature_Cel 0x0032 070 056 000 Old_age Always - 30"
temp=$(smartctl -A /dev/${disksalias[$i]} | awk '/.*Temperature_Cel.*/{print $10}')
unit="C"
fi
fi

# if we have a temp see if we need to convert to F
if [ -n "$temp" ]; then
# See if we need to convert C to F
if $($convert_c_to_f) && ([ "$unit" == "C" ] || [ "$unit" == "c" ])
then
temp=$(expr 9 '*' $temp / 5 + 32)
unit="F"
fi
else
# Get Temperature and Unit
temp=$( (grep "${disksalias[$i]}" <<< "${hddtemp}") | cut -d "|" -f3)
# Get The Unit from the temperature
unit=$( (grep "${disksalias[$i]}" <<< "${hddtemp}") | cut -d "|" -f4)
unit=${unit% } # Trim trailing space if it has it
fi

# Determine if MAX_TEMP is based on C or F
20 changes: 12 additions & 8 deletions README.md
@@ -6,7 +6,7 @@ Collection of 'Message of the Day' scripts with ZFS Enhancements

* [update-motd](https://launchpad.net/update-motd)
* [figlet](http://www.figlet.org/) & [lolcat](https://github.com/busyloop/lolcat) (for `10-hostname`)
* [hddtemp](https://savannah.nongnu.org/projects/hddtemp/) (for `36-diskstatus`)
* [hddtemp](https://savannah.nongnu.org/projects/hddtemp/) (for `36-diskstatus`) [optional]
* [smartmontools](https://www.smartmontools.org/) (for `36-diskstatus`)

### How do I set it up?
@@ -26,7 +26,16 @@ so that the logs are not compressed and can be read by `grep`.
![screen_shot](screen_shot.png)

---
The HDDTemp project is falling behind. It lacks database entries for many devices, including ones that are no longer new. It should be straightforward to add sensors for SATA SSD devices, but HDDTemp has no NVMe support (see below for a workaround).

## HDDTemp not Required

The `hddtemp` utility was once the primary way to monitor drive temperatures. However, the HDDTemp project is considered dead, is no longer maintained, and is no longer included in many distribution repositories. If HDDTemp is not installed, or it returns no temperature for a drive, this script will fall back to scraping `smartctl` output for temperature information.
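
As a rough illustration of that fallback, the sketch below parses the attribute table printed by `smartctl -A` for a SATA drive. The function name `parse_sata_temp` is hypothetical and not part of `36-diskstatus`; the sample rows are the ones quoted in the script's comments. Unlike the script, which prints every matching row, this sketch assumes only the first temperature attribute is wanted and stops there.

```shell
#!/bin/sh
# Hypothetical helper (not in 36-diskstatus): read `smartctl -A` attribute
# table text on stdin and print the raw value (column 10) of the first
# attribute whose name contains "Temperature_Cel".
parse_sata_temp() {
    awk '/Temperature_Cel/ { print $10; exit }'
}

# Sample rows taken from the comments in 36-diskstatus.
sample='194 Temperature_Celsius 0x0022 025 055 000 Old_age Always - 25 (Min/Max 22/44)
190 Airflow_Temperature_Cel 0x0032 070 056 000 Old_age Always - 30'

temp=$(printf '%s\n' "$sample" | parse_sata_temp)
unit="C"    # smartctl reports these attributes in Celsius
echo "${temp:-unknown}${unit}"
```

On a real system the sample text would be replaced by something like `smartctl -A /dev/sda | parse_sata_temp`, which normally requires root privileges.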

---

## Adding Sensors to HDDTemp

Since the HDDTemp project is no longer maintained, it lacks database entries for many devices, including ones that are no longer new. It should be straightforward to add sensors for SATA SSD devices, but HDDTemp has no NVMe support.

If `hddtemp` is unable to locate a temperature sensor but `smartctl` shows a sensor attribute exists, it can be added:

@@ -106,12 +115,7 @@ Should you have many devices and one reports `FAILED` having part of the serial

### NVMe Device Temperature

The HDD Temp utility does not support NVMe devices. If the `36-diskstatus` script detects an NVMe device, it will try to get the temperature from `smartctl` by looking for `Temperature:` and parsing a value such as `37 Celsius`.

![NVME Shows Temp Status](nvme_status_untested.png)

* If you see `untested` (or `PASSED`), no previous test results could be parsed from the log files. Review the `/etc/smartd.conf` file to see if the device is part of the testing schedule.
* If the script is unable to locate any previous test results, it returns the current `smartctl` value of `SMART overall-health self-assessment test result:`, such as `PASSED`.
The HDDTemp utility does not support NVMe devices. If the `36-diskstatus` script detects an NVMe device, it will try to get the temperature from `smartctl` by looking for the `Temperature:` line and parsing a value such as `38 Celsius`.
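
The parsing step can be sketched as follows. The function name `parse_nvme_temp` is hypothetical, and the sample text is an illustrative excerpt rather than captured output; only the `Temperature: 38 Celsius` line is taken from this README. The pattern is anchored on `Temperature:` so that per-sensor lines such as `Temperature Sensor 1:` are not picked up.

```shell
#!/bin/sh
# Hypothetical helper (not in 36-diskstatus): read `smartctl -A` text for an
# NVMe device on stdin and print "<value> <first letter of unit>".
parse_nvme_temp() {
    awk '/^Temperature:/ { print $2, substr($3, 1, 1); exit }'
}

# Illustrative excerpt; real smartctl output contains many more fields.
sample='Critical Warning:                   0x00
Temperature:                        38 Celsius
Available Spare:                    100%
Temperature Sensor 1:               40 Celsius'

result=$(printf '%s\n' "$sample" | parse_nvme_temp)
temp=${result% *}   # text before the space: numeric value
unit=${result#* }   # text after the space: unit letter
echo "${temp}${unit}"
```

With the sample above, `temp` is `38` and `unit` is `C`, which matches the value and unit pair the script extracts before deciding whether to convert to Fahrenheit.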

![NVMe Test Status](nvme_status_passed.png)
