Add SSD Health feature design (#378)
* Add SSD Health feature design
  Signed-off-by: Andriy Moroz <[email protected]>
* Fix base class path
  Signed-off-by: Andriy Moroz <[email protected]>
1 parent
0c262e8
commit 03b39a8
## Motivation
Add the ability to check storage health state to SONiC. Basic functionality will be implemented as a CLI command. Optionally, a pmon daemon could be added for continuous disk state monitoring.

## CLI

### Syntax
    show platform ssdhealth [verbose/vendor]

### Output example
#### Brief
    admin@sonic-switch: ~$ show platform ssdhealth
    Device Model : InnoDisk Corp. - mSATA 3ME
    Health: 72.9%
    Temperature: N/A
    admin@sonic-switch: ~$

#### Verbose
    admin@sonic-switch: ~$ show platform ssdhealth verbose
    Device Model     : InnoDisk Corp. - mSATA 3ME
    FW Version       : S140714
    Serial Number    : 20160429AA1134000035
    Health           : 72.9%
    Capacity         : 29.818199 GB
    Temperature      : N/A
    Power On Hours   : 1576 hours
    Power Cycle count: 130
    Something else???

#### Vendor
    admin@sonic-switch: ~$ show platform ssdhealth vendor

    ********************************************************************************************
    * Innodisk iSMART V3.9.41 2018/05/25 *
    ********************************************************************************************
    Model Name: InnoDisk Corp. - mSATA 3ME
    FW Version: S140714
    Serial Number: 20160429AA1134000035
    Health: 72.900%
    Capacity: 29.818199 GB
    P/E Cycle: 3000
    Lifespan : 1576 (Years : 4 Months : 3 Days : 26)
    Write Protect: Disable
    InnoRobust: Enable
    --------------------------------------------------------------------------------------------
    ID   SMART Attributes              Value     Raw Value
    --------------------------------------------------------------------------------------------
    [09] Power On Hours                [18304]   [090200646480470000000000]
    [0C] Power Cycle Count             [  130]   [0C0200646482000000000000]
    [AA] Total Bad Block Count         [   15]   [AA0300646400000F00000000]
    [AD] Erase Count Max.              [  883]   [AD020064642D037303000000]
    [AD] Erase Count Avg.              [  813]   [AD020064642D037303000000]
    [C2] Temperature                   [    0]   [000000000000000000000000]
    [EB] Later Bad Block               [    0]   [EB0200640000000000000000]
    [EB] Read Block                    [    0]   [EB0200640000000000000000]
    [EB] Write Block                   [    0]   [EB0200640000000000000000]
    [EB] Erase Block                   [    0]   [EB0200640000000000000000]
    [EC] Unstable Power Count          [    0]   [EC0200646400000000000000]
    admin@sonic-switch: ~$

## Implementation
### Generic part
#### 'show' utility update
A new item will be added under the `platform` menu in `show/main.py`.
It will execute `ssdhealth -d /dev/sdX [options]`.

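The mapping from the `show` sub-options to the `ssdhealth` flags could look roughly like the sketch below. The function name `build_ssdhealth_cmd` and the `mode` parameter are hypothetical and exist only for illustration; the flags `-d`, `-v` and `-e` come from the `ssdhealth` syntax in this document. How the disk device is selected is not specified here, so the caller passes it explicitly.

```python
def build_ssdhealth_cmd(device, mode=None):
    """Return the argv list the 'show' handler would execute.

    mode is None (brief), "verbose" or "vendor", mirroring
    `show platform ssdhealth [verbose/vendor]`.
    """
    # Hypothetical mapping of CLI keywords to ssdhealth flags.
    flags = {None: [], "verbose": ["-v"], "vendor": ["-e"]}
    if mode not in flags:
        raise ValueError("unknown mode: {!r}".format(mode))
    return ["ssdhealth", "-d", device] + flags[mode]


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(" ".join(build_ssdhealth_cmd("/dev/sda", "verbose")))
```

Returning an argument list rather than a shell string lets the handler run it without a shell, which avoids quoting problems with the device path.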
#### ssdhealth utility
A new utility in `sonic-utilities/scripts/`.
It will import the device plugin `ssdutil.py` and print the output returned by the different API functions.

**Syntax:**

    root@mts-sonic-dut:/home/admin# ssdhealth -h
    usage: ssdhealth -d DEVICE [-h] [-v] [-e]

    Show disk device health status

    optional arguments:
      -h, --help     show this help message and exit
      -d, --device   disk device to get information for
      -v, --verbose  show verbose output (more parameters)
      -e, --vEndor   show vendor specific disk information

    Examples:
      ssdhealth -d /dev/sda
      ssdhealth -d /dev/sda -v
      ssdhealth -d /dev/sda -e

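The option set above could be expressed with Python's standard `argparse` module roughly as follows. This is a minimal sketch, not the actual script: the function name `build_parser` and the `dest="vendor"` attribute name are assumptions, while the option strings and help texts are copied from the usage output above.

```python
import argparse


def build_parser():
    """Build a parser matching the usage text shown above."""
    parser = argparse.ArgumentParser(
        prog="ssdhealth",
        description="Show disk device health status",
    )
    parser.add_argument("-d", "--device", required=True,
                        help="disk device to get information for")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="show verbose output (more parameters)")
    # dest is set explicitly so the attribute is args.vendor, not args.vEndor.
    parser.add_argument("-e", "--vEndor", dest="vendor", action="store_true",
                        help="show vendor specific disk information")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.device, args.verbose, args.vendor)
```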
#### Plugins design
##### Class SsdBase
Location: `sonic-buildimage/src/sonic-platform-common/sonic_platform_base/sonic_ssd/ssd_base.py`
Generic implementation of the API. It will use specific utilities for known disks or the `smartctl` utility for others. Since not all disk models are in smartctl's database, some information may be unavailable or incomplete.

    class SsdBase:
        ...

##### Class SsdUtil
Inherited from SsdBase. Can be implemented by vendors to provide detailed information about the installed disk.
Location: `sonic-buildimage/device/{{vendor}}/platform/plugins/ssdutil.py`

    class SsdUtil(SsdBase):
        ...

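To illustrate the inheritance relationship, the sketch below shows a base class whose getters report "not available" and a vendor plugin that overrides two of them. Only the class and method names come from this document; the method bodies, the `SAMPLE_VENDOR_OUTPUT` constant and the regular expression are assumptions. A real plugin would run the vendor tool instead of returning canned text.

```python
import re

# Canned excerpt of the vendor output shown earlier; stands in for
# the output of the real vendor utility.
SAMPLE_VENDOR_OUTPUT = """Model Name: InnoDisk Corp. - mSATA 3ME
FW Version: S140714
Health: 72.900%
"""


class SsdBase:
    """Generic fallbacks: every getter reports 'not available'."""

    def get_disk_health(self, diskdev):
        return -1.0

    def get_vendor_output(self, diskdev):
        return ""


class SsdUtil(SsdBase):
    """Hypothetical vendor plugin overriding the generic behaviour."""

    def get_vendor_output(self, diskdev):
        # Assumption: a real plugin would invoke the vendor tool here.
        return SAMPLE_VENDOR_OUTPUT

    def get_disk_health(self, diskdev):
        # Extract the percentage from a line such as "Health: 72.900%".
        match = re.search(r"^Health:\s*([\d.]+)%",
                          self.get_vendor_output(diskdev), re.MULTILINE)
        return float(match.group(1)) if match else -1.0


if __name__ == "__main__":
    print(SsdBase().get_disk_health("/dev/sda"))
    print(SsdUtil().get_disk_health("/dev/sda"))
```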
#### API
* **get\_disk\_health(diskdev)**
    * Accepts:
        * diskdev:string - disk device name (e.g. /dev/sda)
    * Returns:
        * res:float - Floating-point value in the range 0-100 representing disk health as a percentage. -1 if not available
* **get\_temperature(diskdev)**
    * Accepts:
        * diskdev:string - disk device name (e.g. /dev/sda)
    * Returns:
        * res:int - Integer (floating point?) disk temperature in degrees Celsius. Zero if not available
* **get\_model(diskdev)**
    * Accepts:
        * diskdev:string - disk device name (e.g. /dev/sda)
    * Returns:
        * res:string - Human readable string holding disk model. Empty if not available
* **get\_firmware(diskdev)**
    * Accepts:
        * diskdev:string - disk device name (e.g. /dev/sda)
    * Returns:
        * res:string - Human readable string holding disk firmware version. Empty if not available
* **get\_serial(diskdev)**
    * Accepts:
        * diskdev:string - disk device name (e.g. /dev/sda)
    * Returns:
        * res:string - Human readable string holding disk serial number. Empty if not available
* **get\_vendor\_output(diskdev)**
    * Accepts:
        * diskdev:string - disk device name (e.g. /dev/sda)
    * Returns:
        * res:string - Human readable string. Output of vendor application. Empty if not available

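As a rough illustration of how the generic string getters could be backed by `smartctl`, the sketch below parses the `Key: value` lines printed by `smartctl -i`. The helper names (`parse_smartctl_info`, `smartctl_info`) and the sample text are assumptions; the field labels (`Device Model`, `Serial Number`, `Firmware Version`) are the ones smartctl typically prints for ATA devices and may differ for other device types.

```python
import subprocess

# Assumed sample of `smartctl -i` output, reusing values from this document.
SAMPLE_INFO = """Device Model:     InnoDisk Corp. - mSATA 3ME
Serial Number:    20160429AA1134000035
Firmware Version: S140714
"""


def parse_smartctl_info(text):
    """Turn 'Key: value' lines into a dict; lines without ':' are skipped."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def smartctl_info(diskdev):
    """Run `smartctl -i` and parse its output (requires smartmontools)."""
    # The exit status of smartctl is a bit mask, so it is not checked here.
    result = subprocess.run(["smartctl", "-i", diskdev],
                            capture_output=True, text=True)
    return parse_smartctl_info(result.stdout)


def get_model(info):
    # An empty string means "not available", as the API list specifies.
    return info.get("Device Model", "")


if __name__ == "__main__":
    print(get_model(parse_smartctl_info(SAMPLE_INFO)))
```

`get_firmware` and `get_serial` would follow the same pattern with the `Firmware Version` and `Serial Number` keys.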
## Utilities and packages
#### smartctl
Part of the smartmontools package (1.9M)
PR: [https://github.com/Azure/sonic-buildimage/pull/2703](https://github.com/Azure/sonic-buildimage/pull/2703)

#### iSmart
Utility for InnoDisk Corp. SSDs (<120K)
https://www.innodisk.com/en/iService/utility/iSMART
Needs to be added as a binary.

#### SmartCmd
Utility for StorFly and Virtium (2.2M)

## (Optional) Daemon for monitoring
A daemon in pmon (ssdmond) which will periodically query disk health (`get_disk_health()`) and raise an alarm when the value decreases to some critical value.

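The periodic check could be sketched as below. Everything here is an assumption made for illustration: the function names, the threshold of 10 percent, the polling interval, and reporting by `print`. The document does not define the critical value or the alarm mechanism. The only detail taken from the API section is that a negative health value means "not available".

```python
import time


def check_health(get_disk_health, diskdev, threshold):
    """Classify one reading as 'ok', 'alarm' or 'unknown'."""
    health = get_disk_health(diskdev)
    if health < 0:
        # -1 means the health value is not available.
        return "unknown"
    return "alarm" if health <= threshold else "ok"


def monitor(get_disk_health, diskdev, threshold=10.0, interval=3600,
            iterations=None, sleep=time.sleep):
    """Poll disk health; iterations=None means run forever."""
    count = 0
    while iterations is None or count < iterations:
        status = check_health(get_disk_health, diskdev, threshold)
        if status == "alarm":
            # Placeholder for the real alarm mechanism.
            print("ALARM: {} health at or below {}%".format(diskdev, threshold))
        count += 1
        if iterations is None or count < iterations:
            sleep(interval)


if __name__ == "__main__":
    # Single check against a fake reading of 5%.
    monitor(lambda dev: 5.0, "/dev/sda", iterations=1)
```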
## Open questions
1. Daemon and monitoring?
2. SNMP needed?