[Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Research indicates that some products may experience occasional I/O failures in the communication between the CPU and the SSD because of NCQ (Native Command Queuing).
The problem appears to occur between certain kernel versions and certain SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
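To check whether a system shows this failure mode, the log can be filtered for failed queued commands. This is a minimal sketch: the function name find_ncq_errors is hypothetical, and the pattern covers only the two messages quoted above (READ FPDMA QUEUED and WRITE FPDMA QUEUED are the NCQ read and write commands).

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print only the lines
# that report a failed NCQ read or write command.
find_ncq_errors() {
    grep -E 'failed command: (READ|WRITE) FPDMA QUEUED'
}
```

Possible usage, assuming the kernel log is readable: dmesg | find_ncq_errors. The function exits non-zero when no matching line is found.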
Some vendors have already disabled NCQ on their platforms in SONiC because of similar issues:

[Arista] Disable ATA NCQ for a few products #13739
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964
There are also discussions on Debian/Ubuntu forums about similar issues, where disabling NCQ was the suggested fix:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Add the libata.force=noncq kernel parameter, which tells libata to disable NCQ, to the installer.conf of each affected platform.
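After a reboot, one way to confirm that the parameter reached the kernel is to inspect the running command line. This is a sketch under assumptions: has_noncq is a hypothetical helper, and it matches only the exact libata.force=noncq form used in this commit, not comma-separated or per-port variants of libata.force.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel command line contains
# libata.force=noncq as a whole, space-separated parameter.
has_noncq() {
    case " $1 " in
        *" libata.force=noncq "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Possible usage: has_noncq "$(cat /proc/cmdline)" && echo "NCQ disabled via libata.force". As a second check, /sys/block/sda/device/queue_depth is expected to read 1 when NCQ is not in use (sda is an assumption; substitute the actual SSD device).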

- How to verify it
Stress the SSD with the fio tool and check syslog for the errors above: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
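The command above omits a job name and a target, which fio needs to define a job. The sketch below fills them in. Everything beyond the original options is an assumption: the job name ncq-stress, the 1G file size, --time_based, --group_reporting, and the FIO environment variable (a hypothetical hook that allows a dry run by substituting echo).

```shell
#!/bin/sh
# Run the stress test from the commit message against a test file.
# $1: path of the test file (assumption: a file on the SSD under test).
# FIO: hypothetical override for the fio binary, e.g. FIO=echo to
#      print the arguments without running the test.
run_ncq_stress() {
    ${FIO:-fio} --name=ncq-stress --filename="$1" --size=1G \
        --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 \
        --runtime=120 --time_based --numjobs=4 --group_reporting
}
```

Possible usage: FIO=echo run_ncq_stress /var/tmp/fio-ncq.test prints the arguments, and run_ncq_stress /var/tmp/fio-ncq.test runs the test. Afterwards the log can be checked for FPDMA QUEUED failures.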
volodymyrsamotiy authored Jan 28, 2024
1 parent 6a38584 commit f1d6655
Showing 14 changed files with 14 additions and 3 deletions.
2 changes: 1 addition & 1 deletion device/mellanox/x86_64-mlnx_msn2010-r0/installer.conf
@@ -1 +1 @@
-ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="acpi_enforce_resources=lax acpi=noirq"
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="acpi_enforce_resources=lax acpi=noirq libata.force=noncq"
2 changes: 1 addition & 1 deletion device/mellanox/x86_64-mlnx_msn2700-r0/installer.conf
@@ -1 +1 @@
-ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="acpi_enforce_resources=lax acpi=noirq"
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="acpi_enforce_resources=lax acpi=noirq libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn2700a1-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn3420-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn3700-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn3700c-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn3800-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn4410-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn4600-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn4600c-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-mlnx_msn4700-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
2 changes: 1 addition & 1 deletion device/mellanox/x86_64-nvidia_sn2201-r0/installer.conf
@@ -1 +1 @@
-ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="acpi_enforce_resources=lax"
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="acpi_enforce_resources=lax libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-nvidia_sn4800-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"
1 change: 1 addition & 0 deletions device/mellanox/x86_64-nvidia_sn5600-r0/installer.conf
@@ -0,0 +1 @@
+ONIE_PLATFORM_EXTRA_CMDLINE_LINUX="libata.force=noncq"