Skip to content

Commit

Permalink
Merge branch 'mlxsw-core-thermal-control-fixes'
Browse files Browse the repository at this point in the history
Ido Schimmel says:

====================
mlxsw: core: Thermal control fixes

This series includes two fixes for thermal control in mlxsw.

Patch #1 validates that the alarm temperature threshold read from a
transceiver is above the warning temperature threshold. If not, the
current thresholds are maintained. It was observed that some transceiver
might be unreliable and sometimes report a too low alarm temperature
threshold which would result in thermal shutdown of the system.

Patch #2 increases the temperature threshold above which thermal
shutdown is triggered for the ASIC thermal zone. It is currently too low
and might result in thermal shutdown under perfectly fine operational
conditions.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
  • Loading branch information
kuba-moo committed Jan 10, 2021
2 parents b774134 + b06ca3d commit 26c49f0
Showing 1 changed file with 8 additions and 5 deletions.
13 changes: 8 additions & 5 deletions drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
#define MLXSW_THERMAL_ASIC_TEMP_NORM 75000 /* 75C */
#define MLXSW_THERMAL_ASIC_TEMP_HIGH 85000 /* 85C */
#define MLXSW_THERMAL_ASIC_TEMP_HOT 105000 /* 105C */
#define MLXSW_THERMAL_ASIC_TEMP_CRIT 110000 /* 110C */
#define MLXSW_THERMAL_ASIC_TEMP_CRIT 140000 /* 140C */
#define MLXSW_THERMAL_HYSTERESIS_TEMP 5000 /* 5C */
#define MLXSW_THERMAL_MODULE_TEMP_SHIFT (MLXSW_THERMAL_HYSTERESIS_TEMP * 2)
#define MLXSW_THERMAL_ZONE_MAX_NAME 16
Expand Down Expand Up @@ -176,6 +176,12 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
if (err)
return err;

if (crit_temp > emerg_temp) {
dev_warn(dev, "%s : Critical threshold %d is above emergency threshold %d\n",
tz->tzdev->type, crit_temp, emerg_temp);
return 0;
}

/* According to the system thermal requirements, the thermal zones are
* defined with four trip points. The critical and emergency
* temperature thresholds, provided by QSFP module are set as "active"
Expand All @@ -190,11 +196,8 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
tz->trips[MLXSW_THERMAL_TEMP_TRIP_NORM].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HIGH].temp = crit_temp;
tz->trips[MLXSW_THERMAL_TEMP_TRIP_HOT].temp = emerg_temp;
if (emerg_temp > crit_temp)
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
MLXSW_THERMAL_MODULE_TEMP_SHIFT;
else
tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp;

return 0;
}
Expand Down

0 comments on commit 26c49f0

Please sign in to comment.