Skip to content

Commit

Permalink
ethtool: Add ability to control transceiver modules' power mode
Browse files Browse the repository at this point in the history
Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and
'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver
modules parameters and retrieve their status.

The first parameter to control is the power mode of the module. It is
only relevant for paged memory modules, as flat memory modules always
operate in low power mode.

When a paged memory module is in low power mode, its power consumption
is reduced to the minimum, the management interface towards the host is
available and the data path is deactivated.

User space can choose to put modules that are not currently in use in
low power mode and transition them to high power mode before putting the
associated ports administratively up. This is useful for user space that
favors reduced power consumption and lower temperatures over reduced
link up times. In QSFP-DD modules the transition from low power mode to
high power mode can take a few seconds and this transition is only
expected to get longer with future / more complex modules.

User space can control the power mode of the module via the power mode
policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible
values:

* high: Module is always in high power mode.

* auto: Module is transitioned by the host to high power mode when the
  first port using it is put administratively up and to low power mode
  when the last port using it is put administratively down.

The operational power mode of the module is available to user space via
the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not
reported to user space when a module is not plugged-in.

The user API is designed to be generic enough so that it could be used
for modules with different memory maps (e.g., SFF-8636, CMIS).

The only implementation of the device driver API in this series is for a
MAC driver (mlxsw) where the module is controlled by the device's
firmware, but it is designed to be generic enough so that it could also
be used by implementations where the module is controlled by the CPU.

CMIS testing
============

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

The module is not in low power mode, as it is not forced by hardware
(LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off).

The power mode can be queried from the kernel. In case
LowPwrAllowRequestHW was on, the kernel would need to take into account
the state of the LowPwrRequestHW signal, which is not visible to user
space.

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'auto':

 # ethtool --set-module swp11 power-mode-policy auto

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp11 up

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x03 (ModuleReady)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : Off

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp11 down

Query the power mode again:

 $ ethtool --show-module swp11
 Module parameters for swp11:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp11
 Identifier                                : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628))
 ...
 Module State                              : 0x01 (ModuleLowPwr)
 LowPwrAllowRequestHW                      : Off
 LowPwrRequestSW                           : On

SFF-8636 testing
================

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7733 mW / -1.12 dBm
 Transmit avg optical power (Channel 2)    : 0.7649 mW / -1.16 dBm
 Transmit avg optical power (Channel 3)    : 0.7790 mW / -1.08 dBm
 Transmit avg optical power (Channel 4)    : 0.7837 mW / -1.06 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9302 mW / -0.31 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9079 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.8993 mW / -0.46 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8778 mW / -0.57 dBm

The module is not in low power mode, as it is not forced by hardware
(Power override is on) or by software (Power set is off).

The power mode can be queried from the kernel. In case Power override
was off, the kernel would need to take into account the state of the
LPMode signal, which is not visible to user space.

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy high
 power-mode high

Change the power mode policy to 'auto':

 # ethtool --set-module swp13 power-mode-policy auto

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Put the associated port administratively up which will instruct the host
to transition the module to high power mode:

 # ip link set dev swp13 up

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode high

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) enabled
 Power set                                 : Off
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.7934 mW / -1.01 dBm
 Transmit avg optical power (Channel 2)    : 0.7859 mW / -1.05 dBm
 Transmit avg optical power (Channel 3)    : 0.7885 mW / -1.03 dBm
 Transmit avg optical power (Channel 4)    : 0.7985 mW / -0.98 dBm
 Rcvr signal avg optical power(Channel 1)  : 0.9325 mW / -0.30 dBm
 Rcvr signal avg optical power(Channel 2)  : 0.9034 mW / -0.44 dBm
 Rcvr signal avg optical power(Channel 3)  : 0.9086 mW / -0.42 dBm
 Rcvr signal avg optical power(Channel 4)  : 0.8885 mW / -0.51 dBm

Put the associated port administratively down which will instruct the
host to transition the module to low power mode:

 # ip link set dev swp13 down

Query the power mode again:

 $ ethtool --show-module swp13
 Module parameters for swp13:
 power-mode-policy auto
 power-mode low

Verify with the data read from the EEPROM:

 # ethtool -m swp13
 Identifier                                : 0x11 (QSFP28)
 ...
 Extended identifier description           : 5.0W max. Power consumption,  High Power Class (> 3.5 W) not enabled
 Power set                                 : On
 Power override                            : On
 ...
 Transmit avg optical power (Channel 1)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 2)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 3)    : 0.0000 mW / -inf dBm
 Transmit avg optical power (Channel 4)    : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 1)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 2)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 3)  : 0.0000 mW / -inf dBm
 Rcvr signal avg optical power(Channel 4)  : 0.0000 mW / -inf dBm

Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
  • Loading branch information
idosch authored and kuba-moo committed Oct 7, 2021
1 parent 9cbfc51 commit 353407d
Show file tree
Hide file tree
Showing 8 changed files with 335 additions and 3 deletions.
71 changes: 69 additions & 2 deletions Documentation/networking/ethtool-netlink.rst
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,11 @@ In the message structure descriptions below, if an attribute name is suffixed
with "+", parent nest can contain multiple attributes of the same type. This
implements an array of entries.

Attributes that need to be filled-in by device drivers and that are dumped to
user space based on whether they are valid or not should not use zero as a
valid value. This avoids the need to explicitly signal the validity of the
attribute in the device driver API.


Request header
==============
Expand Down Expand Up @@ -179,7 +184,7 @@ according to message purpose:

Userspace to kernel:

===================================== ================================
===================================== =================================
``ETHTOOL_MSG_STRSET_GET`` get string set
``ETHTOOL_MSG_LINKINFO_GET`` get link settings
``ETHTOOL_MSG_LINKINFO_SET`` set link settings
Expand Down Expand Up @@ -213,7 +218,9 @@ Userspace to kernel:
``ETHTOOL_MSG_MODULE_EEPROM_GET`` read SFP module EEPROM
``ETHTOOL_MSG_STATS_GET`` get standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET`` get PHC virtual clocks info
===================================== ================================
``ETHTOOL_MSG_MODULE_SET`` set transceiver module parameters
``ETHTOOL_MSG_MODULE_GET`` get transceiver module parameters
===================================== =================================

Kernel to userspace:

Expand Down Expand Up @@ -252,6 +259,7 @@ Kernel to userspace:
``ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY`` read SFP module EEPROM
``ETHTOOL_MSG_STATS_GET_REPLY`` standard statistics
``ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY`` PHC virtual clocks info
``ETHTOOL_MSG_MODULE_GET_REPLY`` transceiver module parameters
======================================== =================================

``GET`` requests are sent by userspace applications to retrieve device
Expand Down Expand Up @@ -1521,6 +1529,63 @@ Kernel response contents:
``ETHTOOL_A_PHC_VCLOCKS_INDEX`` s32 PHC index array
==================================== ====== ==========================

MODULE_GET
==========

Gets transceiver module parameters.

Request contents:

===================================== ====== ==========================
``ETHTOOL_A_MODULE_HEADER`` nested request header
===================================== ====== ==========================

Kernel response contents:

====================================== ====== ==========================
``ETHTOOL_A_MODULE_HEADER`` nested reply header
``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` u8 power mode policy
``ETHTOOL_A_MODULE_POWER_MODE`` u8 operational power mode
====================================== ====== ==========================

The optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute encodes the
transceiver module power mode policy enforced by the host. The default policy
is driver-dependent, but "auto" is the recommended default and it should be
implemented by new drivers and drivers where conformance to a legacy behavior
is not critical.

The optional ``ETHTHOOL_A_MODULE_POWER_MODE`` attribute encodes the operational
power mode policy of the transceiver module. It is only reported when a module
is plugged-in. Possible values are:

.. kernel-doc:: include/uapi/linux/ethtool.h
:identifiers: ethtool_module_power_mode

MODULE_SET
==========

Sets transceiver module parameters.

Request contents:

====================================== ====== ==========================
``ETHTOOL_A_MODULE_HEADER`` nested request header
``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` u8 power mode policy
====================================== ====== ==========================

When set, the optional ``ETHTOOL_A_MODULE_POWER_MODE_POLICY`` attribute is used
to set the transceiver module power policy enforced by the host. Possible
values are:

.. kernel-doc:: include/uapi/linux/ethtool.h
:identifiers: ethtool_module_power_mode_policy

For SFF-8636 modules, low power mode is forced by the host according to table
6-10 in revision 2.10a of the specification.

For CMIS modules, low power mode is forced by the host according to table 6-12
in revision 5.0 of the specification.

Request translation
===================

Expand Down Expand Up @@ -1620,4 +1685,6 @@ are netlink only.
n/a ``ETHTOOL_MSG_CABLE_TEST_TDR_ACT``
n/a ``ETHTOOL_MSG_TUNNEL_INFO_GET``
n/a ``ETHTOOL_MSG_PHC_VCLOCKS_GET``
n/a ``ETHTOOL_MSG_MODULE_GET``
n/a ``ETHTOOL_MSG_MODULE_SET``
=================================== =====================================
22 changes: 22 additions & 0 deletions include/linux/ethtool.h
Original file line number Diff line number Diff line change
Expand Up @@ -415,6 +415,17 @@ struct ethtool_module_eeprom {
u8 *data;
};

/**
* struct ethtool_module_power_mode_params - module power mode parameters
* @policy: The power mode policy enforced by the host for the plug-in module.
* @mode: The operational power mode of the plug-in module. Should be filled by
* device drivers on get operations.
*/
struct ethtool_module_power_mode_params {
enum ethtool_module_power_mode_policy policy;
enum ethtool_module_power_mode mode;
};

/**
* struct ethtool_ops - optional netdev operations
* @cap_link_lanes_supported: indicates if the driver supports lanes
Expand Down Expand Up @@ -580,6 +591,11 @@ struct ethtool_module_eeprom {
* @get_eth_ctrl_stats: Query some of the IEEE 802.3 MAC Ctrl statistics.
* @get_rmon_stats: Query some of the RMON (RFC 2819) statistics.
* Set %ranges to a pointer to zero-terminated array of byte ranges.
* @get_module_power_mode: Get the power mode policy for the plug-in module
* used by the network device and its operational power mode, if
* plugged-in.
* @set_module_power_mode: Set the power mode policy for the plug-in module
* used by the network device.
*
* All operations are optional (i.e. the function pointer may be set
* to %NULL) and callers must take this into account. Callers must
Expand Down Expand Up @@ -705,6 +721,12 @@ struct ethtool_ops {
void (*get_rmon_stats)(struct net_device *dev,
struct ethtool_rmon_stats *rmon_stats,
const struct ethtool_rmon_hist_range **ranges);
int (*get_module_power_mode)(struct net_device *dev,
struct ethtool_module_power_mode_params *params,
struct netlink_ext_ack *extack);
int (*set_module_power_mode)(struct net_device *dev,
const struct ethtool_module_power_mode_params *params,
struct netlink_ext_ack *extack);
};

int ethtool_check_ops(const struct ethtool_ops *ops);
Expand Down
23 changes: 23 additions & 0 deletions include/uapi/linux/ethtool.h
Original file line number Diff line number Diff line change
Expand Up @@ -706,6 +706,29 @@ enum ethtool_stringset {
ETH_SS_COUNT
};

/**
* enum ethtool_module_power_mode_policy - plug-in module power mode policy
* @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
* @ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO: Module is transitioned by the host
* to high power mode when the first port using it is put administratively
* up and to low power mode when the last port using it is put
* administratively down.
*/
enum ethtool_module_power_mode_policy {
ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH = 1,
ETHTOOL_MODULE_POWER_MODE_POLICY_AUTO,
};

/**
* enum ethtool_module_power_mode - plug-in module power mode
* @ETHTOOL_MODULE_POWER_MODE_LOW: Module is in low power mode.
* @ETHTOOL_MODULE_POWER_MODE_HIGH: Module is in high power mode.
*/
enum ethtool_module_power_mode {
ETHTOOL_MODULE_POWER_MODE_LOW = 1,
ETHTOOL_MODULE_POWER_MODE_HIGH,
};

/**
* struct ethtool_gstrings - string set for data tagging
* @cmd: Command number = %ETHTOOL_GSTRINGS
Expand Down
17 changes: 17 additions & 0 deletions include/uapi/linux/ethtool_netlink.h
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ enum {
ETHTOOL_MSG_MODULE_EEPROM_GET,
ETHTOOL_MSG_STATS_GET,
ETHTOOL_MSG_PHC_VCLOCKS_GET,
ETHTOOL_MSG_MODULE_GET,
ETHTOOL_MSG_MODULE_SET,

/* add new constants above here */
__ETHTOOL_MSG_USER_CNT,
Expand Down Expand Up @@ -90,6 +92,8 @@ enum {
ETHTOOL_MSG_MODULE_EEPROM_GET_REPLY,
ETHTOOL_MSG_STATS_GET_REPLY,
ETHTOOL_MSG_PHC_VCLOCKS_GET_REPLY,
ETHTOOL_MSG_MODULE_GET_REPLY,
ETHTOOL_MSG_MODULE_NTF,

/* add new constants above here */
__ETHTOOL_MSG_KERNEL_CNT,
Expand Down Expand Up @@ -833,6 +837,19 @@ enum {
ETHTOOL_A_STATS_RMON_MAX = (__ETHTOOL_A_STATS_RMON_CNT - 1)
};

/* MODULE */

enum {
ETHTOOL_A_MODULE_UNSPEC,
ETHTOOL_A_MODULE_HEADER, /* nest - _A_HEADER_* */
ETHTOOL_A_MODULE_POWER_MODE_POLICY, /* u8 */
ETHTOOL_A_MODULE_POWER_MODE, /* u8 */

/* add new constants above here */
__ETHTOOL_A_MODULE_CNT,
ETHTOOL_A_MODULE_MAX = (__ETHTOOL_A_MODULE_CNT - 1)
};

/* generic netlink info */
#define ETHTOOL_GENL_NAME "ethtool"
#define ETHTOOL_GENL_VERSION 1
Expand Down
2 changes: 1 addition & 1 deletion net/ethtool/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -7,4 +7,4 @@ obj-$(CONFIG_ETHTOOL_NETLINK) += ethtool_nl.o
ethtool_nl-y := netlink.o bitset.o strset.o linkinfo.o linkmodes.o \
linkstate.o debug.o wol.o features.o privflags.o rings.o \
channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
tunnels.o fec.o eeprom.o stats.o phc_vclocks.o
tunnels.o fec.o eeprom.o stats.o phc_vclocks.o module.o
Loading

0 comments on commit 353407d

Please sign in to comment.