From 2cd2a277a5a1dda944cc8ee1469c412b90e03c32 Mon Sep 17 00:00:00 2001 From: Greg Paussa Date: Thu, 20 Aug 2020 08:10:23 -0700 Subject: [PATCH 1/3] Syslog message interface name translation HLD --- system/syslog-msg-intf-name-xlate-HLD.md | 620 +++++++++++++++++++++++ 1 file changed, 620 insertions(+) create mode 100755 system/syslog-msg-intf-name-xlate-HLD.md diff --git a/system/syslog-msg-intf-name-xlate-HLD.md b/system/syslog-msg-intf-name-xlate-HLD.md new file mode 100755 index 000000000000..7d595ea1dc5c --- /dev/null +++ b/system/syslog-msg-intf-name-xlate-HLD.md @@ -0,0 +1,620 @@ + +# Feature Name + +Syslog Message Interface Name Translation + +# High Level Design Document + +#### Rev 0.6 + +# Table of Contents + +* [List of Tables](#list-of-tables) + +* [Revision](#revision) + +* [About This Manual](#about-this-manual) + +* [Scope](#scope) + +* [Definition/Abbreviation](#definitionabbreviation) + +# List of Tables + +[Table 1: Abbreviations](#table-1-abbreviations) + +# Revision + +| Rev | Date | Author | Change Description | +| ----- | ---------- | ----------- | ------------------ | +| 0.1 | 07/13/2020 | Greg Paussa | Requirements and general design guidance | +| 0.2 | 07/14/2020 | Greg Paussa | Address initial review comments. Cannot assume fixed alias mapping for DPB, or restart is required for interface naming mode changes. | +| 0.3 | 07/17/2020 | Greg Paussa | Use SIGHUP instead of rsyslog restart for DPB config changes. | +| 0.4 | 07/23/2020 | Greg Paussa | Do not rely on PORT table for base port alias names. | +| 0.5 | 08/03/2020 | Greg Paussa | Handle non-base breakout port alias name variations. Special-case master port references in log messages. | +| 0.6 | 08/12/2020 | Greg Paussa | Use STATE_DB update to indicate DPB change instead of CONFIG_DB. | + +# About this Manual + +This document describes replacing native port names in syslog messages with their standard names. 
+ +# Scope + +This document captures syslog message interface name translation requirements and provides a design overview. + +# Definition/Abbreviation + +### Table 1: Abbreviations + +| Term | Meaning | +| ---- | ---- | +| alias | User-friendly name reference to an Ethernet port. | +| CONFIG_DB | SONiC configuration database in Redis. | +| DPB | Dynamic port breakout. | +| master port | Base port referenced by DPB from which ports are broken out. | +| native mode | SONiC naming convention for Ethernet ports, such as Ethernet0. | +| standard mode | Industry standard naming convention for Ethernet ports, such as Eth1/1/1. | +| STATE_DB | SONiC state database in Redis. | + +# 1 Feature Overview + +Log messages in SONiC originate from a variety of internal sources, such as Docker containers, and are written primarily to /var/log/syslog and possibly to other specialized .log files. The specific message format is defined by SONiC and is made uniform for all log messages. The body of each log message may contain reference(s) to an interface name designated by a user-friendly string, such as "Ethernet0", which is called its alias. There are different types +of interfaces, but this document is only concerned with physical port names, which are defined for each platform as "EthernetXXX". + +The Interface-Naming Feature HLD describes the impetus for this syslog naming modification, primarily by allowing the user to configure the switch in either 'native' or 'standard' interface naming mode. Native mode interface names are Ethernet0, Ethernet8, etc., whereas standard names are Eth1/1, Eth1/3/1, etc. The standard interface naming mode is used by the SONiC management framework for Klish; it is not applicable to Click. The standard name alias value assignments are device-specific, as defined by the device's platform.json file. The current user interface naming mode resides in the CONFIG_DB DEVICE_METADATA|localhost table as the intf_naming_mode attribute (or absence thereof). 
Subinterface designations (e.g. Ethernet0.5, Eth1/1.5) may also be used regardless of naming mode. + +The SONiC dynamic port breakout (DPB) feature offers the user a means of splitting a single physical port into multiple lower speed ports. Details of DPB are not relevant here, but the fact that this can occur while the system is operational without requiring a config reload or system reboot does influence the design requirements herein. Note that a base port (i.e. one that can be split out into multiple ports) may use a different standard alias name for breakout vs. non-breakout mode. Also note that the standard alias name used for non-base breakout ports varies according to the configured breakout mode (e.g. 2x, 4x, 8x). + +A high-level view of the operation: + +1. All syslog messages are generated, formatted and propagated to the localhost Debian environment on the device, same as before. + +2. Any interface name translation of the syslog message body occurs within the localhost rsyslogd process via its lookup table facility. + +3. The localhost rsyslogd writes the modified message text to its log file(s) as usual. + +4. Relies on a custom-built translation table generated by the localhost rsyslog configuration script whenever the rsyslog-config service is started or restarted. + +5. The localhost rsyslogd is not state-aware of any context from which a syslog message is originated. + + +## 1.1 Requirements + +### High Level Requirements + +1. A user-initiated change in the interface naming mode configuration or dynamic port breakout mode does **not** require a config reload or system reboot to take effect. + - The relevant sonic-cli config command for interface naming mode is: + ```[no] interface-naming standard``` + - The relevant sonic-cli config command for port breakout is: + ```interface breakout port mode ``` + +2. Syslog message translation: + - Only occurs for physical interface types (e.g. "Ethernet0"). 
+ - From native to standard: interface name translation shall only occur when the standard naming mode is active. + - From standard to native: interface names are NOT translated, regardless of the current interface naming mode setting. + - Includes syslog messages that are displayed on the console (see below). + - Includes syslog messages that are forwarded by the localhost rsyslog service to a remote syslog server. + +3. All standard interface name aliases follow the same format: **Eth\/\** or **Eth\/\/\**. + - The \ designation for base ports varies depending on whether the port is broken out or not. + - Subinterface designations are preserved during conversion from native to standard naming convention. + - For example, Ethernet0.5 --> Eth1/1.5 (or Eth1/1/1.5 in breakout mode). + +4. The mapping from native to standard interface names is device (i.e. platform) specific and therefore must be determined at runtime. + - Any inability to translate from native to standard interface name silently preserves the native name in the syslog message. + +5. There is no change in the ultimate log file destination(s) for syslog messages, whether interface name translation has occurred or not. + +6. Interface name translation relies on information present in the CONFIG_DB prior to (re)starting the rsyslog service. + - The 'platform', 'hwsku', and 'intf_naming_mode' configuration properties in the DEVICE_METADATA|localhost record. + - Absence of the 'intf_naming_mode' attribute in this CONFIG_DB table implies native mode is in effect. + - The 'brkout_mode' property in the BREAKOUT_CFG|Ethernet\ records. + +7. The rsyslog service cannot infer or otherwise determine the semantic intent of an interface name that appears in a syslog message. + - The rsyslog daemon performs a literal substitution of the standard alias for its native name while configured for standard interface naming mode, regardless of where it appears in the syslog message. 
+ - Rsyslog is not aware of the context of the message sender, for example, whether it originated a message before or after processing DPB-related port events that modify a standard alias name. + +8. It is SUGGESTED to also translate non-SONiC native names in syslog messages, such as Linux device names "eth1" into standard naming convention as well. + - Only if it makes sense to do so and does not put undue burden on rsyslog. + - ***TBD: Deferred for future study.*** + + +### 1.1.1 Functional (detail) Requirements + +1. The interface name mapping between native and standard aliases is defined in a device-specific **platform.json** file. + - File is located in /usr/share/sonic/device/\/\/platform.json + - Where \ and \ are specified by the CONFIG_DB DEVICE_METADATA|localhost 'platform' and 'hwsku' attributes, respectively. + - This file contains a complete set of standard interface aliases, including all possible DPB breakout ports. + - Refer to the "alias_at_lanes" key for the complete set of DPB port standard aliases for a given base port. + +2. Handle variations in standard alias name assignments based on the breakout mode. + - The name format for a ***base*** port (e.g. Ethernet0, Ethernet4, ...) **varies** depending on whether it is currently broken out or not. + - For example, the standard alias for Ethernet0 may be Eth1/1/1 when broken out, but is Eth1/1 when not broken out. + - Use the first 'alias_at_lanes' entry with the [/\] designation removed for the non-breakout alias of a base port. + - The \ designation for a ***non-base*** port (e.g. Ethernet1, Ethernet2, or Ethernet3) **varies** depending on the current port breakout mode. + - For example, with klish, the standard alias for Ethernet2 may be Eth1/1/3 when broken out in 4x25G mode, but is Eth1/1/2 when broken out in 2x50G mode (since Ethernet1 is not used). 
+ - The equivalent breakout in click reserves space in the alias naming such that Ethernet2 is always Eth1/1/3 regardless of whether it's being used in 4x25G or 2x50G breakout mode. However, standard alias naming is not officially supported in click. + +3. Allow for the possibility of syslog messages that contain multiple Ethernet\ references in the same message. + - Although uncommon, these are typically associated with DPB config changes. + - The multiple references are normally for a breakout port and its master port. + - No more than two unique Ethernet\ name translations shall occur per message. + - A master port reference is hereby defined as ```Ethernet [a, b, c, d]``` where \ represents the native port name and [a, b, c, d] is a reference to the lanes used in the breakout for that port. + - Some latitude is allowed regarding the spacing before the \[ ] part, and the contents within the \[ ] part. For example, both ```Ethernet48 [77, 78, 79, 80]``` and ```Ethernet48[77,78,79,80/77,78,79,80]``` are recognized as master port references. Here is the regex being used by rsyslog: ```Ethernet([[:digit:]]+) *[[][[:digit:],/ ]+[]]``` + - Interface name translation for a master port reference always uses its non-breakout alias name regardless of the current breakout status or mode for that port. + +4. The platform.json translation mapping must fill in any "gaps" in the native alias naming sequence. + - Do not assume an uninterrupted sequence of Ethernet0, Ethernet1,...Ethernet127, for example. + - Must process the platform.json section by section, starting each set of native keys with the name given. + - Fill in sequential entries for all possible breakout ports from the "alias_at_lanes" content (usually 1, 4 or 8 values each). + - Any gaps in the native name integer sequence must be filled with a dummy entry in the translation table containing a value of "none", e.g.: + ```{ "index": 65, "value": "none" }``` + +5. 
Processing of the platform.json file by rsyslog must occur whenever the rsyslog service in the localhost environment is started or restarted, or following a DPB config change. + - Do not rely on any other part of SONiC to create a proper rsyslog translation table. + - Do not rely on the CONFIG_DB PORT|Ethernet\* keys for obtaining the complete set of port keys for deriving the standard alias names. + - This table does not contain any ports not currently broken out. + - There could be a race condition between the time changes occur in the BREAKOUT_CFG table (front-end), the STATE_DB PORT_BREAKOUT table (back-end) and the time the CONFIG_DB PORT table 'alias' field is updated (back-end). + - Do not use the platform.json file directly as a lookup mechanism on a per-message basis (wrong format, too slow). + +6. The rsyslog-config service shall operate as a daemon, continuously monitoring for configuration changes that affect the interface name translation. + - Subscribe to changes in the CONFIG_DB DEVICE_METADATA|localhost table. + - Subscribe to changes in the STATE_DB PORT_BREAKOUT table. + +7. The rsyslog configuration requires the translation table to be constructed according to the following format (actual values may vary): + - The "index" is an integer value representing the N part of the "Ethernet\" native name. + - There MUST NOT be any integer index gaps in the table. If a gap is detected while processing the platform.json file, insert the missing index(es) in the table with a "value" of "none" (must be same value as specified for the "nomatch" entry). + - This permits an *array* table type to be used, which performs lookups in O(1) time instead of O(log(n)). + - The rsyslog translation table shall be stored in the JSON-formatted plain text file: **/var/lib/rsyslog/rsyslog_port_aliases.json**. + - Note: rsyslog is a new subdirectory of /var/lib in the fsroot. 
+ - In the following example, Ethernet0 is not currently broken out, Ethernet4 is broken out in 4x mode, and Ethernet8 is broken out in 2x mode. + - Even though Ethernet0 is not broken out, Ethernet1-3 still map to their full breakout alias names in case any syslog message refers to them. There is no naming ambiguity during translation in this case. +``` +{"version": 1, + "nomatch": "none", + "type": "array", + "table": [ + {"index": 0, "value": "Eth1/1"}, + {"index": 1, "value": "Eth1/1/2"}, + {"index": 2, "value": "Eth1/1/3"}, + {"index": 3, "value": "Eth1/1/4"}, + {"index": 4, "value": "Eth1/2/1"}, + {"index": 5, "value": "Eth1/2/2"}, + {"index": 6, "value": "Eth1/2/3"}, + {"index": 7, "value": "Eth1/2/4"}, + {"index": 8, "value": "Eth1/3/1"}, + {"index": 9, "value": "none"}, + {"index": 10, "value": "Eth1/3/2"}, + {"index": 11, "value": "none"}, + <> + {"index": 127, "value": "Eth1/32/4"}, + {"index": 128, "value": "Eth1/33"}, + {"index": 129, "value": "Eth1/34"} + ] +} +``` + +8. To facilitate the special case translation of an explicit master port reference in a syslog message, a secondary rsyslog translation table **/var/lib/rsyslog/rsyslog_baseport_aliases.json** is used. + - This table is indexed by small string keys, with each base port represented by two keys, one with and one without a left square bracket: + - **"n["**: The alias name to use for a *master* port reference for Ethernet\. + - **"n"**: The alias name to use for a *non-master* port reference for Ethernet\. + - This secondary table is only used for translations that involve a master port reference in a syslog message. + - If the same Ethernet\ base port name appears multiple times in such a message, the aliases contained in the secondary table are used to translate each occurrence. 
+ - As an example, the following portion of a syslog message is translated as follows: + - From: ```Ethernet48 from Ethernet48 [77,78,79,80]``` + - To: ```Eth1/48/1 from Eth1/48 [77,78,79,80]``` + - Here is a partial representation of the secondary rsyslog translation table: +``` +{"version": 1, + "nomatch": "none", + "type": "string", + "table": [ + {"index": "0[", "value": "Eth1/1"}, + {"index": "0", "value": "Eth1/1/1"}, + {"index": "4[", "value": "Eth1/2"}, + {"index": "4", "value": "Eth1/2/1"}, + {"index": "8[", "value": "Eth1/3"}, + {"index": "8", "value": "Eth1/3/1"}, + <> + ] +} +``` + +9. The localhost **/etc/rsyslog.conf** file shall be *conditionally* modified to include the lookup_table() reference and $msg string transformation. + - The conditional logic shall be active when the CONFIG_DB 'intf_naming_mode' 'standard' value is in use. + - The interface name translation action must be positioned before any remote syslog message redirection action occurs. + - The interface name translation action must be positioned before any message discard (i.e. stop) action occurs. + - The interface name string replacement in a syslog message shall not alter the overall structure of the SONiC message format, just the message string text itself. + - If a native interface name index lookup returns with the value "none", no name translation shall occur in the message (the native name shall be retained). + - The following conditional logic is to be added to the **files/image_config/rsyslog/rsyslog.conf.j2**: +``` +{% if ( + (DEVICE_METADATA is defined) and + (DEVICE_METADATA['localhost'] is defined) and + (DEVICE_METADATA['localhost']['intf_naming_mode'] is defined) and + (DEVICE_METADATA['localhost']['intf_naming_mode'] == "standard") + ) %} +# +# Include special config file for interface alias renaming +# (it must precede all of the rsyslog.d config files) +# +$IncludeConfig /etc/rsyslog-intf-name.conf +{% endif %} +``` + +10. 
Any rsyslog message that is also directed to the console shall display the translated interface name, to the same extent that rsyslog performs translation on that message. + - SONiC currently displays all EMERGENCY(0) messages on the console of every logged-in user. + - Certain other messages containing specific words or phrases (e.g. segfault, Runtime error) are also displayed on the console. + - See /etc/rsyslog.d/01-sonic-broadcom.conf for details. + +11. The **/etc/rsyslog-intf-name.conf** file shall instantiate the rsyslog lookup table and the new alias translation action. + - Enable the lookup table to be reloaded from its .json file on a SIGHUP signal. + - Handle translating up to two native interface names in the same syslog message. + - Allow an exemption for all messages at or below (lower severity than) a specified severity level. + - Allow an exemption list of an entire category of message originators, as defined by the syslog message tag field (specifically, $programname). + - The current plan is to **exempt** the following from translation: + - ***All DEBUG(7) messages.*** + - ***All messages originating from the mgmt-framework Docker.*** + +12. Replace the **/usr/bin/rsyslog-config.sh** file that starts the localhost rsyslog service with **/usr/bin/rsyslog-config.py**. + - This is because more functionality is needed, including subscribing to Redis, which is more readily done using a Python script. + - The Python script shall run as a ***daemon*** process rather than as a one-shot script execution. + - Requires changes to the **/etc/systemd/system/rsyslog-config.service** file to treat rsyslog-config as a daemon service. + - Details left to the implementer. + - Dynamically create the **/var/lib/rsyslog/rsyslog_port_aliases.json** mapping table file. + - Dynamically create the **/var/lib/rsyslog/rsyslog_baseport_aliases.json** secondary mapping table file. + - Update or regenerate the /etc/rsyslog.conf file. 
+ - This is because rsyslog.conf.j2 contains a conditional section based on the current interface-naming mode value. + +13. Minimize restarts of the rsyslog service. + - Syslog restarts are sometimes viewed as an error condition. + - That perspective needs to change somewhat for this to work, but the goal is to minimize restarting the rsyslog service in the host environment. + - A change in the interface-naming mode *does* require restarting the rsyslog service. + - This is a rather significant, yet infrequent, config event in the system. + - It allows the /etc/rsyslog.conf file to be rewritten to add/remove the lookup table operation. + - A change in DPB configuration only requires issuing a SIGHUP signal to the rsyslog process to force a reload of the lookup table (when in use). + - The rsyslog process remains operational and is not restarted. + - Much faster: approximately 0.5 msec for SIGHUP vs. 30 msec for rsyslog service restart. + - The implementation can trigger the SIGHUP as soon as DPB signals via the STATE_DB that the current breakout transaction has reached a certain point, which is after the old ports have been deleted, but before the new ports are created. + - The subsequent translation table update can proceed, since it does not rely on any CONFIG_DB PORT table updates, which may occur asynchronously. + + +### 1.1.2 Configuration and Management Requirements + +There is no additional user configuration needed for interface name translation to occur. + +- The existing interface naming mode config is sufficient. + + +### 1.1.3 Scalability Requirements + +Interface name translation within syslog messages shall not significantly impact system logging performance. + +- Watch for indications of excessive dropped/lost log messages. +- Watch for higher than normal CPU utilization for the rsyslogd process running in the switch host environment. 
+ + +### 1.1.4 Warm Boot Requirements + +The interface name translation in syslog messages is still applicable after a warm boot, based on the current setting of the interface-naming config mode. + + +## 1.2 Design Overview + +### 1.2.1 Switch Host rsyslog Service + +The rsyslog service running in the switch host environment is where the interface name translation for syslog messages takes place. + +**Overview:** + +1. The rsyslog service is started automatically during system init, or by issuing the ```systemctl restart rsyslog-config``` Linux command. + - The /etc/systemd/system/rsyslog-config.service file invokes the appropriate script. + - This is currently /usr/bin/rsyslog-config.sh, but needs to change to invoke the new /usr/bin/rsyslog-config.py script instead. + - The new /usr/bin/rsyslog-config.py script generates the /var/lib/rsyslog/rsyslog_port_aliases.json and /var/lib/rsyslog/rsyslog_baseport_aliases.json files from platform.json, CONFIG_DB, and STATE_DB info. + - Then continues running as a daemon, subscribing to relevant CONFIG_DB and STATE_DB changes. + - Once the new rsyslog JSON files are created, the rsyslog-config.py script either: + - (Re)generates the rsyslog.conf file and restarts the rsyslog service, or + - Reloads just the mapping table from the rsyslog_port_aliases.json file via a SIGHUP signal. + - For example: +``` +--- restart rsyslog service --- + sonic-cfggen -d -t /usr/share/sonic/templates/rsyslog.conf.j2 >/etc/rsyslog.conf + systemctl restart rsyslog + +--- reload mapping tables --- + kill -HUP `ps -eo pid,stat,cmd | grep "rsyslog" | grep "Ssl" | awk '{ print $1 }'` +``` + +2. The /etc/rsyslog.conf file is dynamically generated from a jinja2 template using elements from the CONFIG_DB DEVICE_METADATA|localhost configuration. + - The /usr/share/sonic/templates/rsyslog.conf.j2 must *conditionally* include the (new) /etc/rsyslog-intf-name.conf file only when standard interface naming mode is used. 
+ - The /etc/rsyslog-intf-name.conf file reads the /var/lib/rsyslog/rsyslog_port_aliases.json and /var/lib/rsyslog/rsyslog_baseport_aliases.json files into separate lookup tables to prepare them for rsyslog use. It also defines the syslog message interface name translation action rules. + +3. All syslog messages in the individual Docker containers continue to be formatted and forwarded to the localhost rsyslog service for logging to a file. + - No change from current behavior. + +4. The localhost rsyslog service uses the new action from the /etc/rsyslog-intf-name.conf to replace a native interface name with its standard alias name. + - This only occurs when standard interface naming mode is active. + - Defaults to leaving the native interface name intact in the syslog message if the native-to-standard translation process is unsuccessful for any reason. + - Allows for certain hard-coded exemptions: + - Message severities at or below a certain level. + - Certain syslog message senders whose messages are deemed never to be translated. + +5. The modified syslog message continues to be written to its intended destination log file(s) per the rsyslog configuration. + - This is typically /var/log/syslog, or possibly another file, based on individual action rules. + - The destination log file(s) for the message is not affected by interface name translation within the log message text. + - The original (i.e. untranslated) syslog message text is used for this decision. + + +### 1.2.1.1 Effect of DPB on Interface Name Translations + +The expectation is that a given device's platform.json file contains a single mapping from native to standard alias based upon the 'alias_at_lanes' list that is defined for each base port that is eligible for breakout. These alias names are the ones used for each of the breakout ports in the case where a given base port supports it. However, the first alias in the list is typically not the one used for a base port that has not been broken out. 
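
The alias rules from Section 1.1.1 (non-breakout alias for a base port, gap filling with "none", array table format) can be sketched in Python roughly as follows. This is only an illustrative sketch, not the rsyslog-config.py implementation: the function names, the `(base index, alias list, breakout count)` input tuples, and the assumption that an N-way breakout places its ports at evenly spaced lane offsets using the first N aliases (the klish behavior described in Section 1.1.1) are all hypothetical.

```python
import json

NOMATCH = "none"  # must match the "nomatch" value of the lookup table


def non_breakout_alias(alias):
    """Drop a trailing /<n> from a three-part alias (Eth1/1/1 -> Eth1/1)."""
    parts = alias.split("/")
    return "/".join(parts[:2]) if len(parts) == 3 else alias


def entries_for_base_port(base_index, aliases, num_ports):
    """Return {native index: alias} for one base port and its lanes.

    aliases   -- per-lane standard aliases (assumed to come from the
                 'alias_at_lanes' values in platform.json)
    num_ports -- 1 if not broken out, else the breakout count (2, 4, ...)
    """
    lanes = len(aliases)
    if num_ports == 1:
        # The base port gets its non-breakout alias; the other lane indexes
        # keep their full breakout aliases, as in the example table.
        result = {base_index + i: aliases[i] for i in range(lanes)}
        result[base_index] = non_breakout_alias(aliases[0])
        return result
    # Assumed klish-style numbering: the N ports sit at evenly spaced lane
    # offsets and use the first N aliases; unused lanes map to "none".
    step = lanes // num_ports
    result = {base_index + i: NOMATCH for i in range(lanes)}
    for n in range(num_ports):
        result[base_index + n * step] = aliases[n]
    return result


def build_table(ports):
    """Build the array-type lookup table from (base, aliases, count) tuples."""
    mapping = {}
    for base_index, aliases, num_ports in ports:
        mapping.update(entries_for_base_port(base_index, aliases, num_ports))
    # Fill every index from 0 to the highest one so there are no gaps.
    table = [{"index": i, "value": mapping.get(i, NOMATCH)}
             for i in range(max(mapping) + 1)]
    return {"version": 1, "nomatch": NOMATCH, "type": "array", "table": table}


if __name__ == "__main__":
    example_ports = [
        (0, ["Eth1/1/1", "Eth1/1/2", "Eth1/1/3", "Eth1/1/4"], 1),  # not broken out
        (4, ["Eth1/2/1", "Eth1/2/2", "Eth1/2/3", "Eth1/2/4"], 4),  # 4x mode
        (8, ["Eth1/3/1", "Eth1/3/2", "Eth1/3/3", "Eth1/3/4"], 2),  # 2x mode
    ]
    print(json.dumps(build_table(example_ports), indent=1))
```

With the three base ports from the example table in Section 1.1.1 as input (Ethernet0 not broken out, Ethernet4 in 4x mode, Ethernet8 in 2x mode), `build_table()` yields indexes 0 through 11 with the same values as that table.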
+ +DPB related configuration changes do *not* require a system reboot or config reload, therefore the rsyslog translation table must be updated whenever a change in port breakout mode is detected in the STATE_DB PORT_BREAKOUT table (only interested in base port DPB status changes). The current breakout mode can be read from the CONFIG_DB BREAKOUT_CFG table 'brkout_mode' attribute listed for the base port. + +For the base ports, their standard alias name is determined by reading the first 'alias_at_lanes' entry in their platform.json definition, subject to the following modification: if the port supports breakout, but is currently not broken out, then the trailing "/ part of the alias name is removed. Once the new translation table JSON file is built, a SIGHUP signal is used to tell rsyslog to reload its lookup table from the JSON file without restarting the process. + + +### 1.2.1.2 Systemd Journal Remains Untranslated + +The Linux systemd journal is a completely separate logging facility that runs in the localhost Debian environment of the switch. It captures certain syslog entries and logs from other parts of the system into a central location. This operates independently of rsyslog, therefore this design has no effect on the systemd journal entries. + +- All interface name translations affect rsyslog only; the systemd journal log always shows the original native interface names. +- SONiC configures the host systemd journal to capture log messages of severity EMERGENCY(0), ALERT(1), or CRITICAL(2) only. +- Only log messages originating from within the localhost Debian environment are potentially logged in the systemd journal as well as to rsyslog. +- In SONiC, none of the rsyslog messages originating from within any of the Docker containers are captured in the systemd journal. + +There is no way to modify the syslog message text that is stored in the systemd journal, therefore native interface names will always continue to appear there. 
However, since none of the syslog messages from the Docker containers appear in the systemd journal, and since only CRITICAL(2) or higher severity messages are logged in the journal, the problem scope is significantly reduced. For purposes of this design, no attempt is made to modify the systemd journal log entries centrally. If any problematic log entries should appear in the journal containing an untranslated interface name, the solution is to modify the source code of the message sender to put in the correct alias name. There should be very few (if any) of these messages to correct. + + +### 1.2.2 Container + +The base assumption is that this does not affect any individual Docker container, but only the operation of the rsyslog service running in the localhost Debian environment of the switch. + +**Possible Exception:** +The FRR logging that takes place in the **bgp** container has its own **/etc/rsyslog.d/45-frr.conf** file to log its messages locally in that container and prevent them from being duplicated in the switch /var/log/syslog. + +- In this case, it would be necessary to add similar translation logic to the /etc/rsyslog.conf file that resides in the bgp Docker container. +- ***TBD: Deferred for future study.*** + + +### 1.2.3 SAI Overview + +N/A + + +# 2 Functionality + + +## 2.1 Target Deployment Use Cases + +Platforms that support standard interface naming mode. + + + +## 2.2 Functional Description + + + + +**TBD + + + + + + + +# 3 Design + + + + + +## 3.1 Overview + +**TBD + + + + +## 3.2 DB Changes + +None. + + + +### 3.2.1 CONFIG DB + +This feature does not add any new information to the CONFIG_DB. 
+ +The following information in the CONFIG_DB is referenced by this feature: + +- BREAKOUT_CFG|Ethernet\ + - brkout_mode + +- DEVICE_METADATA|localhost + - platform + - hwsku + - intf_naming_mode + +The following values in the CONFIG_DB must be monitored for changes via subscription to Redis: + +- DEVICE_METADATA|localhost + - intf_naming_mode + + +### 3.2.2 APP DB + +N/A + +### 3.2.3 STATE DB + +The following information in the STATE_DB is referenced by this feature: + +- PORT_BREAKOUT|Ethernet\ + - phase + +The following values in the STATE_DB must be monitored for changes via subscription to Redis: + +- PORT_BREAKOUT|Ethernet\ + - phase + +### 3.2.4 ASIC DB + +N/A + +### 3.2.5 COUNTER DB + +N/A + +## 3.3 Switch State Service Design + +N/A + +### 3.3.1 Orchestration Agent + +N/A + +### 3.3.2 Other Process + +N/A + +## 3.4 SyncD + +N/A + +## 3.5 SAI + +N/A + + +## 3.6 User Interface + +This feature has no additional requirements for the UI. + +- Activation of rsyslog message translation requires issuing the ```interface-naming standard``` configuration command from the sonic-cli. + +### 3.6.1 Data Models + +N/A + +### 3.6.2 CLI + +N/A + +#### 3.6.2.1 Configuration Commands + +N/A + +#### 3.6.2.2 Show Commands + +N/A + +### 3.6.3 REST API Support + +N/A + +### 3.6.4 Service and Docker Management + +The rsyslog service in the localhost Debian environment is where interface name translation occurs during the processing of syslog messages. Since this relies on a translation table built at runtime from platform-specific alias information, the rsyslog service is assumed to be restarted at the following times: + +- cold boot +- warm boot +- config reload +- interface naming mode config change + +The rsyslog service will simply reload its translation table upon receipt of a SIGHUP signal for the following: + +- DPB config change + +The rsyslog-config service creates the rsyslog configuration prior to (re)starting the rsyslog service. 
It is restarted during cold boot, warm boot, and config reload along with the rest of the system. The rsyslog-config service runs as a daemon process in the localhost environment. + +# 4 Flow Diagrams + +**TBD + +**Provide flow diagrams for inter-container and intra-container interactions. + + + + + + + +# 5 Error Handling + +1. Any error creating the translation table file during rsyslog service start/restart shall inhibit interface name translations in syslog messages, but system logging operations are otherwise unaffected. +2. Any error while looking up the native interface name in the rsyslog translation table shall be silently ignored, with the original native interface name retained in the syslog message. +3. Any log message that refers to a native interface name that is not in use per the current DPB configuration shall not translate that interface name. +4. If more than two distinct native interface names are referenced in the same log message, only the first two are translated, with all remaining interface names not translated. + + + + + + + +# 6 Serviceability and Debug + +There are many 'rsyslogd' processes listed in the 'ps -a' output in the localhost Debian environment. Use the following command to determine the PID of the main rsyslogd process. This can then be monitored using 'top -p \'. +``` +systemctl status rsyslog | grep -i "main pid" | awk '{ print $3 }' +``` + + + + + +# 7 Warm Boot Support + + +No special provisions needed. + + + + + + +# 8 Scalability + +N/A + + + + + + +# 9 Unit Test + + + + +It is highly recommended that developers use the following command on a test switch to verify any coding changes that affect the rsyslog configuration ***before*** committing such changes to the code base. +``` +rsyslogd -f /etc/rsyslog.conf -N1 +``` + +A successful result from the above command looks similar to this: +``` +rsyslogd: version 8.24.0, config validation run (level 1), master config /etc/rsyslog.conf +rsyslogd: End of config validation run. 
Bye. +``` + +The *logger* utility is handy for creating test log messages with specific content and severity. This can be initiated directly from the host environment or from within a Docker, such as: +``` +docker exec -it swss logger -p 5 "This is a test message regarding Ethernet48" +``` + +Putting the logger command inside a script that issues the command 10,000 times is one way to check the CPU utilization of the host rsyslogd process via ```top```. This script can be copied into a Docker and executed remotely from there. + + + +# 10 Internal Design Information + + + + +**TBD +Internal BRCM information to be removed before sharing with the community. + + From d463f0e1ae5584ca9a3b28d2a4d2d474e07f417e Mon Sep 17 00:00:00 2001 From: Greg Paussa Date: Mon, 24 Aug 2020 09:30:26 -0700 Subject: [PATCH 2/3] Update PR #70 with DPB translation disclaimer. --- system/syslog-msg-intf-name-xlate-HLD.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/system/syslog-msg-intf-name-xlate-HLD.md b/system/syslog-msg-intf-name-xlate-HLD.md index 7d595ea1dc5c..b2f4504d5ca5 100755 --- a/system/syslog-msg-intf-name-xlate-HLD.md +++ b/system/syslog-msg-intf-name-xlate-HLD.md @@ -5,7 +5,7 @@ Syslog Message Interface Name Translation # High Level Design Document -#### Rev 0.6 +#### Rev 0.7 # Table of Contents @@ -33,6 +33,7 @@ Syslog Message Interface Name Translation | 0.4 | 07/23/2020 | Greg Paussa | Do not rely on PORT table for base port alias names. | | 0.5 | 08/03/2020 | Greg Paussa | Handle non-base breakout port alias name variations. Special-case master port references in log messages. | | 0.6 | 08/12/2020 | Greg Paussa | Use STATE_DB update to indicate DPB change instead of CONFIG_DB. | +| 0.7 | 08/24/2020 | Greg Paussa | Added disclaimer for DPB log message translations in Section 1.1. | # About this Manual @@ -113,6 +114,8 @@ A high-level view of the operation: 7. 
The rsyslog service cannot infer or otherwise determine the semantic intent of an interface name that appears in a syslog message. - The rsyslog daemon performs a literal substitution of the standard alias for its native name while configured for standard interface naming mode, regardless of where it appears in the syslog message. - Rsyslog is not aware of the context of the message sender, for example, whether it originated a message before or after processing DPB-related port events that modify a standard alias name. + - **Interface name translation in some DPB event logs may not be accurate in the context of the event, or may not get translated.** + - Such log messages should be fixed at the source (back-end). 8. It is SUGGESTED to also translate non-SONiC native names in syslog messages, such as Linux device names "eth1" into standard naming convention as well. - Only if it makes sense to do so and does not put undue burden on rsyslog. From 3e3c756bb0ad14b4431a42cdd89f3f09d16cb69e Mon Sep 17 00:00:00 2001 From: Greg Paussa Date: Fri, 4 Sep 2020 14:00:54 -0700 Subject: [PATCH 3/3] Update PR #70 with note about HUP and new section on resource usage. --- system/syslog-msg-intf-name-xlate-HLD.md | 39 ++++++++++++++++++------ 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/system/syslog-msg-intf-name-xlate-HLD.md b/system/syslog-msg-intf-name-xlate-HLD.md index b2f4504d5ca5..ffaf61c7d2d5 100755 --- a/system/syslog-msg-intf-name-xlate-HLD.md +++ b/system/syslog-msg-intf-name-xlate-HLD.md @@ -5,7 +5,7 @@ Syslog Message Interface Name Translation # High Level Design Document -#### Rev 0.7 +#### Rev 0.8 # Table of Contents @@ -34,6 +34,7 @@ Syslog Message Interface Name Translation | 0.5 | 08/03/2020 | Greg Paussa | Handle non-base breakout port alias name variations. Special-case master port references in log messages. | | 0.6 | 08/12/2020 | Greg Paussa | Use STATE_DB update to indicate DPB change instead of CONFIG_DB. 
| | 0.7 | 08/24/2020 | Greg Paussa | Added disclaimer for DPB log message translations in Section 1.1. | +| 0.8 | 09/03/2020 | Greg Paussa | Added a note regarding an inband log message as a faster alternative to SIGHUP for reloading tables. Added Section 3.6.5 describing system resource usage. | # About this Manual @@ -78,6 +79,9 @@ A high-level view of the operation: 5. The localhost rsyslogd is not state-aware of any context from which a syslog message is originated. +### A Note About SIGHUP Usage + +This document mentions using the SIGHUP signal to notify the rsyslogd process to reload its translation tables from files. While this works and is a valid way to reload the tables, experimentation has shown that a quicker way is to send rsyslog a special log message that is detected by an rsyslog configuration rule whose action calls reload_lookup_table(). This special inband log message can then be discarded. From a design perspective the two techniques are equivalent, so the implementation can use either one. ## 1.1 Requirements @@ -251,7 +255,7 @@ $IncludeConfig /etc/rsyslog-intf-name.conf - See /etc/rsyslog.d/01-sonic-broadcom.conf for details. 11. The **/etc/rsyslog-intf-name.conf** file shall instantiate the rsyslog lookup table and the new alias translation action. - - Enable the lookup table to be reloaded from its .json file on a SIGHUP signal. + - Enable the lookup table to be reloaded from its .json file automatically on a SIGHUP signal (if desired). - Handle translating up to two native interface names in the same syslog message. - Allow an exemption for all messages at or below (lower severity than) a specified severity level. - Allow an exemption list of an entire category of message originators, as defined by the syslog message tag field (specifically, $programname). @@ -275,10 +279,10 @@ $IncludeConfig /etc/rsyslog-intf-name.conf - A change in the interface-naming mode *does* require restarting the rsyslog service. 
- This is a rather significant, yet infrequent, config event in the system. - It allows the /etc/rsyslog.conf file to be rewritten to add/remove the lookup table operation. - - A change in DPB configuration only requires issuing a SIGHUP signal to the rsyslog process to force a reload of the lookup table (when in use). + - A change in DPB configuration only requires that the rsyslog process reload its lookup table (SIGHUP signal or special inband log message) when in use. - The rsyslog process remains operational and is not restarted. - Much faster: approximately 0.5 msec for SIGHUP vs. 30 msec for rsyslog service restart. - - The implementation can trigger the SIGHUP as soon as DPB signals via the STATE_DB that the current breakout transaction has reached a certain point, which is after the old ports have been deleted, but before the new ports are created. + - The implementation can initiate the lookup table rebuild and reload as soon as DPB signals via the STATE_DB that the current breakout transaction has reached a certain point, which is after the old ports have been deleted, but before the new ports are created. - The subsequent translation table update can proceed, since it does not rely on any CONFIG_DB PORT table updates, which may occur asynchronously. @@ -317,15 +321,15 @@ The rsyslog service running in the switch host environment is where the interfac - Then continues running as a daemon, subscribing to relevant CONFIG_DB and STATE_DB changes. - Once the new rsyslog JSON files are created, the rsyslog-config.py script either: - (Re)generates the rsyslog.conf file and restarts the rsyslog service, or - - Reloads just the mapping table from the rsyslog_port_aliases.json file via a SIGHUP signal. - - For example: + - Reloads just the mapping table from the rsyslog_port_aliases.json file via a SIGHUP signal or special inband log message. 
+ - For example, using SIGHUP: ``` --- restart rsyslog service --- sonic-cfggen -d -t /usr/share/sonic/templates/rsyslog.conf.j2 >/etc/rsyslog.conf systemctl restart rsyslog --- reload mapping tables --- - kill -HUP `ps -eo pid,stat,cmd | grep "rsyslog" | grep "Ssl" | aux '{ print $1 }'` + systemctl kill -s HUP rsyslog ``` 2. The /etc/rsyslog.conf file is dynamically generated from a jinja2 template using elements from the CONFIG_DB DEVICE_METADATA|localhost configuration. @@ -354,7 +358,7 @@ The expectation is that a given device's platform.json file contains a single ma DPB related configuration changes do *not* require a system reboot or config reload, therefore the rsyslog translation table must be updated whenever a change in port breakout mode is detected in the STATE_DB PORT_BREAKOUT table (only interested in base port DPB status changes). The current breakout mode can be read from the CONFIG_DB BREAKOUT_CFG table 'brkout_mode' attribute listed for the base port. -For the base ports, their standard alias name is determined by reading the first 'alias_at_lanes' entry in their platform.json definition, subject to the following modification: if the port supports breakout, but is currently not broken out, then the trailing "/ part of the alias name is removed. Once the new translation table JSON file is built, a SIGHUP signal is used to tell rsyslog to reload its lookup table from the JSON file without restarting the process. +For the base ports, their standard alias name is determined by reading the first 'alias_at_lanes' entry in their platform.json definition, subject to the following modification: if the port supports breakout, but is currently not broken out, then the trailing "/ part of the alias name is removed. Once the new translation table JSON file is built, rsyslog is told to reload its lookup table from the JSON file (SIGHUP or special inband log message) without restarting the process. 
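The base-port alias rule described above can be sketched in Python, the language of the rsyslog-config.py script. This is only an illustrative sketch under stated assumptions, not the implementation from this PR: the function names `base_port_alias` and `build_lookup_table` are hypothetical, the 'alias_at_lanes' value is assumed to be a comma-separated string, and the output is assumed to follow rsyslog's documented lookup-table JSON layout (`version`, `nomatch`, `type`, `table`).

```python
import json


def base_port_alias(alias_at_lanes: str, supports_breakout: bool, is_broken_out: bool) -> str:
    """Return the standard alias for a base port (hypothetical helper).

    alias_at_lanes is ASSUMED to be a comma-separated string such as
    "Eth1/1/1,Eth1/1/2"; only the first entry is used, as the text describes.
    """
    first_alias = alias_at_lanes.split(",")[0].strip()
    # A breakout-capable port that is not currently broken out drops the
    # trailing sub-port component of its alias. The slash-count guard is a
    # defensive assumption so that a two-level alias is never shortened.
    if supports_breakout and not is_broken_out and first_alias.count("/") >= 2:
        first_alias = first_alias.rsplit("/", 1)[0]
    return first_alias


def build_lookup_table(aliases: dict) -> str:
    """Serialize a native-name to alias mapping as an rsyslog lookup table.

    An empty "nomatch" value is an assumption made for this sketch.
    """
    table = [{"index": native, "value": alias} for native, alias in sorted(aliases.items())]
    return json.dumps({"version": 1, "nomatch": "", "type": "string", "table": table}, indent=2)
```

Once a file built this way has been written, the reload itself would still be triggered as described in the text, for example with the `systemctl kill -s HUP rsyslog` command shown earlier.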
### 1.2.1.2 Systemd Journal Remains Untranslated @@ -398,7 +402,6 @@ Platforms that support standard interface naming mode. - **TBD @@ -526,12 +529,28 @@ The rsyslog service in the localhost Debian environment is where interface name - config reload - interface naming mode config change -The rsyslog service will simply reload its translation table upon receipt of a SIGHUP signal for the following: +The rsyslog service will simply reload its translation table (SIGHUP signal or special inband log message) for the following: - DPB config change The rsyslog-config service creates the rsyslog configuration prior to (re)starting the rsyslog service. It is restarted during cold boot, warm boot, and config reload along with the rest of the system. The rsyslog-config service runs as a daemon process in the localhost environment. +### 3.6.5 Resource Usage + +Interface name translation by the rsyslog service requires some translation tables and additional rules in the rsyslog.conf configuration. Although exact numbers can vary, the following incremental resource usage was observed with ```top``` on a vSONIC VS device while running in standard naming mode as compared to native mode: + +- CPU Utilization: +1% (approximately) +- Memory: +80 MB + +The CPU utilization was measured while sending 10000 logger messages from the swss Docker (approximately 435 messages per second on this setup). + +In addition, the rsyslog-config service is now a daemon process that consumes the following system resources: + +- CPU Utilization: 0% +- Memory: 56 MB + +The rsyslog-config daemon metrics are not affected by the current interface-naming mode. + # 4 Flow Diagrams **TBD