Add script to periodically update oper status of management interface #21245
Signed-off-by: Suvarna Meenakshi <[email protected]>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
```python
for port in mgmt_ports:
    state_db_key = "MGMT_PORT_TABLE|{}".format(port)
    # Reset status of mgmt port before updating with latest status
    db.set(db.STATE_DB, state_db_key, 'oper_status', 'unknown')
```
Can we read the current value first and, only when the oper state changes, log that there is a change, whether it is an UP-to-DOWN or a DOWN-to-UP transition?
Let's not reset it to unknown, as we want to avoid making any changes unless there is a need to do so (a state change is detected). For a state change from UP to DOWN, let's log it as a WARNING. For a state change from DOWN to UP, let's make it an INFO-level syslog.
Modified as suggested.
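The reviewer's request above (read the stored value first, write to STATE_DB and log only when the state changes, WARNING for UP-to-DOWN, INFO for DOWN-to-UP) can be expressed as a small decision function. This is a minimal sketch, not the merged script: the names `oper_status_transition` and `apply_oper_status` are hypothetical, the `db` object is assumed to expose the same `db.set(db.STATE_DB, key, field, value)` call used in the diff excerpt, and reading the previous value from STATE_DB is left to the caller. The `log` parameter defaults to `syslog.syslog` and is injectable only so the sketch is easy to test.

```python
import syslog
from typing import Callable, Optional, Tuple


def oper_status_transition(prev: Optional[str], current: str) -> Tuple[bool, Optional[int]]:
    """Hypothetical helper: return (needs_update, syslog_priority) for prev -> current."""
    if current == prev:
        # Unchanged: no STATE_DB write and no log entry.
        return False, None
    if current == "down":
        # Transition into "down" (e.g. up -> down) is logged as a warning.
        return True, syslog.LOG_WARNING
    # Other changes (e.g. down -> up) are informational.
    return True, syslog.LOG_INFO


def apply_oper_status(db, state_db_key: str, prev: Optional[str], current: str,
                      log: Callable[[int, str], None] = syslog.syslog) -> bool:
    """Hypothetical helper: write `current` to STATE_DB only when it differs from `prev`."""
    needs_update, priority = oper_status_transition(prev, current)
    if not needs_update:
        return False
    # Assumption: same call shape as the diff excerpt, db.set(db.STATE_DB, key, field, value).
    db.set(db.STATE_DB, state_db_key, 'oper_status', current)
    log(priority, "mgmt_oper_status_check: {} oper status changed from {} to {}".format(
        state_db_key, prev, current))
    return True
```

Under this sketch, a port that is reported as `down` on first population (no previous value) is also logged as a warning; whether the actual script treats that case the same way is not stated in the PR.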
Signed-off-by: Suvarna Meenakshi <[email protected]>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
```python
current_oper_status = subprocess.run(['cat', port_operstate_path], capture_output=True, text=True)
if current_oper_status.stdout.strip() != prev_oper_status:
    db.set(db.STATE_DB, state_db_key, 'oper_status', current_oper_status.stdout.strip())
    syslog.syslog(syslog.LOG_INFO, "mgmt_oper_status_check: {}".format(current_oper_status.stdout.strip()))
```
Can you make the syslog level WARNING for the down case and keep INFO for the up case?
Made the change in log level as suggested.
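The diff excerpt above reads `port_operstate_path` by spawning `cat` through `subprocess.run`. Assuming that path is the standard Linux sysfs file `/sys/class/net/<port>/operstate` (the excerpt does not show how the path is built), the same value can be read directly, without a child process. A minimal sketch; `read_operstate` and its `sysfs_root` parameter are hypothetical names introduced here for illustration:

```python
import os


def read_operstate(port: str, sysfs_root: str = "/sys/class/net") -> str:
    """Hypothetical helper: return the kernel's operstate string for `port` (e.g. 'up', 'down')."""
    # Assumption: standard Linux sysfs layout, <sysfs_root>/<port>/operstate.
    path = os.path.join(sysfs_root, port, "operstate")
    with open(path) as f:
        # sysfs values end with a newline; strip it, as the diff does with .stdout.strip().
        return f.read().strip()
```

If the interface does not exist, `open` raises `FileNotFoundError`; how the real script handles that case is not shown in the excerpt.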
Signed-off-by: Suvarna Meenakshi <[email protected]>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: Suvarna Meenakshi <[email protected]>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
LGTM.
```
# Periodically update oper status of mgmt interface in STATE_DB
check program mgmtOperStatus with path "/usr/bin/mgmt_oper_status.py"
    every 1 cycles
    if status != 0 for 3 cycle then alert repeat every 1 cycles
```
Have we tested the config reload/minigraph scenario? Are we not getting a monit error? That could impact nightly tests.
Verified reload/reboot and did not see the monit log message. Is there any other specific concern about why this error might get logged?
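To reason about the reviewer's concern that a config reload could raise a monit error, it helps to spell out the alert condition in the monit snippet above (`if status != 0 for 3 cycle then alert repeat every 1 cycles`). The Python below is a simplified model of how we read monit's documented semantics, not monit's implementation: an alert fires once the program has exited nonzero for 3 consecutive cycles, then repeats every cycle while the failure persists. It ignores recovery notifications and the cycle length set by `set daemon` in monitrc; `monit_alert_cycles` is a hypothetical helper.

```python
from typing import Iterable, List


def monit_alert_cycles(exit_codes: Iterable[int], for_cycles: int = 3,
                       repeat_every: int = 1) -> List[int]:
    """Hypothetical model: return the 0-based cycle indices at which an alert is raised."""
    alerts = []
    failing = 0  # consecutive cycles with a nonzero exit status
    for cycle, code in enumerate(exit_codes):
        if code == 0:
            # A successful run resets the consecutive-failure count.
            failing = 0
            continue
        failing += 1
        # First alert once the condition has held for `for_cycles` cycles,
        # then one alert every `repeat_every` cycles while it persists.
        if failing >= for_cycles and (failing - for_cycles) % repeat_every == 0:
            alerts.append(cycle)
    return alerts
```

Under this model, one or two failed runs, such as a transient failure during `config reload`, would not produce an alert; only three consecutive nonzero exits would.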
Signed-off-by: Suvarna Meenakshi <[email protected]>
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Why I did it
Cherry-pick of changes in sonic-net/sonic-buildimage#21245.
Work item tracking
Microsoft ADO 30279044:
How I did it
Remove the update of mgmt oper status from sonic-swss.
Use a monit script to periodically update the oper status of the mgmt interface.
How to verify it
Verified on a Chassis platform.
Currently, the operational status of the mgmt interface is missing or incorrect on multi-asic devices.
Why I did it
Initial PR that added the mgmt oper status feature in swss: #630
sonic-net/sonic-buildimage#21245 adds a script to update the oper status of the management interface periodically. With that in place, we no longer need to update the oper status of the mgmt interface in swss.
---------
Signed-off-by: Suvarna Meenakshi <[email protected]>
**What I did**
Issue to be fixed: Currently, the operational status of the mgmt interface is missing or incorrect on multi-asic devices.

**Why I did it**
Initial PR that added the mgmt oper status feature in swss: sonic-net/sonic-swss#630
sonic-net/sonic-buildimage#21245 adds a script to update the oper status of the management interface periodically. With that in place, we no longer need to update the oper status of the mgmt interface in swss.

**How I verified it**
1. Verified on a single-asic platform and a multi-asic Chassis platform.

Single-ASIC Arista device verification, together with the sonic-net/sonic-buildimage#21245 changes:
Ran the bash script below to check the `oper_status` field of `MGMT_PORT_TABLE|eth0` in STATE_DB, then ran `config reload` and verified that STATE_DB is flushed and repopulated after monit starts the periodic script.
```
#!/bin/bash
CUR_STATUS=`sonic-db-cli STATE_DB hgetall "MGMT_PORT_TABLE|eth0"`
echo "current status in STATE_DB is $CUR_STATUS :: expected to have up state"
snmp_result=`docker exec snmp snmpwalk -v2c -c <comm> <IP> 1.3.6.1.2.1.2.2.1.8.10000`
echo "snmp result before config reload 5min $snmp_result :: expected to have 1 in snmp result"
sudo config reload -y
CUR_STATUS=`sonic-db-cli STATE_DB hgetall "MGMT_PORT_TABLE|eth0"`
echo "current status in STATE_DB after config_reload is $CUR_STATUS :: expected to have empty"
# sleep for snmp to start
sleep 60
snmp_result=`docker exec snmp snmpwalk -v2c -c <comm> <IP> 1.3.6.1.2.1.2.2.1.8.10000`
echo "snmp result after config reload $snmp_result :: expected to return error since STATE_DB is not yet populated"
# monit will populate mgmt oper status after each 5min cycle
sleep 240
CUR_STATUS=`sonic-db-cli STATE_DB hgetall "MGMT_PORT_TABLE|eth0"`
echo "current status in STATE_DB after config_reload after 5min $CUR_STATUS :: expected to have up state"
snmp_result=`docker exec snmp snmpwalk -v2c -c <comm> <IP> 1.3.6.1.2.1.2.2.1.8.10000`
echo "snmp result after config reload after 5min $snmp_result :: expected to have 1 in snmp result"
```

Result of above script:

```
current status in STATE_DB is {'oper_status': 'up'} :: expected to have up state
snmp result before config reload 5min iso.3.6.1.2.1.2.2.1.8.10000 = INTEGER: 1 :: expected to have 1 in snmp result
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
current status in STATE_DB after config_reload is {} :: expected to have empty
snmp result after config reload iso.3.6.1.2.1.2.2.1.8.10000 = No Such Object available on this agent at this OID :: expected to return error since STATE_DB is not yet populated
current status in STATE_DB after config_reload after 5min {'oper_status': 'up'} :: expected to have up state
snmp result after config reload after 5min iso.3.6.1.2.1.2.2.1.8.10000 = INTEGER: 1 :: expected to have 1 in snmp result
```

**Details if related**
…sonic-net#21245)
Issue to be fixed: Currently, the operational status of the mgmt interface is missing or incorrect on multi-asic devices.
Root cause: The operational status of the mgmt interface is updated by portsyncd in the swss docker. On multi-asic platforms, the swss service is started only in the asic namespace context. Since portsyncd runs in a specific network namespace, it is not aware of the mgmt interface, which lives in the host namespace. Therefore portsyncd has no way to find the operational status of the mgmt interface and update it in STATE_DB MGMT_PORT_TABLE.
Use case: The SNMP interface MIB periodically reads MGMT_PORT_TABLE in STATE_DB to retrieve the oper status of the mgmt interface. On multi-asic platforms, this currently returns the oper status of the 'eth0' interface inside the asic namespace, which is a virtual interface created as part of the database docker, not the actual management interface.
---------
Signed-off-by: Suvarna Meenakshi <[email protected]>
Why I did it
Issue to be fixed: Currently, the operational status of the mgmt interface is missing or incorrect on multi-asic devices.
Root cause: The operational status of the mgmt interface is updated by portsyncd in the swss docker. On multi-asic platforms, the swss service is started only in the asic namespace context. Since portsyncd runs in a specific network namespace, it is not aware of the mgmt interface, which lives in the host namespace. Therefore portsyncd has no way to find the operational status of the mgmt interface and update it in STATE_DB MGMT_PORT_TABLE.
Use case: The SNMP interface MIB periodically reads MGMT_PORT_TABLE in STATE_DB to retrieve the oper status of the mgmt interface. On multi-asic platforms, this currently returns the oper status of the 'eth0' interface inside the asic namespace, which is a virtual interface created as part of the database docker, not the actual management interface.
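The verification output queries OID `1.3.6.1.2.1.2.2.1.8.10000`, which is `ifOperStatus` (IF-MIB, RFC 2863) for ifIndex 10000, and `INTEGER: 1` corresponds to `up`. Linux operstate strings are lowercase forms of the RFC 2863 ifOperStatus names, so a conversion from the STATE_DB `oper_status` value can be sketched as below. This is an illustration only: sonic-snmpagent's actual mapping code is not part of this PR and may differ, and `to_if_oper_status` is a hypothetical name.

```python
# IF-MIB (RFC 2863) ifOperStatus enumeration, keyed by the Linux operstate spelling.
IF_OPER_STATUS = {
    "up": 1,
    "down": 2,
    "testing": 3,
    "unknown": 4,
    "dormant": 5,
    "notpresent": 6,
    "lowerlayerdown": 7,
}


def to_if_oper_status(state_db_value: str) -> int:
    """Hypothetical helper: map a STATE_DB oper_status string to an ifOperStatus integer."""
    # Normalise whitespace and case; anything unrecognised is reported as unknown(4).
    return IF_OPER_STATUS.get(state_db_value.strip().lower(), 4)
```

A missing `MGMT_PORT_TABLE` entry is a separate case from `unknown`: right after `config reload` in the test run above, the agent answered `No Such Object` rather than `unknown(4)`.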
Work item tracking
How I did it
How to verify it
Single ASIC Arista device verification:
Ran the bash script below to check the `oper_status` field of `MGMT_PORT_TABLE|eth0` in STATE_DB, then ran `config reload` and verified that STATE_DB is flushed and repopulated after monit starts the periodic script.
Result of above script: