Add script to periodically update oper status of management interface #21245

Merged on Jan 9, 2025 (9 commits)

Contributor

@SuvarnaMeenakshi SuvarnaMeenakshi commented Dec 20, 2024

Why I did it

Issue to be fixed: Currently, the operational status of the mgmt interface is either missing or incorrect on multi-asic devices.
Root cause: The operational status of the mgmt interface is updated by portsyncd in the swss docker. On a multi-asic platform, the swss service is started only in the asic namespace context. Because portsyncd runs in a specific network namespace, it is not aware of the mgmt interface, which lives in the host namespace. portsyncd therefore has no way to find the operational status of the mgmt interface and update it in STATE_DB MGMT_PORT_TABLE.
Use case: The SNMP interface MIB periodically reads MGMT_PORT_TABLE in STATE_DB to retrieve the oper status of the mgmt interface. On a multi-asic platform, this currently returns the oper status of the 'eth0' interface inside the asic namespace, a virtual interface created as part of the database docker, rather than that of the actual management interface.
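
To make the root cause more concrete, here is a minimal sketch of reading an interface's operational state from the host side. It assumes the standard Linux sysfs location `/sys/class/net/<ifname>/operstate`; the helper name `read_oper_status`, the `sysfs_root` parameter, and the fallback to `unknown` on read errors are hypothetical choices made for illustration and are not taken from this PR.

```python
from pathlib import Path


def read_oper_status(ifname, sysfs_root="/sys/class/net"):
    """Return the kernel-reported operstate for ifname, or 'unknown' if unreadable.

    sysfs_root is a hypothetical parameter so the function can be exercised
    against a temporary directory instead of real hardware.
    """
    operstate_path = Path(sysfs_root) / ifname / "operstate"
    try:
        # The file holds a single word such as "up" or "down" plus a newline.
        return operstate_path.read_text().strip()
    except OSError:
        # Assumption: treat a missing or unreadable interface as "unknown".
        return "unknown"
```

A process that runs inside an asic namespace would resolve `eth0` to the namespace's own virtual interface, which is why the lookup has to happen in the host context.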

Work item tracking
  • Microsoft ADO 30279044:

How I did it

  1. Remove update of mgmt oper status from sonic-swss
  2. Use monit script to periodically update oper status of mgmt interface.
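
The second step can be pictured as a short script that monit runs once per cycle. The sketch below is illustrative only: `FakeStateDb` stands in for the real STATE_DB connector, and `update_mgmt_oper_status` and its `read_status` parameter are hypothetical names. Only the `MGMT_PORT_TABLE|<port>` key and the `oper_status` field come from this PR.

```python
class FakeStateDb:
    """In-memory stand-in for STATE_DB (hypothetical, for illustration only)."""

    def __init__(self):
        self.tables = {}

    def get(self, key, field):
        return self.tables.get(key, {}).get(field)

    def set(self, key, field, value):
        self.tables.setdefault(key, {})[field] = value


def update_mgmt_oper_status(db, mgmt_ports, read_status):
    """Store each port's current oper status; return the ports whose value changed."""
    changed = []
    for port in mgmt_ports:
        key = "MGMT_PORT_TABLE|{}".format(port)
        current = read_status(port)
        # Write only when the stored value differs from the current one.
        if db.get(key, "oper_status") != current:
            db.set(key, "oper_status", current)
            changed.append(port)
    return changed


if __name__ == "__main__":
    db = FakeStateDb()
    # First run populates the table, second run finds nothing to change.
    print(update_mgmt_oper_status(db, ["eth0"], lambda port: "up"))
    print(update_mgmt_oper_status(db, ["eth0"], lambda port: "up"))
```

Passing the status reader in as a function keeps the sketch testable without touching sysfs or a real database.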

How to verify it

  1. Verified on a single-asic platform and a multi-asic Chassis platform.
    Single-ASIC Arista device verification:
    Ran the bash script below to check the state of MGMT_PORT_TABLE in STATE_DB, run a config reload, and verify that STATE_DB is flushed and then repopulated once monit starts the periodic script.
#!/bin/bash
CUR_STATUS=`sonic-db-cli STATE_DB hgetall "MGMT_PORT_TABLE|eth0"`
echo "current status in STATE_DB is $CUR_STATUS :: expected to have up state"
snmp_result=`docker exec snmp snmpwalk -v2c -c <comm> <IP> 1.3.6.1.2.1.2.2.1.8.10000`
echo "snmp result  before config reload 5min $snmp_result :: expected to have 1 in snmp result"
sudo config reload -y
CUR_STATUS=`sonic-db-cli STATE_DB hgetall "MGMT_PORT_TABLE|eth0"`
echo "current status in STATE_DB after config_reload is $CUR_STATUS :: expected to have empty"
# sleep for snmp to start
sleep 60
snmp_result=`docker exec snmp snmpwalk -v2c -c <comm> <IP> 1.3.6.1.2.1.2.2.1.8.10000`
echo "snmp result  after config reload $snmp_result :: expected to return error since STATE_DB is not yet populated"
# monit will populate mgmt oper status after each 5min cycle
sleep 240
CUR_STATUS=`sonic-db-cli STATE_DB hgetall "MGMT_PORT_TABLE|eth0"`
echo "current status in STATE_DB after config_reload after 5min $CUR_STATUS :: expected to have up state"
snmp_result=`docker exec snmp snmpwalk -v2c -c <comm> <IP> 1.3.6.1.2.1.2.2.1.8.10000`
echo "snmp result  after config reload after 5min $snmp_result :: expected to have 1 in snmp result"

Result of above script:

current status in STATE_DB is {'oper_status': 'up'} :: expected to have up state
snmp result  before config reload 5min iso.3.6.1.2.1.2.2.1.8.10000 = INTEGER: 1 :: expected to have 1 in snmp result
Acquired lock on /etc/sonic/reload.lock
Disabling container and routeCheck monitoring ...
Stopping SONiC target ...
Running command: /usr/local/bin/sonic-cfggen -j /etc/sonic/init_cfg.json -j /etc/sonic/config_db.json --write-to-db
Running command: /usr/local/bin/db_migrator.py -o migrate
Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment
Restarting SONiC target ...
Enabling container and routeCheck monitoring ...
Reloading Monit configuration ...
Reinitializing monit daemon
Released lock on /etc/sonic/reload.lock
current status in STATE_DB after config_reload is {} :: expected to have empty
snmp result  after config reload iso.3.6.1.2.1.2.2.1.8.10000 = No Such Object available on this agent at this OID :: expected to return error since STATE_DB is not yet populated
current status in STATE_DB after config_reload after 5min {'oper_status': 'up'} :: expected to have up state
snmp result  after config reload after 5min iso.3.6.1.2.1.2.2.1.8.10000 = INTEGER: 1 :: expected to have 1 in snmp result
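
In the output above, `INTEGER: 1` is the IF-MIB `ifOperStatus` value for `up`. As a rough aid for reading such results, the sketch below maps Linux operstate strings to the `ifOperStatus` enumeration from RFC 2863. The function name is hypothetical, and this is not necessarily how the SONiC SNMP agent performs the translation; the fallback to `unknown(4)` for unrecognized strings is an assumption.

```python
# ifOperStatus values as defined in RFC 2863 (IF-MIB).
IF_OPER_STATUS = {
    "up": 1,
    "down": 2,
    "testing": 3,
    "unknown": 4,
    "dormant": 5,
    "notpresent": 6,
    "lowerlayerdown": 7,
}


def to_if_oper_status(operstate):
    """Map a kernel operstate string to an ifOperStatus integer."""
    # Assumption: anything unrecognized is reported as unknown(4).
    return IF_OPER_STATUS.get(operstate.strip().lower(), 4)
```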

@SuvarnaMeenakshi SuvarnaMeenakshi changed the title from "Mgmt if status" to "Add script to periodically update oper status of management interface" Dec 20, 2024
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

for port in mgmt_ports:
    state_db_key = "MGMT_PORT_TABLE|{}".format(port)
    # Reset status of mgmt port before updating with latest status
    db.set(db.STATE_DB, state_db_key, 'oper_status', 'unknown')
Collaborator

Can we do a read first and update only when the oper state has changed, logging the change whether it is an UP to DOWN or a DOWN to UP transition?
Let's not reset it to unknown, as we want to avoid making any changes unless there is a need to do so (state change detected). For a state change from UP to DOWN, let's log it as a WARNING. For a state change from DOWN to UP, let's log it as an INFO-level syslog.

Contributor Author

modified as suggested
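
The reviewer's request can be sketched as a small helper that picks a syslog priority from the previous and current state. This is a hedged illustration, not the code merged in this PR: the function names and message format are hypothetical, and treating every change to a state other than `up` as a WARNING is an assumption that goes slightly beyond the UP/DOWN cases discussed above.

```python
import syslog


def log_level_for_transition(prev_status, current_status):
    """Return a syslog priority for a state change, or None if nothing changed."""
    if current_status == prev_status:
        return None
    if current_status == "up":
        # Recovery (for example down -> up) is informational.
        return syslog.LOG_INFO
    # Assumption: any other change (for example up -> down) is a warning.
    return syslog.LOG_WARNING


def report_transition(port, prev_status, current_status):
    """Log the transition, if any, and return the priority that was used."""
    level = log_level_for_transition(prev_status, current_status)
    if level is not None:
        syslog.syslog(level, "mgmt_oper_status: {} {} -> {}".format(
            port, prev_status, current_status))
    return level
```

Returning `None` for the unchanged case lets the caller skip both the log line and the database write.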

Signed-off-by: Suvarna Meenakshi <[email protected]>
Signed-off-by: Suvarna Meenakshi <[email protected]>

current_oper_status = subprocess.run(['cat', port_operstate_path], capture_output=True, text=True)
if current_oper_status.stdout.strip() != prev_oper_status:
    db.set(db.STATE_DB, state_db_key, 'oper_status', current_oper_status.stdout.strip())
    syslog.syslog(syslog.LOG_INFO, "mgmt_oper_status_check: {}".format(current_oper_status.stdout.strip()))
Collaborator

Can you make syslog WARNING for down case and keep INFO for up case?

Contributor Author

Made the change in log level as suggested.

Signed-off-by: Suvarna Meenakshi <[email protected]>

Signed-off-by: Suvarna Meenakshi <[email protected]>

Collaborator

@gechiang gechiang left a comment

LGTM.

# Periodically update oper status of mgmt interface in STATE_DB
check program mgmtOperStatus with path "/usr/bin/mgmt_oper_status.py"
    every 1 cycles
    if status != 0 for 3 cycle then alert repeat every 1 cycles
Contributor

Have we tested the config reload / load minigraph scenario? Are we sure we do not get a monit error there, since that can impact the nightly runs?

Contributor Author

Verified reload/reboot and did not see the monit log message. Is there any other specific concern about why this error might get logged?
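
For context on the concern above: the monit rule in question alerts only after the check program has exited non-zero for three cycles in a row, then repeats the alert on each further failing cycle. The following is a simplified model of that rule, not monit's actual implementation; `alert_cycles` and its parameters are hypothetical names.

```python
def alert_cycles(exit_codes, threshold=3):
    """Return the 1-based cycle numbers at which the modeled rule would alert.

    exit_codes is the sequence of exit statuses of the check program,
    one per monit cycle.
    """
    alerts = []
    consecutive_failures = 0
    for cycle, code in enumerate(exit_codes, start=1):
        # A successful run resets the failure streak.
        consecutive_failures = consecutive_failures + 1 if code != 0 else 0
        if consecutive_failures >= threshold:
            alerts.append(cycle)
    return alerts
```

Under this model, one or two failing cycles right after a config reload would not raise an alert as long as the script succeeds again before the third cycle.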

@rlhui rlhui added the P0 Priority of the issue label Jan 8, 2025
Signed-off-by: Suvarna Meenakshi <[email protected]>

@rlhui rlhui merged commit e50863e into sonic-net:master Jan 9, 2025
21 checks passed
arlakshm added a commit to Azure/sonic-buildimage-msft that referenced this pull request Jan 9, 2025
Why I did it
Cherry-pick of changes in sonic-net/sonic-buildimage#21245

Work item tracking
Microsoft ADO 30279044:
How I did it
Remove update of mgmt oper status from sonic-swss
Use monit script to periodically update oper status of mgmt interface.

How to verify it
Verified on Chassis platform
rlhui pushed a commit to sonic-net/sonic-swss that referenced this pull request Jan 10, 2025
Currently, the operational status of the mgmt interface is either missing or incorrect on multi-asic devices.

Why I did it
Initial PR that added mgmt oper status feature in swss: #630
sonic-net/sonic-buildimage#21245 adds a script to update oper status of management interface periodically. In doing so, we no longer need to update oper status of mgmt interface in swss.

---------

Signed-off-by: Suvarna Meenakshi <[email protected]>
mssonicbld added a commit to mssonicbld/sonic-swss.msft that referenced this pull request Jan 10, 2025
**What I did**
Issue to be fixed: Currently, the operational status of the mgmt interface is either missing or incorrect on multi-asic devices.

**Why I did it**
Initial PR that added mgmt oper status feature in swss: sonic-net/sonic-swss#630
sonic-net/sonic-buildimage#21245 adds a script to update oper status of management interface periodically. In doing so, we no longer need to update oper status of mgmt interface in swss.

**How I verified it**
Verified on a single-asic platform and a multi-asic Chassis platform.
Single-ASIC Arista device verification was done along with the sonic-net/sonic-buildimage#21245 changes, using the same bash script as in the pull request description above and producing the same output.

**Details if related**
arlakshm added a commit to Azure/sonic-swss.msft that referenced this pull request Jan 10, 2025
What I did
Issue to be fixed: Currently, the operational status of the mgmt interface is either missing or incorrect on multi-asic devices.

Why I did it
Initial PR that added mgmt oper status feature in swss: sonic-net/sonic-swss#630
sonic-net/sonic-buildimage#21245 adds a script to update oper status of management interface periodically. In doing so, we no longer need to update oper status of mgmt interface in swss.

How I verified it
Verified on a single-asic platform and a multi-asic Chassis platform.
Single-ASIC Arista device verification was done along with the sonic-net/sonic-buildimage#21245 changes, using the same bash script as in the pull request description above and producing the same output.
VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Jan 21, 2025
…sonic-net#21245)

Issue to be fixed: Currently, the operational status of the mgmt interface is either missing or incorrect on multi-asic devices.
Root cause: The operational status of the mgmt interface is updated by portsyncd in the swss docker. On a multi-asic platform, the swss service is started only in the asic namespace context. Because portsyncd runs in a specific network namespace, it is not aware of the mgmt interface, which lives in the host namespace. portsyncd therefore has no way to find the operational status of the mgmt interface and update it in STATE_DB MGMT_PORT_TABLE.
Use case: The SNMP interface MIB periodically reads MGMT_PORT_TABLE in STATE_DB to retrieve the oper status of the mgmt interface. On a multi-asic platform, this currently returns the oper status of the 'eth0' interface inside the asic namespace, a virtual interface created as part of the database docker, rather than that of the actual management interface.

---------

Signed-off-by: Suvarna Meenakshi <[email protected]>
shiraez pushed a commit to Marvell-switching/sonic-swss that referenced this pull request Feb 17, 2025
Currently, the operational status of the mgmt interface is either missing or incorrect on multi-asic devices.

Why I did it
Initial PR that added mgmt oper status feature in swss: sonic-net#630
sonic-net/sonic-buildimage#21245 adds a script to update oper status of management interface periodically. In doing so, we no longer need to update oper status of mgmt interface in swss.

---------

Signed-off-by: Suvarna Meenakshi <[email protected]>