Updating SONiC for OOPT #1

Open
wants to merge 7,084 commits into
base: master
from

Conversation

haris-khan1596

- What I did
Updated SONiC for OOPT
- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

cscarpitta and others added 30 commits October 25, 2024 09:28
This PR fixes the SRv6 SID uninstall introduced in this PR: #18715
This fix has been already merged in the FRR mainline: FRRouting/frr#16835

Signed-off-by: Carmine Scarpitta <[email protected]>
* [Micas/Platform]platform support M2-W6920-32QC2X

Signed-off-by: philo <[email protected]>

* update platform files

Signed-off-by: philo <[email protected]>

* fix Semgrep

Signed-off-by: philo <[email protected]>

* Update control

---------

Signed-off-by: philo <[email protected]>
… automatically (#20624)

#### Why I did it
src/sonic-platform-common
```
* 7268fad - (HEAD -> master, origin/master, origin/HEAD) [SmartSwitch] Add a new API for the DPU chassis to query dataplane and midplane states (#509) (7 hours ago) [Oleksandr Ivantsiv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…D automatically (#20609)

#### Why I did it
src/sonic-platform-daemons
```
* f169f86 - (HEAD -> master, origin/master, origin/HEAD) Move DomInfoUpdateTask class to a separate file (#552) (2 days ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
… control (#19476)

- Why I did it
On Mellanox platforms, currently only CMIS active ports can be controlled by the SW, and all copper modules are controlled by FW.
We want to let SONiC control passive copper modules as well, for CMIS and SFF (sff8636 and sff8436).

- How I did it
I updated the module detection flow to tag CMIS and SFF passive modules as SW control.

- How to verify it
Manual tests.
HLD link: sonic-net/SONiC#1522

- Why I did it
SONiC provides two Python logger implementations: sonic_py_common.logger.Logger and sonic_py_common.syslogger.SysLogger. Neither provides a way to change the log level at run time. To get more debug information, a developer sometimes has to manually edit the log level in code on a running switch and restart the Python daemon, which is inconvenient.

SONiC also provides a C/C++ logger implementation in sonic-platform-common.common.logger.cpp. It is a wrapper around the standard Linux syslog and is widely used by swss/syncd. It can set the log level on the fly by starting a thread that listens for changes to the CONFIG DB LOGGER table. SONiC infrastructure also provides a Python wrapper for sonic-platform-common.common.logger.cpp, namely swsscommon.Logger. However, this logger implementation has some drawbacks:

swsscommon.Logger assumes the redis DB is ready to accept connections. This is a valid assumption for swss/syncd, but not for a Python logger implementation, because some Python scripts may be called before the redis server starts.
swsscommon.Logger wraps Linux syslog, which only supports a single log identifier per daemon.
So swsscommon.Logger is not an option either.

This PR is a Python logger enhancement that allows the user to set the log level at run time.

- How I did it
swsscommon.Logger depends on a thread that listens for changes to the CONFIG DB LOGGER table. It refreshes the log level of each logger instance once the thread detects a DB entry change. A thread is considered heavy for a Python script, especially since many short, simple Python scripts also use the logger. To keep the Python logger lightweight, it uses a different design from swsscommon.Logger:

A class-level logger registry shall be added to the SysLogger class
Each logger instance shall register itself with the logger registry if it enables runtime configuration
Logger configuration shall be refreshed by a CLI command that sends a SIGHUP signal to the daemon
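
The registry-plus-SIGHUP design above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `RuntimeLogger`, `refresh_all`, `load_levels_stub`, and `install_sighup_handler` are hypothetical names, and the stub stands in for whatever reads the CONFIG DB LOGGER table.

```python
import logging
import signal


class RuntimeLogger:
    """Hypothetical stand-in for SysLogger with a class-level registry."""

    # Class-level registry: log identifier -> logger instance.
    _registry = {}

    def __init__(self, identifier, level=logging.WARNING,
                 enable_runtime_config=False):
        self.identifier = identifier
        self.logger = logging.getLogger(identifier)
        self.logger.setLevel(level)
        # Only loggers that opt in are refreshed at run time.
        if enable_runtime_config:
            RuntimeLogger._registry[identifier] = self

    @classmethod
    def refresh_all(cls, load_levels):
        """Re-read configured levels and apply them to registered loggers."""
        levels = load_levels()
        for identifier, instance in cls._registry.items():
            level_name = levels.get(identifier)
            if level_name is not None:
                instance.logger.setLevel(getattr(logging, level_name))


def load_levels_stub():
    # Assumption: a real daemon would read the CONFIG DB LOGGER table here.
    return {"xcvrd": "DEBUG"}


def install_sighup_handler(load_levels):
    """Refresh log levels when the process receives SIGHUP (no extra thread)."""
    def handler(signum, frame):
        RuntimeLogger.refresh_all(load_levels)

    signal.signal(signal.SIGHUP, handler)
    return handler
```

With this shape, a short-lived script pays nothing beyond a dictionary entry, and a long-running daemon only does work when a SIGHUP actually arrives.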

- How to verify it
Manual test
New unit test cases
…C2/v32.42.1000, BFSoC to 4.9.0 (#20565)

- Why I did it
To include latest fixes and new functionality

- How I did it
SDK_VERSION 24.7-RC4 -> 24.10-RC2
FW_VERSION 32.41.1000 -> 32.42.1000
SAI_VERSION SAIBuild0.0.32.0 -> SAIBuild0.0.36.0
BFSOC_VERSION: 4.7.0 -> 4.9.0

- How to verify it
Build an image and run tests from "sonic-mgmt".
…20580)

* sonic-buildimage: rename qsp 128x400g to o128s2

In keeping with normative convention, renaming the hwsku
folders for qsp/qspr from 128x400G to O128S2.

* sonic-buildimage: fix qsp-o128s2 port_config typo

There is a typo in the lanes used for
Ethernet356 within port_config.ini, where lanes
381 and 382 appear twice instead of being
followed by the intended 383 and 384.
This change fixes that typo.

This exact typo is not present in the other
hwskus under x86_64-arista_7060x6_64pe or
x86_64-arista_7060x6_64de.
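
A typo of this kind can be caught with a small check. The sketch below is illustrative only: it assumes the lanes are the second whitespace-separated column of each non-comment line in port_config.ini, `find_duplicate_lanes` is a hypothetical helper, and the sample rows are made up rather than copied from the real file.

```python
from collections import Counter


def find_duplicate_lanes(port_config_text):
    """Return {port name: sorted lanes listed more than once for that port}."""
    duplicates = {}
    for line in port_config_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        name, lanes = fields[0], fields[1]
        counts = Counter(int(lane) for lane in lanes.split(","))
        repeated = sorted(lane for lane, n in counts.items() if n > 1)
        if repeated:
            duplicates[name] = repeated
    return duplicates


# Made-up sample rows (name and lanes columns only) for illustration.
SAMPLE = """\
# name        lanes
Ethernet348   369,370,371,372,373,374,375,376
Ethernet356   377,378,379,380,381,382,381,382
"""

print(find_duplicate_lanes(SAMPLE))
```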
#### Why I did it


Adding yang model for CONFIG_DB table XCVRD_LOG|Y_CABLE.
Introduced by 
https://github.com/sonic-net/sonic-utilities/blob/master/config/muxcable.py#L1230-L1235

#### How I did it
Added the changes in sonic-yang-models
#### How to verify it
UT test 

```
==================================================================================== test session starts ====================================================================================
platform linux -- Python 3.9.2, pytest-6.0.2, py-1.10.0, pluggy-0.13.0
rootdir: /sonic/src/sonic-yang-models
plugins: pyfakefs-5.2.3, cov-2.10.1
collected 3 items                                                                                                                                                                           

tests/test_sonic_yang_models.py ..                                                                                                                                                    [ 66%]
tests/yang_model_tests/test_yang_model.py .                                                                                                                                           [100%]

===================================================================================== 3 passed in 2.06s =====================================================================================
```
* MAB common header files for generic files

* Addressed review comments
…#20074)

- Why I did it
Extend Nvidia Bluefield SONiC infrastructure to support DPU NIC FW auto upgrade.

- How I did it
Extend the build system and init scripts to support the FW upgrade.

- How to verify it
Compile an image with the new FW version. Run image installation. Verify that the running FW is upgraded after the image installation.
…ge installation (#19910)

- Why I did it
The DPU reset after the image installation is required to boot the DPU with the new NIC FW

- How I did it
Trigger DPU reset with the dpuctl utility after the image installation

- How to verify it
Build and install the image
Why I did it
This PR adds a patch to fix a potential fd leak in AsyncSniffer in the scapy Python library.
There are two fd leak scenarios.

When starting the worker thread _run, if an interface is down, an OSError is thrown, and the sockets that have already been created are leaked because they never get a chance to be closed.
When stopping the worker thread, the same error can happen when calling close. The sockets that are not closed are leaked.

How I did it
Catch OSError when creating sockets, and catch any exception when closing socket to ensure all sockets are closed.
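
The general pattern can be sketched as below. This is not the scapy patch itself: `open_sockets`, `close_sockets`, and the `factory` callable are hypothetical names, used only to show how both leak paths are closed off.

```python
def close_sockets(sockets):
    """Close every socket, continuing even if one close() call fails."""
    for sock in sockets:
        try:
            sock.close()
        except Exception:
            # A failed close must not stop the remaining sockets
            # from being closed.
            pass


def open_sockets(interfaces, factory):
    """Open one socket per interface; release all of them on OSError."""
    opened = []
    try:
        for iface in interfaces:
            # factory(iface) may raise OSError, e.g. if the interface is down.
            opened.append(factory(iface))
    except OSError:
        # Close whatever was created before the failure, then re-raise.
        close_sockets(opened)
        raise
    return opened
```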

How to verify it
Verified by the testing code above. No fd leak happened.
- Why I did it
Setting the KV attribute for WECMP normalization

- How I did it
Update common sai.profile

- How to verify it
Running basic WECMP tests.
…D automatically (#20660)

#### Why I did it
src/sonic-platform-daemons
```
* fc557a1 - (HEAD -> master, origin/master, origin/HEAD) [SmartSwitch] Add implementation for the DPU chassis daemon. (#554) (12 hours ago) [Oleksandr Ivantsiv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
… automatically (#20630)

#### Why I did it
src/sonic-platform-common
```
* 4668bdc - (HEAD -> master, origin/master, origin/HEAD) Enhanced NVMe disk support, added limited eUSB disk support (#493) (3 days ago) [Ashwin Srinivasan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tically (#20540)

#### Why I did it
src/sonic-sairedis
```
* e394ced7 - (HEAD -> master, origin/master, origin/HEAD) Fix compilation on Buster (#1449) (11 hours ago) [Saikrishna Arcot]
* 4d504ff8 - Rename file name to fit case insensitive file system. (#1444) (2 days ago) [Liu Shilong]
* fe650bb7 - [syncd] Add workaround for port error status notification (#1430) (6 days ago) [Kamil Cudnik]
* cd2773a3 - [syncd] Fix inspect asic command (#1434) (7 days ago) [Kamil Cudnik]
* 2d873766 - [syncd] Make sure notification queue release memory when drained (#1427) (8 days ago) [Kamil Cudnik]
* b8a8856a - Fix adding flex counter to wrong context (#1421) (8 days ago) [byu343]
* 40979e0b - [fastboot] Notify SAI that fastboot is done (#1396) (8 days ago) [Junchao-Mellanox]
* 952ee406 - [codeql] Change pull_request_target to pull_request (#1442) (9 days ago) [Kamil Cudnik]
* 697d86b5 - [syncd] Create neighbor entries before next hop (#1432) (9 days ago) [Kamil Cudnik]
* fa76ca13 - [codeql] Remove git ancestry (#1441) (10 days ago) [Kamil Cudnik]
* 3838d7ee - [codeql] Show git ancestry graph (#1440) (10 days ago) [Kamil Cudnik]
* 2e7d946b - [codeql] Show gcc version before compile (#1438) (10 days ago) [Kamil Cudnik]
* a1e93f58 - [submodule] Update SAI to latest master (#1431) (2 weeks ago) [Kamil Cudnik]
```
#### How I did it
#### How to verify it
#### Description for the changelog
- Why I did it
Implement the interface required to run DPU chassisd on the Nvidia Smart Switch.
Implement get_dpu_id API that deduces the DPU ID based on the midplane interface IP address.

- How I did it
Implement platform API

- How to verify it
The implementation is covered by the UT.
… switch by Cli (#20642)

Why I did it
In smart switch, there is an issue where the CLI query for DHCP leases returns an unknown interface because the DPU FDB entry is not present in STATE_DB FDB_TABLE. Issue: #20155

How I did it
Query bridge fdb if there is no fdb record in STATE_DB
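
The fallback can be pictured as a two-step lookup. The sketch below is a simplification with hypothetical names: `state_db_fdb` is a plain dict standing in for STATE_DB FDB_TABLE, and `query_bridge_fdb` is any callable that looks a MAC address up in the bridge FDB.

```python
def resolve_interface(mac, state_db_fdb, query_bridge_fdb):
    """Return the interface for a MAC, preferring STATE_DB over the bridge FDB."""
    key = mac.lower()
    port = state_db_fdb.get(key)
    if port is not None:
        return port
    # No record in STATE_DB: fall back to the bridge FDB.
    # The result may still be None if the MAC is unknown there too.
    return query_bridge_fdb(key)
```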

How to verify it
UT passed
…lly (#20610)

#### Why I did it
src/sonic-swss
```
* 93f7c150 - (HEAD -> master, origin/master, origin/HEAD) Fix State Db LAG_MEMBER_TABLE removal not happening. (#3347) (10 hours ago) [abdosi]
* d76c34e4 - fix error in rif_rates.lua (#3218) (31 hours ago) [InspurSDN]
* a3aaa398 - Add suppport for SAI DASH appliance object (#3284) (32 hours ago) [Mukesh Moopath Velayudhan]
* 064f2e3d - Fix the tlm_teamd deleting STATE_DB LAG_TABLE entry. (6 days ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (#20671)

#### Why I did it
src/sonic-swss
```
* 9dd28489 - (HEAD -> master, origin/master, origin/HEAD) trap_rates.lua get value error (#3219) (3 hours ago) [InspurSDN]
```
#### How I did it
#### How to verify it
#### Description for the changelog
… automatically (#20670)

#### Why I did it
src/sonic-platform-common
```
* 59babf5 - (HEAD -> master, origin/master, origin/HEAD) Add/modify VDM and Status related cmis fields for onboarding xcvr diagnostic features (#510) (3 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…utomatically (#20668)

#### Why I did it
src/sonic-host-services
```
* 13a5419 - (HEAD -> master, origin/master, origin/HEAD) Correct real time CPU Utilization calculation (#173) (3 hours ago) [Feng-msft]
* f95b7cd - Optimize state_db update into batch way. (#176) (3 hours ago) [Feng-msft]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (#20677)

#### Why I did it
src/sonic-swss
```
* 368e1d62 - (HEAD -> master, origin/master, origin/HEAD) [MultiDB]:sonic-swss replace old API with new APIs (#3292) (11 hours ago) [PanXuntao]
```
#### How I did it
#### How to verify it
#### Description for the changelog
oleksandrivantsiv and others added 30 commits December 11, 2024 10:04
- Why I did it
Remove the virtual smart switch leftovers from the Nvidia DPU initialization flow.

- How I did it
Remove unused code from the bash script.
…atically (#21128)

#### Why I did it
src/sonic-utilities
```
* d5cbe464 - (HEAD -> master, origin/master, origin/HEAD) [GCU] Add data acl table and rule check (#3668) (3 minutes ago) [jingwenxie]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…utomatically (#21106)

#### Why I did it
src/sonic-host-services
```
* b0b3ca5 - (HEAD -> master, origin/master, origin/HEAD) Support for Memory Statistics Host-Services (#167) (14 hours ago) [kanza-latif]
* d455924 - Update pipeline to Bookworm (#193) (33 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
FRR 10.0.1 upgrade (#20269) brought in a mgmtd daemon for FRR. This needs to be started up in docker-sonic-vs as part of the other daemons in this container.

Additionally, Debian Bookworm provides version 2.5.0 of scapy, but the pip3 command later in the file downgraded it to 2.4.5, which does not work in Bookworm. Fix this by removing the pip3 installation for scapy, and updating the other packages installed via pip3.

Signed-off-by: Saikrishna Arcot <[email protected]>
(this is a replacement for PR #20603; this PR includes a revert for the "Disable lpmode for chassis" change)

Add support for new psus
Add support for new fan modules
Report fan information from all fabric cards
Fix psud warning due to invalid pmbus threshold reported
Why I did it
Add platform vpp to sonic-buildimage to enable building sonic-vpp as other platforms.

How I did it
Add vpp submodule from https://github.com/sonic-net/sonic-platform-vpp

How to verify it
git clone --recurse-submodules https://github.com/sonic-net/sonic-buildimage.git
make init
make configure PLATFORM=vpp
make SONIC_BUILD_JOBS=4 target/sonic-vpp.img.gz

Signed-off-by: Yue Gao <[email protected]>
- Why I did it
Fix #20925

Not cherry-pickable to 202405, will raise a separate PR

- How I did it
Mount /etc/localtime on containers and remove /etc/timezone dependency

- How to verify it
root@sonic:/home/admin# zdump /etc/localtime
/etc/localtime  Fri Dec  6 23:24:03 2024 IST

root@sonic:/home/admin# docker exec swss zdump /etc/localtime
/etc/localtime  Fri Dec  6 23:24:12 2024 IST
Verify swss.rec/sairedis.rec and syslog are following same time

Signed-off-by: Vivek <[email protected]>
…e via Netlink to fpmsyncd (#20692)

* Added patch in FRR to send tag value associated with route
via NETLINK RTA_PRIORITY field which can be used as attribute/metadata
in fpmsyncd for different use-cases.

---------

Signed-off-by: Abhishek Dosi <[email protected]>
…21035)

Why I did it
This commit is to avoid Broadcom driver dependency on the MMU config

Work item tracking
Microsoft ADO (number only):
How I did it
We upload a basic version of the Broadcom configuration with limited settings.

How to verify it
We verified the change internally in Arista
…20992)

- Why I did it
Update buffer calculations for Mellanox-SN5600-C224O8 HwSKU.

- How I did it
Update the lossless pool size

- How to verify it
Run sonic-mgmt QoS test
- Why I did it
Update buffer calculations for Mellanox-SN5600-C256S1 HwSKU.

- How I did it
Update lossless pool

- How to verify it
Run sonic-mgmt QoS test
…D automatically (#21160)

#### Why I did it
src/sonic-platform-daemons
```
* b150147 - (HEAD -> master, origin/master, origin/HEAD) Advanced Azure pipeline to Bookworm (#572) (9 hours ago) [Ashwin Srinivasan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…ly (#21159)

#### Why I did it
src/dhcprelay
```
* 2a2fb68 - (HEAD -> master, origin/master, origin/HEAD) Clear counter when dhcp6relay init (#51) (2 hours ago) [Yaqiang Zhu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…atically (#21161)

#### Why I did it
src/sonic-utilities
```
* 200ef363 - (HEAD -> master, origin/master, origin/HEAD) Speed up route_check script (#3678) (32 hours ago) [Deepak Singhal]
* 7dc40ac3 - Fixed the issues with sonic-clear queuecounter for egress queue and voq (#3671) (2 days ago) [saksarav-nokia]
* 72ee4fc1 - [config db] Trim garbage charactor in "DEVICE_METADATA" of config db (#3345) (2 days ago) [wenyiz2021]
* b2ba0825 - [show][interfaces] Add proposal for show interface errors {port} (#3623) (3 days ago) [vdahiya12]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Due to excess whitespace, the table of contents does not render as links in github, leading to navigation issues.

This might cause some minor "pain" for other merge requests due to the formatting differences in the ToC, but it should really only take a second to correct. Better to fix it once and make everyone's life easier than just living with it.

Work item tracking
How I did it
Updated whitespace and a couple other very minor correctness changes for MarkDown. There are still other changes I'd like to do but much lower priority and possibly not worth the disturbance.

How to verify it
View the corrected table of contents here:
https://github.com/bradh352/sonic-buildimage/blob/configuration.md/src/sonic-yang-models/doc/Configuration.md

Compared to original unrendered ToC here:
https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md

Which release branch to backport (provide reason below if selected)
Not needed

Tested branch (Please provide the tested image version)
master as of 20241206

Description for the changelog
YANG Configuration.md fix table of contents links
Why I did it
Cannot configure unified BGP for VXLAN EVPN without specifying advertise-all-vni. The setting appears to have been introduced as part of PR #5142, and it is already honored as an option here:

sonic-buildimage/src/sonic-frr-mgmt-framework/templates/bgpd/bgpd.conf.db.addr_family.evpn.j2

Lines 1 to 3 in 8e0f1c6

 {% if 'advertise-all-vni' in af_val and af_val['advertise-all-vni'] == 'true' %} 
   advertise-all-vni 
 {% endif %} 
Work item tracking
How I did it
Added basic yang rule

How to verify it
Configure

"BGP_GLOBALS_AF": {
        "default|l2vpn_evpn": {
            "advertise-all-vni": "true"
        }
    }
and run config replace.

Tested branch (Please provide the tested image version)
master as of 20241205

Description for the changelog
[yang] bgp address family l2vpn advertise-all-vni
Signed-off-by: Anand Mehra [email protected]

Release for Cisco 8800 Chassis, 8101

Chassis 8800
Test case fix test_thermal_global_state_db for FC in slot 7
XR to SONiC migration process broken in two steps
Apply SONiC config as part of XR to SONiC migration via minigraph.xml
Fixed BFD staying up on portchannel member down issue

8101
Fix for MIGSMSFT-771 - Link flap with disabled bad link detection
iccpd: fix a stack overflow bug in iccpd (update_peerlink_isolate_from_all_csm_lif) when the number of mclag members exceeds 32

Change the array length of 'mlag_po_buf' to 2048.

Add a length check before using 'snprintf'. Note: The return value of the snprintf function is the length of the source string.

Signed-off-by: ccyyrr92 [email protected]
…rm (#21149)

The pipeline build links are pointing to a wrong folder for marvell-teralynx platform after renaming PR (#19829)
…#21146)

Why I did it
Reduce high CPU usage on zebra after performing port toggle on all interfaces simultaneously

How I did it
Apply zebra fpm backpressure patches from FRR mainline to dplane_fpm_sonic:

zebra: Use built in data structure counter (FRRouting/frr#16221)
Zebra fpm backpressure (FRRouting/frr#16220)

Signed-off-by: cscarpitta <[email protected]>
Why I did it
Set the nexthop-group keep parameter to 1. This instructs zebra not to keep a nexthop group for more than 1 second after removal. Without it, zebra keeps the nexthop group in the system for 180 seconds.
In scaled scenarios, when this parameter is not set, link-flap tests caused the queue to grow so large that zebra crashed with OOM.

How I did it
Update the zebra template and initialize nexthop-group keep as 1.

How to verify it
Running the scale test with link flapping and ensure no memory increase in zebra.
In our testing, we found that unplugging SFP1 alone resulted in a failure at port 66, while unplugging SFP2 alone resulted in a failure at port 65, which did not match our expectations.

Signed-off-by: philo <[email protected]>
Why I did it
The reboot cause is not properly determined after each reboot, so the reboot history is not maintained either.

How I did it
The failure in determining the reboot cause is due to the pcisysfs.py script failing to read the registers.
The pcisysfs.py script in use was written in the old Python 2 format, which was failing.
Modified the install scripts to use the latest pcisysfs.py for register reads and writes.
Supervisord emits warnings due to the use of `stdout_logfile=syslog`
and `stderr_logfile=syslog`.  Replace with the modern configuration
options of `stdout_syslog=true` and `stderr_syslog=true` and set
the log file itself to `NONE` so it doesn't generate a file-based
log.

Warnings corrected look like:
```
2024 Dec  1 15:31:06.467218 sw2 INFO pmon#supervisord 2024-12-01 15:31:04,033 WARN For [program:xcvrd], stderr_logfile=syslog but this is deprecated and will be removed.  Use stderr_syslog=true to enable syslog instead.
```
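
The same substitution can be expressed as a small script. This is a sketch only, not the change made in this commit: `modernize_supervisor_logging` is a hypothetical helper, and it assumes the supervisord config has no inline `;` comments on the affected lines.

```python
import configparser
import io


def modernize_supervisor_logging(conf_text):
    """Replace '<stream>_logfile=syslog' with '<stream>_syslog=true'."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    for section in parser.sections():
        for stream in ("stdout", "stderr"):
            key = stream + "_logfile"
            if parser.has_option(section, key) and parser.get(section, key) == "syslog":
                # NONE disables the file-based log for this stream.
                parser.set(section, key, "NONE")
                parser.set(section, stream + "_syslog", "true")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Sections that do not use the deprecated value are written back unchanged, apart from configparser's normalized `key = value` spacing.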

Signed-off-by: Brad House (@bradh352)
…n with bookworm libasan (#21134)

syncd links against libasan v8 at build time after the Bookworm upgrade (#18651), but libasan v6 is installed in the syncd container for the Mellanox platform, which causes runtime errors.

Signed-off-by: Andriy Yurkiv <[email protected]>
Fixes: #20730

Why I did it
The generated t1 config fails YANG validation, which leads to config setup failure since we enforce YANG validation in config reload.

How I did it
Update config to align with YANG

How to verify it
Run YANG validate on generated config.
…D automatically (#21178)

#### Why I did it
src/sonic-platform-daemons
```
* 3fe8841 - (HEAD -> master, origin/master, origin/HEAD) Added SmartSwitch support in chassisd and enabling chassisd (#467) (9 hours ago) [rameshraghupathy]
* 88d0dd7 - Take non-CMIS xcvrs out of lpmode in SFF Manager (#565) (3 days ago) [Peter Bailey]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tically (#21193)

#### Why I did it
src/sonic-sairedis
```
* 9fe90f6b - (HEAD -> master, origin/master, origin/HEAD) syncd init: rename marvell to marvell-prestera (#1465) (7 hours ago) [krismarvell]
```
#### How I did it
#### How to verify it
#### Description for the changelog