
[RDMA] correct egress buffer size for Arista-7050CX3-32S-D48C8 DualToR #21320

Conversation

XuChen-MSFT
Contributor

Why I did it

Symptom:

[MSFT ADO 28240256 [SONiC_Nightly][Failed_Case][qos.test_qos_sai.TestQosSai][testQosSaiHeadroomPoolSize][20231110][broadcom][Arista-7050CX3-32S-D48C8]

For Arista-7050CX3-32S-D48C8 (BCM56870_A0 / TD3), the headroom pool size test injects lossless traffic into multiple ingress ports. The shared buffer is exhausted first, and then, before the headroom pool is fully exhausted, egress drops are observed.
The expected behavior is ingress drops, so the test failed.

RCA:

(See BRCM CSP CS00012358392 "Egress lossless pool size update for Arista-7050CX3-32S-D48C8 DualToR" for details.)

Pool: egress_lossless_pool
----  --------
mode  static
size  32340992
type  egress
----  --------
... ...
Pool: ingress_lossless_pool
----  --------
mode  dynamic
size  32689152
type  ingress
xoff  2058240
----  --------

As the "mmuconfig --list" output above shows, in the Arista-7050CX3-32S-D48C8 buffer configuration the egress buffer is smaller than the ingress buffer.
So, before the headroom pool is fully exhausted, the egress buffer limit is reached first, which triggers egress drops.
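
As a rough sanity check, the pool sizes above can be converted from bytes to MMU cells; the 256-byte cell size used below is an assumption based on the TD3 (BCM56870) cell size, not a value stated in this PR.

# Sanity check of the mmuconfig pool sizes above, assuming a 256-byte TD3 cell.
CELL_BYTES = 256                    # assumed BCM56870/TD3 cell size
egress_lossless_pool = 32340992     # bytes, from "mmuconfig --list"
ingress_lossless_pool = 32689152    # bytes, from "mmuconfig --list"
print(egress_lossless_pool // CELL_BYTES)    # 126332 cells (0x1ED7C)
print(ingress_lossless_pool // CELL_BYTES)   # 127692 cells (0x1F2CC)
# The egress pool is 1360 cells smaller than the ingress pool, so the egress
# limit can be hit before the ingress headroom pool is exhausted.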

MMU register dump analysis

Total Ingress buffer limit for Pool 0:
Shared: THDI_BUFFER_CELL_LIMIT_SP=0x1CDC4
Headroom: THDI_HDRM_BUFFER_CELL_LIMIT_HP: 0x1F68
Min reserved per PG: 0x12 cells per PG. Check THDI_PORT_PG_CONFIG_PIPE0 and THDI_PORT_PG_CONFIG_PIPE1: there are 80 PGs in total with the Min limit configured to 0x12, which takes up a total of 80 * 0x12 = 0x5A0 cells.
Total ingress for Pool 0: 0x1CDC4 + 0x1F68 + 0x5A0 = 0x1F2CC (127692 cells).
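
The same ingress total can be reproduced directly from the register values quoted above:

# Ingress limit for Pool 0, computed from the register dump above.
shared   = 0x1CDC4        # THDI_BUFFER_CELL_LIMIT_SP
headroom = 0x1F68         # THDI_HDRM_BUFFER_CELL_LIMIT_HP
pg_min   = 80 * 0x12      # 80 PGs x 0x12 min cells each = 0x5A0
total = shared + headroom + pg_min
print(hex(total), total)  # 0x1f2cc 127692
# 127692 cells x 256 bytes/cell = 32689152 bytes, which matches the
# ingress_lossless_pool size above (cell size again assumed to be 256 bytes).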

Total Egress buffer limits for Pool 0:

Shared: MMU_THDM_DB_POOL_SHARED_LIMIT = 0x1ed7c
Reserved: Q_MIN for lossless Queue 3,4 : 0

In this scenario, the total usage stats at the time of the drop are:
Ingress: (number of active PGs) * PG_MIN + shared count + headroom count = 0x1ED7E
Egress: total egress usage count = 0x1ED7D

Looking at the allocation above, it is clear that with fewer ingress ports the ingress cell usage decreases, because fewer per-PG min guarantees are consumed, so the total ingress stays below the total egress.
As the number of ingress ports increases, the ingress usage grows, the total ingress exceeds the total egress, and this results in egress queue drops, as the sketch below illustrates.
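
A small back-of-the-envelope sketch makes the port-count dependence concrete. It only reuses the limits quoted above and approximates the ingress ceiling as shared limit + headroom limit + (active PGs * PG_MIN); it is an illustration of the RCA, not an exact MMU model.

# Approximate ingress ceiling vs. the fixed egress shared limit (values from above).
INGRESS_SHARED   = 0x1CDC4    # THDI_BUFFER_CELL_LIMIT_SP
INGRESS_HEADROOM = 0x1F68     # THDI_HDRM_BUFFER_CELL_LIMIT_HP
PG_MIN           = 0x12       # min cells reserved per PG
EGRESS_SHARED    = 0x1ED7C    # MMU_THDM_DB_POOL_SHARED_LIMIT

def ingress_ceiling(active_pgs):
    return INGRESS_SHARED + INGRESS_HEADROOM + active_pgs * PG_MIN

for pgs in (2, 4, 8, 16, 80):
    print(pgs, ingress_ceiling(pgs), ingress_ceiling(pgs) > EGRESS_SHARED)
# With only a few active PGs the ingress ceiling stays below the egress shared
# limit, so ingress drops happen first (the expected result). With roughly 5 or
# more active PGs the ceiling exceeds the egress limit and egress drops appear first.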

Work item tracking
  • Microsoft ADO 28240256

How I did it

In BRCM CSP CS00012358392 "Egress lossless pool size update for Arista-7050CX3-32S-D48C8 DualToR", Broadcom updated the MMU configuration as follows:

Platform: Arista-7050CX3-32S-D48C8    Type: (none)    Config: DualTOR    Uplinks: 8    Downlinks: 24    Standby: 24

All Ports Up:
m THDI_BUFFER_CELL_LIMIT_SP(0) LIMIT=117246
m MMU_THDM_DB_POOL_SHARED_LIMIT(0) SHARED_LIMIT=126726
m MMU_THDM_DB_POOL_RESUME_LIMIT(0) RESUME_LIMIT=15831
m MMU_THDM_DB_POOL_SHARED_LIMIT(1) SHARED_LIMIT=92288
m MMU_THDM_DB_POOL_RESUME_LIMIT(1) RESUME_LIMIT=11527
m MMU_THDR_DB_CONFIG1_PRIQ SPID=1
for x=0,639,10 '\
    mod MMU_THDM_DB_QUEUE_CONFIG_PIPE0 $x 10 Q_SPID=1 ;\
    mod MMU_THDM_DB_QUEUE_CONFIG_PIPE1 $x 10 Q_SPID=1'

All Ports Down:
m THDI_BUFFER_CELL_LIMIT_SP(0) LIMIT=119694
m MMU_THDM_DB_POOL_SHARED_LIMIT(0) SHARED_LIMIT=127734
m MMU_THDM_DB_POOL_RESUME_LIMIT(0) RESUME_LIMIT=15957
m MMU_THDM_DB_POOL_SHARED_LIMIT(1) SHARED_LIMIT=95255
m MMU_THDM_DB_POOL_RESUME_LIMIT(1) RESUME_LIMIT=11897
m MMU_THDR_DB_CONFIG1_PRIQ SPID=1
for x=0,639,10 '\
    mod MMU_THDM_DB_QUEUE_CONFIG_PIPE0 $x 10 Q_SPID=1 ;\
    mod MMU_THDM_DB_QUEUE_CONFIG_PIPE1 $x 10 Q_SPID=1'

Notes:
### When there is a linkdown event on an in-use uplink port:
###     THDI_BUFFER_CELL_LIMIT_SP(0).LIMIT += 93
###     MMU_THDM_DB_POOL_SHARED_LIMIT(0).SHARED_LIMIT += 93
###     MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 11
###     MMU_THDM_DB_POOL_SHARED_LIMIT(1).SHARED_LIMIT += 74
###     MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 9
### When there is a linkdown event on an in-use downlink port:
###     THDI_BUFFER_CELL_LIMIT_SP(0).LIMIT += 71
###     MMU_THDM_DB_POOL_SHARED_LIMIT(0).SHARED_LIMIT += 71
###     MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 8
###     MMU_THDM_DB_POOL_SHARED_LIMIT(1).SHARED_LIMIT += 56
###     MMU_THDM_DB_POOL_RESUME_LIMIT(0).RESUME_LIMIT += 7

The egress buffer pool size relevant part of this update was then applied to the image repo, as below:

m THDI_BUFFER_CELL_LIMIT_SP(0) LIMIT=117246              
m MMU_THDM_DB_POOL_SHARED_LIMIT(0) SHARED_LIMIT=126726 
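
As a sanity check on the new values (and assuming the headroom limit and per-PG min from the register dump above stay unchanged), the new ingress ceiling lines up exactly with the new egress shared limit, so the egress pool can no longer run out before the ingress side reaches its own limits:

# New limits from the CSP table above, in cells.
new_ingress_shared = 117246      # THDI_BUFFER_CELL_LIMIT_SP(0) LIMIT
new_egress_shared  = 126726      # MMU_THDM_DB_POOL_SHARED_LIMIT(0) SHARED_LIMIT
headroom           = 0x1F68      # THDI_HDRM_BUFFER_CELL_LIMIT_HP (assumed unchanged)
pg_min_total       = 80 * 0x12   # 80 PGs x 0x12 cells (assumed unchanged)
print(new_ingress_shared + headroom + pg_min_total == new_egress_shared)  # True (126726 cells)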

How to verify it

  • Push the change to private branch "xuchen3/20231110.24/CS00012358392/Arista-7050CX3-32S-D48C8.dualtor" to build a private image.
$ git log -
* c363f5b1c8 (2024-10-30 23:12) - bugfix: CS00012358392 change ingerss/egress buffer size for Arista-7050CX3-32S-D48C8 dualtor, static_th <Xu Chen>
* 9c284f015c (2024-10-29 09:15) - bugfix : CS00012358392 change ingerss/egress buffer size for Arista-7050CX3-32S-D48C8 dualtor <Xu Chen>
* 7f855c8ae8 (2024-10-28 23:52) - CS00012358392 change ingerss/egress buffer size for Arista-7050CX3-32S-D48C8 dualtor <Xu Chen>
  • Then run the QoS SAI tests: all QoS SAI tests pass, including the headroom pool size test.
https://elastictest.org/scheduler/testplan/673e052ad3c216e9a194b719?testcase=qos%2ftest_qos_sai.py&type=console
  • Also run the full nightly test: no regression issues observed.
https://dev.azure.com/mssonic/internal/_build/results?buildId=718645&view=results

PS. Additional tests were also run to verify that the above changes only affect Arista-7050CX3-32S-D48C8 dualtor and do not impact other platforms.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage


Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@r12f r12f left a comment


Change looks very straightforward and with very thorough analysis. LGTM.

Contributor

@StormLiangMS StormLiangMS left a comment


LGTM, thanks for taking the effort to get this issue fixed.

@StormLiangMS StormLiangMS merged commit fe7ea66 into sonic-net:master Jan 8, 2025
22 checks passed
@mssonicbld
Collaborator

Cherry-pick PR to 202405: #21347

VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Jan 21, 2025