Fix the tlm_teamd deleting STATE_DB LAG_TABLE entry. #3333

Merged (3 commits), Oct 24, 2024

abdosi (Contributor) commented Oct 18, 2024

What I did:
Fixes:
sonic-net/sonic-buildimage#20059

Why I did:
On a T2 testbed with multiple backend port channels, we have sometimes seen the following sequence. The port channel is created successfully, its APP_DB entry exists, and its STATE_DB entry is populated. tlm_teamd obtains the teamdctl handle it needs to fetch the state dump from teamd. The dump may succeed in the first iteration but fail in the second (a transient failure while getting data through teamdctl), and that failure results in deletion of the STATE_DB entry, which is not correct. Instead, tlm_teamd should only clean up its local cache and wait for the retry that happens as part of the Select timeout cycle, where it tries to get the dump again.

2024 Aug 27 20:33:56.985209 sfd-t2-sup INFO teamd0#supervisord: teammgrd Using team device "PortChannel02".
2024 Aug 27 20:33:56.985209 sfd-t2-sup INFO teamd0#supervisord: teammgrd Using PID file "/var/run/teamd/PortChannel02.pid"
2024 Aug 27 20:33:56.985506 sfd-t2-sup INFO teamd0#supervisord: teammgrd This program is not intended to be run as root.
2024 Aug 27 20:33:58.743164 sfd-t2-sup NOTICE teamd0#teammgrd: :- addLag: Start port channel PortChannel02 with teamd
2024 Aug 27 20:33:58.987897 sfd-t2-sup INFO teamd0#supervisord 2024-08-27 20:33:58,987 INFO success: teamsyncd entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024 Aug 27 20:33:59.079507 sfd-t2-sup NOTICE teamd0#tlm_teamd: :- try_add_lag: The LAG 'PortChannel01' has been added.
2024 Aug 27 20:33:59.084771 sfd-t2-sup NOTICE teamd0#tlm_teamd: :- try_add_lag: The LAG 'PortChannel02' has been added.
2024 Aug 27 20:33:59.198619 sfd-t2-sup NOTICE teamd0#teammgrd: :- setLagAdminStatus: Set port channel PortChannel02 admin status to up
2024 Aug 27 20:33:59.363037 sfd-t2-sup NOTICE teamd0#teammgrd: :- setLagMtu: Set port channel PortChannel02 MTU to 9100
2024 Aug 27 20:33:59.363037 sfd-t2-sup NOTICE teamd0#teammgrd: :- setLagTpid: Set port channel PortChannel02 TPID to 0x8100
2024 Aug 27 20:33:59.363049 sfd-t2-sup NOTICE teamd0#teammgrd: :- doLagTask: Configure PortChannel02 TPID to 0x8100
2024 Aug 27 20:33:59.375011 sfd-t2-sup INFO teamd0#supervisord: teammgrd Using team device "PortChannel03".
2024 Aug 27 20:33:59.377051 sfd-t2-sup INFO teamd0#supervisord: teammgrd 
2024 Aug 27 20:33:59.377051 sfd-t2-sup INFO teamd0#supervisord: teammgrd Using PID file "/var/run/teamd/PortChannel03.pid"
2024 Aug 27 20:33:59.377051 sfd-t2-sup INFO teamd0#supervisord: teammgrd 
2024 Aug 27 20:33:59.377134 sfd-t2-sup INFO teamd0#supervisord: teammgrd This program is not intended to be run as root.
2024 Aug 27 20:33:59.378066 sfd-t2-sup INFO teamd0#supervisord: teammgrd 
2024 Aug 27 20:34:00.480451 sfd-t2-sup INFO teamd0#supervisord 2024-08-27 20:34:00,479 INFO exited: dependent-startup (exit status 0; expected)
2024 Aug 27 20:34:08.515997 sfd-t2-sup NOTICE teamd0#tlm_teamd: :- remove_lag: The LAG 'PortChannel02' has been removed.
2024 Aug 27 20:34:08.516023 sfd-t2-sup NOTICE teamd0#tlm_teamd: :- remove_lag: The LAG 'PortChannel02' had errored while getting dump, removing it
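
The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code changed in this PR: `LagPoller`, `FakeStateDb`, and `DumpFn` are hypothetical names, STATE_DB is modelled as an in-memory map, and the key format `LAG_TABLE|<name>` is assumed. The dump callback stands in for whatever teamdctl call the daemon really makes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for STATE_DB (not the swss table API).
using FakeStateDb = std::map<std::string, std::string>;

// Hypothetical dump callback: returns true and fills `out` on success,
// false on a (possibly transient) failure to fetch the dump.
using DumpFn = std::function<bool(const std::string& lag, std::string& out)>;

class LagPoller {
public:
    LagPoller(const FakeStateDb& db, DumpFn dump, std::vector<std::string> lags)
        : m_db(db), m_dump(std::move(dump)), m_lags(std::move(lags)) {
        assert(static_cast<bool>(m_dump));  // a dump callback is required
    }

    // One polling pass, meant to run on every select timeout.
    void poll_once() {
        for (const auto& lag : m_lags) {
            std::string dump;
            if (m_dump(lag, dump)) {
                m_cache[lag] = dump;  // refresh the local view
            } else {
                // Drop only the local cache. The STATE_DB entry is left
                // alone and the dump is retried on the next pass.
                m_cache.erase(lag);
            }
        }
    }

    bool cached(const std::string& lag) const {
        return m_cache.count(lag) != 0;
    }

    bool in_state_db(const std::string& lag) const {
        return m_db.count("LAG_TABLE|" + lag) != 0;
    }

private:
    const FakeStateDb& m_db;  // read-only: the poller never deletes from it
    DumpFn m_dump;
    std::vector<std::string> m_lags;
    std::map<std::string, std::string> m_cache;
};
```

Holding the database by const reference is a design choice of this sketch only: it makes it structurally impossible for a failed dump to remove an entry that another process owns.
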

How I verified:
Ran 20+ iterations of config reload and did not see the issue. Without the fix, the issue appears within 1 or 2 iterations.

@abdosi abdosi requested a review from judyjoseph as a code owner October 18, 2024 00:54
@abdosi abdosi changed the title Update values_store.cpp Fix the tlm_teamd deleting STATE_DB LAB_TABLE entry. Oct 18, 2024
abdosi (Contributor, Author) commented Oct 18, 2024

@anamehra for viz.

@abdosi abdosi changed the title Fix the tlm_teamd deleting STATE_DB LAB_TABLE entry. Fix the tlm_teamd deleting STATE_DB LAG_TABLE entry. Oct 18, 2024
anamehra commented

LGTM, ran config reloads overnight (~130 config reloads) on Sup and did not hit the issue.

abdosi (Contributor, Author) commented Oct 18, 2024

@judyjoseph: can you help review this?

yejianquan commented

/azp run

Commenter does not have sufficient privileges for PR 3333 in repo sonic-net/sonic-swss

yejianquan commented

@judyjoseph @abdosi can we merge this so that the change is included in the newest nightly test?

judyjoseph (Contributor) left a comment

LGTM

mssonicbld (Collaborator) commented

Cherry-pick PR to 202405: #3340

prsunny pushed a commit that referenced this pull request Oct 30, 2024
What I did:
Original issue and PR: #3333

Why I did:
Fix the failure in test_po_update.py as seen in this PR checker: sonic-net/sonic-buildimage#20610
The previous fix blocked deletion of STATE_DB LAG_MEMBER_TABLE entries, which is not correct because this table is created (and owned) by tlm_teamd.
The intention was only to prevent deletion of the STATE_DB LAG_TABLE, which is owned by teamsyncd.
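
The ownership rule in this follow-up can be illustrated with a small sketch. It is an assumption-laden model rather than the actual change: `StateDbTables`, `owned_by_tlm_teamd`, and `cleanup_lag` are hypothetical names, the database is an in-memory map of tables, and member keys are assumed to have the form `<lag>|<port>`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for STATE_DB: table name -> key -> value.
using StateDbTables =
    std::map<std::string, std::map<std::string, std::string>>;

// Per the commit message: tlm_teamd creates LAG_MEMBER_TABLE,
// while LAG_TABLE is owned by teamsyncd.
bool owned_by_tlm_teamd(const std::string& table) {
    return table == "LAG_MEMBER_TABLE";
}

// Remove the entries for `lag` only from tables that tlm_teamd owns.
// Returns the number of entries removed.
std::size_t cleanup_lag(StateDbTables& db, const std::string& lag) {
    assert(!lag.empty());
    const std::string prefix = lag + "|";  // assumed member key prefix
    std::size_t removed = 0;
    for (auto& kv : db) {
        if (!owned_by_tlm_teamd(kv.first)) {
            continue;  // never touch tables owned by another daemon
        }
        auto& entries = kv.second;
        for (auto it = entries.begin(); it != entries.end();) {
            if (it->first.compare(0, prefix.size(), prefix) == 0) {
                it = entries.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
    }
    return removed;
}
```

With this split, removing a LAG clears its member entries but leaves the `LAG_TABLE` entry for its owner to manage.
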
mssonicbld pushed a commit to mssonicbld/sonic-swss that referenced this pull request Oct 30, 2024
mssonicbld pushed a commit that referenced this pull request Oct 31, 2024
stepanblyschak pushed a commit to stepanblyschak/sonic-swss that referenced this pull request Nov 13, 2024
divyachandralekha pushed a commit to divyachandralekha/sonic-swss that referenced this pull request Dec 12, 2024
shiraez pushed a commit to Marvell-switching/sonic-swss that referenced this pull request Feb 17, 2025