Description
As reported in #5758, there is an issue with reference counting for next hop groups in orchagent. During route changes we often cannot delete next hop groups because other objects still reference them.
Steps to reproduce the issue:
Add and remove a large volume of routes (the route_perf test is good for this)
Describe the results you received:
Oct 30 16:21:50.210939 str-dx010-acs-4 ERR swss#orchagent: :- meta_generic_validation_remove: object 0x5000000000686 reference count is 2, can't remove
Oct 30 16:21:50.210939 str-dx010-acs-4 ERR swss#orchagent: :- removeNextHopGroup: Failed to remove next hop group 5000000000686, rv:-17
Describe the results you expected:
We should be able to add and remove routes without errors.
What I did
Remove next-hop groups after the reference counter has been updated for a whole bulk of routes, instead of removing them inside the loop that updates the reference counter.
Fix sonic-net/sonic-buildimage#5813
Why I did it
The bulk route API has two loops that update reference counters:
1. update the sairedis reference counter
2. update the orchagent reference counter
Before this commit, the removal of next-hop groups was triggered inside the second loop, as soon as the orchagent reference counter dropped to zero. This could produce a mismatch between orchagent and sairedis: the sairedis reference counter already reflects every operation in the bulk, while the orchagent reference counter does not yet. The removal of a next-hop group may therefore fail (e.g., other routes in the same bulk point to the next-hop group but have not been counted in orchagent yet).
To fix this problem, next-hop group removal should be done after the reference counters have been updated for the whole bulk, so that the reference counters in sairedis and orchagent match.
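The ordering problem can be reproduced with a toy model. The sketch below is not orchagent or sairedis code: `FakeSai`, `flush_bulk`, and the `("add" | "del", group)` operation format are hypothetical names invented for illustration, and the model assumes that a failed removal simply leaves the group in place. It only shows why removing a group inside the second loop can fail, and why deferring removal until the whole bulk is counted avoids that.

```python
class FakeSai:
    """Toy stand-in for the sairedis reference counting (hypothetical)."""

    def __init__(self, refs):
        self.refs = dict(refs)  # next-hop group id -> number of referencing routes

    def apply_bulk(self, ops):
        # Loop 1: the sairedis-side counters absorb the whole bulk at once.
        for op, nhg in ops:
            self.refs[nhg] = self.refs.get(nhg, 0) + (1 if op == "add" else -1)

    def remove_nhg(self, nhg):
        # Refuse removal while anything still references the group.
        if self.refs.get(nhg, 0) != 0:
            return False
        self.refs.pop(nhg, None)
        return True


def flush_bulk(sai, orch_refs, ops, defer_removal):
    """Apply one bulk of route operations; return groups whose removal failed."""
    sai.apply_bulk(ops)
    failed, candidates = [], []
    # Loop 2: the orchagent-side counters are updated one route at a time.
    for op, nhg in ops:
        orch_refs[nhg] = orch_refs.get(nhg, 0) + (1 if op == "add" else -1)
        if orch_refs[nhg] != 0:
            continue
        if defer_removal:
            if nhg not in candidates:
                candidates.append(nhg)   # decide later, once counts are final
        elif sai.remove_nhg(nhg):
            del orch_refs[nhg]
        else:
            failed.append(nhg)           # old behaviour: removal attempted too early
    # Deferred path: only remove groups that are still unreferenced at the end.
    for nhg in candidates:
        if orch_refs.get(nhg, 0) == 0:
            if sai.remove_nhg(nhg):
                del orch_refs[nhg]
            else:
                failed.append(nhg)
    return failed


# One route leaves group "g" and two other routes join it in the same bulk.
BULK = [("del", "g"), ("add", "g"), ("add", "g")]

print(flush_bulk(FakeSai({"g": 1}), {"g": 1}, BULK, defer_removal=False))  # ['g']
print(flush_bulk(FakeSai({"g": 1}), {"g": 1}, BULK, defer_removal=True))   # []
```

In the first call the model's sairedis-side count for `g` is already 2 when the orchagent-side count momentarily reaches zero, which corresponds to the "reference count is 2, can't remove" error above. In the second call the removal decision is made only after both counters agree.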
daall pushed a commit to daall/sonic-swss that referenced this issue Dec 7, 2020
…ic-net#1501)