orchagent crashed when adding the members and addresses from an existing portchannel to a newly created portchannel #17665
Comments
Do you have any delays between the above commands, or did you copy-paste the whole set? Could you try with a sleep of 3-4 seconds between step 4 and step 5 and share the result?
In which version did you encounter this problem? Please add a screenshot or the text output of "show version".
show ver
SONiC Software Version: SONiC.20220532.54
The fix will be in the sonic-mgmt test, as well as in swss to add some protection.
@prsunny, we hit the orchagent crash with the same signature when we removed the ports from the PortChannel and removed the IP and IPv6 addresses from the PortChannel. What I noticed in the log and code is that addNeighbor adds the remote system neighbor against the remote system port and increments the RIF reference count of the remote system port. However, addNextHop adds the next hop against the Inband port with the RIF ID of the remote system port, but increments the RIF reference count of the Inband port instead of the remote system port. When the neighbor is removed, removeNeighbor decrements the RIF reference count of the remote system port, and when the next hop is removed, removeNextHop also decrements the reference count of the remote system port. So the increment and the decrement for the next hop land on different ports. I think the SWSS PR sonic-net/sonic-swss#1686 made this change in addNextHop as part of MPLS support. Since the RIF ID of the remote system port is used for the next hop, shouldn't addNextHop increment the reference count of the remote system port? Please let me know your thoughts on this.
I see. Could you provide a fix PR?
Yes. We are testing the fix for both PortChannel scenarios, and once we confirm that the fix works, I will create a PR.
Created PR sonic-net/sonic-swss#3042
This is the same as #17204.
Description
orchagent crashed in test case test_po_update_io_no_loss[lc]::teardown. This issue is also reproducible manually.
The orchagent crash is caused by adding the members and addresses of the old PortChannel to the newly created PortChannel.
Please see the attached syslog.
syslog.txt
Jan 3 21:40:15.387019 ixre-egl-board3 NOTICE swss1#orchagent: :- removeRouterIntfs: Remove router interface for port PortChannel106
Jan 3 21:40:15.492690 ixre-egl-board3 NOTICE swss1#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.1@Ethernet-IB1,10.0.0.5@Ethernet-IB1,10.0.0.7@PortChannel999,10.0.0.11@Ethernet184
Jan 3 21:40:15.521978 ixre-egl-board3 NOTICE swss1#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.7@PortChannel999,10.0.0.11@Ethernet184
Jan 3 21:40:15.711079 ixre-egl-board3 NOTICE swss0#orchagent: :- removeNextHopGroup: Delete next hop group fc00::2@PortChannel102,fc00::a@Ethernet64,fc00::e@Ethernet-IB0,fc00::16@Ethernet-IB0
Jan 3 21:40:15.717072 ixre-egl-board3 NOTICE swss0#orchagent: :- removeNextHopGroup: Delete next hop group fc00::e@Ethernet-IB0,fc00::16@Ethernet-IB0
Jan 3 21:40:15.749977 ixre-egl-board3 NOTICE swss0#orchagent: :- addLag: Create an empty LAG ixre-egl-board3|asic1|PortChannel999 lid:2000000000a95
Jan 3 21:40:15.754730 ixre-egl-board3 NOTICE syncd0#syncd: :- removeRif: Trying to remove nonexisting router interface counter from Id 0x6000000000847
Jan 3 21:40:15.755829 ixre-egl-board3 ERR swss0#orchagent: :- meta_generic_validation_remove: object 0x6000000000847 reference count is 4, can't remove
Jan 3 21:40:15.756968 ixre-egl-board3 ERR swss0#orchagent: :- removeRouterIntfs: Failed to remove router interface for port ixre-egl-board3|asic1|PortChannel106, rv:-17
Jan 3 21:40:15.758146 ixre-egl-board3 ERR swss0#orchagent: :- handleSaiRemoveStatus: Encountered failure in remove operation, exiting orchagent, SAI API: SAI_API_ROUTER_INTERFACE, status: SAI_STATUS_OBJECT_IN_USE