[warmboot] OA RESTARTCHECK failure: warmRestartCheck busy on "ROUTE_TABLE:::/0|SET|nexthop" #7919

Closed
vaibhavhd opened this issue Jun 19, 2021 · 1 comment

Comments

@vaibhavhd
Contributor

Description

Warm reboot fails at RESTARTCHECK because warmRestartCheck remains stuck on the pending entry ROUTE_TABLE:::/0|SET|nexthop.

Steps to reproduce the issue:

  1. Install the latest master image.
  2. Run test_warm_reboot.
  3. Warm reboot fails with a RESTARTCHECK failure.

Describe the results you received:

Warm reboot failed with a RESTARTCHECK failure.

The syslog shows syncd returning SAI_STATUS_FAILURE while creating next hop group members (SAI_API_NEXT_HOP_GROUP), after the Broadcom SAI rejects a member weight of 32765:

Jun 19 04:26:47.865460 str2-7260cx3-acs-9 ERR syncd#syncd: [none] SAI_API_NEXT_HOP_GROUP:brcm_sai_create_next_hop_group_member:548 Invalid next hop member weight passed: 32765
Jun 19 04:26:47.865460 str2-7260cx3-acs-9 ERR syncd#syncd: [none] SAI_API_NEXT_HOP_GROUP:brcm_sai_create_next_hop_group_member:548 Invalid next hop member weight passed: 32765
Jun 19 04:26:47.865460 str2-7260cx3-acs-9 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun 19 04:26:47.865927 str2-7260cx3-acs-9 ERR swss#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun 19 04:26:47.866034 str2-7260cx3-acs-9 ERR swss#orchagent: :- addNextHopGroup: Failed to create next hop group 5000000001766 member 0: 0
Jun 19 04:26:47.867027 str2-7260cx3-acs-9 NOTICE swss#orchagent: :- addNextHopGroup: Create next hop group 10.0.0.35@PortChannel0002,10.0.0.37@PortChannel0003


Jun 19 04:26:47.889436 str2-7260cx3-acs-9 ERR syncd#syncd: [none] SAI_API_NEXT_HOP_GROUP:brcm_sai_create_next_hop_group_member:548 Invalid next hop member weight passed: 32765
Jun 19 04:26:47.889648 str2-7260cx3-acs-9 ERR syncd#syncd: [none] SAI_API_NEXT_HOP_GROUP:brcm_sai_create_next_hop_group_member:548 Invalid next hop member weight passed: 32765
Jun 19 04:26:47.889755 str2-7260cx3-acs-9 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_BULK_CREATE failed in syncd mode: SAI_STATUS_FAILURE
Jun 19 04:26:47.891681 str2-7260cx3-acs-9 ERR swss#orchagent: :- flush_creating_entries: ObjectBulker.flush create entries failed, number of entries to create: 2, status: SAI_STATUS_FAILURE
Jun 19 04:26:47.891830 str2-7260cx3-acs-9 ERR swss#orchagent: :- addNextHopGroup: Failed to create next hop group 5000000001769 member 0: 0

This seems to have ultimately led to orchagent RESTARTCHECK failure:

Jun 19 01:42:08.505527 str2-7260cx3-acs-9 NOTICE swss#orchagent: :- warmRestartCheck:     ROUTE_TABLE:20c1:f8::/64|SET|nexthop:fc00::22,fc00::26,fc00::2a,fc00::2e|ifname:PortChannel0001,PortChannel0002,PortChannel0003,PortChannel0004
Jun 19 01:42:08.505527 str2-7260cx3-acs-9 NOTICE swss#orchagent: :- warmRestartCheck:     ROUTE_TABLE:::/0|SET|nexthop:fc00::22,fc00::26,fc00::2a,fc00::2e|ifname:PortChannel0001,PortChannel0002,PortChannel0003,PortChannel0004
Jun 19 01:42:08.505527 str2-7260cx3-acs-9 NOTICE swss#orchagent: :- warmRestartCheck: Restart check result: NOT_READY
Jun 19 01:42:08.505527 str2-7260cx3-acs-9 NOTICE swss#orchagent_restart_check: :- main: RESTARTCHECK failed, orchagent is not ready for warm restart with status NOT_READY
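
For reading the pending entries that warmRestartCheck prints, the following C++ sketch splits one entry into table, key, operation, and fields. It assumes the layout visible in the log lines above (`TABLE:key|OP|field:value|field:value`); `PendingTask` and `parsePendingTask` are hypothetical names for illustration and are not part of orchagent.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical container for one pending entry printed by warmRestartCheck.
struct PendingTask {
    std::string table;                          // e.g. "ROUTE_TABLE"
    std::string key;                            // e.g. "::/0"
    std::string op;                             // e.g. "SET"
    std::map<std::string, std::string> fields;  // e.g. "nexthop" -> "fc00::22,..."
};

// Parse "TABLE:key|OP|field:value|..." (layout assumed from the syslog above).
// Returns false if the entry does not match that layout.
bool parsePendingTask(const std::string& entry, PendingTask& out) {
    // Split on '|' first; IPv6 keys and values contain ':' but never '|'.
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t bar = entry.find('|', start);
        if (bar == std::string::npos) {
            parts.push_back(entry.substr(start));
            break;
        }
        parts.push_back(entry.substr(start, bar - start));
        start = bar + 1;
    }
    if (parts.size() < 2) return false;

    // The table name ends at the FIRST ':'; the rest is the key,
    // so "ROUTE_TABLE:::/0" yields table "ROUTE_TABLE" and key "::/0".
    const std::size_t colon = parts[0].find(':');
    if (colon == std::string::npos) return false;
    out.table = parts[0].substr(0, colon);
    out.key = parts[0].substr(colon + 1);
    out.op = parts[1];

    // Each remaining part is "name:value", split at the first ':' only.
    out.fields.clear();
    for (std::size_t i = 2; i < parts.size(); ++i) {
        const std::size_t c = parts[i].find(':');
        if (c == std::string::npos) return false;
        out.fields[parts[i].substr(0, c)] = parts[i].substr(c + 1);
    }
    return true;
}
```

Applied to the stuck entry from this issue, the sketch reports table `ROUTE_TABLE`, key `::/0` (the IPv6 default route), and operation `SET`, with the next hops and interface names as separate fields.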

Describe the results you expected:

Warm reboot should complete without errors.

Output of show version:

master.20065-b25962487


lguohan pushed a commit to sonic-net/sonic-swss that referenced this issue Jun 23, 2021
…" (#1798)

This reverts commit a44e651.

Proper weight initialization is needed when weights are not provided:
https://github.com/Azure/sonic-swss/blob/a44e6513556cfed7de1b70690db90cc09fcef666/orchagent/routeorch.cpp#L638

Warmboot is failing on various platforms: sonic-net/sonic-buildimage#7919

The original PR needs some negative unit testcases when the weight is not present for the nexthop group.
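
The revert message above calls for proper weight initialization when weights are not provided. A minimal C++ sketch of that idea follows. It is not the sonic-swss implementation: `splitList`, `resolveWeights`, and `kDefaultWeight` are hypothetical names, and falling back to a weight of 1 for every member is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Assumed fallback weight; the issue does not state which default is correct.
constexpr std::uint32_t kDefaultWeight = 1;

// Hypothetical helper: split a comma-separated list; an empty string gives an empty list.
std::vector<std::string> splitList(const std::string& s, char delim = ',') {
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) out.push_back(item);
    return out;
}

// Return one weight per next hop. Every slot starts at kDefaultWeight, so no
// member is ever left with an uninitialized value. Supplied weights are used
// only when their count matches the next hop count; a single entry that is
// missing, non-numeric, zero, or out of range keeps the default.
std::vector<std::uint32_t> resolveWeights(const std::string& nexthops,
                                          const std::string& weights) {
    const std::vector<std::string> nh = splitList(nexthops);
    const std::vector<std::string> w = splitList(weights);

    std::vector<std::uint32_t> result(nh.size(), kDefaultWeight);
    if (w.size() != nh.size()) return result;  // absent or mismatched weights

    for (std::size_t i = 0; i < w.size(); ++i) {
        const bool digitsOnly = !w[i].empty() &&
            std::all_of(w[i].begin(), w[i].end(), [](char c) {
                return std::isdigit(static_cast<unsigned char>(c)) != 0;
            });
        if (!digitsOnly) continue;
        try {
            const unsigned long v = std::stoul(w[i]);
            if (v > 0 && v <= std::numeric_limits<std::uint32_t>::max()) {
                result[i] = static_cast<std::uint32_t>(v);
            }
        } catch (const std::out_of_range&) {
            // Value too large to parse: keep the default for this member.
        }
    }
    return result;
}
```

With this sketch, `resolveWeights("10.0.0.35,10.0.0.37", "")` yields a weight of 1 for each of the two next hops, which is the "weight not present" case the revert message says needs negative test coverage.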
@vaibhavhd
Contributor Author

This issue is not seen on latest master.

raphaelt-nvidia pushed a commit to raphaelt-nvidia/sonic-swss that referenced this issue Oct 5, 2021
…-net#1752)" (sonic-net#1798)
