Starting with master branch build 353, orchagent crashes on BRCM DUTs.
The syslog shows the following:
...
Jul 26 22:32:45.980049 str-s6000-acs-8 NOTICE syncd#syncd: :- helperSaveDiscoveredObjectsToRedis: save discovered objects to redis took 2.731556 sec
Jul 26 22:32:45.984351 str-s6000-acs-8 WARNING syncd#syncd: :- helperGetSwitchAttrOid: failed to get SAI_SWITCH_ATTR_ECMP_HASH: SAI_STATUS_NOT_SUPPORTED
Jul 26 22:32:45.984351 str-s6000-acs-8 WARNING syncd#syncd: :- helperGetSwitchAttrOid: failed to get SAI_SWITCH_ATTR_LAG_HASH: SAI_STATUS_NOT_SUPPORTED
Jul 26 22:32:45.984351 str-s6000-acs-8 ERR syncd#syncd: :- getSwitchType: failed to get switch type
Jul 26 22:32:45.984351 str-s6000-acs-8 NOTICE syncd#syncd: :- SaiSwitch: constructor took 3.042757 sec
Jul 26 22:32:45.987625 str-s6000-acs-8 ERR syncd#syncd: :- run: Runtime error: :- getSwitchType: failed to get switch type
Jul 26 22:32:45.987625 str-s6000-acs-8 NOTICE syncd#syncd: :- sendShutdownRequest: sending switch_shutdown_request notification to OA for switch: oid:0x0
Jul 26 22:32:45.988511 str-s6000-acs-8 NOTICE syncd#syncd: :- sendShutdownRequestAfterException: notification send successfull
...
This leads to the orchagent crash.
GuoHan suspected that this crash was introduced by:
sonic-sairedis: Add support to sonic-sairedis for gearbox phys (#632)
(sonic-net/sonic-sairedis#632)
I tried commenting out the code that causes this issue in sai_switch_type_t SaiSwitch::getSwitchType(). On the BRCM DUT, the attempt to get the switch type fails (return status -3, NO MEMORY), which triggers SWSS_LOG_THROW("failed to get switch type") and leads to the orchagent crash. Instead of throwing, I made the function default to returning SAI_SWITCH_TYPE_NPU. This resolved the orchagent crash, but it exposed the next layer of issues: syncd now continuously logs errors like the following:
Jul 25 20:48:33.344512 str-s6000-acs-8 ERR syncd#syncd: :- guard: RedisReply catches system_error: command: *8#015#012$7#015#012EVALSHA#015#012$40#015#01231fc701ca9b1b9f968f501c92b639f50f6346a9c#015#012$1#015#0121#015#012$19#015#012oid:0x1000000000008#015#012$1#015#0122#015#012$8#015#012COUNTERS#015#012$7#015#0121000000#015#012$2#015#012''#15#012, reason: ERR Error running script (call to f_31fc701ca9b1b9f968f501c92b639f50f6346a9c): @user_script:21: user_script:21: attempt to perform arithmetic on local 'alpha' (a boolean value) : Input/output error
Jul 25 20:48:33.345213 str-s6000-acs-8 ERR syncd#syncd: :- runRedisScript: Caught exception while running Redis lua script: ERR Error running script (call to f_31fc701ca9b1b9f968f501c92b639f50f6346a9c): @user_script:21: user_script:21: attempt to perform arithmetic on local 'alpha' (a boolean value) : Input/output error
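For illustration, the workaround described above (falling back to SAI_SWITCH_TYPE_NPU instead of throwing) can be sketched roughly as follows. This is a self-contained sketch, not the actual sonic-sairedis code: the type definitions are simplified stand-ins for the SAI headers, and getSwitchTypeOrDefault and SwitchTypeQuery are hypothetical names introduced only for this example.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>

// Simplified stand-ins for the SAI definitions (assumption: the real ones
// come from the SAI headers and contain more detail than shown here).
enum sai_switch_type_t { SAI_SWITCH_TYPE_NPU, SAI_SWITCH_TYPE_PHY };
using sai_status_t = int32_t;
constexpr sai_status_t SAI_STATUS_SUCCESS = 0;
constexpr sai_status_t SAI_STATUS_NO_MEMORY = -3; // status reported on the BRCM DUT

// Hypothetical callback standing in for the vendor SAI "get switch type" query.
using SwitchTypeQuery = std::function<sai_status_t(sai_switch_type_t&)>;

// Hypothetical helper: ask the SAI for the switch type, but fall back to NPU
// with a warning instead of throwing when the query fails.
sai_switch_type_t getSwitchTypeOrDefault(const SwitchTypeQuery& query)
{
    sai_switch_type_t type = SAI_SWITCH_TYPE_NPU;
    const sai_status_t status = query(type);

    if (status != SAI_STATUS_SUCCESS)
    {
        std::cerr << "WARNING: failed to get switch type (status " << status
                  << "), assuming SAI_SWITCH_TYPE_NPU" << std::endl;
        return SAI_SWITCH_TYPE_NPU;
    }

    return type;
}
```

With a query that returns SAI_STATUS_NO_MEMORY, the helper logs a warning and returns SAI_SWITCH_TYPE_NPU; with a successful query, it returns whatever type the SAI reported. As noted above, this only avoids the throw; it does not address the Lua script errors that appear afterwards.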
BUG REPORT INFORMATION
Steps to reproduce the issue:
Load any master branch build, starting with image 353, on any BRCM DUT.
Once it boots up, you will see that orchagent has crashed; the syslog will show the first error reported above.
Master.350 was the last good build that did not suffer from this issue. Unfortunately, master.351 and master.352 both had build issues, so we don't have their test results.
Output of show version:
admin@str-s6000-acs-8:~$ show version
SONiC Software Version: SONiC.master.353-b3ae7858
Distribution: Debian 10.4
Kernel: 4.19.0-9-2-amd64
Build commit: b3ae785
Build date: Sun Jul 19 13:27:43 UTC 2020
Built by: johnar@jenkins-worker-7
Platform: x86_64-dell_s6000_s1220-r0
HwSKU: Force10-S6000
ASIC: broadcom
Serial Number: 1QBRX42
Uptime: 07:05:06 up 38 min, 1 user, load average: 0.15, 0.10, 0.35
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-teamd latest be843e740dd2 380MB
docker-teamd master.353-b3ae7858 be843e740dd2 380MB
docker-nat latest d7faaa0c4d23 382MB
docker-nat master.353-b3ae7858 d7faaa0c4d23 382MB
docker-router-advertiser latest f5d8a5da5e15 350MB
docker-router-advertiser master.353-b3ae7858 f5d8a5da5e15 350MB
docker-platform-monitor latest 844697e3e64a 422MB
docker-platform-monitor master.353-b3ae7858 844697e3e64a 422MB
docker-database latest f63204dd0f61 351MB
docker-database master.353-b3ae7858 f63204dd0f61 351MB
docker-lldp latest ebfc33f71809 377MB
docker-lldp master.353-b3ae7858 ebfc33f71809 377MB
docker-orchagent latest d8c14690e66c 393MB
docker-orchagent master.353-b3ae7858 d8c14690e66c 393MB
docker-sonic-telemetry latest 8764f9533b8d 414MB
docker-sonic-telemetry master.353-b3ae7858 8764f9533b8d 414MB
docker-snmp latest 8605e02d8f8b 390MB
docker-snmp master.353-b3ae7858 8605e02d8f8b 390MB
docker-dhcp-relay latest 870bed6def46 357MB
docker-dhcp-relay master.353-b3ae7858 870bed6def46 357MB
docker-sonic-mgmt-framework latest a1cf99e915fe 473MB
docker-sonic-mgmt-framework master.353-b3ae7858 a1cf99e915fe 473MB
docker-sflow latest 577f5c5eed53 383MB
docker-sflow master.353-b3ae7858 577f5c5eed53 383MB
docker-syncd-brcm latest 21dd40c5d954 447MB
docker-syncd-brcm master.353-b3ae7858 21dd40c5d954 447MB
docker-fpm-frr latest 4e1bfd2f9ab7 340MB
docker-fpm-frr master.353-b3ae7858 4e1bfd2f9ab7 340MB
get_switch_type_failure_casue_orchagent_crash_syslog.txt