[tlm_teamd] tlm_teamd crashed when running pc/test_po_update.py #5491

Closed
bingwang-ms opened this issue Sep 29, 2020 · 4 comments
Labels: Master Branch Quality, P0 (Priority of the issue)

Comments

@bingwang-ms
Contributor

Description
tlm_teamd crashed while running pc/test_po_update.py. The issue exists on both master.415.d12e9cbb and master.425-4006ce71.

Sep 29 06:47:34.697079 str-dx010-acs-4 INFO ansible-command: Invoked with creates=None executable=None _uses_shell=True strip_empty_ends=True _raw_params=config portchannel add PortChannel999 removes=None argv=None warn=True chdir=None stdin_add_newline=True stdin=None
Sep 29 06:47:35.459077 str-dx010-acs-4 INFO teamd#supervisord: teammgrd Using team device "PortChannel999".
Sep 29 06:47:35.459077 str-dx010-acs-4 INFO teamd#supervisord: teammgrd Using PID file "/var/run/teamd/PortChannel999.pid"
Sep 29 06:47:35.459077 str-dx010-acs-4 INFO teamd#supervisord: teammgrd This program is not intended to be run as root.
Sep 29 06:47:35.465222 str-dx010-acs-4 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:PortChannel999 admin:0 oper:0 addr:06:be:b1:9f:55:07 ifindex:86 master:0 type:team
Sep 29 06:47:35.469433 str-dx010-acs-4 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:PortChannel999 admin:0 oper:0 addr:06:be:b1:9f:55:07 ifindex:86 master:0 type:team
Sep 29 06:47:35.470424 str-dx010-acs-4 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:PortChannel999 admin:0 oper:0 addr:00:e0:ec:c2:af:7a ifindex:86 master:0 type:team
Sep 29 06:47:35.470424 str-dx010-acs-4 NOTICE swss#orchagent: message repeated 12955 times: [ :- set: setting attribute 0x10000004 status: SAI_STATUS_SUCCESS]
Sep 29 06:47:35.470424 str-dx010-acs-4 NOTICE swss#orchagent: :- addLag: Create an empty LAG PortChannel999 lid:20000000006c3
Sep 29 06:47:35.470424 str-dx010-acs-4 NOTICE swss#orchagent: :- set: setting attribute 0x10000004 status: SAI_STATUS_SUCCESS
Sep 29 06:47:35.473713 str-dx010-acs-4 INFO systemd-udevd[15061]: Using default interface naming scheme 'v240'.
Sep 29 06:47:35.475667 str-dx010-acs-4 INFO kernel: [ 1489.698641] PortChannel999: Mode changed to "loadbalance"
Sep 29 06:47:35.480004 str-dx010-acs-4 INFO systemd-udevd[15061]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 29 06:47:35.491621 str-dx010-acs-4 NOTICE teamd#teammgrd: :- addLag: Start port channel PortChannel999 with teamd
Sep 29 06:47:35.496040 str-dx010-acs-4 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:PortChannel999 admin:1 oper:0 addr:00:e0:ec:c2:af:7a ifindex:86 master:0 type:team
Sep 29 06:47:35.509621 str-dx010-acs-4 INFO kernel: [ 1489.729665] IPv6: ADDRCONF(NETDEV_UP): PortChannel999: link is not ready
Sep 29 06:47:35.509648 str-dx010-acs-4 INFO kernel: [ 1489.729677] 8021q: adding VLAN 0 to HW filter on device PortChannel999
Sep 29 06:47:35.509690 str-dx010-acs-4 NOTICE teamd#teammgrd: :- setLagAdminStatus: Set port channel PortChannel999 admin status to up
Sep 29 06:47:35.535819 str-dx010-acs-4 NOTICE teamd#teammgrd: :- setLagMtu: Set port channel PortChannel999 MTU to 9100
Sep 29 06:47:35.912274 str-dx010-acs-4 INFO ansible-command: Invoked with creates=None executable=None _uses_shell=True strip_empty_ends=True _raw_params=config portchannel member add PortChannel999 Ethernet112 removes=None argv=None warn=True chdir=None stdin_add_newline=True stdin=None
Sep 29 06:47:36.476729 str-dx010-acs-4 INFO kernel: [ 1490.699255] tlm_teamd[20296]: segfault at 0 ip 00005557c1db0b58 sp 00007fff5cb96d80 error 4 in tlm_teamd[5557c1dad000+9000]
Sep 29 06:47:36.476772 str-dx010-acs-4 INFO kernel: [ 1490.699269] Code: 8b 74 24 20 4c 89 e7 48 8b 54 24 28 89 44 24 40 49 8d 44 24 10 48 01 f2 48 89 04 24 e8 e1 09 00 00 48 8b 34 24 49 8d 44 24 10 <48> 8b 0b 48 8b 54 24 08 48 39 c6 0f 85 37 ff ff ff 48 85 d2 74 16
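
The kernel segfault record above can be unpacked mechanically. The following is a minimal sketch, not part of the original report: `SEGFAULT_RE` and `decode_segfault` are hypothetical helper names, and the bit meanings assume the standard x86 page-fault error code (bit 0 = page present, bit 1 = write access, bit 2 = user mode). It computes the offset of the faulting instruction from the start of the mapping printed by the kernel and decodes `error 4`.

```python
import re

# Sample record copied from the syslog excerpt in this issue.
LINE = ("tlm_teamd[20296]: segfault at 0 ip 00005557c1db0b58 "
        "sp 00007fff5cb96d80 error 4 in tlm_teamd[5557c1dad000+9000]")

# All numeric fields in this record are hexadecimal except the PID.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Parse one kernel segfault record into a dict of decoded fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("no segfault record found")
    addr = int(m["addr"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "process": m["proc"],
        "pid": int(m["pid"]),
        "fault_address": addr,
        "null_access": addr == 0,           # fault at address 0
        "offset": ip - base,                # ip relative to the printed mapping
        "page_present": bool(err & 0x1),    # 0 means the page was not mapped
        "access": "write" if err & 0x2 else "read",
        "user_mode": bool(err & 0x4),
    }

info = decode_segfault(LINE)
print(hex(info["offset"]), info["access"], info["null_access"])
```

For this record the result is a user-mode read of an unmapped page at address 0, which is consistent with a NULL pointer dereference. The offset is relative to the mapping start shown in the log, which may differ from the image load base, so it should be treated as a starting point when looking up the faulting function in a binary with symbols.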

Steps to reproduce the issue:

  1. Run test script pc/test_po_update.py
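
The two `ansible-command` entries in the syslog excerpt show which CLI calls immediately preceded the segfault. The sketch below simply replays them; `REPRO_COMMANDS` and `run_repro` are hypothetical names, and it is an assumption (not confirmed in this issue) that these two commands alone trigger the crash outside the test script.

```python
import shlex
import subprocess

# Copied verbatim from the _raw_params fields in the syslog excerpt.
REPRO_COMMANDS = [
    "config portchannel add PortChannel999",
    "config portchannel member add PortChannel999 Ethernet112",
]

def run_repro(dry_run=True):
    """Return the argv lists; execute them only when dry_run is False."""
    argvs = [shlex.split(cmd) for cmd in REPRO_COMMANDS]
    if not dry_run:
        for argv in argvs:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
    return argvs

for argv in run_repro():
    print(" ".join(argv))
```

Calling `run_repro(dry_run=False)` on the device would execute the commands and may require elevated privileges; the kernel log can then be checked for a new `tlm_teamd` segfault record.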

Describe the results you received:
tlm_teamd crashed.

Describe the results you expected:
The test case passes without error and without any process crash.

Additional information you deem important (e.g. issue happens only occasionally):

**Output of `show version`:**
SONiC Software Version: SONiC.master.425-4006ce71
Distribution: Debian 10.6
Kernel: 4.19.0-9-2-amd64
Build commit: 4006ce71
Build date: Sun Sep 27 07:52:44 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-cel_seastone-r0
HwSKU: Celestica-DX010-C32
ASIC: broadcom
Serial Number: DX010F2B118711MS100007
Uptime: 06:32:40 up 9 min,  1 user,  load average: 2.58, 2.45, 1.38

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-teamd                  latest                20617ff1849b        391MB
docker-teamd                  master.425-4006ce71   20617ff1849b        391MB
docker-sonic-mgmt-framework   latest                04e712b877e4        486MB
docker-sonic-mgmt-framework   master.425-4006ce71   04e712b877e4        486MB
docker-router-advertiser      latest                793cb0e5c2fa        359MB
docker-router-advertiser      master.425-4006ce71   793cb0e5c2fa        359MB
docker-platform-monitor       latest                f867f6e00a19        434MB
docker-platform-monitor       master.425-4006ce71   f867f6e00a19        434MB
docker-lldp                   latest                2ec74daaf081        388MB
docker-lldp                   master.425-4006ce71   2ec74daaf081        388MB
docker-dhcp-relay             latest                56b4112e9427        366MB
docker-dhcp-relay             master.425-4006ce71   56b4112e9427        366MB
docker-database               latest                32598ba6c32e        359MB
docker-database               master.425-4006ce71   32598ba6c32e        359MB
docker-orchagent              latest                bd1d90cb1a3f        405MB
docker-orchagent              master.425-4006ce71   bd1d90cb1a3f        405MB
docker-nat                    latest                de766da20285        394MB
docker-nat                    master.425-4006ce71   de766da20285        394MB
docker-sonic-telemetry        latest                3d34711e992d        429MB
docker-sonic-telemetry        master.425-4006ce71   3d34711e992d        429MB
docker-fpm-frr                latest                b79fad0eb74e        407MB
docker-fpm-frr                master.425-4006ce71   b79fad0eb74e        407MB
docker-sflow                  latest                418e8407b585        395MB
docker-sflow                  master.425-4006ce71   418e8407b585        395MB
docker-snmp                   latest                075ee0d414ab        399MB
docker-snmp                   master.425-4006ce71   075ee0d414ab        399MB
docker-syncd-brcm             latest                b93785e0dca0        447MB
docker-syncd-brcm             master.425-4006ce71   b93785e0dca0        447MB
**Attach debug file `sudo generate_dump`:**

```
(paste your output here)
```
@akokhan
Contributor

akokhan commented Sep 29, 2020

I think this duplicates #5306

@lguohan
Collaborator

lguohan commented Oct 2, 2020

@abdosi, is this fixed in #5489?

lguohan assigned abdosi and unassigned lguohan on Oct 2, 2020
daall added the P1 label (Priority of the issue, lower than P0) on Oct 7, 2020
daall added the P0 label (Priority of the issue) and removed the P1 label on Oct 20, 2020
@daall
Contributor

daall commented Oct 22, 2020

It looks like this can also crash when config reload is run; see logs:

syslog.txt
teamd.log

daall assigned pavel-shirshov and unassigned abdosi on Oct 22, 2020
@pavel-shirshov
Contributor

Duplicate of #5306.
