On single-asic and multi-asic sonic devices, after sonic-installer install an image that does not exist, executing any commands generate backtrace #10135

Closed
harish-kalyanaraman opened this issue Mar 2, 2022 · 7 comments · Fixed by sonic-net/sonic-utilities#2179
Assignees
Labels
Chassis 🤖 Modular chassis support Triaged this issue has been triaged

Comments

@harish-kalyanaraman

Description

On a multi-asic DUT, after we issue a sonic-installer install command, every SONiC command hangs for a long time, then produces a backtrace and exits without executing.

A 'sudo reboot' command does reboot the DUT, but only after the same long delay, once the exception has been thrown.

Steps to reproduce the issue:

  1. Run a show command like 'show feature status'
  2. Run sonic-installer install to install a new image that doesn't exist on the DUT.
  3. Run the show command again.

Describe the results you received:

admin@sonic:~$ show feature status
Feature         State            AutoRestart     SetOwner
--------------  ---------------  --------------  ----------
bgp             enabled          enabled
database        always_enabled   always_enabled
dhcp_relay      enabled          enabled         local
lldp            enabled          enabled
macsec          disabled         enabled
mgmt-framework  enabled          enabled
mux             always_disabled  enabled
nat             disabled         enabled
pmon            enabled          enabled
radv            enabled          enabled
sflow           disabled         enabled
snmp            enabled          enabled
swss            enabled          enabled
syncd           enabled          enabled
teamd           enabled          enabled
telemetry       enabled          enabled
admin@sonic:~$ 
admin@sonic:~$ sudo sonic-installer install /tmp/sonic-broadcom-dnx.bin -y
Installing image SONiC-OS-HEAD.230935-dirty-20220301.182109 and setting it as default...
Command: bash /tmp/sonic_image
Verifying image checksum ... OK.
Preparing image archive ... OK.
Installing SONiC in SONiC
ONIE Installer: platform: x86_64-broadcom-dnx-r0
onie_platform: x86_64-nokia_ixr7250e_sup-r0
Removing old SONiC installation /host/image-HEAD.231265-dirty-20220302.011117
Installing SONiC to /host/image-HEAD.230935-dirty-20220301.182109
.
.
.

admin@sonic:~$ show feature status
Traceback (most recent call last):
  File "/usr/local/bin/show", line 5, in <module>
    from show.main import cli
  File "/usr/local/lib/python3.9/dist-packages/show/main.py", line 8, in <module>
    import utilities_common.cli as clicommon
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/cli.py", line 189, in <module>
    iface_alias_converter = InterfaceAliasConverter()
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/cli.py", line 126, in __init__
    self.port_dict = multi_asic.get_port_table()
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 295, in get_port_table
    ports = get_port_table_for_asic(ns)
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 309, in get_port_table_for_asic
    config_db = connect_config_db_for_ns(namespace)
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/multi_asic.py", line 43, in connect_config_db_for_ns
    config_db.connect()
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1829, in connect
    return _swsscommon.ConfigDBConnector_Native_connect(self, wait_for_init, retry_on)
RuntimeError: Unable to connect to redis: Cannot assign requested address
admin@sonic:~$

Describe the results you expected:

No backtrace should appear, and the commands should execute normally.

Output of show version:

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@rlhui rlhui changed the title On multi-asic DUT, after sonic-installer install, executing any commands generate backtrace On multi-asic DUT, after sonic-installer install an image that does not exist, executing any commands generate backtrace Mar 13, 2022
@prsunny prsunny added the Triaged this issue has been triaged label Apr 13, 2022
@abdosi
Contributor

abdosi commented Apr 13, 2022

cc @anamehra

@abdosi
Contributor

abdosi commented Apr 19, 2022

The issue happens during sonic-installer install: as part of package migration, we start and stop the docker service in the new image's mount point. This seems to change the docker0 IP address, which breaks namespace connectivity over the docker bridge.

Before Installation

admin@xxxx:~$ sudo ifconfig docker0                                                                         
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500                                                          
        inet 240.127.1.1  netmask 255.255.255.0  broadcast 240.127.1.255                                                
        inet6 fd00::1  prefixlen 80  scopeid 0x0<global>                                                                
        inet6 fe80::42:dbff:fee6:2a12  prefixlen 64  scopeid 0x20<link>                                                 
        inet6 fe80::1  prefixlen 64  scopeid 0x20<link>                                                                 
        ether 02:42:db:e6:2a:12  txqueuelen 0  (Ethernet)                                                                                           
        RX packets 130394042  bytes 30899133518 (28.7 GiB)                                                                 
        RX errors 0  dropped 0  overruns 0  frame 0                                                                        
        TX packets 134037465  bytes 11686910235 (10.8 GiB)                                                                 
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0         

After Installation

admin@str2-xxxx:/$ sudo ifconfig docker0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.18.0.1  netmask 255.255.0.0  broadcast 172.18.255.255
        inet6 fe80::42:f3ff:fe60:5e1b  prefixlen 64  scopeid 0x20<link>
        inet6 fe80::1  prefixlen 64  scopeid 0x20<link>
        ether 02:42:f3:60:5e:1b  txqueuelen 0  (Ethernet)
        RX packets 20839928  bytes 4964254259 (4.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 20338719  bytes 1777167786 (1.6 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
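
The before/after comparison above can be turned into a quick check. The following is a minimal sketch, assuming the expected bridge subnet is the 240.127.1.0/24 network shown in the "Before Installation" output; the function names are hypothetical and not part of sonic-utilities.

```python
import ipaddress
import re

# Assumption: the bridge subnet seen before installation in this thread.
EXPECTED_BRIDGE_NET = ipaddress.ip_network("240.127.1.0/24")


def docker0_ipv4(ifconfig_output):
    """Return the IPv4 address and netmask found in `ifconfig docker0` output, or None."""
    # "inet " (with a trailing space) does not match the "inet6 " lines.
    match = re.search(
        r"inet (\d+\.\d+\.\d+\.\d+)\s+netmask (\d+\.\d+\.\d+\.\d+)",
        ifconfig_output,
    )
    if match is None:
        return None
    return ipaddress.ip_interface(f"{match.group(1)}/{match.group(2)}")


def bridge_is_intact(ifconfig_output, expected=EXPECTED_BRIDGE_NET):
    """True if docker0 still sits on the expected subnet."""
    iface = docker0_ipv4(ifconfig_output)
    return iface is not None and iface.network == expected
```

Fed the two outputs above, `bridge_is_intact` would accept the 240.127.1.1 address and reject 172.18.0.1, which is on 172.18.0.0/16.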

@abdosi
Contributor

abdosi commented Apr 19, 2022

@stepanblyschak please take a look into this.

@abdosi abdosi removed their assignment Apr 19, 2022
@abdosi
Contributor

abdosi commented Apr 19, 2022

Also, after installation I see iptables rules created by docker. Basically, all the options that the SONiC image configures for the docker service are ignored when dockerd is run again from a different mount point.

/usr/bin/dockerd -H unix:// --storage-driver=overlay2 --bip=240.127.1.1/24 --iptables=false --ipv6=true --fixed-cidr-v6=fd00::/80

admin@xxxx:/$ sudo systemctl status docker.service
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/docker.service.d
             └─docker.service.conf
     Active: active (running) since Tue 2022-04-05 22:34:07 UTC; 1 weeks 6 days ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 675 (dockerd)
      Tasks: 40
     Memory: 170.9M
     CGroup: /system.slice/docker.service
             └─675 /usr/bin/dockerd -H unix:// --storage-driver=overlay2 --bip=240.127.1.1/24 --iptables=false --ipv6=true --fixed-cidr-v6=fd00::/80
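
One way to confirm whether a given dockerd instance was started with these options is to compare its command line against the ones that matter here. This is a rough sketch only: the required set (`--bip` and `--iptables`) is an assumption based on the symptoms described in this thread, and the helper names are hypothetical.

```python
import shlex

# Assumption: the two options whose loss explains the symptoms in this thread.
REQUIRED_OPTIONS = {"--bip": "240.127.1.1/24", "--iptables": "false"}


def parse_long_options(cmdline):
    """Collect --name=value options from a command line string into a dict."""
    options = {}
    for token in shlex.split(cmdline):
        if token.startswith("--") and "=" in token:
            name, value = token.split("=", 1)
            options[name] = value
    return options


def missing_options(cmdline, required=REQUIRED_OPTIONS):
    """Return the required options that are absent or carry a different value."""
    actual = parse_long_options(cmdline)
    return {
        name: value
        for name, value in required.items()
        if actual.get(name) != value
    }
```

An empty result means the command line carries both options; a dockerd started with defaults would report both as missing.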


@abdosi
Contributor

abdosi commented Apr 19, 2022

@mlok-nokia The workaround for now is to pass --skip-package-migration to the sonic-installer install command.
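
For scripted installs, the workaround might be applied along these lines. This is only a sketch: the helper name is hypothetical, and the image path and `-y` flag are copied from the reproduction steps earlier in this issue.

```python
import shlex


def build_install_command(image_path, skip_migration=True, assume_yes=True):
    """Build the sonic-installer argv, optionally skipping package migration."""
    argv = ["sudo", "sonic-installer", "install"]
    if skip_migration:
        argv.append("--skip-package-migration")
    if assume_yes:
        argv.append("-y")
    argv.append(image_path)
    return argv


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(build_install_command("/tmp/sonic-broadcom-dnx.bin")))
```

The returned list can be passed to `subprocess.run` on the DUT once the command looks right.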

@rlhui rlhui added the Chassis 🤖 Modular chassis support label Apr 20, 2022
@arlakshm
Contributor

arlakshm commented May 6, 2022

The same issue is seen on a single-asic linecard as well.

admin@sonic:~$ more /etc/sonic/sonic_version.yml
---
build_version: 'master.95340-4ec3af86a'
debian_version: '11.3'
kernel_version: '5.10.0-8-2-amd64'
asic_type: broadcom
asic_subtype: 'broadcom-dnx'
commit_id: '4ec3af86a'
branch: 'master'
release: 'none'
build_date: Mon May  2 18:04:48 UTC 2022
build_number: 95340
built_by: AzDevOps@sonic-build-workers-001H69
libswsscommon: 1.0.0
sonic_utilities: 1.2
admin@sonic:~$ show ver
^CTraceback (most recent call last):
  File "/usr/local/bin/show", line 5, in <module>
    from show.main import cli
  File "/usr/local/lib/python3.9/dist-packages/show/main.py", line 8, in <module>
    import utilities_common.cli as clicommon
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/cli.py", line 14, in <module>
    from utilities_common.db import Db
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/db.py", line 4, in <module>
    from utilities_common.multi_asic import multi_asic_ns_choices
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/multi_asic.py", line 102, in <module>
    default=multi_asic_display_default_option(),
  File "/usr/local/lib/python3.9/dist-packages/utilities_common/multi_asic.py", line 93, in multi_asic_display_default_option
    if not multi_asic.is_multi_asic() and not device_info.is_chassis():
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 427, in is_chassis
    return is_voq_chassis() or is_packet_chassis()
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 417, in is_voq_chassis
    switch_type = get_platform_info().get('switch_type')
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 353, in get_platform_info
    hw_info_dict['hwsku'] = get_hwsku()
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 126, in get_hwsku
    return get_localhost_info('hwsku')
  File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 47, in get_localhost_info
    config_db.connect()
  File "/usr/local/lib/python3.9/dist-packages/swsssdk/configdb.py", line 84, in connect
    self.db_connect('CONFIG_DB', wait_for_init, retry_on)
  File "/usr/local/lib/python3.9/dist-packages/swsssdk/configdb.py", line 81, in db_connect
    self.__wait_for_db_init()
  File "/usr/local/lib/python3.9/dist-packages/swsssdk/configdb.py", line 66, in __wait_for_db_init
    for item in pubsub.listen():
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 3605, in listen
    response = self.handle_message(self.parse_response(block=True))
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 3505, in parse_response
    response = self._execute(conn, conn.read_response)
  File "/usr/local/lib/python3.9/dist-packages/redis/client.py", line 3479, in _execute
    return command(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/redis/connection.py", line 739, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.9/dist-packages/redis/connection.py", line 324, in read_response
    raw = self._buffer.readline()
  File "/usr/local/lib/python3.9/dist-packages/redis/connection.py", line 256, in readline
    self._read_from_socket()
  File "/usr/local/lib/python3.9/dist-packages/redis/connection.py", line 198, in _read_from_socket
    data = recv(self._sock, socket_read_size)
  File "/usr/local/lib/python3.9/dist-packages/redis/_compat.py", line 72, in recv
    return sock.recv(*args, **kwargs)
KeyboardInterrupt

admin@sonic:~$
admin@sonic:~$

@rlhui rlhui changed the title On multi-asic DUT, after sonic-installer install an image that does not exist, executing any commands generate backtrace On single-asic and multi-asic sonic devices, after sonic-installer install an image that does not exist, executing any commands generate backtrace May 11, 2022
@stepanblyschak
Collaborator

@harish-kalyanaraman @abdosi @arlakshm Please check the PR sonic-net/sonic-utilities#2179. I tested VS multi-asic. Could you please help with verification on single-asic linecard?

abdosi pushed a commit to sonic-net/sonic-utilities that referenced this issue May 27, 2022
…erd in chroot (#2179)

I attempted to fix an issue that happens on multi-asic devices after new SONiC image installation. The issue is caused by overriding docker0 bridge configuration as well as installing iptables rules by dockerd started in chroot environment.

Fixes sonic-net/sonic-buildimage#10135

How I did it
I start dockerd in chroot using the same parameters the host dockerd is started with.
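
The idea of reusing the host dockerd parameters can be illustrated roughly as follows. This is not the code from the PR. It is a hedged sketch that assumes the host arguments are read from `/proc/<pid>/cmdline` (which is NUL-separated) and then prefixed with `chroot`; the function names are hypothetical.

```python
def split_proc_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into a list of strings."""
    return [part.decode() for part in raw.split(b"\0") if part]


def chroot_dockerd_command(host_cmdline, new_root):
    """Build a chroot invocation that reuses the host dockerd binary and arguments."""
    binary, *args = host_cmdline
    return ["chroot", new_root, binary, *args]
```

A real implementation would probably also have to give the chroot instance its own socket and data directory so that it does not collide with the host daemon; the thread does not spell that out, so treat it as an assumption.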
liat-grozovik pushed a commit to sonic-net/sonic-utilities that referenced this issue Oct 3, 2022
…erd in chroot (#2179) (#2407)

Backport of #2179

- Why I did it
I attempted to fix an issue that happens on multi-asic devices after new SONiC image installation. The issue is caused by overriding docker0 bridge configuration as well as installing iptables rules by dockerd started in chroot environment.

Fixes sonic-net/sonic-buildimage#10135

- How I did it
I start dockerd in chroot using the same parameters the host dockerd is started with.

- How to verify it
Run VS multi-asic device.
Install a new image on it.
Verify "show version", "show interface status"

Signed-off-by: Stepan Blyschak <[email protected]>
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this issue Aug 3, 2023
6 participants