interfaces-config.service may hang at sonic-cfggen -d #1873
Comments
I think this is a critical issue. I have fixed it by reading the configuration from config_db.json or minigraph.xml, e.g.
In the real scenario though,
@taoyl-ms, I cannot reproduce this issue in the current version. I remember that in the previous environment
Closing the issue as I cannot reproduce it.
Interesting. Do you have the syslog file? Can I have a copy at [email protected]?
Syslog and tech-support dump sent by mail.
admin@S9130-32X:/proc/1221$ sudo cat stack
This process is waiting to receive a message from a socket. Maybe it is waiting for the published notification? Maybe CONFIG_DB_INITIALIZED is set just after we check it, but before we have successfully subscribed?
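The suspected race is a lost-notification problem. Redis does not queue pub/sub messages for clients that have not subscribed yet. If CONFIG_DB_INITIALIZED is set after the GET but before PSUBSCRIBE takes effect, the keyspace event reaches no one and `pubsub.listen()` blocks forever. A common remedy is to subscribe first, then check the key, and to wait with a timeout instead of blocking indefinitely. The Python 3 sketch below shows that ordering. `wait_for_key_subscribe_first` is a hypothetical helper and not part of swsssdk. It assumes `client` is a redis-py client and that keyspace notifications are enabled on the server, as the original swsssdk code also assumes.

```python
import time


def wait_for_key_subscribe_first(client, db_index, key, timeout_s=60.0, tick_s=1.0):
    """Return the value of `key` once it is set, or raise TimeoutError.

    Hypothetical helper (not swsssdk API). `client` is assumed to be a
    redis-py style client exposing get() and pubsub().
    """
    pattern = "__keyspace@{}__:{}".format(db_index, key)
    pubsub = client.pubsub()
    # 1. Subscribe BEFORE the first check: a SET that lands before the check
    #    is seen by the GET, and a SET after it is delivered as a message.
    pubsub.psubscribe(pattern)
    try:
        deadline = time.monotonic() + timeout_s
        while True:
            # 2. Check (or re-check) the key while already subscribed.
            value = client.get(key)
            if value:
                return value
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError("{} not set within {} s".format(key, timeout_s))
            # 3. Wait for a message, but at most tick_s seconds. The message
            #    content is ignored; the GET above is the source of truth.
            pubsub.get_message(timeout=min(tick_s, remaining))
    finally:
        # Drop the subscription on success and on timeout alike.
        pubsub.punsubscribe(pattern)
        pubsub.close()
```

Because the key is re-read on every tick, a dropped or unrelated message delays the caller by at most `tick_s`. It can no longer hang it.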
diff --git a/src/swsssdk/configdb.py b/src/swsssdk/configdb.py
index 6bffad9..e0b9411 100644
--- a/src/swsssdk/configdb.py
+++ b/src/swsssdk/configdb.py
@@ -39,18 +39,24 @@ class ConfigDBConnector(SonicV2Connector):
     def __wait_for_db_init(self):
         client = self.redis_clients[self.CONFIG_DB]
         pubsub = client.pubsub()
+
         initialized = client.get(self.INIT_INDICATOR)
-        if not initialized:
-            pattern = "__keyspace@{}__:{}".format(self.db_map[self.CONFIG_DB]['db'], self.INIT_INDICATOR)
-            pubsub.psubscribe(pattern)
-            for item in pubsub.listen():
-                if item['type'] == 'pmessage':
-                    key = item['channel'].split(':', 1)[1]
-                    if key == self.INIT_INDICATOR:
-                        initialized = client.get(self.INIT_INDICATOR)
-                        if initialized:
-                            break
-            pubsub.punsubscribe(pattern)
+
+        while (not initialized):
+            initialized = client.get(self.INIT_INDICATOR)
+            time.sleep(1)
+
+        #if not initialized:
+        #    pattern = "__keyspace@{}__:{}".format(self.db_map[self.CONFIG_DB]['db'], self.INIT_INDICATOR)
+        #    pubsub.psubscribe(pattern)
+        #    for item in pubsub.listen():
+        #        if item['type'] == 'pmessage':
+        #            key = item['channel'].split(':', 1)[1]
+        #            if key == self.INIT_INDICATOR:
+        #                initialized = client.get(self.INIT_INDICATOR)
+        #                if initialized:
+        #                    break
+        #    pubsub.punsubscribe(pattern)
 
     def connect(self, wait_for_init=True, retry_on=False):

I changed this file and ran more than 3000 reboot tests. The hang did not occur again.
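The patch above replaces the notification with polling. Polling cannot miss the update, but it has two weaknesses. It loops forever if the key is never set, for example after a FLUSHDB. It also sleeps one extra second after the key appears, because `time.sleep(1)` runs before the loop condition is re-evaluated. The Python 3 sketch below shows a bounded variant. `wait_for_db_init_polling` and its `timeout_s` parameter are hypothetical. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.

```python
import time


def wait_for_db_init_polling(client, indicator="CONFIG_DB_INITIALIZED",
                             interval_s=1.0, timeout_s=None,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll `indicator` until it is set (hypothetical helper, not swsssdk API).

    timeout_s=None polls forever, like the patch; otherwise TimeoutError is
    raised once timeout_s seconds pass without the key being set.
    """
    deadline = None if timeout_s is None else clock() + timeout_s
    while True:
        # Check before sleeping, so we return as soon as the key is seen.
        value = client.get(indicator)
        if value:
            return value
        if deadline is not None and clock() >= deadline:
            raise TimeoutError("{} not set within {} s".format(indicator, timeout_s))
        sleep(interval_s)
```

A caller such as interfaces-config.sh could then fail loudly after a bounded wait. This would replace leaving the service stuck in the running state.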
@richard28530: Is this still an issue? If so, is your suggested code above a potential solution? Please feel free to submit a PR if this is the case. If this is no longer an issue, please close this. |
Closing due to no response for one year.
Description
interfaces-config.service may hang at
sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2 > /etc/network/interfaces
Steps to reproduce the issue:
It is hard to reproduce this issue by rebooting the system over and over.
But we can reproduce the hang of
sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2 > /etc/network/interfaces
directly with:
redis-cli -n 4 FLUSHDB
sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2
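The reproduction presumably works because FLUSHDB also deletes the CONFIG_DB_INITIALIZED key, so `sonic-cfggen -d` waits for it indefinitely. FLUSHDB wipes CONFIG_DB, so only try this on a test device. The Python 3 sketch below detects the hang without leaving a blocked process behind. `command_hangs` is a hypothetical helper.

```python
import subprocess


def command_hangs(cmd, timeout_s):
    """Return True if `cmd` is still running after `timeout_s` seconds.

    subprocess.run() kills the child when the timeout expires, so a hung
    command does not linger after the check. Output is discarded.
    """
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return True
    return False
```

On the switch, after `redis-cli -n 4 FLUSHDB`, a call such as `command_hangs(["sonic-cfggen", "-d", "-t", "/usr/share/sonic/templates/interfaces.j2"], 30)` would be expected to return True while the bug is present. The 30-second timeout is an arbitrary choice.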
We know the dependency chain: interfaces-config.service -> database.service -> updategraph.service.
database.service loads CONFIG_DB inside the database Docker container with configdb-load.sh. However, database.service does not wait for configdb-load.sh to finish loading all CONFIG_DB data into Redis DB 4; it completes as soon as redis-server is ready.
So when interfaces-config.service runs, Redis DB 4 may still be empty. This causes interfaces-config.sh to hang at
sonic-cfggen -d -t /usr/share/sonic/templates/interfaces.j2 > /etc/network/interfaces
and leaves interfaces-config.service in the running state.
Describe the results you received:
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):