[BUG] Salt 3006.2 requiring a higher timeout value than 3004.2 #65397
Comments
Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey.
There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar.
Having the same issue with salt-proxies (napalm to NX-OS switches) since upgrading to 3006.x; previously 3004.x was our environment.
I just updated to version 3006.4. It improved a little (a few seconds), but we still get the timeout issue for a proxy located in Asia (180 ms delay from master to proxy).
Is it possible for either of you to come up with an example state in which I can test against both 3004 and 3006 to see the difference? That may help me identify the cause of the behavior you are seeing.
@ntt-raraujo We fixed #65450 in 3006.5, can you test to see if that resolves this issue?
@dwoz 3006.5 fixed the issue. Thanks!
@dwoz Would you mind saying what the problem was? I couldn't find the issue in the release notes.
It would also be great to know if it's a master-side or minion-side change.
@ntt-raraujo @ITJamie Sorry for the late reply on this. I've been tied up working on other issues and just got back to this one. This was caused by a regression where the file client got re-created on each state run. The overhead of creating a new connection to the master multiple times during a highstate caused a substantial slowdown. The issue and fix were on the minion side.
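For readers hitting the same thing: the slowdown pattern @dwoz describes can be illustrated with a small, purely hypothetical sketch (this is not Salt's actual file client code; the class and function names below are stand-ins). It just simulates the cost of re-establishing a master connection per state over a ~180 ms link versus reusing one connection for the whole highstate.

```python
import time

# Hypothetical illustration of the regression described above -- NOT Salt's
# real code. Only the pattern matters: one connection per state vs. one
# connection per highstate.

MASTER_RTT = 0.180  # ~180 ms master<->proxy round trip, as reported above


class FileClient:
    """Stand-in for a minion-side file client that talks to the master."""

    def __init__(self):
        # Establishing the channel to the master is the expensive part.
        time.sleep(MASTER_RTT)

    def cache_file(self, path):
        # Fetching an individual file is comparatively cheap here.
        return f"cached {path}"


def highstate_regressed(sources):
    # Regressed behavior: a fresh client (new master connection) per state.
    for src in sources:
        FileClient().cache_file(src)


def highstate_fixed(sources):
    # Fixed behavior: one client reused for the whole highstate.
    client = FileClient()
    for src in sources:
        client.cache_file(src)


if __name__ == "__main__":
    sources = [f"salt://state_{i}.sls" for i in range(50)]
    for run in (highstate_regressed, highstate_fixed):
        start = time.monotonic()
        run(sources)
        print(f"{run.__name__}: {time.monotonic() - start:.1f}s")
```

With 50 states and a 180 ms round trip, the per-state reconnection alone adds roughly 9 seconds, which is enough to push a highstate past a 30-second timeout once real work is added on top.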
Fixed in 3006.5 |
Description
After deploying Salt 3006, the timeout had to be raised from 30 seconds to 60 seconds; otherwise the error 'minion did not respond' would occur when applying highstates to proxy minions.
We currently have a 3004 deployment that works fine with a 30-second timeout.
I'm using the same files on the 3006 and 3004 servers (same pillars, master file, state files, custom modules, and so on). The only differences are the Salt version and the OS.
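As a point of reference, the workaround looks roughly like this when the highstate is driven from Salt's Python LocalClient rather than the CLI. This is only a sketch: the `proxy*` target glob and the 60-second value are placeholders taken from this report, and the exact keyword arguments should be checked against the Salt version in use.

```python
# Rough sketch of the workaround described above (placeholders, not a
# definitive recipe): run the highstate against the proxy minions with a
# raised timeout so "minion did not respond" is not hit on 3006.2-3006.4.
import salt.client

local = salt.client.LocalClient()  # uses /etc/salt/master by default

result = local.cmd(
    "proxy*",              # placeholder target for the proxy minions
    "state.highstate",
    timeout=60,            # previously 30 seconds was enough on 3004
)
print(result)
```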
Setup
Both the 3004 and 3006 servers/proxies are on the same subnets and have the same virtual resources/VM settings.
Steps to Reproduce the behavior
The first run uses our current default timeout of 30 seconds, and the second a 60-second timeout.
Salt 3004 version (same subnets and same files)
Expected behavior
A smaller timeout gap between our 3004 and 3006 deployments.
Versions Report
salt --versions-report
(Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)

Additional context
Is there any other way to debug this issue, or a way to debug ZeroMQ itself to check for transport issues?