[BUG] Orchestration with "parallel: True" does not behave properly with napalm/deltaproxy #61439

Closed
COvirtNetwork opened this issue Jan 11, 2022 · 3 comments
Labels: Bug (broken, incorrect, or confusing behavior), Delta-Proxy, Proxy-Minion, VMware

@COvirtNetwork

Description
Running an orchestration state file that issues a series of 17 commands in parallel ("parallel: True") against napalm devices (network switches) fails intermittently when the orchestrate runner, the napalm proxy, and deltaproxy are used together.

The failure rate grows with the number of commands. With 12 or fewer commands, no failures are observed. With 13-14 commands, about 20% of orchestration runs fail on at least one command; with 15 commands, about 40%; with 16 commands, about 60%; and with 17 commands, every run fails on at least one command.

No failures are observed when the commands are run serially.
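Because the failure rate climbs with the number of parallel commands, one way to check whether the switch itself rejects simultaneous SSH sessions, independently of Salt and deltaproxy, is to open the same number of sessions concurrently from a plain Python script. The sketch below is a hypothetical diagnostic, not part of Salt: `probe_concurrent_sessions` and `netmiko_opener` are illustrative names, and it assumes netmiko is installed on the host running it (the tracebacks show it installed on the proxy host).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def probe_concurrent_sessions(open_session: Callable[[int], None], attempts: int) -> List[bool]:
    """Run open_session(i) for i in 0..attempts-1 on separate threads at once.

    Returns one success flag per attempt, in attempt order.
    """

    def attempt(i: int) -> bool:
        try:
            open_session(i)
            return True
        except Exception:  # any connection or channel error counts as a failed attempt
            return False

    # One worker per attempt so every session is started concurrently,
    # roughly what N parallel net.cli states against one switch would do.
    with ThreadPoolExecutor(max_workers=attempts) as pool:
        return list(pool.map(attempt, range(attempts)))


def netmiko_opener(host: str, username: str, password: str) -> Callable[[int], None]:
    """Build a hypothetical open_session callable: one NX-OS SSH session per call."""

    def open_session(_i: int) -> None:
        # Imported lazily so the probe itself does not require netmiko.
        from netmiko import ConnectHandler

        conn = ConnectHandler(
            device_type="cisco_nxos",  # the device_type napalm's nxos_ssh driver passes
            host=host,
            username=username,
            password=password,
        )
        try:
            conn.send_command("show version")
        finally:
            conn.disconnect()

    return open_session
```

For example, `probe_concurrent_sessions(netmiko_opener("<switch>", "<user>", "<password>"), 17).count(True)` reports how many of 17 simultaneous sessions succeeded. If the count drops off near the same thresholds seen through Salt, the limit is more likely on the device side; if all 17 succeed, the problem is more likely in the proxy layer.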

Setup

All devices are on-premises. The Salt master runs as a VM in on-premises infrastructure. A second VM runs a Salt minion, and the delta proxy service also runs on that minion VM. Pillar data is stored in an external MongoDB instance on a third VM.

Below is srv/salt/nxos_ssh_commands_org/init.sls, the example state file:

{% set target = salt['pillar.get']('target') %}
{% do salt.log.debug('xxx: ' ~ target) %}
{% for cmd in ['show version','show interfaces description','show ip bgp all','show module','show vlan','show mac address','show ip arp','show interfaces status','show cdp nei','show etherchannel summary','show file systems','show bootvar','show spanning-tree','show inventory','show redundancy','show switch virtual','show run'] %}
run {{ cmd }} :
  salt.function:
    - name: net.cli
    - tgt: '{{ target }}'
    - tgt_type: "compound"
    - arg:
      - {{ cmd }}
    - parallel: True
{% endfor %}
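For comparison, the same 17 states can be generated with Salt's Python renderer (#!py), which makes the rendered structure explicit: each loop iteration becomes a salt.function state calling net.cli with "parallel: True". This is a sketch, not the reporter's file; `build_states` and `COMMANDS` are illustrative names, and the `parallel` argument exists only so the serial case can be rendered from the same code.

```python
#!py
"""Sketch of nxos_ssh_commands_org/init.sls written for Salt's py renderer."""

# The 17 commands from the Jinja loop, in the same order.
COMMANDS = (
    "show version",
    "show interfaces description",
    "show ip bgp all",
    "show module",
    "show vlan",
    "show mac address",
    "show ip arp",
    "show interfaces status",
    "show cdp nei",
    "show etherchannel summary",
    "show file systems",
    "show bootvar",
    "show spanning-tree",
    "show inventory",
    "show redundancy",
    "show switch virtual",
    "show run",
)


def build_states(target, commands=COMMANDS, parallel=True):
    """Return high data equivalent to the rendered Jinja/YAML file."""
    states = {}
    for cmd in commands:
        # State ID corresponding to the YAML key "run {{ cmd }} :".
        states["run {}".format(cmd)] = {
            "salt.function": [
                {"name": "net.cli"},
                {"tgt": target},
                {"tgt_type": "compound"},
                {"arg": [cmd]},
                {"parallel": parallel},
            ]
        }
    return states


def run():
    # __salt__ is injected by the py renderer at render time;
    # this mirrors salt['pillar.get']('target') in the Jinja version.
    return build_states(__salt__["pillar.get"]("target"))  # noqa: F821
```

Inside Salt, `run()` is what the renderer calls, so the file only works as an SLS; `build_states` can still be exercised on its own to inspect the generated states.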

Steps to Reproduce the behavior
Run the state file above with the following command:
salt-run state.orch nxos_ssh_commands_org pillar="{'target': 'L@<minion_id>'}"
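To measure failure rates like the ones in the description, the command above can be repeated in a loop and each run scored by whether any state returned a false result. The sketch assumes salt-run prints a single JSON document when given --out=json and that every state return is a dict carrying "result" and "comment" keys; `count_failed_states`, `run_orchestration` and `failure_rate` are hypothetical helpers. It uses Python 3.6-compatible subprocess arguments because the versions report below lists Python 3.6.8.

```python
import json
import subprocess
from typing import Any, Callable


def count_failed_states(node: Any) -> int:
    """Count state returns whose "result" is False anywhere in a nested return.

    Assumption: any dict holding both "result" and "comment" is a state return.
    Recursion stops there, so net.cli output nested under "changes" is not
    counted a second time.
    """
    if isinstance(node, dict):
        if "result" in node and "comment" in node:
            return 1 if node["result"] is False else 0
        return sum(count_failed_states(value) for value in node.values())
    return 0


def run_orchestration(target: str) -> Any:
    """Run the reproducer once through salt-run and parse its JSON output."""
    proc = subprocess.run(
        [
            "salt-run",
            "state.orch",
            "nxos_ssh_commands_org",
            "pillar=" + json.dumps({"target": target}),
            "--out=json",
        ],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6: no capture_output/text arguments
        check=False,
    )
    return json.loads(proc.stdout)


def failure_rate(run_once: Callable[[], Any], runs: int) -> float:
    """Fraction of runs in which at least one state failed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failed_runs = sum(1 for _ in range(runs) if count_failed_states(run_once()) > 0)
    return failed_runs / runs
```

For example, `failure_rate(lambda: run_orchestration("L@<minion_id>"), 20)` returns the fraction of 20 orchestration runs in which at least one command failed.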

Expected behavior
All 17 commands in the .sls file should return output successfully.

Screenshots
NA

Versions Report

Output of salt --versions-report:
Salt Version:
          Salt: 3004

Dependency Versions:
          cffi: 1.14.6
      cherrypy: unknown
      dateutil: Not Installed
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.0.1
       libgit2: Not Installed
      M2Crypto: 0.35.2
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: Not Installed
  pycryptodome: 3.10.1
        pygit2: Not Installed
        Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 17.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.4

Salt Extensions:
        SSEAPE: 8.6.1.4

System Versions:
          dist: centos 7 Core
        locale: UTF-8
       machine: x86_64
       release: 3.10.0-1160.42.2.el7.x86_64
        system: Linux
       version: CentOS Linux 7 Core

Additional context
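The following tracebacks are observed when commands fail. The three failures end in different errors: a napalm ConnectionException because the SSH session to the switch could not be established, an OSError ("Socket is closed") while netmiko was preparing the session, and a KeyError for 'proxy' raised in salt/utils/napalm.py.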

              The minion function caused an exception: Traceback (most recent call last):
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 935, in establish_connection
                  self.remote_conn_pre.connect(**ssh_connect_params)
                File "/usr/local/lib/python3.6/site-packages/paramiko/client.py", line 412, in connect
                  server_key = t.get_remote_server_key()
                File "/usr/local/lib/python3.6/site-packages/paramiko/transport.py", line 834, in get_remote_server_key
                  raise SSHException("No existing session")
              paramiko.ssh_exception.SSHException: No existing session

              During handling of the above exception, another exception occurred:

              Traceback (most recent call last):
                File "/usr/local/lib/python3.6/site-packages/napalm/base/base.py", line 92, in _netmiko_open
                  **netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/netmiko/ssh_dispatcher.py", line 326, in ConnectHandler
                  return ConnectionClass(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/cisco/cisco_nxos_ssh.py", line 12, in __init__
                  return super().__init__(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 350, in __init__
                  self._open()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 355, in _open
                  self.establish_connection()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 980, in establish_connection
                  raise NetmikoTimeoutException(msg)
              netmiko.ssh_exception.NetmikoTimeoutException: Paramiko: 'No existing session' error: try increasing 'conn_timeout' to 10 seconds or larger.

              During handling of the above exception, another exception occurred:

              Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 359, in get_device
                  network_device.get("DRIVER").open()
                File "/usr/local/lib/python3.6/site-packages/napalm/nxos_ssh/nxos_ssh.py", line 446, in open
                  device_type="cisco_nxos", netmiko_optional_args=self.netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/napalm/base/base.py", line 95, in _netmiko_open
                  raise ConnectionException("Cannot connect to {}".format(self.hostname))
              napalm.base.exceptions.ConnectionException: Cannot connect to nxs1-ork3.vmware.com

              During handling of the above exception, another exception occurred:

              Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/metaproxy/deltaproxy.py", line 595, in thread_return
                  opts, data, func, args, kwargs
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
                  return func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 535, in func_wrapper
                  ret = func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/napalm_network.py", line 698, in cli
                  **{"commands": list(commands), "force_reconnect":True}
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 160, in call
                  napalm_device = get_device(opts)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 386, in get_device
                  raise napalm_base.exceptions.ConnectionException(base_err_msg)
              napalm.base.exceptions.ConnectionException: Cannot connect to nxs1-ork3.vmware.com as svc.salt.net.

              The minion function caused an exception: Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/metaproxy/deltaproxy.py", line 595, in thread_return
                  opts, data, func, args, kwargs
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
                  return func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 535, in func_wrapper
                  ret = func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/napalm_network.py", line 698, in cli
                  **{"commands": list(commands), "force_reconnect":True}
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 160, in call
                  napalm_device = get_device(opts)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 359, in get_device
                  network_device.get("DRIVER").open()
                File "/usr/local/lib/python3.6/site-packages/napalm/nxos_ssh/nxos_ssh.py", line 446, in open
                  device_type="cisco_nxos", netmiko_optional_args=self.netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/napalm/base/base.py", line 92, in _netmiko_open
                  **netmiko_optional_args
                File "/usr/local/lib/python3.6/site-packages/netmiko/ssh_dispatcher.py", line 326, in ConnectHandler
                  return ConnectionClass(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/cisco/cisco_nxos_ssh.py", line 12, in __init__
                  return super().__init__(*args, **kwargs)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 350, in __init__
                  self._open()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 356, in _open
                  self._try_session_preparation()
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 792, in _try_session_preparation
                  self.session_preparation()
                File "/usr/local/lib/python3.6/site-packages/netmiko/cisco/cisco_nxos_ssh.py", line 20, in session_preparation
                  command="terminal width 511", pattern=r"terminal width 511"
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 1126, in set_terminal_width
                  self.write_channel(command)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 459, in write_channel
                  self._write_channel(out_data)
                File "/usr/local/lib/python3.6/site-packages/netmiko/base_connection.py", line 417, in _write_channel
                  self.remote_conn.sendall(write_bytes(out_data, encoding=self.encoding))
                File "/usr/local/lib/python3.6/site-packages/paramiko/channel.py", line 846, in sendall
                  sent = self.send(s)
                File "/usr/local/lib/python3.6/site-packages/paramiko/channel.py", line 801, in send
                  return self._send(s, m)
                File "/usr/local/lib/python3.6/site-packages/paramiko/channel.py", line 1198, in _send
                  raise socket.error("Socket is closed")
              OSError: Socket is closed

              The minion function caused an exception: Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/metaproxy/deltaproxy.py", line 595, in thread_return
                  opts, data, func, args, kwargs
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
                  return func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1201, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1216, in _run_as
                  return _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 535, in func_wrapper
                  ret = func(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/napalm_network.py", line 698, in cli
                  **{"commands": list(commands), "force_reconnect":True}
                File "/usr/lib/python3.6/site-packages/salt/utils/napalm.py", line 157, in call
                  opts["proxy"].update(**kwargs)
              KeyError: 'proxy'
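The tracebacks above differ in their final exception, so when collecting output from many runs it can help to tally which failure mode occurred. The sketch below splits job output on the "The minion function caused an exception:" banner and records the last exception type named in each block; the regular expression and the helper names `final_exceptions` and `tally` are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List

# Each failure in the job output starts with this banner (as in the tracebacks above).
BANNER = "The minion function caused an exception:"

# A line naming an exception type, e.g. "OSError: Socket is closed" or
# "napalm.base.exceptions.ConnectionException: Cannot connect to ...".
EXC_LINE = re.compile(
    r"^\s*((?:[A-Za-z_]\w*\.)*[A-Za-z_]\w*(?:Error|Exception)):",
    re.MULTILINE,
)


def final_exceptions(output: str) -> List[str]:
    """Return the last exception type raised in each traceback block."""
    types = []
    for block in output.split(BANNER)[1:]:  # text before the first banner is ignored
        matches = EXC_LINE.findall(block)
        if matches:
            types.append(matches[-1])  # the outermost, last-raised exception
    return types


def tally(output: str) -> Counter:
    """Count how often each final exception type appears."""
    return Counter(final_exceptions(output))
```

Applied to the three tracebacks in this report, the blocks would be classified as a napalm ConnectionException, an OSError, and a KeyError respectively.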
COvirtNetwork added the Bug and needs-triage labels on Jan 11, 2022.

@COvirtNetwork (Author)

I have been investigating this issue with @garethgreenaway and @waynew.

@Ch3LL (Contributor) commented Jun 9, 2022

Closed by #61631

Ch3LL closed this as completed on Jun 9, 2022.