Describe the bug
On installations of NAV 5.5.0 and 5.5.1 that monitor Cisco devices, ipdevpoll
workers tend to freeze up with a strange error message/traceback:
pynetsnmp.netsnmp.SnmpError: get: snmp_send cliberr=0, snmperr=-11, errstring=b"Error building ASN.1 representation (Can't build OID for variable)"
Quickly followed by
builtins.AttributeError: 'NoneType' object has no attribute 'addCallbacks'
The error messages and tracebacks do not contain enough information to identify the underlying problem, but the effect appears to be that the worker process stops collecting at this point. This delays all other collection jobs and thereby disturbs the collection intervals for the 1minstats and 5minstats jobs.
The main symptom for those not watching the logs is that any system and traffic statistics collection appears to stop working.
To Reproduce
The problem appears to be readily reproducible by just running the ipdevpoll 1minstats job against any SNMP-enabled Cisco switch:
E.g.:
$ ipdevpolld -J 1minstats -n cisco-sw.example.org
Expected behavior
No crashing. Collection should continue in the face of most collection issues.
Tracebacks
In ipdevpoll.log, lots of these messages appear:
2022-11-09 13:22:03,025 [2789339] [ERROR zen.pynetsnmp.netsnmp] b'snmp_build: unknown failure\n'
2022-11-09 13:22:03,025 [2789339] [ERROR zen.pynetsnmp.netsnmp] b'g: Error building ASN.1 representation\n'
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/opt/venvs/nav/lib/python3.9/site-packages/pynetsnmp/tableretriever.py", line 30, in v2v3how
    return proxy._getbulk(0, min(maxRepetitions, limit), [oids])
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/snmp/common.py", line 65, in _wrapper
    return func(*args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/snmp/common.py", line 134, in _getbulk
    return super(AgentProxyMixIn, self)._getbulk(*args, **kwargs)
  File "/opt/venvs/nav/lib/python3.9/site-packages/pynetsnmp/twistedsnmp.py", line 385, in _getbulk
    return defer.fail(ex)
--- <exception caught here> ---
  File "/opt/venvs/nav/lib/python3.9/site-packages/pynetsnmp/twistedsnmp.py", line 381, in _getbulk
    self.defers[self.session.getbulk(nonrepeaters,
  File "/opt/venvs/nav/lib/python3.9/site-packages/pynetsnmp/netsnmp.py", line 759, in getbulk
    self._handle_send_status(req, send_status, 'get')
  File "/opt/venvs/nav/lib/python3.9/site-packages/pynetsnmp/netsnmp.py", line 717, in _handle_send_status
    raise SnmpError(msg_fmt % msg_args)
pynetsnmp.netsnmp.SnmpError: get: snmp_send cliberr=0, snmperr=-11, errstring=b"Error building ASN.1 representation (Can't build OID for variable)"
Unhandled Error
Traceback (most recent call last):
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/daemon.py", line 458, in start_ipdevpoll
    process.run()
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/ipdevpoll/daemon.py", line 101, in run
    reactor.run()
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/internet/base.py", line 1283, in run
    self.mainLoop()
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/internet/base.py", line 1292, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/opt/venvs/nav/lib/python3.9/site-packages/twisted/internet/base.py", line 913, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/mibs/mibretriever.py", line 479, in _schedule_next
    deferred = self.retrieve_column(column)
  File "/opt/venvs/nav/lib/python3.9/site-packages/nav/mibs/mibretriever.py", line 442, in retrieve_column
    deferred.addCallbacks(_result_formatter, _valueerror_handler)
builtins.AttributeError: 'NoneType' object has no attribute 'addCallbacks'
Environment (please complete the following information):
NAV 5.5.1 from source or Debian package.
Additional context
Through judicious use of log level settings, the problem has been tracked down to the statsystem being run to collect memory information from Cisco devices.
The MIB dump used to implement the new support for CISCO-ENHANCED-MEMPOOL-MIB is full of invalid OIDs, which the NET-SNMP library refuses to construct ASN.1 representations of. ipdevpoll in turn does not seem equipped to handle this low-level error.
This functionality was introduced in #2439, but it is unclear how manual tests of those changes could have been successful.
The problem itself is easily remedied by dumping a new representation of CISCO-ENHANCED-MEMPOOL-MIB using the smidump command line program, but in addition to that hotfix, we really need:
Automated tests to find these types of problems
Improved error handling in ipdevpoll so that this type of error doesn't take down the entire collector process.
Example of one of the invalid objects: nav/python/nav/smidumps/CISCO-ENHANCED-MEMPOOL-MIB.py, lines 360 to 373 in 44b0314.
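As a rough illustration of the automated tests asked for above, a check along the following lines could scan a MIB dump for OIDs that cannot be encoded. This is only a sketch under stated assumptions: it assumes the dump is a Python dict with a "nodes" mapping whose entries carry an "oid" string, the helper names and the miniature EXAMPLE_MIB are hypothetical, and the validity rules are the general constraints on dotted-decimal OIDs, not a copy of what NET-SNMP checks internally. The invalid entries shown are invented for illustration and are not taken from the actual CISCO-ENHANCED-MEMPOOL-MIB dump.

```python
import re

# Largest value a single OID sub-identifier may take (32-bit unsigned).
MAX_SUBID = 2**32 - 1

# Two or more decimal arcs separated by single dots, nothing else.
_NUMERIC_OID = re.compile(r"^[0-9]+(\.[0-9]+)+$")


def is_valid_oid(oid):
    """Return True if oid is a dotted-decimal OID string that can be encoded."""
    if not isinstance(oid, str) or not _NUMERIC_OID.match(oid):
        return False
    arcs = [int(part) for part in oid.split(".")]
    # The first arc must be 0, 1 or 2.
    if arcs[0] > 2:
        return False
    # Under roots 0 and 1, the second arc is limited to 0..39.
    if arcs[0] < 2 and arcs[1] > 39:
        return False
    return all(arc <= MAX_SUBID for arc in arcs)


def find_invalid_oids(mib):
    """Return (node_name, oid) pairs, sorted by name, whose OID is unusable."""
    bad = []
    for name, node in mib.get("nodes", {}).items():
        oid = node.get("oid")
        if not is_valid_oid(oid):
            bad.append((name, oid))
    return sorted(bad, key=lambda pair: pair[0])


# Hypothetical miniature dump; the layout is an assumption (see above).
EXAMPLE_MIB = {
    "moduleName": "EXAMPLE-MIB",
    "nodes": {
        "goodNode": {"nodetype": "column", "oid": "1.3.6.1.4.1.99999.1.1"},
        "symbolicNode": {
            "nodetype": "column",
            "oid": "1.3.6.1.4.1.99999.1.exampleEntry",
        },
        "emptyArcNode": {"nodetype": "column", "oid": "1.3.6.1.4.1.99999..2"},
    },
}

if __name__ == "__main__":
    for name, oid in find_invalid_oids(EXAMPLE_MIB):
        print(f"{name}: invalid OID {oid!r}")
```

Run against every dumped MIB module in a test suite, a check like this would fail as soon as any node carries an OID that is not purely numeric, which is the class of problem described in this report.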