Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add netsnmp-cffi as alternative SNMP back-end #391

Draft
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

lunkwill42
Copy link
Member

Scope and purpose

Fixes #383.

This is a work in progress. At the time of writing, only trap-related tests are failing, as the trap-listening bits have now yet been ported to the new library.

netsnmp-cffi will be a CFFI-based binding to the Net-SNMP C library, which is expected to bring performance improvements to Zino 2.

Contributor Checklist

Every pull request should have this checklist filled out, no matter how small it is.
More information about contributing to Zino can be found in the
README.

  • Added a changelog fragment for towncrier
  • Added/amended tests for new/changed code
  • Added/changed documentation
  • Linted/formatted the code with black, ruff and isort, easiest by using pre-commit
  • The first line of the commit message continues the sentence "If applied, this commit will ...", starts with a capital letter, does not end with punctuation and is 50 characters or less long. See https://cbea.ms/git-commit/
  • If applicable: Created new issues if this PR does not fix the issue completely/there is further work to be done

The OID class is designed to represent a tuple of ints.  It was never
designed to contain non-int elements (although it can translate non-int
values to int elements when instances are constructed).

The Zino code has bugs that actually would feed the OID class with
non-int elements, and because the OID class accepted it, strange
workarounds have been written by those who did not understand that
this was not the intended behavior.

This change enforces all elements to be int on instantiation -
presumably at the cost of instantiation becoming slower.
An `Identifier` was intended to represent an object's MIB name,
symbolic object name and numeric OID index.  However, due to a
misunderstanding of how PySNMP works, this was not always the case:

`getMibSymbol` will actually break down the remaining OID index to
one or more symbolic indices according to the MIB specification, so
the third element of the `getMibSymbol()` call may actually be a tuple
of various objects that represents individual indices.

While this could be helpful in the codebase, it was not the intention
of `_convert_varbind` and `Identifier`.  To keep Zino's interface
as intended here, this change tries to unroll all the symbolic indices
back to a single OID object - likely with some performance penalty.

We might want to look into changing the codebase to work with symbolic
indexes at a later time.
bgpstatemonitortask.py contained some strange workarounds for behaviors
that were actually bugs in the `_convert_varbind` implementation.
These bugs have been fixed, causing the workarounds to fail: This fixes
the failures by removing the workarounds.
@lunkwill42 lunkwill42 self-assigned this Jan 22, 2025
@lunkwill42 lunkwill42 force-pushed the feature/use-netsnmpy-cffi branch 3 times, most recently from e5eee1d to 28aed68 Compare January 22, 2025 12:35
In preparation for a new upcoming SNMP back-end, this moves the pysnmp-
specific back-end to a submodule of `zino.snmp` so an interchangeable
module can be made more easily.

The _mib_value_to_python function was imported outside of the original
module, so it was renamed to be public instead of internal in order
for imports to keep working.

Tests and patches that referenced old names were updated, and all
mibdump files were also moved.
These classes and type definitions aren't specific for the PySNMP
back-end and are therefore moved to a new module from where they
can be shared.
The poll device `hcounters` attribute is really what drives the SNMP
version selection in Zino.  Unfortunately, the default is SNMP v1, and
this has worked because PySNMP silently allowed v2c requests like
BULK-GET to be sent over sessions that were otherwise v1.

Net-SNMP does not allow the sending of v2c PDUs over v1 sessions, so
we need to be more strict about this.  It also makes no sense to default
to v1 in 2025, so this sets `hcounters`' default value to True and
fixes a test that implicitly assumed v1 exceptions to be raised.
The table index for an address like `10.0.0.42` will be
`.1.4.10.0.0.42`, where `1 is the address type and `4` is the address
length - while the remainder is the address itself.  We don't really
need the `1` and the `4` to detect the address type, as `ip_address`
will parse things correctly if given just a series of bytes.
I don't generally think it is a good idea to check for very specific
log output.  In this case, the non-int object type returned by the
underlying library may depend on the library.  netsnmpy-cffi does not
generally translate octet strings to `str` unless they have a
textual convention of `DisplayString`.  That has really no bearing on
the actual functionality in this case, so it's quite weird to test
for such specific log messages.
The underlying netsnmpy session object will not work with mock values
for addresses and community string, so its better to use an actual
mocked PollDevice object rather than a 100% Mock.
@lunkwill42 lunkwill42 force-pushed the feature/use-netsnmpy-cffi branch from 28aed68 to 9fafb5b Compare January 22, 2025 12:45
Copy link

codecov bot commented Jan 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.16%. Comparing base (1e9df5c) to head (6951c0b).
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #391      +/-   ##
==========================================
- Coverage   98.66%   98.16%   -0.50%     
==========================================
  Files          77       80       +3     
  Lines        9711     9879     +168     
==========================================
+ Hits         9581     9697     +116     
- Misses        130      182      +52     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lunkwill42 lunkwill42 force-pushed the feature/use-netsnmpy-cffi branch 5 times, most recently from 1af8f28 to dc8ae09 Compare January 23, 2025 13:49
This ensures all MIB lookups work properly when running the test suite.
MIBs should be vendored verbatim.
This vendors in all the MIB definitions required to run Zino.  PySNMP
will work with versions translated to Python modules, Net-SNMP needs the
original definitions.
The netsnmp-cffi package i still in pre-alpha and has no proper
release yet.  It will be moved to GitHub and released on PyPI in
due time.
This specifically reaches into the PySNMP-based back-end for handling
traps, thereby making the test suite green again.
The workflow may need to build netsnmp-cffi from scratch if no pre-made
wheels are available.
I.e. have Net-SNMP load MIBs from its defined locations
There is no need to get a new SNMP session here, the instance
already has one.
Before this, every Task instance would create its own SNMP session
instance.  This might not mean much under PySNMP, but is wasteful
using netsnmp-cffi, as each open session takes up a socket resource.
Having multiple simultaneous sessions/sockets per poll device makes
no sense, so `get_snmp_session()` utilizes a `WeakValueDictionary`
to keep track of the existing session for each device and returns
an existing session if there is one.
These test patches will not work now that everything uses
`zino.snmp.get_snmp_session()` - the SNMP class needs to be patched
in the correct location.
@lunkwill42 lunkwill42 force-pushed the feature/use-netsnmpy-cffi branch from dc8ae09 to 6951c0b Compare January 24, 2025 09:47
@lunkwill42
Copy link
Member Author

Initial comparisons of PySNMP-based (master) Zino vs. Net-SNMP-based (this PR) Zino

I ran Zino for exactly 10 minutes with a version of polldevs.cf that contained 222 routers from the Uninett research network.

Preliminary conclusion is that the time spent developing netsnmp-cffi was worthwhile, as it provides a significant performance improvement (bugs notwithstanding)

Resource usage for PySNMP version

494.50s user 2.64s system 82% cpu 10:00.02 total

Resource usage for Net-SNMP version

78.03s user 4.94s system 13% cpu 10:00.03 total

@lunkwill42
Copy link
Member Author

For the record: This PR does not yet replace the TrapListener server component of Zino with a Net-SNMP equivalent. The existing PySNMP implementation is kept here. However, processing of intermittent traps is not where Zino spends the majority of its time - I also currently have no provisions to receive trap messages from the research network on my workstation, so it's hard to measure the impact here.

Besides fixing bugs and/or refactoring the current implementation, making the TrapListener replaceable is the next step: A split implementation will process MIB information in two different libraries, which can potentially cause issues that would be hard to debug (if, e.g., different MIB modules are loaded into the two separate implementations).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zino 2.0.0 (beta) has a fairly large performance/scaling issue
1 participant