Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Python 3.9 compatibility #2352

Merged
merged 18 commits into from
Feb 22, 2022
Merged

Fix Python 3.9 compatibility #2352

merged 18 commits into from
Feb 22, 2022

Conversation

lunkwill42
Copy link
Member

It was apparent that NAV was not compatible with Python 3.9, which is the default Python version on our current target production platform of Debian 11 (Bullseye).

This PR sets out to:

  • Enable testing on Python 3.9 in our CI matrix
  • Fix any compatibility issues discovered by our test suite
  • Update the Docker Compose-based development environment containers to run on Python 3.9.

The major showstopper appears to be the changed behavior of select.epoll.unregister() in Python 3.9. See issue 39239 in the Python bug tracker for the rationale behind this change. This caused an incompatibility between pynetsnmp-2, Twisted and NAV that this PR resolves within NAV by introducing a derivative event reactor for Twisted.

The new reactor could also have been introduced in pynetsnmp-2 itself, since the issue only arises due to how file descriptors are opened and closed outside of the control of Python code in the NET-SNMP library. Arguably, the issue should really be solved in Twisted itself, but they consider these errors to be due to faulty client code, and we don't have time to wait for this to be resolved by someone else. See the commit log of 75c0f1f for a more detailed explanation of the issues involved.

Although not ultimately a very large PR, I recommend reviewing commit-by-commit :-)

@lunkwill42 lunkwill42 added this to the 5.3.0 milestone Feb 21, 2022
@lunkwill42 lunkwill42 requested a review from hmpf February 21, 2022 13:40
@codecov
Copy link

codecov bot commented Feb 21, 2022

Codecov Report

Merging #2352 (2a9bdca) into master (2da4bef) will increase coverage by 0.02%.
The diff coverage is 95.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2352      +/-   ##
==========================================
+ Coverage   58.07%   58.09%   +0.02%     
==========================================
  Files         552      553       +1     
  Lines       40206    40228      +22     
==========================================
+ Hits        23349    23370      +21     
- Misses      16857    16858       +1     
Impacted Files Coverage Δ
python/nav/web/api/v1/views.py 74.34% <ø> (ø)
python/nav/ipdevpoll/epollreactor2.py 95.23% <95.23%> (ø)
python/nav/ipdevpoll/jobs.py 85.80% <100.00%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2da4bef...2a9bdca. Read the comment docs.

We expect to move to Debian Bullseye-based servers in production in the
near future, and we need to ensure NAV works on Python 3.9, which is the
default Python version on that distro.
The old version was not compatible with Python 3.9, and we are trying to
stay compatible in the range 3.7-3.9 at the moment.
Graph.node has been deprecated in favor of Graph.nodes.
We need this to work an all Python 3.9 related issues in dev, since
Python 3.9 is the default Python version on the latest Debian stable
distro.

NET-SNMP has jumped from approx 5.7 to 5.9 (libsnmp30 vs libsnmp40), and
python3-gammu is no longer available in Debian (but we can get it from
PyPI if needed).
Version 22 and newer seems to trigger some obscure new errors with
pip-compile, and who has time for that?

Ref:
https://stackoverflow.com/questions/70946286/pip-compile-raising-assertionerror-on-its-logging-handler
I have no idea why pip-sync was being used at all, and also no idea why
this seemingly worked under the previous Debian release:

pip-sync will remove packages that arent in the requirements file, and
packages that came from Debian weren't necessarily in there - meaning
pip-sync actually trashed the supervisord installation that we depend
on (and maybe other things, as well!)
To trace memory leaks and other issues with C-interfacing Python
modules, like pynetsnmp, that NAV depends on, it is useful to install a
debugger (gdb) and debug symbols for the Python interpreter everything
runs under.
Python 3.9 introduces an insurmountable incompatibility between
pynetsnmp and Twisted - or least with Twisted's built-in epollreactor
implementation.

The NET-SNMP C library will handle SNMP file descriptors internally,
while pynetsnmp just ensures that new SNMP file descriptors are added to
the Twisted reactor, while stale ones are removed.

However, it now seems that removing a file descriptor from an epoll
object *after* it has been closed will raise an OSError with the
EBADF (Bad File descriptor) error number.

The epoll implementation in Python versions prior to 3.9 would ignore
EBADF errors. Python 3.9 intentionally removed this error handler. See
https://bugs.python.org/issue39239 and the Python docs for
`epoll.unregister()`:
https://docs.python.org/dev/library/select.html#select.epoll.unregister

ipdevpoll uses Twisted's epollreactor, and the epollreactor
implementation also does not handle this OSError, instead letting it
propagate to the client (because any errors are considered by Twisted to
be a problem with faulty client code).

We could choose to ignore the error - however, because epollreactor
doesn't handle it internally, it's internal data structures are left in
an inconsistent state, causing lots of KeyErrors inside reactor function
for the remainder of the process' lifetime.

This custom implementation inherits the epollreactor, but tries its best
to keep internal data structures intact when "bad file descriptors" are
being unregistered from the internal epoll object.
ipdevpoll tests were failing, due to still using the old reactor.
This was old, but surfaced as a linter error in the CI only now because
the jobs.py file was changed.
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@hmpf
Copy link
Contributor

hmpf commented Feb 22, 2022

Eh, there are 18 commits. What do you consider to be large? :)

@@ -80,7 +80,7 @@ COPY requirements.txt /
COPY tests/requirements.txt /test-requirements.txt
# Since we used pip3 to install pip globally, pip should now be for Python 3
RUN pip-compile --output-file /requirements.txt.lock /requirements.txt /test-requirements.txt
RUN pip-sync /requirements.txt.lock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, pip-sync ensures that the versions in the reqs are locked in, but that is hardly needed in a Dockerfile since a new image needs to be built anyway if there are changes to dependencies.

@lunkwill42 lunkwill42 merged commit 3d1eb2d into master Feb 22, 2022
@lunkwill42 lunkwill42 deleted the python3.9 branch February 22, 2022 08:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants