Freeze Python 2.7 and modernize codebase #945

mkleehammer · 2021-08-22T19:59:30Z

I'd like to look into splitting the codebase into Python 2 and Python 3 and modernizing the 3 version. There used to be a huge number of Python 2 downloads every day, but Python 3 is now ~95% of the pypi download traffic.

https://pypistats.org/packages/pyodbc

Does anyone know the best way to do this? My first inclination is to make a new v5 that is only the modern version and maintain the 4.x branch for security fixes. Any other suggestions?

Some things I'm thinking about for a v5:

Update C API and consider using the stable ABI. If I understand correctly, it would allow new versions of Python to use the same binaries.
Would it be helpful to have a Python layer that loads the C implementation as _pyodbc? Does this give the project more flexibility to introduce items in whichever layer is easiest?
More thoroghly integrate the parameter binding of per-value and fast_executemany so there is only 1 function for encoding each.
Provide a connection factory to replace the hidden CnxnInfo. It would be eaiser to configure pre-connect items on the factory and then ask for the connect. (We might need an optional password on the connect for those that don't want it stored globally.)
Switch to pytest

hugovk · 2021-09-14T08:19:05Z

Many projects make/tag a last release for Python 2.7, then bump the major version and move on.

2.7 has been EOL for 20 months now so people using it aren't getting security releases, and most of the top-downloaded PyPI packages have already dropped it (pip, setuptools, certifi, numpy, cryptography, pytest...).

Make sure you have python_requires>=3.6 in the new version, then pip won't install the new version for folk on Python 2.7.

keitherskine · 2022-03-06T22:53:01Z

It's been a while since @mkleehammer raised this issue but I think everything that @mkleehammer mentioned in his original post are great ideas. Dropping Python 2.7 support makes perfect sense and will make the codebase much easier to maintain. As I understand it, we more-or-less have to start using the stable API for Unicode strings soon because Python 3.12 won't work with pyodbc in its current form (there are already a bunch of deprecation warnings during builds). I also like the idea of putting the C code into a _pyodbc module and creating a wrapper Python pyodbc module that calls it. This would make the pyodbc API easier to understand, and make code changes easier too.

I've been looking at the options for making these changes. Although we could potentially update the code to use the stable ABI first, i.e. before dropping 2.7, it seems to me that Python 2.7 just makes everything more complex. Dropping 2.7 will require a lot more work than just deleting some old code, but I still think it's worth doing that first. Happy to hear thoughts on that though.

So here are my thoughts on dropping Python 2.7 from pyodbc version 5.x:

put an update in the README.md file stating this is the intention, preferably with a date
create a new 4.x release from master, there have been a bunch of changes since the last release in Aug 2021 so we should put out a new release before making any serious code changes
once the new release is out there and considered stable, create a new branch from master called something like "py27" which will be used long-term for any Python 2.7 security/bug fixes
create a feature branch from master for the Python 3 work and start removing 2.7 code from it
when that work is complete (!), release it as pyodbc version 5.0.0

After Python 2.7 has been dropped we can then move on to use the stable ABI and do the other changes. All thoughts welcome.

@mkleehammer @gordthompson

@hugovk

* Add support for Python 3.10, drop EOL 3.5 (mkleehammer#952) * Remove duplicate entry in pyi stub (mkleehammer#979) * Replace deprecated SafeConfigParser with ConfigParser (mkleehammer#953) * Designate connection string as optional (mkleehammer#987) * Fix spelling typos (mkleehammer#985) Co-authored-by: Gord Thompson <[email protected]> * Fix for DSN Names with non-ASCII chars (mkleehammer#951) * Fix for DSN Names with non-ASCII chars Fixes: mkleehammer#948 Co-authored-by: bamboo <[email protected]> Co-authored-by: Gord Thompson <[email protected]> * Added InterfaceError to pyodbc.pyi. (mkleehammer#1013) Co-authored-by: Benjamin Holder <[email protected]> * Upgrade deprecated unicode encoding calls (mkleehammer#792) * Do not include .pyc artifacts in source tarball mkleehammer#742 * Build wheels with cibuildwheels on GitHub Actions Fixes mkleehammer#175 Ref mkleehammer#688 Closes mkleehammer#668 Closes mkleehammer#685 Fixes mkleehammer#441 and pretty much most issues that mention ` sql.h: No such file or directory` This also need to setup some PyPI keys for automated uploads. * Install unixodbc-dev for Linux wheels * Enable GitHub Actions for pull requests * Use Debian based `manylinux_2_24` image * `apt-get` update before installing in wheel build * Use PEP 440 version name required for wheels * Skip building 32-bit wheels * 4.0.dev0 for default version, because test_version() wants 3 parts here Checked this won't shadow released minor version (credit goes to @hugovk) >>> from packaging.version import Version >>> Version("4.0.dev0") > Version("4.0.24") False * Had to use Debian image for PyPy too * Disable PyPy wheels https://cibuildwheel.readthedocs.io/en/stable/options/#build-selection PyPy is missing some C functions that `pyodbc` needs. * Update README.md * Avoid error when testing with DSN= connection Fixes: mkleehammer#1000 * Disable setencoding/setdecoding in tests3/pgtests.py Fixes: mkleehammer#1004 * Adjust test_columns() in tests3/pgtests.py for newer driver versions Fixes: mkleehammer#1003 * Move driver version check out of function * Add comment to _get_column_size() * Fix memory leak with decimal parameters Fixes: mkleehammer#1026 * Create codeql-analysis.yml * Bugfix/sql param data memory leak (mkleehammer#703) * Updated .gitignore * * Created a test file for the specific scenario * * Updated doc of test file for the specific SQLParamData scenario * * Fixed the test file for the specific SQLParamData scenario by Py_XDECREF the PyObject with 1 reference. * * Improved the test to close the cursor and set it to None, then forcing the gc * * Changed the fix of the memory leak and updated the test. * * Removed redundant empty line * * Converted tabs to spaces * * Moved variable out of conn's scope * Update gitignore, remove duplicated * Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (mkleehammer#998) Current versions of Python write a deprecation warning message to stderr, which breaks CGI scripts running under web servers which fold stderr into stdout. Likely breaks other software. This change replaces the deprecated calls with PyUnicode_New(size, maxchar). The accompanying code to populate the new objects has also been rewritten to use the new PyUnicode APIs. * Making pyodbc compatible with PostgreSQL infinity dates, returning MINYEAR and MAXYEAR to python, instead of values out of python's limits * Removing autoformat from code * Removing autoformat from code * Add odbc_config support on mac and m1 homebrew dir * Note EOL of 2.7 support in README (mkleehammer#945) * Fix version of CI generated wheels The CI system is checking out exact tags like "git checkout 4.0.33", which results in a detached HEAD. The version calculation was adding the commit hash. * Fix for mkleehammer#1082 libraries in Linux wheels (mkleehammer#1084) * use argparse instead of optparse (mkleehammer#1089) Co-authored-by: Hugo van Kemenade <[email protected]> Co-authored-by: Alex Nelson <[email protected]> Co-authored-by: Kian Meng, Ang <[email protected]> Co-authored-by: Gord Thompson <[email protected]> Co-authored-by: bamboo <[email protected]> Co-authored-by: Gord Thompson <[email protected]> Co-authored-by: bdholder <[email protected]> Co-authored-by: Benjamin Holder <[email protected]> Co-authored-by: Inada Naoki <[email protected]> Co-authored-by: Michael Fladischer <[email protected]> Co-authored-by: Anatoli Babenia <[email protected]> Co-authored-by: Francisco Morales <[email protected]> Co-authored-by: Gord Thompson <[email protected]> Co-authored-by: Michael Kleehammer <[email protected]> Co-authored-by: Gilad Leifman <[email protected]> Co-authored-by: Bob Kline <[email protected]> Co-authored-by: Leandro Scott <[email protected]> Co-authored-by: Jordan Mendelson <[email protected]> Co-authored-by: Keith Erskine <[email protected]>

mkleehammer · 2022-11-16T04:11:20Z

My original plans for v5 were more ambitous than the time I've been able to put in, so I think Keith's suggested path is best. I'll save major architecture changes for a later version.

Drop Python 2 support
Continue Python 2 (and maybe 3) v4 security fixes?
Convert tests to pytest
Switch to stable ABI

The switch to the ABI would ensure pyodbc libraries are ready for new Python versions before those versions are even officially released. There is a potential performance hit, so I'd want to do that work on a separate "abi" branch and create at least one good performance test I could use with PostgreSQL and MyODBC locally.

I'm going to delete the current v5 branch and create a new one.

Thoughts?

keitherskine · 2022-11-17T01:15:46Z

Here's a few of my thoughts about things we could do to improve pyodbc going forward. Much of this does not involve changing the core C++ code, but there's still plenty of things we can do to modernize the project.

Drop Python 2.7
- Seems kinda fundamental for us to move forward. Not doing so makes everything more complicated.
- The "master" branch will continue to carry the evolving latest-and-greatest Python 3 code.
- Optionally, create a new branch "py27" to manage the legacy Python 2.7 code.
- Strip out Python 2.7 code from the entire codebase - C++ extensions, tests, CICD, etc. (probably best to do this in pieces over time).
Build and deploy
- Figure out a versioning strategy for both deployment and development work. In version 5, I'm hoping we will aim for incremental minor releases (5.0.0, 5.1.0. 5.2.0, etc.) rather than incremental patch releases (5.0.1, 5.0.2, 5.0.3) as in version 4. We may also want to consider "release candidate" labels between deployments (e.g. 5.3rc). Finally, I'm not sure I totally understand the package versions generated by setup.py based on commits, e.g. "4.0.35b14". Are they truly needed?
- Drop the use of distutils in setup.py, which will be removed in 3.12 (use setuptools instead).
- PEP517 - update pyproject.toml and setup.py as necessary. We can probably move much of the static info into pyproject.toml from setup.py.
Unit Tests
- Organize the tests into folders for each database, to make test discovery simpler.
- Expand the use of tox for running unit tests. tox creates virtual environments (in a top-level ".tox" folder) which keeps the testing isolated and running them easy.
- Use pytest to run the unit tests. This will help discovery further by including all test files simply by their filenames, instead of lumping all our tests into one script (for each database) as we do currently.
- Reconsider the mechanism of adding .pyd files to PYTHONPATH? It would be good to just use tox if possible. Besides, with PEP517, generating .pyd files might not be so easy in the future. Calling setup.py directly is discouraged now and much of setup.py might be moved into pyproject.toml anyway.
- Tidy up the unit tests. They have evolved incrementally over many years and perhaps there's case for giving them a spring clean.
Codebase
- Upgrade the C++ code to not use the legacy Unicode APIs (PyUnicode_AsUnicode, etc.) which have been removed from Python 3.12.
- Create a Python-based top-level module that imports the C extensions? This would make the pyodbc API less opaque.

I'm no C++ developer, so I'll leave the codebase to others, but I'm happy to tackle many of the non-C++ ideas here.

keitherskine · 2022-11-20T14:24:12Z

Not sure where this comment should go, so I'll just add it here. I stumbled across the PyPi downloads database recently. Kinda interesting to see what systems are downloading which pyodbc files from PyPi. Here's some sample SQL you might want to take a look at:

SELECT file.project, file.version, file.filename, details.distro.name AS distro_name, details.system.name AS system_name, details.system.release AS system_release, count(*) as cnt
FROM `bigquery-public-data.pypi.file_downloads`
WHERE DATE(timestamp) = "2022-11-18"
  AND file.project = 'pyodbc'
  AND file.version = '4.0.35'
GROUP BY file.project, file.version, file.filename, details.distro.name, details.system.name, details.system.release
ORDER BY count(*) desc, file.project DESC, file.version DESC, file.filename, details.distro.name, details.system.name, details.system.release;

Looks like Azure is our biggest customer by far, on Ubuntu. Also, Linux downloads dominate, followed by Windows then Macs, not surprisingly, It's interesting to see which of the many wheels we publish are actually used.

mkleehammer · 2022-11-21T17:16:03Z

That is interesting. I'm surprised at how many source downloads there are for the same OS version that has wheels.

I was looking at something similar here: https://pypistats.org/packages/pyodbc

In particular I was surprised that there are almost 6K Python 2.7 downloads per day stil.

mkleehammer · 2022-11-21T17:19:01Z

I discarded my old v5 work and restarted a couple of days ago to slim it down. I've pushed it on branch py3. I think it is probably best to not use a MR until it gets to a baseline. In particular, I want to get rid of the Unicode APIs that are going to be removed in 3.12.

keitherskine · 2022-11-25T17:00:36Z

Just a thought, but rather than dropping Python2.7 largely all in one go, we could break up this transition into incremental parcels of work, each step of which would maintain a working pyodbc codebase (whether deployed or not).

The first step could be to make a minimalistic PR that just shows pyodbc no longer supports Python 2.7 (or 3.6). That PR would just:

in setup.py, change python_requires to ">=3.7" and VERSION to "5.0.0"
remove Python 2.7 and Python 3.6 from .github\workflows\ubuntu_build.yml
remove Python 2.7 and Python 3.6 from appveyor.yml

For example, PR #1137 . That's all that would be needed to signify a break from Python 2, although if we wanted to do more to setup.py as described below, that would be fine too.

After that, there are various reasonably self-contained parcels of work, which could be done in roughly this order:

Modernize setup.py by removing references to distutils (use setuptools instead) and remove Python2 references. Also, potentially: remove the git-based versioning(?), remove the cmdclass references(?). Lastly, we could explore moving the static info like "name" and "description" into pyproject.toml, but that's not at all necessary right now. Something for the future perhaps.
Update the CI pipelines to use pytest when calling the unit test scripts. This work has been proposed in PRs 1129 use pytest in run_tests() #1129 and 1130 database-specific unit test folders, updated test scripts #1130. It includes moving database-specific tests into their own subdirectories within /tests3, e.g. /tests3/mysql/.
Drop Python2.7 from the AppVeyor pipeline code. This probably involves just deleting appveyor\compile.cmd for the most part, although there are other Python2.7 references to be removed as well (included in database-specific unit test folders, updated test scripts #1130 ).
Remove Python 2.7 from the C++ codebase. This involves stripping out all the PY_MAJOR_VERSION macros and discarding other Python2-specific constructions. A lot of work.
Updating the C++ codebase to not use c-extension functions that are going to be removed in Python 3.12, as described in issue917 Deprecated PyUnicode functions that must be updated #917.
Updating the C++ codebase to use only the stable ABI.
Delete the /tests2 directory and rename the /tests3 directory to the more standard "/tests". There are various references to "tests3" in the repo that will need to be updated accordingly too.
Update the /tests3 (or /tests) unit test scripts to use pytest-style tests rather than unittest-style tests. This might be as simple as just removing the unittest.TestCase class wrappers but I suspect this might be more complicated than that. Also, the test cases themselves could probably do with a thorough review.

I believe each of the above steps can be done separately, rather than taking a "big-bang" approach. After this Python3 transition, there is still more work to be done of course, like perhaps having a Python pyodbc front-end and a "_pyodbc" C++ backend, etc. But this transition to Python3-only would be a major step foward.

mkleehammer · 2023-05-09T06:29:17Z

I just built the py3 branch with a debug version of Python 3.12 and the PostgreSQL tests pass.

I did comment out the fast executemany code temporarily until I can port it also. I need to study it first and I'd really like to see if the two code paths could share the core binding code.

I also wonder if the feature should be renamed to "row wise binding" or "array binding" or something less generic than fast-executemany, mainly to give users an idea of what ODBC feature it is. Otherwise, people probably wonder why we don't just use "fast" and how they can determine if their DB supports it.

ndmlny-qs · 2023-08-11T22:03:12Z

@mkleehammer & @keitherskine I'd like to start tackling some of the items discussed above as I have time to devote to helping the community that uses pyodbc. The first thing I would like to help with is transitioning tests to use pytest. I can also help with documentation, example usage, and possibly even adding a github pages about the project that builds API documentation.

Let me know what y'all think would be best for me to start with. You can also tag me on issues you want done.

mkleehammer · 2023-08-26T21:21:09Z

I've merged 5.0 into the master branch and released 5.0.0 alpha 2. Please see the discussion here

mkleehammer mentioned this issue Aug 22, 2021

Deprecated PyUnicode functions that must be updated #917

Closed

bkline mentioned this issue Mar 27, 2022

Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002

Merged

mkleehammer added a commit that referenced this issue Jul 14, 2022

Note EOL of 2.7 support in README (#945)

f28d235

mkleehammer added this to 5.0 Release Apr 11, 2023

mkleehammer added this to the 5.0 Python 3 Only milestone Apr 11, 2023

mkleehammer closed this as completed Aug 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Freeze Python 2.7 and modernize codebase #945

Freeze Python 2.7 and modernize codebase #945

mkleehammer commented Aug 22, 2021 •

edited

Loading

hugovk commented Sep 14, 2021

keitherskine commented Mar 6, 2022

mkleehammer commented Nov 16, 2022

keitherskine commented Nov 17, 2022 •

edited

Loading

keitherskine commented Nov 20, 2022

mkleehammer commented Nov 21, 2022

mkleehammer commented Nov 21, 2022

keitherskine commented Nov 25, 2022 •

edited

Loading

mkleehammer commented May 9, 2023

ndmlny-qs commented Aug 11, 2023

mkleehammer commented Aug 26, 2023

Freeze Python 2.7 and modernize codebase #945

Freeze Python 2.7 and modernize codebase #945

Comments

mkleehammer commented Aug 22, 2021 • edited Loading

hugovk commented Sep 14, 2021

keitherskine commented Mar 6, 2022

mkleehammer commented Nov 16, 2022

keitherskine commented Nov 17, 2022 • edited Loading

keitherskine commented Nov 20, 2022

mkleehammer commented Nov 21, 2022

mkleehammer commented Nov 21, 2022

keitherskine commented Nov 25, 2022 • edited Loading

mkleehammer commented May 9, 2023

ndmlny-qs commented Aug 11, 2023

mkleehammer commented Aug 26, 2023

mkleehammer commented Aug 22, 2021 •

edited

Loading

keitherskine commented Nov 17, 2022 •

edited

Loading

keitherskine commented Nov 25, 2022 •

edited

Loading