Freeze Python 2.7 and modernize codebase #945
Many projects make/tag a last release for Python 2.7, then bump the major version and move on. 2.7 has been EOL for 20 months now, so people using it aren't getting security releases, and most of the top-downloaded PyPI packages have already dropped it (pip, setuptools, certifi, numpy, cryptography, pytest...). Make sure you have
It's been a while since @mkleehammer raised this issue, but I think all the ideas @mkleehammer mentioned in his original post are great. Dropping Python 2.7 support makes perfect sense and will make the codebase much easier to maintain. As I understand it, we more or less have to start using the stable API for Unicode strings soon, because Python 3.12 won't work with pyodbc in its current form (there are already a bunch of deprecation warnings during builds). I also like the idea of putting the C code into a

I've been looking at the options for making these changes. Although we could potentially update the code to use the stable ABI first, i.e. before dropping 2.7, it seems to me that Python 2.7 just makes everything more complex. Dropping 2.7 will require a lot more work than just deleting some old code, but I still think it's worth doing that first. Happy to hear thoughts on that, though. So here are my thoughts on dropping Python 2.7 from pyodbc version 5.x:
After Python 2.7 has been dropped, we can move on to the stable ABI and the other changes. All thoughts welcome.
* Add support for Python 3.10, drop EOL 3.5 (mkleehammer#952)
* Remove duplicate entry in pyi stub (mkleehammer#979)
* Replace deprecated SafeConfigParser with ConfigParser (mkleehammer#953)
* Designate connection string as optional (mkleehammer#987)
* Fix spelling typos (mkleehammer#985)
* Fix for DSN names with non-ASCII chars (mkleehammer#951)
  Fixes: mkleehammer#948
* Added InterfaceError to pyodbc.pyi (mkleehammer#1013)
* Upgrade deprecated unicode encoding calls (mkleehammer#792)
* Do not include .pyc artifacts in source tarball (mkleehammer#742)
* Build wheels with cibuildwheel on GitHub Actions
  Fixes mkleehammer#175, ref mkleehammer#688, closes mkleehammer#668, closes mkleehammer#685, fixes mkleehammer#441 and pretty much most issues that mention `sql.h: No such file or directory`. This also needs some PyPI keys to be set up for automated uploads.
* Install unixodbc-dev for Linux wheels
* Enable GitHub Actions for pull requests
* Use Debian-based `manylinux_2_24` image
* `apt-get update` before installing in wheel build
* Use PEP 440 version name required for wheels
* Skip building 32-bit wheels
* 4.0.dev0 for default version, because test_version() wants 3 parts here. Checked this won't shadow a released minor version (credit goes to @hugovk):

      >>> from packaging.version import Version
      >>> Version("4.0.dev0") > Version("4.0.24")
      False

* Had to use Debian image for PyPy too
* Disable PyPy wheels (https://cibuildwheel.readthedocs.io/en/stable/options/#build-selection). PyPy is missing some C functions that `pyodbc` needs.
* Update README.md
* Avoid error when testing with DSN= connection (fixes mkleehammer#1000)
* Disable setencoding/setdecoding in tests3/pgtests.py (fixes mkleehammer#1004)
* Adjust test_columns() in tests3/pgtests.py for newer driver versions (fixes mkleehammer#1003)
* Move driver version check out of function
* Add comment to _get_column_size()
* Fix memory leak with decimal parameters (fixes mkleehammer#1026)
* Create codeql-analysis.yml
* Bugfix/sql param data memory leak (mkleehammer#703)
  * Updated .gitignore
  * Created a test file for the specific scenario
  * Updated doc of test file for the specific SQLParamData scenario
  * Fixed the test file for the specific SQLParamData scenario by calling Py_XDECREF on the PyObject with 1 reference
  * Improved the test to close the cursor and set it to None, then force the gc
  * Changed the fix of the memory leak and updated the test
  * Removed redundant empty line
  * Converted tabs to spaces
  * Moved variable out of conn's scope
* Update .gitignore, remove duplicates
* Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (mkleehammer#998)
  Current versions of Python write a deprecation warning message to stderr, which breaks CGI scripts running under web servers that fold stderr into stdout, and likely breaks other software too. This change replaces the deprecated calls with PyUnicode_New(size, maxchar). The accompanying code that populates the new objects has also been rewritten to use the new PyUnicode APIs.
* Make pyodbc compatible with PostgreSQL infinity dates, returning MINYEAR and MAXYEAR to Python instead of values outside Python's limits
* Remove autoformat from code
* Add odbc_config support on Mac and the M1 Homebrew directory
* Note EOL of 2.7 support in README (mkleehammer#945)
* Fix version of CI-generated wheels
  The CI system checks out exact tags like "git checkout 4.0.33", which results in a detached HEAD, and the version calculation was adding the commit hash.
* Fix for mkleehammer#1082 libraries in Linux wheels (mkleehammer#1084)
* Use argparse instead of optparse (mkleehammer#1089)

Co-authored-by: Hugo van Kemenade
Co-authored-by: Alex Nelson
Co-authored-by: Kian Meng, Ang
Co-authored-by: Gord Thompson
Co-authored-by: bamboo
Co-authored-by: bdholder
Co-authored-by: Benjamin Holder
Co-authored-by: Inada Naoki
Co-authored-by: Michael Fladischer
Co-authored-by: Anatoli Babenia
Co-authored-by: Francisco Morales
Co-authored-by: Michael Kleehammer
Co-authored-by: Gilad Leifman
Co-authored-by: Bob Kline
Co-authored-by: Leandro Scott
Co-authored-by: Jordan Mendelson
Co-authored-by: Keith Erskine
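The PostgreSQL infinity-date change in the list above can be illustrated at the Python level. This is a minimal sketch, not pyodbc's C implementation: it assumes the driver reports `infinity`/`-infinity` as a year outside Python's supported range (`datetime.MINYEAR` to `datetime.MAXYEAR`), the helper name `to_python_datetime` is hypothetical, and whether pyodbc substitutes the full `datetime.min`/`datetime.max` or only clamps the year is an assumption.

```python
from datetime import MAXYEAR, MINYEAR, datetime


def to_python_datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0):
    """Hypothetical helper: build a datetime, saturating out-of-range years.

    Assumption: 'infinity' / '-infinity' arrive as a year outside
    [MINYEAR, MAXYEAR]; instead of letting datetime() raise ValueError,
    return datetime.max / datetime.min.
    """
    if year > MAXYEAR:
        return datetime.max
    if year < MINYEAR:
        return datetime.min
    return datetime(year, month, day, hour, minute, second, microsecond)


print(to_python_datetime(2022, 11, 18))       # 2022-11-18 00:00:00
print(to_python_datetime(294277, 1, 1).year)  # 9999
```

Saturating to the whole `datetime.max`/`datetime.min` (rather than swapping only the year) sidesteps dates such as February 29 that would be invalid in year 9999.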
My original plans for v5 were more ambitious than the time I've been able to put in, so I think Keith's suggested path is best. I'll save major architecture changes for a later version.
Switching to the stable ABI would ensure pyodbc libraries are ready for new Python versions before those versions are even officially released. There is a potential performance hit, so I'd want to do that work on a separate "abi" branch and create at least one good performance test I could run locally against PostgreSQL and MyODBC. I'm going to delete the current v5 branch and create a new one. Thoughts?
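A performance test like the one described above could start from a small timing harness that accepts any DB-API 2.0 connection, so the same script can be pointed at a pyodbc connection to PostgreSQL or MySQL and run against both a regular build and a stable-ABI build. This is a minimal sketch: `benchmark_fetch` and its parameters are hypothetical, and the in-memory `sqlite3` database is only a stand-in so the example runs without an ODBC driver.

```python
import sqlite3
import time


def benchmark_fetch(connection, query, repeats=5):
    """Run `query` `repeats` times and return (best_seconds, row_count).

    Works with any DB-API 2.0 connection (e.g. pyodbc.connect(...) or sqlite3).
    Keeping the best of several runs reduces noise from caching and scheduling.
    """
    best = float("inf")
    row_count = 0
    for _ in range(repeats):
        cursor = connection.cursor()
        start = time.perf_counter()
        cursor.execute(query)
        rows = cursor.fetchall()
        elapsed = time.perf_counter() - start
        cursor.close()
        best = min(best, elapsed)
        row_count = len(rows)
    return best, row_count


if __name__ == "__main__":
    # Stand-in database; swap in pyodbc.connect("DSN=...") to compare builds.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"row{i}") for i in range(10_000)])
    seconds, n = benchmark_fetch(conn, "SELECT id, name FROM t")
    print(f"fetched {n} rows in {seconds:.4f}s (best of 5)")
```

Timing `execute` plus `fetchall` together exercises the row-conversion code, which is where a stable-ABI build would most likely pay for any extra indirection.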
Here are a few of my thoughts about things we could do to improve pyodbc going forward. Much of this does not involve changing the core C++ code, but there are still plenty of things we can do to modernize the project.
I'm no C++ developer, so I'll leave the codebase to others, but I'm happy to tackle many of the non-C++ ideas here.
Not sure where this comment should go, so I'll just add it here. I stumbled across the PyPI downloads database recently. It's kind of interesting to see which systems are downloading which pyodbc files from PyPI. Here's some sample SQL you might want to take a look at:

```sql
SELECT file.project, file.version, file.filename,
       details.distro.name AS distro_name,
       details.system.name AS system_name,
       details.system.release AS system_release,
       COUNT(*) AS cnt
FROM `bigquery-public-data.pypi.file_downloads`
WHERE DATE(timestamp) = "2022-11-18"
  AND file.project = 'pyodbc'
  AND file.version = '4.0.35'
GROUP BY file.project, file.version, file.filename, details.distro.name, details.system.name, details.system.release
ORDER BY COUNT(*) DESC, file.project DESC, file.version DESC, file.filename, details.distro.name, details.system.name, details.system.release;
```

Looks like Azure is our biggest customer by far, on Ubuntu. Also, not surprisingly, Linux downloads dominate, followed by Windows, then Macs. It's interesting to see which of the many wheels we publish are actually used.
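To see which of the published wheels are actually used, the `file.filename` column from the query can be split into wheels and everything else (e.g. the `.tar.gz` sdist), and wheel names broken into their tags. Wheel filenames follow the PEP 427 layout `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`, so the last three dash-separated fields are the tags. This is a minimal sketch, assuming the query results have already been fetched as `(filename, count)` pairs; `parse_wheel_tags` and `tally_downloads` are hypothetical helpers.

```python
from collections import Counter


def parse_wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) for a .whl filename, else None."""
    if not filename.endswith(".whl"):
        return None
    parts = filename[: -len(".whl")].split("-")
    # Per PEP 427 the last three dash-separated fields are the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def tally_downloads(rows):
    """Sum download counts per (kind, python_tag, platform_tag).

    `rows` is an iterable of (filename, count) pairs; anything that is not a
    .whl is grouped under kind "sdist" with no tags.
    """
    totals = Counter()
    for filename, count in rows:
        tags = parse_wheel_tags(filename)
        if tags is None:
            totals[("sdist", None, None)] += count
        else:
            python_tag, _abi_tag, platform_tag = tags
            totals[("wheel", python_tag, platform_tag)] += count
    return totals
```

Calling `tally_downloads(rows).most_common(10)` then lists the busiest wheel/platform combinations, and any wheel that never shows up is a candidate for dropping from the build matrix.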
That is interesting. I'm surprised at how many source downloads there are for OS versions that already have wheels. I was looking at something similar here: https://pypistats.org/packages/pyodbc

In particular, I was surprised that there are still almost 6K Python 2.7 downloads per day.
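Per-version download numbers like these can be reduced to a Python 2 vs Python 3 share in a few lines. This is a minimal sketch, assuming the daily counts have been copied into a dict keyed by minor-version strings such as `"2.7"` and `"3.10"`; the function name and input shape are assumptions, not the pypistats API. An unknown-version bucket (e.g. `"null"`) simply ends up as its own key.

```python
from collections import defaultdict


def share_by_major(counts_by_minor):
    """Return {major_version: fraction_of_downloads} from per-minor counts."""
    totals = defaultdict(int)
    for version, count in counts_by_minor.items():
        major = version.split(".")[0]  # "3.10" -> "3"
        totals[major] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {major: n / grand_total for major, n in totals.items()}
```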
I discarded my old v5 work and restarted a couple of days ago to slim it down. I've pushed it to the py3 branch. I think it's probably best not to open a PR until it reaches a baseline. In particular, I want to get rid of the Unicode APIs that are going to be removed in 3.12.
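Some background on the Unicode work: the replacement for legacy calls such as `PyUnicode_FromUnicode(NULL, size)` is `PyUnicode_New(size, maxchar)`, which under PEP 393 picks the string's storage width from its largest code point. Code that converts driver data therefore has to know, or scan for, the maximum character before allocating. The sketch below only models that rule in Python; `storage_kind` is a hypothetical name, and the real choice happens inside CPython's C code.

```python
def storage_kind(text):
    """Bytes per code point CPython uses for `text` under PEP 393.

    Mirrors how PyUnicode_New(size, maxchar) chooses a kind:
    maxchar < 0x100 -> 1 byte (UCS1), < 0x10000 -> 2 bytes (UCS2),
    otherwise 4 bytes (UCS4).
    """
    maxchar = max(map(ord, text), default=0)
    if maxchar < 0x100:
        return 1
    if maxchar < 0x10000:
        return 2
    return 4


# A SQLWCHAR buffer holds UTF-16 code units; surrogate pairs decode to
# code points above 0xFFFF, which forces the 4-byte kind.
wide = "naïve 😀".encode("utf-16-le")
print(storage_kind(wide.decode("utf-16-le")))  # 4
```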
Just a thought, but rather than dropping Python 2.7 all in one go, we could break this transition up into incremental parcels of work, each of which would leave a working pyodbc codebase (whether deployed or not). The first step could be a minimal PR that just signals pyodbc no longer supports Python 2.7 (or 3.6). That PR would just:
For example, PR #1137. That's all that would be needed to signify a break from Python 2, although if we wanted to do more to

After that, there are various reasonably self-contained parcels of work, which could be done in roughly this order:
I believe each of the above steps can be done separately, rather than taking a "big-bang" approach. After this Python 3 transition there is still more work to be done, of course, like perhaps having a Python pyodbc front-end and a "_pyodbc" C++ back-end. But this transition to Python 3 only would be a major step forward.
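The minimal "no more Python 2.7 (or 3.6)" step could be made explicit at import time as well as in packaging metadata. This is a sketch assuming a floor of Python 3.7 (implied by dropping 2.7 and 3.6); `MIN_PYTHON` and `check_supported` are hypothetical names. In practice the primary signal would be `python_requires=">=3.7"` in `setup.py`, so pip never selects the new major version on an older interpreter; a runtime check only helps people who bypass that.

```python
import sys

# Assumed minimum once 2.7 and 3.6 are dropped; hypothetical constant name.
MIN_PYTHON = (3, 7)


def check_supported(version_info=sys.version_info):
    """Raise ImportError with an actionable message on unsupported interpreters.

    Uses %-formatting rather than f-strings so this module still parses on
    Python 2.7, and the user sees this message instead of a SyntaxError.
    """
    current = tuple(version_info[:2])
    if current < MIN_PYTHON:
        raise ImportError(
            "this release requires Python %d.%d or later, found %d.%d; "
            "pin an older major version (e.g. 'pyodbc<5') instead"
            % (MIN_PYTHON + current)
        )


check_supported()  # no-op on a supported interpreter
```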
I just built the py3 branch with a debug version of Python 3.12, and the PostgreSQL tests pass. I did comment out the fast executemany code temporarily until I can port it too. I need to study it first, and I'd really like to see if the two code paths could share the core binding code. I also wonder if the feature should be renamed to "row-wise binding" or "array binding" or something less generic than fast-executemany, mainly to give users an idea of which ODBC feature it uses. Otherwise, people probably wonder why we don't just always use "fast", and how they can tell whether their DB supports it.
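For context on the renaming question: ODBC lets a driver accept a whole array of parameter sets in one `SQLExecute` call (parameter arrays, sized with the `SQL_ATTR_PARAMSET_SIZE` statement attribute and bound either column-wise or row-wise via `SQL_ATTR_PARAM_BIND_TYPE`), whereas a plain executemany loop executes once per parameter set. On the pyodbc side the feature is switched on with `cursor.fast_executemany = True` before `cursor.executemany(sql, rows)`. The Python sketch below only models the column-wise layout and the batching; `column_wise` and `parameter_batches` are hypothetical helpers, not pyodbc functions.

```python
from itertools import islice


def column_wise(rows):
    """One list per parameter, each len(rows) long (column-wise binding layout)."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("every parameter set must have the same number of parameters")
    return [list(column) for column in zip(*rows)]


def parameter_batches(rows, paramset_size):
    """Yield column-wise parameter arrays of at most `paramset_size` rows.

    Each yielded batch stands for one SQLExecute call with
    SQL_ATTR_PARAMSET_SIZE set to the batch length.
    """
    if paramset_size < 1:
        raise ValueError("paramset_size must be >= 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, paramset_size))
        if not chunk:
            return
        yield column_wise(chunk)
```

With three rows and a parameter-set size of 2, this yields two "executes" instead of three, which is the whole point of the feature: fewer round trips to the server.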
@mkleehammer & @keitherskine I'd like to start tackling some of the items discussed above, as I have time to devote to helping the community that uses pyodbc. The first thing I would like to help with is transitioning tests to use

Let me know what y'all think would be best for me to start with. You can also tag me on issues you want done.
I've merged 5.0 into the master branch and released 5.0.0 alpha 2. Please see the discussion here |
I'd like to look into splitting the codebase into Python 2 and Python 3 versions and modernizing the Python 3 version. There used to be a huge number of Python 2 downloads every day, but Python 3 is now ~95% of the PyPI download traffic.
https://pypistats.org/packages/pyodbc
Does anyone know the best way to do this? My first inclination is to make a new v5 that is only the modern version and maintain the 4.x branch for security fixes. Any other suggestions?
Some things I'm thinking about for a v5: