Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002

bkline · 2022-01-02T22:31:19Z

Current versions of Python write a deprecation warning message to stderr, which breaks CGI scripts running under web servers which fold stderr into stdout, and likely breaks other software. This change replaces the deprecated calls with PyUnicode_New(size, maxchar). The accompanying code to populate the new objects has also been rewritten to use the new PyUnicode APIs.

I have included a unit test, but I assume others more familiar with the project will want to decide how to integrate it with the existing tests. I didn't fold the test in with any of the other sets, because the only sets in tests3 which were not DBMS-specific were for tests of the API, and this isn't a fix of API behavior (or rather, it's a fix of internals related to working with the Python APIs). The test fails without the fix and passes with it.

This is my second shot at a PR for this issue. My first attempt took the instructions in the deprecation warning and in the documentation for PyUnicode_New() at face value, but that wasn't sufficient, and the pipeline tests didn't all pass. So I dug deeper into the documentation for all of the PyUnicode APIs and PEP 393 and did some more extensive rewriting of the two parts of pyodbc which were calling PyUnicode_FromUnicode(NULL, size). Assuming this second attempt passes the pipeline tests, I would very much like some very close scrutiny of this patch. I am very confident that my experience with the Python C APIs is much less extensive than the seasoned maintainers for the project, and the documentation is thinner than I would have liked, and downright buggy in places (for example, the documentation claims that PyUnicode_CopyCharacters() returns 0 on success, but I figured out from digging through the Python API's own C code that the function actually returns the number of characters copied; note also the mixup in the argument names—the second to should be from).

int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *to, Py_ssize_t from_start, Py_ssize_t how_many)

Copy characters from one Unicode object into another. This function performs character conversion when necessary and falls back to memcpy() if possible. Returns -1 and sets an exception on error, otherwise returns 0.

Fixes #998

…er#998) Current versions of Python write a deprecation warning message to stderr, which breaks CGI scripts running under web servers which fold stderr into stdout. Likely breaks other software. This change replaces the deprecated calls with PyUnicode_New(size, maxchar). The accompanying code to populate the new objects has also been rewritten to use the new PyUnicode APIs.

gordthompson · 2022-02-07T00:00:03Z

With increasing adoption of Python 3.10 this will affect more and more people.

@mkleehammer , @v-chojas - Care to have a look?

mkleehammer · 2022-03-27T19:27:35Z

Wow, lots of great work. Thanks. I'll spend some time looking through this carefully.

mkleehammer · 2022-03-27T19:38:56Z

I wonder - as nice as this is, would it be better to hurry the v5? A big part of the complexity is dealing with multiple encodings due to 2.7. Python itself can deal with merging the different encodings into a single string trivially when using ''.join(a). We could rely on Python more if we had a single version and possibly if we had the core in Python instead of C.

Thoughts?

bkline · 2022-03-27T20:29:48Z

I wonder - as nice as this is, would it be better to hurry the v5?

Well, sure. I only opened #998 (and the PRs) because I had never gotten a response to the question I asked last year in #917.

Should I open a separate ticket to get this fixed on a faster track than #945?

Not complaining. We're all busy. 😉

eabase · 2022-05-05T09:45:53Z

Any reason why this PR is not yet merged?

Is it possible this is also breaking the output of Pandas output when read into a DF from fetching an SQL result?

I'm referring to these SO questions:

alanblyth · 2022-06-07T10:12:19Z

This is a blocker for me, would be great to see this merged.

gordthompson mentioned this pull request Feb 10, 2022

Deprecation warning breaks CGI scripts #998

Closed

mkleehammer added the Investigating label Mar 27, 2022

mkleehammer merged commit 1f05bb6 into mkleehammer:master Jul 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002

Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002

bkline commented Jan 2, 2022

gordthompson commented Feb 7, 2022

mkleehammer commented Mar 27, 2022

mkleehammer commented Mar 27, 2022

bkline commented Mar 27, 2022

eabase commented May 5, 2022

alanblyth commented Jun 7, 2022

Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002

Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002

Conversation

bkline commented Jan 2, 2022

gordthompson commented Feb 7, 2022

mkleehammer commented Mar 27, 2022

mkleehammer commented Mar 27, 2022

bkline commented Mar 27, 2022

eabase commented May 5, 2022

alanblyth commented Jun 7, 2022