Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace deprecated PyUnicode_FromUnicode(NULL, size) calls (#998) #1002

Merged
merged 1 commit into from
Jul 14, 2022

Conversation

bkline
Copy link
Contributor

@bkline bkline commented Jan 2, 2022

Current versions of Python write a deprecation warning message to stderr, which breaks CGI scripts running under web servers which fold stderr into stdout, and likely breaks other software. This change replaces the deprecated calls with PyUnicode_New(size, maxchar). The accompanying code to populate the new objects has also been rewritten to use the new PyUnicode APIs.

I have included a unit test, but I assume others more familiar with the project will want to decide how to integrate it with the existing tests. I didn't fold the test in with any of the other sets, because the only sets in tests3 which were not DBMS-specific were for tests of the API, and this isn't a fix of API behavior (or rather, it's a fix of internals related to working with the Python APIs). The test fails without the fix and passes with it.

This is my second shot at a PR for this issue. My first attempt took the instructions in the deprecation warning and in the documentation for PyUnicode_New() at face value, but that wasn't sufficient, and the pipeline tests didn't all pass. So I dug deeper into the documentation for all of the PyUnicode APIs and PEP 393 and did some more extensive rewriting of the two parts of pyodbc which were calling PyUnicode_FromUnicode(NULL, size). Assuming this second attempt passes the pipeline tests, I would very much like some very close scrutiny of this patch. I am very confident that my experience with the Python C APIs is much less extensive than the seasoned maintainers for the project, and the documentation is thinner than I would have liked, and downright buggy in places (for example, the documentation claims that PyUnicode_CopyCharacters() returns 0 on success, but I figured out from digging through the Python API's own C code that the function actually returns the number of characters copied; note also the mixup in the argument names—the second to should be from).

int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *to, Py_ssize_t from_start, Py_ssize_t how_many)

Copy characters from one Unicode object into another. This function performs character conversion when necessary and falls back to memcpy() if possible. Returns -1 and sets an exception on error, otherwise returns 0.

Fixes #998

…er#998)

Current versions of Python write a deprecation warning message to
stderr, which breaks CGI scripts running under web servers which
fold stderr into stdout. Likely breaks other software. This change
replaces the deprecated calls with PyUnicode_New(size, maxchar).
The accompanying code to populate the new objects has also been
rewritten to use the new PyUnicode APIs.
@gordthompson
Copy link
Collaborator

With increasing adoption of Python 3.10 this will affect more and more people.

@mkleehammer , @v-chojas - Care to have a look?

@mkleehammer
Copy link
Owner

Wow, lots of great work. Thanks. I'll spend some time looking through this carefully.

@mkleehammer
Copy link
Owner

I wonder - as nice as this is, would it be better to hurry the v5? A big part of the complexity is dealing with multiple encodings due to 2.7. Python itself can deal with merging the different encodings into a single string trivially when using ''.join(a). We could rely on Python more if we had a single version and possibly if we had the core in Python instead of C.

Thoughts?

@bkline
Copy link
Contributor Author

bkline commented Mar 27, 2022

I wonder - as nice as this is, would it be better to hurry the v5?

Well, sure. I only opened #998 (and the PRs) because I had never gotten a response to the question I asked last year in #917.

Should I open a separate ticket to get this fixed on a faster track than #945?

Not complaining. We're all busy. 😉

@eabase
Copy link

eabase commented May 5, 2022

Any reason why this PR is not yet merged?

Is it possible this is also breaking the output of Pandas output when read into a DF from fetching an SQL result?

I'm referring to these SO questions:

@alanblyth
Copy link

This is a blocker for me, would be great to see this merged.

@mkleehammer mkleehammer merged commit 1f05bb6 into mkleehammer:master Jul 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecation warning breaks CGI scripts
5 participants