Replace usage of old unicode API removed in Py3.12/PEP 623 #1860

kxrob · 2022-04-14T13:50:43Z

The "legacy" Unicode object will be removed in Python 3.12
with deprecated APIs. All Unicode objects will be "canonical"
since then. See PEP 623 for more information.

Those old APIs were still used in pywin32:

PyUnicode_AsUnicode
PyUnicode_GetSize
PyUnicode_AS_UNICODE
PyUnicode_GET_SIZE, PyUnicode_GET_DATA_SIZE
PyUnicode_FromUnicode
PyUnicode_EncodeMBCS
u u# Z Z# in PyArg_Parse... format strings

kxrob · 2022-08-14T15:21:00Z

The first Py3.12 alpha should appear in October on github for testing the deprecation replacements fully: https://peps.python.org/pep-0693/

mhammond

Thanks for this! I still need to get my head around the new macros - it adds alot of complexity I'm still untangling. I do like the ability it has to provide the PyObject** for PyArg_ParseTuple.

I'm inclined to think that TmpWCHAR is the wrong vehicle for this though. Every current use of TmpWCHAR never calls the constructor that takes a WCHAR - almost every single case is used in a PyArg_ParseTuple("O")/ PyWinObject_AsWCHAR dance. So really, almost every existing use of TmpWCHAR should eventually move to this new mechanism.

ie, I think we should split this functionality - TmpWCHAR (with a new name eventually) should be just "takes ownership of memory allocated by PyMem_New()", probably moving to a void * once PyWinObject_AsWCHAR() has been killed in favor of the new mechanism. IOW, TmpWCHAR shouldn't be touched here other than a note to say it's deprecate/pending a rename/something.

A new object, say, PyWin_WCHAR or similar, which looks like you changes here, but unlike TmpWCHAR, never takes ownership of memory allocated elsewhere. It only supports construction/initialization via a PyObject *. The WCHAR * it holds is always allocated internally (and I suspect there's a future optimization here, where if we can determine the object's PyUnicode_KIND is PyUnicode_4BYTE_KIND we could still borrow the buffer?)

I think that would simplify things significantly. As mentioned though, I'm still getting my head around the new macros, even after staring at this for a while, so (a) I need to think more about them but (b) I really hope there's something that can be done to make that part of this easier to understand.

WDYT?

I'm fine if you don't have the time or inclination to take this on though, in which case I'll have a bit of a poke in a week or 2 - but I really do appreciate your work here!

PyUnicode_AS_UNICODE removed in Py3.12 (PEP 623) / e6f7299

TmpWCHAR does most of the conversions and now required memory handling. Replace PyUnicode_GetSize. > The "legacy" Unicode object (buffered wstr in unicode objects) > will be removed in Python 3.12 with deprecated APIs. All Unicode > objects will be "canonical" since then. See PEP 623 for more > information. Those deprecated API were still used in pywin32: * PyUnicode_AsUnicode * PyUnicode_GetSize * PyUnicode_AS_UNICODE * PyUnicode_GET_SIZE, PyUnicode_GET_DATA_SIZE * PyUnicode_FromUnicode * PyUnicode_EncodeMBCS * u u# Z Z# in PyArg_Parse... format strings

and replace PyUnicode_GET_SIZE, PyUnicode_GET_DATA_SIZE.

kxrob · 2022-09-05T09:15:15Z

I still need to get my head around the new macros - it adds alot of complexity I'm still untangling. I do like the ability it has to provide the PyObject** for PyArg_ParseTuple. ... hope there's something that can be done to make that part of this easier to understand.

This strange macro mechanism (U2WREC, U2WCONV, u2w ..) and auxiliary union attribute stuff (handling source/target addresses for conversion) probably just should be dropped:
It came only with the last 2 commits (replace "u" and "Z" in PyArg_Parse... ). The only purpose was to not repetitively make extra PyObject* intermediate variables and extra statements doing PyWinObject_AsWCHAR in the PyArg_ParseTuple... use cases (like the cases where TmpWCHAR was used previously ~100x) - but doing it within TmpWCHAR itself and the PyArg_ParseTuple statement and to automatically provide specific help text upon error.
But there are only few new use cases for that. So that strange mechanism does not pay off and its not worth to compress those existing 100 cases. So its probably best to just remove those 2 commits, keep it easy to read, and write out those few new PyArg_ParseTuple use cases using the existing style. Regarding the single big bulk use case in the last commit (bulk "Z") a specific local macro and aux. array(s) can be used for conversions in a loop.
So I'm simply going to drop those 2 commits so far - handling the u and Z arg parse cases in an extra PR later ...

almost every existing use of TmpWCHAR should eventually move to this new mechanism

(well, it doesn't save typing and testing anymore, may not be worth touching and organizing all this in a well readable way... ?)

A new object, say, PyWin_WCHAR or similar, which looks like you changes here, but unlike TmpWCHAR, never takes ownership of memory allocated elsewhere. It only supports construction/initialization via a PyObject *.

Besides the above dropped mechanism TmpWCHAR so far would only gain the function to do auto PyUnicode_AsWideCharString at assignment / construction time (2nd commit). That could become an extra class / name / sub class as well. But so far there is not really a separate purpose (freeing the held temp string).

if we can determine the object's PyUnicode_KIND is PyUnicode_4BYTE_KIND we could still borrow the buffer?

For potentially saving a PyUnicode_AsWideCharString in the (rare?) case of PyUnicode_2BYTE_KIND, it seems the canonical state must be guaranteed first (PyUnicode_READY(), extra cost?, otherwise the string representation could change suddenly), then checked (again). There is also a (non-canonical?) PyUnicode_WCHAR_KIND. Is Py_UCS2* / PyUnicode_2BYTE_KIND always a valid NULL terminated Windows WCHAR string? If this works, is fast and is worth it, there would be an extra PyObject* (inc/decref) to the python string in the holder - being a flag at the same time. Could still happen in the same class.

mhammond · 2022-09-17T04:10:47Z

Sorry for the delay and thanks for persevering!

mhammond mentioned this pull request Aug 14, 2022

Use new string APIs and add memory safety for PyWinObject_AsPARAM #1927

Merged

kxrob force-pushed the _py312_uc branch from 9ddd6fb to 35ba1d4 Compare August 21, 2022 20:50

kxrob marked this pull request as ready for review August 21, 2022 21:12

kxrob force-pushed the _py312_uc branch from 35ba1d4 to cdaca4b Compare August 26, 2022 07:19

mhammond reviewed Sep 4, 2022

View reviewed changes

kxrob added 4 commits September 5, 2022 10:58

Py3.12: replace PyUnicode_EncodeMBCS with PyUnicode_AsMBCSString

10ab2ba

PyUnicode_AS_UNICODE removed in Py3.12 (PEP 623) / e6f7299

Py3.12: replace PyUnicode_AS_UNICODE

0c2011a

and replace PyUnicode_GET_SIZE, PyUnicode_GET_DATA_SIZE.

Py3.12: replace PyUnicode_FromUnicode

7359be4

kxrob force-pushed the _py312_uc branch from cdaca4b to 7359be4 Compare September 5, 2022 09:15

kxrob mentioned this pull request Sep 16, 2022

Replace deprecated "u" and "Z" formats in PyArg_Parse calls #1947

Merged

mhammond merged commit 5f8bf1c into mhammond:main Sep 17, 2022

This was referenced Mar 12, 2024

Pypy3 mattip/pywin32#1

Closed

Limited support of the PyPy #1289

Open

Avasam mentioned this pull request Jul 28, 2024

Remove obsolete is_platform_unicode marked for removal 16 years ago #2343

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace usage of old unicode API removed in Py3.12/PEP 623 #1860

Replace usage of old unicode API removed in Py3.12/PEP 623 #1860

kxrob commented Apr 14, 2022

kxrob commented Aug 14, 2022

mhammond left a comment

kxrob commented Sep 5, 2022

mhammond commented Sep 17, 2022

Replace usage of old unicode API removed in Py3.12/PEP 623 #1860

Replace usage of old unicode API removed in Py3.12/PEP 623 #1860

Conversation

kxrob commented Apr 14, 2022

kxrob commented Aug 14, 2022

mhammond left a comment

Choose a reason for hiding this comment

kxrob commented Sep 5, 2022

mhammond commented Sep 17, 2022