Replace usage of old unicode API removed in Py3.12/PEP 623 #1860
The first Py3.12 alpha should appear in October on GitHub for testing the deprecation replacements fully: https://peps.python.org/pep-0693/
Thanks for this! I still need to get my head around the new macros; they add a lot of complexity I'm still untangling. I do like that they can provide the PyObject** for PyArg_ParseTuple.
I'm inclined to think that TmpWCHAR is the wrong vehicle for this, though. No current use of TmpWCHAR ever calls the constructor that takes a WCHAR; almost every single case is used in a PyArg_ParseTuple("O") / PyWinObject_AsWCHAR dance. So really, almost every existing use of TmpWCHAR should eventually move to this new mechanism.
I.e., I think we should split this functionality. TmpWCHAR (eventually with a new name) should just "take ownership of memory allocated by PyMem_New()", probably moving to a void * once PyWinObject_AsWCHAR() has been killed in favor of the new mechanism. IOW, TmpWCHAR shouldn't be touched here other than a note saying it's deprecated/pending a rename/something.
A new object, say, PyWin_WCHAR or similar, would look like your changes here but, unlike TmpWCHAR, never take ownership of memory allocated elsewhere. It would only support construction/initialization via a PyObject *. The WCHAR * it holds would always be allocated internally (and I suspect there's a future optimization here: if we can determine that the object's PyUnicode_KIND is PyUnicode_4BYTE_KIND, we could still borrow the buffer?)
I think that would simplify things significantly. As mentioned though, I'm still getting my head around the new macros, even after staring at this for a while, so (a) I need to think more about them but (b) I really hope there's something that can be done to make that part of this easier to understand.
WDYT?
I'm fine if you don't have the time or inclination to take this on though, in which case I'll have a bit of a poke in a week or 2 - but I really do appreciate your work here!
PyUnicode_AS_UNICODE removed in Py3.12 (PEP 623) / e6f7299
TmpWCHAR does most of the conversions and the now-required memory handling. Replace PyUnicode_GetSize.

> The "legacy" Unicode object (buffered wstr in unicode objects)
> will be removed in Python 3.12 with deprecated APIs. All Unicode
> objects will be "canonical" since then. See PEP 623 for more
> information.

Those deprecated APIs were still used in pywin32:

* PyUnicode_AsUnicode
* PyUnicode_GetSize
* PyUnicode_AS_UNICODE
* PyUnicode_GET_SIZE, PyUnicode_GET_DATA_SIZE
* PyUnicode_FromUnicode
* PyUnicode_EncodeMBCS
* u u# Z Z# in PyArg_Parse... format strings
and replace PyUnicode_GET_SIZE, PyUnicode_GET_DATA_SIZE.
This strange macro mechanism (U2WREC, U2WCONV, u2w ..) and auxiliary
(well, it no longer saves typing and testing, and it may not be worth touching all this and reorganizing it into something readable...?)
Apart from the dropped mechanism above, TmpWCHAR would so far only gain the ability to call PyUnicode_AsWideCharString automatically at assignment/construction time (2nd commit). That could also become a separate class/name/subclass. But so far there is no real separate purpose (both just free the held temporary string).
To potentially save a PyUnicode_AsWideCharString call in the (rare?) PyUnicode_2BYTE_KIND case, it seems the canonical state must first be guaranteed (PyUnicode_READY(): extra cost? Otherwise the string representation could change suddenly) and then checked (again). There is also a (non-canonical?) PyUnicode_WCHAR_KIND. Is a Py_UCS2* / PyUnicode_2BYTE_KIND buffer always a valid null-terminated Windows WCHAR string? If this works, is fast, and is worth it, there would be an extra
Sorry for the delay and thanks for persevering!
Those old APIs were still used in pywin32: