Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid Py_FileSystemDefaultEncoding in 3.12+ #86

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

rtobar
Copy link
Contributor

@rtobar rtobar commented Jun 9, 2024

This has been deprecated, and hence emits build warnings, which will eventually become errors.

Thus avoid it using the same mechanism that's already being used in similar situations.

First seen in https://github.com/ICRAR/ijson/actions/runs/8996107055/job/24712097518#step:7:81

hswong3i added a commit to alvistack/python-cffi-cffi that referenced this pull request Jun 13, 2024
    git clean -xdf
    tar zcvf ../python-cffi_1.16.0.orig.tar.gz --exclude=.git .
    debuild -uc -us
    cp python-cffi.spec ../python-cffi_1.16.0-1.spec
    cp ../python*-cffi*1.16.0*.{gz,xz,spec,dsc} /osc/home\:alvistack/python-cffi-cffi-1.16.0/
    rm -rf ../python*-cffi*1.16.0*.* ../python*-cffi-backend*1.16.0*.*

See python-cffi#86

Signed-off-by: Wong Hoi Sing Edison <[email protected]>
@rtobar
Copy link
Contributor Author

rtobar commented Jun 26, 2024

This is a gentle ping to see if this can be considered for merging

@arigo
Copy link
Contributor

arigo commented Jun 26, 2024

It fails to build, though, at least in the automated tests on CPython 3.13 on Linux...

@arigo
Copy link
Contributor

arigo commented Jun 26, 2024

PyConfig is a type name, not a global structure name. I couldn't find much information about how we're supposed to replace Py_FileSystemDefaultEncoding, but it seems to be more involved than this PR does.

@rtobar
Copy link
Contributor Author

rtobar commented Jun 26, 2024

@arigo ouch, sorry about that! From the very simple deprecation notes I assumed that PyConfig was a global object that one could inspect.

Like you, doing some digging left me a bit confused as to how to do this properly. One option seems to be to invoke sys.getfilesystemencoding(), which seems a tad too complex, but is what one would do from Python. The other looks like declaring a PyConfig config objet on the stack, call PyConfig_Read(&config), and then inspect the correct attribute. The documentation of the function isn't crystal clear to me though, and it would seem like the whole.mechanism is more expensive that what it should: the interpreter already has a PyConfig object initialised (but there's no way to access it from the C API), which is what sys uses internally, so we shouldn't need to create and initialise/read one again.

I'd be happy to do something in the direction of invoking sys, does that sound ok?

@arigo
Copy link
Contributor

arigo commented Jun 26, 2024

That sounds over-complicated. I'd rather go with a few extra lines that are at least explicit (untested code, assuming Python 3.x):

    if (!PyArg_ParseTuple(args, "U|i:load_library", &s, &flags))
        return NULL;
    *p_temp = PyUnicode_EncodeFSDefault(s);   // there's already logic to free this later
    if (*p_temp == NULL)
        return NULL;
    filename_or_null = PyBytes_AS_STRING(*p_temp);

Most of this is removing code for PY_VERSION_MINOR < 3, removing the
guards for PY_VERSION_MAJOR >= 3, and removing all unnecessary macros
that would now have a single definition (e.g., PyText_Check ->
PyUnicode_Check) in favour of using the direct Python C API for clarity.
Only in minor circumstances some small logic needed to be changed.

Signed-off-by: Rodrigo Tobar <[email protected]>
Similarly to the previous commit, the bulk of the work is removing code
for PY_VERSION_HEX < 0x03080000, removing #if guards around
PY_VERSION_HEX >= 0x03080000, and removing unnecessary macro
definitions.

Signed-off-by: Rodrigo Tobar <[email protected]>
The block was previously at the bottom of the function, making it a bit
more difficult to follow the overall logic of the function. Moving this
win32-specific error to the specific scope where it is triggered on
should make it less confusing.

Signed-off-by: Rodrigo Tobar <[email protected]>
This avoid the need to use a goto statement and a win32 ifdef, making
the logic a tad more difficult to follow.

Signed-off-by: Rodrigo Tobar <[email protected]>
It's always UTF8-encoded, so there's no need for "maybe".

Signed-off-by: Rodrigo Tobar <[email protected]>
@nitzmahone nitzmahone added this to the 1.18 milestone Sep 6, 2024
This is so it's a bit easier to follow, and to allow easier refactoring
of the higher-level function later on.

Signed-off-by: Rodrigo Tobar <[email protected]>
This allows us to have an quick return in the dlopen(None) case instead
of having to follow the main main b_dlopen function all the way to the
end.

Signed-off-by: Rodrigo Tobar <[email protected]>
In python 3.12 the Py_FileSystemDefaultEncoding global variable has been
deprecated. This was used in combination with a "et" PyArgs format
specifier to get a raw C string with the encoded bytes for the input
filename.

This commit replaces the usage of Py_FileSystemDefaultEncoding with a
manual call to PyUnicode_EncodeFSDefault instead (only within
b_do_dlopen_posix). This function requires a unicode object as an input,
and therefore we now need to accept the argument with the "U" format
specifier. While this is a departure from the "et" specifier, which
accepts not only strings, but also bytes and bytearrays, this is
inconsequential, since the higher-level cffi API already ensures we get
a string object here. Moreover, we were already calling PyUnicode_AsUTF8
on it, which would have failed if we received bytes. Thus this new
constrain is fine.

With this change there's no need anymore to keep around the raw C
pointer in the b_do_dlopen() function, and it can be moved down to
b_do_dlopen_posix(), where it doensn't need to be freed (as it's now
taken from a PyBytes object via PyBytes_AsString). We can also now
remove the unnecessary "s" variable since the filename_unicode variable
(previously used only in the win32 case, now always) holds the same
value.
With the clarity that the input filename has to be a str object at this
stage, we can now bring together the common logic that the win32 and
posix branches took.

Notice how the win32 implementation previously clearer the error if
PyArgs_ParseTuple failed, only to try again with the "et" format variant
(now "U" as per the previous commit). As noted in the previous comment
this was moot, both because the filename argument is guaranteed to be a
string at this point, and because it was used in PyUnicode_AsUTF8, which
would have failed if it wasn't given a str object.

Signed-off-by: Rodrigo Tobar <[email protected]>
This way the higher-level b_do_dlopen function now doesn't have to worry
about OS-specific logic, and can concentrate on dealing with the
different types of arguments instead.

Signed-off-by: Rodrigo Tobar <[email protected]>
Instead of having different flavours of PyArgs_ParseTuple we can instead
use a single one where all arguments are optional, then check the type
of the first argument (when present).

The main difference in this new implementation is that instead of
raising a TypeError if the argument is not a str object in the final
case, we simply assert it, making it clear that the higher-level API
that calls us ensures this is the case.

Signed-off-by: Rodrigo Tobar <[email protected]>
@rtobar rtobar force-pushed the deprecated-fsdefaultencoding branch from 944561d to 431d751 Compare September 6, 2024 07:16
@rtobar
Copy link
Contributor Author

rtobar commented Sep 6, 2024

Mmmm.m.. I reworked this quite a bit , but was waiting for the pre-3.8 C code removal before putting an update in this PR. I of course forgot that it automatically updates if I push to my repo branch, which I did to double-check that win/macos builds were working (I tested all individual commits locally in Linux).

I'll mark this PR as Draft to signal that I'm not looking to merge this until the pre-3.8 C code removal is merged; after that I'm happy to discuss this in further detail.

@rtobar rtobar marked this pull request as draft September 6, 2024 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants