Add PyBytes_Join() function #36
Comments
Looks good to me! I agree that we can't change that behaviour of |
I agree. I don't think that it would be a good idea to change the behavior because the function exists since forever in Python (ex: it exists in Python 2.7 with NULL treated as a whitespace). |
FYI: This originates from the default for string.join() (the module function) in Python 2. The default separator was a blank. It's been the default for PyUnicode_Join() ever since the API was added to Python. Today, I would not allow for this corner case anymore, though. Passing in NULL as first argument is bound to mask potential errors in code, |
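For readers who want to see the corner case described above, here is a minimal sketch that calls PyUnicode_Join() with a NULL separator from Python through ctypes. It assumes a CPython interpreter; the helper name join_with_null_separator is made up for this illustration, and the behavior shown is the undocumented one discussed in this thread, not a documented guarantee.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
join = ctypes.pythonapi.PyUnicode_Join
# Declaring the separator as c_void_p lets us pass None as a NULL pointer.
join.argtypes = [ctypes.c_void_p, ctypes.py_object]
join.restype = ctypes.py_object


def join_with_null_separator(items):
    """Hypothetical helper: call PyUnicode_Join(NULL, items)."""
    return join(None, items)


# The NULL separator is treated as a single space.
print(join_with_null_separator(["a", "b", "c"]))  # a b c
```

No exception is raised for the NULL separator here; the call simply falls back to a blank.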
Isn't it possible to change its behaviour using the proper deprecation process? (or is it impossible because it's part of the stable ABI that we cannot touch it like this?) |
It's possible, yes. But it would mean that everyone who uses it this way needs to update their code. It would be quite cruel of us to do it without a very good reason. |
Wouldn't the following be legitimate reasons: 1) it's not documented, 2) it's something introduced for Python 2, 3) it would be inconsistent with |
None of those are reasons to change it.
|
All usages of the current private |
To avoid the same error masking issue as with PyUnicode_Join(), I'd suggest not using NULL as the default parameter, but instead using a separate macro PY_BYTES_EMPTY or perhaps even an interned and immortal singleton Py_BYTES_EMPTY (haven't checked whether we already have something like this). |
Had a look... we already have something like this in form of |
You can just do |
Thanks for mentioning this. I wasn't aware of that new API: https://docs.python.org/3.14/c-api/object.html#c.Py_GetConstant Unfortunately, this returns a strong reference, so you'd still have the ref count manage the object instead of just doing There is |
We could also use |
Or have the
|
Both solutions sound like a good alternative approach. Petr's version would even solve the potential issue with My concern is mostly about passing in NULL as the first parameter, since you normally would pass in the object you want to work on as this parameter. A forgotten NULL check could then easily result in the join function doing its job and leaving a dangling error around, which would then show up at some later point in the execution of the program, which is really hard to debug. I've run into such issues too often to not pay close attention to this anymore. While this can be an issue with other parameters as well, the first one is special, since working on NULLs is rather uncommon 😄 |
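The masking concern can be modelled in a few lines of Python. This is only a sketch: None stands in for a NULL pointer, and lookup, lenient_join and strict_join are hypothetical names invented for the illustration, not C API functions. A pure-Python model also cannot reproduce the pending exception that a failed C call leaves behind; it only shows that the lenient variant lets the failure pass silently.

```python
def lookup(table, key):
    """Stand-in for a C call that returns NULL (None here) on failure."""
    return table.get(key)


def lenient_join(sep, items):
    """Models a join that silently replaces a NULL separator with a default."""
    if sep is None:
        sep = b" "
    return sep.join(items)


def strict_join(sep, items):
    """Models a join that rejects a NULL separator with SystemError."""
    if sep is None:
        raise SystemError("bad argument to internal function")
    return sep.join(items)


separators = {"comma": b","}
sep = lookup(separators, "colon")  # fails, and the caller forgets to check

# The lenient join "works", so the failed lookup goes unnoticed.
masked = lenient_join(sep, [b"a", b"b"])

# The strict join surfaces the problem at the call site.
try:
    strict_join(sep, [b"a", b"b"])
    caught = False
except SystemError:
    caught = True
```

With the strict variant the mistake is reported where it happens instead of at some later point in the program.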
I like this approach. |
I propose to:
|
I like those suggestions. When you say "don't accept NULL in |
Sounds good. |
I mean PyErr_BadInternalCall() yes, raise SystemError. |
To summarize:
Should this change be backported to 3.12 and 3.13 as well without notice? Or should it only be a 3.14 change?
|
I think that in this case we may add a SystemError with more specific error message (similar to these that are raised when C implemented function returns non-NULL with an error set). It can also be chained with the original exception. But this is an implementation detail. I would prefer to add special references for empty After adding |
It's all personal opinions now. As for me, I don't like using one Python object to stand in for another. Do we even have a precedent for C API taking I'd prefer any of:
|
If we can, I would also prefer it. Returning an empty string or using an empty string might be common enough (for instance search for We seem to have |
I'm fine with either
|
Accepting NULL is causing too much trouble:
I prefer to abandon the NULL idea at this point. |
I'm not aware of any existing C API doing that, so maybe Py_None is a bad idea here, especially because getting an empty bytes string became cheap and easy (Py_GetConstantBorrowed) in Python 3.13. |
Ok, let's vote on the simple API: sep must always be a Python bytes object (it cannot be NULL, it cannot be Py_None). API:
Vote: |
If we go with the above proposal, please add a macro to return a borrowed reference to the empty bytes constant (= |
Less ready for a C API, true, but more ready for a generic native API (that can support languages other than C), as well as more ready for a thread-aware API. It was a worthwhile change.
Yeah, I think this is worth adding ourselves. |
Could we do it for all known constants to be consistent? there are multiple places where empty str/bytes are being returned and some files use local helpers for that. I think we can have a PR only for this (namely implement a correspondence between constants and macros and remove those local helpers). |
Provided there are no name conflicts, sure. Macros are cheap, and I believe all of these constants are already immortal/true-constant, which means there's no likely future where refcounting will actually matter. We do want to deprecate functions that return borrowed references, as they make refcounting very complicated. But these constants are effectively tagged pointers now rather than live objects (the refcount is still writable, but properly-built extensions will leave it alone, and they are interpreter- and thread-agnostic), so whether strong or borrowed isn't a big deal. |
@mdboom: You didn't vote yet in #36 (comment) - what's your call on this API? |
The C API Working Group adopted |
API:
PyObject* PyBytes_Join(PyObject *sep, PyObject *iterable)
Similar to sep.join(iterable) in Python.
sep must be a Python bytes object.
iterable must be an iterable object yielding objects that implement the buffer protocol.
On success, return a new bytes object. On error, set an exception and return NULL.
UPDATE: Don't accept sep=NULL.
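A rough Python-level model of these semantics, for illustration only. The name bytes_join_model is hypothetical, None stands in for a NULL pointer, and the TypeError for a non-bytes separator is an assumption about how "sep must be a bytes object" would be enforced; the SystemError for NULL follows the PyErr_BadInternalCall() suggestion made earlier in this thread.

```python
def bytes_join_model(sep, iterable):
    """Python-level model of the proposed PyBytes_Join(sep, iterable)."""
    if sep is None:
        # Models sep == NULL, rejected via PyErr_BadInternalCall().
        raise SystemError("bad argument to internal function")
    if not isinstance(sep, bytes):
        # Assumption: a non-bytes separator is rejected with TypeError.
        raise TypeError(f"sep: expected bytes, got {type(sep).__name__}")
    # bytes.join() accepts any items that implement the buffer protocol.
    return sep.join(iterable)


parts = [b"spam", bytearray(b"eggs"), memoryview(b"ham")]
print(bytes_join_model(b", ", parts))  # b'spam, eggs, ham'
```

Items that do not implement the buffer protocol, such as str, make the underlying bytes.join() raise TypeError.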
Rejecting sep=NULL is different from PyUnicode_Join(NULL, iterable), which treats a NULL separator as a whitespace (' '). This PyUnicode_Join() behavior is not documented. The PyUnicode_Join() documentation only says: