Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId #86172

Closed
serhiy-storchaka opened this issue Oct 11, 2020 · 10 comments
Closed
Labels
3.10 only security fixes extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@serhiy-storchaka
Copy link
Member

BPO 42006
Nosy @vstinner, @serhiy-storchaka, @hongweipeng
PRs
  • bpo-42006: Stop using PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId. #22648
  • bpo-42006: Stop using PyDict_GetItemString() #23487
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-10-11.14:36:40.426>
    labels = ['extension-modules', 'interpreter-core', '3.10']
    title = 'Stop using PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId'
    updated_at = <Date 2020-11-24.07:31:32.848>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2020-11-24.07:31:32.848>
    actor = 'hongweipeng'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules', 'Interpreter Core']
    creation = <Date 2020-10-11.14:36:40.426>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42006
    keywords = ['patch']
    message_count = 6.0
    messages = ['378437', '378460', '378462', '378463', '379643', '379645']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'serhiy.storchaka', 'hongweipeng']
    pr_nums = ['22648', '23487']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue42006'
    versions = ['Python 3.10']

    @serhiy-storchaka
    Copy link
    Member Author

    There are design flaws in PyDict_GetItem(), PyDict_GetItemString() and _PyDict_GetItemId().

    1. They suppress all exceptions raised internally. Most common of are MemoryError, KeyboardInterrupt and RecursionError whose raising is out of control of the programmer. When an exception is suppressed the function returns incorrect result. For example PyDict_GetItem() can return NULL for existing key if the user press Ctrl-C.

    This situation seems impossible if the key and all keys in the dict are exact strings, but we cannot be sure.

    1. PyDict_GetItem() preserves the current exception, and therefore can be used in handling exception but PyDict_GetItemString() and _PyDict_GetItemId() clear it if the creation of string object is failed (due to MemoryError or, less likely, UnicodeDecodeError). It can leads to very unexpected result.

    Most of uses of PyDict_GetItem() etc were fixed in bpo-35459 and other issues. But some occurrences were left, in cases when there is no any error handling, either errors are not expected or all exceptions are suppressed in any case. It is mainly in the compiler and symtable (where we look up in just created dicts), in structseq.c and in the sys module.

    The proposed PR (it can be split on several smaller PRs if needed, see also bpo-41993 and bpo-42002) removes all calls of PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId, replacing them with calls of PyDict_GetItemWithError, _PyDict_GetItemStringWithError and _PyDict_GetItemIdWithError or other functions if appropriate.

    _PyDict_GetItemId is no longer used and can be removed.

    @serhiy-storchaka serhiy-storchaka added 3.10 only security fixes extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 11, 2020
    @vstinner
    Copy link
    Member

    See also old issues about PyDict_GetItemWithError():

    @vstinner
    Copy link
    Member

    PyDict_GetItem() behavior is an old wart. Would it make sense to plan a migration plan to slowly move towards PyDict_GetItem() behaving as PyDict_GetItemWithError()?

    CPython is an old code base full of corner cases. But outside CPython, is it common to rely on the current PyDict_GetItem() behavior (ignore errors, can be called with an exception raised)?

    (A) For example, would it make sense to start to emit a DeprecationWarning when it's called with an exception raised? Currently, the function calls _PyErr_Fetch(): checking if exc_type is non-NULL requires to add an if which would add a cost to all PyDict_GetItem() calls. But IMO it would be interesting to fix this function in the long term.

    (B) Also, would it make sense to start to no longer ignore base exceptions like SystemExit, KeyboardInterrupt or MemoryError in PyDict_GetItem()?

    How can we take a decision on that? Analyze the C code base of a bunch of popular C extension? Make the change and await to see how many people complain?

    In the long term, I would prefer that PyDict_GetItem() behaves as other C functions: it should not be called with an exception raised (the caller has to store/restore the current exception), and raises an exception on error. I expect better performance for such more regular and simpler behavior.

    @vstinner
    Copy link
    Member

    I don't want to hold fixing the code right now. If we decide to break the backward compatibility of the C API, a transition plan will likely take a few years anyway, whereas this issue should be first as soon as possible.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset fb5db7e by Serhiy Storchaka in branch 'master':
    bpo-42006: Stop using PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId. (GH-22648)
    fb5db7e

    @serhiy-storchaka
    Copy link
    Member Author

    My plan is to deprecate PyDict_GetItem() and functions with same design flaw and finally remove them (in 4.0?).

    We cannot start to ignore exceptions in PyDict_GetItem(). It would not fix the original flaw when the user code does not check an exception after using PyDict_GetItem(), and only make it worse. It can cause crashes in the code which asserts that the exception is not set (for example when function returns value and sets exception). And in worst case it will lead to incorrect results when PyErr_Occurred() is used after calling other C API function.

    Emitting a DeprecationWarning only when PyDict_GetItem() is called with an exception raised does not make much sense, because the problem with PyDict_GetItem() is not only in this case. Also, there is a trick in using warnings here, you should use PyErr_WriteUnraisable(), so warnings always will be printed and cannot be turned into exceptions.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel
    Copy link
    Member

    See also #106033.

    @erlend-aasland
    Copy link
    Contributor

    See also #106033.

    ISTM this can be closed as superseded.

    @erlend-aasland
    Copy link
    Contributor

    @serhiy-storchaka, can we close this?

    @vstinner
    Copy link
    Member

    vstinner commented Oct 8, 2023

    My plan is to deprecate PyDict_GetItem() and functions with same design flaw and finally remove them (in 4.0?).

    Start maybe with a soft deprecation in Python 3.13?

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants