-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-99540: Constant hash for _PyNone_Type to aid reproducibility #99541
Conversation
yonillasky
commented
Nov 16, 2022
•
edited by bedevere-bot
Loading
edited by bedevere-bot
- Issue: Constant hash value for None to aid reproducibility #99540
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
f216cd7
to
0c08cac
Compare
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
There is some value in making it dependent on the hash secret (PYTHONHASHSEED):
The cost of doing that is an extra global variable and a single calculation done once upon setting the hash secret. I think that is a reasonable cost - I'll try writing it that way. EDIT: now that I did, having second thoughts - it's a larger change to CPython, and perhaps people who run their code only on platforms with ASLR disabled today will be surprised by the "non-determinism by default" we'll be introducing here. The hash secret was introduced to negate chosen input complexity attacks on builtin hash sets/maps, which is not relevant here, as None is a singleton. The "fuzzing" use case is secondary. I'm willing to go with either solution. Personally, I will benefit more from the PYTHONHASHSEED dependent hash, but I don't know that it's the right thing to do for general use. |
c12fc81
to
961b41f
Compare
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
✅ Deploy Preview for python-cpython-preview canceled.
|
961b41f
to
f154d65
Compare
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
I'm waiting with the blurb update until we make a final decision whether to go with a hash-secret-dependent calculation or not. |
I have made the requested changes; please review again |
Thanks for making the requested changes! @ethanfurman: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do a simple constant hash for None
.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Looks good. We'll wait a week to see if anyone else wants to comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How 0xFCA86420 is chosen?
Is it better than random value?
The constant was chosen basically at whim (that particular value amused me); it's easier to implement as a constant because if we went with a random number we would have to make it contingent on the hash seed used to randomize If it was a truly random value (which it currently is if Python was compiled with ASLR), then program results can vary from one run to the next with no way to force them to be the same (since |
I just meant constant random number, like Since hashtable (dict and set) uses lower bits first, filling some lower bits can reduce hash collision against small constant ints. But this is not a big issue because None is just one singleton. |
Just noting that if a 64-bit literal is used, we'd presumably need to add a static check for 32-bit builds, and add a 32-bit "fallback" value for that case. |
Good point. Let's just use a 32-bit number instead. |
Actually, unless I'm missing something, that is a 32-bit number. |
I think that was in response to Inada-san's suggestion of |
Ah, that's what I was missing. I wish we had a reply directly to a comment ability. :-( |
* main: pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541) pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168) pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302) "Compound statement" docs: Fix with-statement step indexing (python#100286) pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
* origin/main: (1306 commits) Correct CVE-2020-10735 documentation (python#100306) pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273) pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551) Docs: Use `PY_VERSION_HEX` for version comparison (python#100179) pythongh-97909: Fix markup for `PyMethodDef` members (python#100089) pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890) pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890) pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205) pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469) pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541) pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168) pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302) "Compound statement" docs: Fix with-statement step indexing (python#100286) pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278) Improve stats presentation for calls. (pythonGH-100274) Better stats for `LOAD_ATTR` and `STORE_ATTR` (pythonGH-100295) pythongh-81057: Move the Cached Parser Dummy Name to _PyRuntimeState (python#100277) Document that zipfile's pwd parameter is a `bytes` object (python#100209) pythongh-99767: mark `PyTypeObject.tp_watched` as internal use only in table (python#100271) Fix typo in introduction.rst (python#100266) ...
Thank you for this, it allowed to make my project deterministic! I dropped a comment about it there: https://stackoverflow.com/questions/7681786/how-is-hashnone-calculated/75296606#75296606 |
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
Is this merged into 3.12? There's no mention in the changelog. |
The PR did update the blurb, and it appears in What makes a given PR worthy of appearing in the changelog? |