Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement __hash__ on all extensions we've implemented __eq__ on #2578

Closed
26 tasks done
reaperhulk opened this issue Dec 26, 2015 · 8 comments · Fixed by #4534
Closed
26 tasks done

Implement __hash__ on all extensions we've implemented __eq__ on #2578

reaperhulk opened this issue Dec 26, 2015 · 8 comments · Fixed by #4534

Comments

@reaperhulk
Copy link
Member

reaperhulk commented Dec 26, 2015

We've fixed this haphazardly in the past, but we should probably consistently implement __hash__ so extensions don't have odd behaviors like not being able to be inserted into hashable collections.

@reaperhulk reaperhulk added this to the Twelfth Release milestone Dec 26, 2015
alex added a commit to alex/cryptography that referenced this issue Dec 26, 2015
reaperhulk added a commit that referenced this issue Dec 26, 2015
Refs #2578 -- implement __hash__ on CRLNumber
alex added a commit to alex/cryptography that referenced this issue Dec 26, 2015
reaperhulk added a commit that referenced this issue Dec 26, 2015
Refs #2578 -- implement __hash__ on SubjectKeyIdentifier
alex added a commit to alex/cryptography that referenced this issue Dec 27, 2015
reaperhulk added a commit that referenced this issue Dec 27, 2015
Refs #2578 -- implement __hash__ on CRLReason
@reaperhulk
Copy link
Member Author

Many of the remaining extensions use a list of items as their internal storage. Do we want to do something like hash(tuple(list))...? This could potentially be pretty expensive.

@alex
Copy link
Member

alex commented Dec 27, 2015

bleh. let's take some time and think for those, I don't think we have
external requests for these features yet, so we can take our time

On Sun, Dec 27, 2015 at 5:00 PM, Paul Kehrer [email protected]
wrote:

Many of the remaining extensions use a list of items as their internal
storage. Do we want to do something like hash(tuple(list))...? This could
potentially be pretty expensive.


Reply to this email directly or view it on GitHub
#2578 (comment).

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

@reaperhulk reaperhulk removed this from the Twelfth Release milestone Dec 28, 2015
alex added a commit that referenced this issue Jan 31, 2016
Refs #2578 -- implement __hash__ on AccessDescription
reaperhulk added a commit that referenced this issue Feb 1, 2016
Refs #2578 -- implement __hash__ on InhibitAnyPolicy
@Badg
Copy link
Contributor

Badg commented Aug 17, 2016

I'm going to preemptively apologize for an unintentionally long question, but: for those classes using lists for internal storage, is it a big deal if __hash__ just reinstates object.__hash__?

This would have the consequence that if, for example, two GeneralNames instances had the same internal self._general_names list, they would hash differently, and therefore be put into different buckets in dicts, etc. I'm asking because I honestly don't know: is it likely that this could result in problems?

I'm thinking about chipping away at a few of these (at what could only be described as a glacial pace, but hey, anything helps, and if I ended up blocking anyone I'd speed up), but I'm concerned about not violating the edict of "thou shalt not have changing hashes". If the internal storage iterables are ever mutated, it seems to me that the only safe option would be either to use object.__hash__ directly, or something functionally equivalent (a unique hash for each instance regardless of internal state).

I suppose there are some other strategies that could be used, too, but they all seem fairly dangerous. At a certain point, if two equivalent objects really, really should be put in the same hash bucket, it seems to me like it'd be better off to convert the internal list (ex GeneralNames._general_names) into a tuple to enforce immutability. But that has the potential to break things, and I'm definitely not familiar enough with the project internals to actually suggest that.

Thoughts?

@sejr
Copy link

sejr commented Jun 1, 2017

@alex I saw this issue as a starter project, and I would like to work on implementing __hash__ for the remaining extensions. Would you prefer each extension as its own PR, or would you like to group them together as a PR addressing this particular issue (#2578)?

@alex
Copy link
Member

alex commented Jun 1, 2017

Individual PRs please, it makes them much easier to review and get merged quickly! Thanks for looking into contributing!

@sejr
Copy link

sejr commented Jun 1, 2017

@alex I have been thinking about the comment from @Badg regarding __hash__() on classes with lists. I also consulted the Python documentation:

If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since hashable collection implementations require that an object’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

My thoughts: since the data members of the AuthorityKeyIdentifier class (for example) are read-only, why don't we just construct the problematic members as tuples instead of lists?

In the AuthorityKeyIdentifier example; instead of:

if authority_cert_issuer is not None:
            authority_cert_issuer = list(authority_cert_issuer)

We could just do:

if authority_cert_issuer is not None:
            authority_cert_issuer = tuple(authority_cert_issuer)

This way, we wouldn't have to convert the list to a tuple every time __hash__() is called on these classes.

@alex
Copy link
Member

alex commented Jul 9, 2017

The attribute really shouldn't be a tuple -- tuples are not just immutable lists, they should not be treated as general purpose sequences. Until someone has a specific performance complaint, I'd much rather we convert to tuple in __hash__.

@reaperhulk
Copy link
Member Author

Completing this is now blocked on finishing #3461

@alex alex added this to the Twenty fourth release milestone Aug 29, 2018
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 31, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

Successfully merging a pull request may close this issue.

4 participants