Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[h7Q1epCi] Make hashing functions stable on all neo4j types #573

Merged
merged 5 commits into from
Feb 6, 2024
Merged

Conversation

loveleif
Copy link
Contributor

@loveleif loveleif commented Jan 23, 2024

The hash functions are only intended for strings, but unfortunately their signature allows all types. This is a safety for users that use hash functions on non string values despite the documentation.

@loveleif loveleif changed the title Hash Make hashing functions stable on all neo4j types. Jan 23, 2024
@loveleif loveleif changed the title Make hashing functions stable on all neo4j types. [h7Q1epCi] Make hashing functions stable on all neo4j types. Jan 23, 2024
@loveleif loveleif force-pushed the hash branch 2 times, most recently from 0e69d71 to e9c53e2 Compare January 23, 2024 13:14
@loveleif loveleif changed the title [h7Q1epCi] Make hashing functions stable on all neo4j types. [h7Q1epCi] Make hashing functions stable on all neo4j types Jan 23, 2024
@Lojjs Lojjs self-assigned this Jan 24, 2024
Copy link
Contributor

@Lojjs Lojjs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, just some minor questions and remarks.

core/src/test/java/apoc/util/UtilsTest.java Show resolved Hide resolved
try (final var tx = db.beginTx()) {
final var node = tx.createNode(Label.label("HashFunctionsAreStable"));
final var rel = node.createRelationshipTo(node, RelationshipType.withName("R"));
for (int i = randStorables.size() - 1; i >= 0; --i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason why this loop is going backwards?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! {a: 1, b: 2} should have the same hash as {b: 2, a: 1}, before we did toString, and that will simply use the iteration order (which depends on the map implementation). I wanted to try to make the properties come in another order (not sure it succeeded, depends on the map implementation and what kernel decides to do).

Comment on lines +422 to +425
final var first = a.get(0).entrySet().iterator().next();
for (final var e : a.get(0).entrySet()) {
assertEquals(message, first.getValue(), e.getValue());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain the point of this assertion? To me it looks like you are comparing something to itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, it's a bit messy. Let's look at an example:

We might run this query with $value=23, $list=List(23) and n.p1=23:

match (n)
return
  apoc.util.md5([n.p1]) as hash1,
  apoc.util.md5([$value]) as hash2
  apoc.util.md5($list) as hash3

This assertion is intended to make sure hash1, hash2 and hash3 have the same value. They should all equal each other, but in Cypher there can be subtle differences depending on where a value comes from. For example create (n {p: [1,2,3]}) return n.p will most probably not have exactly the same types internally as return [1,2,3] even though the two lists are equal and represents the same thing. Just wanted to make sure that such internal types do not affect the hash.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for explaining!

core/src/test/java/apoc/util/UtilsTest.java Outdated Show resolved Hide resolved
Comment on lines +144 to +147
* This is not the most efficient way to produce a hash, but it is backwards compatible.
* This function is only intended to be used on strings (as documented on the hash functions above)
* But it turns out that is not how everyone is using it, so as a safety we have stable implementations
* for all neo4j types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that you managed to keep backwards compatibility. Do you think we should mention something about the other types than strings in docs or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, perhaps we should point out that it is only meant for strings and other types can have bad performance?

Copy link
Contributor

@Lojjs Lojjs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now

Make hashing functions stable on all neo4j types.
These functions are only intended for strings,
but unfortunately their signature allows all types.
This is a safety for users that use them for other values.
@loveleif loveleif merged commit e795883 into dev Feb 6, 2024
20 checks passed
@loveleif loveleif deleted the hash branch February 6, 2024 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants