docs: `std::hash::Hash` should ensure prefix-free data #89438

pierwill · 2021-10-01T17:45:24Z

Attempt to synthesize the discussion in #89429 into a suggestion regarding Hash implementations (not a hard requirement).

Closes #89429.


rust-highfive · 2021-10-01T17:45:27Z

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

pierwill · 2021-10-01T17:48:45Z

@cuviper

tczajka · 2021-10-01T17:58:24Z

library/core/src/hash/mod.rs

+/// ## Prefix collisions
+///
+/// Implementations of `hash` should ensure that the data they
+/// pass to the `Hasher` are prefix-free. That is, different concatenations


This explanation of what "prefix-free" means is incomplete. It should say that unequal values should cause two different byte sequences to be written, and neither of the two sequences should be a prefix of the other.

Note that it's not sufficient to say that concatenations of outputs of multiple values of the same type should result in different outputs. It has to be true when concatenated with outputs for other types as well (think about hashing (A, B)). That's where the prefix-free property comes in: the outputs will be different if all the types involved satisfy the prefix-free property.
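To make the definition above concrete, here is a minimal sketch of a pairwise prefix check. The helper `is_prefix_free` is hypothetical (it is not part of the standard library or of this PR), and the byte sequences are hand-written stand-ins for what a `Hash` implementation might pass to a `Hasher`.

```rust
/// Hypothetical helper: returns true if no sequence in `encodings`
/// is a prefix of another sequence at a different index.
/// (Two identical sequences count as prefixes of each other.)
fn is_prefix_free(encodings: &[&[u8]]) -> bool {
    for (i, a) in encodings.iter().enumerate() {
        for (j, b) in encodings.iter().enumerate() {
            if i != j && b.starts_with(a) {
                return false;
            }
        }
    }
    true
}

fn main() {
    // Raw string bytes: b"a" is a prefix of b"ab", so this set is not prefix-free.
    let raw: [&[u8]; 3] = [b"a", b"ab", b"bc"];
    assert!(!is_prefix_free(&raw));

    // The same strings with a 0xff terminator appended to each one.
    let terminated: [&[u8]; 3] = [b"a\xff", b"ab\xff", b"bc\xff"];
    assert!(is_prefix_free(&terminated));
}
```

When every type involved writes prefix-free sequences, two concatenations can only be equal if they were built from the same parts in the same order, which is why the property carries over to composite values such as `(A, B)`.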

Thanks @tczajka! I'm not sure I understand the idea of one sequence being a prefix of another. Does it simply mean "starts with", or is it another kind of relation? Is there a way we can rephrase this?

Another way to ask the question: in the example of ("ab", "c") and ("a", "bc") where and how would the "prefix" occur, and how does the extra byte prevent it?

A "prefix" is the beginning of a string, so it's the same as "starts with". https://en.wikipedia.org/wiki/Prefix

If strings were hashed without the extra 0xff at the end, hashing ("ab", "c") and ("a", "bc") would write the same byte sequence "abc" to the `Hasher`. The problem is that "a" is a prefix of "ab". By contrast, "a\xff" is not a prefix of "ab\xff", so having `Hash` write these sequences instead solves the problem: "ab\xffc\xff" != "a\xffbc\xff".

Note: \xff is not actually allowed in string literals, since it would be invalid UTF-8, which is also what makes it a useful separator here. You could write those as byte strings, though: b"ab\xffc\xff" != b"a\xffbc\xff".
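The ("ab", "c") versus ("a", "bc") example can be reproduced with a hasher that records the bytes it receives. This is only a sketch: `ByteRecorder`, `Naive`, `Terminated`, and `recorded` are hypothetical names invented for illustration. The exact bytes that the standard library writes for `str` are an implementation detail, so the last check asserts only that the two streams differ.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte it is given instead of mixing them.
#[derive(Default)]
struct ByteRecorder {
    bytes: Vec<u8>,
}

impl Hasher for ByteRecorder {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // Not used here; only the recorded bytes matter.
    }
}

/// Hypothetical string wrapper that writes only its raw bytes (not prefix-free).
struct Naive<'a>(&'a str);

impl Hash for Naive<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(self.0.as_bytes());
    }
}

/// Hypothetical string wrapper that appends a 0xff terminator after its bytes.
struct Terminated<'a>(&'a str);

impl Hash for Terminated<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(self.0.as_bytes());
        state.write_u8(0xff);
    }
}

/// Returns the bytes that `value` passes to the hasher.
fn recorded<T: Hash>(value: &T) -> Vec<u8> {
    let mut recorder = ByteRecorder::default();
    value.hash(&mut recorder);
    recorder.bytes
}

fn main() {
    // Without a terminator, both tuples write the same bytes: "abc".
    assert_eq!(
        recorded(&(Naive("ab"), Naive("c"))),
        recorded(&(Naive("a"), Naive("bc")))
    );

    // With the terminator, the two byte streams differ.
    assert_eq!(
        recorded(&(Terminated("ab"), Terminated("c"))),
        b"ab\xffc\xff".to_vec()
    );
    assert_eq!(
        recorded(&(Terminated("a"), Terminated("bc"))),
        b"a\xffbc\xff".to_vec()
    );

    // The standard library's `str` implementation also keeps the tuples apart.
    assert_ne!(recorded(&("ab", "c")), recorded(&("a", "bc")));
}
```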

pierwill · 2021-10-03T16:59:00Z

library/core/src/hash/mod.rs

@@ -153,9 +153,21 @@ mod sip;
 /// Thankfully, you won't need to worry about upholding this property when
 /// deriving both [`Eq`] and `Hash` with `#[derive(PartialEq, Eq, Hash)]`.
 ///
+/// ## Prefix collisions


Thinking more about this... "Collision" isn't the right term here, is it?

It's not, but I can't think of a better word to use.

library/core/src/hash/mod.rs

Co-authored-by: Amanieu d'Antras <[email protected]>

Amanieu · 2021-10-10T15:46:43Z

@bors r+ rollup

bors · 2021-10-10T15:46:45Z

📌 Commit 749194d has been approved by Amanieu

pierwill · 2021-10-10T18:37:46Z

@Amanieu Does this need to be rebased?

…askrgr

Rollup of 11 pull requests

Successful merges:

- rust-lang#88374 (Fix documentation in Cell)
- rust-lang#88713 (Improve docs for int_log)
- rust-lang#89428 (Feature gate the non_exhaustive_omitted_patterns lint)
- rust-lang#89438 (docs: `std::hash::Hash` should ensure prefix-free data)
- rust-lang#89520 (Don't rebuild GUI test crates every time you run test src/test/rustdoc-gui)
- rust-lang#89705 (Cfg hide no_global_oom_handling and no_fp_fmt_parse)
- rust-lang#89713 (Fix ABNF of inline asm options)
- rust-lang#89718 (Add #[must_use] to is_condition tests)
- rust-lang#89719 (Add #[must_use] to char escape methods)
- rust-lang#89720 (Add #[must_use] to math and bit manipulation methods)
- rust-lang#89735 (Stabilize proc_macro::is_available)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup

Amanieu · 2021-10-10T19:46:02Z

No, it should be fine as it is.

docs: std::hash::Hash should ensure prefix-free data

f531b81

Closes rust-lang#89429

rust-highfive assigned m-ou-se Oct 1, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 1, 2021

tczajka reviewed Oct 1, 2021


fix: edit description of "prefix-free"

2a5dcd5

pierwill commented Oct 3, 2021


m-ou-se assigned Amanieu and unassigned m-ou-se Oct 6, 2021

Amanieu reviewed Oct 9, 2021


Update library/core/src/hash/mod.rs

749194d

Co-authored-by: Amanieu d'Antras <[email protected]>

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Oct 10, 2021

matthiaskrgr mentioned this pull request Oct 10, 2021

Rollup of 11 pull requests #89739

Merged

bors merged commit 06cfd0a into rust-lang:master Oct 10, 2021

rustbot added this to the 1.57.0 milestone Oct 10, 2021

pierwill deleted the prefix-free-hash branch October 19, 2021 14:32

