docs: `std::hash::Hash` should ensure prefix-free data #89438
r? @m-ou-se (rust-highfive has picked a reviewer for you, use r? to override)
library/core/src/hash/mod.rs
Outdated
/// ## Prefix collisions
///
/// Implementations of `hash` should ensure that the data they
/// pass to the `Hasher` are prefix-free. That is, different concatenations
This explanation of what "prefix-free" means is incomplete. It should say that unequal values should cause two different byte sequences to be written, and neither of the two sequences should be a prefix of the other.
Note that it's not sufficient to say that concatenations of outputs of multiple values of the same type should result in different outputs. It has to be true when concatenated with outputs for other types as well (think about hashing `(A, B)`). That's where the prefix-free property comes in: the outputs will be different if all the types involved satisfy the prefix-free property.
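To make the definition concrete, here is a minimal sketch of a check for the property described above: no byte sequence in a set may be a prefix of another one, which also rules out two identical sequences. `is_prefix_free` is a hypothetical helper written for illustration, not something in `std`.

```rust
/// Hypothetical helper: returns true if no sequence in `encodings`
/// is a prefix of a different entry. Two identical entries count as
/// prefixes of each other, so they also make the set fail the check.
fn is_prefix_free(encodings: &[&[u8]]) -> bool {
    encodings.iter().enumerate().all(|(i, a)| {
        encodings
            .iter()
            .enumerate()
            // Skip comparing an entry with itself; otherwise `a` must
            // not be the beginning of `b`.
            .all(|(j, b)| i == j || !b.starts_with(a))
    })
}
```

For example, `is_prefix_free(&[&b"a"[..], &b"ab"[..]])` is false because `b"a"` is the beginning of `b"ab"`, while the same strings with a trailing `0xff` byte pass the check.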
Thanks @tczajka! I'm not sure I understand the idea of one sequence being a prefix of another. Does it simply mean "starts with", or is it another kind of relation? Is there a way we can rephrase this?
Another way to ask the question: in the example of `("ab", "c")` and `("a", "bc")`, where and how would the "prefix" occur, and how does the extra byte prevent it?
A "prefix" is the beginning of a string, so it's the same as "starts with". https://en.wikipedia.org/wiki/Prefix
If strings were hashed without the extra 0xff at the end, hashing `("ab", "c")` and `("a", "bc")` would write the same byte sequence `"abc"` to `Hasher`. The problem is that `"a"` is a prefix of `"ab"`. Whereas `"a\xff"` is not a prefix of `"ab\xff"`, so if `Hash` outputs these sequences instead, that solves the problem: `"ab\xffc\xff" != "a\xffbc\xff"`.
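The `("ab", "c")` versus `("a", "bc")` case can be reproduced with a hasher that records the bytes it is given. This is only a sketch: `Recorder`, `Raw`, `Terminated`, and `recorded` are hypothetical names made up for this example, and `Terminated` imitates the trailing `0xff` idea rather than reproducing the standard library's own `str` implementation.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that stores every byte written to it.
#[derive(Default)]
struct Recorder(Vec<u8>);

impl Hasher for Recorder {
    fn write(&mut self, bytes: &[u8]) {
        self.0.extend_from_slice(bytes);
    }
    fn finish(&self) -> u64 {
        0 // The hash value is irrelevant here; only the bytes matter.
    }
}

/// Hypothetical wrapper that writes only the raw string bytes (not prefix-free).
struct Raw<'a>(&'a str);

impl Hash for Raw<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(self.0.as_bytes());
    }
}

/// Hypothetical wrapper that appends a 0xff terminator (prefix-free).
struct Terminated<'a>(&'a str);

impl Hash for Terminated<'_> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write(self.0.as_bytes());
        state.write_u8(0xff);
    }
}

/// Returns the exact byte sequence `value` passes to the hasher.
fn recorded<T: Hash>(value: &T) -> Vec<u8> {
    let mut recorder = Recorder::default();
    value.hash(&mut recorder);
    recorder.0
}
```

With these definitions, `recorded(&(Raw("ab"), Raw("c")))` and `recorded(&(Raw("a"), Raw("bc")))` both yield `b"abc"`, whereas the `Terminated` versions yield `b"ab\xffc\xff"` and `b"a\xffbc\xff"`, which differ.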
Note: `\xff` is not actually allowed in string literals, since it would be invalid UTF-8 -- which is also what makes it a useful separator here. You could really write those as byte strings though, `b"ab\xffc\xff" != b"a\xffbc\xff"`.
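A small sketch of why `0xff` works as a separator: the byte never occurs in valid UTF-8, so it cannot be confused with string content. `is_valid_utf8` and `contains_separator` are hypothetical helpers for illustration only.

```rust
/// Hypothetical helper: true if `bytes` is valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Hypothetical helper: true if the UTF-8 encoding of `s` contains the
/// byte 0xff. For a valid `&str` this should never be the case.
fn contains_separator(s: &str) -> bool {
    s.bytes().any(|b| b == 0xff)
}
```

Here `is_valid_utf8(b"ab\xff")` is false, and even `contains_separator("ab\u{ff}")` is false, because U+00FF is encoded as the two bytes `0xc3 0xbf` rather than a literal `0xff`.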
@@ -153,9 +153,21 @@ mod sip;
/// Thankfully, you won't need to worry about upholding this property when
/// deriving both [`Eq`] and `Hash` with `#[derive(PartialEq, Eq, Hash)]`.
///
/// ## Prefix collisions
Thinking more about this... "Collision" isn't the right term here, is it?
It's not, but I can't think of a better word to use.
Co-authored-by: Amanieu d'Antras <amanieu@gmail.com>
@bors r+ rollup
📌 Commit 749194d has been approved by |
@Amanieu Does this need to be rebased?
…askrgr Rollup of 11 pull requests

Successful merges:
- rust-lang#88374 (Fix documentation in Cell)
- rust-lang#88713 (Improve docs for int_log)
- rust-lang#89428 (Feature gate the non_exhaustive_omitted_patterns lint)
- rust-lang#89438 (docs: `std::hash::Hash` should ensure prefix-free data)
- rust-lang#89520 (Don't rebuild GUI test crates every time you run test src/test/rustdoc-gui)
- rust-lang#89705 (Cfg hide no_global_oom_handling and no_fp_fmt_parse)
- rust-lang#89713 (Fix ABNF of inline asm options)
- rust-lang#89718 (Add #[must_use] to is_condition tests)
- rust-lang#89719 (Add #[must_use] to char escape methods)
- rust-lang#89720 (Add #[must_use] to math and bit manipulation methods)
- rust-lang#89735 (Stabilize proc_macro::is_available)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
No, it should be fine as it is.
Attempt to synthesize the discussion in #89429 into a suggestion regarding `Hash` implementations (not a hard requirement).
Closes #89429.