From 2f5d2208c20bfd5df462cda62be54aa110bb7ec9 Mon Sep 17 00:00:00 2001
From: Piotr Czarnecki
Date: Mon, 4 Jul 2016 11:40:04 +0200
Subject: [PATCH 1/5] RFC for one-shot hashing support

---
 text/0000-one-shot-hashing.md | 96 +++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 text/0000-one-shot-hashing.md

diff --git a/text/0000-one-shot-hashing.md b/text/0000-one-shot-hashing.md
new file mode 100644
index 00000000000..15a0b3f05d4
--- /dev/null
+++ b/text/0000-one-shot-hashing.md
@@ -0,0 +1,96 @@
+- Feature Name: one_shot_hashing
+- Start Date: 2016-07-04
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+[summary]: #summary
+
+Extend the `Hasher` trait with a `fn delimit` method. Add an unstable Farmhash
+implementation to the standard library.
+
+# Motivation
+[motivation]: #motivation
+
+The current hashing architecture is suitable for streaming hashing.
+
+In general, for each type which implements Hasher, there cannot be two values
+that produce the same stream. Delimiters are inserted so that values of
+compound types produce unique streams. For example, hashing `("ab", "c")` and
+`("a", "bc")` must produce different results.
+
+Hashing in one shot is possible even today with a custom hasher for constant-
+sized types. However, HashMap keys are often strings and slices. In order to
+allow fast, specialized hashing for more types, we need a clean way of
+handling single writes. Hashing of strings and slices performs two writes to a
+stream: one for a delimiter and the other for the content. We need a way of
+conveying the distinction between the delimiter and actual content. In the
+case of one-shot hashing, the delimiter can be ignored.
+
+# Detailed design
+[design]: #detailed-design
+
+The functionality of streaming hashers remains the same.
+
+A `delimit` method with default implementation is added to the `Hasher` trait as
+follows.
+
+```rust
+trait Hasher {
+    // ...
+
+    /// Emit a delimiter for an array of length `len`.
+    #[inline]
+    #[unstable(feature = "hash_delimit", since = "...", issue="...")]
+    fn delimit(&mut self, len: usize) {
+        self.write_usize(len);
+    }
+}
+```
+
+Farmhash is introduced as an unstable struct at `core::hash::FarmHasher`. It
+should not be exposed in to users of stable Rust.
+
+It may be implemented in the standard library as follows.
+
+```rust
+struct FarmHasher {
+    hash: u64
+}
+
+impl Hasher for FarmHasher {
+    fn write(&mut self, input: &[u8]) {
+        self.hash = farmhash::hash64(input);
+    }
+
+    fn delimit(&mut self, _len: usize) {
+        // Nothing to do.
+    }
+
+    fn finish(&mut self) -> u64 {
+        self.hash
+    }
+}
+```
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+* There will be yet another hashing algorithm to maintain in the standard library.
+* The `Hasher` trait becomes larger.
+
+# Alternatives
+[alternatives]: #alternatives
+
+* Leaving out either or both of these. This means adaptive hashing won't work for
+  string and slice types.
+* Introducing Farmhash as an unstable function.
+* Adding the `fn delimit` method, but leaving out Farmhash.
+* Using MetroHash or some other algorithm instead of Farmhash.
+* Changing SipHash to ignore the first delimiter.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+* Should `str` and `[u8]` get hashed the same way?
+* Can streaming hashers such as SipHash ignore the first or the last delimiter?
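The two writes described in the Motivation can be observed with today's standard library. This is a minimal sketch and not part of the RFC. It assumes a naive one-shot hasher whose `write` replaces the previous result, so the hasher has no way to tell a delimiter from content. It also uses a hypothetical `fnv1a64` helper (64-bit FNV-1a) as a stand-in for `farmhash::hash64`, so it builds without external crates.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `farmhash::hash64`: 64-bit FNV-1a over the
/// whole input, used only so the sketch needs no external crate.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Naive one-shot hasher: each `write` replaces the stored result, so only
/// the last write reaches `finish`.
#[derive(Default)]
struct LastWriteHasher {
    result: u64,
}

impl Hasher for LastWriteHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.result = fnv1a64(bytes);
    }

    fn finish(&self) -> u64 {
        self.result
    }
}

/// Hashes one value with a fresh `LastWriteHasher`.
fn one_shot<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = LastWriteHasher::default();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Integers are a single write, so distinct values stay distinct.
    assert_ne!(one_shot(&1u64), one_shot(&2u64));

    // `str` is written as its bytes, then a 0xff delimiter. The delimiter is
    // the last write, so every string ends up with the same hash.
    assert_eq!(one_shot("ab"), one_shot("xyz"));
    assert_eq!(one_shot("ab"), fnv1a64(&[0xff]));
}
```

`str` is hashed as its bytes followed by a trailing `0xff`, so under such a hasher the delimiter is always the last write and every string maps to the same value. Fixed-size integers are a single write and are unaffected.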
From 88a744fb8429945e477cb3a212e9475a801baea3 Mon Sep 17 00:00:00 2001
From: Piotr Czarnecki
Date: Mon, 4 Jul 2016 11:40:04 +0200
Subject: [PATCH 2/5] Remove mentions of Farmhash

---
 text/0000-one-shot-hashing.md | 34 ++--------------------------------
 1 file changed, 2 insertions(+), 32 deletions(-)

diff --git a/text/0000-one-shot-hashing.md b/text/0000-one-shot-hashing.md
index 15a0b3f05d4..af10440cb7f 100644
--- a/text/0000-one-shot-hashing.md
+++ b/text/0000-one-shot-hashing.md
@@ -6,8 +6,7 @@
 # Summary
 [summary]: #summary
 
-Extend the `Hasher` trait with a `fn delimit` method. Add an unstable Farmhash
-implementation to the standard library.
+Extend the `Hasher` trait with a `fn delimit` method.
 
 # Motivation
 [motivation]: #motivation
@@ -48,45 +47,16 @@ trait Hasher {
 }
 ```
 
-Farmhash is introduced as an unstable struct at `core::hash::FarmHasher`. It
-should not be exposed in to users of stable Rust.
-
-It may be implemented in the standard library as follows.
-
-```rust
-struct FarmHasher {
-    hash: u64
-}
-
-impl Hasher for FarmHasher {
-    fn write(&mut self, input: &[u8]) {
-        self.hash = farmhash::hash64(input);
-    }
-
-    fn delimit(&mut self, _len: usize) {
-        // Nothing to do.
-    }
-
-    fn finish(&mut self) -> u64 {
-        self.hash
-    }
-}
-```
-
 # Drawbacks
 [drawbacks]: #drawbacks
 
-* There will be yet another hashing algorithm to maintain in the standard library.
 * The `Hasher` trait becomes larger.
 
 # Alternatives
 [alternatives]: #alternatives
 
-* Leaving out either or both of these. This means adaptive hashing won't work for
+* Leaving out this, which means adaptive hashing may not work for
   string and slice types.
-* Introducing Farmhash as an unstable function.
-* Adding the `fn delimit` method, but leaving out Farmhash.
-* Using MetroHash or some other algorithm instead of Farmhash.
 * Changing SipHash to ignore the first delimiter.
 
 # Unresolved questions
From b90731c4dd5c56e858e7af0087ed3849c4fbafd9 Mon Sep 17 00:00:00 2001
From: Piotr Czarnecki
Date: Mon, 4 Jul 2016 11:40:04 +0200
Subject: [PATCH 3/5] Correct phrasing

---
 text/0000-one-shot-hashing.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/text/0000-one-shot-hashing.md b/text/0000-one-shot-hashing.md
index af10440cb7f..4beccb43a7d 100644
--- a/text/0000-one-shot-hashing.md
+++ b/text/0000-one-shot-hashing.md
@@ -20,10 +20,10 @@ compound types produce unique streams. For example, hashing `("ab", "c")` and
 
 Hashing in one shot is possible even today with a custom hasher for constant-
 sized types. However, HashMap keys are often strings and slices. In order to
-allow fast, specialized hashing for more types, we need a clean way of
-handling single writes. Hashing of strings and slices performs two writes to a
-stream: one for a delimiter and the other for the content. We need a way of
-conveying the distinction between the delimiter and actual content. In the
+allow fast, specialized hashing for variable-length types, we need a clean way
+of handling single writes. Hashing of strings and slices performs two writes
+to a stream: one for a delimiter and the other for the content. We need a way
+of conveying the distinction between the delimiter and actual content. In the
 case of one-shot hashing, the delimiter can be ignored.
 
 # Detailed design
From be65caf9382f5331f4c11a3e7cd6b19638fbd1fe Mon Sep 17 00:00:00 2001
From: Piotr Czarnecki
Date: Thu, 7 Jul 2016 00:20:18 +0200
Subject: [PATCH 4/5] Update for more clear motivation

---
 text/0000-one-shot-hashing.md | 79 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 61 insertions(+), 18 deletions(-)

diff --git a/text/0000-one-shot-hashing.md b/text/0000-one-shot-hashing.md
index 4beccb43a7d..ad6fefd0d72 100644
--- a/text/0000-one-shot-hashing.md
+++ b/text/0000-one-shot-hashing.md
@@ -11,26 +11,48 @@ Extend the `Hasher` trait with a `fn delimit` method.
 # Motivation
 [motivation]: #motivation
 
-The current hashing architecture is suitable for streaming hashing.
+Streaming hashing is a way of hashing values of any type. Every significant
+byte of the hashed value is included in a stream. The entire stream is hashed.
+One-shot hashing is a simplification of streaming hashing. It is limited to a
+single scalar value.
+
+The current hashing architecture is used for streaming hashing. However, it is
+unfit for optimal one-shot hashing. Consider the following interface for
+one-shot hashing, based on Farmhash.
+
+```rust
+extern crate farmhash;
+
+struct FarmHasher {
+    result: u64
+}
+
+impl Hasher for FarmHasher {
+    fn write(&mut self, msg: &[u8]) {
+        self.result = farmhash::hash64(msg);
+    }
+
+    fn finish(&self) -> u64 {
+        self.result
+    }
+}
+```
+
+This `FarmHasher` will work for constant-sized primitive types. That is:
+integers, raw pointers, and `char`. It will give wrong results when hashing
+`&str`, and may do unnecessary work when hashing `&[T]`. Why doesn't it work
+for variable-sized types?
 
 In general, for each type which implements Hasher, there cannot be two values
-that produce the same stream. Delimiters are inserted so that values of
-compound types produce unique streams. For example, hashing `("ab", "c")` and
-`("a", "bc")` must produce different results.
-
-Hashing in one shot is possible even today with a custom hasher for constant-
-sized types. However, HashMap keys are often strings and slices. In order to
-allow fast, specialized hashing for variable-length types, we need a clean way
-of handling single writes. Hashing of strings and slices performs two writes
-to a stream: one for a delimiter and the other for the content. We need a way
-of conveying the distinction between the delimiter and actual content. In the
-case of one-shot hashing, the delimiter can be ignored.
+that produce the same stream. For example, hashing `("ab", "c")` and `("a",
+"bc")` must produce different results. To ensure that, a special value is
+inserted in the stream after the contents of every string. One-shot hashing
+should be able to ignore such delimiters, because compound types can't be
+hashed in one shot.
 
 # Detailed design
 [design]: #detailed-design
 
-The functionality of streaming hashers remains the same.
-
 A `delimit` method with default implementation is added to the `Hasher` trait as
 follows.
 
@@ -38,15 +60,36 @@ follows.
 trait Hasher {
     // ...
 
-    /// Emit a delimiter for an array of length `len`.
+    /// Emit a delimiter.
     #[inline]
     #[unstable(feature = "hash_delimit", since = "...", issue="...")]
-    fn delimit(&mut self, len: usize) {
-        self.write_usize(len);
+    fn delimit<T: Hash>(&mut self, delimiter: T) {
+        delimiter.hash(self);
     }
 }
 ```
 
+Implementations of `Hash` for `str` and `[T]` are changed as follows.
+
+```rust
+impl Hash for str {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        state.write(self.as_bytes());
+        state.delimit(0xff_u8);
+    }
+}
+
+impl<T: Hash> Hash for [T] {
+    fn hash<H: Hasher>(&self, state: &mut H) {
+        state.delimit(self.len());
+        Hash::hash_slice(self, state)
+    }
+}
+```
+
+The functionality of streaming hashers remains the same. One-shot hashing is
+not yet in the standard library.
+
 # Drawbacks
 [drawbacks]: #drawbacks
 
@@ -57,7 +100,7 @@ trait Hasher {
 
 * Leaving out this, which means adaptive hashing may not work for
   string and slice types.
-* Changing SipHash to ignore the first delimiter.
+* Changing SipHash to ignore the first or the last delimiter.
 
 # Unresolved questions
 [unresolved]: #unresolved-questions
From 7f0d65cd50f403871fcfa7f35d5d9262d736bddb Mon Sep 17 00:00:00 2001
From: Piotr Czarnecki
Date: Thu, 7 Jul 2016 00:23:32 +0200
Subject: [PATCH 5/5] Small update

---
 text/0000-one-shot-hashing.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/text/0000-one-shot-hashing.md b/text/0000-one-shot-hashing.md
index ad6fefd0d72..3ce23ed126a 100644
--- a/text/0000-one-shot-hashing.md
+++ b/text/0000-one-shot-hashing.md
@@ -14,7 +14,7 @@ Extend the `Hasher` trait with a `fn delimit` method.
 Streaming hashing is a way of hashing values of any type. Every significant
 byte of the hashed value is included in a stream. The entire stream is hashed.
 One-shot hashing is a simplification of streaming hashing. It is limited to a
-single scalar value.
+single primitive value.
 
 The current hashing architecture is used for streaming hashing. However, it is
 unfit for optimal one-shot hashing. Consider the following interface for
@@ -47,8 +47,8 @@ In general, for each type which implements Hasher, there cannot be two values
 that produce the same stream. For example, hashing `("ab", "c")` and `("a",
 "bc")` must produce different results. To ensure that, a special value is
 inserted in the stream after the contents of every string. One-shot hashing
-should be able to ignore such delimiters, because compound types can't be
-hashed in one shot.
+should be able to ignore such delimiters, because compound types can't even
+be hashed in one shot.
 
 # Detailed design
 [design]: #detailed-design
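The final design can be tried out on stable Rust by mirroring it outside `std`. `std::hash::Hasher` has no `delimit` method, so this sketch makes several assumptions:

* `DelimitHasher` is a hypothetical local trait that holds the proposed `delimit` with its default body.
* `ProposedHash` is a hypothetical local trait that holds the proposed `str` and `[T]` impls.
* `OneShotHasher` is a hypothetical one-shot hasher.
* FNV-1a stands in for Farmhash.

`DefaultHasher` plays the streaming role and keeps the default `delimit`, while the one-shot hasher overrides it with a no-op.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `farmhash::hash64` (64-bit FNV-1a).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical local mirror of the proposed `Hasher::delimit`. The `Sized`
/// bound lets the default body pass `self` to `Hash::hash`.
trait DelimitHasher: Hasher + Sized {
    /// Default: a streaming hasher feeds the delimiter into the stream.
    fn delimit<T: Hash>(&mut self, delimiter: T) {
        delimiter.hash(self);
    }
}

/// Hypothetical local mirror of the proposed `Hash` impls for `str` and `[T]`.
trait ProposedHash {
    fn proposed_hash<H: DelimitHasher>(&self, state: &mut H);
}

impl ProposedHash for str {
    fn proposed_hash<H: DelimitHasher>(&self, state: &mut H) {
        state.write(self.as_bytes());
        state.delimit(0xff_u8);
    }
}

impl<T: Hash> ProposedHash for [T] {
    fn proposed_hash<H: DelimitHasher>(&self, state: &mut H) {
        state.delimit(self.len());
        Hash::hash_slice(self, state);
    }
}

// The SipHash-based `DefaultHasher` keeps the default, streaming `delimit`.
impl DelimitHasher for DefaultHasher {}

/// One-shot hasher: hashes the single content write and ignores delimiters.
#[derive(Default)]
struct OneShotHasher {
    result: u64,
}

impl Hasher for OneShotHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.result = fnv1a64(bytes);
    }

    fn finish(&self) -> u64 {
        self.result
    }
}

impl DelimitHasher for OneShotHasher {
    fn delimit<T: Hash>(&mut self, _delimiter: T) {
        // Nothing to do: a single value needs no delimiter.
    }
}

/// Streams two strings into one `DefaultHasher`, as a tuple would.
fn stream_pair(a: &str, b: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    a.proposed_hash(&mut hasher);
    b.proposed_hash(&mut hasher);
    hasher.finish()
}

fn one_shot_str(s: &str) -> u64 {
    let mut hasher = OneShotHasher::default();
    s.proposed_hash(&mut hasher);
    hasher.finish()
}

fn one_shot_bytes(bytes: &[u8]) -> u64 {
    let mut hasher = OneShotHasher::default();
    bytes.proposed_hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Streaming: the 0xff delimiters keep ("ab", "c") and ("a", "bc") apart.
    assert_ne!(stream_pair("ab", "c"), stream_pair("a", "bc"));
    // One-shot: only the content bytes reach the hash function.
    assert_eq!(one_shot_str("ab"), fnv1a64(b"ab"));
    assert_eq!(one_shot_bytes(&[1, 2, 3]), fnv1a64(&[1, 2, 3]));
}
```

Under these assumptions, the one-shot hasher sees exactly one content write for a `str` or a `[u8]`, because `hash_slice` writes an integer slice with a single `write`. The streaming hasher still receives the `0xff` separators that keep `("ab", "c")` and `("a", "bc")` apart.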