<functional>: better hash function
#2360
#686 (comment) mentions SipHash-2-4 |
I did some benchmarks based on https://github.com/rurban/smhasher
While SipHash seems promising in preventing hash flood attacks, the actual paper describes it as fast in relation to other MACs and not other PRFs. The claim that it's fast for short messages in general appears to be an overstatement by easily 4x. Considering things like VSO-653642 (complaint about 2x |
Thanks - the SipHash perf data is definitely news to me, so we'll need to consider the available options carefully. |
I would consider also looking at SipHash13 rust-lang/rust#29754 (comment) |
@BillyONeal also mentioned a security consideration:
But I'm not sure if this can be implemented with insignificant perf impact. |
I would argue that the default option, the one users get when they don't know any better, should be the non-exploitable one, even at a performance cost. Moreover, the resulting strength of the hash may help some programs by reducing collisions vs. other hash functions, although it will not for very small keys in relatively small containers like 127 int32s. If you want the absolute fastest of juggling razor blades fast, |
@cbezault For short inputs the performance difference between SipHash-2-4 and SipHash-1-3 is minimal. I've updated the table above. @BillyONeal Good point. I agree that a better but slower hash function should be preferred as the default for the stdlib. However, SipHash isn't a stronger hash function than the alternatives, and its collision rate is the same as that of any other function that more or less passes the smhasher suite. Personally, though, I'm still against using SipHash as the default for this STL, at least according to what I know at the moment. I believe the vast majority of code written with this STL is end-user desktop applications, not publicly exposed web servers. In fact, I'd explicitly argue that the number of web servers written in C++ and running on Windows is quite low. While the safety promises of SipHash are nice, they don't justify the increased compute cost for applications that don't need any such safety. Before adopting a new hash function, I believe there is a large number of alternatives we should consider first. |
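For readers unfamiliar with the two variants being compared, here is a minimal sketch of SipHash with the round counts as template parameters, so `siphash<2, 4>` and `siphash<1, 3>` differ only in how many rounds run per block and at finalization. This is an illustration written from the published algorithm, not code from this STL or from smhasher; the `sketch` namespace, `sip_state`, and `keyed_string_hash` are hypothetical names, and a real hasher would need its key drawn from a per-process random source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

namespace sketch {  // hypothetical namespace, for illustration only

inline std::uint64_t rotl64(std::uint64_t x, int b) {
    return (x << b) | (x >> (64 - b));
}

// Read n (0..8) bytes as a little-endian integer, independent of host endianness.
inline std::uint64_t load_le(const unsigned char* p, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    }
    return v;
}

struct sip_state {
    std::uint64_t v0, v1, v2, v3;

    // One SipRound: add, rotate, xor on the four 64-bit lanes.
    void round() {
        v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);
        v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;
        v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;
        v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);
    }
};

// C = rounds per 8-byte block, D = finalization rounds.
template <int C, int D>
std::uint64_t siphash(const void* data, std::size_t len,
                      std::uint64_t k0, std::uint64_t k1) {
    assert(data != nullptr || len == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    sip_state s{k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
                k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL};

    const std::size_t tail = len % 8;
    const unsigned char* end = p + (len - tail);
    for (; p != end; p += 8) {
        const std::uint64_t m = load_le(p, 8);
        s.v3 ^= m;
        for (int i = 0; i < C; ++i) s.round();
        s.v0 ^= m;
    }

    // Last block: leftover bytes plus the length (mod 256) in the top byte.
    const std::uint64_t b =
        (static_cast<std::uint64_t>(len) << 56) | load_le(p, tail);
    s.v3 ^= b;
    for (int i = 0; i < C; ++i) s.round();
    s.v0 ^= b;

    s.v2 ^= 0xff;
    for (int i = 0; i < D; ++i) s.round();
    return s.v0 ^ s.v1 ^ s.v2 ^ s.v3;
}

// Hypothetical opt-in hasher; k0/k1 are assumed to be seeded randomly elsewhere.
struct keyed_string_hash {
    std::uint64_t k0, k1;
    std::size_t operator()(const std::string& s) const {
        return static_cast<std::size_t>(siphash<1, 3>(s.data(), s.size(), k0, k1));
    }
};

}  // namespace sketch
```

Whatever the default ends up being, a container can already opt into a keyed hash this way, for example `std::unordered_map<std::string, int, sketch::keyed_string_hash>` constructed with a `keyed_string_hash{k0, k1}` instance.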
I agree that we'll need to explore the increasingly large space of alternatives (when vNext work resumes). On the STL Discord, I observed that C++'s concerns are somewhat different from Python's; notably, we don't use dictionaries as our universal data structure (so the impact of a hash function choice is dramatically smaller), and |
@lhecker observes FNV1a is bad for integers.
https://discord.com/channels/737189251069771789/737189251497721887/913600700846596147
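For reference, a minimal sketch of 64-bit FNV-1a, using the standard offset basis and prime. It is not this STL's implementation, and the `sketch::fnv1a_64` name is hypothetical. It does not reproduce the analysis behind the linked observation; it only shows the shape of the algorithm: input is consumed one byte at a time, and each step depends on the previous one, so even a 4-byte integer key goes through four sequential xor-and-multiply steps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical namespace, for illustration only

// 64-bit FNV-1a over an arbitrary byte range.
inline std::uint64_t fnv1a_64(const void* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];               // mix in one byte
        h *= 1099511628211ULL;   // multiply by the FNV prime (mod 2^64)
    }
    return h;
}

}  // namespace sketch
```

Hashing an integer with this sketch means passing its object representation, e.g. `sketch::fnv1a_64(&value, sizeof value)`, so the result depends on host byte order.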
vNext note: Resolving this issue will require breaking binary compatibility. We won't be able to accept pull requests for this issue until the vNext branch is available. See #169 for more information.
Also tracked by VSO-371547 / AB#371547.