Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use usize instead of u64 for hashes in HashMap #36567

Closed
bluss opened this issue Sep 18, 2016 · 13 comments
Closed

Use usize instead of u64 for hashes in HashMap #36567

bluss opened this issue Sep 18, 2016 · 13 comments
Labels
A-collections Area: `std::collection` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@bluss
Copy link
Member

bluss commented Sep 18, 2016

Just a note on the current implementation which stores the hash values per bucket as u64. We can't use more than usize's bits of a hash to select a bucket anyway, so we only need to store that part in the table. This would be an improvement on 32-bit platforms to make the hashmap's data structures smaller, hopefully improving cache utilization during insert and lookup.

@bluss bluss added the A-collections Area: `std::collection` label Sep 18, 2016
@eddyb eddyb added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 19, 2016
@eddyb
Copy link
Member

eddyb commented Sep 19, 2016

cc @rust-lang/compiler This might be relevant to rustc, although as tempting as x32-style schemes may seem, they're not that portable. Still, it would be interesting to see if we can better perf on linux.

@bluss
Copy link
Member Author

bluss commented Sep 20, 2016

I've made the PR #36595. Can anyone test see what effect it has on bootstrap on 32-bit?

@bluss
Copy link
Member Author

bluss commented Sep 20, 2016

If we're hunting hashmap improvements, maybe one should even get ordermap into finished shape, and use rustc as a benchmark for it.

@arthurprs
Copy link
Contributor

We probably want to investigate switching the layout to HHHHHHKVKVKVKVKVKV as well, that will definitely improve the iteration and growing speed. There was an attempt here #21973

@arthurprs
Copy link
Contributor

arthurprs commented Sep 20, 2016

regarding the previous, I started experimenting in this branch arthurprs/hashmap2@e18a323 I'm a bit tired so I can't tell why insertions are quite slower, lookups are slight faster.

@arthurprs
Copy link
Contributor

arthurprs commented Sep 21, 2016

-- removed bench

@bluss
Copy link
Member Author

bluss commented Sep 22, 2016

That's very impressive @arthurps, any improvement like that must be welcome. Any downsides that you know of? The previous try stalled on code style / implementation details.

The simple u64 → usize PR be merged long before, since it's a simple change.

@bluss
Copy link
Member Author

bluss commented Sep 22, 2016

The lookup_100_000_multi_10p performance drop of course attracts some attention, is that something that's consistent between runs? Maybe there's a cache use change that can explain it.

@bluss
Copy link
Member Author

bluss commented Sep 22, 2016

Oh right, one drawback that was mentioned is that (K, V) is stored like that, which leads to losses to alignment especially if K and V are very different.

@arthurprs
Copy link
Contributor

@bluss yeah I'm not sure that's noise.

Since Robin hood puts a good bound on the probing distance I think it may be a gain even if it wastes space in padding. I'll add a bench for that.

@arthurprs
Copy link
Contributor

arthurprs commented Sep 22, 2016

I was a little skeptical so I made a round of benchmark improvements arthurprs/hashmap2@f8bd579. I tried to make sure value was not a () and that it's actually accessed on lookup benchmarks. I think we can all agree that's more realistic. To my disbelief the improvements are real.

removed bench

@bluss
Copy link
Member Author

bluss commented Sep 22, 2016

It would make sense for .get(k).cloned() lookup to favour the kvkvkv layout more (at least more there than..) and .contains_key(k) for the kkkvvv layout, due to caching effects.

@alexcrichton
Copy link
Member

Discussed at libs triage the other day, our thoughts are on the PR.

bors added a commit that referenced this issue Nov 1, 2016
hashmap: Store hashes as usize internally

We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.

Fixes #36567
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants