Use `usize` instead of `u64` for hashes in HashMap #36567

bluss · 2016-09-18T19:11:54Z

Just a note on the current implementation which stores the hash values per bucket as u64. We can't use more than usize's bits of a hash to select a bucket anyway, so we only need to store that part in the table. This would be an improvement on 32-bit platforms to make the hashmap's data structures smaller, hopefully improving cache utilization during insert and lookup.
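A minimal sketch of the argument above, assuming a power-of-two table that picks a bucket by masking the hash (the helper names `stored_hash` and `bucket_index` are hypothetical, not the actual `std` internals). Truncating the 64-bit hash to `usize` before masking selects the same bucket as masking the full value, because the mask itself never has more than `usize` bits.

```rust
/// Hypothetical helper: the part of a 64-bit hash worth keeping per bucket.
/// On a 32-bit target this drops the upper 32 bits; on 64-bit it is a no-op.
pub fn stored_hash(full: u64) -> usize {
    full as usize
}

/// Hypothetical helper: choose a bucket in a table whose capacity is a
/// power of two (an assumption of this sketch).
pub fn bucket_index(stored: usize, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    // The mask has at most usize::BITS bits, so any hash bits above that
    // could never influence the result.
    stored & (capacity - 1)
}
```

Calling `bucket_index(stored_hash(h), capacity)` gives the same index as masking the full `u64` first, while each stored hash takes `size_of::<usize>()` bytes rather than 8.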

eddyb · 2016-09-19T21:53:48Z

cc @rust-lang/compiler This might be relevant to rustc, although as tempting as x32-style schemes may seem, they're not that portable. Still, it would be interesting to see if we can get better perf on Linux.

bluss · 2016-09-20T07:43:44Z

I've made the PR #36595. Can anyone test it to see what effect it has on bootstrap on 32-bit?

bluss · 2016-09-20T07:53:01Z

If we're hunting hashmap improvements, maybe one should even get ordermap into finished shape, and use rustc as a benchmark for it.

arthurprs · 2016-09-20T13:26:24Z

We probably want to investigate switching the layout to HHHHHHKVKVKVKVKVKV as well, that will definitely improve the iteration and growing speed. There was an attempt here #21973
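To make the layout notation concrete, here is a rough sketch of the two arrangements, assuming the existing table keeps hashes, keys, and values in three separate arrays. The types `SplitTable` and `PairTable` are hypothetical, and `Vec<Option<..>>` stands in for the raw allocation a real table would use; a hash of 0 marks an empty bucket in this sketch only.

```rust
/// Hypothetical "HHHH KKKK VVVV" table: three parallel arrays.
pub struct SplitTable<K, V> {
    pub hashes: Vec<usize>, // 0 marks an empty bucket in this sketch
    pub keys: Vec<Option<K>>,
    pub vals: Vec<Option<V>>,
}

/// Hypothetical "HHHH KVKVKV" table: hashes first, then key/value pairs.
pub struct PairTable<K, V> {
    pub hashes: Vec<usize>,
    pub pairs: Vec<Option<(K, V)>>,
}

impl<K, V> SplitTable<K, V> {
    /// Iteration walks three arrays in lockstep.
    pub fn iter(&self) -> impl Iterator<Item = (&K, &V)> + '_ {
        self.hashes
            .iter()
            .zip(&self.keys)
            .zip(&self.vals)
            .filter(|((h, _), _)| **h != 0)
            .filter_map(|((_, k), v)| Some((k.as_ref()?, v.as_ref()?)))
    }
}

impl<K, V> PairTable<K, V> {
    /// Iteration walks two arrays; each key sits next to its value.
    pub fn iter(&self) -> impl Iterator<Item = (&K, &V)> + '_ {
        self.hashes
            .iter()
            .zip(&self.pairs)
            .filter(|(h, _)| **h != 0)
            .filter_map(|(_, kv)| kv.as_ref().map(|(k, v)| (k, v)))
    }
}
```

Both iterators yield the same entries. The difference is only in memory traffic: the pair layout reads a key and its value from adjacent memory, which is the plausible source of the iteration and growth gains mentioned above.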

arthurprs · 2016-09-20T21:30:37Z

Regarding the previous comment, I started experimenting in this branch: arthurprs/hashmap2@e18a323. I'm a bit tired, so I can't tell why insertions are quite a bit slower; lookups are slightly faster.

arthurprs · 2016-09-21T14:45:50Z

-- removed bench

bluss · 2016-09-22T14:20:52Z

That's very impressive @arthurprs, any improvement like that must be welcome. Any downsides that you know of? The previous try stalled on code style / implementation details.

The simple u64 → usize PR can be merged long before that, since it's a simple change.

bluss · 2016-09-22T14:21:54Z

The lookup_100_000_multi_10p performance drop of course attracts some attention, is that something that's consistent between runs? Maybe there's a cache use change that can explain it.

bluss · 2016-09-22T14:23:59Z

Oh right, one drawback that was mentioned is that the pair is stored as a (K, V) tuple, which loses space to alignment padding, especially if K and V have very different sizes.
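The padding cost can be measured directly with `std::mem::size_of`. This is only an illustrative sketch; the two helper functions are hypothetical names, and exact sizes depend on the target's alignment rules.

```rust
use std::mem::size_of;

/// Bytes per entry when key and value are stored together as `(K, V)`.
/// The tuple's size is rounded up to a multiple of its alignment.
pub fn interleaved_entry_size<K, V>() -> usize {
    size_of::<(K, V)>()
}

/// Bytes per entry when keys and values live in separate arrays,
/// where no padding is needed between a key and its value.
pub fn split_entry_size<K, V>() -> usize {
    size_of::<K>() + size_of::<V>()
}
```

For example, with `K = u8` and `V = u32` the interleaved entry is larger than the 5 bytes the split layout needs, while with `K = V = u32` the two layouts use the same space.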

arthurprs · 2016-09-22T19:02:17Z

@bluss yeah I'm not sure that's noise.

Since Robin Hood hashing puts a good bound on the probing distance, I think it may be a gain even if it wastes space on padding. I'll add a bench for that.

arthurprs · 2016-09-22T21:12:11Z

I was a little skeptical, so I made a round of benchmark improvements in arthurprs/hashmap2@f8bd579. I tried to make sure the value type was not `()` and that the value is actually accessed in the lookup benchmarks. I think we can all agree that's more realistic. To my disbelief, the improvements are real.

removed bench

bluss · 2016-09-22T21:30:23Z

It would make sense for .get(k).cloned() lookup to favour the kvkvkv layout more (at least more there than..) and .contains_key(k) for the kkkvvv layout, due to caching effects.
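The two access patterns being compared look like this with the public `std::collections::HashMap` API. The wrapper functions and the `u32`/`String` types are just placeholders for illustration.

```rust
use std::collections::HashMap;

/// Reads the value after the key matches, so the lookup touches both the
/// key and the value. Keeping them adjacent (kvkvkv) should help here.
pub fn lookup_value(map: &HashMap<u32, String>, k: u32) -> Option<String> {
    map.get(&k).cloned()
}

/// Only compares keys and never reads the value, so densely packed keys
/// (kkkvvv) should make better use of each cache line here.
pub fn has_key(map: &HashMap<u32, String>, k: u32) -> bool {
    map.contains_key(&k)
}
```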

alexcrichton · 2016-09-29T00:39:04Z

Discussed at libs triage the other day, our thoughts are on the PR.

hashmap: Store hashes as usize internally

We can't use more than usize's bits of a hash to select a bucket anyway, so we only need to store that part in the table. This should be an improvement for the size of the data structure on 32-bit platforms. Smaller data means better cache utilization and hopefully better performance.

Fixes #36567

bluss added the A-collections Area: `std::collection` label Sep 18, 2016

eddyb added I-nominated T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 19, 2016

bluss mentioned this issue Sep 20, 2016

hashmap: Store hashes as usize internally #36595

Merged

arthurprs mentioned this issue Sep 22, 2016

Revisit HashMap memory layout #36660

Closed

alexcrichton removed the I-nominated label Sep 29, 2016

arthurprs mentioned this issue Oct 17, 2016

Replace FNV with a faster hash function. #37229

Merged

bors closed this as completed in #36595 Nov 1, 2016
