Add std::hash<podio::ObjectID> specialization #733
I am wondering whether this has the properties needed for use in a hash map, e.g. does it distribute the hashes widely enough, and how likely are collisions?
I suppose a theoretical estimate of those is kind of hard to come by, but I am wondering whether we should add some checks to catch at least some of the potentially subtle issues.
For example, the set of distinct collectionIDs that we get is fairly finite, and I think we went through the exercise in #412 of collecting a reasonable superset of the ones we have encountered so far. We could combine those with a reasonable range of index values to at least test for collisions in that range, and maybe even check load factors for unordered_map. However, I am not entirely sure whether the range that we could cover would make for a meaningful test here.
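A minimal sketch of the check suggested above, assuming a stand-in ObjectID with the same two members (int index, uint32_t collectionID) instead of the real podio header. It inserts every (collectionID, index) pair over a chosen index range into an unordered_map and reports full-hash collisions, the load factor, and the largest bucket. The names checkCollisions and CollisionReport are hypothetical; the collectionIDs gathered in #412 would be the natural input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for podio::ObjectID (assumed members: int index,
// uint32_t collectionID); the real type is declared in podio/ObjectID.h.
struct ObjectID {
  int index;
  std::uint32_t collectionID;
  bool operator==(const ObjectID& other) const {
    return index == other.index && collectionID == other.collectionID;
  }
};

namespace std {
// Same XOR combination as the specialization proposed in this PR.
template <>
struct hash<ObjectID> {
  std::size_t operator()(const ObjectID& id) const noexcept {
    auto hash_collectionID = std::hash<std::uint32_t>{}(id.collectionID);
    auto hash_index = std::hash<int>{}(id.index);
    return hash_collectionID ^ hash_index;
  }
};
} // namespace std

struct CollisionReport {
  std::size_t keys = 0;           // distinct (collectionID, index) pairs inserted
  std::size_t distinctHashes = 0; // distinct full hash values among those keys
  float loadFactor = 0.0f;        // unordered_map::load_factor() after all insertions
  std::size_t maxBucketSize = 0;  // number of elements in the most crowded bucket
};

// Insert every (collectionID, index) pair with 0 <= index < maxIndex into an
// unordered_map and report how well the hash spreads them.
// keys - distinctHashes is the number of full-hash collisions in that range.
CollisionReport checkCollisions(const std::vector<std::uint32_t>& collectionIDs, int maxIndex) {
  assert(maxIndex >= 0);
  std::unordered_set<std::size_t> hashes;
  std::unordered_map<ObjectID, int> map;
  for (const auto collectionID : collectionIDs) {
    for (int i = 0; i < maxIndex; ++i) {
      const ObjectID id{i, collectionID};
      hashes.insert(std::hash<ObjectID>{}(id));
      map.emplace(id, i);
    }
  }
  CollisionReport report;
  report.keys = map.size();
  report.distinctHashes = hashes.size();
  report.loadFactor = map.load_factor();
  for (std::size_t b = 0; b < map.bucket_count(); ++b) {
    if (map.bucket_size(b) > report.maxBucketSize) {
      report.maxBucketSize = map.bucket_size(b);
    }
  }
  return report;
}
```

For example, checkCollisions({0x12345678u, 0x9abcdef0u}, 1000) inserts 2000 keys; those two IDs are made up for illustration. Note that load_factor() is just element count over bucket count and says nothing about hash quality, so the largest bucket size is the more telling number for how evenly keys are spread.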
include/podio/ObjectID.h (outdated)
{
  std::size_t operator()(const podio::ObjectID& id) const noexcept
  {
    auto hash_collectionID = std::hash<uint32_t>{}(id.collectionID);
Technically, the collectionID is (almost certainly) already a 32-bit hash.
I'm not sure how that helps here. I assume hashing is still better for covering the domain of size_t.
In our typical applications (different elements of the same collection), it shouldn't be any different from defining …
Follow-up question after the EDM4hep meeting today: would it be of general interest to also have the generated user-facing objects, e.g. …
Yes, that's desirable.
Branch updated from 789f3c2 to b1307f1.
Merging this now. Support for hashing handles is coming with #738.
include/podio/ObjectID.h (outdated)
    auto hash_collectionID = std::hash<uint32_t>{}(id.collectionID);
    auto hash_index = std::hash<int>{}(id.index);

    return hash_collectionID ^ hash_index;
I think XOR is fine for our usage. Just for completeness, a more sound approach would be something like boost::hash_combine (… and about design), but implementing it just for this usage is overkill.
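For comparison, a minimal sketch of the hash_combine route without a Boost dependency. The mixing step is the long-standing boost::hash_combine formula (newer Boost releases use a different mixing function), and ObjectID is again a hypothetical stand-in for podio::ObjectID. Unlike plain XOR, the result depends on which member is combined first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for podio::ObjectID (assumed members: int index,
// uint32_t collectionID).
struct ObjectID {
  int index;
  std::uint32_t collectionID;
  bool operator==(const ObjectID& other) const {
    return index == other.index && collectionID == other.collectionID;
  }
};

// The long-standing boost::hash_combine mixing step, reimplemented here to
// avoid pulling in Boost (newer Boost releases use a different mix).
inline void hash_combine(std::size_t& seed, std::size_t value) noexcept {
  seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

namespace std {
// Alternative specialization: order-dependent combination instead of plain XOR.
template <>
struct hash<ObjectID> {
  std::size_t operator()(const ObjectID& id) const noexcept {
    std::size_t seed = 0;
    ::hash_combine(seed, std::hash<std::uint32_t>{}(id.collectionID));
    ::hash_combine(seed, std::hash<int>{}(id.index));
    return seed;
  }
};
} // namespace std
```

With identity integer hashes (as in libstdc++ and libc++), the XOR version maps (collectionID=2, index=1) and (collectionID=1, index=2) to the same value, 3, whereas this version tells them apart. Whether that matters in practice depends on the collectionID distribution discussed in this thread.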
Yeah, that was my original concern, because I knew about all the things in Boost, but I didn't want to pull in a Boost dependency just for this. In any case, I think we can merge this as is: #738 will probably be the more widely used feature, and if we run into issues with this one, we can change the implementation for ObjectID pretty easily without affecting interfaces or anything that is persisted.
Yes, that was considered, and perhaps I should have left a comment.
The collectionID is expected to be either a negative, non-unique constant or a very large, semi-unique number (already a hash), so it is up to the hash of the index (small values) to provide uniqueness. For that reason, a simple XOR without shifts was used.
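The reasoning above can be made concrete with a small sketch. It works on the raw member values and assumes that std::hash<uint32_t> and std::hash<int> are the identity, which holds for libstdc++ and libc++ but is not required by the C++ standard; xorCombine and collidingCollectionID are hypothetical helper names, and only non-negative indices are considered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// XOR of the raw member values. This matches the PR's std::hash<podio::ObjectID>
// only where std::hash<std::uint32_t> and std::hash<int> are the identity
// (libstdc++, libc++); the C++ standard does not require that.
inline std::size_t xorCombine(std::uint32_t collectionID, int index) {
  assert(index >= 0);  // the argument is about non-negative indices
  return static_cast<std::size_t>(collectionID) ^ static_cast<std::size_t>(index);
}

// For a fixed collectionID, x -> collectionID ^ x is a bijection, so two
// different indices in the same collection never collide. Across collections,
// (c ^ i ^ j, j) collides with (c, i) for any j, because XOR is its own
// inverse. This helper returns that colliding collectionID.
inline std::uint32_t collidingCollectionID(std::uint32_t collectionID, int index, int otherIndex) {
  assert(index >= 0 && otherIndex >= 0);
  return collectionID ^ static_cast<std::uint32_t>(index) ^ static_cast<std::uint32_t>(otherIndex);
}
```

A collision between two collections therefore requires c ^ c' == i ^ j. If all indices are below 2^k, the two collectionIDs must agree in every bit from bit k upward, which is unlikely for any given pair of IDs that behave like random 32-bit hashes; that is the property the XOR without shifts relies on.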
BEGINRELEASENOTES
std::hash<podio::ObjectID> specialization to allow std::unordered_map<podio::ObjectID, T>
ENDRELEASENOTES
This helps with writing algorithms based around PODIO types. Perhaps we could next provide hash definitions for PODIO objects themselves.
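To illustrate what the release note enables, a minimal sketch of such an algorithm. ObjectID and its std::hash specialization are hypothetical stand-ins so the snippet compiles without podio (with podio itself one would include podio/ObjectID.h instead), and sumWeights is an invented example function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for podio::ObjectID (assumed members: int index,
// uint32_t collectionID).
struct ObjectID {
  int index;
  std::uint32_t collectionID;
  bool operator==(const ObjectID& other) const {
    return index == other.index && collectionID == other.collectionID;
  }
};

namespace std {
// Mirrors the XOR-based specialization added in this PR.
template <>
struct hash<ObjectID> {
  std::size_t operator()(const ObjectID& id) const noexcept {
    return std::hash<std::uint32_t>{}(id.collectionID) ^ std::hash<int>{}(id.index);
  }
};
} // namespace std

// Hypothetical algorithm: sum up a weight per referenced object, e.g. while
// following links from one collection into another.
std::unordered_map<ObjectID, double> sumWeights(const std::vector<std::pair<ObjectID, double>>& links) {
  std::unordered_map<ObjectID, double> totals;  // picks up std::hash<ObjectID> automatically
  for (const auto& [id, weight] : links) {
    totals[id] += weight;  // value-initialized to 0.0 on first access
  }
  return totals;
}
```

No custom hasher template argument is needed on the map, which is the convenience the specialization provides.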