Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add std::hash<podio::ObjectID> specialization #733

Merged
merged 2 commits into from
Feb 19, 2025

Conversation

veprbl
Copy link
Contributor

@veprbl veprbl commented Feb 11, 2025

BEGINRELEASENOTES

  • Added std::hash<podio::ObjectID> specialization to allow std::unordered_map<podio::ObjectID, T>

ENDRELEASENOTES

This helps with writing algorithms based around PODIO types. Perhaps we could next provide hash definitions for PODIO objects themselves.

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether this has the necessary properties that should be met for usage in a hash map, e.g. does it distribute the hashes wide enough and how likely are collisions?

I suppose a theoretical estimate of those is kind of hard to come by, but I am wondering whether we should put in some checks to avoid at least some of the potentially subtle issues.

For example, since the set of distinct collectionIDs that we get is fairly finite and I think we went through the exercise in #412 of collecting a reasonable superset of the ones we have encountered so far. We could use a reasonable range of index values to at least test for collisions in that range, and maybe even check load factors for unordered_map. However, I am not entirely sure whether the range that we could cover can serve as a reasonable test here.

{
std::size_t operator()(const podio::ObjectID& id) const noexcept
{
auto hash_collectionID = std::hash<uint32_t>{}(id.collectionID);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically the collectionID is (almost certainly) already a 32 bit hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how that helps here. I assume, hash is still better to cover domain of size_t.

@veprbl
Copy link
Contributor Author

veprbl commented Feb 12, 2025

I am wondering whether this has the necessary properties that should be met for usage in a hash map, e.g. does it distribute the hashes wide enough and how likely are collisions?

In our typical applications (different elements of the same collection), it shouldn't be any different from defining std::unordered_map<int, T>.

@tmadlener
Copy link
Collaborator

Follow up question after the EDM4hep meeting today. Would it be of general interest to also have the generated user facing objects, e.g. MCParticle, have a specialized std::hash? We currently have operator< to make std::map or std::set usable but it should be possible to make a std::hash spezialization available as well if that is something that is considered as generally useful.

@veprbl
Copy link
Contributor Author

veprbl commented Feb 18, 2025

Yes, that's desirable.

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging this now. Support for hashing handles coming with #738

auto hash_collectionID = std::hash<uint32_t>{}(id.collectionID);
auto hash_index = std::hash<int>{}(id.index);

return hash_collectionID ^ hash_index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think XOR is fine for our usage. Just for completeness, a more sound approach would be something like boost::hash_combine ( and about design) but implementing this just for this usage is an overkill

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that was my original concern, because I knew about all the things in boost, but didn't want to pull in a boost dependency just for this. In any case, I think we can merge this as is, as #738 will probably be the more used thing and if we realize issues with this we can change the implementation for ObjectID pretty easily without affecting interfaces or anything that is persisted.

Copy link
Contributor Author

@veprbl veprbl Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was considered, and, perhaps, I should have left a comment.
The collectionID is expected to be either a negative non-unique constant or a very high semi-unique number (already a hash), which makes it up to hash of index (small number values) to provide uniqueness. For that reason a simple xor without shifts was employed.

@tmadlener tmadlener merged commit 33a0676 into AIDASoft:master Feb 19, 2025
19 checks passed
@veprbl veprbl deleted the ObjectID_std_hash branch February 19, 2025 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants