Add std::hash<podio::ObjectID> specialization #733

veprbl · 2025-02-11T22:10:20Z

BEGINRELEASENOTES

Added a std::hash<podio::ObjectID> specialization to allow using std::unordered_map<podio::ObjectID, T>

ENDRELEASENOTES

This helps with writing algorithms based around PODIO types. Perhaps we could next provide hash definitions for PODIO objects themselves.

tmadlener

I am wondering whether this has the necessary properties that should be met for usage in a hash map, e.g. does it distribute the hashes wide enough and how likely are collisions?

I suppose a theoretical estimate of those is kind of hard to come by, but I am wondering whether we should put in some checks to avoid at least some of the potentially subtle issues.

For example, the set of distinct collectionIDs that we get is fairly finite, and I think we went through the exercise in #412 of collecting a reasonable superset of the ones we have encountered so far. We could use a reasonable range of index values to at least test for collisions in that range, and maybe even check load factors for unordered_map. However, I am not entirely sure whether the range that we could cover can serve as a reasonable test here.

tmadlener · 2025-02-12T12:33:50Z

include/podio/ObjectID.h

+{
+    std::size_t operator()(const podio::ObjectID& id) const noexcept
+    {
+        auto hash_collectionID = std::hash<uint32_t>{}(id.collectionID);


Technically, the collectionID is (almost certainly) already a 32-bit hash.

veprbl

I'm not sure how that helps here. I assume hashing is still better for covering the domain of size_t.

veprbl · 2025-02-12T18:09:20Z

I am wondering whether this has the necessary properties that should be met for usage in a hash map, e.g. does it distribute the hashes wide enough and how likely are collisions?

In our typical applications (different elements of the same collection), it shouldn't be any different from defining std::unordered_map<int, T>.

tmadlener · 2025-02-18T09:39:06Z

Follow-up question after the EDM4hep meeting today: would it be of general interest to also have a std::hash specialization for the generated user-facing objects, e.g. MCParticle? We currently have operator< to make std::map or std::set usable, but it should be possible to make a std::hash specialization available as well if that is considered generally useful.

veprbl · 2025-02-18T09:54:59Z

Yes, that's desirable.

tmadlener

Merging this now. Support for hashing handles is coming with #738.

m-fila · 2025-02-18T10:27:18Z

include/podio/ObjectID.h

+        auto hash_collectionID = std::hash<uint32_t>{}(id.collectionID);
+        auto hash_index = std::hash<int>{}(id.index);
+
+        return hash_collectionID ^ hash_index;


I think XOR is fine for our usage. Just for completeness, a more sound approach would be something like boost::hash_combine (… and about design), but implementing it just for this usage is overkill.

tmadlener

Yeah, that was my original concern: I knew about all the things in Boost, but didn't want to pull in a Boost dependency just for this. In any case, I think we can merge this as is, since #738 will probably be the more widely used feature, and if we run into issues with this we can change the implementation for ObjectID pretty easily without affecting interfaces or anything that is persisted.

veprbl

Yes, that was considered, and perhaps I should have left a comment.
The collectionID is expected to be either a negative non-unique constant or a very large semi-unique number (already a hash), which leaves it to the hash of the index (small values) to provide uniqueness. For that reason, a simple XOR without shifts was used.

veprbl mentioned this pull request Feb 11, 2025

Update TrackClusterMergeSplitter to output track-cluster associations (PFA0) eic/EICrecon#1699

tmadlener reviewed Feb 12, 2025

m-fila mentioned this pull request Feb 19, 2025

add std::hash for datatype objects, interfaces and links #738

veprbl and others added 2 commits February 19, 2025 10:24

Add std::hash<podio::ObjectID> specialization

cecfc88

Fix formatting

b1307f1

tmadlener force-pushed the ObjectID_std_hash branch from 789f3c2 to b1307f1 February 19, 2025 09:24

tmadlener approved these changes Feb 19, 2025

m-fila reviewed Feb 19, 2025

tmadlener merged commit 33a0676 into AIDASoft:master Feb 19, 2025
19 checks passed

veprbl deleted the ObjectID_std_hash branch February 19, 2025 16:44
