Hashing Performance Considerations (Allocations) #9

Open
solonovamax opened this issue Mar 4, 2025 · 1 comment
Comments

@solonovamax
Contributor

While implementing the two PRs that I just recently made, I noticed that hashing a ByteArray via ByteArray.hash(HasherFactory) performs some unnecessary allocations, such as allocating a new Hasher every time.

In performance-critical code this is undesirable.
Since the HasherFactory is guaranteed to always invoke .digest(), which finalizes the hash value, it might be best to pool the Hashers.

  • If the pool is empty, a new hasher is created via the val create: () -> Hasher property; otherwise, a hasher already in the pool is used instead.
  • When a hasher is used, it will be removed from the pool.
  • After digest() is called, reset() will always be called before the hasher is returned to the pool.
  • The pool could have a 'reasonable' upper cap on the number of entries; hashers returned while the pool is already at the cap would simply be discarded.
  • The pool would only be local to the HasherFactory, and any external users of it would have no awareness of it. operator fun invoke(): Hasher will continue to always return a new Hasher.
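
The bullets above could be sketched roughly as follows. This is only an illustration under assumptions: the Hasher interface, XorHasher, PooledHasherFactory, the maxPooled parameter and the created counter are hypothetical stand-ins, not the library's actual types or signatures, and this version is not thread-safe.

```kotlin
// Hypothetical stand-in for the library's Hasher; the real signatures may differ.
interface Hasher {
    fun update(data: ByteArray)
    fun digest(): ByteArray
    fun reset()
}

// Toy hasher that XORs all bytes together, only here to make the sketch runnable.
class XorHasher : Hasher {
    private var state = 0
    override fun update(data: ByteArray) { for (b in data) state = state xor b.toInt() }
    override fun digest(): ByteArray = byteArrayOf(state.toByte())
    override fun reset() { state = 0 }
}

// Hypothetical factory with a private, bounded pool. NOT thread-safe.
class PooledHasherFactory(val create: () -> Hasher, private val maxPooled: Int = 8) {
    private val pool = ArrayDeque<Hasher>()

    // Counts how many hashers the pooled path had to create (for illustration only).
    var created = 0
        private set

    // External callers always get a fresh, unpooled Hasher.
    operator fun invoke(): Hasher = create()

    fun digest(data: ByteArray): ByteArray {
        // Take a hasher out of the pool, or create one if the pool is empty.
        val hasher = pool.removeLastOrNull() ?: create().also { created++ }
        try {
            hasher.update(data)
            return hasher.digest()
        } finally {
            // Always reset before returning to the pool; discard if the pool is full.
            hasher.reset()
            if (pool.size < maxPooled) pool.addLast(hasher)
        }
    }
}
```

With this shape, repeated calls such as `PooledHasherFactory(::XorHasher).digest(bytes)` on one thread would reuse a single hasher instance, while `invoke()` stays allocation-per-call as before.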
@solonovamax solonovamax changed the title Hashing Performance Considerations Hashing Performance Considerations (Allocations) Mar 4, 2025
@soywiz
Member

soywiz commented Mar 4, 2025

HasherFactory.digest methods are guaranteed to end by calling digest(). They could also call the .reset method if required.
HasherFactory also exposes an invoke method that creates a new Hasher; that one shouldn't be pooled.

A private Pool implementation could be used like allocHasher { digest() }. It would need to be thread-safe, so atomicfu or per-platform code would be needed.
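
A minimal JVM-only sketch of what such an allocHasher helper might look like, assuming a plain lock instead of atomicfu. HasherPool, idleSize, maxPooled and the toy Hasher types are hypothetical names for illustration; multiplatform code would need atomicfu or expect/actual declarations in place of `synchronized`.

```kotlin
// Hypothetical stand-in for the library's Hasher; the real signatures may differ.
interface Hasher {
    fun update(data: ByteArray)
    fun digest(): ByteArray
    fun reset()
}

// Toy hasher (XOR of all bytes), only here to make the sketch runnable.
class XorHasher : Hasher {
    private var state = 0
    override fun update(data: ByteArray) { for (b in data) state = state xor b.toInt() }
    override fun digest(): ByteArray = byteArrayOf(state.toByte())
    override fun reset() { state = 0 }
}

// Hypothetical thread-safe pool. JVM-only: relies on kotlin.synchronized.
class HasherPool(private val maxPooled: Int, private val create: () -> Hasher) {
    private val lock = Any()
    private val idle = ArrayDeque<Hasher>()

    // Number of hashers currently waiting in the pool.
    fun idleSize(): Int = synchronized(lock) { idle.size }

    fun <T> allocHasher(block: (Hasher) -> T): T {
        // Hold the lock only while touching the deque, not while hashing.
        val hasher = synchronized(lock) { idle.removeLastOrNull() } ?: create()
        try {
            return block(hasher)
        } finally {
            // Reset outside the lock, then return to the pool or drop if at the cap.
            hasher.reset()
            synchronized(lock) { if (idle.size < maxPooled) idle.addLast(hasher) }
        }
    }
}
```

A call site could then read `pool.allocHasher { h -> h.update(data); h.digest() }`, with the hasher never escaping the block.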

In any case, for performance-critical code, you can still construct a Hasher and do the reset on that code, right?
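
For reference, the manual approach might look something like this: one Hasher is constructed up front and reset between inputs. The Hasher interface, XorHasher and hashAll here are hypothetical stand-ins for illustration, not the library's actual API.

```kotlin
// Hypothetical stand-in for the library's Hasher; the real signatures may differ.
interface Hasher {
    fun update(data: ByteArray)
    fun digest(): ByteArray
    fun reset()
}

// Toy hasher (XOR of all bytes), only here to make the sketch runnable.
class XorHasher : Hasher {
    private var state = 0
    override fun update(data: ByteArray) { for (b in data) state = state xor b.toInt() }
    override fun digest(): ByteArray = byteArrayOf(state.toByte())
    override fun reset() { state = 0 }
}

// Hashes every input with the same caller-owned hasher instead of allocating one per input.
fun hashAll(inputs: List<ByteArray>, hasher: Hasher): List<ByteArray> {
    val out = ArrayList<ByteArray>(inputs.size)
    for (input in inputs) {
        hasher.reset()          // clear any state left over from the previous input
        hasher.update(input)
        out += hasher.digest()
    }
    return out
}
```

The caller owns the hasher, so no pooling or synchronization is needed as long as it stays on one thread.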
