-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pkg/promtail/client: Handle fingerprint hash collisions #1254
Conversation
Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution is pretty neat!
We have the same issue in pkg/logentry/stages/timestamp.go
, but it may be a bit more more complicated to solve there using the same approach because mappings
would last forever. Maybe we should see the performance penalty of using the slow fingerprinting there, before taking an informed decision?
Thanks for taking your time for review. I wouldn’t be worried about “slow” fingerprint, but despite using it, there may still be collisions. |
I think correct solution there is to implement similar remapping on collisions, and expire entries from lastKnownTimestamps and remapping map after some period of time since last use (say 30mins). |
Ah, I can see we already use golang-lru cache there. It can be configured with eviction handler, which would allow us to remove collision mapping entry as well. |
Label strings are properly sorted key=value pairs. This solution produces more garbage, but doesn't reinvent its own hashing, and keeps code simple. This seems to be a good tradeoff for promtail. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
I've updated PR to not use fast fingerprint in promtail. We use labelset.String as map keys instead. While this produces extra garbage in promtail, it doesn't reinvent hash maps, keeps code simple and solves the collision problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do prefer this last approach, being easier and more generally applicable. I don't see memory consumption issues in the Promtail client (because the batch lifecycle is short), while in the timestamp
stage we do limit the last known timestamp to 10K entries: assuming a worst case scenario of 1KB each log stream label set, it's 10MB memory consumption, which doesn't look prohibitive to me.
Thanks for your review! My concern is more about calling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…mily_error Bigtable fix family already exists error
What this PR does / why we need it:
Due to fingerprint hash collisions, Promtail client can mix entries for different labels into the same stream. This can lead to having entries with incorrect labels, and possible out of order errors.
Which issue(s) this PR fixes:
Not sure if there is issue for this.
Special notes for your reviewer:
I've used large prime number to find next possible fingerprint value. Any prime (except 2) would work here, as would simple increment. I wanted to avoid using next fingerprint (+1), because fast fingerprint creates similar values for similar labels.
Checklist