Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/promtail/client: Handle fingerprint hash collisions #1254

Merged
merged 2 commits into from
Nov 26, 2019

Conversation

pstibrany
Copy link
Member

What this PR does / why we need it:
Due to fingerprint hash collisions, Promtail client can mix entries for different labels into the same stream. This can lead to having entries with incorrect labels, and possible out of order errors.

Which issue(s) this PR fixes:
Not sure if there is issue for this.

Special notes for your reviewer:
I've used large prime number to find next possible fingerprint value. Any prime (except 2) would work here, as would simple increment. I wanted to avoid using next fingerprint (+1), because fast fingerprint creates similar values for similar labels.

Checklist

  • Documentation added (no extra documentation)
  • Tests updated

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution is pretty neat!

We have the same issue in pkg/logentry/stages/timestamp.go, but it may be a bit more more complicated to solve there using the same approach because mappings would last forever. Maybe we should see the performance penalty of using the slow fingerprinting there, before taking an informed decision?

@pstibrany
Copy link
Member Author

The solution is pretty neat!

We have the same issue in pkg/logentry/stages/timestamp.go, but it may be a bit more more complicated to solve there using the same approach because mappings would last forever. Maybe we should see the performance penalty of using the slow fingerprinting there, before taking an informed decision?

Thanks for taking your time for review. I wouldn’t be worried about “slow” fingerprint, but despite using it, there may still be collisions.

@pstibrany
Copy link
Member Author

We have the same issue in pkg/logentry/stages/timestamp.go ...

I think correct solution there is to implement similar remapping on collisions, and expire entries from lastKnownTimestamps and remapping map after some period of time since last use (say 30mins).

@pstibrany
Copy link
Member Author

I think correct solution there is to implement similar remapping on collisions, and expire entries from lastKnownTimestamps and remapping map after some period of time since last use (say 30mins).

Ah, I can see we already use golang-lru cache there. It can be configured with eviction handler, which would allow us to remove collision mapping entry as well.

Label strings are properly sorted key=value pairs. This solution
produces more garbage, but doesn't reinvent its own hashing, and keeps
code simple.
This seems to be a good tradeoff for promtail.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
@pstibrany
Copy link
Member Author

I've updated PR to not use fast fingerprint in promtail. We use labelset.String as map keys instead. While this produces extra garbage in promtail, it doesn't reinvent hash maps, keeps code simple and solves the collision problem.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer this last approach, being easier and more generally applicable. I don't see memory consumption issues in the Promtail client (because the batch lifecycle is short), while in the timestamp stage we do limit the last known timestamp to 10K entries: assuming a worst case scenario of 1KB each log stream label set, it's 10MB memory consumption, which doesn't look prohibitive to me.

@pstibrany
Copy link
Member Author

... assuming a worst case scenario of 1KB each log stream label set, it's 10MB memory consumption, which doesn't look prohibitive to me.

Thanks for your review! My concern is more about calling labelset.String() on every single log line. Not so much memory consumption, but memory allocation rate. That said, let's optimize it when it becomes a real problem.

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena cyriltovena merged commit 125cfbf into master Nov 26, 2019
@cyriltovena cyriltovena deleted the promtail-hash-collisions branch November 26, 2019 19:46
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Jun 11, 2021
…mily_error

Bigtable fix family already exists error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants