promtail: added action_on_failure support to timestamp stage #1123
Conversation
I'm trying to think of other ways we might be able to implement this so we can include journal support. One way might be to add extra metadata into the entry handler so we can pass a unique identifier to the pipeline instead of having the pipeline figure it out per target. Then we could also configure the journal target to pass through the unit name as the unique ID, which should be good enough in most cases. This is just a rough idea though, WDYT?
Thanks @rfratto for the feedback. I do agree with you that guessing the ID in the timestamp stage is not the right approach.
The unique identifier passed through the pipeline has no real value until that identifier is added as the value of a label, because we need to guarantee uniqueness of the log streams pushed to Loki. So if we pick the unit name as the "unique stream ID" (within a promtail instance), we'll also need to guarantee the unit name is always added to the labels, in the same way we do with What do you think?
I think I'm following you here; is the issue that the unique identifier may not be pushed to Loki, causing a bad entry in the LRU cache? Overall I'm not sure why we couldn't just compute the fingerprint from the entire labelset and use that as the key into the LRU cache. It'd be the same mechanism the ingesters use to divide the streams to validate ordering, and I think it's fair to rely on the labelset the pipeline receives to be unique enough without having to explicitly check for a specific label (like filename, systemd unit, etc).
I'm a little hesitant about this. I do think users will almost always include the unit name as a label for journal targets, but I don't want to take away the choice of omitting it.
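The labelset-fingerprint idea above can be illustrated with a small self-contained sketch. Note this is only illustrative: the hashing scheme (FNV-64a over sorted name/value pairs) and the `fingerprint` helper are assumptions for the example, not the actual Prometheus/Loki fingerprint implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// fingerprint derives a per-stream key from the entire labelset rather
// than from a single label such as filename or unit: any difference in
// the labels yields a different key. This FNV-based version is only a
// self-contained sketch of the idea.
func fingerprint(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{0xff}) // separator to avoid ambiguous concatenations
		h.Write([]byte(labels[name]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func main() {
	a := fingerprint(map[string]string{"job": "systemd-journal", "unit": "nginx.service"})
	b := fingerprint(map[string]string{"job": "systemd-journal", "unit": "sshd.service"})
	fmt.Println(a != b) // different labelsets produce different stream keys
}
```

Keying the LRU cache on such a fingerprint is what makes the approach work for journal targets too, without requiring any one specific label to be present.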
Yes, correct.
I thought about it, but unless the user adds a "unique stream ID" (i.e. the unit name) as a label via the relabelling config, the label set may not guarantee stream uniqueness anyway, so we're shifting this responsibility to the user instead of enforcing it. Am I missing anything?
No, you're right. It's already the user's responsibility to guarantee uniqueness per file/unit if they're using a pipeline with a timestamp stage. I guess it's a question of whether or not it's a good idea to make it a hard requirement. I'm not sure I'm informed enough to have an opinion on this. We could make the unique identifier label configurable in the stage, but that might be over-engineering this. /cc @cyriltovena @slim-bean any thoughts?
I'm fine with this solution. AFAIK you can have pipelines with journald targets? Not sure what the problem is there.
@rfratto @cyriltovena Thanks for your feedback. I've switched the "stream ID" to "labels fingerprint", which is a better solution compared to the previous one.
My only concern is that, contrary to the file target (which always has the
LGTM. Pretty sure journald is fine: even if you have only one stream, it would still work.
What this PR does / why we need it:
Currently, when the timestamp stage fails to parse the timestamp of a log entry, it keeps the time at which the entry was scraped by Promtail. This may cause logs to fail to ingest due to out-of-order timestamps, because the "scrape timestamp" of an entry which failed parsing may be more recent than the "parsed timestamp" of a subsequent log entry.
In this PR I'm introducing the timestamp stage setting `action_on_failure`, which can be:

- `fudge` (new default): change the timestamp to the last known timestamp, adding 1 nanosecond
- `skip` (old behavior): do not change the timestamp and keep the time when the log entry has been scraped by Promtail

Unfortunately, just keeping an overall "last parsed timestamp" is not enough, because a single pipeline instance can process multiple targets (i.e. logs from multiple files), and log timestamps should be expected to be ordered per single file only. For this reason, I've introduced an LRU cache to keep the "last known timestamp" per target file (I used an LRU in order to cap the max number of entries stored). So far, I haven't found a good solution for the journal target: any suggestions?
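As a sketch, the new setting could be configured in a promtail pipeline as follows. The `action_on_failure` values are the ones introduced by this PR; the surrounding `json`/`timestamp` stage fields and the `ts` expression name are assumed example values, not taken from the PR.

```yaml
scrape_configs:
  - job_name: app
    pipeline_stages:
      - json:
          expressions:
            ts: timestamp
      - timestamp:
          source: ts
          format: RFC3339
          # New setting introduced by this PR:
          #   fudge (default): last known timestamp for the stream + 1ns
          #   skip:            keep the scrape time (old behavior)
          action_on_failure: fudge
```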
Which issue(s) this PR fixes:
Fixes #1093
Special notes for your reviewer:
I'm using `github.com/hashicorp/golang-lru` for the LRU, which is already a dependency of other dependencies we have.