Add non-indexed metadata to chunks #9700
Conversation
Skimmed, approach looks good to me. Left a few comments and happy to give a closer review+approval when you're ready. Would like to see some refactoring in bufferedIterator.moveNext
if possible.
pkg/chunkenc/memchunk.go
Outdated
BytesBufferPool.Put(si.metaLabelsBuf[i])
	}
}
si.metaLabelsBuf = make([][]byte, nLabels*2)
See how the other pools are created and use the "github.com/prometheus/prometheus/util/pool" pkg, which supports different sub-pools by length categories. That allows better reuse than creating a slice here.
Done.
I set a maximum size of 256, i.e. max of 128 meta labels. That should be more than enough, but IIUC, we should now set a hard limit on this and refuse payloads with entries containing more than 128 meta-labels. Alternatively, instead of adding a limit and if we ever hit this maximum, we can increase the pool size. Wdyt?
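For context, a minimal sketch of the bucketed-pool approach being discussed, assuming the non-generic `pool.New` signature from `github.com/prometheus/prometheus/util/pool`; the variable name `metaLabelsBufPool` is illustrative, not the actual field in the PR:

```go
import "github.com/prometheus/prometheus/util/pool"

// Buckets from 8 up to 256 slots (i.e. up to 128 name/value pairs), growing by 2x.
// The pool hands back a [][]byte whose capacity fits the requested length bucket.
var metaLabelsBufPool = pool.New(8, 256, 2, func(size int) interface{} {
	return make([][]byte, 0, size)
})

// Getting a buffer sized for nLabels name/value pairs, then returning it:
//   buf := metaLabelsBufPool.Get(nLabels * 2).([][]byte)
//   ...
//   metaLabelsBufPool.Put(buf[:0])
```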
return ts, si.buf[:lineSize], nil, true
}

// TODO: This is pretty similar to how we read the line size, and the metadata name and value sizes
Agreed -- this looks like it could all be one helper function which is nice considering how complex this code is.
I think we should do it in a separate PR since:
- The changes to extract these into a separate function are not as simple as copy-pasting the code into a separate function. How we read the timestamp and line size is slightly different from how we read the number of labels and the lengths of the label names and values.
- To make sure our refactoring doesn't worsen the performance, we'd need to run some benchmarks. That'll require time, and this PR is blocking Metadata to labels result and filtering support #9702.
- Sandeep will work on implementing string interning inside the chunks for the metadata labels. That'll change this `moveNext` method, so I don't think it's worth refactoring it until we make those changes.
I've created this task to keep track of this optimization: https://github.com/grafana/loki-private/issues/792
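To make the shape of that future refactor concrete, here is a rough sketch (not the actual `moveNext` code) of the kind of shared helper being discussed, assuming a reader that satisfies `io.ByteReader`; the name `readUvarintSize` is illustrative:

```go
import (
	"encoding/binary"
	"io"
)

// readUvarintSize reads one varint-encoded size (line length, number of labels,
// or a label name/value length) and reports whether the block has more data.
func readUvarintSize(r io.ByteReader) (uint64, bool, error) {
	v, err := binary.ReadUvarint(r)
	if err == io.EOF {
		// End of the block: nothing more to read.
		return 0, false, nil
	}
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}
```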
…at `unorderedWithMetadataHeadBlockFmt` that can work with metadata labels. Signed-off-by: Vladyslav Diachenko <vlad.diachenko@grafana.com>
0fa7568 to 21df77b
LGTM ;)
Overall the changes look great to me. Left some minor suggestions and a comment which might need input from other folks.
@@ -120,14 +131,14 @@ func (hb *unorderedHeadBlock) Append(ts int64, line string) error {
// entries at the same time with the same content, iterate through any existing
// entries and ignore the line if we already have an entry with the same content
for _, et := range displaced[0].(*nsEntries).entries {
-	if et == line {
+	if et.line == line {
Should we also check for equality of `metaLabels` here to allow entries with the same timestamp and logline but different non-indexed labels? However, this would mean that we would have to generate a hash over both the log line and the non-indexed labels in SampleIterator, which uses it for deduping duplicate data due to replication.
It would be rare that someone would have log entries with the same timestamp and logline but different non-indexed labels, so I think we should keep it as is to avoid paying the cost to support this rare case. Just putting it out there to see if anyone wants to share their thoughts. If we decide not to support it, we should make it clear in the docs.
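For illustration, a hedged sketch of what extending the dedupe check could involve; the helper, its parameters, and the use of `labels.Equal` are assumptions, not the PR's actual types:

```go
import "github.com/prometheus/prometheus/model/labels"

// sameEntry is a hypothetical helper: today duplicates are detected on the
// line alone (given equal timestamps); the comment below shows what also
// requiring equal non-indexed labels would involve.
func sameEntry(existingLine, incomingLine string, existingMeta, incomingMeta labels.Labels) bool {
	if existingLine != incomingLine {
		return false
	}
	// To allow the same (ts, line) with different non-indexed labels, we would
	// also need labels.Equal(existingMeta, incomingMeta) here, plus a matching
	// hash over line+labels in SampleIterator's dedupe path.
	return true
}
```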
> It would be rare that someone would have log entries with the same timestamp and logline but different non-indexed labels
Taking promtail as an example, I don't see any stage that can duplicate log lines. Therefore, this situation would only happen if the user is using a custom client that somehow is running into this scenario. IMO, to avoid having to compute a hash over the meta labels, and to keep the code simple, we should not check the non-indexed labels here.
> Taking promtail as an example, I don't see any stage that can duplicate log lines.
I was only considering the case where the same logline is being emitted by some jobs, and for whatever reason, they all have the same stream labels.
I'm ok with just checking `(indexed_labels, ts, line)` for equality here without specifying a formal stance yet. We can later choose to handle this differently, as long as it doesn't seriously break backwards compatibility in the future; basically, we can revisit it any time before we include this in a release.
I am inclined towards keeping it as is to avoid supporting it partially, i.e. without taking care of the required changes in `SampleIterator`. If you strongly feel we should make this change, please let me know, and I will take care of it with my string interning change.
if hb.format >= UnorderedWithMetadataHeadBlockFmt {
	// Serialize metadata labels
	n = binary.PutUvarint(encBuf, uint64(len(metaLabels)))
We had decided to write the length of the whole nonIndexedLabels section first so that we can choose to skip it altogether and jump to the beginning of the next entry. I can take care of it while working on string interning.
Good point. We can do it here, but we'd need to iterate through all the labels and sum up their lengths. With string interning, IIUC, the label names and values would be refs to wherever the actual string is. Therefore, in that case we wouldn't need to iterate through all the label-value pairs, but rather multiply `len(metaLabels)` by twice the ref size, right?
In that case, I think it would make more sense to do it in the PR for string interning directly, to not complicate this PR more than necessary given that this part will change. Wdyt?
Since we are writing data with variable-length encoding, the number of bytes can't be estimated beforehand. We will have to write the data to a temp buffer first, calculate and write its size, and then write the data from the temp buffer. I will take care of it while working on string interning.
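A minimal sketch of that temp-buffer approach, assuming a `labels.Labels`-style slice of name/value pairs; the function name and its caller-provided buffer are illustrative:

```go
import (
	"bytes"
	"encoding/binary"

	"github.com/prometheus/prometheus/model/labels"
)

// appendNonIndexedLabels encodes the labels into a scratch buffer first, then
// writes the section's total length before its bytes, so a reader can skip the
// whole section and jump straight to the next entry without parsing it.
func appendNonIndexedLabels(dst *bytes.Buffer, metaLabels labels.Labels) {
	var section bytes.Buffer
	tmp := make([]byte, binary.MaxVarintLen64)

	n := binary.PutUvarint(tmp, uint64(len(metaLabels)))
	section.Write(tmp[:n])
	for _, l := range metaLabels {
		n = binary.PutUvarint(tmp, uint64(len(l.Name)))
		section.Write(tmp[:n])
		section.WriteString(l.Name)
		n = binary.PutUvarint(tmp, uint64(len(l.Value)))
		section.Write(tmp[:n])
		section.WriteString(l.Value)
	}

	// Length prefix for the whole non-indexed labels section.
	n = binary.PutUvarint(tmp, uint64(section.Len()))
	dst.Write(tmp[:n])
	dst.Write(section.Bytes())
}
```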
Keep in mind, this will break compatibility between this PR and a future PR unless we add a newer format than `UnorderedWithMetadataHeadBlockFmt`. This is one of the benefits of not setting this as the default yet (via `DefaultHeadBlockFmt = UnorderedHeadBlockFmt`) -- we can change the implementation in a later PR before we start using it.
Left some minor nits, mostly around tests. I think we can merge this once they get addressed. Nice work!
Left 2 minor suggestions. Overall it LGTM.
pkg/chunkenc/unordered_test.go
Outdated
rt:    (recovered.(*unorderedHeadBlock)).rt,
lines: (recovered.(*unorderedHeadBlock)).lines,
size:  (recovered.(*unorderedHeadBlock)).size,
mint:  (recovered.(*unorderedHeadBlock)).mint,
maxt:  (recovered.(*unorderedHeadBlock)).maxt,
I think we should be using data from `unordered` to build the expected value, right? Otherwise, we won't catch any bugs since we are mostly comparing the same values.
Absolutely. Done. Thank you!
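In other words, a small sketch of the suggestion, assuming the test has both the original `unordered` block and the `recovered` one in scope (field names mirror the snippet above and may not match the final test exactly):

```go
// Build the expected value from the original block, not from the recovered one,
// so the assertion actually exercises the serialize/recover round trip.
expected := &unorderedHeadBlock{
	rt:    unordered.rt,
	lines: unordered.lines,
	size:  unordered.size,
	mint:  unordered.mint,
	maxt:  unordered.maxt,
}
require.Equal(t, expected, recovered.(*unorderedHeadBlock))
```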
pkg/chunkenc/unordered_test.go
Outdated
rt:    (recovered.(*unorderedHeadBlock)).rt,
lines: (recovered.(*unorderedHeadBlock)).lines,
size:  (recovered.(*unorderedHeadBlock)).size,
mint:  (recovered.(*unorderedHeadBlock)).mint,
maxt:  (recovered.(*unorderedHeadBlock)).maxt,
same here
Done
Nice work, lgtm
**What this PR does / why we need it**: In #9700, we added support for writing non-indexed labels from entries into chunks. That PR introduced two bugs:
- When the buffered iterator is closed, the buffer to read metadata labels is put back into the pool but not set to nil. Subsequent uses of the iterator may write on top of the buffer that was put back on the pool. This may lead to inconsistent/incorrect results.
- Inside the buffered iterator, we were not correctly handling EOFs while reading the number of labels and each label length. This was due to the `lastAttemp` variable not being reset.

This PR adds a new test for these two bugs and also fixes them.

**Which issue(s) this PR fixes**: Fixes #9700

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e)
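A rough sketch of the first fix described above; the method and field names are taken from the snippets earlier in this thread and may not match the final code exactly:

```go
// On close, return each metadata-labels buffer to the pool and nil out the
// slot, so a later use of the iterator can't write over pooled memory.
func (si *bufferedIterator) closeMetaLabelsBuf() {
	for i := range si.metaLabelsBuf {
		if si.metaLabelsBuf[i] != nil {
			BytesBufferPool.Put(si.metaLabelsBuf[i])
			si.metaLabelsBuf[i] = nil
		}
	}
	si.metaLabelsBuf = nil
}
```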
**What this PR does / why we need it**: In #9700, we support encoding and decoding metadata for each entry into the chunks. In this PR we:
- Update the bytes processed stats to account for the bytes from those non-indexed labels
- Add new stats for bytes processed for those non-indexed labels
- Add new ingestion metrics to track ingested non-indexed bytes

**Checklist**
- [ ] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e)
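Purely for illustration, a new ingestion counter for non-indexed label bytes might be registered as below; the metric name, help text, and label are hypothetical, not the PR's actual metrics:

```go
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical counter tracking ingested non-indexed label bytes per tenant.
var nonIndexedLabelsBytesIngested = promauto.NewCounterVec(prometheus.CounterOpts{
	Namespace: "loki",
	Name:      "distributor_non_indexed_labels_bytes_received_total",
	Help:      "Total number of non-indexed label bytes received per tenant.",
}, []string{"tenant"})

// On each push request, after summing the entries' label name/value sizes:
//   nonIndexedLabelsBytesIngested.WithLabelValues(tenantID).Add(float64(nonIndexedBytes))
```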
**What this PR does / why we need it**: In #9700, we support encoding and decoding non-indexed labels for each entry into the chunks. This PR adds support for writing/reading the non-indexed labels to/from the WAL.

**Checklist**
- [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add `add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e)
**What this PR does / why we need it**: In #9700, we support encoding and decoding metadata for each entry into the chunks. This PR adds support for returning metadata labels for matching entries in a query to the returned LabelResults. It also supports filtering out logs by metadata labels.

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. [Example PR](d10549e)

Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>
…and processing non-indexed labels (#10044)

**What this PR does / why we need it**: In PR #9700, we added support for storing non-indexed labels in chunks. This PR optimizes resource usage for storing and processing non-indexed labels by doing string interning. We will store deduped label names and values as a list in chunks in the newly added non-indexed labels section. The labels would then be referenced in blocks by their index (called symbols). Additionally, I have started the convention of writing lengths of sections with their offsets within chunks, making it easier to introduce new sections. The section offsets and lengths would be stored at the end of the chunk, similar to [TOC](https://ganeshvernekar.com/blog/prometheus-tsdb-persistent-block-and-its-index/#a-toc) in TSDB.

**Checklist**
- [x] Tests updated
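For illustration, a small sketch of the symbols idea described above; the types and field names are assumptions, not the actual implementation. Label names and values are stored once per chunk, and blocks reference them by index.

```go
// symbolTable dedupes label name/value strings for a chunk; blocks then store
// pairs of uint32 symbol references instead of the strings themselves.
type symbolTable struct {
	symbols []string
	index   map[string]uint32
}

func newSymbolTable() *symbolTable {
	return &symbolTable{index: make(map[string]uint32)}
}

// ref returns the symbol index for s, adding it to the table if needed.
func (t *symbolTable) ref(s string) uint32 {
	if i, ok := t.index[s]; ok {
		return i
	}
	i := uint32(len(t.symbols))
	t.symbols = append(t.symbols, s)
	t.index[s] = i
	return i
}
```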
What this PR does / why we need it:
In #9694 we support adding metadata labels for each entry in the push payload. In this PR we take that metadata and add it to the entries in the chunk. Supporting serialization and deserialization of those metadata labels.
Which issue(s) this PR fixes:
Special notes for your reviewer:
This PR is built on top of #9694, therefore:
Checklist
- Reviewed the `CONTRIBUTING.md` guide (required)
- `CHANGELOG.md` updated
- Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
- For Helm chart changes bump the Helm chart version in `production/helm/loki/Chart.yaml` and update `production/helm/loki/CHANGELOG.md` and `production/helm/loki/README.md`. Example PR