Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-indexed metadata to chunks #9700

Merged
merged 15 commits into from
Jul 18, 2023
Merged

Conversation

salvacorts
Copy link
Contributor

@salvacorts salvacorts commented Jun 13, 2023

What this PR does / why we need it:
In #9694 we support adding metadata labels for each entry in the push payload. In this PR we take that metadata and add it to the entries in the chunk. Supporting serialization and deserialization of those metadata labels.

Which issue(s) this PR fixes:

Special notes for your reviewer:
This PR is built on top of #9694, therefore:

  • Any change to the parent branch should be rebased here.
  • Once the parent is merged, rebase this branch to main and delete the commits from the parent branch.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR

@salvacorts salvacorts changed the base branch from main to salvacorts/metadata-push-payload June 13, 2023 11:23
Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed, approach looks good to me. Left a few comments and happy to give a closer review+approval when you're ready. Would like to see some refactoring in bufferedIterator.moveNext if possible.

BytesBufferPool.Put(si.metaLabelsBuf[i])
}
}
si.metaLabelsBuf = make([][]byte, nLabels*2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See how the other pools are created and use the "github.com/prometheus/prometheus/util/pool" pkg, which supports different sub-pools by length categories. That allows better reuse than creating a slice here.

Copy link
Contributor Author

@salvacorts salvacorts Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

I set a maximum size of 256, i.e. max of 128 meta labels. That should be more than enough, but IIUC, we should now set a hard limit on this and refuse payloads with entries containing more than 128 meta-labels. Alternatively, instead of adding a limit and if we ever hit this maximum, we can increase the pool size. Wdyt?

return ts, si.buf[:lineSize], nil, true
}

// TODO: This is pretty similar to how we read the line size, and the metadata name and value sizes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- this looks like it could all be one helper function which is nice considering how complex this code is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do it on a separate PR since:

  • The changes to extract these to a separate function are not as simple as copy-pasting the code to a separate function. How we read the timestamp and line size is slightly different from how we read the number of labels and the length of the label names and values.
  • To make sure our refactoring doesn't worsen the performance, we'd need to run some benchmarks. That'll require time and this PR is blocking Metadata to labels result and filtering support #9702
  • Sandeep will work on implementing string interning inside the chunks for the metadata labels. That'll change this moveNext method so I don't think it's with refactoring it until we make those changes.

I've created this task to keep track of this optimization: https://github.com/grafana/loki-private/issues/792

Base automatically changed from salvacorts/metadata-push-payload to main July 13, 2023 07:13
@salvacorts salvacorts force-pushed the salvacorts/metadata-to-chunk branch from 0fa7568 to 21df77b Compare July 13, 2023 09:39
@salvacorts salvacorts marked this pull request as ready for review July 13, 2023 10:25
@salvacorts salvacorts requested a review from a team as a code owner July 13, 2023 10:25
Copy link
Contributor

@vlad-diachenko vlad-diachenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ;)

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the changes look great to me. Left some minor suggestions and a comment which might need input from other folks.

pkg/chunkenc/memchunk.go Show resolved Hide resolved
pkg/chunkenc/unordered.go Show resolved Hide resolved
@@ -120,14 +131,14 @@ func (hb *unorderedHeadBlock) Append(ts int64, line string) error {
// entries at the same time with the same content, iterate through any existing
// entries and ignore the line if we already have an entry with the same content
for _, et := range displaced[0].(*nsEntries).entries {
if et == line {
if et.line == line {
Copy link
Contributor

@sandeepsukhani sandeepsukhani Jul 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check for equality of metaLabels here to allow entries with the same timestamp and logline but different non-indexed labels? However, this would mean that we will have to generate a hash with both log line and non-indexed labels in SampleIterator, which uses it for deduping duplicate data due to replication.

It would be rare that someone would have log entries with the same timestamp and logline but different non-indexed labels, so I think we should keep it as is to avoid paying the cost to support this rare case. Just putting it out to see if anyone wants to share their thoughts here. If we decide not to support it, we should make it clear in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be rare that someone would have log entries with the same timestamp and logline but different non-indexed labels

Taking promtail as an example, I don't see any stage that can duplicate log lines. Therefore, this situation would only happen if the user is using a custom client that somehow is running into this scenario. IMO, to avoid having to compute a hash over the meta labels, and to keep the code simple, we should not check the non-indexed labels here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking promtail as an example, I don't see any stage that can duplicate log lines.

I was only considering the case where the same logline is being emitted by some jobs, and for whatever reason, they all have the same stream labels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with just checking (indexed_labels, ts, line) for equality here without specifying a formal stance yet. We can later choose to handle this differently if we choose as long as it doesn't seriously break backwards compatibility in the future. Basically, before we include this in a release

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am inclined towards keeping it as is to avoid supporting it partially, i.e. without taking care of required changes in SampleIterator. If you strongly feel we should make this change, please let me know, and I will take care of it with my string interning change.


if hb.format >= UnorderedWithMetadataHeadBlockFmt {
// Serialize metadata labels
n = binary.PutUvarint(encBuf, uint64(len(metaLabels)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had decided to write the length of the whole nonIndexedLabels section first so that we can choose to skip it altogether and jump to the beginning of the next entry. I can take care of it while working on string interning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We can do it here, but we'd need to iterate through all the labels and sum up their lengths. With strings interning, IIUC, the label and values would be refs to wherever the actual string is. Therefore, in that case we wouldn't need to iterate through all the label value pairs, but rather multiply len(metaLabels) by twice the ref size, right?

In that case, I think it would make more sense to do it in the PR for strings interning directly? To not complicate this PR more than necessary provided that this part would change. Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are writing data with variable length encoding, the number of bytes can't be estimated beforehand. We will have to write the data to a temp buffer first, calculate and write its size first and then write the data from temp buffer. I will take care of it while working on string interning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind, this will break compatibility between this PR and a future PR unless we add a newer format than UnorderedWithMetadataHeadBlockFmt. This is one of the benefits of not setting this as the default yet (via DefaultHeadBlockFmt = UnorderedHeadBlockFmt) -- we can change the implementation in a later PR before we start using it.

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor nits, mostly around tests. I think we can merge this once they get addressed. Nice work!

pkg/chunkenc/memchunk_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/memchunk_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/memchunk_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/unordered.go Show resolved Hide resolved
pkg/chunkenc/unordered.go Outdated Show resolved Hide resolved
pkg/chunkenc/unordered_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/unordered_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/unordered_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/unordered_test.go Outdated Show resolved Hide resolved
pkg/chunkenc/unordered_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left 2 minor suggestions. Overall it LGTM.

Comment on lines 310 to 314
rt: (recovered.(*unorderedHeadBlock)).rt,
lines: (recovered.(*unorderedHeadBlock)).lines,
size: (recovered.(*unorderedHeadBlock)).size,
mint: (recovered.(*unorderedHeadBlock)).mint,
maxt: (recovered.(*unorderedHeadBlock)).maxt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be using data from unordered to build the expected value, right?
Otherwise, we won't catch any bugs since we are mostly comparing the same values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. Done. Thank you!

Comment on lines 333 to 337
rt: (recovered.(*unorderedHeadBlock)).rt,
lines: (recovered.(*unorderedHeadBlock)).lines,
size: (recovered.(*unorderedHeadBlock)).size,
mint: (recovered.(*unorderedHeadBlock)).mint,
maxt: (recovered.(*unorderedHeadBlock)).maxt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, lgtm

@@ -120,14 +131,14 @@ func (hb *unorderedHeadBlock) Append(ts int64, line string) error {
// entries at the same time with the same content, iterate through any existing
// entries and ignore the line if we already have an entry with the same content
for _, et := range displaced[0].(*nsEntries).entries {
if et == line {
if et.line == line {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with just checking (indexed_labels, ts, line) for equality here without specifying a formal stance yet. We can later choose to handle this differently if we choose as long as it doesn't seriously break backwards compatibility in the future. Basically, before we include this in a release


if hb.format >= UnorderedWithMetadataHeadBlockFmt {
// Serialize metadata labels
n = binary.PutUvarint(encBuf, uint64(len(metaLabels)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind, this will break compatibility between this PR and a future PR unless we add a newer format than UnorderedWithMetadataHeadBlockFmt. This is one of the benefits of not setting this as the default yet (via DefaultHeadBlockFmt = UnorderedHeadBlockFmt) -- we can change the implementation in a later PR before we start using it.

@sandeepsukhani sandeepsukhani merged commit ce91076 into main Jul 18, 2023
@sandeepsukhani sandeepsukhani deleted the salvacorts/metadata-to-chunk branch July 18, 2023 06:37
@salvacorts salvacorts mentioned this pull request Jul 19, 2023
7 tasks
salvacorts added a commit that referenced this pull request Jul 19, 2023
**What this PR does / why we need it**:

In #9700, we added support for
writing non-indexed labels from entries into chunks. This PR introduced
two bugs:

- When the buffered iterator is closed, the buffer to read metadata
labels is put back into the pool but not set to nil. Subsequent uses of
the iterator may write on top of the buffer that was put back on the
pool. This may lead to inconsistent/incorrect results.
- Inside the buffered iterator, we were not correctly handling EOFs
while reading the number of labels and each label length. This was due
to the `lastAttemp` variable not being reset.

This PR adds a new test for these two bugs and also fixes them.

**Which issue(s) this PR fixes**:
Fixes #9700

**Special notes for your reviewer**:

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
salvacorts added a commit that referenced this pull request Jul 21, 2023
**What this PR does / why we need it**:
In #9700, we support encoding and
decoding metadata for each entry into the chunks.
In this PR we:
- Update the bytes processed stats to account for the bytes from those
non-indexed labels
- Add new stats for bytes processed for those non-indexed labels
- Add new ingestion metrics to track ingested non-indexed bytes

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
LinTechSo pushed a commit to LinTechSo/loki that referenced this pull request Jul 21, 2023
**What this PR does / why we need it**:
In grafana#9700, we support encoding and
decoding metadata for each entry into the chunks.
In this PR we:
- Update the bytes processed stats to account for the bytes from those
non-indexed labels
- Add new stats for bytes processed for those non-indexed labels
- Add new ingestion metrics to track ingested non-indexed bytes

**Checklist**
- [ ] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
salvacorts added a commit that referenced this pull request Jul 21, 2023
**What this PR does / why we need it**:
In #9700, we support encoding and
decoding non-indexed labels for each entry into the chunks. This PR adds
support for writing/reading the non-indexed labels to/from the WAL.

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)
salvacorts added a commit that referenced this pull request Jul 24, 2023
**What this PR does / why we need it**:
In #9700, we support encoding and decoding metadata for each entry into
the chunks. This PR adds support for returning metadata labels for
matching entries in a query to the returned LabelResults. It also
supports filtering out logs by metadata labels.

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)

---------

Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>
sandeepsukhani added a commit that referenced this pull request Jul 26, 2023
…and processing non-indexed labels (#10044)

**What this PR does / why we need it**:
In PR #9700, we added support for storing non-indexed labels in chunks.
This PR optimizes resource usage for storing and processing non-indexed
labels by doing string interning. We will store deduped label names and
values as a list in chunks
in the newly added non-indexed labels section. The labels would then be
referenced in blocks by their index(called symbols).

Additionally, I have started the convention of writing lengths of
sections with their offsets within chunks, making it easier to introduce
new sections. The section offsets and lengths would be stored at the end
of the chunk, similar to
[TOC](https://ganeshvernekar.com/blog/prometheus-tsdb-persistent-block-and-its-index/#a-toc)
in TSDB.

**Checklist**
- [x] Tests updated
sandeepsukhani pushed a commit that referenced this pull request Jul 26, 2023
**What this PR does / why we need it**:
In #9700, we support encoding and decoding metadata for each entry into
the chunks. This PR adds support for returning metadata labels for
matching entries in a query to the returned LabelResults. It also
supports filtering out logs by metadata labels.

**Special notes for your reviewer**:

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [x] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](d10549e)

---------

Co-authored-by: Sandeep Sukhani <sandeep.d.sukhani@gmail.com>
(cherry picked from commit 1d04cd5)
sandeepsukhani added a commit that referenced this pull request Jul 26, 2023
…and processing non-indexed labels (#10044)

**What this PR does / why we need it**:
In PR #9700, we added support for storing non-indexed labels in chunks.
This PR optimizes resource usage for storing and processing non-indexed
labels by doing string interning. We will store deduped label names and
values as a list in chunks
in the newly added non-indexed labels section. The labels would then be
referenced in blocks by their index(called symbols).

Additionally, I have started the convention of writing lengths of
sections with their offsets within chunks, making it easier to introduce
new sections. The section offsets and lengths would be stored at the end
of the chunk, similar to
[TOC](https://ganeshvernekar.com/blog/prometheus-tsdb-persistent-block-and-its-index/#a-toc)
in TSDB.

**Checklist**
- [x] Tests updated

(cherry picked from commit 9b554bb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants