[rhythm] Introduce block-builder and kafka ingest path #4533
Merged
* Add unit test for block-builder * fmt * Update tests * cmon
* chore: remove gofakeit dependency (grafana#4274) * Further reduce Labes() calls in the metrics registry (grafana#4283) * Respect passed headers in read path requests (grafana#4287) * Ingester: Validate completed blocks (grafana#4256) * Add validate method to block Signed-off-by: Joe Elliott <number101010@gmail.com> * Add Validate usage in the ingester Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog Signed-off-by: Joe Elliott <number101010@gmail.com> * add test and fix replay Signed-off-by: Joe Elliott <number101010@gmail.com> * increment metric Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * Add `invalid_utf8` to reasons spans could be rejected (grafana#4293) * Add `invalid_utf8` to reasons spans could be rejected * Update changelog * Update docs * Ensure test covers invalid UTF-8 and not slack time * add signals for duplicate rf1 data (grafana#4296) Signed-off-by: Joe Elliott <number101010@gmail.com> * Bump anchore/sbom-action from 0.17.5 to 0.17.7 (grafana#4307) Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.5 to 0.17.7. - [Release notes](https://github.com/anchore/sbom-action/releases) - [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md) - [Commits](anchore/sbom-action@v0.17.5...v0.17.7) --- updated-dependencies: - dependency-name: anchore/sbom-action dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * docs: Update readme with explore traces info (grafana#4263) * docs: Update readme with explore traces info Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * chore: remove spanlogger (grafana#4312) * chore: remove spanlogger * Query-Frontend: Add middleware to drop headers (grafana#4298) * header strip ware Signed-off-by: Joe Elliott <number101010@gmail.com> * comment Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog Signed-off-by: Joe Elliott <number101010@gmail.com> * remove header strip wear from metrics summary Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * Increase length of time compactions have to fail (grafana#4315) * increase length of time compactions have to fail Signed-off-by: Joe Elliott <number101010@gmail.com> * gen Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * docs: mark serverless as deprecated (grafana#4017) * docs: mark serverless as deprecated * Changelog + readme * docs: Remove duplicated examples (grafana#4295) This removes duplicates examples from the Configure TraceQL metrics page. 
Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com> * tempo-cli: support dropping multiple traces in a single operation (grafana#4266) * tempo-cli: support dropping multiple traces in a single operation * update final log message --------- Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com> * [DOC] Add clarification for metrics summary and traceQL metrics (grafana#4316) * Add clarification for metrics summary and traceQL metrics * Apply suggestions from code review Co-authored-by: Jennifer Villa <jvilla2013@gmail.com> * Update docs/sources/tempo/api_docs/metrics-summary.md --------- Co-authored-by: Jennifer Villa <jvilla2013@gmail.com> * TraceQL metrics time range fixes (grafana#4325) * Disconnect job time range filtering from step, so that results in split backend/recent range is accurate * changelog * Fix to assert metrics query range before alignment because alignment may increase it, which is not the responsibility of the caller to account for (grafana#4331) * Add doc about configuring TLS with Helm (grafana#4328) * Add doc about configuring TLS with Helm * Add memberlist and readinessProbe to example * Include server config for listening on TLS * Add note about scraping * Update docs/sources/tempo/configuration/network/tls.md Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com> * Update docs/sources/tempo/configuration/network/tls.md Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * Update docs/sources/tempo/configuration/network/tls.md Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * Add memcached config for TLS --------- Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com> Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * [DOC] Add TLS info to Helm chart doc (grafana#4334) --------- Signed-off-by: Joe Elliott <number101010@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Alex 
Bikfalvi <alex.bikfalvi@grafana.com> Co-authored-by: Javier Molina Reyes <javiermolinar@live.com> Co-authored-by: Zach Leslie <zach.leslie@grafana.com> Co-authored-by: Joe Elliott <number101010@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ryan Perry <Rperry2174@gmail.com> Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com> Co-authored-by: Alex Bikfalvi <alex@bikfalvi.com> Co-authored-by: Andrey Karpov <ndk@users.noreply.github.com> Co-authored-by: Jennifer Villa <jvilla2013@gmail.com> Co-authored-by: Martin Disibio <martin.disibio@grafana.com> Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com>
* Validate distributor config. Finish encoder/decoder tests * Repair tests * Make SingleBinary work out of the box by defaulting to partition 0 * Fix first time startup where blockbuilder fails before ingester can create topic * Fix initial startup cycle time and delay
* Add more tests to the block-builder * stuff * Add comments
* Metrics generator read from kafka first pass * review feedback
* chore: remove gofakeit dependency (grafana#4274) * Further reduce Labes() calls in the metrics registry (grafana#4283) * Respect passed headers in read path requests (grafana#4287) * Ingester: Validate completed blocks (grafana#4256) * Add validate method to block Signed-off-by: Joe Elliott <number101010@gmail.com> * Add Validate usage in the ingester Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog Signed-off-by: Joe Elliott <number101010@gmail.com> * add test and fix replay Signed-off-by: Joe Elliott <number101010@gmail.com> * increment metric Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * Add `invalid_utf8` to reasons spans could be rejected (grafana#4293) * Add `invalid_utf8` to reasons spans could be rejected * Update changelog * Update docs * Ensure test covers invalid UTF-8 and not slack time * add signals for duplicate rf1 data (grafana#4296) Signed-off-by: Joe Elliott <number101010@gmail.com> * Bump anchore/sbom-action from 0.17.5 to 0.17.7 (grafana#4307) Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.5 to 0.17.7. - [Release notes](https://github.com/anchore/sbom-action/releases) - [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md) - [Commits](anchore/sbom-action@v0.17.5...v0.17.7) --- updated-dependencies: - dependency-name: anchore/sbom-action dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * docs: Update readme with explore traces info (grafana#4263) * docs: Update readme with explore traces info Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * chore: remove spanlogger (grafana#4312) * chore: remove spanlogger * Query-Frontend: Add middleware to drop headers (grafana#4298) * header strip ware Signed-off-by: Joe Elliott <number101010@gmail.com> * comment Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog Signed-off-by: Joe Elliott <number101010@gmail.com> * remove header strip wear from metrics summary Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * Increase length of time compactions have to fail (grafana#4315) * increase length of time compactions have to fail Signed-off-by: Joe Elliott <number101010@gmail.com> * gen Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * docs: mark serverless as deprecated (grafana#4017) * docs: mark serverless as deprecated * Changelog + readme * docs: Remove duplicated examples (grafana#4295) This removes duplicates examples from the Configure TraceQL metrics page. 
Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com> * tempo-cli: support dropping multiple traces in a single operation (grafana#4266) * tempo-cli: support dropping multiple traces in a single operation * update final log message --------- Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com> * [DOC] Add clarification for metrics summary and traceQL metrics (grafana#4316) * Add clarification for metrics summary and traceQL metrics * Apply suggestions from code review Co-authored-by: Jennifer Villa <jvilla2013@gmail.com> * Update docs/sources/tempo/api_docs/metrics-summary.md --------- Co-authored-by: Jennifer Villa <jvilla2013@gmail.com> * TraceQL metrics time range fixes (grafana#4325) * Disconnect job time range filtering from step, so that results in split backend/recent range is accurate * changelog * Fix to assert metrics query range before alignment because alignment may increase it, which is not the responsibility of the caller to account for (grafana#4331) * Add doc about configuring TLS with Helm (grafana#4328) * Add doc about configuring TLS with Helm * Add memberlist and readinessProbe to example * Include server config for listening on TLS * Add note about scraping * Update docs/sources/tempo/configuration/network/tls.md Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com> * Update docs/sources/tempo/configuration/network/tls.md Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * Update docs/sources/tempo/configuration/network/tls.md Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * Add memcached config for TLS --------- Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com> Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * [DOC] Add TLS info to Helm chart doc (grafana#4334) * fix deprecation warning by switching to DoBatchWithOptions (grafana#4343) Signed-off-by: Daniel Strobusch 
<1847260+dastrobu@users.noreply.github.com> * bump dskit to v0.0.0-20241115082728-f2a7eb3aa0e9 to leverage benefits for context causes for DoBatch calls. (grafana#4341) See grafana/dskit#576 Signed-off-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com> * Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80 (grafana#4282) * Bump github.com/minio/minio-go/v7 from 7.0.70 to 7.0.80 Bumps [github.com/minio/minio-go/v7](https://github.com/minio/minio-go) from 7.0.70 to 7.0.80. - [Release notes](https://github.com/minio/minio-go/releases) - [Commits](minio/minio-go@v7.0.70...v7.0.80) --- updated-dependencies: - dependency-name: github.com/minio/minio-go/v7 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * Update serverless vendor --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Zach Leslie <zach.leslie@grafana.com> * update default config values to better align with production workloads (grafana#4340) * update default config values to better align with production workloads * Update CHANGELOG.md and config docs * Ingester memory improvements by adjusting prealloc (grafana#4344) * remove trace ids Signed-off-by: Joe Elliott <number101010@gmail.com> * linear buckets Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog Signed-off-by: Joe Elliott <number101010@gmail.com> * tuney tune Signed-off-by: Joe Elliott <number101010@gmail.com> * metric misses and increase pool size Signed-off-by: Joe Elliott <number101010@gmail.com> * lint Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0 (grafana#4302) * Bump github.com/Azure/azure-sdk-for-go/sdk/azcore from 1.13.0 to 1.16.0 Bumps 
[github.com/Azure/azure-sdk-for-go/sdk/azcore](https://github.com/Azure/azure-sdk-for-go) from 1.13.0 to 1.16.0. - [Release notes](https://github.com/Azure/azure-sdk-for-go/releases) - [Changelog](https://github.com/Azure/azure-sdk-for-go/blob/main/documentation/release.md) - [Commits](Azure/azure-sdk-for-go@sdk/azcore/v1.13.0...sdk/azcore/v1.16.0) --- updated-dependencies: - dependency-name: github.com/Azure/azure-sdk-for-go/sdk/azcore dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Update serverless vendor --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Zach Leslie <zach.leslie@grafana.com> * Use Prometheus fast regexp (grafana#4329) * basic integration Signed-off-by: Joe Elliott <number101010@gmail.com> * patch tests for new meaning Signed-off-by: Joe Elliott <number101010@gmail.com> * patch up more tests Signed-off-by: Joe Elliott <number101010@gmail.com> * add basic tests Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog + docs Signed-off-by: Joe Elliott <number101010@gmail.com> * remove benches Signed-off-by: Joe Elliott <number101010@gmail.com> * Cleaned up + tests Signed-off-by: Joe Elliott <number101010@gmail.com> * comment Signed-off-by: Joe Elliott <number101010@gmail.com> * lint Signed-off-by: Joe Elliott <number101010@gmail.com> * Update docs/sources/tempo/traceql/_index.md Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * comment Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * Fix broken link in service-graphs docs (grafana#4351) * Fix minor typo in TraceQL docs (grafana#4356) * Bump default memcached version (grafana#4363) * Exemplar fixes (grafana#4366) * 
Fix exemplars based on duration to convert to seconds, fix various other issues * changelog * fix: initialize histogram buckets to 0 to avoid them being downsampled (grafana#4368) * initialized histogram buckets to 0 to avoid them being downsampled * Ingester/Generator Live trace cleanup (grafana#4365) * moved trace sizes somewhere shareable Signed-off-by: Joe Elliott <number101010@gmail.com> * use tracesizes in ingester Signed-off-by: Joe Elliott <number101010@gmail.com> * make tests work Signed-off-by: Joe Elliott <number101010@gmail.com> * trace bytes in generator Signed-off-by: Joe Elliott <number101010@gmail.com> * remove traceCount Signed-off-by: Joe Elliott <number101010@gmail.com> * live trace shenanigans Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog Signed-off-by: Joe Elliott <number101010@gmail.com> * Update modules/generator/processor/localblocks/livetraces.go Co-authored-by: Mario <mariorvinas@gmail.com> * Update modules/ingester/instance.go Co-authored-by: Mario <mariorvinas@gmail.com> * Test cleanup. Add sz test, restore commented out and fix e2e Signed-off-by: Joe Elliott <number101010@gmail.com> * remove todo comment Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> Co-authored-by: Mario <mariorvinas@gmail.com> * Bump anchore/sbom-action from 0.17.7 to 0.17.8 (grafana#4371) Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.17.7 to 0.17.8. - [Release notes](https://github.com/anchore/sbom-action/releases) - [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md) - [Commits](anchore/sbom-action@v0.17.7...v0.17.8) --- updated-dependencies: - dependency-name: anchore/sbom-action dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update for IDs change * Only run blockbuilder if ingest enabled --------- Signed-off-by: Joe Elliott <number101010@gmail.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Alex Bikfalvi <alex.bikfalvi@grafana.com> Signed-off-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com> Co-authored-by: Javier Molina Reyes <javiermolinar@live.com> Co-authored-by: Zach Leslie <zach.leslie@grafana.com> Co-authored-by: Joe Elliott <number101010@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ryan Perry <Rperry2174@gmail.com> Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com> Co-authored-by: Alex Bikfalvi <alex@bikfalvi.com> Co-authored-by: Andrey Karpov <ndk@users.noreply.github.com> Co-authored-by: Jennifer Villa <jvilla2013@gmail.com> Co-authored-by: Martin Disibio <martin.disibio@grafana.com> Co-authored-by: Markus Toivonen <markus.toivonen@hoxhunt.com> Co-authored-by: Daniel Strobusch <1847260+dastrobu@users.noreply.github.com> Co-authored-by: Carles Garcia <carles.garciacabot@grafana.com>
* Use mapping for assigning partitions * Use mapping for assigning partitions in the generator too * Add support for SASL auth to kafka clients
* Extract block-builder into its own module * Update /operations and examples * No ephemeral storage * No rolling strategy either * fmt and compile * Address review comment
…a#4410) * Correctly pass start/end times * Different code, same result
* Multiple fixes to cycle consumption * fmt * happy now? * ups
…ue data for reads (grafana#4411) * wip: separate non-flushing local blocks processor to store new queue data for reads * Make real config for non-flushing local blocks processor, optional, validate wal config and use defaults if needed * Fix defaulting of second WAL config
* Make ID generator more robust * Simplify
* Make ID generator more robust * Simplify
Signed-off-by: Joe Elliott <number101010@gmail.com>
* Make blockbuilder tests closer to real kafka and less implementation specific by always enabling support for consumer groups, call commit control func in order * Verify last committed offset in each test * hide test function * lint * lint
* Alternate block-builder consume * Set timeout on PollFetches, reduce initial poll delay, update 1 test to work using real consumergroup functionality * restore metrics * Re-add original partition lag metric, polled in separate goroutine. Fix consume loop to only consume full-duration cycles for more determinism * merge conflict * Review feedback * Review feedback * Comment * code cleanup, lint * logs * code cleanup * lint * Review feedback * Remove missed lookback_on_no_commit config in e2e tests and regen manifest * Review feedback
mapno force-pushed the main-rhythm-rebased branch from e4c9f53 to c2b668e (January 10, 2025 11:08)
mapno changed the title from "[WIP] [rhythm] Add block-builder" to "[rhythm] Introduce block-builder and kafka ingest path" (Jan 10, 2025)
mapno requested review from joe-elliott, mdisibio, yvrhdn, zalegrala, electron0zero, ie-pham, stoewer and javiermolinar as code owners (January 10, 2025 12:41)
mdisibio approved these changes (Jan 10, 2025)
What this PR does:
This PR introduces the new Kafka-based Tempo architecture.
The distributors are updated to write all incoming requests to Kafka as well as to the ingesters, while the generators and block-builders consume the data from Kafka.
The block-builder is a new component that writes data to long-term storage, which decouples the write and read paths in Tempo. It consumes trace data from Kafka and asynchronously builds blocks that are shipped to object storage.
This is PR 1 of X; it is not recommended for use yet.
NOTE: This project is named rhythm. You may find references to rhythm throughout GitHub or the code itself.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]