Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enforce requirment for periodic config for index tables to be 24h when using boltdb shipper #2166

Conversation

sandeepsukhani
Copy link
Contributor

What this PR does / why we need it:
Boltdb shipper keeps uploading modified boltdb files by ingesters, which are by default created weekly. Queriers also downloads them during reads and keeps downloading the updated files. This means for a big cluster, files could grow a lot and ingesters have to upload bigger files even for minor updates and on the other side Queriers have to download them.

This PR enforces periodic config for index tables to be 24h when using boltdb shipper, which would help create smaller files. We anyways wanted to move towards that in future to help to add more components which can modify the index other than ingesters. The plan is to let only ingesters modify last 24 hours of the index and let other components to modify the rest of the index.

I think we should do it now while boltdb shipper is still in an experimental stage to avoid any problems in future and doing the provisioning in code for variable index file sizes.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
Since this would not work with existing deployments which already used non-24h periodic duration, I am going to build a helper tool to rebuild boltdb files to make them have 24 hours worth of index.

Checklist

  • Documentation added

@sandeepsukhani sandeepsukhani force-pushed the boltdb-shipper-24h-periodic-config branch from 77aa519 to a80ad48 Compare June 3, 2020 09:17
@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2020

Codecov Report

Attention: Patch coverage is 82.14286% with 5 lines in your changes missing coverage. Please review.

Project coverage is 62.77%. Comparing base (2a596a7) to head (21ea033).

Files with missing lines Patch % Lines
pkg/loki/modules.go 0.00% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2166      +/-   ##
==========================================
- Coverage   62.91%   62.77%   -0.14%     
==========================================
  Files         162      162              
  Lines       13952    13959       +7     
==========================================
- Hits         8778     8763      -15     
- Misses       4491     4512      +21     
- Partials      683      684       +1     
Files with missing lines Coverage Δ
pkg/loki/loki.go 0.00% <ø> (ø)
pkg/storage/store.go 66.48% <100.00%> (+4.45%) ⬆️
pkg/loki/modules.go 5.04% <0.00%> (-5.73%) ⬇️

... and 3 files with indirect coverage changes

@cyriltovena cyriltovena requested a review from slim-bean June 9, 2020 18:59
@stale
Copy link

stale bot commented Jul 9, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale A stale issue or PR that will automatically be closed. label Jul 9, 2020
@stale stale bot closed this Jul 17, 2020
@stale stale bot removed the stale A stale issue or PR that will automatically be closed. label Jul 17, 2020
@sandeepsukhani sandeepsukhani added the keepalive An issue or PR that will be kept alive and never marked as stale. label Jul 17, 2020
@sandeepsukhani sandeepsukhani force-pushed the boltdb-shipper-24h-periodic-config branch from a80ad48 to 4f37230 Compare July 27, 2020 08:20
@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 27, 2020
Copy link
Collaborator

@slim-bean slim-bean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sandeepsukhani sandeepsukhani force-pushed the boltdb-shipper-24h-periodic-config branch from bf669f2 to 0c07aba Compare July 30, 2020 07:44
@sandeepsukhani sandeepsukhani merged commit d5c840b into grafana:master Jul 30, 2020
cyriltovena pushed a commit to cyriltovena/loki that referenced this pull request Jun 11, 2021
* First step towards Cortex using services model. Only Server module converted.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted runtimeconfig.Manager to service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added /services page.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted memberlistKV to service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed runtimeconfig tests.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted Ring to service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Log waiting for other module to initialize.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted overrides to service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted client pool to a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert ring lifecycler into a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted HA Tracker to a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted Distributor to a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Handle nil from wrappedService call.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Explain why Server uses service and not a wrappedService.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Make service from Store.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert ingester to a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert querier initialization into a service. This isn't full
conversion, but more work would be required to fully use services
for querier. Left TODO comment instead.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Listen for module failures, and log them.

Also log when all services start successfully.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert blockQueryable and UserStore into a services.

UserStore now does initial sync in Starting state.
blockQueryable doesn't return querier until it has finished starting.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Wait a little before shutdown.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted queryFrontend to a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Logging

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert TableManager to service

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert Ruler to service

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert Configs to service

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Convert AlertManager to service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Renamed init methods back.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed tests.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Converted Compactor to a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Polishing, comments.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comments.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Lint comments.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Stop server only after all other modules have finished.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Don't send more jobs to lifecycler loop, if it's not running anymore.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Don't stop Server until other modules have stopped.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed Compactor from All target. It was meant to be for testing only.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* More comments around startup logic.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* moduleServiceWrapper doesn't need full Cortex, only serviceMap

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Messages

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix outdated comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Start lifecycler in starting functions.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed comment. Return lifecycler's failure case, if any, as error.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix lifecycler usage.

Pass independent context to lifecycler, so that it doesn't stop too early.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix test.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Removed obsolete waiting code. Only log error if it is real error.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Renamed servManager to subservices.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Addressing review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix test.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix compilation errors after rebase.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Extracted code that creates server service into separate file.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added some helper methods.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use helper methods to simplify service creation.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Helper functions for manager.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use helper functions.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixes, use helper functions.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixes, use helper functions.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* comment

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Helper function

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Use helper functions to reduce amount of code.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added tests for helper functions.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed imports

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comment

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Simplify code.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Stop and wait until stopped.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Imports

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Manage compaction and shipper via subservices manager.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Improve error message.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comment.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added unit test for Cortex initialization and module dependencies.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Comments, return errors.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Unified /ready handlers into one, that reflects state of all services.

It also uses ingester's check (if configured) to limit rate of starting
ingesters.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Added //nolint:errcheck to `defer services.StopAndAwaitTerminated(...)` calls.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix http handler logic. Also renamed it.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Address review feedback.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* One more test... since Shutdown already stops Run, no need to call Stop().

We can use Stop in the test instead.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* CHANGELOG.md

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fixed integration test, old versions of Cortex didn't have /ready probe.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Make lint happy.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Mention /ready for all services.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Wrap "not running" error into promql.ErrStorage.

That makes API handler to return HTTP code 500 (server error)
instead of 422 (client error).

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Expose http port via method on HTTPService.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Print number of services in each state.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix comment and remove obsolete line.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Fix compile errors after rebase.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Rebased and converted data purger to a service.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>

* Pass context to the bucket client.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive An issue or PR that will be kept alive and never marked as stale. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants