-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enforce requirment for periodic config for index tables to be 24h when using boltdb shipper #2166
enforce requirment for periodic config for index tables to be 24h when using boltdb shipper #2166
Conversation
77aa519
to
a80ad48
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2166 +/- ##
==========================================
- Coverage 62.91% 62.77% -0.14%
==========================================
Files 162 162
Lines 13952 13959 +7
==========================================
- Hits 8778 8763 -15
- Misses 4491 4512 +21
- Partials 683 684 +1
|
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
a80ad48
to
4f37230
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…n using boltdb shipper
bf669f2
to
0c07aba
Compare
* First step towards Cortex using services model. Only Server module converted. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted runtimeconfig.Manager to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added /services page. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted memberlistKV to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed runtimeconfig tests. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted Ring to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Log waiting for other module to initialize. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted overrides to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted client pool to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert ring lifecycler into a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted HA Tracker to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted Distributor to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Handle nil from wrappedService call. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Explain why Server uses service and not a wrappedService. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Make service from Store. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert ingester to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert querier initialization into a service. This isn't full conversion, but more work would be required to fully use services for querier. Left TODO comment instead. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Listen for module failures, and log them. Also log when all services start successfully. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert blockQueryable and UserStore into a services. UserStore now does initial sync in Starting state. blockQueryable doesn't return querier until it has finished starting. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Wait a little before shutdown. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted queryFrontend to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Logging Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert TableManager to service Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert Ruler to service Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert Configs to service Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Convert AlertManager to service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Renamed init methods back. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed tests. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Converted Compactor to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Polishing, comments. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comments. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Lint comments. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Stop server only after all other modules have finished. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't send more jobs to lifecycler loop, if it's not running anymore. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Don't stop Server until other modules have stopped. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed Compactor from All target. It was meant to be for testing only. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * More comments around startup logic. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * moduleServiceWrapper doesn't need full Cortex, only serviceMap Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Messages Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix outdated comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Start lifecycler in starting functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed comment. Return lifecycler's failure case, if any, as error. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix lifecycler usage. Pass independent context to lifecycler, so that it doesn't stop too early. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix test. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Removed obsolete waiting code. Only log error if it is real error. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Renamed servManager to subservices. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Addressing review feedback. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix test. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix compilation errors after rebase. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Extracted code that creates server service into separate file. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added some helper methods. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use helper methods to simplify service creation. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Helper functions for manager. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixes, use helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixes, use helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * comment Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Helper function Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Use helper functions to reduce amount of code. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added tests for helper functions. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed imports Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Simplify code. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Stop and wait until stopped. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Imports Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Manage compaction and shipper via subservices manager. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Improve error message. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comment. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added unit test for Cortex initialization and module dependencies. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Comments, return errors. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Unified /ready handlers into one, that reflects state of all services. It also uses ingester's check (if configured) to limit rate of starting ingesters. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Added //nolint:errcheck to `defer services.StopAndAwaitTerminated(...)` calls. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix http handler logic. Also renamed it. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Address review feedback. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * One more test... since Shutdown already stops Run, no need to call Stop(). We can use Stop in the test instead. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * CHANGELOG.md Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fixed integration test, old versions of Cortex didn't have /ready probe. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Make lint happy. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Mention /ready for all services. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Wrap "not running" error into promql.ErrStorage. That makes API handler to return HTTP code 500 (server error) instead of 422 (client error). Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Expose http port via method on HTTPService. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Print number of services in each state. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix comment and remove obsolete line. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Fix compile errors after rebase. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Rebased and converted data purger to a service. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com> * Pass context to the bucket client. Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
What this PR does / why we need it:
Boltdb shipper keeps uploading modified boltdb files by ingesters, which are by default created weekly. Queriers also downloads them during reads and keeps downloading the updated files. This means for a big cluster, files could grow a lot and ingesters have to upload bigger files even for minor updates and on the other side Queriers have to download them.
This PR enforces periodic config for index tables to be 24h when using boltdb shipper, which would help create smaller files. We anyways wanted to move towards that in future to help to add more components which can modify the index other than ingesters. The plan is to let only ingesters modify last 24 hours of the index and let other components to modify the rest of the index.
I think we should do it now while boltdb shipper is still in an experimental stage to avoid any problems in future and doing the provisioning in code for variable index file sizes.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Since this would not work with existing deployments which already used non-24h periodic duration, I am going to build a helper tool to rebuild boltdb files to make them have 24 hours worth of index.
Checklist