sidecar: Allow Thanos backup when local compaction is enabled #206
Comments
Hey, thanks for trying out Thanos. You are doing nothing wrong. Early on we hardcoded the sidecar to only upload blocks of compaction level 0, i.e. those that never got compacted. With the garbage collection behavior the compactor has nowadays, it should be safe though to also upload historic data and potentially double-upload some data without lasting consequences. We just didn't get to changing the behavior yet. In the meantime, you could of course just manually upload those blocks to the bucket.
Yea, we could now safely drop the rule of uploading only blocks with compaction level 0. However, I am just curious: is there any use case or reason why anyone would want to do compaction at the local, Prometheus level instead of the bucket level? By default we recommend setting `--storage.tsdb.min-block-duration` and `--storage.tsdb.max-block-duration` to the same value to turn off local compaction entirely. If one decides to actually do local compaction, I can see some (unlikely, but..) race condition where the sidecar is not able to upload for some time for various reasons and Prometheus has enough time to compact some blocks and remove the 0-level blocks. This way nothing would be uploaded. More importantly, our rule of uploading only level 0 makes it more difficult to use the Thanos sidecar on already-running Prometheus instances. I think we should just upload all levels by default.
Cool, cheers guys. @Bplotka, in regards to your question about local tsdb compaction: I'm guessing those who are not using an object store would still want local compaction, to save disk. I've heard of teams out there with a year+ worth of time series. Thanks for flipping PR #207. I've tried testing this out in my fork, but I am not seeing the desired uploads.
@V3ckt0r let's move this discussion to the #207 PR then. I think you cannot see these uploads because Thanos marked these as "uploaded" in `thanos.shipper.json`. I know that marking them as uploaded when they were not was a bit weird; maybe initially we should have named it something else.
Seems to work for @V3ckt0r
hm just rethinking... @V3ckt0r
In that case they can specify not to upload things and this issue will never occur (:
OK, a new approach emerged: on every upload attempt Thanos should:
@fabxc Do we want to require the admin endpoint if the user configures local compaction? And otherwise just have the sidecar error out? I think so.
As @TimSimmons mentioned, we should also add more info in the docs on how to configure Prometheus for the best experience.
I'm looking at how to integrate the sidecar with our existing Prometheus instances and I'm wondering whether the sidecar should try to only ship the most compacted blocks, within the limits of the data retention window. The advantages:
The disadvantages:
Sorry for the delay, @mattbostock! Regarding the mentioned benefits:
True, but I'm not sure if the request/bandwidth load on the object store is actually an issue.
Is that really needed if you keep the scraper small (24h retention)?
I don't get that; what would be simplified? I am afraid all the disadvantages you mentioned are true, and they are "winning" against the benefits. There are a couple more problems:
Agree. #294 should help to determine the exact usage in access logs.
Reducing Prometheus' data retention means that you're reducing the window for recovery during a disaster recovery scenario. For example, if you're using an on-premise object store that has a catastrophic failure (e.g. the datacentre goes up in flames) and Prometheus' data retention is 30 days, that gives you more time to configure a new object store in a new datacentre and configure Thanos to send the data to it. The same could apply to cloud storage if/when a provider has a significantly long outage. There are mitigations for this (the most obvious being to not run a single object store in a single datacentre), but most of these are more complex and more costly than retaining data for longer in Prometheus (at least for on-premise installations). I'm not suggesting one over the other, but highlighting that the retention period is a factor to consider.
Thanos currently requires that compaction is disabled in Prometheus, which means setting a command-line flag for Prometheus.
Good point, this is a significant downside.
Cool, I see the disaster recovery goal for some users, but I'm not sure the Prometheus scraper should be treated as a backup solution for your on-premise/cloud/magic object storage. There must be some dedicated tools for that. The most important use case I can see for local compaction is when users want to migrate to Thanos and upload all the old blocks already compacted by their vanilla Prometheus servers with long retention. The easiest way would be to just glue the sidecar to an existing Prom server and have it be smart enough to detect what "sources" are missing in the object store. This way, we can allow local compaction for longer local storage if you wish, and upload sources that are not upstreamed yet. This is what I proposed here: #206 (comment) There are some pain points here that need to be solved, though.
If you really want to keep longer retention for Prometheus long term, nothing blocks you from that with the above logic, except one more thing mentioned here: #283. Maybe it sounds reasonable to add some arbitrary min-time flag to the sidecar to expose only fresh metrics?
OK, the migration goal moved to another ticket: #348. This ticket stays as the ability to run Prometheus with local compaction + sidecar for a long period and use it as it is. However, this does not make that much sense, since longer retention is not good for this reason: #82. This blocker makes me think that this (long retention + local compaction for longer usage) might not be in our scope.
A bit late perhaps, but without knowing the internals of the tsdb I think the suggestion from @mattbostock makes sense. Compaction would be the limiting factor in how much data can be stored in a bucket. Not only does it have to download and upload all the data that gets stored in the bucket multiple times; unlike the sidecars, which upload in parallel, the compactor is supposed to run as a singleton. This can be solved by splitting the data into more buckets, but that means more complicated setups where users have to manage which servers go to which bucket. To me it seems like most issues with uploading compacted blocks also apply to uploading raw blocks. Is global compaction still needed? The sidecars could downsample the data before uploading; I don't know about 2w blocks. It's also worth considering that compacting before uploading would solve issues too, like #377 and #271 (long time-range queries across all Prometheus sources while the compactor is running are very unreliable, as some blocks are almost bound to have been removed).
Yea, local-only compaction would solve some issues, nice idea, but unfortunately:
Yes. Reasons:
I totally see that we want to allow local compaction with Thanos for some use cases, but we need to invest some time to solve it, and I think WITH the global compactor ):
BTW:
Yup, one drawback is of course that it means Prometheus needs longer retention times, which may not work for all deployments. FWIW, I am experimenting with this by adding a flag to the sidecar that specifies what compaction level the sidecar is uploading. I plan to upload only compaction level 5, where a block has about a week's worth of data. I then won't run the global compactor, but will run "thanos downsample" or try to patch the sidecar to downsample before upload (which would have the drawback of only one level of downsampling...). I understand this is not the direction the project wants to go for several reasons, but I think it's a worthwhile experiment that could provide some useful data, as the alternative for us is to split the data into several buckets so compaction can keep up (and I really like the simplicity of Thanos and want to keep the deployment simple too :)
Just an FYI: it then started the downsampling process, which I estimated would take another 3 weeks to complete before the cycle would start over. I then aborted the whole process, let Prometheus start compacting the data, got a new bucket, and added a small patch to the sidecar to upload only blocks where compaction level == *flagShipperLevel. With a 3-week backlog, the whole compaction and upload process now took less than 4 hours. While one solution is to have multiple buckets to distribute the compaction load that way, would it be possible to add a flag to the sidecar to only upload blocks at a certain compaction level, for those that have large enough Prometheus servers?
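For reference, a minimal sketch of what such a level filter could look like, assuming a hypothetical sidecar flag (here hardcoded as `wantLevel`, playing the role of `*flagShipperLevel`) and reading each block's `meta.json`, which records its compaction level. This is not the actual Thanos shipper code, just an illustration of the idea:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// blockMeta mirrors the fields of a TSDB block's meta.json that matter here.
type blockMeta struct {
	ULID       string `json:"ulid"`
	Compaction struct {
		Level int `json:"level"`
	} `json:"compaction"`
}

func main() {
	dataDir := "/prometheus/data" // TSDB data directory; an assumption, adjust to your setup.
	wantLevel := 5                // stand-in for *flagShipperLevel (roughly a week's worth of data).

	dirs, err := os.ReadDir(dataDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, d := range dirs {
		if !d.IsDir() {
			continue
		}
		raw, err := os.ReadFile(filepath.Join(dataDir, d.Name(), "meta.json"))
		if err != nil {
			continue // not a block directory (e.g. wal/), or meta.json unreadable.
		}
		var m blockMeta
		if err := json.Unmarshal(raw, &m); err != nil {
			continue
		}
		if m.Compaction.Level != wantLevel {
			continue // skip blocks that are not at the desired compaction level.
		}
		fmt.Printf("would upload block %s (compaction level %d)\n", m.ULID, m.Compaction.Level)
	}
}
```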
Totally missed this, sorry.
Yes, definitely a valid use case, probably worth separate issues. And useful as a one-time job. We are actually working on it as part of observatorium/thanos-replicate#7. We also added sharding for the compactor, so you can deploy many compactors that operate on different blocks.
@bwplotka better to add
This issue/PR has been automatically marked as stale because it has not had recent activity. Please comment on status otherwise the issue will be closed in a week. Thank you for your contributions.
It would be amazing to get back to this. @daixiang0 no, as it's Prometheus compaction, not the Compactor really.
I think the solution is vertical compaction, which is quite stable as long as the data is 1:1.
/reopen
Question: how can we solve this via the snapshot API? Besides, the current snapshot API is not flexible enough, as we usually only want to take a snapshot of the newly created block.
Maybe my idea is stupid, but in Prometheus blocks we already have:
Given that the Thanos sidecar locally also "knows" what block IDs have already been uploaded, maybe it could take a look at the same block metadata. We already have
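If I understand the idea correctly, it boils down to comparing the `compaction.sources` recorded in each local block's `meta.json` against the set of block ULIDs the sidecar already uploaded, and uploading only blocks whose sources are not fully covered yet. A rough sketch of that check (hypothetical helper, not actual shipper logic, with placeholder IDs):

```go
package main

import "fmt"

// compaction mirrors the "compaction" section of a block's meta.json:
// a compacted block records the ULIDs of the raw blocks it was built from.
type compaction struct {
	Level   int
	Sources []string
}

// needsUpload reports whether this block still covers sources that are
// missing from the set of already-uploaded source ULIDs.
func needsUpload(c compaction, uploaded map[string]bool) bool {
	for _, src := range c.Sources {
		if !uploaded[src] {
			return true // at least one source is not in the object store yet.
		}
	}
	return false // everything this block was built from is already covered.
}

func main() {
	// Placeholder IDs, not real ULIDs; the uploaded set would come from the
	// sidecar's local record of shipped blocks.
	uploaded := map[string]bool{"block-A": true}
	local := compaction{Level: 2, Sources: []string{"block-A", "block-B"}}
	fmt.Println(needsUpload(local, uploaded)) // true: block-B was never uploaded.
}
```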
Hello 👋 Looks like there was no activity on this issue for the last two months.
Closing for now as promised, let us know if you need this to be reopened! 🤗
We have
I agree. @fpetkovski is looking into this at the moment, I believe.
This is what I understood so far:
I might enable this setup on one of our staging clusters and monitor for a bit to see if anything suspicious comes up.
Prometheus will exclude the most recent block from compaction planning, which means a block has a 2h time window for the sidecar to upload it. I think 2h is usually enough.
Another downside of forced 2h blocks: let's say I have the default retention of 15d and each block has 1GB of index data. Without those flags Prometheus will merge blocks up to 10% of 15d = 36h; with them, blocks always stay at 2h, so much more space is taken up by index data.
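To put rough numbers on the above (illustrative arithmetic only, based on the 15d retention and 2h vs 36h block sizes from the example):

```go
package main

import "fmt"

func main() {
	retentionHours := 15 * 24 // 15d default retention from the example above.
	// Forcing 2h blocks vs letting Prometheus compact up to 10% of retention (36h):
	fmt.Println("blocks at 2h: ", retentionHours/2)  // 180 blocks, each carrying its own index.
	fmt.Println("blocks at 36h:", retentionHours/36) // 10 blocks after local compaction.
}
```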
Hello 👋 Looks like there was no activity on this issue for the last two months.
Hey @Bplotka @fabxc
Still pretty new to the Thanos code base. Going through it, one thing I've noticed with the backup behaviour is that it seems to only dump the initial 2-hourly blocks. In my instance I've stood up Thanos against an existing Prom server. My data dir looks as follows:
When standing up Thanos I see:
Only the last two blocks are uploaded. From the flags for `thanos sidecar` I don't see a mechanism for specifying a period for backdating. Perhaps I am doing something wrong? Is this intentional for some reason (compute/performance)? Or am I simply listing a feature request here? Thanks.