Unshipped blocks when out of order writes are enabled #5402
Comments
I was just about to raise this same bug
We have tested various values for
I will try to take a look at this this week! cc @yeya24
Is it because the ingester shipper didn't upload compacted blocks? https://github.com/cortexproject/cortex/blob/master/pkg/ingester/ingester.go#L2031
Also raised thanos-io/thanos#6462 on the Thanos side.
Is there anything outstanding that's still blocking the merge?
@disambiguationuk I think we can merge #5416 now; I just need to rebase and resolve conflicts. With this change https://github.com/cortexproject/cortex/pull/5495/files#diff-e1032332627c413a3010c66b54b22b6e9835cf152fa339e40cf0b11204f7241fR2043 we should be able to upload dynamically
Any updates on this fix? Is it still being worked on?
Hi @AmerSelimovic, sorry for the delay. The fix should be ready, but I want to see if I can verify it first in our testing environment. I should get it done this week. If you are willing to test a prebuilt image, that would be very helpful
@AmerSelimovic Actually I believe the bug is already fixed. If the tenant has an OOO time window > 0 enabled, the shipper should upload compacted blocks. What we are trying to add in #5416 is the ability to turn the shipper's uploading of compacted blocks on or off dynamically, in case the OOO feature is enabled or disabled at runtime.
Hi @yeya24. I'm not sure which change you think fixed the reported bug. Do you mean this change? https://github.com/cortexproject/cortex/pull/5495/files#diff-e1032332627c413a3010c66b54b22b6e9835cf152fa339e40cf0b11204f7241fR2043
The fix is to always upload compacted blocks in the ingester so that OOO compacted blocks can be uploaded to the object store
Btw https://github.com/cortexproject/cortex/releases/tag/v1.16.0-rc.0 is out. Feel free to try it out and see if it fixes this issue
https://github.com/cortexproject/cortex/releases/tag/v1.17.0-rc.0 is out. It should address this issue completely, as overlapping blocks will no longer be compacted by Prometheus. The compactor will handle that.
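To illustrate the behaviour discussed in the comments above, here is a minimal sketch. It assumes the shipper skips any block whose compaction level is greater than 1 unless uploading compacted blocks is enabled; that level-based rule is an assumption made for illustration and is not stated in this thread. `BlockMeta`, `blocks_to_ship`, and `upload_compacted` are hypothetical names, not Cortex or Thanos APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockMeta:
    """Hypothetical stand-in for the fields of a block's metadata."""
    ulid: str
    compaction_level: int  # assumed: 1 = cut from the head, >1 = result of compacting blocks


def blocks_to_ship(blocks, already_shipped, upload_compacted):
    """Return the ULIDs that would be uploaded on one shipper pass.

    Assumed rule: blocks with compaction_level > 1 are skipped unless
    upload_compacted is True. Skipped blocks stay on local disk, which is
    how unshipped blocks could accumulate on the ingester.
    """
    to_ship = []
    for block in blocks:
        if block.ulid in already_shipped:
            continue  # uploaded on an earlier pass
        if block.compaction_level > 1 and not upload_compacted:
            continue  # compacted block, uploads of compacted blocks disabled
        to_ship.append(block.ulid)
    return to_ship


if __name__ == "__main__":
    local_blocks = [BlockMeta("A", 1), BlockMeta("B", 2), BlockMeta("C", 1)]
    print(blocks_to_ship(local_blocks, {"A"}, upload_compacted=False))
    print(blocks_to_ship(local_blocks, {"A"}, upload_compacted=True))
```

Under this assumption, block "B" is never uploaded while `upload_compacted` is false, which matches the symptom reported in this issue.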
… parameterize uploading compacted blocks In v1.15.2, ingesters configured with OOO samples ingestion enabled could hit this bug (cortexproject#5402) where ingesters would not upload compacted blocks (thanos-io/thanos#6462). In v1.16.1, ingesters are configured to always upload compacted blocks (cortexproject#5625). In v1.17, ingesters stopped uploading compacted blocks (cortexproject#5735). This can cause problems for users upgrading from v1.15.2 with OOO ingestion enabled to v1.17 because both versions are hard coded to disable uploading compacted blocks from the ingesters. The workaround was to downgrade from v1.17 to v1.16 to allow those compacted blocks to be uploaded (and eventually deleted). The new flag is set to true by default which reverts the behavior of the ingester uploading compacted blocks back to v1.16. Signed-off-by: Charlie Le <charlie_le@apple.com>
… parameterize uploading compacted blocks (#5959) In v1.15.2, ingesters configured with OOO samples ingestion enabled could hit this bug (#5402) where ingesters would not upload compacted blocks (thanos-io/thanos#6462). In v1.16.1, ingesters are configured to always upload compacted blocks (#5625). In v1.17, ingesters stopped uploading compacted blocks (#5735). This can cause problems for users upgrading from v1.15.2 with OOO ingestion enabled to v1.17 because both versions are hard coded to disable uploading compacted blocks from the ingesters. The workaround was to downgrade from v1.17 to v1.16 to allow those compacted blocks to be uploaded (and eventually deleted). The new flag is set to true by default which reverts the behavior of the ingester uploading compacted blocks back to v1.16. Signed-off-by: Charlie Le <charlie_le@apple.com>
Describe the bug
Unshipped blocks are shown in the cortex_ingester_oldest_unshipped_block_timestamp_seconds metric and are also visible in the ingester storage when out of order writes are enabled with the configuration introduced in #4964. Blocks accumulate on the ingester as long as the config is set.
To Reproduce
Expected behavior
Expecting to see no unshipped blocks on the ingester and the metric cortex_ingester_oldest_unshipped_block_timestamp_seconds at value 0.
Environment
Infrastructure: Kubernetes
Deployment tool: Helm
Cortex version 1.15.2
Chart version 2.1.0
Additional Context
Tested with the following two combinations of configurations; both produced the same result.
and
Metrics:
cortex_ingester_shipper_uploads_total shows that block uploads are being done
cortex_ingester_shipper_upload_failures_total does not show any failures
cortex_compactor_runs_completed_total shows that compactions are being done
cortex_compactor_runs_failed_total shows no failed compactions
Logs:
There are no errors in Cortex component logs.
The only logs that could point to something are ingester logs about overlapping blocks, for example
Alerts:
The CortexIngesterHasUnshippedBlocks alert from cortex-jsonnet is triggered because there are unshipped blocks.
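As a rough sketch of how the metric above can be interpreted: the issue expects cortex_ingester_oldest_unshipped_block_timestamp_seconds to be 0 when nothing is waiting to be shipped. The helper below treats 0 as "no unshipped blocks" and flags anything older than a threshold. The function name and the one-hour default are assumptions for illustration, not the actual cortex-jsonnet alert expression.

```python
import time


def has_stale_unshipped_block(oldest_unshipped_ts, now=None, max_age_seconds=3600):
    """Return True if the oldest unshipped block is older than max_age_seconds.

    oldest_unshipped_ts is the value of
    cortex_ingester_oldest_unshipped_block_timestamp_seconds (Unix seconds).
    A value of 0 is treated as "no unshipped blocks".
    max_age_seconds is a hypothetical threshold chosen for this sketch.
    """
    if oldest_unshipped_ts <= 0:
        return False  # nothing is waiting to be shipped
    if now is None:
        now = time.time()
    return (now - oldest_unshipped_ts) > max_age_seconds


if __name__ == "__main__":
    two_hours_ago = time.time() - 2 * 3600
    print(has_stale_unshipped_block(0))
    print(has_stale_unshipped_block(two_hours_ago))
```

Passing `now` explicitly makes the check deterministic, which is convenient when replaying recorded metric values.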