Unshipped blocks when out of order writes are enabled #5402

Closed
AmerSelimovic opened this issue Jun 13, 2023 · 14 comments · Fixed by #5959

Comments

@AmerSelimovic

Describe the bug

Unshipped blocks show up in the cortex_ingester_oldest_unshipped_block_timestamp_seconds metric and are also visible in the ingester storage when out-of-order writes are enabled with the configuration introduced in #4964. Blocks keep accumulating on the ingester for as long as that configuration is set.

To Reproduce

  1. Start Cortex 1.15.2
  2. Enable out-of-order writes using the configuration parameters introduced in "Support out of order samples ingestion" (#4964)
  3. Perform write operations

Expected behavior

I expect to see no unshipped blocks on the ingester, and the cortex_ingester_oldest_unshipped_block_timestamp_seconds metric to stay at 0.
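
For anyone who wants to check this expectation without waiting for the alert, here is a minimal Go sketch that reads the gauge out of Prometheus text-format output, such as a copy of an ingester's /metrics page. The function name `oldestUnshippedTimestamp` and the sample input are made up for illustration; only the metric name comes from this report. A result of 0 is treated as "no unshipped blocks".

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricName is the gauge named in this report.
const metricName = "cortex_ingester_oldest_unshipped_block_timestamp_seconds"

// oldestUnshippedTimestamp (hypothetical helper) scans Prometheus text-format
// output and returns the smallest non-zero value of the gauge, or 0 when every
// sample is 0 or the metric is absent.
func oldestUnshippedTimestamp(exposition string) (float64, error) {
	oldest := 0.0
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, metricName) {
			continue // skips blank lines, "# HELP"/"# TYPE" comments, other metrics
		}
		rest := line[len(metricName):]
		if rest == "" || (rest[0] != ' ' && rest[0] != '{') {
			continue // a longer metric name that merely shares the prefix
		}
		// Drop an optional {label="value"} set before reading the sample value.
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			return 0, fmt.Errorf("no value in line %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, fmt.Errorf("bad value in line %q: %w", line, err)
		}
		if v > 0 && (oldest == 0 || v < oldest) {
			oldest = v
		}
	}
	return oldest, sc.Err()
}

func main() {
	// Illustrative input only; a real check would read the ingester's metrics output.
	sample := "# TYPE " + metricName + " gauge\n" + metricName + " 1.6866432e+09\n"
	ts, err := oldestUnshippedTimestamp(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if ts == 0 {
		fmt.Println("no unshipped blocks")
	} else {
		fmt.Printf("oldest unshipped block timestamp: %.0f\n", ts)
	}
}
```

The parser is deliberately simple and only handles the plain `name{labels} value` line shape; it is not a full implementation of the exposition format.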

Environment

  • Infrastructure: Kubernetes

  • Deployment tool: Helm

  • Cortex version 1.15.2

  • Chart version 2.1.0

Additional Context

Tested with the following two configurations; both produced the same result.

out_of_order_time_window: 30m
out_of_order_cap_max: 32

and

out_of_order_time_window: 30m
out_of_order_cap_max: 32
skip_blocks_with_out_of_order_chunks_enabled: true

Metrics:

cortex_ingester_shipper_uploads_total shows that block uploads are being done
cortex_ingester_shipper_upload_failures_total does not show any failures
cortex_compactor_runs_completed_total shows that compactions are being done
cortex_compactor_runs_failed_total shows no failed compactions

Logs:

There are no errors in Cortex component logs.
The only logs that might point to something are ingester logs about overlapping blocks, for example:

caller=compact.go:698 org_id=fake msg="Found overlapping blocks during compaction" ulid=01H2SSZF807FMM2FD52HFGA3N7

Alerts:

The CortexIngesterHasUnshippedBlocks alert from cortex-jsonnet fires because the ingester has unshipped blocks.

@disambiguationuk

I was just about to raise this same bug

@danieljosephchambers

We have tested various values for out_of_order_time_window, including very short ones like 10m and longer ones like 2w, and the problem persists.

@alanprot
Member

I will try to take a look at this this week!

cc @yeya24

@yeya24
Contributor

yeya24 commented Jun 19, 2023

Is it because the ingester shipper doesn't upload compacted blocks? https://github.com/cortexproject/cortex/blob/master/pkg/ingester/ingester.go#L2031

@yeya24
Contributor

yeya24 commented Jun 21, 2023

I also raised thanos-io/thanos#6462 on the Thanos side.
I think making the shipper upload compacted blocks would work, but it might cause other issues, since we cannot tell compacted blocks produced by OOO apart from other compacted blocks.

@disambiguationuk

Is there anything outstanding that's still blocking the merge?

@yeya24
Contributor

yeya24 commented Aug 3, 2023

@disambiguationuk I think we can merge #5416 now; I just need to rebase and resolve conflicts.

With this change (https://github.com/cortexproject/cortex/pull/5495/files#diff-e1032332627c413a3010c66b54b22b6e9835cf152fa339e40cf0b11204f7241fR2043) we should be able to upload dynamically.

@AmerSelimovic
Author

Any updates on this fix? Is it still being worked on?

@yeya24
Contributor

yeya24 commented Oct 24, 2023

Hi @AmerSelimovic, sorry for the delay. The fix should be ready, but I want to verify it in our testing environment first. I should get it done this week.

If you are willing to test a prebuilt image, that would be very helpful.

@yeya24
Contributor

yeya24 commented Oct 29, 2023

@AmerSelimovic Actually, I believe the bug is already fixed. If a tenant has an OOO time window greater than 0, the shipper should upload compacted blocks.

What we are trying to add in #5416 is the ability to turn the shipper's uploading of compacted blocks on and off dynamically, in case the OOO feature is enabled or disabled at runtime.
If OOO is already enabled when the ingester starts, all blocks can be uploaded successfully.

@AmerSelimovic
Author

Hi @yeya24.

I'm not sure which change you are saying fixed the reported bug.
The issue was also happening with out_of_order_time_window: 30m.

Do you mean it is fixed by this change? https://github.com/cortexproject/cortex/pull/5495/files#diff-e1032332627c413a3010c66b54b22b6e9835cf152fa339e40cf0b11204f7241fR2043

@yeya24
Contributor

yeya24 commented Nov 9, 2023

The fix is to always upload compacted blocks in the ingester, so that OOO compacted blocks can be uploaded to the object store.

@yeya24
Contributor

yeya24 commented Nov 9, 2023

Btw, https://github.com/cortexproject/cortex/releases/tag/v1.16.0-rc.0 is out. Feel free to try it out and see if it fixes this issue.

@yeya24
Contributor

yeya24 commented Apr 25, 2024

https://github.com/cortexproject/cortex/releases/tag/v1.17.0-rc.0 is out. It should address this issue completely as overlapped blocks will not be compacted by Prometheus anymore. Compactor will handle that.

@yeya24 yeya24 closed this as completed Apr 25, 2024
CharlieTLe added a commit to CharlieTLe/cortex that referenced this issue May 18, 2024
… parameterize uploading compacted blocks

In v1.15.2, ingesters configured with OOO samples ingestion enabled
could hit this bug (cortexproject#5402)
where ingesters would not upload compacted blocks
(thanos-io/thanos#6462).

In v1.16.1, ingesters are configured to always upload compacted blocks
(cortexproject#5625).

In v1.17, ingesters stopped uploading compacted blocks
(cortexproject#5735).

This can cause problems for users upgrading from v1.15.2 with OOO
ingestion enabled to v1.17 because both versions are hard coded to
disable uploading compacted blocks from the ingesters.

The workaround was to downgrade from v1.17 to v1.16 to allow those
compacted blocks to be uploaded (and eventually deleted).

The new flag is set to true by default which reverts the behavior of the
ingester uploading compacted blocks back to v1.16.

Signed-off-by: Charlie Le <charlie_le@apple.com>
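
The version history in the commit message above can be summarized in a small Go sketch. It only restates what the message says; `ingesterUploadsCompacted` is a hypothetical helper, release series are written as "v1.15", "v1.16" and "v1.17", and a nil `flag` stands for a build that predates the new flag (whose name is not given in this thread).

```go
package main

import (
	"fmt"
)

// ingesterUploadsCompacted (hypothetical) reports whether ingesters upload
// compacted blocks, following the commit message above. A non-nil flag models
// a build that includes the new setting, which defaults to true.
func ingesterUploadsCompacted(series string, flag *bool) (bool, error) {
	if flag != nil {
		return *flag, nil // the new flag overrides the hard-coded behavior
	}
	switch series {
	case "v1.15", "v1.17":
		return false, nil // hard coded to not upload compacted blocks
	case "v1.16":
		return true, nil // always uploads compacted blocks
	}
	return false, fmt.Errorf("series %q is not covered by the commit message", series)
}

func main() {
	for _, s := range []string{"v1.15", "v1.16", "v1.17"} {
		up, err := ingesterUploadsCompacted(s, nil)
		fmt.Println(s, up, err)
	}
	// A build with the new flag left at its default (true).
	def := true
	up, err := ingesterUploadsCompacted("v1.17", &def)
	fmt.Println("v1.17 with flag:", up, err)
}
```

Read this way, an upgrade straight from v1.15 to v1.17 never passes through a version that uploads compacted blocks, which is the gap the new flag is meant to close.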
yeya24 pushed a commit that referenced this issue May 19, 2024
… parameterize uploading compacted blocks (#5959)

yeya24 pushed a commit that referenced this issue May 20, 2024
… parameterize uploading compacted blocks (#5959)
