store: Thanos consumes 40G at startup #5367

Open
fagossa opened this issue May 16, 2022 · 3 comments

fagossa commented May 16, 2022

Hello,

I'm seeing uncontrolled memory consumption with Thanos Store.

What happened:

At startup, there is a memory peak of ~40GiB that later decreases over time to ~26GiB. However, depending on the load, it climbs back to the peak value and even beyond, which ends in an OOMKill.
[Screenshot: 2022-05-16 17:16:56]

This happens even though I'm using memcached precisely to avoid this situation.

I currently have more than 145k blocks in my S3 storage and more than 80 Prometheus instances (+ sidecars).

What I expected:

Since a cache is in use, I expected memory usage to stay low.

At least the cache does seem to be used:

[Screenshot: 2022-05-16 17:59:37]

Current configuration:

thanos store \
--data-dir="/var/thanos/store" \
--objstore.config-file="/etc/thanos/bucket.yml" \
--http-address="0.0.0.0:10902" \
--grpc-address="0.0.0.0:10901" \
--log.format="json" \
--log.level="info" \
--store.index-header-posting-offsets-in-mem-sampling=50 \
--index-cache.config-file="/etc/thanos/thanos-store-cache-config.yml"

This is my cache configuration:

type: MEMCACHED
config:
  addresses: ["memcached:11211"]
  timeout: 3s
  max_idle_connections: 200
  max_async_concurrency: 20
  max_async_buffer_size: 10000000
  max_item_size: 300MiB
  max_get_multi_concurrency: 5
  max_get_multi_batch_size: 20
  dns_provider_update_interval: 10s
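
For what it's worth, this is how I sanity-check that the cache is actually used: I scrape the store's own /metrics endpoint on the HTTP port (10902 in my configuration). The exact metric names below are from memory and may differ between Thanos versions, so treat this as a sketch:

# hit/request counters for the index cache (metric names may vary by version)
curl -s http://localhost:10902/metrics | grep -E 'index_cache_(requests|hits)_total'

# operations issued against memcached itself
curl -s http://localhost:10902/metrics | grep 'memcached_operations_total'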

Environment:

  • thanos 0.25.1
  • prometheus 2.28.1

What have I tested so far?

1 - sharding by date using flags

I created several stores covering 3-month periods using this pattern:

thanos store \
...
--max-time=-12w \
--min-time=-24w

but all stores ended up consuming the same amount of memory.
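
For reference, the full time-sharded layout looked roughly like this (windows approximate, exact number of shards aside; the remaining flags are the same as in the main configuration above):

thanos store ... --min-time=-12w                    # most recent ~12 weeks
thanos store ... --min-time=-24w --max-time=-12w    # 12 to 24 weeks back
thanos store ... --min-time=-36w --max-time=-24w    # 24 to 36 weeks back
thanos store ... --max-time=-36w                    # everything older than ~36 weeks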

2 - Changing the compaction level

I basically followed issue #325.

Anything else:

I don't know whether this information is useful, but the number of samples varies greatly depending on the Prometheus instance:

[Screenshot: 2022-05-16 17:53:20]

GiedriusS (Member) commented

Do you have persistent storage on your Thanos Store pods (I assume k8s here)? The RAM usage probably comes from building binary index headers. Maybe there is some opportunity here to use sync.Pool to keep RAM usage constant. With persistent storage, you wouldn't have to rebuild them each time. Could you upload a profile of the memory usage of Thanos Store just after the start of the process?
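
Assuming the default Go pprof handlers are exposed on the Store's HTTP port (10902 in your configuration), something along these lines should be enough to capture a heap profile shortly after startup:

# dump the heap profile to a file that can be attached here
curl -s -o store-heap.pprof http://<store-host>:10902/debug/pprof/heap

# or, with the Go toolchain installed, browse it interactively
go tool pprof -http=:8080 http://<store-host>:10902/debug/pprof/heap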


fagossa commented May 17, 2022

Hello, thanks a lot for your answer!

We are actually using Docker Swarm with persistent volumes. I do not think there is a problem at this level, because we have hundreds of services (unrelated to Prometheus + Thanos) running without any issue.

Regarding memory, here is the memory usage of the store serving the bucket with 145k blocks, after a reboot:

[Screenshot: 2022-05-17 10:29:20]

and here is the similar behavior of one of the stores sharded to a three-month window (using min-time and max-time), after a reboot:

[Screenshot: 2022-05-17 10:27:42]

Note:

  • This might be unrelated, but Thanos Compact is down because there are some block overlaps.
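
In case it is relevant to the overlaps mentioned above, the bucket can presumably be inspected with the bundled tools; the subcommand names below are from memory, so treat this as a sketch rather than the exact invocation:

# list all blocks with their time ranges, resolutions and labels
thanos tools bucket inspect --objstore.config-file=/etc/thanos/bucket.yml

# check the bucket for issues such as overlapping blocks
thanos tools bucket verify --objstore.config-file=/etc/thanos/bucket.yml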


stale bot commented Jul 31, 2022

Hello 👋 It looks like there has been no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if needed!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

stale bot added the stale label on Jul 31, 2022