store: Thanos consumes 40G at startup #5367

Open
fagossa opened this issue May 16, 2022 · 3 comments

fagossa commented May 16, 2022

Hello,

I'm seeing uncontrolled memory consumption with Thanos Store.

What happened:

At startup, there is a memory peak of ~40GiB that later decreases over time to ~26GiB. However, depending on the load, it climbs back to the peak value and even beyond, which ends in an OOMKill.
[Screenshot: 2022-05-16 17:16:56]

This happens even though I'm using memcached precisely to avoid this situation.

I currently have more than 145k blocks in my S3 storage and more than 80 Prometheus instances (+ sidecars).

What I expected:

Since a cache is in use, I expected memory usage to stay low.

At least the cache does seem to be used:

[Screenshot: 2022-05-16 17:59:37]

Current configuration:

thanos store \
--data-dir="/var/thanos/store" \
--objstore.config-file="/etc/thanos/bucket.yml" \
--http-address="0.0.0.0:10902" \
--grpc-address="0.0.0.0:10901" \
--log.format="json" \
--log.level="info" \
--store.index-header-posting-offsets-in-mem-sampling=50 \
--index-cache.config-file="/etc/thanos/thanos-store-cache-config.yml"

This is my cache configuration:

type: MEMCACHED
config:
  addresses: ["memcached:11211"]
  timeout: 3s
  max_idle_connections: 200
  max_async_concurrency: 20
  max_async_buffer_size: 10000000
  max_item_size: 300MiB
  max_get_multi_concurrency: 5
  max_get_multi_batch_size: 20
  dns_provider_update_interval: 10s
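
For what it's worth, this is how I sanity-check that the cache is actually used: I scrape the store's own /metrics endpoint on the HTTP port (10902 in my configuration). The exact metric names below are from memory and may differ between Thanos versions, so treat this as a sketch:

# hit/request counters for the index cache (metric names may vary by version)
curl -s http://localhost:10902/metrics | grep -E 'index_cache_(requests|hits)_total'

# operations issued against memcached itself
curl -s http://localhost:10902/metrics | grep 'memcached_operations_total'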

Environment:

  • thanos 0.25.1
  • prometheus 2.28.1

What have I tested so far?

1 - sharding by date using flags

I created several stores covering 3-month periods using this pattern:

thanos store \
...
--max-time=-12w \
--min-time=-24w

but all stores ended up consuming the same amount of memory.
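
For reference, the full time-sharded layout looked roughly like this (windows approximate, exact number of shards aside; the remaining flags are the same as in the main configuration above):

thanos store ... --min-time=-12w                    # most recent ~12 weeks
thanos store ... --min-time=-24w --max-time=-12w    # 12 to 24 weeks back
thanos store ... --min-time=-36w --max-time=-24w    # 24 to 36 weeks back
thanos store ... --max-time=-36w                    # everything older than ~36 weeks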

2 - Changing the compaction level

I basically followed issue #325.

Anything else:

I don't know whether this information is useful, but the number of samples varies greatly depending on the Prometheus instance:

[Screenshot: 2022-05-16 17:53:20]

GiedriusS (Member) commented

Do you have persistent storage on your Thanos Store pods (I assume k8s here)? The RAM usage probably comes from building binary index headers. Maybe there is some opportunity here to use sync.Pool to keep RAM usage constant. With persistent storage, you wouldn't have to rebuild them each time. Could you upload a profile of the memory usage of Thanos Store just after the start of the process?
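
Assuming the default Go pprof handlers are exposed on the Store's HTTP port (10902 in your configuration), something along these lines should be enough to capture a heap profile shortly after startup:

# dump the heap profile to a file that can be attached here
curl -s -o store-heap.pprof http://<store-host>:10902/debug/pprof/heap

# or, with the Go toolchain installed, browse it interactively
go tool pprof -http=:8080 http://<store-host>:10902/debug/pprof/heap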


fagossa commented May 17, 2022

Hello, thanks a lot for your answer!

We are actually using Docker Swarm with persistent volumes. I do not think there is a problem at this level, because we have hundreds of services (unrelated to Prometheus + Thanos) running without any issue.

Regarding memory, here is the memory usage of the store serving the bucket with 145k blocks, after a reboot:

[Screenshot: 2022-05-17 10:29:20]

and here is the similar behavior of one of the stores sharded to a three-month window (using min-time and max-time), after a reboot:

[Screenshot: 2022-05-17 10:27:42]

Note:

  • This might be unrelated, but Thanos Compact is down because there are some block overlaps.
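
In case it is relevant to the overlaps mentioned above, the bucket can presumably be inspected with the bundled tools; the subcommand names below are from memory, so treat this as a sketch rather than the exact invocation:

# list all blocks with their time ranges, resolutions and labels
thanos tools bucket inspect --objstore.config-file=/etc/thanos/bucket.yml

# check the bucket for issues such as overlapping blocks
thanos tools bucket verify --objstore.config-file=/etc/thanos/bucket.yml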


stale bot commented Jul 31, 2022

Hello 👋 It looks like there has been no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this issue or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if needed!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

stale bot added the stale label on Jul 31, 2022