Receive's memory usage continues to grow in v0.31.0-rc.0 #6176
Would you mind posting a graph of head series and samples ingested for the same time period? The metrics are …
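A minimal sketch of pulling data for such a graph with the Prometheus Go client; the query endpoint, the `job="thanos-receive"` selector, and the specific TSDB metrics used here are assumptions for illustration, not confirmed by this thread:

```go
// Sketch: fetch head-series count and ingested-samples rate over a time
// window via the Prometheus HTTP API. Endpoint, label selector, and metric
// names are assumptions; adjust them to your environment.
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	window := v1.Range{
		Start: time.Now().Add(-6 * time.Hour),
		End:   time.Now(),
		Step:  time.Minute,
	}

	for _, query := range []string{
		`sum(prometheus_tsdb_head_series{job="thanos-receive"})`,
		`sum(rate(prometheus_tsdb_head_samples_appended_total{job="thanos-receive"}[5m]))`,
	} {
		result, warnings, err := promAPI.QueryRange(ctx, query, window)
		if err != nil {
			panic(err)
		}
		if len(warnings) > 0 {
			fmt.Println("warnings:", warnings)
		}
		fmt.Printf("%s:\n%v\n", query, result)
	}
}
```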
@philipgough @saswatamcode would it be possible to test the RC with your load testing framework to see if there's a memory regression with Thanos itself?
@yutian1224 do you know if there might have been queries being executed in your cluster that could be touching the "hot data" in Receives? You said you rolled back, but the right edge of the chart still shows a trend upwards. How's the memory usage since you rolled back?
@douglascamata As shown in the figure below, the memory usage before upgrading to 0.31.0 and after rolling back is relatively stable.
@fpetkovski I won't get a chance to do so this week due to other commitments, but I can check next week.
@philipgough I tested the instance without the …
@yutian1224 interesting, thanks for confirming. We were indeed running the RC without those limit flags. cc @fpetkovski
I can test later this week, thanks for looking into it.
Interesting. @fpetkovski, I assume it could then be #6074; I just realized we only added those flags this release. It would be interesting to see a profile of where all the memory is being hogged.
I enabled these flags in our staging environment but could not reproduce the described memory issue. @yutian1224, are you able to reproduce this problem consistently?
@fpetkovski Yes. In addition to the first occurrence, I also made a comparison before and after removing the limit flag, and the problem was reproduced again.
3M series should not be that much data. Would you mind providing a heap profile when you reproduce the issue? You can get it by hitting the …
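A minimal sketch of grabbing such a heap profile, assuming the standard Go pprof handlers are exposed on Receive's HTTP port (10902 is the Thanos default) and that the pod is reachable as `thanos-receive` (a hypothetical hostname):

```go
// Sketch: download a heap profile from a Receive instance. The host and
// port are assumptions; adjust them to the affected pod. The saved file
// can be inspected locally, e.g. with `go tool pprof heap.pprof`.
package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	resp, err := http.Get("http://thanos-receive:10902/debug/pprof/heap")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("heap.pprof")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		panic(err)
	}
}
```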
@fpetkovski Sure, I'll test it over the weekend.
Sounds good to me, thanks for checking @yutian1224 and @fpetkovski 👍
@fpetkovski I deployed 0.31.0 and enabled the limit flags yesterday, and the memory problem reappeared.
FYI I took the profile and uploaded it to this web visualization tool: https://flamegraph.com/share/7c78f5a0-cfa9-11ed-9b0d-d641223b6af4. I'm not sure what the problem is, but …
I wonder if there is some contention caused by these low read limits on receivers that is affecting the ingestion path. @yutian1224, can you confirm whether or not the limits were being hit? Can you increase your previous limits by 100x and see if the problem remains?
@philipgough I am pretty sure the limits were being hit.
This is really interesting. Is the yellow line outgoing traffic, and why is it negative?
@fpetkovski …
Thanos, Prometheus and Golang version used:
Thanos: v0.31.0-rc.0
Object Storage Provider:
S3
What happened:

I upgraded from 0.30.2 to the new version at around 8 o'clock and noticed that memory kept growing until I rolled back.
Args: