Limit data bytes fetched per query and per store #5750

Open
yeya24 opened this issue Oct 2, 2022 · 9 comments

Comments

@yeya24
Contributor

yeya24 commented Oct 2, 2022

Is your proposal related to a problem?

In a multi-tenant environment, Thanos Querier can easily get OOM-killed when it runs without any limits or protection.

We have some limits in place now, such as series limits, but since series labels vary in size, a series count is a poor proxy for memory use and we probably need other ways of limiting.

Limiting the data bytes fetched for the whole query, or even at each store level, could be worth trying.

Describe the solution you'd like

Cortex has a PR cortexproject/cortex#4854 which implements this feature.
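
To make the proposal concrete, here is a minimal sketch of what a bytes limiter could look like. `BytesLimiter`, `NewBytesLimiter`, and `Reserve` are hypothetical names for illustration only; they are not the Thanos or Cortex API. One shared instance would act as a per-query limit, and one instance per store would act as a per-store limit.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// ErrBytesLimitExceeded is returned once the byte budget is used up.
var ErrBytesLimitExceeded = errors.New("data bytes limit exceeded")

// BytesLimiter tracks bytes fetched against a fixed budget.
// Hypothetical type for illustration; a limit of 0 disables the check.
type BytesLimiter struct {
	limit   uint64
	fetched atomic.Uint64 // safe for concurrent fetches (Go 1.19+)
}

func NewBytesLimiter(limit uint64) *BytesLimiter {
	return &BytesLimiter{limit: limit}
}

// Reserve adds n bytes to the running total and returns an error
// if the total now exceeds the limit.
func (l *BytesLimiter) Reserve(n uint64) error {
	if l.limit == 0 {
		return nil
	}
	if total := l.fetched.Add(n); total > l.limit {
		return fmt.Errorf("%w: fetched %d bytes, limit %d", ErrBytesLimitExceeded, total, l.limit)
	}
	return nil
}

func main() {
	// One limiter shared by every store touched by the query: a per-query limit.
	perQuery := NewBytesLimiter(1024)
	for i, size := range []uint64{400, 400, 400} {
		if err := perQuery.Reserve(size); err != nil {
			fmt.Printf("chunk %d rejected: %v\n", i, err)
			return
		}
		fmt.Printf("chunk %d accepted\n", i)
	}
}
```

Counting bytes rather than series means a few series with very large label sets or many chunks are caught just as reliably as a large number of small series.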

@douglascamata
Contributor

I think after the work done by @fpetkovski on #6074, this can be closed. WDYT, @yeya24?

@yeya24
Contributor Author

yeya24 commented Apr 6, 2023

I see, thanks @douglascamata. I think that covers the per-store limits. But we still need the per-query limits, right? Do we have a plan to support that?
Also, a limit on data bytes fetched is a separate limit on the number of bytes fetched, different from the series and chunks limits.

@fpetkovski
Contributor

fpetkovski commented Apr 7, 2023

> But we still have the per query limits, right?

You are right, the querier will apply the limits only when it acts as a store through gRPC, but not when it executes queries. For that we might need to add limiters to the store proxy.

Could extending the existing wrapper work for adding a bytes limit too?

@douglascamata
Contributor

> For that we might need to add limiters to the store proxy.

What if we wrap the store proxy with the instrumented store server wrapper? 🤔

> Could extending the existing wrapper work for adding a bytes limit too?

I think we could add this one to the instrumented store server and get it for free everywhere. 🙌
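
A rough sketch of the wrapping idea, assuming a simplified streaming interface: the `Store` interface, `memStore`, and `limitedStore` below are hypothetical stand-ins, not the actual Thanos store API or the instrumented store server. The wrapper intercepts each frame before it is forwarded and aborts the request once the running total passes the limit, so any store (including a proxy) gets the limit without changes of its own.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical stand-in for a streaming store API:
// the store calls send once per response frame.
type Store interface {
	Series(query string, send func(frame []byte) error) error
}

// memStore is a toy store that streams pre-built frames.
type memStore struct{ frames [][]byte }

func (s memStore) Series(_ string, send func([]byte) error) error {
	for _, f := range s.frames {
		if err := send(f); err != nil {
			return err // stop streaming as soon as the receiver fails
		}
	}
	return nil
}

var errBytesLimit = errors.New("bytes limit exceeded")

// limitedStore wraps any Store and counts bytes per request.
type limitedStore struct {
	next     Store
	maxBytes int
}

func (s limitedStore) Series(query string, send func([]byte) error) error {
	total := 0
	return s.next.Series(query, func(frame []byte) error {
		total += len(frame)
		if total > s.maxBytes {
			return fmt.Errorf("%w: %d > %d", errBytesLimit, total, s.maxBytes)
		}
		return send(frame) // under the limit: forward unchanged
	})
}

func main() {
	inner := memStore{frames: [][]byte{make([]byte, 40), make([]byte, 40), make([]byte, 40)}}
	wrapped := limitedStore{next: inner, maxBytes: 100}

	received := 0
	err := wrapped.Series("up", func(f []byte) error {
		received += len(f)
		return nil
	})
	fmt.Println("received bytes:", received, "error:", err)
}
```

Because the check runs inside the send callback, the over-limit frame is never forwarded, and the inner store stops streaming as soon as the error is returned.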

@yylbfyl

yylbfyl commented Aug 27, 2024

Hello everyone, I'm facing this problem now.
If a query sent to the Thanos Query component returns a large amount of data, the sudden influx can cause Thanos Query to hit an OOM (out of memory) error. It would therefore be helpful to introduce a parameter that limits the size of query responses.
Alternatively, is there a way to evaluate the maximum amount of data that can be queried from a Prometheus instance? That way, we could estimate in advance the maximum amount of data that all Prometheus instances and sidecars can return, and then size the memory of the Thanos Query nodes accordingly.

@yeya24
Contributor Author

yeya24 commented Aug 28, 2024

@yylbfyl Have you tried the per-store limits? If not, give them a try; they may already fix your issue.
Actual per-query limits need to be implemented separately. To reduce the risk of an OOM kill, you can also try setting GOMEMLIMIT in Querier.

@yylbfyl

yylbfyl commented Aug 29, 2024

Hi @yeya24, thanks for your advice. Since the Store component queries from S3 and is very slow, we have already removed it.
I have set GOMEMLIMIT on Query and hope it will improve things.

@yylbfyl

yylbfyl commented Sep 1, 2024

@yeya24 Additionally, I have a question: If the memory usage exceeds the GOMEMLIMIT, what would be the result? Will it cause the Thanos Query component to restart automatically?

@yeya24
Contributor Author

yeya24 commented Sep 3, 2024

@yylbfyl Assuming you are using Kubernetes, Thanos Query gets restarted only when its memory usage exceeds the pod memory limit (OOM kill). GOMEMLIMIT is a separate, soft limit used by the Go runtime. The runtime makes a best effort to stay under GOMEMLIMIT by adjusting GC behavior, but it cannot guarantee that.

You can find more details at https://pkg.go.dev/runtime
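
For illustration, the same soft limit can also be inspected or set from inside a Go program with `debug.SetMemoryLimit` (Go 1.19+), which is the programmatic counterpart of the GOMEMLIMIT environment variable. This is a generic Go sketch, not Thanos code; the helper names, the 4 GiB container limit, and the 90% headroom are assumptions chosen for the example, not recommendations from this thread.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// softLimitFor returns the given percentage of the container memory limit.
// Hypothetical helper: leaving headroom below the pod limit is an assumption.
func softLimitFor(containerBytes, percent int64) int64 {
	return containerBytes / 100 * percent
}

// applySoftLimit sets the Go runtime's soft memory limit and returns
// the limit that is in effect afterwards.
func applySoftLimit(limit int64) int64 {
	debug.SetMemoryLimit(limit)
	// A negative input does not change the limit; it only reports it.
	return debug.SetMemoryLimit(-1)
}

func main() {
	// Report the limit currently in effect (from GOMEMLIMIT, if set).
	fmt.Println("current soft limit (bytes):", debug.SetMemoryLimit(-1))

	const containerLimit = 4 << 30 // assumed 4 GiB pod memory limit
	limit := softLimitFor(containerLimit, 90)
	fmt.Println("new soft limit (bytes):", applySoftLimit(limit))
}
```

The limit is soft: the runtime collects garbage more aggressively as usage approaches it, but live heap above the limit is still allowed, so the pod can still be OOM-killed if a query genuinely needs more memory than the container has.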


5 participants