Limit data bytes fetched per query and per store #5750

yeya24 · 2022-10-02T18:10:05Z

Is your proposal related to a problem?

In a multi-tenant environment, Thanos Querier can easily get OOM killed without any limits or protection.

We have some limits in place now, such as series limits, but since series labels vary in size, a series count does not bound memory use well, so we probably need other ways of limiting.

Limiting the data bytes fetched, for the whole query or even at each store level, could be one approach to try.

Describe the solution you'd like

Cortex has a PR cortexproject/cortex#4854 which implements this feature.
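
To make the idea concrete, here is a minimal sketch of what a per-query bytes limiter could look like in Go. The names `BytesLimiter`, `NewBytesLimiter`, and `Reserve` are hypothetical and chosen for illustration; they are not taken from the Thanos or Cortex code, and the linked PR may work differently.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// BytesLimiter is a hypothetical limiter (not the Thanos or Cortex API).
// It tracks the bytes fetched for one query and reports an error once
// the configured budget is exceeded.
type BytesLimiter struct {
	limit    uint64        // 0 means "no limit"
	reserved atomic.Uint64 // bytes accounted for so far
}

// NewBytesLimiter returns a limiter with the given byte budget.
func NewBytesLimiter(limit uint64) *BytesLimiter {
	return &BytesLimiter{limit: limit}
}

// Reserve accounts for n more fetched bytes. It returns an error when the
// running total goes above the limit. It is safe for concurrent use, so
// several store fetches of the same query can share one limiter.
func (l *BytesLimiter) Reserve(n uint64) error {
	if l.limit == 0 {
		return nil
	}
	if total := l.reserved.Add(n); total > l.limit {
		return fmt.Errorf("bytes limit exceeded: fetched %d bytes, limit %d", total, l.limit)
	}
	return nil
}

func main() {
	limiter := NewBytesLimiter(1024)
	// Pretend each element is the size of one fetched chunk batch.
	for _, size := range []uint64{400, 400, 400} {
		if err := limiter.Reserve(size); err != nil {
			fmt.Println("query aborted:", err)
			return
		}
		fmt.Println("fetched", size, "bytes")
	}
}
```

Counting bytes rather than series makes the budget independent of label size, which is the gap described above.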

douglascamata · 2023-04-06T16:48:28Z

I think after the work done by @fpetkovski on #6074, this can be closed. WDYT, @yeya24?

yeya24 · 2023-04-06T18:46:18Z

I see, thanks @douglascamata. I think that covers the per store limits. But we still have the per query limits, right? Do we have a plan to support that?
Also, data bytes fetched is a different kind of limit from the limits on the number of series and chunks.

fpetkovski · 2023-04-07T05:59:58Z

> But we still have the per query limits, right?

You are right, the querier will apply the limits only when it acts as a store through gRPC, but not when it executes queries. For that we might need to add limiters to the store proxy.

Could extending the existing wrapper work for adding a bytes limit too?

douglascamata · 2023-04-14T20:42:56Z

> For that we might need to add limiters to the store proxy.

What if we wrap the store proxy with the instrumented store server wrapper? 🤔

> Could extending the existing wrapper work for adding a bytes limit too?

I think we could add this one to the instrumented store server and get it for free everywhere. 🙌
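
As a rough illustration of the wrapper idea, the sketch below decorates a response stream so that every `Send` is counted against a byte budget. Everything here is an assumption made for the example: `SeriesStream`, `WithBytesLimit`, and `ErrBytesLimit` are hypothetical stand-ins, not the real gRPC stream type or the instrumented store server in Thanos.

```go
package main

import (
	"errors"
	"fmt"
)

// SeriesStream is a hypothetical stand-in for a server stream that
// sends already-encoded series responses.
type SeriesStream interface {
	Send(frame []byte) error
}

// ErrBytesLimit is returned once the byte budget would be exceeded.
var ErrBytesLimit = errors.New("bytes limit exceeded")

// limitedStream wraps another stream and counts the bytes passing through.
// Like a single gRPC stream, it is meant to be used from one goroutine.
type limitedStream struct {
	next  SeriesStream
	limit int // 0 means "no limit"
	sent  int
}

// Send forwards the frame unless doing so would go over the limit.
func (s *limitedStream) Send(frame []byte) error {
	if s.limit > 0 && s.sent+len(frame) > s.limit {
		return fmt.Errorf("%w: %d bytes sent, next frame is %d bytes, limit %d",
			ErrBytesLimit, s.sent, len(frame), s.limit)
	}
	s.sent += len(frame)
	return s.next.Send(frame)
}

// WithBytesLimit decorates any stream with a byte budget, so the same
// wrapper can be placed in front of every store implementation.
func WithBytesLimit(next SeriesStream, limit int) SeriesStream {
	return &limitedStream{next: next, limit: limit}
}

// collectStream is an in-memory stream used only for this example.
type collectStream struct{ frames [][]byte }

func (c *collectStream) Send(frame []byte) error {
	c.frames = append(c.frames, frame)
	return nil
}

func main() {
	sink := &collectStream{}
	stream := WithBytesLimit(sink, 8)
	for _, frame := range [][]byte{[]byte("abcd"), []byte("efgh"), []byte("i")} {
		if err := stream.Send(frame); err != nil {
			fmt.Println("stream stopped:", err)
			break
		}
	}
	fmt.Println("frames forwarded:", len(sink.frames))
}
```

Because the limit lives in a decorator, wrapping the store proxy with it would apply the same check on the query path without touching each store.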

yylbfyl · 2024-08-27T09:02:56Z

Hello everyone, I'm facing this problem now.
If a query sent to the Thanos Query component returns a large amount of data, the sudden influx can cause the component to be OOM (out of memory) killed. It would therefore help to have a parameter that limits the size of query responses.
Alternatively, is there a way to estimate the maximum amount of data that can be queried from a Prometheus instance? That way we could estimate in advance the maximum amount of data that all Prometheus instances and sidecars can return, and then size the memory of the Thanos Query nodes accordingly.

yeya24 · 2024-08-28T03:51:12Z

@yylbfyl Have you tried the per store limits? If not, give them a try; they may already fix your issue.
Actual query limits need to be implemented separately. To help prevent OOM kills, you can also try setting GOMEMLIMIT on the Querier.

yylbfyl · 2024-08-29T08:44:45Z

Hi @yeya24, thanks for your advice. Because the store component queries from S3 and is very slow, we have already removed it.
I have set GOMEMLIMIT on the query component and hope it will help.

yylbfyl · 2024-09-01T01:57:49Z

@yeya24 Additionally, I have a question: If the memory usage exceeds the GOMEMLIMIT, what would be the result? Will it cause the Thanos Query component to restart automatically?

yeya24 · 2024-09-03T01:32:02Z

@yylbfyl Assuming you are using Kubernetes, Thanos Query gets restarted only when memory usage exceeds the pod memory limit (OOM kill). GOMEMLIMIT is a separate limit used by the Go runtime. The Go runtime makes a best effort to stay under GOMEMLIMIT by adjusting GC behavior, but it cannot guarantee that.

You can find more details at https://pkg.go.dev/runtime
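
For anyone who wants to check what limit the runtime actually sees, here is a small sketch using `debug.SetMemoryLimit` from the standard library (Go 1.19+). The helper names `currentMemoryLimit` and `setSoftLimit` and the 512 MiB value are made up for the example; in a real deployment you would normally just set the `GOMEMLIMIT` environment variable, typically somewhat below the pod memory limit.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
)

// currentMemoryLimit returns the soft memory limit the Go runtime is using.
// A negative argument to SetMemoryLimit only queries the value and leaves
// it unchanged.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

// setSoftLimit sets the soft memory limit in bytes and returns the previous
// value. This has the same effect as starting the process with GOMEMLIMIT.
func setSoftLimit(bytes int64) int64 {
	return debug.SetMemoryLimit(bytes)
}

func main() {
	if limit := currentMemoryLimit(); limit == math.MaxInt64 {
		fmt.Println("no soft memory limit configured")
	} else {
		fmt.Printf("soft memory limit: %d bytes\n", limit)
	}

	// Example value only: 512 MiB.
	setSoftLimit(512 << 20)
	fmt.Printf("soft memory limit is now %d bytes\n", currentMemoryLimit())

	// The limit is soft: the GC works harder as the heap approaches it,
	// but the runtime does not stop the program if live memory goes above it.
}
```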

yeya24 added the component: query label Oct 2, 2022

matej-g added the feature request/improvement label Oct 4, 2022
