This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

Feedback Wanted: Prometheus benchmarking feedback #391

Closed
cevian opened this issue Dec 18, 2020 · 4 comments

Comments

@cevian
Contributor

cevian commented Dec 18, 2020

This is meant as a thread to share ideas on the best way to create an open, reproducible, and fair benchmarking suite for the various long-term stores in the Prometheus ecosystem.

Especially helpful would be pointers to open datasets with real-world Prometheus data.

@cevian
Contributor Author

cevian commented Dec 18, 2020

We do have a loader for Prometheus data in tsbs (https://github.com/timescale/tsbs/tree/master/cmd/tsbs_load_prometheus), but we aren't quite happy with it for the following reasons:

  • It doesn't read directly from Prometheus TSDB blocks yet, which we'd prefer (see the sketch after this list).
  • We want to make the loader behave more like Prometheus's remote_write queue management.
  • We'd really like a real-world dataset instead of the synthetic one we use now.
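
For what it's worth, here is a minimal sketch of what reading series straight out of an on-disk Prometheus TSDB directory could look like. It assumes the Prometheus Go module at roughly v2.23 (current as of this thread); the OpenDBReadOnly, Querier, Select, and Iterator signatures have changed in later releases, and the data directory path is a placeholder.

```go
// Minimal sketch: iterate every sample in an on-disk Prometheus TSDB directory.
package main

import (
	"context"
	"fmt"
	"math"

	"github.com/go-kit/kit/log"
	"github.com/prometheus/prometheus/pkg/labels"
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	// Open the data directory read-only, the same way `promtool tsdb dump` does.
	db, err := tsdb.OpenDBReadOnly("/path/to/prometheus/data", log.NewNopLogger())
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Query the full time range; a real loader would chunk this into batches.
	q, err := db.Querier(context.Background(), math.MinInt64, math.MaxInt64)
	if err != nil {
		panic(err)
	}
	defer q.Close()

	// Select every series that has a metric name.
	set := q.Select(false, nil, labels.MustNewMatcher(labels.MatchRegexp, "__name__", ".+"))
	for set.Next() {
		series := set.At()
		it := series.Iterator()
		for it.Next() {
			ts, val := it.At()
			fmt.Println(series.Labels(), ts, val)
		}
	}
	if err := set.Err(); err != nil {
		panic(err)
	}
}
```

A tsbs loader would presumably feed the iterated samples into its batching pipeline rather than printing them, but the reading side would look roughly like this.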

@RichiH

RichiH commented Dec 18, 2020

https://docs.google.com/document/d/12DcuwrHnR-gVbWQq2Qq5UI2ZIq5CqS7W3OYALU6Bke8 might be relevant to your interests.

@valyala

valyala commented Dec 19, 2020

I'd propose creating a dataset from real time series collected from the most widely used exporters (such as node_exporter) running under production load. This data could then be ingested into various Prometheus-compatible remote storage systems at a fixed rate while measuring resource usage during the benchmark (CPU, RAM, disk IO, disk space, network bandwidth). Something like in the following benchmarks:

Then queries from the corresponding Grafana dashboards could be extracted and issued to the tested systems in order to measure resource usage during query processing.
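
As a rough illustration of the query side, one could replay the extracted queries against the standard Prometheus HTTP API (/api/v1/query_range), which the stores under test also expose, and record the latency. The endpoint address, PromQL expression, and time range below are placeholders, not taken from any particular dashboard:

```go
// Sketch: time a single PromQL range query against a Prometheus-compatible HTTP API.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	base := "http://localhost:9090" // placeholder: address of the store under test

	params := url.Values{}
	params.Set("query", `rate(node_cpu_seconds_total{mode!="idle"}[5m])`)
	params.Set("start", "2021-01-01T00:00:00Z")
	params.Set("end", "2021-01-05T00:00:00Z")
	params.Set("step", "60s")

	started := time.Now()
	resp, err := http.Get(base + "/api/v1/query_range?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Drain the body so the measured latency includes the response transfer.
	n, _ := io.Copy(io.Discard, resp.Body)
	fmt.Printf("status=%s bytes=%d latency=%s\n", resp.Status, n, time.Since(started))
}
```

A benchmark would run many such queries concurrently and report latency percentiles alongside the server-side resource usage.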

@nethsix

nethsix commented Jan 22, 2021

If possible, the benchmarks should include query performance, e.g., how long (in seconds) does it take to query 1,000 time series matching 2 labels over a time range such as 2021-01-01 00:00:00 to 2021-01-05 00:00:00?

This helps a DevOps person decide whether Promscale is a suitable solution for their metrics collection/visualization needs.

To rephrase:

Given that we can calculate our metrics collection requirements, i.e., my system sends X unique time series (each with an average of Y labels) at a scrape interval of Z seconds, what would the query performance be like? Would using a bigger machine improve query performance?

To give a concrete example: the article quoted by @valyala (https://valyala.medium.com/promscale-vs-victoriametrics-resource-usage-on-production-workload-91c8e3786c03) states an ingestion rate of 40,000 samples/second. That translates to roughly 3.5B samples per day, which means that with a 14-day retention Promscale will end up storing roughly 48B data points (assuming one sample per series per second, i.e., 40,000 active time series). Would Promscale or @VictoriaMetrics be able to return queries fast enough to be usable?
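
For reference, a quick sanity check of those figures (the 40,000 samples/second and 14-day retention are taken from the article above; everything else is plain arithmetic):

```go
// Sanity check: samples per day and total samples held at full retention.
package main

import "fmt"

func main() {
	const (
		samplesPerSecond = 40_000
		retentionDays    = 14
	)
	perDay := int64(samplesPerSecond) * 86_400 // seconds per day
	total := perDay * retentionDays            // samples held at full retention
	fmt.Printf("per day: %.2fB, retained: %.2fB\n",
		float64(perDay)/1e9, float64(total)/1e9)
	// Output: per day: 3.46B, retained: 48.38B
}
```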
