Feedback Wanted: Prometheus benchmarking feedback #391
We do have a loader for Prometheus data in TSBS (https://github.com/timescale/tsbs/tree/master/cmd/tsbs_load_prometheus), but we aren't quite happy with it for the following reasons:
https://docs.google.com/document/d/12DcuwrHnR-gVbWQq2Qq5UI2ZIq5CqS7W3OYALU6Bke8 might be relevant to your interests.
I'd propose creating a dataset from real time series data collected from the most widely used exporters, such as node_exporter, running under production load. This data could then be ingested into various Prometheus-compatible remote storage systems at a fixed rate while measuring resource usage during the benchmark (CPU, RAM, disk I/O, disk space, network bandwidth). Something like the following benchmarks:
Then queries from the corresponding Grafana dashboards could be extracted and issued to the tested systems in order to measure resource usage during query processing.
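A minimal sketch of the two measurement loops proposed above, assuming the samples and the dashboard queries have already been extracted. `paced_ingest` replays pre-built batches at a fixed average sample rate through a caller-supplied `send` callback (hypothetical; remote-write encoding and transport are not shown). `time_queries` issues each PromQL expression to the Prometheus HTTP API range-query endpoint (`/api/v1/query_range`) and records wall-clock latency. Resource-usage collection (CPU, RAM, disk, network) is out of scope here. All function names are illustrative, not part of TSBS or any existing tool.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List, Sequence, Tuple


# --- Fixed-rate ingestion ---------------------------------------------------

def paced_ingest(
    batches: Iterable[Sequence[tuple]],
    samples_per_second: float,
    send: Callable[[Sequence[tuple]], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hand batches to `send` at a fixed average sample rate.

    Each batch is released only after enough time has elapsed to cover all
    previously sent samples at `samples_per_second`. Returns the total number
    of samples sent. `send` is a hypothetical caller-supplied callback.
    """
    if samples_per_second <= 0:
        raise ValueError("samples_per_second must be positive")
    start = clock()
    sent = 0
    for batch in batches:
        due = start + sent / samples_per_second  # earliest allowed send time
        delay = due - clock()
        if delay > 0:
            sleep(delay)
        send(batch)
        sent += len(batch)
    return sent


# --- Query latency ------------------------------------------------------------

def query_range_url(base_url: str, promql: str, start: float, end: float, step: str) -> str:
    """Build a GET URL for the Prometheus HTTP API range-query endpoint."""
    params = urllib.parse.urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def time_queries(
    base_url: str,
    queries: Iterable[str],
    start: float,
    end: float,
    step: str,
    fetch: Callable[[str], bytes] = http_get,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[str, float]]:
    """Run each PromQL range query once; return (query, latency_seconds) pairs."""
    results = []
    for promql in queries:
        url = query_range_url(base_url, promql, start, end, step)
        t0 = clock()
        fetch(url)
        results.append((promql, clock() - t0))
    return results
```

For example, `time_queries("http://localhost:9090", dashboard_queries, 1609459200, 1609804800, "60s")` would time each query over 2021-01-01 to 2021-01-05 UTC, where `dashboard_queries` is a hypothetical list of expressions pulled from Grafana dashboard JSON. The injectable `clock`, `sleep`, and `fetch` parameters keep the pacing and timing logic testable without a live server; the endpoint path prefix may differ between storage systems.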
If possible, the benchmarks should include query performance, e.g., how long, in seconds, does a query take that returns 1,000 time series matching 2 labels over a 6-hour time range with start/end time of 2021-01-01 00:00:00 to 2021-01-05 00:00:00? This would help a DevOps engineer decide whether Promscale is a suitable solution for their metrics collection and visualization. To rephrase: given that we can calculate our metrics collection requirements, i.e., my system sends X unique time series (each with an average of Y labels) at a scrape interval of Z seconds, what would query performance look like? Would using a bigger machine improve it? To quote a concrete example, this article (https://valyala.medium.com/promscale-vs-victoriametrics-resource-usage-on-production-workload-91c8e3786c03), linked by @valyala, states that the ingestion rate is 40,000 samples/second. That works out to roughly 3.5B samples per day (40,000 × 86,400 s), which means that with 14-day retention, Promscale would end up storing about 48B data points for 40,000 time series (assuming each series contributes one sample per second). Would Promscale or @VictoriaMetrics be able to return queries fast enough to be usable?
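The capacity arithmetic in this comment can be made explicit. A minimal sketch, assuming every series is scraped exactly once per interval with no gaps or staleness; the average label count (Y) affects index size rather than the number of samples, so it is not modeled. `sample_volume` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 86_400


def sample_volume(num_series: int, scrape_interval_s: float, retention_days: float) -> dict:
    """Estimate ingestion rate and retained sample count for a scrape workload."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape_interval_s must be positive")
    # Each series yields one sample per scrape interval.
    samples_per_second = num_series / scrape_interval_s
    samples_per_day = samples_per_second * SECONDS_PER_DAY
    return {
        "samples_per_second": samples_per_second,
        "samples_per_day": samples_per_day,
        "retained_samples": samples_per_day * retention_days,
    }
```

With 40,000 series scraped every second and 14-day retention, this gives 3,456,000,000 samples per day and 48,384,000,000 retained samples. Note that the same 40,000 samples/s could also come from 400,000 series scraped every 10 s, so the ingestion rate alone does not determine the number of series.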
This is meant as a thread to share ideas on the best way to create an open, reproducible, and fair benchmarking suite for the various long-term storage systems in the Prometheus ecosystem.
Especially helpful would be pointers to open datasets with real-world Prometheus data.