This repository has been archived by the owner on Apr 2, 2024. It is now read-only.

Feedback Wanted: Prometheus benchmarking feedback #391

Closed
cevian opened this issue Dec 18, 2020 · 4 comments

Comments

@cevian
Contributor

cevian commented Dec 18, 2020

This is meant as a thread to share ideas on the best way to create an open, reproducible, and fair benchmarking suite for the various long-term stores in the Prometheus ecosystem.

Especially helpful would be pointers to open datasets with real-world Prometheus data.

@cevian
Contributor Author

cevian commented Dec 18, 2020

We do have a loader for Prometheus data in tsbs (https://github.com/timescale/tsbs/tree/master/cmd/tsbs_load_prometheus), but we aren't quite happy with it for the following reasons:

  • It doesn't read directly from Prometheus TSDB blocks yet, which we'd prefer (see the sketch after this list).
  • We want to make the loader behave more like Prometheus's remote_write queue management.
  • We'd really like a real-world dataset instead of the synthetic one we use now.
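
For what it's worth, here is a minimal sketch of what reading series straight out of an on-disk Prometheus TSDB directory could look like. It assumes the Prometheus Go module at roughly v2.23 (current as of this thread); the OpenDBReadOnly, Querier, Select, and Iterator signatures have changed in later releases, and the data directory path is a placeholder.

```go
// Minimal sketch: iterate every sample in an on-disk Prometheus TSDB directory.
package main

import (
	"context"
	"fmt"
	"math"

	"github.com/go-kit/kit/log"
	"github.com/prometheus/prometheus/pkg/labels"
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	// Open the data directory read-only, the same way `promtool tsdb dump` does.
	db, err := tsdb.OpenDBReadOnly("/path/to/prometheus/data", log.NewNopLogger())
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Query the full time range; a real loader would chunk this into batches.
	q, err := db.Querier(context.Background(), math.MinInt64, math.MaxInt64)
	if err != nil {
		panic(err)
	}
	defer q.Close()

	// Select every series that has a metric name.
	set := q.Select(false, nil, labels.MustNewMatcher(labels.MatchRegexp, "__name__", ".+"))
	for set.Next() {
		series := set.At()
		it := series.Iterator()
		for it.Next() {
			ts, val := it.At()
			fmt.Println(series.Labels(), ts, val)
		}
	}
	if err := set.Err(); err != nil {
		panic(err)
	}
}
```

A tsbs loader would presumably feed the iterated samples into its batching pipeline rather than printing them, but the reading side would look roughly like this.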

@RichiH

RichiH commented Dec 18, 2020

https://docs.google.com/document/d/12DcuwrHnR-gVbWQq2Qq5UI2ZIq5CqS7W3OYALU6Bke8 might be relevant to your interests.

@valyala

valyala commented Dec 19, 2020

I'd propose creating a dataset from real time series collected from the most widely used exporters (such as node_exporter) running under production load. This data could then be ingested into various Prometheus-compatible remote storage systems at a fixed rate while measuring resource usage during the benchmark (CPU, RAM, disk IO, disk space, network bandwidth). Something like in the following benchmarks:

Then queries from the corresponding Grafana dashboards could be extracted and issued to the tested systems in order to measure resource usage during query processing.
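
As a rough illustration of the query side, one could replay the extracted queries against the standard Prometheus HTTP API (/api/v1/query_range), which the stores under test also expose, and record the latency. The endpoint address, PromQL expression, and time range below are placeholders, not taken from any particular dashboard:

```go
// Sketch: time a single PromQL range query against a Prometheus-compatible HTTP API.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func main() {
	base := "http://localhost:9090" // placeholder: address of the store under test

	params := url.Values{}
	params.Set("query", `rate(node_cpu_seconds_total{mode!="idle"}[5m])`)
	params.Set("start", "2021-01-01T00:00:00Z")
	params.Set("end", "2021-01-05T00:00:00Z")
	params.Set("step", "60s")

	started := time.Now()
	resp, err := http.Get(base + "/api/v1/query_range?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Drain the body so the measured latency includes the response transfer.
	n, _ := io.Copy(io.Discard, resp.Body)
	fmt.Printf("status=%s bytes=%d latency=%s\n", resp.Status, n, time.Since(started))
}
```

A benchmark would run many such queries concurrently and report latency percentiles alongside the server-side resource usage.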

@nethsix

nethsix commented Jan 22, 2021

If possible, the benchmarks should include query performance, e.g., how long (in seconds) does it take to query 1,000 time series matching 2 labels over a time range such as 2021-01-01 00:00:00 to 2021-01-05 00:00:00?

This helps a DevOps person decide whether Promscale is a suitable solution for their metrics collection/visualization needs.

To rephrase:

Given that we can calculate our metrics collection requirements, i.e., my system sends X unique time series (each with an average of Y labels) at a scrape interval of Z seconds, what would the query performance be like? Would using a bigger machine improve query performance?

To give a concrete example: the article quoted by @valyala (https://valyala.medium.com/promscale-vs-victoriametrics-resource-usage-on-production-workload-91c8e3786c03) states an ingestion rate of 40,000 samples/second. That translates to roughly 3.5B samples per day, which means that with a 14-day retention Promscale will end up storing roughly 48B data points (assuming one sample per series per second, i.e., 40,000 active time series). Would Promscale or @VictoriaMetrics be able to return queries fast enough to be usable?
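
For reference, a quick sanity check of those figures (the 40,000 samples/second and 14-day retention are taken from the article above; everything else is plain arithmetic):

```go
// Sanity check: samples per day and total samples held at full retention.
package main

import "fmt"

func main() {
	const (
		samplesPerSecond = 40_000
		retentionDays    = 14
	)
	perDay := int64(samplesPerSecond) * 86_400 // seconds per day
	total := perDay * retentionDays            // samples held at full retention
	fmt.Printf("per day: %.2fB, retained: %.2fB\n",
		float64(perDay)/1e9, float64(total)/1e9)
	// Output: per day: 3.46B, retained: 48.38B
}
```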
