diff --git a/README.md b/README.md index d27b7c981..5824b4491 100644 --- a/README.md +++ b/README.md @@ -129,7 +129,7 @@ Variables needed: 1. the same use case, seed, # of devices, and start time as used in data generation 1. an end time that is one second after the end time from data generation. E.g., for `2016-01-04T00:00:00Z` use `2016-01-04T00:00:01Z` 1. the number of queries to generate. E.g., `1000` -1. and the type of query you'd like to generate. E.g., `1-host-1-hr` +1. and the type of query you'd like to generate. E.g., `single-groupby(1,1,1)` For the last step there are numerous queries to choose from, which are listed in [Appendix I](#appendix-i-query-types). Additionally, the file @@ -143,8 +143,8 @@ For generating just one set of queries for a given type: $ tsbs_generate_queries -seed=123 -scale-var=4000 \ -timestamp-start="2016-01-01T00:00:00Z" \ -timestamp-end="2016-01-04T00:00:01Z" \ - -queries=1000 -query-type="1-host-1-hr" -format="timescaledb" \ - | gzip > /tmp/timescaledb-queries-1-host-1-hr.gz + -queries=1000 -query-type="single-groupby(1,1,1)" -format="timescaledb" \ + | gzip > /tmp/timescaledb-queries-single-groupby(1,1,1).gz ``` For generating sets of queries for multiple types: @@ -152,7 +152,7 @@ For generating sets of queries for multiple types: $ formats="timescaledb" scaleVar=4000 seed=123 \ tsStart="2016-01-01T00:00:00Z" \ tsEnd="2016-01-04T00:00:01Z" \ - queries=1000 queryTypes="1-host-1-hr 1-host-12-hr groupby" \ + queries=1000 queryTypes="single-groupby(1,1,1) single-groupby(1,1,12) double-groupby(1)" \ dataDir="/tmp" script/generate_queries.sh ``` @@ -257,10 +257,10 @@ to run multiple query types in a row. The queries it generates should be put in a file with one query per line and the path given to the script. For example, if you had a file named `queries.txt` that looked like this: ```text -high-cpu-and-field -cpu-max-all-eight-hosts +high-cpu(1) +cpu-max-all(8) groupby-orderby-limit -groupby +double-groupby(1) ``` You could generate a run script named `query_test.sh`: @@ -275,13 +275,13 @@ And the resulting script file would look like: ```bash #!/bin/bash # Queries -cat /tmp/queries/timescaledb-high-cpu-and-field-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-high-cpu-and-field-queries.out +cat /tmp/queries/timescaledb-high-cpu(1)-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-high-cpu(1)-queries.out -cat /tmp/queries/timescaledb-cpu-max-all-eight-hosts-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-cpu-max-all-eight-hosts-queries.out +cat /tmp/queries/timescaledb-cpu-max-all(8)-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-cpu-max-all(8)-queries.out -cat /tmp/queries/timescaledb-groupby-orderby-limit-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-groupby-orderby-limit-queries.out +cat /tmp/queries/timescaledb-groupby-orderby-limit-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" 
--postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-groupby-orderby-limit-queries.out -cat /tmp/queries/timescaledb-groupby-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-groupby-queries.out +cat /tmp/queries/timescaledb-double-groupby(1)-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-double-groupby(1)-queries.out ``` ### Query validation (optional) @@ -296,18 +296,18 @@ the results. ### Devops / cpu-only |Query type|Description| |:---|:---| -|1-host-1-hr| Simple aggregrate (MAX) on one metric for 1 host, every 5 mins for 1 hour -|1-host-12-hr| Simple aggregrate (MAX) on one metric for 1 host, every 5 mins for 12 hours -|8-host-1-hr| Simple aggregrate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour -|5-metrics-1-host-1-hr| Simple aggregrate (MAX) on 5 metrics for 1 host, every 5 mins for 1 hour -|5-metrics-1-host-12-hr| Simple aggregrate (MAX) on 5 metrics for 1 host, every 5 mins for 12 hours -|5-metrics-8-host-1-hr| Simple aggregrate (MAX) on 5 metrics for 8 hosts, every 5 mins for 1 hour -|cpu-max-all-single-host| Aggregate across all CPU metrics per hour over 1 hour for a single host -|cpu-max-all-eight-hosts| Aggregate across all CPU metrics per hour over 1 hour for eight hosts -|groupby| Aggregate on across both time and host, giving the average of 1 CPU metric per host per hour for 24 hours -|groupby-5| Aggregate on across both time and host, giving the average of 5 CPU metrics per host per hour for 24 hours -|groupby-all| Aggregate on across both time and host, giving the average of all (10) CPU metrics per host per hour for 24 hours -|high-cpu-all-hosts| All the readings where one metric is above a threshold across all hosts -|high-cpu-1-host| All the readings where one metric is above a threshold for a particular host +|single-groupby(1,1,1)| Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 1 hour +|single-groupby(1,1,12)| Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 12 hours +|single-groupby(1,8,1)| Simple aggregate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour +|single-groupby(5,1,1)| Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 1 hour +|single-groupby(5,1,12)| Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 12 hours +|single-groupby(5,8,1)| Simple aggregate (MAX) on 5 metrics for 8 hosts, every 5 mins for 1 hour +|cpu-max-all(1)| Aggregate across all CPU metrics per hour over 1 hour for a single host +|cpu-max-all(8)| Aggregate across all CPU metrics per hour over 1 hour for eight hosts +|double-groupby(1)| Aggregate across both time and host, giving the average of 1 CPU metric per host per hour for 24 hours +|double-groupby(5)| Aggregate across both time and host, giving the average of 5 CPU metrics per host per hour for 24 hours +|double-groupby(all)| Aggregate across both time and host, giving the average of all (10) CPU metrics per host per hour for 24 hours +|high-cpu(all)| All the readings where one metric is above a threshold across all hosts +|high-cpu(1)| All the readings where one metric is above a threshold for a particular host |lastpoint| The last reading for each host |groupby-orderby-limit| The last 5 aggregate readings (across time) before a randomly chosen endpoint diff --git
a/cmd/tsbs_generate_queries/cassandra_devops_common.go b/cmd/tsbs_generate_queries/cassandra_devops_common.go index ea4ceaa9f..b50037ced 100644 --- a/cmd/tsbs_generate_queries/cassandra_devops_common.go +++ b/cmd/tsbs_generate_queries/cassandra_devops_common.go @@ -76,7 +76,7 @@ func (d *CassandraDevops) GroupByTime(qi query.Query, nHosts, numMetrics int, ti func (d *CassandraDevops) GroupByOrderByLimit(qi query.Query) { interval := d.interval.RandWindow(time.Hour) - humanLabel := "Cassandra max cpu over last 5 min-intervals (rand end)" + humanLabel := "Cassandra max cpu over last 5 min-intervals (random end)" q := qi.(*query.Cassandra) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, d.interval.StartString())) @@ -103,7 +103,7 @@ func (d *CassandraDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics in interval := d.interval.RandWindow(doubleGroupByDuration) metrics := getCPUMetricsSlice(numMetrics) - humanLabel := fmt.Sprintf("Cassandra mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics) + humanLabel := getDoubleGroupByLabel("Cassandra", numMetrics) q := qi.(*query.Cassandra) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString())) @@ -124,14 +124,14 @@ func (d *CassandraDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics in // FROM cpu WHERE (hostname = '$HOSTNAME_1' OR ... OR hostname = '$HOSTNAME_N') // AND time >= '$HOUR_START' AND time < '$HOUR_END' // GROUP BY hour ORDER BY hour -func (d *CassandraDevops) MaxAllCPU(qi query.Query, nhosts int) { - interval := d.interval.RandWindow(8 * time.Hour) - tagSet := d.getHostWhere(nhosts) +func (d *CassandraDevops) MaxAllCPU(qi query.Query, nHosts int) { + interval := d.interval.RandWindow(maxAllDuration) + tagSet := d.getHostWhere(nHosts) tagSets := [][]string{} tagSets = append(tagSets, tagSet) - humanLabel := fmt.Sprintf("Cassandra max cpu all fields, rand %4d hosts, rand 12hr by 1h", nhosts) + humanLabel := getMaxAllLabel("Cassandra", nHosts) q := qi.(*query.Cassandra) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString())) diff --git a/cmd/tsbs_generate_queries/devops.go b/cmd/tsbs_generate_queries/devops.go index f5bd694d0..516b8861f 100644 --- a/cmd/tsbs_generate_queries/devops.go +++ b/cmd/tsbs_generate_queries/devops.go @@ -14,6 +14,14 @@ const ( allHosts = "all hosts" doubleGroupByDuration = 24 * time.Hour highCPUDuration = 24 * time.Hour + maxAllDuration = 8 * time.Hour + + labelSingleGroupby = "single-groupby" + labelDoubleGroupby = "double-groupby" + labelLastpoint = "lastpoint" + labelMaxAll = "cpu-max-all" + labelGroupbyOrderbyLimit = "groupby-orderby-limit" + labelHighCPU = "high-cpu" ) type devopsCore struct { @@ -61,17 +69,6 @@ func getCPUMetricsSlice(numMetrics int) []string { return cpuMetrics[:numMetrics] } -// Devops describes a devops query generator. 
-type Devops interface { - CPU5Metrics(query.Query, int, int, time.Duration) - GroupByOrderByLimit(query.Query) - HighCPUForHosts(query.Query, int, int) - LastPointPerHost(query.Query) - MaxAllCPU(query.Query, int, int) - MaxCPUUsageHourByMinute(query.Query, int, int, time.Duration) - MeanCPUMetricsDayByHourAllHostsGroupbyHost(query.Query, int) -} - // SingleGroupbyFiller is a type that can fill in a single groupby query type SingleGroupbyFiller interface { GroupByTime(query.Query, int, int, time.Duration) @@ -102,6 +99,10 @@ type HighCPUFiller interface { HighCPUForHosts(query.Query, int) } +func getDoubleGroupByLabel(dbName string, numMetrics int) string { + return fmt.Sprintf("%s mean of %d metrics, all hosts, random %s by 1hr", dbName, numMetrics, doubleGroupByDuration) +} + func getHighCPULabel(dbName string, nHosts int) string { label := dbName + " CPU over threshold, " if nHosts > 0 { @@ -112,12 +113,16 @@ func getHighCPULabel(dbName string, nHosts int) string { return label } -func getRandomHosts(scale, nhosts int) []string { - if nhosts > scale { - log.Fatal("nhosts > scaleVar") +func getMaxAllLabel(dbName string, nHosts int) string { + return fmt.Sprintf("%s max of all CPU fields, random %4d hosts, random %s by 1h", dbName, nHosts, maxAllDuration) +} + +func getRandomHosts(scale, nHosts int) []string { + if nHosts > scale { + log.Fatalf("number of hosts (%d) larger than --scale-var (%d)", nHosts, scale) } - nn := rand.Perm(scale)[:nhosts] + nn := rand.Perm(scale)[:nHosts] hostnames := []string{} for _, n := range nn { diff --git a/cmd/tsbs_generate_queries/influx_devops_common.go b/cmd/tsbs_generate_queries/influx_devops_common.go index dc4485077..de3c118a1 100644 --- a/cmd/tsbs_generate_queries/influx_devops_common.go +++ b/cmd/tsbs_generate_queries/influx_devops_common.go @@ -88,7 +88,7 @@ func (d *InfluxDevops) GroupByOrderByLimit(qi query.Query) { v := url.Values{} v.Set("q", fmt.Sprintf(`SELECT max(usage_user) from cpu %s group by time(1m) limit 5`, where)) - humanLabel := "Influx max cpu over last 5 min-intervals (rand end)" + humanLabel := "Influx max cpu over last 5 min-intervals (random end)" q := qi.(*query.HTTP) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString())) @@ -116,7 +116,7 @@ func (d *InfluxDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics int) v := url.Values{} v.Set("q", fmt.Sprintf("SELECT %s from cpu where time >= '%s' and time < '%s' group by time(1h),hostname", strings.Join(selectClauses, ", "), interval.StartString(), interval.EndString())) - humanLabel := fmt.Sprintf("Influx mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics) + humanLabel := getDoubleGroupByLabel("Influx", numMetrics) q := qi.(*query.HTTP) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString())) @@ -132,14 +132,14 @@ func (d *InfluxDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics int) // FROM cpu WHERE (hostname = '$HOSTNAME_1' OR ... 
OR hostname = '$HOSTNAME_N') // AND time >= '$HOUR_START' AND time < '$HOUR_END' // GROUP BY hour ORDER BY hour -func (d *InfluxDevops) MaxAllCPU(qi query.Query, nhosts int) { - interval := d.interval.RandWindow(8 * time.Hour) - whereHosts := d.getHostWhereString(nhosts) +func (d *InfluxDevops) MaxAllCPU(qi query.Query, nHosts int) { + interval := d.interval.RandWindow(maxAllDuration) + whereHosts := d.getHostWhereString(nHosts) v := url.Values{} v.Set("q", fmt.Sprintf("SELECT max(usage_user),max(usage_system),max(usage_idle),max(usage_nice),max(usage_iowait),max(usage_irq),max(usage_softirq),max(usage_steal),max(usage_guest),max(usage_guest_nice) from cpu where %s and time >= '%s' and time < '%s' group by time(1m)", whereHosts, interval.StartString(), interval.EndString())) - humanLabel := fmt.Sprintf("Influx max cpu all fields, rand %4d hosts, rand 12hr by 1m", nhosts) + humanLabel := getMaxAllLabel("Influx", nHosts) q := qi.(*query.HTTP) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString())) diff --git a/cmd/tsbs_generate_queries/main.go b/cmd/tsbs_generate_queries/main.go index 12bb24055..3b61655e0 100644 --- a/cmd/tsbs_generate_queries/main.go +++ b/cmd/tsbs_generate_queries/main.go @@ -16,21 +16,21 @@ import ( var useCaseMatrix = map[string]map[string]QueryFillerMaker{ "devops": { - "1-host-1-hr": NewDevopsSingleGroupby(1, 1, 1), - "1-host-12-hr": NewDevopsSingleGroupby(1, 1, 12), - "8-host-1-hr": NewDevopsSingleGroupby(1, 8, 1), - "5-metrics-1-host-1-hr": NewDevopsSingleGroupby(5, 1, 1), - "5-metrics-1-host-12-hr": NewDevopsSingleGroupby(5, 1, 12), - "5-metrics-8-host-1-hr": NewDevopsSingleGroupby(5, 8, 1), - "cpu-max-all-single-host": NewDevopsMaxAllCPU(1), - "cpu-max-all-eight-hosts": NewDevopsMaxAllCPU(1), - "groupby": NewDevopsGroupBy(1), - "groupby-5": NewDevopsGroupBy(5), - "groupby-all": NewDevopsGroupBy(len(cpuMetrics)), - "groupby-orderby-limit": NewDevopsGroupByOrderByLimit, - "high-cpu-all-hosts": NewDevopsHighCPU(0), - "high-cpu-1-host": NewDevopsHighCPU(1), - "lastpoint": NewDevopsLastPointPerHost, + labelSingleGroupby + "(1,1,1)": NewDevopsSingleGroupby(1, 1, 1), + labelSingleGroupby + "(1,1,12)": NewDevopsSingleGroupby(1, 1, 12), + labelSingleGroupby + "(1,8,1)": NewDevopsSingleGroupby(1, 8, 1), + labelSingleGroupby + "(5,1,1)": NewDevopsSingleGroupby(5, 1, 1), + labelSingleGroupby + "(5,1,12)": NewDevopsSingleGroupby(5, 1, 12), + labelSingleGroupby + "(5,8,1)": NewDevopsSingleGroupby(5, 8, 1), + labelMaxAll + "(1)": NewDevopsMaxAllCPU(1), + labelMaxAll + "(8)": NewDevopsMaxAllCPU(8), + labelDoubleGroupby + "(1)": NewDevopsGroupBy(1), + labelDoubleGroupby + "(5)": NewDevopsGroupBy(5), + labelDoubleGroupby + "(all)": NewDevopsGroupBy(len(cpuMetrics)), + labelGroupbyOrderbyLimit: NewDevopsGroupByOrderByLimit, + labelHighCPU + "(all)": NewDevopsHighCPU(0), + labelHighCPU + "(1)": NewDevopsHighCPU(1), + labelLastpoint: NewDevopsLastPointPerHost, }, } @@ -97,8 +97,8 @@ func init() { flag.IntVar(&scaleVar, "scale-var", 1, "Scaling variable (must be the equal to the scalevar used for data generation).") flag.IntVar(&queryCount, "queries", 1000, "Number of queries to generate.") - flag.BoolVar(&timescaleUseJSON, "timescale-use-json", false, "Use separate JSON tags table when querying") - flag.BoolVar(&timescaleUseTags, "timescale-use-tags", true, "Use separate tags table when querying") + flag.BoolVar(&timescaleUseJSON, "timescale-use-json", false, "TimescaleDB only: Use separate JSON tags table when querying") +
flag.BoolVar(&timescaleUseTags, "timescale-use-tags", true, "TimescaleDB only: Use separate tags table when querying") flag.StringVar(&timestampStartStr, "timestamp-start", "2016-01-01T00:00:00Z", "Beginning timestamp (RFC3339).") flag.StringVar(&timestampEndStr, "timestamp-end", "2016-01-02T06:00:00Z", "Ending timestamp (RFC3339).") diff --git a/cmd/tsbs_generate_queries/mongo_devops_common.go b/cmd/tsbs_generate_queries/mongo_devops_common.go index 067fcd349..812b5ad48 100644 --- a/cmd/tsbs_generate_queries/mongo_devops_common.go +++ b/cmd/tsbs_generate_queries/mongo_devops_common.go @@ -167,7 +167,7 @@ func (d *MongoDevops) GroupByTime(qi query.Query, nHosts, numMetrics int, timeRa // AND time >= '$HOUR_START' AND time < '$HOUR_END' // GROUP BY hour ORDER BY hour func (d *MongoDevops) MaxAllCPU(qi query.Query, nHosts int) { - interval := d.interval.RandWindow(8 * time.Hour) + interval := d.interval.RandWindow(maxAllDuration) hostnames := d.getRandomHosts(nHosts) docs := getTimeFilterDocs(interval) bucketNano := time.Hour.Nanoseconds() @@ -219,7 +219,7 @@ func (d *MongoDevops) MaxAllCPU(qi query.Query, nHosts int) { pipelineQuery = append(pipelineQuery, group) pipelineQuery = append(pipelineQuery, bson.M{"$sort": bson.M{"_id": 1}}) - humanLabel := fmt.Sprintf("Mongo max cpu all fields, rand %4d hosts, rand 8hr by 1h", nHosts) + humanLabel := getMaxAllLabel("Mongo", nHosts) q := qi.(*query.Mongo) q.HumanLabel = []byte(humanLabel) q.BsonDoc = pipelineQuery @@ -298,7 +298,7 @@ func (d *MongoDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics int) { {"$sort": bson.M{"_id.time": 1}}, }...) - humanLabel := fmt.Sprintf("Mongo mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics) + humanLabel := getDoubleGroupByLabel("Mongo", numMetrics) q := qi.(*query.Mongo) q.HumanLabel = []byte(humanLabel) q.BsonDoc = pipelineQuery @@ -492,7 +492,7 @@ func (d *MongoDevops) GroupByOrderByLimit(qi query.Query) { {"$limit": 5}, }...) - humanLabel := "Mongo max cpu over last 5 min-intervals (rand end)" + humanLabel := "Mongo max cpu over last 5 min-intervals (random end)" q := qi.(*query.Mongo) q.HumanLabel = []byte(humanLabel) q.BsonDoc = pipelineQuery diff --git a/cmd/tsbs_generate_queries/mongo_naive_devops_common.go b/cmd/tsbs_generate_queries/mongo_naive_devops_common.go index 9632ae7f1..14e5a74e8 100644 --- a/cmd/tsbs_generate_queries/mongo_naive_devops_common.go +++ b/cmd/tsbs_generate_queries/mongo_naive_devops_common.go @@ -157,7 +157,7 @@ func (d *MongoNaiveDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics i }...)
pipelineQuery = append(pipelineQuery, bson.M{"$sort": bson.M{"_id.time": 1, "_id.hostname": 1}}) - humanLabel := fmt.Sprintf("Mongo [NAIVE] mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics) + humanLabel := getDoubleGroupByLabel("Mongo [NAIVE]", numMetrics) q := qi.(*query.Mongo) q.HumanLabel = []byte(humanLabel) q.BsonDoc = pipelineQuery diff --git a/cmd/tsbs_generate_queries/timescaledb_devops_common.go b/cmd/tsbs_generate_queries/timescaledb_devops_common.go index fcabb7e83..e99b3d099 100644 --- a/cmd/tsbs_generate_queries/timescaledb_devops_common.go +++ b/cmd/tsbs_generate_queries/timescaledb_devops_common.go @@ -108,7 +108,7 @@ func (d *TimescaleDBDevops) GroupByOrderByLimit(qi query.Query) { sqlQuery := fmt.Sprintf(`SELECT date_trunc('minute', time) AS minute, max(usage_user) FROM cpu %s GROUP BY minute ORDER BY minute DESC LIMIT 5`, where) - humanLabel := "TimescaleDB max cpu over last 5 min-intervals (rand end)" + humanLabel := "TimescaleDB max cpu over last 5 min-intervals (random end)" q := qi.(*query.TimescaleDB) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.EndString())) @@ -161,7 +161,7 @@ func (d *TimescaleDBDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics interval.Start.Format(goTimeFmt), interval.End.Format(goTimeFmt), hostnameField, strings.Join(meanClauses, ", "), joinStr, hostnameField) - humanLabel := fmt.Sprintf("TimescaleDB mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics) + humanLabel := getDoubleGroupByLabel("TimescaleDB", numMetrics) q := qi.(*query.TimescaleDB) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString())) @@ -176,8 +176,8 @@ func (d *TimescaleDBDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics // FROM cpu WHERE (hostname = '$HOSTNAME_1' OR ... OR hostname = '$HOSTNAME_N') // AND time >= '$HOUR_START' AND time < '$HOUR_END' // GROUP BY hour ORDER BY hour -func (d *TimescaleDBDevops) MaxAllCPU(qi query.Query, nhosts int) { - interval := d.interval.RandWindow(8 * time.Hour) +func (d *TimescaleDBDevops) MaxAllCPU(qi query.Query, nHosts int) { + interval := d.interval.RandWindow(maxAllDuration) metrics := getCPUMetricsSlice(len(cpuMetrics)) selectClauses := d.getSelectClausesAggMetrics("max", metrics) @@ -187,10 +187,10 @@ func (d *TimescaleDBDevops) MaxAllCPU(qi query.Query, nhosts int) { WHERE %s AND time >= '%s' AND time < '%s' GROUP BY hour ORDER BY hour`, strings.Join(selectClauses, ", "), - d.getHostWhereString(nhosts), + d.getHostWhereString(nHosts), interval.Start.Format(goTimeFmt), interval.End.Format(goTimeFmt)) - humanLabel := fmt.Sprintf("TimescaleDB max cpu all fields, rand %4d hosts, rand 8hr by 1h", nhosts) + humanLabel := getMaxAllLabel("TimescaleDB", nHosts) q := qi.(*query.TimescaleDB) q.HumanLabel = []byte(humanLabel) q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString())) diff --git a/scripts/generate_data.sh b/scripts/generate_data.sh index c89fa149f..2e3cfd2ce 100755 --- a/scripts/generate_data.sh +++ b/scripts/generate_data.sh @@ -23,22 +23,4 @@ do $binName -format $format -scale-var $scaleVar -seed $seed -timestamp-end $tsEnd -timestamp-start $tsStart -log-interval $logInterval -use-case $useCase | gzip > $fname ln -s $fname ${format}-data.gz fi - # or do whatever with individual element of the array done - - - # -format string - # Format to emit. 
(choices: influx-bulk, es-bulk, cassandra, timescaledb) (default "influx-bulk") - - # -log-interval duration - # Duration between host data points (default 10s) - # -scale-var int - # Scaling variable specific to the use case. (default 1) - # -seed int - # PRNG seed (default, or 0, uses the current timestamp). - # -timestamp-end string - # Ending timestamp (RFC3339). (default "2016-01-02T06:00:00Z") - # -timestamp-start string - # Beginning timestamp (RFC3339). (default "2016-01-01T00:00:00Z") - # -use-case string - # Use case to model. (choices: devops, cpu-only) (default "devops") diff --git a/scripts/generate_queries.sh b/scripts/generate_queries.sh index 9509a5153..761e19ba7 100755 --- a/scripts/generate_queries.sh +++ b/scripts/generate_queries.sh @@ -7,7 +7,7 @@ useJson=${useJson:-false} useTags=${useTags:-true} formats=${formats:-"timescaledb"} -queryTypes=${queryTypes:-"1-host-1-hr 1-host-12-hr 8-host-1-hr groupby groupby-5 groupby-all lastpoint cpu-max-all-single-host cpu-max-all-eight-hosts high-cpu-all-hosts high-cpu-1-host groupby-orderby-limit 5-metrics-1-host-1-hr 5-metrics-1-host-12-hr 5-metrics-8-host-1-hr"} +queryTypes=${queryTypes:-"single-groupby(1,1,1) single-groupby(1,1,12) single-groupby(1,8,1) single-groupby(5,1,1) single-groupby(5,1,12) single-groupby(5,8,1) double-groupby(1) double-groupby(5) double-groupby(all) cpu-max-all(1) cpu-max-all(8) high-cpu(all) high-cpu(1) lastpoint groupby-orderby-limit"} scaleVar=${scaleVar:-"4000"} queries=${queries:-"1000"} diff --git a/scripts/generate_run_script.py b/scripts/generate_run_script.py index 99101c951..85e4f3173 100644 --- a/scripts/generate_run_script.py +++ b/scripts/generate_run_script.py @@ -31,9 +31,9 @@ EXAMPLE: queries.txt: -#1-host-12-hr -5-metrics-1-host-1-hr -5-metrics-1-host-12-hr +#single-groupby(1,1,1) +single-groupby(5,1,1) +single-groupby(5,1,12) Command: python generate_query_run -d timescaledb -w 8 @@ -44,9 +44,9 @@ NUM_WORKERS=8 BULK_DATA_DIR=/tmp DATABASE_HOST=localhost BATCH_SIZE=10000 ./load_timescaledb.sh | tee load_timescaledb_8_10000.out # Queries -cat /tmp/queries/timescaledb-5-metrics-1-host-1-hr-queries.gz | gunzip | tsbs_run_queries_timescaledb -workers 8 -limit 1000 -postgres "host=localhost user=postgres sslmode=disable timescaledb.disable_optimizations=false" | tee query_timescaledb_timescaledb-5-metrics-1-host-1-hr-queries.out +cat /tmp/queries/timescaledb-single-groupby(5,1,1)-queries.gz | gunzip | tsbs_run_queries_timescaledb -workers 8 -limit 1000 -postgres "host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-single-groupby(5,1,1)-queries.out -cat /tmp/queries/timescaledb-5-metrics-1-host-12-hr-queries.gz | gunzip | tsbs_run_queries_timescaledb -workers 8 -limit 1000 -postgres "host=localhost user=postgres sslmode=disable timescaledb.disable_optimizations=false" | tee query_timescaledb_timescaledb-5-metrics-1-host-12-hr-queries.out +cat /tmp/queries/timescaledb-single-groupby(5,1,12)-queries.gz | gunzip | tsbs_run_queries_timescaledb -workers 8 -limit 1000 -postgres "host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-single-groupby(5,1,12)-queries.out ''' import argparse import os @@ -78,12 +78,9 @@ def get_query_str(queryfile, label, workers, limit, hostname, extra_query_args): # Cassandra has an extra option to choose between server & client # aggregation plans. 
Client seems to be better in all cases extra_args = '--aggregation-plan=client' - elif label == 'timescaledb': + elif label == 'timescaledb' or label == 'postgres': # TimescaleDB needs the connection string - extra_args = '--postgres="{}"'.format('host={} user=postgres sslmode=disable'.format(hostname)) - elif label == 'postgres': - # Postgres needs the connection string - extra_args = '--postgres="{}"'.format('host={} user=postgres sslmode=disable'.format(hostname)) + extra_args = '--hosts="{}" --postgres="{}"'.format(hostname, 'user=postgres sslmode=disable') return 'cat {} | gunzip | tsbs_run_queries_{} --workers={} {} {} {} | tee {}.out'.format( queryfile, benchmarker, workers, limit_arg, extra_args, extra_query_args, output_file)
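For reference, a minimal end-to-end sketch of generating one of the renamed query types, based on the README example updated in this patch; the seed, scale, timestamps, and output path are illustrative. Because the new names contain parentheses and commas, the query type and any filename built from it should be quoted when typed into a shell:

```bash
# Generate 1000 single-groupby(1,1,1) queries for TimescaleDB (illustrative values).
# Quoting the query type and the output path keeps bash from parsing the parentheses.
tsbs_generate_queries -seed=123 -scale-var=4000 \
    -timestamp-start="2016-01-01T00:00:00Z" \
    -timestamp-end="2016-01-04T00:00:01Z" \
    -queries=1000 -query-type="single-groupby(1,1,1)" -format="timescaledb" \
    | gzip > "/tmp/timescaledb-queries-single-groupby(1,1,1).gz"
```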