Update query types to new naming convention.
Clean up the query labels and make them consistent.
RobAtticus committed May 22, 2018
1 parent e1db088 commit f7e1166
Showing 11 changed files with 92 additions and 108 deletions.
48 changes: 24 additions & 24 deletions README.md
@@ -129,7 +129,7 @@ Variables needed:
1. the same use case, seed, # of devices, and start time as used in data generation
1. an end time that is one second after the end time from data generation. E.g., for `2016-01-04T00:00:00Z` use `2016-01-04T00:00:01Z`
1. the number of queries to generate. E.g., `1000`
1. and the type of query you'd like to generate. E.g., `1-host-1-hr`
1. and the type of query you'd like to generate. E.g., `single-groupby(1,1,1)`

For the last step there are numerous queries to choose from, which are
listed in [Appendix I](#appendix-i-query-types). Additionally, the file
@@ -143,16 +143,16 @@ For generating just one set of queries for a given type:
$ tsbs_generate_queries -seed=123 -scale-var=4000 \
-timestamp-start="2016-01-01T00:00:00Z" \
-timestamp-end="2016-01-04T00:00:01Z" \
-queries=1000 -query-type="1-host-1-hr" -format="timescaledb" \
| gzip > /tmp/timescaledb-queries-1-host-1-hr.gz
-queries=1000 -query-type="single-groupby(1,1,1)" -format="timescaledb" \
| gzip > "/tmp/timescaledb-queries-single-groupby(1,1,1).gz"
```

For generating sets of queries for multiple types:
```bash
$ formats="timescaledb" scaleVar=4000 seed=123 \
tsStart="2016-01-01T00:00:00Z" \
tsEnd="2016-01-04T00:00:01Z" \
queries=1000 queryTypes="1-host-1-hr 1-host-12-hr groupby" \
queries=1000 queryTypes="single-groupby(1,1,1) single-groupby(1,1,12) double-groupby(1)" \
dataDir="/tmp" script/generate_queries.sh
```
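
The renamed query types contain parentheses, which bash treats as syntax when they appear unquoted, so any file name derived from a query type needs quoting before it reaches a shell. A minimal Go sketch of that idea follows; `shellQuote` and `queryFilePath` are hypothetical helpers written for illustration (they are not part of TSBS), and the path pattern simply mirrors the `/tmp/timescaledb-queries-<type>.gz` example above.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell and escapes any
// embedded single quote. Hypothetical helper, not a TSBS function.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// queryFilePath builds a file name following the pattern used in the README
// example. The pattern is an assumption taken from that example only.
func queryFilePath(dir, format, queryType string) string {
	return fmt.Sprintf("%s/%s-queries-%s.gz", dir, format, queryType)
}

func main() {
	p := queryFilePath("/tmp", "timescaledb", "single-groupby(1,1,1)")
	// Prints a redirection target that bash can parse safely.
	fmt.Println("gzip > " + shellQuote(p))
}
```

Quoting the whole path rather than escaping each parenthesis keeps the command readable and also covers the comma-separated arguments inside the name.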

@@ -257,10 +257,10 @@ to run multiple query types in a row. The queries it generates should be
put in a file with one query per line and the path given to the script.
For example, if you had a file named `queries.txt` that looked like this:
```text
high-cpu-and-field
cpu-max-all-eight-hosts
high-cpu(1)
cpu-max-all(8)
groupby-orderby-limit
groupby
double-groupby(1)
```

You could generate a run script named `query_test.sh`:
@@ -275,13 +275,13 @@ And the resulting script file would look like:
```bash
#!/bin/bash
# Queries
cat /tmp/queries/timescaledb-high-cpu-and-field-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-high-cpu-and-field-queries.out
cat /tmp/queries/timescaledb-high-cpu(1)-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-high-cpu(1)-queries.out

cat /tmp/queries/timescaledb-cpu-max-all-eight-hosts-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-cpu-max-all-eight-hosts-queries.out
cat /tmp/queries/timescaledb-cpu-max-all(8)-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-cpu-max-all(8)-queries.out

cat /tmp/queries/timescaledb-groupby-orderby-limit-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-groupby-orderby-limit-queries.out
cat /tmp/queries/timescaledb-groupby-orderby-limit-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-groupby-orderby-limit-queries.out

cat /tmp/queries/timescaledb-groupby-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --postgres="host=localhost user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-groupby-queries.out
cat /tmp/queries/timescaledb-double-groupby(1)-queries.gz | gunzip | query_benchmarker_timescaledb --workers=8 --limit=1000 --hosts="localhost" --postgres="user=postgres sslmode=disable" | tee query_timescaledb_timescaledb-double-groupby(1)-queries.out
```

### Query validation (optional)
@@ -296,18 +296,18 @@ the results.
### Devops / cpu-only
|Query type|Description|
|:---|:---|
|1-host-1-hr| Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 1 hour
|1-host-12-hr| Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 12 hours
|8-host-1-hr| Simple aggregate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour
|5-metrics-1-host-1-hr| Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 1 hour
|5-metrics-1-host-12-hr| Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 12 hours
|5-metrics-8-host-1-hr| Simple aggregate (MAX) on 5 metrics for 8 hosts, every 5 mins for 1 hour
|cpu-max-all-single-host| Aggregate across all CPU metrics per hour over 1 hour for a single host
|cpu-max-all-eight-hosts| Aggregate across all CPU metrics per hour over 1 hour for eight hosts
|groupby| Aggregate across both time and host, giving the average of 1 CPU metric per host per hour for 24 hours
|groupby-5| Aggregate across both time and host, giving the average of 5 CPU metrics per host per hour for 24 hours
|groupby-all| Aggregate across both time and host, giving the average of all (10) CPU metrics per host per hour for 24 hours
|high-cpu-all-hosts| All the readings where one metric is above a threshold across all hosts
|high-cpu-1-host| All the readings where one metric is above a threshold for a particular host
|single-groupby(1,1,1)| Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 1 hour
|single-groupby(1,1,12)| Simple aggregate (MAX) on one metric for 1 host, every 5 mins for 12 hours
|single-groupby(1,8,1)| Simple aggregate (MAX) on one metric for 8 hosts, every 5 mins for 1 hour
|single-groupby(5,1,1)| Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 1 hour
|single-groupby(5,1,12)| Simple aggregate (MAX) on 5 metrics for 1 host, every 5 mins for 12 hours
|single-groupby(5,8,1)| Simple aggregate (MAX) on 5 metrics for 8 hosts, every 5 mins for 1 hour
|cpu-max-all(1)| Aggregate across all CPU metrics per hour over 1 hour for a single host
|cpu-max-all(8)| Aggregate across all CPU metrics per hour over 1 hour for eight hosts
|double-groupby(1)| Aggregate across both time and host, giving the average of 1 CPU metric per host per hour for 24 hours
|double-groupby(5)| Aggregate across both time and host, giving the average of 5 CPU metrics per host per hour for 24 hours
|double-groupby(all)| Aggregate across both time and host, giving the average of all (10) CPU metrics per host per hour for 24 hours
|high-cpu(all)| All the readings where one metric is above a threshold across all hosts
|high-cpu(1)| All the readings where one metric is above a threshold for a particular host
|lastpoint| The last reading for each host
|groupby-orderby-limit| The last 5 aggregate readings (across time) before a randomly chosen endpoint
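
The new names in the table follow one pattern: a base label, optionally followed by a parenthesized, comma-separated argument list. Judging from the descriptions, the `single-groupby` arguments read as (metrics, hosts, hours). The Go sketch below only illustrates that pattern; `queryTypeName` is a hypothetical helper, whereas the commit itself builds the names by plain string concatenation in `useCaseMatrix`.

```go
package main

import (
	"fmt"
	"strings"
)

// queryTypeName joins a base label and optional arguments into the
// "label(arg1,arg2,...)" form used by the renamed query types.
// Hypothetical helper for illustration; not part of the commit.
func queryTypeName(label string, args ...string) string {
	if len(args) == 0 {
		return label // e.g. "lastpoint" takes no arguments
	}
	return label + "(" + strings.Join(args, ",") + ")"
}

func main() {
	fmt.Println(queryTypeName("single-groupby", "5", "8", "1")) // 5 metrics, 8 hosts, 1 hour
	fmt.Println(queryTypeName("double-groupby", "all"))
	fmt.Println(queryTypeName("lastpoint"))
}
```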
12 changes: 6 additions & 6 deletions cmd/tsbs_generate_queries/cassandra_devops_common.go
@@ -76,7 +76,7 @@ func (d *CassandraDevops) GroupByTime(qi query.Query, nHosts, numMetrics int, ti
func (d *CassandraDevops) GroupByOrderByLimit(qi query.Query) {
interval := d.interval.RandWindow(time.Hour)

humanLabel := "Cassandra max cpu over last 5 min-intervals (rand end)"
humanLabel := "Cassandra max cpu over last 5 min-intervals (random end)"
q := qi.(*query.Cassandra)
q.HumanLabel = []byte(humanLabel)
q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, d.interval.StartString()))
@@ -103,7 +103,7 @@ func (d *CassandraDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics in
interval := d.interval.RandWindow(doubleGroupByDuration)
metrics := getCPUMetricsSlice(numMetrics)

humanLabel := fmt.Sprintf("Cassandra mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics)
humanLabel := getDoubleGroupByLabel("Cassandra", numMetrics)
q := qi.(*query.Cassandra)
q.HumanLabel = []byte(humanLabel)
q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString()))
@@ -124,14 +124,14 @@ func (d *CassandraDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics in
// FROM cpu WHERE (hostname = '$HOSTNAME_1' OR ... OR hostname = '$HOSTNAME_N')
// AND time >= '$HOUR_START' AND time < '$HOUR_END'
// GROUP BY hour ORDER BY hour
func (d *CassandraDevops) MaxAllCPU(qi query.Query, nhosts int) {
interval := d.interval.RandWindow(8 * time.Hour)
tagSet := d.getHostWhere(nhosts)
func (d *CassandraDevops) MaxAllCPU(qi query.Query, nHosts int) {
interval := d.interval.RandWindow(maxAllDuration)
tagSet := d.getHostWhere(nHosts)

tagSets := [][]string{}
tagSets = append(tagSets, tagSet)

humanLabel := fmt.Sprintf("Cassandra max cpu all fields, rand %4d hosts, rand 12hr by 1h", nhosts)
humanLabel := getMaxAllLabel("Cassandra", nHosts)
q := qi.(*query.Cassandra)
q.HumanLabel = []byte(humanLabel)
q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString()))
35 changes: 20 additions & 15 deletions cmd/tsbs_generate_queries/devops.go
@@ -14,6 +14,14 @@ const (
allHosts = "all hosts"
doubleGroupByDuration = 24 * time.Hour
highCPUDuration = 24 * time.Hour
maxAllDuration = 8 * time.Hour

labelSingleGroupby = "single-groupby"
labelDoubleGroupby = "double-groupby"
labelLastpoint = "lastpoint"
labelMaxAll = "cpu-max-all"
labelGroupbyOrderbyLimit = "groupby-orderby-limit"
labelHighCPU = "high-cpu"
)

type devopsCore struct {
@@ -61,17 +69,6 @@ func getCPUMetricsSlice(numMetrics int) []string {
return cpuMetrics[:numMetrics]
}

// Devops describes a devops query generator.
type Devops interface {
CPU5Metrics(query.Query, int, int, time.Duration)
GroupByOrderByLimit(query.Query)
HighCPUForHosts(query.Query, int, int)
LastPointPerHost(query.Query)
MaxAllCPU(query.Query, int, int)
MaxCPUUsageHourByMinute(query.Query, int, int, time.Duration)
MeanCPUMetricsDayByHourAllHostsGroupbyHost(query.Query, int)
}

// SingleGroupbyFiller is a type that can fill in a single groupby query
type SingleGroupbyFiller interface {
GroupByTime(query.Query, int, int, time.Duration)
@@ -102,6 +99,10 @@ type HighCPUFiller interface {
HighCPUForHosts(query.Query, int)
}

func getDoubleGroupByLabel(dbName string, numMetrics int) string {
return fmt.Sprintf("%s mean of %d metrics, all hosts, random %s by 1hr", dbName, numMetrics, doubleGroupByDuration)
}

func getHighCPULabel(dbName string, nHosts int) string {
label := dbName + " CPU over threshold, "
if nHosts > 0 {
@@ -112,12 +113,16 @@ func getHighCPULabel(dbName string, nHosts int) string {
return label
}

func getRandomHosts(scale, nhosts int) []string {
if nhosts > scale {
log.Fatal("nhosts > scaleVar")
func getMaxAllLabel(dbName string, nHosts int) string {
return fmt.Sprintf("%s max of all CPU fields, random %4d hosts, random %s by 1h", dbName, nHosts, maxAllDuration)
}

func getRandomHosts(scale, nHosts int) []string {
if nHosts > scale {
log.Fatalf("number of hosts (%d) larger than --scale-var (%d)", nHosts, scale)
}

nn := rand.Perm(scale)[:nhosts]
nn := rand.Perm(scale)[:nHosts]

hostnames := []string{}
for _, n := range nn {
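
The shared label helpers introduced in `devops.go` format the duration constants with `%s`, so the window appears in Go's `time.Duration` string form (for example `24h0m0s`) rather than the hand-written `1day` or `8hr` of the old per-database labels. The standalone sketch below copies the two helpers and their constants from the diff into a `main` package to show what the resulting labels look like; the `main` function and the sample arguments are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// Constants and helpers copied from the devops.go hunk above.
const (
	doubleGroupByDuration = 24 * time.Hour
	maxAllDuration        = 8 * time.Hour
)

func getDoubleGroupByLabel(dbName string, numMetrics int) string {
	return fmt.Sprintf("%s mean of %d metrics, all hosts, random %s by 1hr", dbName, numMetrics, doubleGroupByDuration)
}

func getMaxAllLabel(dbName string, nHosts int) string {
	// %4d right-aligns the host count in a field of width 4.
	return fmt.Sprintf("%s max of all CPU fields, random %4d hosts, random %s by 1h", dbName, nHosts, maxAllDuration)
}

func main() {
	// Sample arguments chosen for illustration only.
	fmt.Println(getDoubleGroupByLabel("Cassandra", 5))
	fmt.Println(getMaxAllLabel("Influx", 8))
}
```

Because every database now calls the same helper, the label stays in step with the duration constant if that constant is changed later.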
12 changes: 6 additions & 6 deletions cmd/tsbs_generate_queries/influx_devops_common.go
@@ -88,7 +88,7 @@ func (d *InfluxDevops) GroupByOrderByLimit(qi query.Query) {
v := url.Values{}
v.Set("q", fmt.Sprintf(`SELECT max(usage_user) from cpu %s group by time(1m) limit 5`, where))

humanLabel := "Influx max cpu over last 5 min-intervals (rand end)"
humanLabel := "Influx max cpu over last 5 min-intervals (random end)"
q := qi.(*query.HTTP)
q.HumanLabel = []byte(humanLabel)
q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString()))
@@ -116,7 +116,7 @@ func (d *InfluxDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics int)
v := url.Values{}
v.Set("q", fmt.Sprintf("SELECT %s from cpu where time >= '%s' and time < '%s' group by time(1h),hostname", strings.Join(selectClauses, ", "), interval.StartString(), interval.EndString()))

humanLabel := fmt.Sprintf("Influx mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics)
humanLabel := getDoubleGroupByLabel("Influx", numMetrics)
q := qi.(*query.HTTP)
q.HumanLabel = []byte(humanLabel)
q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString()))
@@ -132,14 +132,14 @@ func (d *InfluxDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics int)
// FROM cpu WHERE (hostname = '$HOSTNAME_1' OR ... OR hostname = '$HOSTNAME_N')
// AND time >= '$HOUR_START' AND time < '$HOUR_END'
// GROUP BY hour ORDER BY hour
func (d *InfluxDevops) MaxAllCPU(qi query.Query, nhosts int) {
interval := d.interval.RandWindow(8 * time.Hour)
whereHosts := d.getHostWhereString(nhosts)
func (d *InfluxDevops) MaxAllCPU(qi query.Query, nHosts int) {
interval := d.interval.RandWindow(maxAllDuration)
whereHosts := d.getHostWhereString(nHosts)

v := url.Values{}
v.Set("q", fmt.Sprintf("SELECT max(usage_user),max(usage_system),max(usage_idle),max(usage_nice),max(usage_iowait),max(usage_irq),max(usage_softirq),max(usage_steal),max(usage_guest),max(usage_guest_nice) from cpu where %s and time >= '%s' and time < '%s' group by time(1m)", whereHosts, interval.StartString(), interval.EndString()))

humanLabel := fmt.Sprintf("Influx max cpu all fields, rand %4d hosts, rand 12hr by 1m", nhosts)
humanLabel := getMaxAllLabel("Influx", nHosts)
q := qi.(*query.HTTP)
q.HumanLabel = []byte(humanLabel)
q.HumanDescription = []byte(fmt.Sprintf("%s: %s", humanLabel, interval.StartString()))
34 changes: 17 additions & 17 deletions cmd/tsbs_generate_queries/main.go
@@ -16,21 +16,21 @@ import (

var useCaseMatrix = map[string]map[string]QueryFillerMaker{
"devops": {
"1-host-1-hr": NewDevopsSingleGroupby(1, 1, 1),
"1-host-12-hr": NewDevopsSingleGroupby(1, 1, 12),
"8-host-1-hr": NewDevopsSingleGroupby(1, 8, 1),
"5-metrics-1-host-1-hr": NewDevopsSingleGroupby(5, 1, 1),
"5-metrics-1-host-12-hr": NewDevopsSingleGroupby(5, 1, 12),
"5-metrics-8-host-1-hr": NewDevopsSingleGroupby(5, 8, 1),
"cpu-max-all-single-host": NewDevopsMaxAllCPU(1),
"cpu-max-all-eight-hosts": NewDevopsMaxAllCPU(1),
"groupby": NewDevopsGroupBy(1),
"groupby-5": NewDevopsGroupBy(5),
"groupby-all": NewDevopsGroupBy(len(cpuMetrics)),
"groupby-orderby-limit": NewDevopsGroupByOrderByLimit,
"high-cpu-all-hosts": NewDevopsHighCPU(0),
"high-cpu-1-host": NewDevopsHighCPU(1),
"lastpoint": NewDevopsLastPointPerHost,
labelSingleGroupby + "(1,1,1)": NewDevopsSingleGroupby(1, 1, 1),
labelSingleGroupby + "(1,1,12)": NewDevopsSingleGroupby(1, 1, 12),
labelSingleGroupby + "(1,8,1)": NewDevopsSingleGroupby(1, 8, 1),
labelSingleGroupby + "(5,1,1)": NewDevopsSingleGroupby(5, 1, 1),
labelSingleGroupby + "(5,1,12)": NewDevopsSingleGroupby(5, 1, 12),
labelSingleGroupby + "(5,8,1)": NewDevopsSingleGroupby(5, 8, 1),
labelMaxAll + "(1)": NewDevopsMaxAllCPU(1),
labelMaxAll + "(8)": NewDevopsMaxAllCPU(8),
labelDoubleGroupby + "(1)": NewDevopsGroupBy(1),
labelDoubleGroupby + "(5)": NewDevopsGroupBy(5),
labelDoubleGroupby + "(all)": NewDevopsGroupBy(len(cpuMetrics)),
labelGroupbyOrderbyLimit: NewDevopsGroupByOrderByLimit,
labelHighCPU + "(all)": NewDevopsHighCPU(0),
labelHighCPU + "(1)": NewDevopsHighCPU(1),
labelLastpoint: NewDevopsLastPointPerHost,
},
}

@@ -97,8 +97,8 @@ func init() {
flag.IntVar(&scaleVar, "scale-var", 1, "Scaling variable (must be equal to the scale-var used for data generation).")
flag.IntVar(&queryCount, "queries", 1000, "Number of queries to generate.")

flag.BoolVar(&timescaleUseJSON, "timescale-use-json", false, "Use separate JSON tags table when querying")
flag.BoolVar(&timescaleUseTags, "timescale-use-tags", true, "Use separate tags table when querying")
flag.BoolVar(&timescaleUseJSON, "timescale-use-json", false, "TimescaleDB only: Use separate JSON tags table when querying")
flag.BoolVar(&timescaleUseTags, "timescale-use-tags", true, "TimescaleDB only: Use separate tags table when querying")

flag.StringVar(&timestampStartStr, "timestamp-start", "2016-01-01T00:00:00Z", "Beginning timestamp (RFC3339).")
flag.StringVar(&timestampEndStr, "timestamp-end", "2016-01-02T06:00:00Z", "Ending timestamp (RFC3339).")
8 changes: 4 additions & 4 deletions cmd/tsbs_generate_queries/mongo_devops_common.go
@@ -167,7 +167,7 @@ func (d *MongoDevops) GroupByTime(qi query.Query, nHosts, numMetrics int, timeRa
// AND time >= '$HOUR_START' AND time < '$HOUR_END'
// GROUP BY hour ORDER BY hour
func (d *MongoDevops) MaxAllCPU(qi query.Query, nHosts int) {
interval := d.interval.RandWindow(8 * time.Hour)
interval := d.interval.RandWindow(maxAllDuration)
hostnames := d.getRandomHosts(nHosts)
docs := getTimeFilterDocs(interval)
bucketNano := time.Hour.Nanoseconds()
@@ -219,7 +219,7 @@ func (d *MongoDevops) MaxAllCPU(qi query.Query, nHosts int) {
pipelineQuery = append(pipelineQuery, group)
pipelineQuery = append(pipelineQuery, bson.M{"$sort": bson.M{"_id": 1}})

humanLabel := fmt.Sprintf("Mongo max cpu all fields, rand %4d hosts, rand 8hr by 1h", nHosts)
humanLabel := getMaxAllLabel("Mongo", nHosts)
q := qi.(*query.Mongo)
q.HumanLabel = []byte(humanLabel)
q.BsonDoc = pipelineQuery
@@ -298,7 +298,7 @@ func (d *MongoDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics int) {
{"$sort": bson.M{"_id.time": 1}},
}...)

humanLabel := fmt.Sprintf("Mongo mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics)
humanLabel := getDoubleGroupByLabel("Mongo", numMetrics)
q := qi.(*query.Mongo)
q.HumanLabel = []byte(humanLabel)
q.BsonDoc = pipelineQuery
@@ -492,7 +492,7 @@ func (d *MongoDevops) GroupByOrderByLimit(qi query.Query) {
{"$limit": 5},
}...)

humanLabel := "Mongo max cpu over last 5 min-intervals (rand end)"
humanLabel := "Mongo max cpu over last 5 min-intervals (random end)"
q := qi.(*query.Mongo)
q.HumanLabel = []byte(humanLabel)
q.BsonDoc = pipelineQuery
2 changes: 1 addition & 1 deletion cmd/tsbs_generate_queries/mongo_naive_devops_common.go
@@ -157,7 +157,7 @@ func (d *MongoNaiveDevops) GroupByTimeAndPrimaryTag(qi query.Query, numMetrics i
}...)
pipelineQuery = append(pipelineQuery, bson.M{"$sort": bson.M{"_id.time": 1, "_id.hostname": 1}})

humanLabel := fmt.Sprintf("Mongo [NAIVE] mean of %d metrics, all hosts, rand 1day by 1hr", numMetrics)
humanLabel := getDoubleGroupByLabel("Mongo [NAIVE]", numMetrics)
q := qi.(*query.Mongo)
q.HumanLabel = []byte(humanLabel)
q.BsonDoc = pipelineQuery