Performance Tracking Integration Test and Data Collection #497

Merged
merged 8 commits on Jun 23, 2022
48 changes: 48 additions & 0 deletions integration/test/performance_tracker/performance_metrics_test.go
@@ -0,0 +1,48 @@
//go:build linux && integration
// +build linux,integration

package performancetest

import (
	"context"
	"log"
	"testing"
	"time"

	"github.com/aws/amazon-cloudwatch-agent/integration/test"
)

const (
	//configPath is declared in performance_query_utils.go, which shares this package
	configOutputPath    = "/opt/aws/amazon-cloudwatch-agent/bin/config.json"
	agentRuntimeMinutes = 20
)

func PerformanceTest(t *testing.T) {
	agentContext := context.TODO()
	instanceId := test.GetInstanceId()
	log.Printf("Instance ID used for performance metrics : %s\n", instanceId)

	test.CopyFile(configPath, configOutputPath)

	test.StartAgent(configOutputPath, true)

	agentRunDuration := agentRuntimeMinutes * time.Minute
	//let agent run before collecting performance metrics on it
	time.Sleep(agentRunDuration)
	log.Printf("Agent has been running for : %s\n", (agentRunDuration).String())
Contributor

Suggested change:
-	log.Printf("Agent has been running for : %s\n", (agentRunDuration).String())
+	log.Printf("Agent has been running for : %s\n", agentRunDuration.String())

Don't think you need the parentheses around this, but not super particular.

	test.StopAgent()

	//collect data
	data, err := GetPerformanceMetrics(instanceId, agentRuntimeMinutes, agentContext)
	if err != nil {
		t.Fatalf("Error: %v", err)
	}

	//------Placeholder to put data into database------//
	//use data so the compiler doesn't complain about an unused variable
	if data == nil {
		t.Fatalf("No data")
	}
}
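The database placeholder above still needs a real sink. As one possibility only, here is a minimal sketch of persisting the collected JSON to DynamoDB; the table name "CWAPerformanceMetrics", its key schema, and the SavePerformanceMetrics helper are all assumptions, not anything in this PR:

```go
// Hypothetical sketch only: persist the collected metrics JSON to a
// DynamoDB table. The table name and key schema are assumptions.
package performancetest

import (
	"context"
	"strconv"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func SavePerformanceMetrics(ctx context.Context, instanceId string, data []byte) error {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}
	client := dynamodb.NewFromConfig(cfg)

	//one item per test run, keyed by instance id and collection time
	_, err = client.PutItem(ctx, &dynamodb.PutItemInput{
		TableName: aws.String("CWAPerformanceMetrics"), //assumed table name
		Item: map[string]types.AttributeValue{
			"InstanceId": &types.AttributeValueMemberS{Value: instanceId},
			"Timestamp":  &types.AttributeValueMemberN{Value: strconv.FormatInt(time.Now().Unix(), 10)},
			"Results":    &types.AttributeValueMemberS{Value: string(data)},
		},
	})
	return err
}
```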
158 changes: 158 additions & 0 deletions integration/test/performance_tracker/performance_query_utils.go
@@ -0,0 +1,158 @@
package performancetest

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/cloudwatch"
	"github.com/aws/aws-sdk-go-v2/service/cloudwatch/types"
)

const (
	Namespace     = "CWAgent"
	DimensionName = "InstanceId"
	Stat          = "Average"
	Period        = 30
	configPath    = "./resources/config.json"
)

/*
 * GetConfigMetrics parses the cloudwatch agent config and returns the associated
 * metrics that the cloudwatch agent is measuring on itself
 */
func GetConfigMetrics() ([]string, []string, error) {
	//get metric measurements from config file
	file, err := os.ReadFile(configPath)
	if err != nil {
		return nil, nil, err
	}

	var cfgFileData map[string]interface{}
	err = json.Unmarshal(file, &cfgFileData)
	if err != nil {
		return nil, nil, err
	}

	//go through the config json to get to the procstat metrics
	procstatList := cfgFileData["metrics"].(map[string]interface{})["metrics_collected"].(map[string]interface{})["procstat"].([]interface{})

	//within procstat metrics, find the cloudwatch-agent process
	cloudwatchIndex := -1
	for i, process := range procstatList {
		if process.(map[string]interface{})["exe"].(string) == "cloudwatch-agent" {
			cloudwatchIndex = i
		}
	}

	//check to see if the process was not found
	if cloudwatchIndex == -1 {
		return nil, nil, errors.New("cloudwatch-agent process not found in cloudwatch agent config")
	}

	//use the index to get the rest of the path
	metricList := procstatList[cloudwatchIndex].(map[string]interface{})["measurement"].([]interface{})

	//convert the resulting []interface{} to []string and create matching metric ids for each one
	metricNames := make([]string, len(metricList))
	ids := make([]string, len(metricList))
	for i, metricName := range metricList {
		metricNames[i] = "procstat_" + metricName.(string)
		ids[i] = fmt.Sprint("m", i+1)
	}

	return metricNames, ids, nil
}
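One caveat worth flagging: the chained type assertions in GetConfigMetrics panic instead of returning an error if the config file is missing any of the expected keys. A minimal sketch of an ok-checked lookup, using a hypothetical helper name of our own (`lookupMap` is not in this PR):

```go
// Hypothetical helper: ok-checked traversal so that a malformed config
// yields an error instead of a panic. Not part of this PR.
func lookupMap(data map[string]interface{}, key string) (map[string]interface{}, error) {
	value, ok := data[key].(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("config missing or malformed key %q", key)
	}
	return value, nil
}
```

Used in place of the chained assertions, the traversal becomes two lookupMap calls ("metrics", then "metrics_collected") followed by an ok-checked assertion of the "procstat" value to []interface{}.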

// GenerateGetMetricInputStruct generates the struct required to make a query request to cloudwatch's GetMetricData
func GenerateGetMetricInputStruct(ids, metricNames []string, instanceId string, timeDiff int) (*cloudwatch.GetMetricDataInput, error) {
Contributor

From what I can tell, this is using GetMetricData and #497 is expecting that. I think the end result will make more sense to call GetMetricStatistics here, but it's okay to have this as is for now while you both explore how interacting with GetMetricStatistics would work. It's possible I'm just wrong and GetMetricStatistics doesn't help at all.

Contributor Author

I will look more into GetMetricStatistics.

	if len(ids) != len(metricNames) {
		return nil, errors.New("mismatching lengths of metric ids and metric names")
	}

	if len(ids) == 0 || len(metricNames) == 0 || instanceId == "" || timeDiff == 0 {
		return nil, errors.New("must supply metric ids, metric names, instance id, and time to collect metrics")
	}

	dimensionValue := instanceId
	metricDataQueries := []types.MetricDataQuery{}

	//generate list of individual metric requests
	for i, id := range ids {
		metricDataQueries = append(metricDataQueries, ConstructMetricDataQuery(id, Namespace, DimensionName, dimensionValue, metricNames[i], timeDiff))
	}

	timeNow := time.Now()
Contributor

Nitpick - might just be my personal bias, but normally I just call this `now`. Not something I'd block on, though.

Suggested change:
-	timeNow := time.Now()
+	now := time.Now()

	input := &cloudwatch.GetMetricDataInput{
		EndTime:           aws.Time(time.Unix(timeNow.Unix(), 0)),
		StartTime:         aws.Time(time.Unix(timeNow.Add(time.Duration(-timeDiff)*time.Minute).Unix(), 0)),
		MetricDataQueries: metricDataQueries,
	}

	return input, nil
}
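Following up on the review thread above, here is a minimal sketch of what an equivalent GetMetricStatistics call could look like with the same aws-sdk-go-v2 CloudWatch client, shown only for comparison; the helper below is an assumption, not part of this PR:

```go
// Hypothetical comparison sketch: fetch one metric's averages with
// GetMetricStatistics instead of the batched GetMetricData. Not in this PR.
func getMetricStatistics(ctx context.Context, client *cloudwatch.Client, instanceId, metricName string, timeDiff int) (*cloudwatch.GetMetricStatisticsOutput, error) {
	now := time.Now()
	return client.GetMetricStatistics(ctx, &cloudwatch.GetMetricStatisticsInput{
		Namespace:  aws.String(Namespace),
		MetricName: aws.String(metricName),
		Dimensions: []types.Dimension{
			{
				Name:  aws.String(DimensionName),
				Value: aws.String(instanceId),
			},
		},
		StartTime:  aws.Time(now.Add(time.Duration(-timeDiff) * time.Minute)),
		EndTime:    aws.Time(now),
		Period:     aws.Int32(Period),
		Statistics: []types.Statistic{types.StatisticAverage},
	})
}
```

One practical difference: GetMetricStatistics takes a single metric per call, so the batching done in GenerateGetMetricInputStruct would become a loop over metric names.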

// ConstructMetricDataQuery is a helper function for GenerateGetMetricInputStruct and constructs individual metric requests
func ConstructMetricDataQuery(id, namespace, dimensionName, dimensionValue, metricName string, timeDiff int) types.MetricDataQuery {
	query := types.MetricDataQuery{
		Id: aws.String(id),
		MetricStat: &types.MetricStat{
			Metric: &types.Metric{
				Namespace:  aws.String(namespace),
				MetricName: aws.String(metricName),
				Dimensions: []types.Dimension{
					{
						Name:  aws.String(dimensionName),
						Value: aws.String(dimensionValue),
					},
				},
			},
			Period: aws.Int32(int32(Period)),
			Stat:   aws.String(Stat),
		},
	}

	return query
}

func GetPerformanceMetrics(instanceId string, agentRuntime int, agentContext context.Context) ([]byte, error) {
	//load default configuration
	cfg, err := config.LoadDefaultConfig(agentContext)
	if err != nil {
		return nil, err
	}

	client := cloudwatch.NewFromConfig(cfg)

	//fetch names of metrics to request and generate corresponding ids
	metricNames, ids, err := GetConfigMetrics()
	if err != nil {
		return nil, err
	}

	//make input struct
	input, err := GenerateGetMetricInputStruct(ids, metricNames, instanceId, agentRuntime)
	if err != nil {
		return nil, err
	}

	//call to CloudWatch API
	metrics, err := client.GetMetricData(agentContext, input)
	if err != nil {
		return nil, err
	}

	//format data to json before returning the output
	outputData, err := json.MarshalIndent(metrics.MetricDataResults, "", " ")
	if err != nil {
		return nil, err
	}

	return outputData, nil
}
45 changes: 45 additions & 0 deletions integration/test/performance_tracker/resources/config.json
@@ -0,0 +1,45 @@
{
Contributor

One thing to consider going forward is whether there is going to be a collision when trying to get metrics from different performance tests that run concurrently (see the sketch after this config for one way to avoid that).

I ran into an issue when writing the CloudWatch Logs integration tests because I reused the log group and log stream name across all of the tests, so I had issues with getting consistent results from assertions because I was picking up data from multiple tests at the same time by accident.

Not something that needs to be addressed right now, but it should be looked into in the next few weeks. The main focus is getting some performance test running and persisting data.

"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"aggregation_dimensions": [
[
"InstanceId"
]
],
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
},
"procstat": [
{
"exe": "cloudwatch-agent",
"measurement": [
"cpu_usage",
"memory_rss"
]
}
]
}
}
}
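On the collision concern raised in the review comment above: one option, sketched here as an assumption rather than anything in this PR, is to derive a unique identifier per test run and fold it into whatever names or dimension values the test writes and queries:

```go
// Hypothetical sketch for avoiding cross-test collisions: tag each run
// with a unique id. Nothing here exists in this PR.
package performancetest

import (
	"fmt"
	"time"
)

// uniqueRunId combines the instance id with a nanosecond timestamp so
// concurrent runs reading the same namespace don't pick up each
// other's data.
func uniqueRunId(instanceId string) string {
	return fmt.Sprintf("%s-%d", instanceId, time.Now().UnixNano())
}
```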