Commit

feat(DMVP-4666): grafana dashboard create ability
mrdntgrn committed Aug 3, 2024
1 parent afb8a94 commit 9ab4667
Showing 88 changed files with 2,791 additions and 38 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -32,3 +32,5 @@ override.tf.json
# Ignore CLI configuration files
.terraformrc
terraform.rc

.terraform.lock.hcl
82 changes: 45 additions & 37 deletions README.md
@@ -1,12 +1,55 @@
# terraform-onpremise-grafana
https://registry.terraform.io/modules/dasmeta/grafana/onpremise/latest

This module manages an on-premise Grafana stack with Terraform.
It currently supports managing:
- Grafana dashboards with the `dashboard` submodule
- Grafana alerts with the `alerts` submodule
- Grafana contact points with the `contact-points` submodule
- Grafana notification policies with the `notifications` submodule

Support for more components is planned.

## Example for Dashboard
```hcl
module "grafana_monitoring" {
  source = "dasmeta/grafana/onpremise"
  version = "1.2.0"

  name = "Test-dashboard"

  application_dashboard = {
    rows : [
      { type : "block/sla" },
      { type : "block/ingress" },
      { type : "block/service", name : "service-name-1", host : "example.com" },
      { type : "block/service", name : "service-name-2" },
      { type : "block/service", name : "service-name-3" }
    ]
    data_source = {
      uid : "00000"
    }
    variables = [
      {
        "name" : "namespace",
        "options" : [
          {
            "selected" : true,
            "value" : "prod"
          },
          {
            "value" : "stage"
          },
          {
            "value" : "dev"
          }
        ],
      }
    ]
  }
}
```
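
The example above needs a configured Grafana provider, because the `dashboard` submodule creates a `grafana_dashboard` resource. A minimal sketch, assuming the `grafana/grafana` provider; the URL and the `grafana_auth` variable name are placeholders and not part of this module:

```hcl
terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

# Hypothetical variable holding an API token or "user:password" pair.
variable "grafana_auth" {
  type      = string
  sensitive = true
}

provider "grafana" {
  url  = "https://grafana.example.com" # placeholder address
  auth = var.grafana_auth
}
```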

## Example for Alert Rules
```
module "grafana_alerts" {
@@ -161,39 +204,4 @@ module "grafana_alerts" {
```

## Usage
Check `modules/alerts/tests`, `modules/contact-points/tests` and `modules/notifications/tests` folders to see more examples.
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

No requirements.

## Providers

No providers.

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_alerts"></a> [alerts](#module\_alerts) | ./modules/alerts | n/a |
| <a name="module_contact_points"></a> [contact\_points](#module\_contact\_points) | ./modules/contact-points | n/a |
| <a name="module_notifications"></a> [notifications](#module\_notifications) | ./modules/notifications | n/a |

## Resources

No resources.

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_alert_interval_seconds"></a> [alert\_interval\_seconds](#input\_alert\_interval\_seconds) | The interval, in seconds, at which all rules in the group are evaluated. If a group contains many rules, the rules are evaluated sequentially. | `number` | `10` | no |
| <a name="input_alert_rules"></a> [alert\_rules](#input\_alert\_rules) | This variable describes alert folders, groups and rules. | <pre>list(object({<br> name = string # The name of the alert rule<br> no_data_state = optional(string, "NoData") # Describes what state to enter when the rule's query returns No Data<br> exec_err_state = optional(string, "Error") # Describes what state to enter when the rule's query is invalid and the rule cannot be executed<br> summary = optional(string, "") # Rule annotation as a summary<br> priority = optional(string, "P2") # Rule priority level: P2 is for non-critical alerts, P1 will be set for critical alerts<br> folder_name = optional(string, "Main Alerts") # Grafana folder name in which the rule will be created<br> datasource = string # Name of the datasource used for the alert<br> expr = optional(string, null) # Full expression for the alert<br> metric_name = optional(string, "") # Prometheus metric name which queries the data for the alert<br> metric_function = optional(string, "") # Prometheus function used with metric for queries, like rate, sum etc.<br> metric_interval = optional(string, "") # The time interval with using functions like rate<br> settings_mode = optional(string, "replaceNN") # The mode used in B block, possible values are Strict, replaceNN, dropNN<br> settings_replaceWith = optional(number, 0) # The value by which NaN results of the query will be replaced<br> filters = optional(any, {}) # Filters object to identify each service for alerting<br> function = optional(string, "mean") # One of Reduce functions which will be used in B block for alerting<br> equation = string # The equation in the math expression which compares B blocks value with a number and generates an alert if needed. Possible values: gt, lt, gte, lte, e<br> threshold = number # The value against which B blocks are compared in the math expression<br> }))</pre> | `[]` | no |
| <a name="input_notifications"></a> [notifications](#input\_notifications) | Represents the configuration options for Grafana notification policies. | <pre>object({<br> contact_point = optional(string, "Slack") # The default contact point to route all unmatched notifications to.<br> group_by = optional(list(string), ["grafana_folder", "alertname"]) # A list of alert labels to group alerts into notifications by.<br> group_interval = optional(string, "5m") # Minimum time interval between two notifications for the same group.<br> repeat_interval = optional(string, "4h") # Minimum time interval for re-sending a notification if an alert is still firing.<br><br> policy = optional(object({<br> contact_point = optional(string, null) # The contact point to route notifications that match this rule to.<br> continue = optional(bool, false) # Whether to continue matching subsequent rules if an alert matches the current rule. Otherwise, the rule will be 'consumed' by the first policy to match it.<br> group_by = optional(list(string), [])<br> mute_timings = optional(list(string), []) # A list of mute timing names to apply to alerts that match this policy.<br><br> matcher = optional(object({<br> label = optional(string, "priority") # The name of the label to match against.<br> match = optional(string, "=") # The operator to apply when matching values of the given label. Allowed operators are = for equality, != for negated equality, =~ for regex equality, and !~ for negated regex equality.<br> value = optional(string, "P1") # The label value to match against.<br> }))<br> }))<br> })</pre> | `{}` | no |
| <a name="input_opsgenie_endpoints"></a> [opsgenie\_endpoints](#input\_opsgenie\_endpoints) | OpsGenie contact points list. | <pre>list(object({<br> name = string # The name of the contact point.<br> api_key = string # The OpsGenie API key to use.<br> auto_close = optional(bool, false) # Whether to auto-close alerts in OpsGenie when they resolve in the Alertmanager.<br> message = optional(string, "") # The templated content of the message.<br> api_url = optional(string, "https://api.opsgenie.com/v2/alerts") # Allows customization of the OpsGenie API URL.<br> disable_resolve_message = optional(bool, false) # Whether to disable sending resolve messages.<br> }))</pre> | `[]` | no |
| <a name="input_slack_endpoints"></a> [slack\_endpoints](#input\_slack\_endpoints) | Slack contact points list. | <pre>list(object({<br> name = string # The name of the contact point.<br> endpoint_url = optional(string, "https://slack.com/api/chat.postMessage") # Use this to override the Slack API endpoint URL to send requests to.<br> icon_emoji = optional(string, "") # The name of a Slack workspace emoji to use as the bot icon.<br> icon_url = optional(string, "") # A URL of an image to use as the bot icon.<br> recipient = optional(string, null) # Channel, private group, or IM channel (can be an encoded ID or a name) to send messages to.<br> text = optional(string, "") # Templated content of the message.<br> title = optional(string, "") # Templated title of the message.<br> token = optional(string, "") # A Slack API token, for sending messages directly without the webhook method.<br> webhook_url = optional(string, "") # A Slack webhook URL, for sending messages via the webhook method.<br> username = optional(string, "") # Username for the bot to use.<br> disable_resolve_message = optional(bool, false) # Whether to disable sending resolve messages.<br> }))</pre> | `[]` | no |

## Outputs

No outputs.
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
Check `./tests`, `modules/alerts/tests`, `modules/contact-points/tests` and `modules/notifications/tests` folders to see more examples.
10 changes: 10 additions & 0 deletions dashboard.tf
@@ -0,0 +1,10 @@
module "application_dashboard" {
  source = "./modules/dashboard/"

  count = length(var.application_dashboard) > 0 ? 1 : 0

  name = var.name
  rows = var.application_dashboard.rows
  data_source = var.application_dashboard.data_source
  variables = var.application_dashboard.variables
}
1 change: 1 addition & 0 deletions modules/dashboard/.gitignore
@@ -0,0 +1 @@
*.tfstate
54 changes: 54 additions & 0 deletions modules/dashboard/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
# Module to create a Grafana dashboard from JSON/HCL
## YAML example
```
source: dasmeta/grafana/onpremise//modules/dashboard
version: x.y.z
variables:
  name: test-dashboard
  data_source:
    uid: "0000"
  rows:
    - type: block/sla
    - type: "block/ingress"
    - type: "block/service"
      name: "service-name-1"
      host: "example.com"
    - type: "block/service"
      name: "service-name-2"
    -
      - type: "text/title"
        text: "End"
```

## HCL example
```
module "this" {
  source = "dasmeta/grafana/onpremise//modules/dashboard"
  version = "x.y.z"

  name = "test-dashboard-with-blocks"
  data_source = {
    uid : "0000"
  }
  rows = [
    { type : "block/sla" },
    { type : "block/ingress" },
    { type : "block/service", name : "service-name-1", namespace : "dev", host : "example.com" },
    { type : "block/service", name : "service-name-2", namespace : "dev" },
    # a plain (non-block) row is a list of widgets
    [{ type : "text/title", text : "End" }]
  ]
}
```

## How to add a new widget
1. Create a module in `modules/widgets` (copy an existing one as a starting point).
2. Implement data loading as required.
3. Add the new widget's Terraform module to the matching `widget-{widget-group-name | single}.tf` file.
4. Add a line for the new widget to the `widget_result` local.

## How to add a new block
1. Create a module in `modules/blocks` (copy an existing one as a starting point).
2. Implement data loading as required.
3. Add the new block's Terraform module to `widget-blocks.tf`.
4. Add a line for the new block to the `blocks_results` local.
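
As a rough illustration of the block steps above, a new block can follow the shape of the existing `ingress` block's `result` output: a list of rows, each row a list of widgets. The `database` block name, its title text, and the `postgres`/`db` values are hypothetical; only the widget types are ones this module already uses.

```hcl
# modules/dashboard/modules/blocks/database/output.tf (hypothetical block)
output "result" {
  description = "Rows of widgets rendered for the database block"
  value = [
    [
      { type : "text/title-with-collapse", text : "Database" }
    ],
    [
      { type : "container/cpu", container : "postgres", namespace : "db", width : 12 },
      { type : "container/memory", container : "postgres", namespace : "db", width : 12 }
    ],
  ]
}

# In locals.tf, the matching blocks_results entry would then look like this,
# assuming the module in widget-blocks.tf is named block_database:
#   database = values(module.block_database).*.result
```

A row entry of `{ type : "block/database" }` would then be expanded into these rows in place.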
134 changes: 134 additions & 0 deletions modules/dashboard/locals.tf
@@ -0,0 +1,134 @@
locals {
  dashboard_title = var.name

  # fill dashboard variable options and current required fields
  grafana_templating_list_variables_options_fill = [for variable in var.variables : merge(variable, {
    options = [for option in variable.options : merge(option, { text = coalesce(option.text, option.value) })]
  })]
  grafana_templating_list_variables = [for variable in local.grafana_templating_list_variables_options_fill : merge(variable, {
    current = try(variable.options[index(variable.options.*.selected, true)], null)
  })]


  ## Blocks

  # get all blocks and annotate
  initial_blocks = [
    for index1, block in var.rows : {
      block : block,
      index1 : index1,
      type : replace(block.type, "block/", "")
    } if strcontains(try(block.type, ""), "block/")
  ]

  # annotate each block type with subIndex
  blocks_by_type = { for block_type in distinct([for block in local.initial_blocks : block.type]) :
    block_type => [for index2, block in local.initial_blocks : merge(block, { index2 : index2 }) if strcontains(block.type, block_type)]
  }

  # bring all module results together
  blocks_results = {
    ingress = values(module.block_ingress).*.result
    service = values(module.block_service).*.result
    sla = values(module.block_sla).*.result
  }

  blocks_by_type_results = concat([], [
    for block_type, type_blocks in local.blocks_by_type : [
      for index3, block in type_blocks : merge(block, { results : local.blocks_results[block.type][index3] }) if contains(keys(local.blocks_results), block.type)
    ] if contains(keys(local.blocks_results), block_type)
  ]...)

  # inject block widgets into rows/panels listing in place of block/* definitions
  rows = concat([], [
    for index1, row in var.rows :
    concat(strcontains(try(row.type, ""), "block/") ? [] : [row],
      [
        for item in local.blocks_by_type_results : item.results if item.index1 == index1
    ]...)
  ]...)


  ## Widgets
  # default values from module and provided from outside
  widget_default_values = merge(
    {
      period = 3 # in minutes
      stat = "Sum"
      width = 6
      height = 8
      expressions = []
      yAxis = { left = { min = 0 } }
      data_source = var.data_source
      container = "$container"
      namespace = "$namespace"
      cluster = "$cluster"
      account_id = null
      region = null
      anomaly_detection = false
      anomaly_deviation = 6
    },
    var.defaults
  )

  # this will walk through every widget and add row/column + merge with default values
  widget_config_with_raw_column_data_and_defaults = [
    for row_number, row in local.rows : [
      for column_number, column in row : merge(
        local.widget_default_values,
        column,
        {
          row = row_number,
          column = column_number,
          row_count = length(local.rows),
          column_count = length(row)
        }
      )
    ]
  ]

  # groups rows by widget type
  widget_config = { for key, item in flatten(local.widget_config_with_raw_column_data_and_defaults) :
    item.type => merge(
      item,
      # calculate coordinates based on defaults and row/column details
      {
        coordinates = {
          x = item.column * item.width
          y = item.row
          width = item.width
          height = item.height
        }
      }
    )... }

  # combine results (last step)
  widget_result = concat(
    # Container widgets
    values(module.container_cpu_widget).*.data,
    values(module.container_memory_widget).*.data,
    values(module.container_network_widget).*.data,
    values(module.container_restarts_widget).*.data,
    values(module.container_replicas_widget).*.data,
    values(module.container_request_count_widget).*.data,
    values(module.container_response_time_widget).*.data,

    # Ingress widgets
    values(module.ingress_connections_widget).*.data,
    values(module.ingress_request_rate_widget).*.data,
    values(module.ingress_request_count_widget).*.data,
    values(module.ingress_response_time_widget).*.data,

    # Text widgets
    values(module.text_title).*.data,
    values(module.text_title_with_link).*.data,
    values(module.text_title_with_collapse).*.data,

    # sla/slo/sli widgets
    values(module.widget_sla_slo_sli_main).*.data,
    values(module.widget_sla_slo_sli_latency).*.data,

    # single widgets
    values(module.widget_custom).*.data,
  )
}
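
Review note: the coordinate step in `widget_config` places each widget at `x = column * width` and `y = row`. A small standalone sketch of that arithmetic for a single row of two 12-wide widgets; the `example_*` names are illustrative and not part of the module:

```hcl
locals {
  # One row with two widgets, as in the last row of the ingress block.
  example_row = [
    { type = "container/cpu", width = 12, height = 8 },
    { type = "container/memory", width = 12, height = 8 },
  ]

  # Same formula as widget_config: x = column index * widget width, y = row index.
  example_coordinates = [
    for column_number, widget in local.example_row : {
      x      = column_number * widget.width
      y      = 0 # row index of this single example row
      width  = widget.width
      height = widget.height
    }
  ]
}
```

Here the first widget gets `x = 0` and the second `x = 12`. Because `x` uses the current widget's own width, the formula only lines up cleanly when widgets within a row share the same width.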
32 changes: 32 additions & 0 deletions modules/dashboard/main.tf
@@ -0,0 +1,32 @@
resource "grafana_dashboard" "metrics" {
  config_json = jsonencode({
    uid = random_string.grafana_dashboard_id.result
    title = local.dashboard_title
    style = "dark"
    timezone = "browser"
    editable = true
    schemaVersion = 35
    fiscalYearStartMonth = 0
    graphTooltip = 0
    links = []
    liveNow = false
    annotations = {}
    refresh = "1m"
    tags = []
    templating = {
      list = local.grafana_templating_list_variables
    }
    time = {
      from = "now-6h"
      to = "now"
    }
    timepicker = {}
    weekStart = ""
    panels = local.widget_result
  })
}

resource "random_string" "grafana_dashboard_id" {
  length = 16
  special = false
}
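
Review note: `templating.list` comes from `local.grafana_templating_list_variables`, which fills each option's `text` from its `value` and picks the first option with `selected = true` as `current`. A standalone sketch of that transformation for the README's `namespace` variable; the `example_*` names are illustrative, and `text`/`selected` are written out explicitly here rather than relying on the module's variable defaults, which are not shown in this commit excerpt:

```hcl
locals {
  example_variable = {
    name = "namespace"
    options = [
      { value = "prod", text = null, selected = true },
      { value = "stage", text = null, selected = false },
      { value = "dev", text = null, selected = false },
    ]
  }

  # Fall back to the option's value when no display text is given.
  example_options = [
    for option in local.example_variable.options :
    merge(option, { text = coalesce(option.text, option.value) })
  ]

  # First option flagged as selected, or null when none is.
  example_current = try(
    local.example_options[index(local.example_options[*].selected, true)],
    null
  )
}
```

With this input, `example_current` should be the `prod` option with `text = "prod"`.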
24 changes: 24 additions & 0 deletions modules/dashboard/modules/blocks/ingress/output.tf
@@ -0,0 +1,24 @@
output "result" {
  description = "Rows of widgets rendered for the Nginx Ingress Controller block"
  value = [
    [
      { type : "text/title-with-collapse", text : "Nginx Ingress Controller" }
    ],
    [
      { type : "ingress/request-rate" },
      { type : "ingress/connections" },
      { type : "ingress/response-time" },
      { type : "ingress/request-count" }
    ],
    [
      { type : "ingress/request-rate", by_host : true },
      { type : "ingress/response-time", by_host : true },
      { type : "ingress/request-count", by_host : true },
      { type : "ingress/request-count", by_host : true, only_5xx : true }
    ],
    [
      { type : "container/cpu", container : "controller", namespace : "ingress-nginx", width : 12 },
      { type : "container/memory", container : "controller", namespace : "ingress-nginx", width : 12 }
    ],
  ]
}
14 changes: 14 additions & 0 deletions modules/dashboard/modules/blocks/ingress/variables.tf
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
variable "account_id" {
  type = string
  description = "AWS account ID"
}

variable "balancer_name" {
  type = string
  description = "ALB name"
}

variable "region" {
  type = string
  default = ""
}