
weixian-zhang/GCC-CWHD


GCC Azure - Central Workload Health Dashboard (AZCWHD)

CWHD is a custom Azure monitoring solution that uses Grafana dashboards to show the health of applications and the Azure resources they depend on.

Grafana dashboard tiles are color-coded Green, Amber or Red depending on:

  • overall resource health from Azure Resource Health signals
  • app health from App Insights Standard Test (HTTP ping) web app availability signals
  • for VMs only: configurable thresholds for CPU, Memory and Disk usage; the tile turns Amber when a threshold is met
  • the overall availability of an application, aggregated from the Resource Health of the one or more Azure resources it uses

The dashboards are organized in Level 0 and Level 1 depicting the "depth" of monitoring.

  • Level 0 - shows the availability status of all apps
  • Level 1 - drills into Resource Health of each Azure resource used by the app


Prerequisites

  • Telemetry Required

  • Azure Resources Required

    • a "central" Log Analytics Workspace
    • Azure Managed Grafana
      • enable Managed Identity
      • add Azure role assignment (RBAC) for Grafana Managed Identity with Monitor Reader to:
        • Subscriptions containing resources under monitoring
        • Log Analytics Workspace (if workspace in different subscription from above)
    • Azure Function - App Service Plan S1
      • enable Managed Identity
      • add Azure role assignment (RBAC) for Function Managed Identity with Monitor Reader to:
        • Subscriptions containing resources under monitoring
        • Log Analytics Workspace (if workspace in different subscription from above)
    • All Application Insights must be linked to the same central Log Analytics Workspace
    • Create App Insights Standard Tests to run availability tests for all App Services and Web Apps (Standard Test logs are stored in the AppAvailabilityResults table)
  • Assumption

    • an existing Log Analytics Workspace to which all Application Insights resources are linked

Tech Stack


Architecture

image

CWHD has a REST backend called Telemetry Forager that retrieves and curates telemetry from different telemetry sources, including:

  • Azure Monitor REST API for
    • App Service: health status comes from a Kusto query that gets the App Insights availability result from the Log Analytics AppAvailabilityResults table
    • VM: health status is determined by 2 factors
      • Resource Health availability status determines whether the VM is available, shown as Green or Red.
      • If the Resource Health status is Available/Green, 3 additional metrics (CPU, Memory and Disk usage percentage) are monitored against a set of configurable thresholds. In Grafana, the VM Stat visualization shows Amber if one or more of the 3 metrics reaches its threshold.
  • Azure Resource Health API - gets resource health for all resource types except App Service, which gets its health status from the App Insights Standard Test
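The VM rule described above can be sketched in a few lines of Python. This is only an illustration of the Green/Amber/Red decision, not the Telemetry Forager implementation: the names `VmThresholds` and `vm_status`, and the default threshold of 80 percent, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmThresholds:
    # Hypothetical defaults; the real thresholds are configurable in CWHD.
    cpu_percent: float = 80.0
    memory_percent: float = 80.0
    disk_percent: float = 80.0


def vm_status(
    resource_available: bool,
    cpu: float,
    memory: float,
    disk: float,
    thresholds: VmThresholds = VmThresholds(),
) -> str:
    """Return "Red", "Amber" or "Green" for one VM."""
    # Factor 1: Resource Health decides between available and unavailable.
    if not resource_available:
        return "Red"
    # Factor 2: an available VM turns Amber when any usage metric
    # reaches its threshold.
    breached = (
        cpu >= thresholds.cpu_percent
        or memory >= thresholds.memory_percent
        or disk >= thresholds.disk_percent
    )
    return "Amber" if breached else "Green"
```

With these assumed defaults, an available VM at 85 percent CPU is reported as Amber, while an unavailable VM is Red regardless of its usage metrics.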

Telemetry Forager API Spec

Path: /RHRetriever
Method: POST
Param:
{
  "resources": [
    {
      "resourceId": "{resource id}",
      "standardTestName": "{ App Insights standard test name }",
      "workspaceId": "{Log Analytics Workspace Id}"
    }
  ]
}
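As a rough illustration of the spec above, the following Python sketch builds (but does not send) a POST request for /RHRetriever using only the standard library. The helper name `build_rhretriever_request`, the base URL and the sample field values are hypothetical; the response format is not documented here, so it is not handled.

```python
import json
from urllib.request import Request


def build_rhretriever_request(base_url: str, resources: list) -> Request:
    """Build a POST request for the /RHRetriever path (nothing is sent)."""
    # The body follows the documented shape: {"resources": [ {...}, ... ]}
    body = json.dumps({"resources": resources}).encode("utf-8")
    return Request(
        url=base_url.rstrip("/") + "/RHRetriever",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values only; substitute real identifiers from your environment.
sample_resources = [
    {
        "resourceId": "{resource id}",
        "standardTestName": "{ App Insights standard test name }",
        "workspaceId": "{Log Analytics Workspace Id}",
    }
]

# "https://example.invalid" stands in for the Function App URL (an assumption).
request = build_rhretriever_request("https://example.invalid", sample_resources)
```

Passing `request` to `urllib.request.urlopen` would send it; any authentication the Function requires would have to be added to the headers first.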

Samples

Level 0 Dashboard

image

The overall availability status (Green) of each app depends on the Azure resources that the app uses. If any Azure resource used by the Cloud Crafty or Pocket Geeks apps has a Resource Health status of "Unavailable", that app's overall health status at Level 0 is Unavailable. For example, Cloud Crafty uses 3 Azure resources: App Service, Key Vault and APIM. Its overall availability status is Green only when the Resource Health of all 3 resources and the App Insights Standard Test availability status are available.
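The aggregation rule can be sketched as follows. This is a minimal Python illustration, not the dashboard's actual logic: the function name `overall_status` and the plain "Available"/"Unavailable" strings are assumptions, and Amber handling is left out.

```python
def overall_status(resource_statuses: dict) -> str:
    """Aggregate per-resource statuses into one Level 0 status.

    resource_statuses maps a resource name to "Available" or "Unavailable".
    """
    if not resource_statuses:
        # An app with no resources has nothing to aggregate.
        raise ValueError("at least one resource status is required")
    # The app is Available only when every resource it uses is Available.
    if all(status == "Available" for status in resource_statuses.values()):
        return "Available"
    return "Unavailable"


# Cloud Crafty uses three resources; one unavailable resource is enough
# to make the whole app Unavailable at Level 0.
cloud_crafty = {
    "App Service": "Available",
    "Key Vault": "Unavailable",
    "APIM": "Available",
}
level0 = overall_status(cloud_crafty)
```

Here `level0` is "Unavailable" because Key Vault is unavailable, even though the other two resources are healthy.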

Level 1 - Cloud Crafty Dashboard

image

Level 1 - Pocket Geek Dashboard

image image

Level 2 Dashboard

Proposed: distributed tracing with OpenTelemetry. An OpenTelemetry Collector collects OpenTelemetry traces from apps and sends them to Jaeger, which is backed by Azure Managed Cassandra. Grafana uses Jaeger as a data source so that traces can be viewed centrally in Grafana, in addition to the Jaeger UI.
