CWHD is a custom Azure monitoring solution that uses Grafana dashboards to show color-coded health signals. Tiles are Green, Amber or Red depending on:
- overall resource health, from Azure Resource Health signals
- app health for all apps, from App Insights Standard Test (HTTP ping) web app availability signals
- for VMs only: configurable thresholds for CPU, memory and disk usage; the tile turns Amber when a threshold is met
- the overall availability of an application, aggregated from the Resource Health of the one or more Azure resources it uses
The dashboards are organized into Level 0 and Level 1, which reflect the "depth" of monitoring.
- Level 0 - shows the availability status of all apps
- Level 1 - drills into the Resource Health of each Azure resource used by the app
Telemetry Required
- for App Service and Web App health signals - all workspace-based Application Insights Standard Test results are sent to a single Log Analytics Workspace
- for Virtual Machine health signals - enable VM Insights
- all PaaS resources under monitoring must have a Diagnostic Setting configured to send logs to one central Log Analytics Workspace. For example, API Management sends its resource logs to the workspace
Azure Resources Required
- a "central" Log Analytics Workspace
- Azure Managed Grafana
  - enable Managed Identity
  - add Azure role assignment (RBAC) for Grafana Managed Identity with Monitor Reader to:
    - Subscriptions containing resources under monitoring
    - Log Analytics Workspace (if workspace in different subscription from above)
- Azure Function - App Service Plan S1
  - enable Managed Identity
  - add Azure role assignment (RBAC) for Function Managed Identity with Monitor Reader to:
    - Subscriptions containing resources under monitoring
    - Log Analytics Workspace (if workspace in different subscription from above)
- All Application Insights resources must be linked to the same central Log Analytics Workspace
- Create App Insights Standard Tests to run availability tests for all App Services and Web Apps. (Standard Test results are stored in the AppAvailabilityResults table.)
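Because Standard Test results land in the AppAvailabilityResults table, the most recent result for one test can be read with a short Kusto query. The following is a minimal sketch of building such a query in Python, not the query CWHD actually runs. The function name `build_availability_query`, the 15-minute lookback and the test name `cloudcrafty-homepage-test` are assumptions for illustration; `TimeGenerated`, `Name`, `Success` and `Location` are columns of the AppAvailabilityResults table.

```python
def build_availability_query(standard_test_name: str, lookback: str = "15m") -> str:
    """Return a KQL query for the latest result of one Standard Test.

    Hypothetical helper: the lookback window and query shape are assumptions.
    """
    # Escape backslashes and double quotes so the name is a valid KQL string literal.
    safe_name = standard_test_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "AppAvailabilityResults\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f'| where Name == "{safe_name}"\n'
        # arg_max keeps the row with the newest TimeGenerated for the test.
        "| summarize arg_max(TimeGenerated, Success, Location) by Name"
    )


# "cloudcrafty-homepage-test" is a made-up Standard Test name.
query = build_availability_query("cloudcrafty-homepage-test")
print(query)
```

The resulting string can be run in the Log Analytics query editor or passed to whichever client executes queries against the central workspace.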
Assumption
- an existing Log Analytics Workspace to which "all" Application Insights resources are linked
- Python 3.11
- Azure Managed Grafana Standard - Grafana 10.4.11
- Docker
- Python modules
CWHD has a REST backend called Telemetry Forager that retrieves and curates telemetry from different sources, including:
- Azure Monitor REST API, for
  - App Service health status: executes a Kusto query to get the App Insights availability result from the Log Analytics AppAvailabilityResults table
  - VM health status, which is determined by two factors:
    - Resource Health availability status determines whether the VM is available, shown as Green or Red.
    - If the Resource Health status is Available (Green), three additional metrics (CPU, memory and disk usage percentage) are monitored against a set of configurable thresholds. In Grafana, the VM Stat visualization shows Amber if one or more of the three metrics reaches its threshold.
- Azure Resource Health API: gets resource health for all resource types except App Service, which gets its health status from the App Insights Standard Test
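The two-factor VM rule above can be sketched as a small function. This is an illustration of the described logic only, not Telemetry Forager's actual code: the names `VmThresholds` and `vm_status` and the default threshold of 80% are assumptions, since the source says only that the thresholds are configurable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmThresholds:
    # Hypothetical defaults; the real values are configurable in CWHD.
    cpu_percent: float = 80.0
    memory_percent: float = 80.0
    disk_percent: float = 80.0


def vm_status(resource_available: bool, cpu: float, memory: float, disk: float,
              thresholds: VmThresholds = VmThresholds()) -> str:
    """Return "Red", "Amber" or "Green" for one VM."""
    # Factor 1: Resource Health decides available (Green) versus not (Red).
    if not resource_available:
        return "Red"
    # Factor 2: any metric reaching its threshold downgrades Green to Amber.
    if (cpu >= thresholds.cpu_percent
            or memory >= thresholds.memory_percent
            or disk >= thresholds.disk_percent):
        return "Amber"
    return "Green"


print(vm_status(True, cpu=35.0, memory=91.5, disk=40.0))
```

The metrics are only consulted when Resource Health reports the VM as available, so an unavailable VM is always Red regardless of its last reported usage.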
Path | Method | Param |
---|---|---|
/RHRetriever | POST | `{ "resources": [ { "resourceId": "{resource id}", "standardTestName": "{App Insights standard test name}", "workspaceId": "{Log Analytics Workspace Id}" } ] }` |
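As a rough illustration of the request shape, the sketch below builds a POST request for `/RHRetriever` with the Python standard library, without sending it. The base URL, the resource ID, the test name and the workspace ID are placeholders, and the helper `build_rhretriever_request` is hypothetical. It also assumes all three keys are sent for every resource; the source does not say whether any of them are optional.

```python
import json
import urllib.request


def build_rhretriever_request(base_url: str, resources: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /RHRetriever endpoint."""
    body = json.dumps({"resources": resources}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/RHRetriever",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All values below are placeholders, not real identifiers.
request = build_rhretriever_request(
    "https://example.invalid",
    [
        {
            "resourceId": "/subscriptions/00000000-0000-0000-0000-000000000000"
                          "/resourceGroups/rg-demo/providers/Microsoft.Web/sites/demo-web",
            "standardTestName": "demo-standard-test",
            "workspaceId": "11111111-1111-1111-1111-111111111111",
        }
    ],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request, for example with `urllib.request.urlopen(request)`, would additionally need whatever authentication the deployed Function requires, which is not covered here.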
The overall availability status (Green) of each app depends on the Azure resources that the app uses. If any Azure resource used by the Cloud Crafty or Pocket Geeks apps has a Resource Health status of "Unavailable", the overall health status at Level 0 will be Unavailable. For example, Cloud Crafty uses three Azure resources: App Service, Key Vault and APIM. Its overall availability status is Green only when the Resource Health of all three resources and the App Insights Standard Test availability status are Available.
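The Level 0 roll-up amounts to an "all must be available" check across an app's resources. The sketch below shows that rule under stated assumptions: the function name `overall_status` is hypothetical, statuses are plain strings, and any status other than "Available" (as well as an empty resource list) is treated as not available, which the source does not spell out.

```python
def overall_status(resource_statuses: dict[str, str]) -> str:
    """Aggregate per-resource statuses into one Level 0 status for an app."""
    # Assumption: an app with no known resources is not reported as available.
    if not resource_statuses:
        return "Unavailable"
    # Green (Available) only when every dependent resource is Available.
    if all(status == "Available" for status in resource_statuses.values()):
        return "Available"
    return "Unavailable"


# Cloud Crafty's three resources, with illustrative status values.
cloud_crafty = {
    "App Service": "Available",  # from the App Insights Standard Test
    "Key Vault": "Available",    # from Resource Health
    "APIM": "Unavailable",       # from Resource Health
}
print(overall_status(cloud_crafty))
```

With one resource unavailable, the app as a whole is reported as Unavailable at Level 0, and Level 1 shows which resource is responsible.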
Proposed: distributed tracing with the OpenTelemetry Collector. The Collector gathers OpenTelemetry traces from apps and sends them to Jaeger, backed by Azure Managed Cassandra. Grafana uses Jaeger as a data source so that traces can be viewed centrally in Grafana, in addition to the Jaeger UI.