This project aims to provide a real-time Site Reliability Engineering (SRE) Dashboard for Kubernetes environments, focusing on the Four Golden Signals (Latency, Traffic, Errors, and Saturation). It integrates an agent-based diagnostic workflow powered by Python, Dash, and Kubernetes APIs to visualise system health, provide actionable recommendations, and assist in Root Cause Analysis (RCA).
The dashboard is designed with scalability and usability in mind, making it a valuable tool for Site Reliability Engineers (SREs), DevOps engineers, and organisations operating Kubernetes clusters.
- Real-Time Visualisation:
  - Displays metrics for Latency, Traffic, Errors, and Saturation.
  - Intuitive and interactive charts powered by Dash and Plotly.
- Agent-Based Diagnostics:
  - Iterative workflow for RCA, leveraging Kubernetes logs, metrics, and events.
  - Dynamic reasoning to provide actionable insights.
- Customisable Recommendations:
  - Automatically suggests fixes for common Kubernetes issues.
  - Tracks and displays incident history.
- Privacy-Preserving Architecture:
  - All data processing occurs within the Kubernetes cluster.
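As a rough illustration of the "suggests fixes" and "incident history" features, the sketch below maps a pod failure reason to a suggested fix and records each lookup. The reason strings are ones Kubernetes reports for containers; the `RECOMMENDATIONS` table, the `Incident` and `IncidentLog` classes, and the pod name are hypothetical and are not taken from this project's code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Illustrative mapping only; the project's real rules are not shown in this README.
RECOMMENDATIONS = {
    "CrashLoopBackOff": "Inspect the previous container logs (kubectl logs --previous) "
                        "and check the startup command and probes.",
    "ImagePullBackOff": "Verify the image name and tag, and check the image pull secret.",
    "OOMKilled": "Raise the container memory limit or reduce memory usage.",
}
DEFAULT_RECOMMENDATION = "No known fix; inspect the output of kubectl describe pod."


@dataclass
class Incident:
    """One recorded issue together with the fix that was suggested for it."""
    pod: str
    reason: str
    recommendation: str
    recorded_at: datetime


@dataclass
class IncidentLog:
    """In-memory incident history (a real deployment would persist this)."""
    incidents: list[Incident] = field(default_factory=list)

    def record(self, pod: str, reason: str) -> Incident:
        # Look up a suggestion, falling back to a generic hint for unknown reasons.
        recommendation = RECOMMENDATIONS.get(reason, DEFAULT_RECOMMENDATION)
        incident = Incident(pod, reason, recommendation, datetime.now(timezone.utc))
        self.incidents.append(incident)
        return incident


log = IncidentLog()
first = log.record("api-0", "OOMKilled")  # hypothetical pod name
print(first.recommendation)
print(len(log.incidents), "incident(s) recorded")
```

Keeping the mapping in a plain dictionary makes it easy to extend with organisation-specific rules.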
.
├── app/                          # Application logic
│   ├── app.py                    # Flask and Dash app definition
│   ├── config/                   # Configuration files
│   │   ├── app_config.py         # Application-specific configuration
│   │   ├── logging_config.py     # Logging setup
│   │   └── routes_config.py      # Flask route configuration
│   ├── routes/                   # Flask routes
│   │   ├── api_route.py          # API endpoints
│   │   └── index_route.py        # Root route
│   ├── services/                 # Core services
│   │   ├── database_service.py   # Handles database interactions
│   │   ├── figure_service.py     # Generates data for dashboard visualisations
│   │   └── __init__.py
│   ├── run.py                    # Application entry point
│   ├── assets/                   # Static files (e.g., CSS)
│   │   └── main.css              # Dashboard styling
│   └── __init__.py
├── grafana/                      # Optional Grafana setup
│   ├── Dockerfile                # Dockerfile for Grafana
│   ├── dashboards/               # Pre-configured Grafana dashboards
│   └── provisioning/             # Provisioning files for Grafana
├── helm/                         # Helm chart for Kubernetes deployment
│   └── kubera/
│       ├── Chart.yaml            # Helm chart metadata
│       ├── values-prod.yaml      # Production values
│       └── templates/            # Kubernetes templates
├── Dockerfile                    # Dockerfile for the main app
├── docker-compose.yaml           # Local development environment setup
├── pyproject.toml                # Python project metadata
├── uv.lock                       # Dependency lock file
├── README.md                     # This file
└── .dockerignore                 # Docker build exclusions
- Python 3.12+
- Docker & Docker Compose
- Kubernetes cluster (optional; needed only for production deployment)
- Clone the repository:
    git clone https://github.com/yourusername/kubernetes-rca-dashboard.git
    cd kubernetes-rca-dashboard
- Build and run the application using Docker Compose:
    docker-compose up --build
- Access the dashboard:
  - Open your browser and navigate to http://localhost:4567/dashboard/.
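The containers can take a few seconds to start, so the dashboard URL may not respond immediately. The following is a minimal sketch of a readiness check using only the Python standard library; the function name `wait_for_dashboard` and its timeout defaults are assumptions for illustration, not part of this project.

```python
import time
import urllib.request

# URL from the step above; adjust it if your port mapping differs.
DASHBOARD_URL = "http://localhost:4567/dashboard/"


def wait_for_dashboard(url: str = DASHBOARD_URL,
                       timeout_s: float = 60.0,
                       interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError and connection errors all derive from OSError:
            # the server is not ready yet, so fall through and retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_dashboard()` after `docker-compose up --build` returns `True` once the page is served, or `False` if it does not come up within the timeout.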
- Latency: Tracks request/response times for applications.
- Traffic: Measures the volume of requests handled by the system.
- Errors: Captures failure rates for requests or system operations.
- Saturation: Monitors resource utilisation (CPU, memory, etc.).
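To make the four signals concrete, here is a small sketch that derives them from a window of request samples. The `RequestSample` record, the nearest-rank p95, the "5xx counts as an error" rule, and CPU usage as the saturation measure are all assumptions chosen for illustration; the dashboard's actual data model is not described in this README.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed request (hypothetical record shape)."""
    duration_ms: float
    status_code: int


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_signals(samples: list[RequestSample], window_s: float,
                   cpu_used_cores: float, cpu_limit_cores: float) -> dict[str, float]:
    """Summarise Latency, Traffic, Errors and Saturation for one window."""
    if not samples or window_s <= 0 or cpu_limit_cores <= 0:
        raise ValueError("need samples, a positive window and a positive CPU limit")
    durations = sorted(s.duration_ms for s in samples)
    errors = sum(1 for s in samples if s.status_code >= 500)  # assumption: 5xx = failure
    return {
        "latency_p95_ms": percentile(durations, 95),
        "traffic_rps": len(samples) / window_s,
        "error_rate": errors / len(samples),
        "saturation": cpu_used_cores / cpu_limit_cores,
    }


samples = [
    RequestSample(100.0, 200),
    RequestSample(120.0, 200),
    RequestSample(80.0, 500),
    RequestSample(900.0, 200),
]
signals = golden_signals(samples, window_s=2.0, cpu_used_cores=1.5, cpu_limit_cores=2.0)
print(signals)
```

A percentile is used for latency rather than the mean because a single slow request (900 ms here) is exactly what an average would hide.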
- Iterative reasoning and acting process for RCA:
  - Fetch logs, metrics, or events.
  - Formulate hypotheses.
  - Validate hypotheses and refine analysis.
  - Provide actionable recommendations.
- All data remains within the Kubernetes cluster.
- No sensitive information is transmitted externally.
- Package the application using Helm:
    helm package helm/kubera
- Deploy to your cluster:
    helm install kubera helm/kubera -f helm/kubera/values-prod.yaml
- Verify the deployment:
    kubectl get pods
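For scripted verification, the JSON form of the same command (`kubectl get pods -o json`) can be checked programmatically. The sketch below flags pods that are not in the `Running` phase with every container ready; the helper name `unready_pods` and the sample pod names are hypothetical, and the sample document is a trimmed stand-in for real `kubectl` output.

```python
import json


def unready_pods(pod_list: dict) -> list[str]:
    """Return names of pods that are not Running with all containers ready."""
    unready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if status.get("phase") != "Running" or not all_ready:
            unready.append(pod["metadata"]["name"])
    return unready


# Trimmed example of `kubectl get pods -o json` output (pod names are made up).
sample_output = """
{
  "items": [
    {"metadata": {"name": "kubera-web-1"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": true}]}},
    {"metadata": {"name": "kubera-web-2"},
     "status": {"phase": "Pending"}}
  ]
}
"""
print(unready_pods(json.loads(sample_output)))
```

Note that this check also flags pods in the `Succeeded` phase, such as completed Jobs, so filter those out first if the release creates any.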
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a feature branch:
    git checkout -b feature-name
- Commit changes:
    git commit -m "Add feature-name"
- Push to the branch:
    git push origin feature-name
- Open a Pull Request.
This project is licensed under the MIT License. See the LICENSE file for details.
- Dash for interactive visualisations.
- Kubernetes for cluster orchestration.
- Helm for deployment management.
- Open-source contributors for inspiration and guidance.