A modern, real-time status page system built with Django and React. Monitor your services, manage incidents, and keep your users informed with real-time updates.
- 🚀 Real-time updates via WebSockets
- 🔒 Multi-tenant architecture with organization isolation
- 👥 Role-based access control (Admin/Member)
- 📱 Responsive design for all devices
- ⚡ Public and private status pages
- 🔔 Real-time incident management
- 📊 Service status monitoring
- 🔄 Automatic reconnection with exponential backoff
- 🛡️ Rate limiting on public endpoints
Application: https://status-page-day.vercel.app
Backend: https://status-page-3xr4s.kinsta.app
(I came across this VPS on a Fireship video and chose to give it a shot. Not the best choice, to be honest: it does not support WebSockets.)
I have created two organizations to test out the multi-tenancy behavior:

**Organization 1 logins:**
- Admin: `test1@status.com` | Password: `test1@status.com`
- Member: `member@status.com` | Password: `member@status.com`

**Organization 2 logins:**
- Admins:
  - `admin1@org.com` | Password: `admin1@org.com`
  - `admin2@org.com` | Password: `admin2@org.com`
- Members:
  - `member1@org.com` | Password: `member1@org.com`
  - `member2@org.com` | Password: `member2@org.com`
Note: New signups create a new org automatically.
```
┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│    React    │◄─────►│   Django    │◄─────►│    Redis    │
│  Frontend   │       │   Backend   │       │   Server    │
└─────────────┘       └─────────────┘       └─────────────┘
                        ▲    ▲    ▲                ▲
          ┌─────────────┘    │    └─────────────┐  │
          ▼                  ▼                  ▼  ▼
   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
   │ PostgreSQL  │    │   Celery    │    │   Daphne    │
   │  Database   │    │   Workers   │    │  WebSocket  │
   │ (Supabase)  │    └─────────────┘    └─────────────┘
   └─────────────┘
```
- **Authentication Flow**
  - User → Clerk → JWT Token → Backend Validation
- **WebSocket Authentication**
  - Private connections: JWT token validation
  - Public connections: organization slug validation
- **Role-Based Access**
  - Admins: full CRUD access
  - Members: read-only access
  - Public: limited read access
- **WebSocket Connections**
  - Exponential backoff for reconnection attempts
  - Configurable maximum retry attempts
  - Automatic cleanup of stale connections
- **Rate Limiting** (see the sketch after this list)
  - Public endpoints: 50 requests/second
  - WebSocket connection throttling
  - Redis-based rate-limit storage
- **Error Handling**
  - Graceful degradation
  - Comprehensive error logging
  - User-friendly error messages
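A minimal sketch of what the Redis-backed public rate limit could look like with DRF throttling; the class name and scope below are illustrative, not the project's actual `throttling.py`:

```python
# throttling.py (illustrative sketch, not the project's actual code)
from rest_framework.throttling import AnonRateThrottle

class PublicEndpointThrottle(AnonRateThrottle):
    """Caps anonymous traffic on the public status endpoints."""
    scope = "public"

# settings.py (assumed): DRF resolves the scope to a rate, and pointing the
# Django cache backend at Redis makes the counters shared across workers.
REST_FRAMEWORK = {
    "DEFAULT_THROTTLE_RATES": {"public": "50/second"},
}
```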
- **Services**
  - `GET /api/v1/services/` - List services
  - `POST /api/v1/services/` - Create service (Admin only)
  - `PATCH /api/v1/services/{id}/` - Update service (Admin only)
  - `DELETE /api/v1/services/{id}/` - Delete service (Admin only)
- **Incidents**
  - `GET /api/v1/incidents/` - List incidents
  - `POST /api/v1/incidents/` - Create incident (Admin only)
  - `PATCH /api/v1/incidents/{id}/` - Update incident (Admin only)
  - `DELETE /api/v1/incidents/{id}/` - Delete incident (Admin only)
- **Public Endpoints** (see the example after this list)
  - `GET /api/v1/public/{org_slug}/services/` - List public services for an organization
  - `GET /api/v1/public/{org_slug}/incidents/` - List public incidents for an organization
  - `GET /api/v1/public/{org_slug}/status/` - Get overall status for an organization
  - `ws://host/ws/status/public/{org_slug}/` - Public WebSocket endpoint for real-time updates
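For example, the public status endpoint can be polled without authentication; the `acme` slug below is a placeholder:

```python
# Poll the overall public status of an organization (no auth required).
import requests

resp = requests.get(
    "https://status-page-3xr4s.kinsta.app/api/v1/public/acme/status/"
)
resp.raise_for_status()
print(resp.json())
```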
- **Connection URLs**
  - Private WebSocket: `ws://host/ws/status/org/{org_id}/?token={jwt_token}`
  - Public WebSocket: `ws://host/ws/status/public/{org_slug}/`
- **Message Types**

  a. Service Status Updates

  ```json
  {
    "type": "service_status_update",
    "data": {
      "id": "service_id",
      "status": "operational",
      "name": "Service Name",
      "description": "Service Description",
      "status_display": "Operational"
    }
  }
  ```

  b. Incident Updates

  ```json
  {
    "type": "incident_update",
    "data": {
      "id": "incident_id",
      "status": "investigating",
      "title": "Incident Title",
      "description": "Incident Description",
      "status_display": "Investigating",
      "service": {
        "id": "service_id",
        "name": "Service Name"
      }
    }
  }
  ```

- **Connection Behavior** (a timing sketch follows this list)
  - Both connections implement exponential backoff for reconnection attempts
  - Maximum 5 reconnection attempts before requiring a manual refresh
  - Reconnection delay starts at 1000 ms and doubles with each attempt
  - Public connections are rate-limited to prevent abuse
  - Private connections require a valid JWT token, validated on connection
- **Event Flow**
  - Service/incident updates are sent in real time as they occur
  - All updates are broadcast to every client connected for the organization, over the public or private channel depending on the client's authentication
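The reconnection timing above reduces to a simple doubling schedule. A sketch in Python for illustration (the real client logic lives in `frontend/src/utils/websocket.ts`):

```python
# Delays before each reconnection attempt: 1000, 2000, 4000, 8000, 16000 ms.
BASE_DELAY_MS = 1000
MAX_ATTEMPTS = 5

def reconnect_delays():
    """Yield the wait (in ms) before each of the 5 reconnection attempts."""
    for attempt in range(MAX_ATTEMPTS):
        yield BASE_DELAY_MS * 2 ** attempt
```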
Frontend Structure
- **Pages** (`/src/pages/`)
  - `Dashboard.tsx` - Main dashboard for authenticated users
  - `PublicDashboard.tsx` - Public status page with real-time updates
  - `Services.tsx` - Service management interface
  - `Incidents.tsx` - Incident management and tracking
  - `Settings.tsx` - Organization and user settings
  - `Login.tsx` - Authentication interface using Clerk.js
- **Components** (`/src/components/`)
  - `ServiceList.tsx` - Reusable service grid with status indicators
  - `IncidentModal.tsx` - Form for creating/updating incidents
  - `ServiceModal.tsx` - Form for creating/updating services
  - `Pagination.tsx` - Reusable pagination component
  - `Layout.tsx` - Main application layout with navigation
  - `PrivateRoute.tsx` - Route wrapper for authentication
- **Utils** (`/src/utils/`)
  - `websocket.ts` - WebSocket connection management and real-time updates
  - `auth.ts` - Authentication utilities and hooks
  - `api.ts` - API client and request handlers
  - `types.ts` - TypeScript type definitions
- **State Management**
  - React Query for server state
  - React Context for authentication state
  - Local state for UI components
- **Styling**
  - TailwindCSS for utility-first styling
  - Material-UI components for complex interfaces
  - Custom CSS modules for specific components
Backend Structure
- **Core** (`/backend/core/`)
  - `consumers.py` - WebSocket consumers for real-time updates
  - `middleware.py` - Custom middleware for auth and org context
  - `throttling.py` - Rate limiting configuration
  - `permissions.py` - Custom permission classes
- **Services** (`/backend/services/`)
  - `models.py` - Service and status definitions
  - `views.py` - API endpoints for service management
  - `serializers.py` - Data serialization/validation
  - `tasks.py` - Background tasks for service updates
- **Incidents** (`/backend/incidents/`)
  - `models.py` - Incident and status definitions
  - `views.py` - Incident management endpoints
  - `serializers.py` - Incident data serialization
  - `tasks.py` - Background tasks for incident updates
- **Organizations** (`/backend/organizations/`)
  - `models.py` - Organization and membership models
  - `views.py` - Organization management endpoints
- **Users** (`/backend/users/`)
  - `models.py` - User and membership models
  - `views.py` - User management endpoints
- **Infrastructure**
  - Redis for caching and real-time messages
  - Celery for async task processing
  - Daphne for WebSocket handling
  - PostgreSQL on Supabase for persistent storage
- **Testing**
  - Unit tests for models and utilities
  - Integration tests for API endpoints
  - WebSocket connection tests
  - Async task testing
```mermaid
sequenceDiagram
    participant U as User
    participant F as Frontend
    participant C as Clerk.js
    participant B as Backend
    participant R as Redis

    U->>F: Access application
    F->>C: Initialize Clerk
    C-->>F: Load auth state
    U->>F: Click login
    F->>C: Redirect to Clerk UI
    C->>U: Show login form
    U->>C: Enter credentials
    C->>C: Validate credentials
    C-->>F: Return JWT token
    F->>B: API request with JWT
    B->>B: Validate JWT signature
    B->>R: Cache user session
    B-->>F: Return response
    Note over F,B: All subsequent requests<br/>include JWT token
```
Process Description:
- User accesses the application
- Frontend initializes Clerk.js for authentication
- User clicks login and is presented with Clerk's login UI
- After successful authentication:
  - Clerk.js provides a JWT token
  - The token is stored in the browser
  - All API requests include this token
- Backend validates the token on each request (sketch below):
  - Verifies the JWT signature using Clerk's public key
  - Checks token expiration and claims
  - Caches the user session in Redis for performance
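A hedged sketch of that per-request check, assuming PyJWT and an already-fetched Clerk signing key; names are illustrative, not the project's actual middleware:

```python
import jwt  # PyJWT

def validate_clerk_token(token: str, clerk_public_key: str) -> dict:
    """Verify the signature and expiry, returning the token's claims."""
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on failure;
    # expiration ("exp") is checked by default during decoding.
    return jwt.decode(token, clerk_public_key, algorithms=["RS256"])
```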
```mermaid
sequenceDiagram
    participant C as Client
    participant WS as WebSocket
    participant B as Backend
    participant CE as Celery
    participant R as Redis
    participant CH as Channel Layer

    C->>WS: Connect with token/org_slug
    WS->>B: Authenticate connection
    B->>R: Validate session
    B-->>WS: Accept connection
    Note over C,CH: Event triggered (e.g., service update)
    B->>CE: Dispatch notification task
    CE->>CH: Send to channel layer
    CH->>R: Store message
    CH->>WS: Forward to relevant connections
    WS->>C: Send update
    Note over C,WS: Connection lost
    C->>WS: Reconnection attempt 1
    WS-->>C: Connection failed
    Note over C: Wait (exponential backoff)
    C->>WS: Reconnection attempt 2
```
Process Description:
- **WebSocket Connection:**
  - Client initiates connection with authentication
  - Backend validates credentials
  - Connection added to appropriate channels
- **Update Flow** (see the sketch after this list):
  - Backend receives update (e.g., service status change)
  - Creates Celery task for async processing
  - Task publishes to Redis channel layer
  - Channel layer broadcasts to relevant WebSocket connections
- **Fault Tolerance:**
  - Connection loss triggers reconnection
  - Exponential backoff between attempts
  - Maximum retry limit enforced
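The sketch referenced in the Update Flow above: one plausible shape for a Celery task that pushes an update into the Channels layer. Task and group names are assumptions, not the project's actual code:

```python
from asgiref.sync import async_to_sync
from celery import shared_task
from channels.layers import get_channel_layer

@shared_task
def notify_service_update(org_id: str, payload: dict) -> None:
    """Broadcast a service update to every consumer in the org's group."""
    channel_layer = get_channel_layer()
    # Consumers that joined the f"org_{org_id}" group have a handler method
    # named after "type"; it forwards the data down the socket to the client.
    async_to_sync(channel_layer.group_send)(
        f"org_{org_id}",
        {"type": "service_status_update", "data": payload},
    )
```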
```
┌────────────────────────────────────────┐
│             Organization A             │
├─────────────┬──────────────┬───────────┤
│  Services   │  Incidents   │   Users   │
└─────────────┴──────────────┴───────────┘

┌────────────────────────────────────────┐
│             Organization B             │
├─────────────┬──────────────┬───────────┤
│  Services   │  Incidents   │   Users   │
└─────────────┴──────────────┴───────────┘
```
Implementation Details:
- **Database Level:**
  - Every model includes an `org_id` foreign key
  - Database constraints enforce isolation
  - Indexes optimized for org-scoped queries
- **Application Level** (illustrated below):
  - Middleware injects org context
  - QuerySets filtered by org
  - Permissions checked against org membership
- **API Level:**
  - JWT contains org claims
  - Rate limits per org
  - Separate WebSocket channels per org
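For illustration, the application-level scoping could look like this in a DRF view; the `request.org` attribute set by middleware and the import paths are assumptions about this codebase:

```python
from rest_framework import viewsets

from .models import Service                 # assumed import path
from .serializers import ServiceSerializer  # assumed import path

class ServiceViewSet(viewsets.ModelViewSet):
    serializer_class = ServiceSerializer

    def get_queryset(self):
        # Middleware (assumed) attaches the caller's organization to the
        # request; every query is then scoped to that org, so one tenant
        # can never see another tenant's rows.
        return Service.objects.filter(org_id=self.request.org.id)
```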
Services in the system can exist in one of five states:
- Operational: The default state indicating normal service operation
- Degraded Performance: Service is running but experiencing performance issues
- Partial Outage: Service is partially unavailable
- Major Outage: Service is completely unavailable
- Under Maintenance: Service is undergoing planned maintenance (not implemented yet)
Status transitions can occur in two ways:
- Manual Updates: Organization administrators can directly update a service's status via the UI.
- Incident-Driven Updates: Service status changes automatically when incidents are created or resolved.
The system enforces a strict state transition policy for incident creation:
- From Operational: Can transition to Degraded, Partial, or Major
- From Degraded: Can transition to Partial or Major
- From Partial: Can only transition to Major
- From Major and Maintenance: No further incident-driven transitions allowed
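This policy is small enough to capture as a lookup table; a sketch with illustrative state keys:

```python
# Allowed incident-driven transitions, keyed by the service's current state.
ALLOWED_INCIDENT_TRANSITIONS = {
    "operational": {"degraded", "partial", "major"},
    "degraded": {"partial", "major"},
    "partial": {"major"},
    "major": set(),        # no further incident-driven transitions
    "maintenance": set(),  # no further incident-driven transitions
}

def can_open_incident(from_state: str, to_state: str) -> bool:
    """Return True if creating an incident may move the service to to_state."""
    return to_state in ALLOWED_INCIDENT_TRANSITIONS.get(from_state, set())
```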
```mermaid
stateDiagram-v2
    [*] --> Operational
    Operational --> Degraded: Incident/Manual
    Operational --> Partial: Incident/Manual
    Operational --> Major: Incident/Manual
    Operational --> Maintenance: Manual only
    Degraded --> Partial: Incident/Manual
    Degraded --> Major: Incident/Manual
    Partial --> Major: Incident/Manual
    Major --> Operational: Incident Resolution
    Maintenance --> Operational: Manual only
```
Incidents follow a defined lifecycle with automatic service status management:

- **Creation:**
  - Incidents are bound to a service
  - Captures the initial service state (`from_state`)
  - Updates the service to the new state (`to_state`)
  - Starts in "Investigating" status
  - Triggers real-time notifications
- **Status Progression:**
  - Investigating → Identified → Monitoring → Resolved
  - Cannot reopen resolved incidents
  - Must create a new incident for recurring issues
- **Resolution** (see the sketch after this list):
  - Sets the `resolved_at` timestamp
  - Recalculates service status:
    - If other active incidents exist: uses the most recent incident's state
    - If no active incidents: returns to "Operational"
  - Triggers notifications via Celery tasks
- **Deletion:**
  - Supports soft deletion with an audit trail
  - Triggers notifications for all subscribers
  - Updates service status if needed
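A sketch of the resolution-time recalculation described above; the `resolved_at` and `to_state` field names come from this document, while the related-manager and timestamp field names are assumptions:

```python
def recalculate_service_status(service) -> None:
    """Re-derive a service's status after one of its incidents resolves."""
    active = service.incidents.filter(resolved_at__isnull=True)
    latest = active.order_by("-created_at").first()
    if latest is not None:
        # Fall back to the most recent still-active incident's target state.
        service.status = latest.to_state
    else:
        service.status = "operational"
    service.save(update_fields=["status"])
```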
```mermaid
sequenceDiagram
    participant Admin
    participant Service
    participant Incident
    participant Notifications

    Admin->>Incident: Create Incident
    activate Incident
    Incident->>Service: Lock & Update Status
    Incident->>Notifications: Notify Status Change
    deactivate Incident

    Admin->>Incident: Update Status
    activate Incident
    Incident->>Notifications: Notify Update
    deactivate Incident

    Admin->>Incident: Resolve Incident
    activate Incident
    Incident->>Service: Lock & Recalculate Status
    Incident->>Notifications: Notify Resolution
    deactivate Incident
```
- Python 3.9+
- Node.js 16+
- Redis Server
- PostgreSQL 14+
- Poetry (Python dependency management)
- pnpm (Node.js package manager)
- **Clone the Repository**

  ```bash
  git clone <repository-url>
  cd status-page
  ```

- **Set Up Python Environment**

  ```bash
  cd backend
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- **Configure Environment Variables**

  ```bash
  cp .env.example .env
  # Edit .env with your configuration:
  # - Database URL
  # - Redis URL
  # - Clerk API keys
  # - Other settings
  ```

- **Set Up Database**

  ```bash
  python manage.py makemigrations
  python manage.py migrate
  python manage.py createsuperuser  # if needed
  ```

- **Start Development Server**

  ```bash
  chmod +x ./run_daphne.sh
  ./run_daphne.sh
  ```
- **Install Redis (macOS)**

  ```bash
  brew install redis
  ```

- **Start Redis Server**

  ```bash
  redis-server
  ```

- **Start Celery Worker**

  ```bash
  cd backend
  celery -A core worker -l INFO
  ```

- **Start Celery Beat** (for scheduled tasks; not needed for now)

  ```bash
  celery -A core beat -l INFO
  ```
- **Install Dependencies**

  ```bash
  cd frontend
  npm install
  ```

- **Configure Environment**

  ```bash
  cp .env.example .env
  # Edit .env with your configuration:
  # - API URL
  # - WebSocket URL
  # - Clerk publishable key
  ```

- **Start Development Server**

  ```bash
  npm run dev
  ```
- **Start services in this order:**
  1. Redis server
  2. PostgreSQL database
  3. Django backend
  4. Celery worker
  5. Celery beat (not needed for now)
  6. Frontend development server
- **Access the application:**
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - Admin interface: http://localhost:8000/admin
- **Code Quality**

  ```bash
  # Backend
  black .

  # Frontend
  npm run lint
  npm run format
  ```
- Tight coupling with Clerk for authentication
- Organization management depends on Clerk's organization features
- Migrating to a different auth provider would require significant refactoring
- No fallback mechanism if the WebSocket connection fails
- No message queue for handling WebSocket message backlog
- No support for scheduled maintenance windows as of now
- An incident is constrained to a single service in the current design