A modern, real-time status page system built with Django and React. Monitor your services, manage incidents, and keep your users informed with real-time updates.
- 🚀 Real-time updates via WebSockets
- 🔒 Multi-tenant architecture with organization isolation
- 👥 Role-based access control (Admin/Member)
- 📱 Responsive design for all devices
- ⚡ Public and private status pages
- 🔔 Real-time incident management
- 📊 Service status monitoring
- 🔄 Automatic reconnection with exponential backoff
- 🛡️ Rate limiting on public endpoints
Application: https://status-page-day.vercel.app
Backend: https://status-page-3xr4s.kinsta.app
(I came across this VPS on a Fireship video and chose to give it a shot. Not the best choice, to be honest: it does not support WebSockets.)
I have created two organizations to test out the multi-tenancy behavior:

**Organization 1 logins:**
- Admin: `test1@status.com` | Password: `test1@status.com`
- Member: `member@status.com` | Password: `member@status.com`

**Organization 2 logins:**
- Admins:
  - `admin1@org.com` | Password: `admin1@org.com`
  - `admin2@org.com` | Password: `admin2@org.com`
- Members:
  - `member1@org.com` | Password: `member1@org.com`
  - `member2@org.com` | Password: `member2@org.com`
Note: New signups create a new org automatically.
```
┌─────────────┐       ┌─────────────┐       ┌─────────────┐
│    React    │◄─────►│   Django    │◄─────►│    Redis    │
│  Frontend   │       │   Backend   │       │   Server    │
└─────────────┘       └─────────────┘       └─────────────┘
                        ▲    ▲    ▲                ▲
          ┌─────────────┘    │    └─────────────┐  │
          ▼                  ▼                  ▼  ▼
   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
   │ PostgreSQL  │    │   Celery    │    │   Daphne    │
   │  Database   │    │   Workers   │    │  WebSocket  │
   │ (Supabase)  │    └─────────────┘    └─────────────┘
   └─────────────┘
```
- **Authentication Flow**
  - User → Clerk → JWT Token → Backend Validation
- **WebSocket Authentication**
  - Private connections: JWT token validation
  - Public connections: organization slug validation
- **Role-Based Access**
  - Admins: full CRUD access
  - Members: read-only access
  - Public: limited read access
- **WebSocket Connections**
  - Exponential backoff for reconnection attempts
  - Configurable maximum retry attempts
  - Automatic cleanup of stale connections
- **Rate Limiting** (see the sketch after this list)
  - Public endpoints: 50 requests/second
  - WebSocket connection throttling
  - Redis-based rate-limit storage
- **Error Handling**
  - Graceful degradation
  - Comprehensive error logging
  - User-friendly error messages
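A minimal sketch of what the Redis-backed public rate limit could look like with DRF throttling; the class name and scope below are illustrative, not the project's actual `throttling.py`:

```python
# throttling.py (illustrative sketch, not the project's actual code)
from rest_framework.throttling import AnonRateThrottle

class PublicEndpointThrottle(AnonRateThrottle):
    """Caps anonymous traffic on the public status endpoints."""
    scope = "public"

# settings.py (assumed): DRF resolves the scope to a rate, and pointing the
# Django cache backend at Redis makes the counters shared across workers.
REST_FRAMEWORK = {
    "DEFAULT_THROTTLE_RATES": {"public": "50/second"},
}
```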
- **Services**
  - `GET /api/v1/services/` - List services
  - `POST /api/v1/services/` - Create service (Admin only)
  - `PATCH /api/v1/services/{id}/` - Update service (Admin only)
  - `DELETE /api/v1/services/{id}/` - Delete service (Admin only)
- **Incidents**
  - `GET /api/v1/incidents/` - List incidents
  - `POST /api/v1/incidents/` - Create incident (Admin only)
  - `PATCH /api/v1/incidents/{id}/` - Update incident (Admin only)
  - `DELETE /api/v1/incidents/{id}/` - Delete incident (Admin only)
- **Public Endpoints** (see the example after this list)
  - `GET /api/v1/public/{org_slug}/services/` - List public services for an organization
  - `GET /api/v1/public/{org_slug}/incidents/` - List public incidents for an organization
  - `GET /api/v1/public/{org_slug}/status/` - Get overall status for an organization
  - `ws://host/ws/status/public/{org_slug}/` - Public WebSocket endpoint for real-time updates
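For example, the public status endpoint can be polled without authentication; the `acme` slug below is a placeholder:

```python
# Poll the overall public status of an organization (no auth required).
import requests

resp = requests.get(
    "https://status-page-3xr4s.kinsta.app/api/v1/public/acme/status/"
)
resp.raise_for_status()
print(resp.json())
```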
- **Connection URLs**
  - Private WebSocket: `ws://host/ws/status/org/{org_id}/?token={jwt_token}`
  - Public WebSocket: `ws://host/ws/status/public/{org_slug}/`
- **Message Types**

  a. Service Status Updates

  ```json
  {
    "type": "service_status_update",
    "data": {
      "id": "service_id",
      "status": "operational",
      "name": "Service Name",
      "description": "Service Description",
      "status_display": "Operational"
    }
  }
  ```

  b. Incident Updates

  ```json
  {
    "type": "incident_update",
    "data": {
      "id": "incident_id",
      "status": "investigating",
      "title": "Incident Title",
      "description": "Incident Description",
      "status_display": "Investigating",
      "service": {
        "id": "service_id",
        "name": "Service Name"
      }
    }
  }
  ```

- **Connection Behavior** (a timing sketch follows this list)
  - Both connections implement exponential backoff for reconnection attempts
  - Maximum 5 reconnection attempts before requiring a manual refresh
  - Reconnection delay starts at 1000 ms and doubles with each attempt
  - Public connections are rate-limited to prevent abuse
  - Private connections require a valid JWT token, validated on connection
- **Event Flow**
  - Service/incident updates are sent in real time as they occur
  - All updates are broadcast to every client connected for the organization, over the public or private channel depending on the client's authentication
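The reconnection timing above reduces to a simple doubling schedule. A sketch in Python for illustration (the real client logic lives in `frontend/src/utils/websocket.ts`):

```python
# Delays before each reconnection attempt: 1000, 2000, 4000, 8000, 16000 ms.
BASE_DELAY_MS = 1000
MAX_ATTEMPTS = 5

def reconnect_delays():
    """Yield the wait (in ms) before each of the 5 reconnection attempts."""
    for attempt in range(MAX_ATTEMPTS):
        yield BASE_DELAY_MS * 2 ** attempt
```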
Frontend Structure
- **Pages** (`/src/pages/`)
  - `Dashboard.tsx` - Main dashboard for authenticated users
  - `PublicDashboard.tsx` - Public status page with real-time updates
  - `Services.tsx` - Service management interface
  - `Incidents.tsx` - Incident management and tracking
  - `Settings.tsx` - Organization and user settings
  - `Login.tsx` - Authentication interface using Clerk.js
- **Components** (`/src/components/`)
  - `ServiceList.tsx` - Reusable service grid with status indicators
  - `IncidentModal.tsx` - Form for creating/updating incidents
  - `ServiceModal.tsx` - Form for creating/updating services
  - `Pagination.tsx` - Reusable pagination component
  - `Layout.tsx` - Main application layout with navigation
  - `PrivateRoute.tsx` - Route wrapper for authentication
- **Utils** (`/src/utils/`)
  - `websocket.ts` - WebSocket connection management and real-time updates
  - `auth.ts` - Authentication utilities and hooks
  - `api.ts` - API client and request handlers
  - `types.ts` - TypeScript type definitions
- **State Management**
  - React Query for server state
  - React Context for authentication state
  - Local state for UI components
- **Styling**
  - TailwindCSS for utility-first styling
  - Material-UI components for complex interfaces
  - Custom CSS modules for specific components
Backend Structure
- **Core** (`/backend/core/`)
  - `consumers.py` - WebSocket consumers for real-time updates
  - `middleware.py` - Custom middleware for auth and org context
  - `throttling.py` - Rate limiting configuration
  - `permissions.py` - Custom permission classes
- **Services** (`/backend/services/`)
  - `models.py` - Service and status definitions
  - `views.py` - API endpoints for service management
  - `serializers.py` - Data serialization/validation
  - `tasks.py` - Background tasks for service updates
- **Incidents** (`/backend/incidents/`)
  - `models.py` - Incident and status definitions
  - `views.py` - Incident management endpoints
  - `serializers.py` - Incident data serialization
  - `tasks.py` - Background tasks for incident updates
- **Organizations** (`/backend/organizations/`)
  - `models.py` - Organization and membership models
  - `views.py` - Organization management endpoints
- **Users** (`/backend/users/`)
  - `models.py` - User and membership models
  - `views.py` - User management endpoints
- **Infrastructure**
  - Redis for caching and real-time messages
  - Celery for async task processing
  - Daphne for WebSocket handling
  - PostgreSQL on Supabase for persistent storage
- **Testing**
  - Unit tests for models and utilities
  - Integration tests for API endpoints
  - WebSocket connection tests
  - Async task testing
```mermaid
sequenceDiagram
    participant U as User
    participant F as Frontend
    participant C as Clerk.js
    participant B as Backend
    participant R as Redis

    U->>F: Access application
    F->>C: Initialize Clerk
    C-->>F: Load auth state
    U->>F: Click login
    F->>C: Redirect to Clerk UI
    C->>U: Show login form
    U->>C: Enter credentials
    C->>C: Validate credentials
    C-->>F: Return JWT token
    F->>B: API request with JWT
    B->>B: Validate JWT signature
    B->>R: Cache user session
    B-->>F: Return response
    Note over F,B: All subsequent requests<br/>include JWT token
```
Process Description:
- User accesses the application
- Frontend initializes Clerk.js for authentication
- User clicks login and is presented with Clerk's login UI
- After successful authentication:
  - Clerk.js provides a JWT token
  - The token is stored in the browser
  - All API requests include this token
- Backend validates the token on each request (sketch below):
  - Verifies the JWT signature using Clerk's public key
  - Checks token expiration and claims
  - Caches the user session in Redis for performance
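A hedged sketch of that per-request check, assuming PyJWT and an already-fetched Clerk signing key; names are illustrative, not the project's actual middleware:

```python
import jwt  # PyJWT

def validate_clerk_token(token: str, clerk_public_key: str) -> dict:
    """Verify the signature and expiry, returning the token's claims."""
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on failure;
    # expiration ("exp") is checked by default during decoding.
    return jwt.decode(token, clerk_public_key, algorithms=["RS256"])
```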
```mermaid
sequenceDiagram
    participant C as Client
    participant WS as WebSocket
    participant B as Backend
    participant CE as Celery
    participant R as Redis
    participant CH as Channel Layer

    C->>WS: Connect with token/org_slug
    WS->>B: Authenticate connection
    B->>R: Validate session
    B-->>WS: Accept connection
    Note over C,CH: Event triggered (e.g., service update)
    B->>CE: Dispatch notification task
    CE->>CH: Send to channel layer
    CH->>R: Store message
    CH->>WS: Forward to relevant connections
    WS->>C: Send update
    Note over C,WS: Connection lost
    C->>WS: Reconnection attempt 1
    WS-->>C: Connection failed
    Note over C: Wait (exponential backoff)
    C->>WS: Reconnection attempt 2
```
Process Description:
- **WebSocket Connection:**
  - Client initiates connection with authentication
  - Backend validates credentials
  - Connection added to appropriate channels
- **Update Flow** (see the sketch after this list):
  - Backend receives update (e.g., service status change)
  - Creates Celery task for async processing
  - Task publishes to Redis channel layer
  - Channel layer broadcasts to relevant WebSocket connections
- **Fault Tolerance:**
  - Connection loss triggers reconnection
  - Exponential backoff between attempts
  - Maximum retry limit enforced
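The sketch referenced in the Update Flow above: one plausible shape for a Celery task that pushes an update into the Channels layer. Task and group names are assumptions, not the project's actual code:

```python
from asgiref.sync import async_to_sync
from celery import shared_task
from channels.layers import get_channel_layer

@shared_task
def notify_service_update(org_id: str, payload: dict) -> None:
    """Broadcast a service update to every consumer in the org's group."""
    channel_layer = get_channel_layer()
    # Consumers that joined the f"org_{org_id}" group have a handler method
    # named after "type"; it forwards the data down the socket to the client.
    async_to_sync(channel_layer.group_send)(
        f"org_{org_id}",
        {"type": "service_status_update", "data": payload},
    )
```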
```
┌────────────────────────────────────────┐
│             Organization A             │
├─────────────┬──────────────┬───────────┤
│  Services   │  Incidents   │   Users   │
└─────────────┴──────────────┴───────────┘

┌────────────────────────────────────────┐
│             Organization B             │
├─────────────┬──────────────┬───────────┤
│  Services   │  Incidents   │   Users   │
└─────────────┴──────────────┴───────────┘
```
Implementation Details:
- **Database Level:**
  - Every model includes an `org_id` foreign key
  - Database constraints enforce isolation
  - Indexes optimized for org-scoped queries
- **Application Level** (illustrated below):
  - Middleware injects org context
  - QuerySets filtered by org
  - Permissions checked against org membership
- **API Level:**
  - JWT contains org claims
  - Rate limits per org
  - Separate WebSocket channels per org
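For illustration, the application-level scoping could look like this in a DRF view; the `request.org` attribute set by middleware and the import paths are assumptions about this codebase:

```python
from rest_framework import viewsets

from .models import Service                 # assumed import path
from .serializers import ServiceSerializer  # assumed import path

class ServiceViewSet(viewsets.ModelViewSet):
    serializer_class = ServiceSerializer

    def get_queryset(self):
        # Middleware (assumed) attaches the caller's organization to the
        # request; every query is then scoped to that org, so one tenant
        # can never see another tenant's rows.
        return Service.objects.filter(org_id=self.request.org.id)
```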
Services in the system can exist in one of five states:
- Operational: The default state indicating normal service operation
- Degraded Performance: Service is running but experiencing performance issues
- Partial Outage: Service is partially unavailable
- Major Outage: Service is completely unavailable
- Under Maintenance: Service is undergoing planned maintenance (not implemented yet)
Status transitions can occur in two ways:
- Manual Updates: Organization administrators can directly update a service's status via the UI.
- Incident-Driven Updates: Service status changes automatically when incidents are created or resolved.
The system enforces a strict state transition policy for incident creation:
- From Operational: Can transition to Degraded, Partial, or Major
- From Degraded: Can transition to Partial or Major
- From Partial: Can only transition to Major
- From Major and Maintenance: No further incident-driven transitions allowed
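This policy is small enough to capture as a lookup table; a sketch with illustrative state keys:

```python
# Allowed incident-driven transitions, keyed by the service's current state.
ALLOWED_INCIDENT_TRANSITIONS = {
    "operational": {"degraded", "partial", "major"},
    "degraded": {"partial", "major"},
    "partial": {"major"},
    "major": set(),        # no further incident-driven transitions
    "maintenance": set(),  # no further incident-driven transitions
}

def can_open_incident(from_state: str, to_state: str) -> bool:
    """Return True if creating an incident may move the service to to_state."""
    return to_state in ALLOWED_INCIDENT_TRANSITIONS.get(from_state, set())
```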
```mermaid
stateDiagram-v2
    [*] --> Operational
    Operational --> Degraded: Incident/Manual
    Operational --> Partial: Incident/Manual
    Operational --> Major: Incident/Manual
    Operational --> Maintenance: Manual only
    Degraded --> Partial: Incident/Manual
    Degraded --> Major: Incident/Manual
    Partial --> Major: Incident/Manual
    Major --> Operational: Incident Resolution
    Maintenance --> Operational: Manual only
```
Incidents follow a defined lifecycle with automatic service status management:

- **Creation:**
  - Incidents are bound to a service
  - Captures the initial service state (`from_state`)
  - Updates the service to the new state (`to_state`)
  - Starts in "Investigating" status
  - Triggers real-time notifications
- **Status Progression:**
  - Investigating → Identified → Monitoring → Resolved
  - Cannot reopen resolved incidents
  - Must create a new incident for recurring issues
- **Resolution** (see the sketch after this list):
  - Sets the `resolved_at` timestamp
  - Recalculates service status:
    - If other active incidents exist: uses the most recent incident's state
    - If no active incidents: returns to "Operational"
  - Triggers notifications via Celery tasks
- **Deletion:**
  - Supports soft deletion with an audit trail
  - Triggers notifications for all subscribers
  - Updates service status if needed
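A sketch of the resolution-time recalculation described above; the `resolved_at` and `to_state` field names come from this document, while the related-manager and timestamp field names are assumptions:

```python
def recalculate_service_status(service) -> None:
    """Re-derive a service's status after one of its incidents resolves."""
    active = service.incidents.filter(resolved_at__isnull=True)
    latest = active.order_by("-created_at").first()
    if latest is not None:
        # Fall back to the most recent still-active incident's target state.
        service.status = latest.to_state
    else:
        service.status = "operational"
    service.save(update_fields=["status"])
```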
```mermaid
sequenceDiagram
    participant Admin
    participant Service
    participant Incident
    participant Notifications

    Admin->>Incident: Create Incident
    activate Incident
    Incident->>Service: Lock & Update Status
    Incident->>Notifications: Notify Status Change
    deactivate Incident

    Admin->>Incident: Update Status
    activate Incident
    Incident->>Notifications: Notify Update
    deactivate Incident

    Admin->>Incident: Resolve Incident
    activate Incident
    Incident->>Service: Lock & Recalculate Status
    Incident->>Notifications: Notify Resolution
    deactivate Incident
```
- Python 3.9+
- Node.js 16+
- Redis Server
- PostgreSQL 14+
- Poetry (Python dependency management)
- pnpm (Node.js package manager)
- **Clone the Repository**

  ```bash
  git clone <repository-url>
  cd status-page
  ```

- **Set Up Python Environment**

  ```bash
  cd backend
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- **Configure Environment Variables**

  ```bash
  cp .env.example .env
  # Edit .env with your configuration:
  # - Database URL
  # - Redis URL
  # - Clerk API keys
  # - Other settings
  ```

- **Set Up Database**

  ```bash
  python manage.py makemigrations
  python manage.py migrate
  python manage.py createsuperuser  # if needed
  ```

- **Start Development Server**

  ```bash
  chmod +x ./run_daphne.sh
  ./run_daphne.sh
  ```
- **Install Redis (macOS)**

  ```bash
  brew install redis
  ```

- **Start Redis Server**

  ```bash
  redis-server
  ```

- **Start Celery Worker**

  ```bash
  cd backend
  celery -A core worker -l INFO
  ```

- **Start Celery Beat** (for scheduled tasks; not needed for now)

  ```bash
  celery -A core beat -l INFO
  ```
- **Install Dependencies**

  ```bash
  cd frontend
  npm install
  ```

- **Configure Environment**

  ```bash
  cp .env.example .env
  # Edit .env with your configuration:
  # - API URL
  # - WebSocket URL
  # - Clerk publishable key
  ```

- **Start Development Server**

  ```bash
  npm run dev
  ```
- **Start services in this order:**
  1. Redis server
  2. PostgreSQL database
  3. Django backend
  4. Celery worker
  5. Celery beat (not needed for now)
  6. Frontend development server
- **Access the application:**
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:8000
  - Admin interface: http://localhost:8000/admin
- **Code Quality**

  ```bash
  # Backend
  black .

  # Frontend
  npm run lint
  npm run format
  ```
- Tight coupling with Clerk for authentication
- Organization management depends on Clerk's organization features
- Migrating to a different auth provider would require significant refactoring
- No fallback mechanism if the WebSocket connection fails
- No message queue for handling WebSocket message backlog
- No support for scheduled maintenance windows as of now
- An incident is constrained to a single service in the current design