A highly scalable Django-based application allowing users to view and rate posts. It includes advanced features such as user authentication, fraud detection, rate limiting, and asynchronous task processing using Celery. The project is Dockerized for easy deployment and comes with auto-generated API documentation using Swagger and ReDoc.
- Project Overview
- Features
- System Architecture
- System Design Solution
- Key Components
- Use-Cases
- Mechanism of Fraud Detection
- Setup and Installation
- Docker Usage
- Environment Variables
- Makefile Commands
- Testing
- API Documentation
- Performance and Scalability
- Future Improvements
- Contact
The Post Rating System allows users to interact with posts by rating them. It implements fraud detection mechanisms to prevent rating manipulation and is designed to handle high traffic with ease. The project is fully containerized using Docker and supports background tasks with Celery.
- User authentication (JWT)
- Post rating with average score and total count
- Fraud detection for rating manipulation
- Asynchronous processing with Celery
- Redis for caching and rate limiting
- API documentation with Swagger UI and ReDoc
- Dockerized for deployment
The system adopts a microservices architecture, utilizing Docker for containerization and orchestration. Below are the:
flowchart TD
classDef client fill: #f9d71c, stroke: #333, stroke-width: 2px
classDef gateway fill: #f9813a, stroke: #333, stroke-width: 2px
classDef app fill: #38b2ac, stroke: #333, stroke-width: 2px
classDef worker fill: #805ad5, stroke: #333, stroke-width: 2px
classDef storage fill: #63b3ed, stroke: #333, stroke-width: 2px
classDef external fill: #9ae6b4, stroke: #333, stroke-width: 2px
Client[Client]:::client
API[API Gateway]:::gateway
subgraph Core["Core (Django)"]
direction TB
AppServer["Application Server"]:::app
end
subgraph Services["Supporting Services"]
direction TB
RedisCache["Redis (Cache & Rate Limiting)"]:::storage
PostgreSQL["PostgreSQL (Database)"]:::storage
RedisQueue["Redis (Message Queue)"]:::storage
CeleryWorker["Celery Worker"]:::worker
end
Client --> API
API --> AppServer
AppServer --> RedisCache
AppServer --> PostgreSQL
AppServer --> RedisQueue
RedisQueue --> CeleryWorker
Some components are marked with (*) which are not implemented in the current project but are proposed for future
This system design provides a scalable and resilient architecture for the Post Rating System, balancing performance, consistency, and fault tolerance. Eventual consistency is a consistency model used in distributed computing to achieve high availability and partition tolerance. It ensures that, given enough time, all replicas of a data item will converge to the same value. This model allows the system to remain available and partition-tolerant, even if some nodes are temporarily out of sync.
flowchart TD
classDef lb fill: #f6e05e, stroke: #333, stroke-width: 2px
classDef gateway fill: #f9813a, stroke: #333, stroke-width: 2px
classDef app fill: #38b2ac, stroke: #333, stroke-width: 2px
classDef worker fill: #805ad5, stroke: #333, stroke-width: 2px
classDef cache fill: #63b3ed, stroke: #333, stroke-width: 2px
classDef db fill: #4299e1, stroke: #333, stroke-width: 2px
LoadBalancer["Load Balancer"]:::lb
API["API Gateway"]:::gateway
AppServer["Application Servers (Django)"]:::app
RedisCache["Redis (Caching)"]:::cache
PostgreSQL["PostgreSQL (Database)"]:::db
RedisQueue["Redis (Message Queue)"]:::cache
CeleryWorker["Celery Workers"]:::worker
LoadBalancer --> API
API --> AppServer
AppServer --> RedisCache
AppServer --> PostgreSQL
AppServer --> RedisQueue
RedisQueue --> CeleryWorker
CeleryWorker --> PostgreSQL
-
*Load Balancer
- Distributes incoming traffic across multiple API Gateway instances
- Implements health checks and SSL termination
-
*API Gateway
- Handles authentication and rate limiting (e.g., using JWT tokens)
- Routes requests to appropriate microservices (e.g., posts, users)
- Processes API requests (e.g., post rating,list of posts, user authentication)
- Implements business logic and fraud detection mechanisms
- Stores frequently accessed data (e.g., post statistics in retrieval, rates in submission)
- Implements rate limiting and fraud detection
- Primary data store for posts, users, ratings, and statistics
- *Implements sharding for horizontal scaling (e.g., weighted sharding based on post popularity)
- Facilitates asynchronous communication between components (e.g., Apply pending rates)
- Ensures reliable message delivery for background tasks (e.g., updating post stats, periodic tasks)
- Process background tasks (e.g., updating post statistics)
- Handle computationally intensive operations asynchronously (e.g., fraud detection, rate processing)
The system implements an eventual consistency model to ensure high availability and partition tolerance. Here's how data flows through the system:
-
User submits a rating:
- API Gateway validates the request and applies rate limiting (e.g., each user can only send 100 ratings per hour)
- Application server receives the rating and performs initial validation
- Rating is temporarily stored in Redis cache (pending rates)
- Acknowledgment is sent back to the user
-
Background processing:
- Celery worker periodically processes pending ratings from Redis
- Updates are applied to the PostgreSQL database in batches
- Post statistics are recalculated and cached in Redis
-
Read operations:
- Frequently accessed data (e.g., post statistics) is served from Redis cache
- Less frequent queries are served directly from the PostgreSQL database
flowchart TD
classDef module fill: #38b2ac, stroke: #333, stroke-width: 2px
classDef task fill: #805ad5, stroke: #333, stroke-width: 2px
classDef dbstyle fill: #63b3ed, stroke: #333, stroke-width: 2px
subgraph CoreModules["Core Application Modules"]
direction TB
AuthModule["Authentication"]:::module
PostModule["Post Management"]:::module
RateModule["Rating System"]:::module
FraudModule["Fraud Detection"]:::module
end
subgraph ExternalServices["External Services"]
direction TB
RedisCache[("Redis (Cache)")]:::dbstyle
PostgreSQL[("PostgreSQL (DB)")]:::dbstyle
RedisQueue[("Redis (Message Queue)")]:::dbstyle
end
subgraph CelerySystem["Celery System"]
CeleryWorker["Celery Workers"]:::task
UpdateStatsTask["Update Post Stats"]:::task
FraudDetectionTask["Fraud Detection Task"]:::task
BulkProcessingTask["Bulk Rate Processing"]:::task
end
%% Interactions between modules and external services
AuthModule --> RedisCache
AuthModule --> PostgreSQL
PostModule --> RedisCache
PostModule --> PostgreSQL
RateModule --> PostgreSQL
RateModule --> RedisQueue
RateModule --> CeleryWorker
FraudModule --> RedisQueue
FraudModule --> CeleryWorker
%% Celery Tasks and their effects
CeleryWorker --> UpdateStatsTask
CeleryWorker --> FraudDetectionTask
CeleryWorker --> BulkProcessingTask
UpdateStatsTask --> RedisCache
UpdateStatsTask --> PostgreSQL
FraudDetectionTask --> RedisQueue
BulkProcessingTask --> PostgreSQL
- *Horizontal Scaling
- Application servers can be scaled horizontally behind the load balancer
- Database read replicas can be added to handle increased read traffic
- Caching Strategy
- *Use Read-Through and Write-Through caching for improved performance
- Use cache-aside pattern for other frequently accessed data
- Read Operation:
- Check if the data is in the cache.
- If the data is found (cache hit), return it.
- If the data is not found (cache miss), load it from the database, store it in the cache, and then return it.
- Write Operation:
- Write the data to the database.
- Invalidate or update the cache to ensure consistency.
- Read Operation:
- *Use Redis pipeline for (atomic and) batch (mSet) operations to reduce round-trip latency
- *Database Sharding
- Implement horizontal sharding based on post ID or author popularity or user ID
- Use consistent hashing for efficient data distribution
- Asynchronous Processing
- Offload computationally intensive tasks to Celery workers (e.g., fraud detection, post statistics)
- Use message queues to decouple components and ensure reliable processing
The system prioritizes Availability and Partition Tolerance (AP) from the CAP theorem, sacrificing strong consistency for better performance and scalability. This choice is suitable for a rating system where occasional inconsistencies are tolerable.
- Ratings may not be immediately reflected in post statistics (e.g., pending rates)
- Background processes ensure data converges to a consistent state over time (e.g., updating post stats, periodic tasks)
- Implement versioning for ratings to detect and resolve conflicts
- Use last-write-wins (LWW) strategy for simplicity, or implement custom merge logic if needed( if two rating conflict with each other the one with the latest timestamp will be considered)¬
- Data Replication
- Use multi-region database replication for disaster recovery
- Implement read replicas for improved read performance and fault tolerance
- Monitoring and Alerting
- Implement comprehensive monitoring using tools.(e.g., Prometheus, Grafana)
- Set up alerts for critical system metrics and error rates
- Authentication and Authorization
- Implement JWT-based authentication
- *Use role-based access control (RBAC) for fine-grained permissions
- Rate Limiting and Throttling
- Implement IP-based and user-based rate limiting at the API Gateway level (e.g., 1000 requests per hour)
- Implement endpoint-specific rate limits to prevent abuse (e.g.,
/rate
endpoint) - Use Redis to store and manage rate limiting counters
- Implement a recommendation system based on user ratings and behavior
- Introduce real-time updates using WebSockets, GRPC for live rating updates
- Implement A/B testing framework for experimenting with different rating algorithms
- Introduce machine learning-based fraud detection for more sophisticated anomaly detection
- Implement a content delivery network (CDN) for serving static assets and improving global performance
To handle millions of ratings per post without performance issues:
- Store
average_rating
andtotal_ratings
in thePostStat
model - Update these values asynchronously using Celery tasks
- Cache post statistics in Redis
- Implement a write-through cache for immediate updates
- Periodically sync cache with the database
- Accumulate new ratings in Redis (pending rates)
- Process and apply ratings in batches using Celery tasks
- Create appropriate indexes on the
Rate
model (user, post)
In this project used Flag and Action based fraud detection mechanism.
- prevent, update, analyse, remove and detect suspicious activities.
To prevent sudden, artificial changes in post ratings:
-
Time-based Weighted Average
- Implement a weighted average system that gives more weight to established ratings
- Gradually incorporate new ratings over time
-
Rate Limiting
- Implement user-based and IP-based rate limiting to prevent rapid-fire ratings
-
Anomaly Detection
- Monitor for unusual patterns in rating behavior (e.g., sudden spikes in low ratings)
- Flag suspicious activities for review
-
Delayed Impact
- Introduce a delay before new ratings significantly affect the overall score
- Use a sliding window approach to smooth out short-term fluctuations. this approach can help to stabilize the average rating by considering a window of recent ratings rather than just the most recent rating. This helps to prevent sudden changes in the average rating due to a few anomalous ratings.
The Post
model represents user-generated content that can be rated.
class Post(BaseModel):
title = models.CharField(_("title"), max_length=255)
content = models.TextField(_("content"))
- Attributes:
title
: The title of the post.content
: The body of the post.- Inherits from
BaseModel
, which tracks the creation and modification timestamps.
The Rate
model handles user ratings for each post. Each rating is linked to a user and a post.
class Rate(BaseModel):
post = models.ForeignKey("Post", on_delete=models.CASCADE, related_name='rates')
user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='rates')
score = models.IntegerField(
null=True,
validators=[
MinValueValidator(RateScoreEnum.ZERO_STARS),
MaxValueValidator(RateScoreEnum.FIVE_STARS)
]
)
is_suspected = models.BooleanField(default=False)
The PostStat
model maintains aggregated statistics for each post, including the average rating and the total number of
ratings.
class PostStat(BaseModel):
post = models.OneToOneField(Post, on_delete=models.CASCADE, related_name='stat')
average_rates = models.DecimalField(max_digits=3, decimal_places=2, default=0)
total_rates = models.PositiveIntegerField(default=0)
- Post Statistics Caching: Post stats (e.g., average rating, total rates) are cached in Redis to reduce load on the database.
- Pending Rates: New ratings are stored in Redis before being processed in bulk to optimize database transactions.
BULK_THRESHOLD = env.int("BULK_THRESHOLD", 50)
def update_or_create_rate(*, user_id: int, post_id: int, score: int, is_suspected=False):
key = RedisKeyTemplates.pending_rates_key()
pending_rates = cache.get(key, [])
pending_rates.append({'user_id': user_id, 'post_id': post_id, 'score': score, 'is_suspected': is_suspected})
"""
There are some situation that system will shutdown so we should enable redis to store in file system
Update: persist=True -> AOF (Append Only File) or RDB (Redis Database Backup)
"""
cache.set(key, pending_rates, timeout=settings.CACHE_TIMEOUT)
if len(pending_rates) >= BULK_THRESHOLD:
""""bulk_update_or_create_rates: Update or create multiple rates in bulk to minimize database load."""
bulk_update_or_create_rates(pending_rates)
cache.delete(key)
...
Celery is used for handling background tasks, such as updating post statistics and applying pending rates.
- Apply Pending Rates: Process new ratings in bulk to minimize database load.
@shared_task
def apply_pending_rates():
key = RedisKeyTemplates.pending_rates_key()
if pending_rates := cache.get(key, []):
from posts.services.commands.rate import bulk_update_or_create_rates
bulk_update_or_create_rates(rate_data=pending_rates)
cache.delete(key)
- Update Post Statistics: Periodically update post statistics based on new ratings.
@shared_task
def update_post_stats_periodical():
"""
Update the average rates and total rates of the post periodically.
note: it will consider number of rates that created or updated after updated_at or created_at in post_stat
"""
try:
for post in Post.objects.all():
post_stat, created = PostStat.objects.get_or_create(post_id=post.id)
last_update = post_stat.updated_at if not created else post.created_at
updated_rates = get_updated_rates(post, last_update)
if updated_rates.exists():
average_rates = calculate_average_rates(post, settings.SUSPECTED_RATES_THRESHOLD)
total_rates = Rate.objects.filter(post_id=post.id).count()
update_post_stat(post, average_rates, total_rates)
else:
logger.info(LogMessages.no_new_rate(post_id=post.id))
except Exception as e:
logger.error(LogMessages.error_update_post_stats(error=e))
- Bulk Update or Create Post Stats: Update existing post statistics and create new ones in bulk. which in, scores come from bulk_update_or_create_rates.
@shared_task
def bulk_update_or_create_post_stats(*, scores: dict):
post_id_list = scores.keys()
post_stats = PostStat.objects.filter(post_id__in=post_id_list)
"""update existing post stats"""
for post_stat in post_stats:
new_score = scores[post_stat.post_id]["score"]
total_score = post_stat.average_rates * post_stat.total_rates
post_stat.average_rates = (total_score + new_score) / post_stat.total_rates
if post_stats:
PostStat.objects.bulk_update(post_stats, ['average_rates', 'total_rates'])
"""create new post stats"""
missing_post_ids = set(post_id_list).difference(post_stats.values_list('post_id', flat=True))
new_post_stats = [
PostStat(
post_id=post_id,
average_rates=scores[post_id]["score"] / scores[post_id]["count"],
total_rates=scores[post_id]["count"]
) for post_id in missing_post_ids
]
created_obj = PostStat.objects.bulk_create(new_post_stats)
"""update cache"""
update_cache_post_stats(post_stats=[*post_stats, *created_obj])
The system employs a multi-tiered approach to prevent fraudulent rating activities. This includes:
- Rate Limiting: Restricting the number of ratings a user can submit within a time period. (e.g., 100 ratings per hour, endpoint-specific rate limits)
class UserHourlyPostRateThrottle(UserRateThrottle):
scope = 'user_hourly_post_rate'
rate = f'{settings.MAX_RATES_PER_HOUR}/hour'
...
class PostRateThrottle(BaseThrottle):
rate = env.int("POST_RATE_LIMIT", 1000)
cache_timeout = env.int("CACHE_TIMEOUT", 3600)
cache_key = 'post_rate_limit'
...
- Fraud Detection: Detecting abnormal activity, such as a sudden spike in ratings, and marking them as suspicious. In this Implementation used a sliding window approach to smooth out short-term fluctuations. This approach can help to stabilize the average rating by considering a window of recent ratings rather than just the most recent rating. This helps to prevent sudden changes in the average rating due to a few anomalous ratings.
class FraudDetection:
...
@classmethod
def detect_suspicious_activity(cls, post_id: int) -> bool:
fraud_detect_key = RedisKeyTemplates.format_fraud_detect_key(post_id)
recent_actions = redis_client.lrange(fraud_detect_key, 0, -1)
if len(recent_actions) >= cls.suspicious_threshold:
first_action_time = float(recent_actions[0])
current_time = time.time()
if current_time - first_action_time < cls.time_threshold:
return True
redis_client.lpush(fraud_detect_key, time.time())
redis_client.ltrim(fraud_detect_key, 0, cls.last_actions_to_track - 1)
redis_client.expire(fraud_detect_key, cls.time_threshold)
return False
...
- Viewing Posts with Statistics: Users can view a list of posts, along with statistics like average ratings and total ratings.
- Submitting a Rating: Authenticated users can submit ratings for posts, with the system updating statistics accordingly.
- Fraud Prevention: The system detects and prevents fraudulent activity when abnormal rating patterns are identified.
- Post Statistics Update: Post statistics are updated periodically by background tasks.
Fraud detection is implemented to identify and prevent rating manipulation. The system uses Redis to track real-time activity and flags suspicious patterns.
- Rate Limiting: Limits the number of actions a user can perform within a given period.
- Suspicious Activity Detection: Detects suspicious behavior based on the frequency of ratings in a short time. ( e.g., sudden spikes in ratings for a post in a short period)
- Docker
- Docker Compose
- Python 3.8+
- Poetry (for dependency management)
-
Clone the repository:
git clone https://github.com/MrRezoo/post-rating-system.git cd post-rating-system
-
Install dependencies using Poetry:
poetry install
-
Set up the environment variables:
cp config.example.env config.env
-
Start the development server:
make runserver
-
Prepare the Docker environment:
make prepare-compose
-
Start the Docker containers:
make up
-
Rebuild and start the containers:
make up-force
-
Stop the containers:
make down
Ensure the following environment variables are set in the config.env
file:
SECRET_KEY = django-insecure
DEBUG = True
LOGLEVEL = info
ALLOWED_HOSTS = 0.0.0.0,127.0.0.1,localhost
POSTGRES_DB = post_rating
POSTGRES_USER = postgres
POSTGRES_PASSWORD = postgres
POSTGRES_HOST = post_rating_postgres
POSTGRES_PORT = 5432
REDIS_HOST = post_rating_redis
REDIS_PORT = 6379
JWT_SECRET_KEY = your_jwt_secret_key
This project includes a Makefile
for automating common tasks:
Command | Description |
---|---|
make help |
Show available commands |
make install |
Install all dependencies using Poetry |
make runserver |
Run Django development server |
make migrate |
Apply database migrations |
make dump-data |
Dump current database data |
make create-superuser |
Create a Django superuser |
make shell |
Open the Django shell |
make show-urls |
Display all registered URLs |
make test |
Run unit tests |
make build |
Build the Docker image |
make up |
Start the Docker containers |
make down |
Stop the Docker containers |
make seeder |
Seed the database with initial data |
To run the tests, use the following command:
make test
The project includes tests for key features, such as fraud detection, rating logic, and system integration.
The project includes automatically generated API documentation using Swagger UI and ReDoc.
- Swagger UI: Available at
/api/v1/swagger/
- ReDoc UI: Available at
/api/v1/redoc/
The system is designed to scale efficiently:
- Caching: Redis caches post statistics to minimize database load.
- Asynchronous Processing: Celery processes background tasks to maintain responsiveness.
- Rate Limiting: Custom throttles prevent abusive behavior.
- Dockerized: Fully containerized for scalability and ease of deployment.
- Enhanced Fraud Detection: Add machine learning algorithms for better fraud detection.
- Real-Time Updates: Introduce WebSockets or gRPC for real-time updates on post ratings.
- Database Sharding: Shard the database to handle large datasets efficiently.
- CDN Integration: Implement a CDN for serving static files to reduce load.
For any inquiries or issues, please contact:
- Name: Reza Mobaraki
- Email: rezam578@gmail.com