This section is a collection of best practices for combining the tools into a platform.
It's here especially to help you start your own project in the cloud on AWS, Azure and GCP.
Like the advanced skills section, this section follows my Data Science Platform Blueprint. In the blueprint I divided the platform into five sections: Connect, Buffer, Processing, Store and Visualize.
This order will help you learn how to connect the right tools together. Take your time and research the tools and learn how they work.
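To make the order of the blueprint concrete, here is a minimal in-memory sketch of the five sections chained together. It is only an illustration: every function name, the event format, and the stand-ins (a `deque` for the buffer, a `dict` for the store) are assumptions made for this example, not part of the blueprint or of any cloud service.

```python
from collections import deque

# The five blueprint sections, in the order data flows through them.
STAGES = ["Connect", "Buffer", "Processing", "Store", "Visualize"]


def connect(raw_events):
    """Connect: accept incoming events and drop malformed ones."""
    return [e for e in raw_events if "sensor" in e and "value" in e]


def buffer(events):
    """Buffer: queue events so processing can consume them at its own pace."""
    return deque(events)


def process(queue):
    """Processing: drain the queue and average the values per sensor."""
    values = {}
    while queue:
        event = queue.popleft()
        values.setdefault(event["sensor"], []).append(event["value"])
    return {sensor: sum(v) / len(v) for sensor, v in values.items()}


def store(results, database):
    """Store: persist results (a dict stands in for a real database)."""
    database.update(results)
    return database


def visualize(database):
    """Visualize: render the stored results as text lines."""
    return [f"{sensor}: {avg:.1f}" for sensor, avg in sorted(database.items())]


def run_pipeline(raw_events):
    """Run all five stages in blueprint order."""
    database = {}
    store(process(buffer(connect(raw_events))), database)
    return visualize(database)


if __name__ == "__main__":
    sample = [
        {"sensor": "a", "value": 1.0},
        {"sensor": "a", "value": 3.0},
        {"sensor": "b", "value": 10.0},
        {"malformed": True},
    ]
    for line in run_pipeline(sample):
        print(line)
```

On a real platform each function would be replaced by one of the managed services listed for that section; the point of the sketch is only that each stage hands its output to the next.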
Right now the Azure section has a lot of links to platform examples. They are also useful for AWS and GCP: just swap in the equivalent tools from those providers.
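Swapping tools between providers can be sketched as a simple lookup. The pairings below are rough, assumed equivalents chosen for illustration; they are not an official mapping, and the services differ in features and pricing, so check each one before relying on it. The `translate` helper is hypothetical.

```python
# Rough, assumed equivalents between providers (illustration only).
EQUIVALENTS = [
    {"azure": "Event Hub", "aws": "Kinesis", "gcp": "Pub/Sub"},
    {"azure": "Azure Functions", "aws": "Lambda", "gcp": "Cloud Functions"},
    {"azure": "Blob", "aws": "Simple Storage Service (S3)", "gcp": "Cloud Storage"},
    {"azure": "Azure Synapse Analytics", "aws": "Redshift", "gcp": "BigQuery"},
    {"azure": "Azure HDInsight", "aws": "EMR", "gcp": "Cloud Dataproc"},
    {
        "azure": "Azure Kubernetes Service",
        "aws": "Elastic Kubernetes Service (EKS)",
        "gcp": "Kubernetes Engine",
    },
]


def translate(architecture, source, target):
    """Map each tool of an example architecture to the target provider.

    Tools without a known counterpart are kept and flagged, so gaps
    stay visible instead of being silently dropped.
    """
    lookup = {row[source]: row[target] for row in EQUIVALENTS}
    return [lookup.get(tool, f"{tool} (no mapping)") for tool in architecture]


if __name__ == "__main__":
    azure_example = ["Event Hub", "Azure Functions", "Blob", "PowerBI"]
    print(translate(azure_example, "azure", "aws"))
    print(translate(azure_example, "azure", "gcp"))
```

Flagging unmapped tools rather than guessing a replacement keeps the translated architecture honest: a missing entry is a prompt to research that tool, not an error.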
As always, I am going to add more stuff to this over time.
Have fun!
- Elastic Beanstalk (very old)
- SES Simple Email Service
- API Gateway
- Kinesis
- Kinesis Data Firehose
- Managed Streaming for Kafka (MSK)
- MQ
- Simple Queue Service (SQS)
- Simple Notification Service (SNS)
- EC2
- Athena
- EMR
- Elasticsearch
- Kinesis Data Analytics
- Glue
- Step Functions
- Fargate
- Lambda
- SageMaker
- Simple Storage Service (S3)
- Redshift
- Aurora
- RDS
- DynamoDB
- ElastiCache
- Neptune Graph DB
- Timestream
- DocumentDB (MongoDB compatible)
- QuickSight
- Elastic Container Service (ECS)
- Elastic Container Registry (ECR)
- Elastic Kubernetes Service (EKS)
Deploying a Spring Boot Application on AWS Using AWS Elastic Beanstalk:
How to deploy a Docker Container on AWS:
https://aws.amazon.com/getting-started/hands-on/deploy-docker-containers/
AWS Whitepapers:
https://d1.awsstatic.com/whitepapers/aws-overview.pdf
- Event Hub
- IoT Hub
- Data Factory
- Event Hub
- Redis Cache (also Store)
- Stream Analytics Service
- Azure Databricks
- Machine Learning
- Azure Functions
- Azure HDInsight (Hadoop PaaS)
- Blob
- CosmosDB
- MariaDB
- MySQL
- PostgreSQL
- SQL
- Azure Data Lake
- Azure Storage (SQL Table?)
- Azure Synapse Analytics
- Power BI
- Virtual Machines
- Virtual Machine Scale Sets
- Azure Container Service (ACS)
- Container Instances
- Azure Kubernetes Service (AKS)
Advanced Analytics Architecture:
Anomaly Detection in Real-time Data Streams:
Modern Data Warehouse Architecture:
https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/modern-data-warehouse
CI/CD for Containers:
https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/cicd-for-containers
Real Time Analytics on Big Data Architecture:
https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/real-time-analytics
IoT Architecture – Azure IoT Subsystems:
https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-iot-subsystems
Tier Applications & Data for Analytics:
Extract, transform, and load (ETL) using HDInsight:
IoT using Cosmos DB:
https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/iot-using-cosmos-db
Streaming using HDInsight:
- Cloud IoT Core
- App Engine
- Cloud Dataflow
- Pub/Sub
- Compute Engine
- Cloud Functions
- Specialized tools:
- Cloud Dataflow
- Cloud Dataproc
- Cloud Datalab
- Cloud Dataprep
- Cloud Composer
- App Engine
- Cloud Storage
- Cloud SQL
- Cloud Spanner
- Cloud Datastore
- Cloud Bigtable
- Cloud Memorystore
- BigQuery
- Kubernetes Engine
- Container Security
Thanks to Ismail Holoubi for the following GCP links:
Best practices for migrating virtual machines to Compute Engine:
https://cloud.google.com/solutions/best-practices-migrating-vm-to-compute-engine
Best practices for Cloud Storage:
https://cloud.google.com/storage/docs/best-practices
Moving a publishing workflow to BigQuery for new data insights:
Architecture: Optimizing large-scale ingestion of analytics events and logs:
https://cloud.google.com/solutions/architecture/optimized-large-scale-analytics-ingestion
Choosing the right architecture for global data distribution:
https://cloud.google.com/solutions/architecture/global-data-distribution
Best Practices for Operating Containers:
https://cloud.google.com/solutions/best-practices-for-operating-containers
Preparing a Google Kubernetes Engine Environment for Production:
https://cloud.google.com/solutions/prep-kubernetes-engine-for-prod
Automating IoT Machine Learning: Bridging Cloud and Device Benefits with AI Platform:
https://cloud.google.com/solutions/automating-iot-machine-learning