Caution: This timeline is tailored for @mhjhamza and might not be suitable for everyone.
Today's Progress:
- Setup Repository for #thepersonalmsds
- Created template for Social Media
- Facebook Page setup
- Enrolled in PySpark Course @Udemy
Description: None
Important Links: Udemy | Spark and Python for Big Data with PySpark
Today's Progress: Learning PySpark!
Description: This includes the following:
- Setting up Python with Spark,
- Spark DataFrame Basics,
- Spark DataFrame Project Exercise and
- Machine Learning with MLlib.
Important Links: Udemy | Spark and Python for Big Data with PySpark
Today's Progress: Machine Learning with PySpark!
Description: Includes the following topics:
- Linear and Logistic Regression with PySpark
Course: Spark and Python for Big Data with PySpark @Udemy
Book: An Introduction to Statistical Learning by Gareth James, Chapters 1, 2, and 3
Important Keywords: SparkSession, printSchema, StringIndexer, VectorAssembler, randomSplit, ConfusionMatrix, MeanAbsoluteError, R-Squared, RootMeanSquaredError, Precision, Recall, Accuracy, ROC Curve, Pipeline.
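A minimal sketch of how the evaluation keywords above (confusion matrix, precision, recall, accuracy, MAE, RMSE, R-squared) are defined. It uses plain Python rather than the PySpark evaluators from the course, and the labels and predictions are made-up illustration data, not course results.

```python
import math

# Hypothetical binary labels and predictions (illustration only).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

# Rows are the actual class (0, 1); columns are the predicted class (0, 1).
confusion_matrix = [[tn, fp], [fn, tp]]
precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(pairs)

# Hypothetical regression targets and predictions (illustration only).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 9.5]

errors = [t - p for t, p in zip(y_true, y_pred)]
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

# R-squared compares the residual error with the spread around the mean.
mean_true = sum(y_true) / len(y_true)
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r_squared = 1 - ss_res / ss_tot
```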
Today's Progress: Machine Learning with PySpark!
Description: Includes the following topics:
- Decision Trees and Random Forests
- K-Means Clustering
Important Links: Udemy | Spark and Python for Big Data with PySpark
Today's Progress: Basic Statistics on Coursera
Description: Today's topics include basic concepts of descriptive statistics:
- Cases, variables, and data matrix
- Levels of measurement
- How to present data as tables and graphs
- Measures of Central Tendency (mean, median, mode).
- Measures of Dispersion (range, interquartile range, variance, and standard deviation)
- Z-scores
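The measures listed above can be tried out with Python's standard `statistics` module. This is only a sketch: the `scores` sample is invented for illustration, and the variance and standard deviation are the sample versions (n - 1 denominator), which may differ from the convention used in a given lecture.

```python
import statistics

# Hypothetical sample: exam scores for seven students (illustration only).
scores = [62, 70, 70, 75, 81, 88, 93]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion.
value_range = max(scores) - min(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartile cut points
iqr = q3 - q1
variance = statistics.variance(scores)  # sample variance
std_dev = statistics.stdev(scores)      # sample standard deviation

# A z-score expresses each value as a distance from the mean,
# measured in standard deviations.
z_scores = [(x - mean) / std_dev for x in scores]
```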
Important Links: Coursera | Basic Statistics
Today's Progress: Basic Statistics on Coursera
Description: Includes the following topics:
- Correlation,
- Crosstabs,
- Scatterplots,
- Pearson's R,
- Regression,
- Contingency tables.
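As a rough illustration of Pearson's r and simple linear regression from the list above, here is a plain-Python sketch. The paired data (`hours`, `score`) and the helper names `pearson_r` and `fit_line` are hypothetical, chosen only for this example.

```python
import math

# Hypothetical paired observations (hours studied, exam score).
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]


def _sums(xs, ys):
    """Return the means and the sums of squares/cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    _, _, sxy, sxx, syy = _sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my, sxy, sxx, _ = _sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx


r = pearson_r(hours, score)
slope, intercept = fit_line(hours, score)
```

For a single predictor, squaring `r` gives the R-squared of the fitted line.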
Important Links: Coursera | Basic Statistics
Today's Progress: Basic Statistics on Coursera
Description: Includes the following topics:
- Randomness,
- Probability,
- Relative Frequency,
- Sample Space,
- Basic Set-Theoretical concepts,
- Conditional Probability,
- Decision Trees,
- Bayes' Law.
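Bayes' Law and conditional probability from the list above can be sketched in a few lines. The screening-test numbers below (prior, sensitivity, false positive rate) are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) computed with Bayes' Law."""
    # Total probability of a positive test across both branches
    # (condition present, condition absent).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
```

With these assumed inputs the posterior stays well below the sensitivity, because the condition is rare to begin with.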
Important Links: Coursera | Basic Statistics
Today's Progress: Basic Statistics on Coursera
Description: Includes the following topics:
- Random Variable & Probability Distribution
- Probability Mass Function
- Probability Density Function
- Cumulative Probability Distribution
- Mean and Variance of a random variable
- Normal Distribution & Binomial Distribution
- Sampling Distributions
- Random multi-stage cluster sampling
- Stratified random sampling.
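To make the probability mass function, cumulative distribution, and mean/variance of a random variable concrete, here is a small sketch for a binomial distribution. The parameters `n = 10` and `p = 0.3` are arbitrary values picked for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # arbitrary illustration values

# Probability mass function over every possible outcome 0..n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Cumulative distribution: running total of the PMF.
cdf = []
running = 0.0
for pk in pmf:
    running += pk
    cdf.append(running)

# Mean and variance computed directly from the definition.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
```

The values computed from the definition should agree with the closed forms n * p and n * p * (1 - p).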
Important Links: Coursera | Basic Statistics
Today's Progress: Basic Statistics on Coursera
Description: Includes the following topics:
- Confidence level
- Statistical Hypotheses
- Null and Alternative Hypothesis
- P-value, Significance level and Rejection region
Important Links: Coursera | Basic Statistics
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Distributed file system
- Hadoop distributed file system
- Scaling distributed file system
- Block and replica states
- HDFS Client
- Namenode Architecture
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hadoop MapReduce Streaming applications with Python
- Hadoop MapReduce application tuning
- Combiner, Partitioner, Comparator and Speculative Execution
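A Hadoop Streaming job in Python boils down to a mapper and a reducer that exchange tab-separated key/value lines. The sketch below is a local word-count simulation, not a real cluster job: `mapper`, `reducer`, and `run_local` are hypothetical names, and the shuffle/sort phase that Hadoop performs between the two stages is imitated with `sorted()`.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; input must already be sorted by key."""
    parsed = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Imitate map -> shuffle/sort -> reduce on a single machine."""
    shuffled = sorted(mapper(lines))
    return list(reducer(shuffled))


# Made-up input lines for illustration.
sample = ["big data with spark", "spark and hadoop", "big big data"]
result = run_local(sample)
```

Because the reducer only sums counts, the same function could also serve as a combiner that pre-aggregates each mapper's output before the shuffle.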
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Resilient Distributed Datasets
- Transformation, Actions, Resiliency
- Working with text files
- Joins, Accumulator & Broadcast variable
- Spark UI & Cluster Mode
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Apache Spark
- Map and Filter Transformation
- FlatMap Transformation
- Spark Architecture
- Spark Components
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Filter and MapValue Transformations on Pair RDD
- Reduce By Key Aggregation
- Group By and Sort By Key Transformation
- Data Partitioning and Join Operations
- Accumulators and Broadcast Variables
- Spark SQL
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hands-on SparkSQL with PySpark
- Spark Streaming Project
- Running Spark in a Cluster
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Spark: Cluster Computing with Working Sets (Paper)
- Started reading "Spark - The Definitive Guide" (Book)
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Transformations (Narrow & Wide)
- Lazy Evaluation
- Actions
- DataFrames & Partitions
- End-to-End example DataFrames and SQL
Second Chapter | Spark - The Definitive Guide
Today's Progress: Today's Roadmap includes the following PySpark topics:
- Structured API
- Structured Spark Types
- Basic Structured Operations
- Columns and Expressions
Book: Spark - The Definitive Guide
Today's Progress: Today's Roadmap includes the following PySpark topics:
- Production Application with Spark
- How Spark Runs on a Cluster
- Execution Modes
- Life Cycle of a Spark Application
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Writing and testing Spark Applications
- Launching Spark Applications
- Cluster Managers; Standalone, YARN, & Mesos Overview
Chapters 15, 16, and 17 of Spark - The Definitive Guide
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hadoop Ecosystem Playlist (link-1 below)
- Intro to Spark Streaming (link-2 below)
- Started Part-5 - "Structured Streaming" of the book (Ref below)
Link-1: https://lnkd.in/ffW97F2
Link-2: https://lnkd.in/fS93_38
Book: Spark - The Definitive Guide
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Structured Streaming Basics
- Core Concepts: Input Sources, Sinks, Output Modes and Triggers
- Event-Time Processing
- Transformations on Streams
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Event-Time and Stateful Processing
- Tumbling Windows & Handling Late Data with Watermarks
- Dropping Duplicates in a Stream
- Arbitrary Stateful Processing
Today's Progress: Today's Roadmap includes the following AWS Internet of Things (IoT) topics:
- IoT Core, IoT Thing Registry
- Message Broker, Rules Engine, Job Service
- Configured first IoT Device
Today's Progress: Today's Roadmap includes the following AWS topics:
- Kinesis Overview
- Kinesis Data Streams & Firehose
- Pipeline for ingestion & analyzing streaming data
Today's Progress: Today's Roadmap includes the following AWS topics:
- AWS Glue Core Concepts & ETL Jobs
- Data Catalog, Tables, and Crawler
- Job, Transform and Trigger
Today's Progress: Today's Roadmap includes the following AWS topics:
- Git Concepts - (Intermediate-advanced)
- AWS CodeCommit
- AWS Cloud9
- Integrate AWS Cloud9 with AWS CodeCommit
Today's Progress: Today's Roadmap includes the following AWS topics:
- Managing & Deploying Code with AWS Developer Tools
- Configuring Git, AWS CLI, and CodeCommit
- Using CodeCommit with other AWS Services
- AWS CodePipeline (Setup, Configuration, and Advanced features)
Today's Progress: Today's Roadmap includes the following AWS topics:
- AWS CodeBuild & CodeDeploy
- Test-Driven Development with AWS Services
- Behavior-Driven Development with AWS Services
Today's Progress: Today's Roadmap includes the following AWS Big Data topics:
- AWS Elastic MapReduce
- EMRFS (EMR FileSystem)
- Transient vs Long-Running Clusters
- Securing an EMR Cluster
- Working with PySpark & S3
- Querying EMR using Hive
Today's Progress: Today's Roadmap includes the following AWS ML topics:
- AWS Data Pipeline
- Amazon Machine Learning (AML)
- Amazon SageMaker
Today's Progress: Today's Roadmap includes the following AWS Big Data topics:
- Elasticsearch Overview
- Elasticsearch Clusters and Access Policies
- Elasticsearch, Kibana, and AWS Lambda
Today's Progress: Today's Roadmap includes the following AWS Data Lake topics:
- Architecting Data Lakes on AWS
- Amazon Athena Overview
- Kinesis Analytics & QuickSight
Today's Progress: Today's Roadmap includes the following AWS Big Data Lake topics:
- Wrote BDD features/steps for the Data Lake
- Wrote TDD test cases for the Data Lake
- Implemented a mini Data Lake pipeline (authentication, data ingestion, processing, and logging)
Today's Progress: Today's Roadmap includes the following AWS Big Data Lake topics:
- Reading "Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility"
- Link: https://d1.awsstatic.com/whitepapers/Storage/data-lake-on-aws.pdf
Today's Progress: Today's Roadmap includes the following AWS topics:
- Makefile Overview
- Bash Shell Scripting
- CloudFormation Core Concepts
- Templates In-Depth
- Template Advanced Concepts
Today's Progress: Today's Roadmap includes the following AWS topics:
- Continued Implementing CloudFormation Stacks
- Reading AWS CloudFormation Documentation
Today's Progress: Today's Roadmap includes the following AWS topics:
- Deep Dive AWS QuickSight, Data Lake, Glue, & Athena.
Today's Progress: Today's Roadmap includes the following AWS topics:
- Reading Big Data Analytics Options on AWS (Whitepaper)
Today's Progress: Today's Roadmap includes the following AWS topics:
- Continued reading Big Data Analytics Options on AWS (Whitepaper)
- Experimented with AWS CodeBuild, CodeDeploy and CodePipeline
Today's Progress: Today's Roadmap includes the following Hadoop Ecosystem topics:
- Hadoop Ecosystem Components (Hive, Pig, HBase, HCatalog)
Today's Progress: Today's Roadmap includes the following Hadoop Ecosystem topics:
- Hadoop Ecosystem Components (Drill, Mahout, Sqoop, and Zookeeper)
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Data Collection (AWS Kinesis)
- Streams, Analytics, and Firehose
- Producer and Consumer SDKs
- Shards, shard splitting, and shard merging
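Shard splitting and merging are easier to picture with a small model. Kinesis maps each partition key to a 128-bit integer with MD5, and every shard owns a contiguous range of those hash keys. The sketch below models only that bookkeeping in plain Python; it does not call the AWS SDK, and `even_shards`, `shard_for`, `split_shard`, and `merge_shards` are hypothetical helper names.

```python
import hashlib

MAX_HASH = 2**128 - 1  # largest 128-bit hash key


def hash_key(partition_key):
    """MD5 of the partition key, read as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16)


def even_shards(count):
    """Divide the full hash range into `count` contiguous (start, end) ranges."""
    size = (MAX_HASH + 1) // count
    return [
        (i * size, MAX_HASH if i == count - 1 else (i + 1) * size - 1)
        for i in range(count)
    ]


def shard_for(partition_key, shards):
    """Index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(shards):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard")


def split_shard(shards, index):
    """Replace one shard with two children that split its range in half."""
    start, end = shards[index]
    new_start = (start + end) // 2 + 1  # first hash key of the second child
    children = [(start, new_start - 1), (new_start, end)]
    return shards[:index] + children + shards[index + 1:]


def merge_shards(shards, index):
    """Merge shard `index` with its adjacent right-hand neighbour."""
    (start, _), (_, end) = shards[index], shards[index + 1]
    return shards[:index] + [(start, end)] + shards[index + 2:]


shards = even_shards(2)
after_split = split_shard(shards, 0)       # 3 shards
after_merge = merge_shards(after_split, 0)  # back to 2 shards
```

Only adjacent shards can be merged in this model, since the merged shard must still cover one contiguous range.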
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Data Collection (AWS Kinesis)
- Created Kinesis Firehose for streaming data to S3
- Created Kinesis Stream for reading from EC2
- Built a Pipeline to Ingest and Analyze Streaming Data
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Data Storage & Processing (Amazon S3 + EMR)
- S3 Storage Tiers, Lifecycle Rules, Versioning, CRR, Events, and ETags
- Spun up an EMR Cluster
- Loaded UCI Dataset from S3
- Ran Spark on a Zeppelin notebook to process the dataset
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Data Storage (Amazon DynamoDB)
- DynamoDB RCU & WCU, Partitions
- Indexes (LSI & GSI), DAX, and DynamoDB Streams
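A quick way to practice the RCU and WCU arithmetic is a small calculator. The sketch assumes the provisioned-capacity rules as I understand them: one RCU covers one strongly consistent read per second of an item up to 4 KB (eventually consistent reads cost half), and one WCU covers one write per second of an item up to 1 KB. The function names and workload numbers are hypothetical, and transactional requests are left out.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """RCUs needed, rounding each item up to the next 4 KB block."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    # Eventually consistent reads consume half the capacity.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_size_kb, writes_per_second):
    """WCUs needed, rounding each item up to the next 1 KB block."""
    return math.ceil(item_size_kb) * writes_per_second


# Hypothetical workload: 6 KB items read 10 times per second,
# 2.5 KB items written 5 times per second.
strong_rcu = read_capacity_units(6, 10)
eventual_rcu = read_capacity_units(6, 10, strongly_consistent=False)
wcu = write_capacity_units(2.5, 5)
```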
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Data Processing (AWS Lambda)
- AWS Lambda as Cron Jobs
- Lambda Integration with other services
- Lambda Costs, Promises, and Anti-Patterns
Today's Progress: Today's Roadmap includes the following ML (AWS) topics:
- Amazon SageMaker
- Ran JupyterLab and trained a model
- Deployed the model to an endpoint
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS Redshift Architecture
- Redshift Spectrum
- Redshift Distribution Styles
- Redshift Integration / WLM / Vacuum / Anti-Patterns
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Reading Whitepaper: Streaming Data Solutions on AWS with Amazon Kinesis
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Amazon Database Migration Service (DMS)
- Migrating database to S3 data lake via EMR - Sqoop
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Python Design Patterns
- Reading whitepaper: Data Warehousing on AWS
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Intro to Amazon QuickSight
- QuickSight Pricing and Dashboards
- Choosing Visualization Types
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Reading AWS Database Migration Service Best Practices Whitepaper
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Reading Well-Architected Framework Whitepaper
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Continued Reading Well-Architected Framework Whitepaper
Today's Progress: Today's Roadmap includes the following ML (AWS) topics:
- Amazon Rekognition Developer Guide
- Experimenting with Amazon Rekognition API
Today's Progress: Today's Roadmap includes the following ML (AWS) topics:
- Amazon Polly, and Transcribe.
- Experimenting with Polly and Transcribe with Boto3.
Today's Progress: Today's Roadmap includes the following ML (AWS) topics:
- Amazon Comprehend and Lex.
- Experimenting with Polly and Transcribe with Boto3.
- Amazon Service Chaining with AWS Step Functions.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Redshift Distribution Style selection
- Redshift Sort Key Selection
- Loading Data & Designing Queries
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Apache Hadoop, EMR Architecture & Operations (aCloudGuru)
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS re:Invent 2018: Big Data Analytics Architectural Patterns & Best Practices
- AWS re:Invent 2018: Effective Data Lakes: Challenges and Design Patterns
- AWS re:Invent 2018: Build and Govern Your Data Lakes with AWS Glue
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
[AWS re:Invent 2018]
- A Deep Dive into What's New with Amazon EMR
- Deep Dive and Best Practices for Amazon Redshift
- Intro to AWS Lake Formation - Build a secure data lake
- Data Lake Implementation: Processing & Querying Data in Place
- High-Performance Data Streaming with Amazon Kinesis: Best Practices
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Data Privacy & Governance in the Age of Big Data (GPSTEC303)
- Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB (DAT401)
- Search at Nike with Amazon Elasticsearch Service (ANT203)
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Deep Learning Applications Using TensorFlow, Advanced Microgrid Solutions (AIM401-R2)
- Deep Learning Applications Using TensorFlow (AIM401-R)
Today's Progress: Today's Roadmap includes the following ML (AWS) topics:
- Build & Deploy ML Models Quickly & Easily with Amazon SageMaker (AIM404-R)
Today's Progress: Today's Roadmap includes the following ML (AWS) topics:
- Enterprise Data Lake: Architecture Using Big Data Technologies
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS re:Invent 2018: Building Serverless Analytics Pipelines with AWS Glue (ANT308)
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS re:Invent 2018: Leadership Session: AWS Security (SEC305-L)
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS re:Invent 2018: Become an IAM Policy Master in 60 Minutes or Less (SEC316-R1)
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Elasticsearch Clusters, Access Policies within VPC.
- Elasticsearch, Kibana, DynamoDB and AWS Lambda.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (1/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (2/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (3/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (4/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (5/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (6/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (7/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (8/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (9/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Taking (10/10) Quizzes for AWS Big Data Specialty.
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Improving Train Safety with AWS IoT
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Exam Readiness: AWS Certified Big Data - Specialty (Digital)
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS Kinesis Streams & Firehose Documentation & FAQs
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Amazon Elasticsearch Service Documentation & FAQs
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS Redshift Documentation & FAQs
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- AWS Elastic MapReduce (EMR) Documentation & FAQs
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- DynamoDB Documentation & FAQs
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Amazon Athena Documentation & FAQs
Today's Progress: Today's Roadmap includes the following Big Data (AWS) topics:
- Amazon QuickSight Documentation & FAQs
Today's Progress: Today's Roadmap includes the following Python topics:
- Solving Python challenges on HackerRank
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hands-On Lab: Migrating Redshift Data to and from S3
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hands-On Lab: Analyzing and Visualizing data with AWS Elasticsearch Service and Kibana
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hands-On Lab: AWS Glue Crawler, Glue ETL with Amazon Athena
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hands-On Lab: DynamoDB bulk insertion with AWS Lambda and S3
Today's Progress: Today's Roadmap includes the following Big Data topics:
- Hands-On Lab: CloudWatch Rules (Cron Job) with AWS Lambda and DynamoDB (Continued)
Today's Progress: Today's Roadmap includes the following DevOps topics:
- Hands-On Lab: DevOps - CodeCommit CodeBuild CodePipeline for Big Data Services
Today's Progress: Today's Roadmap includes the following ML topics:
- re:Invent: Infinitely Scalable Machine Learning Algorithms
Today's Progress: Today's Roadmap includes the following ML topics:
- re:Invent 2018: AIOps: Steps Towards Autonomous Operations
Today's Progress: Today's Roadmap includes the following AWS topics:
- Hands-on: API Gateway with DynamoDB Request & Response Mapping
Today's Progress: Today's Roadmap includes the following AWS topics:
- Amazon Cognito
- User Pool & Federated Identities
- AWS Services Access Control