Noman654/README.md

👋 Hi, I'm Mohd Nauman

🌟 Data Engineer | Python Developer | Problem Solver | Technical Trainer
🔍 Passionate about crafting scalable solutions, optimizing data pipelines, and tackling real-world challenges with data.

🎓 Pursuing a BSc in Programming and Data Science from IIT Madras
📍 Based in Bengaluru, India


πŸ‘¨β€πŸ’» About Me

A Data Engineer and Python Developer with a passion for designing scalable systems to process and manage large-scale data efficiently. I specialize in optimizing backend operations, reducing costs, and improving performance to deliver impactful results.


🌟 What I Bring:

  • Data Engineering Expertise: Skilled in designing and optimizing ETL pipelines with tools like PySpark, Kafka, and cloud platforms such as AWS and Azure.
  • Backend Development: Experienced in creating robust web applications using Flask and FastAPI.
  • Efficiency & Optimization: Delivered solutions that cut storage costs by 60% and improved processing speed by 90%.

💡 Fun Fact:

  • I once reduced a Spark pipeline's processing time from 2 days to just 30 minutes, saving a lot in compute costs! 🚀

πŸ› οΈ Technologies I Love:

  • Languages: Python, SQL, JavaScript
  • Frameworks: Flask, FastAPI, PySpark
  • Databases: MySQL, MongoDB, Azure SQL
  • Cloud: AWS, Azure

📂 Highlighted Projects

Building a Scalable Pipeline for Indic Image Dataset Extraction

  • Engineered a robust pipeline using Hugging Face OBELICS to fetch images and text from over 230 million websites via Common Crawl, focusing on Indic content.
  • Designed a scalable, distributed architecture to handle high-volume requests and bypass rate limits effectively.
  • Successfully curated a dataset of 50–100 million images paired with text, enabling advanced applications.
  • Developed a tool for seamless data downloads from sources like Hugging Face and Archive.
  • Optimized for 30–50% faster speeds using parallel processing and reduced I/O overhead.
  • Capable of downloading terabytes of data in under an hour.
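
As a rough illustration of the parallel-download idea in the bullets above, here is a minimal sketch. It is not the actual tool: the function names (`fetch_with_retry`, `download_all`, `http_fetch`), the worker count, and the retry settings are all hypothetical, and exponential backoff is just one common way to cope with rate limits, not necessarily the approach used in the project.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_fetch(url: str) -> bytes:
    """Hypothetical default fetcher: plain HTTP GET via the standard library."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_retry(fetch: Callable[[str], bytes], url: str,
                     max_attempts: int = 3, base_delay: float = 0.5) -> bytes:
    """Call fetch(url), sleeping base_delay * 2**attempt between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            # Re-raise on the final attempt; otherwise back off and retry.
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def download_all(urls: Iterable[str],
                 fetch: Callable[[str], bytes] = http_fetch,
                 max_workers: int = 8,
                 max_attempts: int = 3,
                 base_delay: float = 0.5) -> Dict[str, bytes]:
    """Download many URLs concurrently; URLs that keep failing are skipped."""
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be keyed by URL.
        futures = {
            pool.submit(fetch_with_retry, fetch, url, max_attempts, base_delay): url
            for url in urls
        }
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                print(f"skipped {url}: {exc}")
    return results
```

Threads suit this kind of work because downloads are I/O-bound; passing `fetch` in as a parameter keeps the sketch testable without network access. A call might look like `download_all(list_of_urls, max_workers=16)`, where `list_of_urls` is whatever list of URLs you want to fetch.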

🔗 Malaysia Airlines Forecasting System

  • Designed and implemented a forecasting pipeline for airline revenue and passenger trends.
  • Reduced storage costs by 60% and improved processing speed by 90%.

📈 GitHub Stats

[Images: GitHub Stats card, Top Languages card]


📜 Certifications


📬 Let's Connect

Popular repositories

  1. StockAdvisor Public

    Python 2

  2. data_scraper Public

    Python 2

  3. dataengineer_prep Public

    Jupyter Notebook 2

  4. learning-java-2825378 Public

    Forked from LinkedInLearning/learning-java-2825378

    Learning Java (REVISION Q1 2020)

  5. ping-pong-Game Public

    A simple two-player ping pong game.

    Lua

  6. OLDBOOK Public

    A simple site showing which store has which type of book.

    HTML