Data Engineer | Python Developer | Problem Solver | Technical Trainer
Passionate about crafting scalable solutions, optimizing data pipelines, and tackling real-world challenges with data.
Pursuing a BSc in Programming and Data Science from IIT Madras
Based in Bengaluru, India
A Data Engineer and Python Developer with a passion for designing scalable systems to process and manage large-scale data efficiently. I specialize in optimizing backend operations, reducing costs, and improving performance to deliver impactful results.
- Data Engineering Expertise: Skilled in designing and optimizing ETL pipelines with tools like PySpark, Kafka, and cloud platforms such as AWS and Azure.
- Backend Development: Experienced in creating robust web applications using Flask and FastAPI.
- Efficiency & Optimization: Consistently deliver solutions that reduce storage costs by 60% and improve process speeds by 90%.
- I once reduced a Spark pipeline's processing time from 2 days to just 30 minutes, saving a lot in compute costs!
- Engineered a robust pipeline using Hugging Face's OBELICS to fetch images and text from over 230 million websites via Common Crawl, focusing on Indic content.
- Designed a scalable, distributed architecture to handle high-volume requests and bypass rate limits effectively.
- Successfully curated a dataset of 50–100 million images paired with text, enabling advanced applications.
- Developed a tool for seamless data downloads from sources like Hugging Face and Archive.
- Achieved 30–50% faster downloads through parallel processing and reduced I/O overhead.
- Capable of downloading terabytes of data in under an hour.
- Designed and implemented a forecasting pipeline for airline revenue and passenger trends.
- Reduced storage costs by 60% and improved processing speed by 90%.
- Email: mohdnauman330@gmail.com
- LinkedIn: linkedin.com/in/nauman330
- GitHub: github.com/Noman654