Welcome to The Data Engineer Prep Guide, a repository dedicated to helping data engineers prepare for interviews and hone their skills. My goal is to make preparation simple, structured, and collaborative.
- Data Engineer Handbook by Zach Wilson: A comprehensive guide for aspiring and experienced data engineers.
- Manish's Data Engineering Resume: A well-structured resume showcasing key skills, projects, and experience in data engineering.
- My Resume
- Sumit Mittal – Founder of BigDataBySumit.
- Joe Reis – Co-author of Fundamentals of Data Engineering.
- Zach Wilson – Data Engineering Specialist.
- Shashank Mishra – Data Engineer and Educator.
- Gowtham SB – Big Data and Cloud Expert.
We’re starting with Spark, one of the most essential tools in a data engineer’s toolkit. The repository currently includes practical examples and commonly asked syntax questions to help you revise effectively.
📂 Spark/
└── syntax_practical/
└── common_asked_syntax.ipynb
└── topics_to_focus.md
📄 README.md
- Spark Syntax Practical:
  - A notebook (`common_asked_syntax.ipynb`) covering frequently used Spark commands and operations.
  - Designed for quick revision and hands-on practice.
This repository is a work in progress! Future sections will include:
- Kafka: Real-time data streaming concepts and hands-on examples.
- DBT: Data transformations in modern pipelines.
- SQL: Practice queries and optimization tips.
- Data Lake: Best practices for data storage and retrieval.
- Clone this repository:
git clone https://github.com/Noman654/data-engineer-prep.git
- Navigate to the `Spark` folder to start with the provided notebook.
- Open the notebook with Jupyter or any compatible tool to explore the syntax examples, or run it directly in Google Colab.
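If Jupyter is not installed, the notebook can still be skimmed as plain JSON, since `.ipynb` files store their cells in a top-level `cells` list. The following is a minimal sketch and not part of the repository: the helper name `list_code_cells` is hypothetical, and the path assumes the script is run from the repository root with the notebook located under `Spark/syntax_practical/`.

```python
import json
from pathlib import Path


def list_code_cells(notebook_path):
    """Return the source of every code cell in a notebook as a list of strings."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    sources = []
    for cell in notebook.get("cells", []):
        # Skip markdown and raw cells; only code cells are of interest here
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows "source" to be a string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


if __name__ == "__main__":
    # Assumed location, relative to the repository root
    path = Path("Spark/syntax_practical/common_asked_syntax.ipynb")
    if path.exists():
        for number, source in enumerate(list_code_cells(path), start=1):
            print(f"--- code cell {number} ---")
            print(source)
    else:
        print(f"Notebook not found at {path}")
```

This only prints the cell sources for reading. Running the examples still requires a Spark environment such as Jupyter with PySpark installed, or Google Colab.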
We welcome contributions to make this guide comprehensive and beginner-friendly. Here’s how you can help:
- Fork the repository.
- Create a branch for your updates.
- Submit a pull request with your contributions.
- Add syntax examples or commonly asked questions for Spark.
- Improve the existing content for clarity or accuracy.
- Share practical examples for upcoming topics (Kafka, DBT, SQL, etc.).
This repository is for the community, by the community. Whether you’re preparing for interviews or sharing your expertise, let’s collaborate to make data engineering preparation accessible for everyone.