Welcome to the BigData101 workshop repository! 🚀 In this workshop, we dive into the world of Big Data, covering essential concepts from Information Systems to the practical implementation of Extract, Load, Transform (ELT) processes using Python and MongoDB Data Lakes.
- Introduction to Information Systems:
  - Overview of Information Systems.
  - Operational vs. Decisional Systems.
- Database Technologies:
  - Comparison of SQL and NoSQL databases.
  - In-depth exploration of MongoDB.
- Big Data Fundamentals:
  - Understanding the 5V's of Big Data.
  - Focus on Data Warehouses and Data Lakes.
- ELT Process with Python and MongoDB:
  - Detailed exploration of the ELT process.
  - Hands-on implementation using Python, Pandas, Requests, CSV, PyMongo, and Matplotlib.
(This workshop takes a straightforward, beginner-friendly approach to big data. Instead of delving into advanced tools from the Apache ecosystem, such as Hadoop, it focuses on simplicity. Using Python and MongoDB, along with libraries such as Pandas and Matplotlib, participants can grasp the fundamental concepts without unnecessary complexity. The process is designed as a clear introduction for those who are new to the subject.)
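To make the SQL versus NoSQL comparison listed above a little more concrete, here is a minimal sketch that stores the same record two ways: as rows in two relational tables (using Python's built-in sqlite3) and as a single nested document of the kind MongoDB stores. The `student` record, the table names, and the helper functions are invented for illustration and are not part of the workshop code. No MongoDB server is needed, because a plain dict stands in for the document.

```python
import json
import sqlite3

# Hypothetical sample record, invented for illustration.
student = {
    "name": "Ada",
    "city": "Paris",
    "courses": ["databases", "statistics"],
}


def as_sql_rows(record):
    """Relational style: fixed columns, and one row per course in a second table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.execute("CREATE TABLE enrollments (student_id INTEGER, course TEXT)")
    cur = conn.execute(
        "INSERT INTO students (name, city) VALUES (?, ?)",
        (record["name"], record["city"]),
    )
    student_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO enrollments (student_id, course) VALUES (?, ?)",
        [(student_id, course) for course in record["courses"]],
    )
    # Reassembling the record requires a join across both tables.
    rows = conn.execute(
        "SELECT s.name, e.course FROM students s "
        "JOIN enrollments e ON e.student_id = s.id ORDER BY e.course"
    ).fetchall()
    conn.close()
    return rows


def as_document(record):
    """Document style: the whole record, nested list included, is one JSON object."""
    return json.dumps(record, sort_keys=True)


print(as_sql_rows(student))
print(as_document(student))
```

The relational version needs a schema up front and a join to read the record back, while the document version keeps the nested list inline. That trade-off is the core of the comparison.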
- Overview of Information Systems and their role in organizations.
- Comparison of Operational and Decisional Systems.
- Understanding SQL and NoSQL databases, with a focus on MongoDB.
- Introduction to Big Data and the 5V's.
- Deep dive into the ELT process.
- Explanation of Data Warehouses and Data Lakes.
- Practical implementation using Python and MongoDB.
- Extracting data from a web API, ingesting it into a MongoDB Data Lake, and performing transformations.
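The last objective above (extract from a web API, load the raw data into MongoDB, then transform it) could be sketched roughly as follows. This is an illustration under stated assumptions, not the workshop notebook's actual code: `API_URL`, the database and collection names, and the `value` field are placeholders, and the API is assumed to return a JSON list of objects.

```python
# All names below (API_URL, database/collection names, the "value" field)
# are placeholders for illustration, not the workshop's actual values.
API_URL = "https://example.com/api/records"  # hypothetical endpoint
MONGO_URI = "mongodb://localhost:27017"      # default address of a local MongoDB


def extract(url):
    """Extract: fetch raw JSON records from a web API."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    return response.json()       # assumed to be a list of JSON objects


def load(collection, records):
    """Load: store the records unchanged in the data lake collection."""
    if not records:
        return 0  # insert_many rejects an empty list
    # Copy each record, because PyMongo adds an _id key to the dicts it inserts.
    result = collection.insert_many([dict(r) for r in records])
    return len(result.inserted_ids)


def transform(collection, field):
    """Transform: read the raw documents back and clean one numeric field."""
    import pandas as pd

    docs = list(collection.find({}, {"_id": 0}))  # leave out MongoDB's _id
    frame = pd.DataFrame(docs)
    # Values that cannot be parsed as numbers become NaN and are dropped.
    frame[field] = pd.to_numeric(frame[field], errors="coerce")
    return frame.dropna(subset=[field])


def run_pipeline():
    """Run the three steps in ELT order against a local MongoDB."""
    from pymongo import MongoClient

    client = MongoClient(MONGO_URI)
    raw = client["bigdata101"]["raw_records"]  # hypothetical names
    print("loaded:", load(raw, extract(API_URL)))
    print(transform(raw, "value").describe())
```

Calling `run_pipeline()` requires a running MongoDB instance and a real API URL in place of the placeholder. The raw data is loaded before any cleaning, which is what distinguishes ELT from ETL: the lake keeps the original records, and transformations can be rerun later.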
- Python 3.x
- MongoDB
- Jupyter Notebook (optional but recommended for interactive learning)
- Pandas
- Requests
- CSV (the csv module from the Python standard library; no installation needed)
- PyMongo
- Matplotlib
- Python:
  - Download and install Python from python.org.
- MongoDB:
  - Install MongoDB by following the instructions on mongodb.com.
- Python Libraries:
  - Open a terminal or command prompt.
  - Run the following command to install the required libraries:
pip install pandas requests pymongo matplotlib
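To confirm that the install worked, a short check along these lines can help. It is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the `installed_versions` helper is a name made up for this example.

```python
from importlib import metadata

# The four packages installed by the pip command above.
REQUIRED = ["pandas", "requests", "pymongo", "matplotlib"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as not installed can be added by rerunning the pip command.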
- Presentation Slides:
  - Find the presentation slides here: Link to slide
- Clone the repository:
git clone https://github.com/your-username/BigData101.git
- Navigate to the project directory:
cd BigData101
- Open Jupyter Notebook (optional):
jupyter notebook
- Open the code/notebook.ipynb notebook to follow along with the practical implementation.
- Run the Workshop Code:
  - Execute the code cells in the Jupyter Notebook to run the workshop code.
  - Alternatively, explore the Python scripts in the code/ directory using your preferred IDE or text editor.
- Explore the Presentation Slides:
  - Open the presentation slides in the slides/ directory to review the theoretical concepts covered in the workshop.
- Experiment with Practice Data:
  - Find sample data used for practice in the data/ directory.
- Contribute and Share:
  - If you find ways to improve the workshop or want to share your own experiences, we encourage you to contribute! Check the Contributions section for guidance.
- Enjoy Learning and Happy Coding! 🚀