tarek-bouras/BigData101

BigData101 Workshop

Welcome to the BigData101 workshop repository! 🚀 In this workshop, we dive into the world of Big Data, covering essential concepts from Information Systems to the practical implementation of Extract, Load, Transform (ELT) processes using Python and MongoDB Data Lakes.

Workshop Overview

Topics Covered:

  1. Introduction to Information Systems:

    • Overview of Information Systems.
    • Operational vs. Decisional Systems.
  2. Database Technologies:

    • Comparison of SQL and NoSQL databases.
    • In-depth exploration of MongoDB.
  3. Big Data Fundamentals:

    • Understanding the 5V's of Big Data.
    • Focus on Data Warehouses and Data Lakes.
  4. ELT Process with Python and MongoDB:

    • Detailed exploration of the ELT process.
    • Hands-on implementation using Python, Pandas, Requests, CSV, PyMongo, and Matplotlib.
    • Note: this workshop takes a deliberately simple, beginner-friendly approach to big data. Rather than advanced tools such as Hadoop and the wider Apache ecosystem, it uses Python and MongoDB with libraries such as Pandas and Matplotlib, so participants who are new to the subject can grasp the fundamental concepts without unnecessary complexity.

Workshop Structure

Session 1: Theory

  • Overview of Information Systems and their role in organizations.
  • Comparison of Operational and Decisional Systems.
  • Understanding SQL and NoSQL databases, with a focus on MongoDB.
  • Introduction to Big Data and the 5V's.
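
To make the SQL versus NoSQL comparison concrete, here is a minimal sketch that stores the same information in a relational shape (using Python's built-in sqlite3) and in a document shape like the one MongoDB stores. The table names, field names, and sample values are hypothetical and are not taken from the workshop material.

```python
import sqlite3

# Relational (SQL) shape: fixed columns, one-to-many data split across tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Amina')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor")],
)

# Reading a customer together with their orders requires a join.
rows = conn.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()

# Document (NoSQL) shape: the same information nested inside one record,
# which is how a MongoDB collection would typically hold it.
customer_doc = {
    "_id": 1,
    "name": "Amina",
    "orders": [{"item": "keyboard"}, {"item": "monitor"}],
}


def items_from_rows(joined_rows):
    """Collect the ordered items from the joined SQL rows."""
    return [item for _, item in joined_rows]


def items_from_doc(doc):
    """Collect the ordered items from the nested document."""
    return [order["item"] for order in doc["orders"]]
```

Both helpers return the same list of items. The difference is where the structure lives: in the schema and the join for SQL, and inside the record itself for the document model.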

Session 2: ELT Process Implementation

  • Deep dive into the ELT process.
  • Explanation of Data Warehouses and Data Lakes.
  • Practical implementation using Python and MongoDB.
  • Extracting data from a web API, ingesting it into a MongoDB Data Lake, and performing transformations.
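
The extract, load, and transform steps listed above can be sketched roughly as follows. This is not the workshop's notebook code: the function names, the database and collection names, the connection string, and the "category" and "amount" fields are all assumptions made for illustration. The three steps take the fetcher and the collection as arguments so they can be tried without a network connection or a running MongoDB server.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


def extract(fetch_json: Callable[[str], Any], url: str) -> List[Dict[str, Any]]:
    """Extract: pull raw records from a web API without reshaping them."""
    payload = fetch_json(url)
    # Some APIs return a list, others a single object; normalise to a list.
    return payload if isinstance(payload, list) else [payload]


def load(collection: Any, records: List[Dict[str, Any]]) -> int:
    """Load: store the raw records as-is in the data lake collection."""
    if not records:
        return 0  # PyMongo's insert_many raises on an empty list
    result = collection.insert_many([dict(r) for r in records])
    return len(result.inserted_ids)


def transform(collection: Any, group_field: str, value_field: str) -> Dict[str, float]:
    """Transform: after loading, read the raw data back and total it per group."""
    totals: Dict[str, float] = defaultdict(float)
    for doc in collection.find({}):
        if group_field in doc and value_field in doc:
            totals[str(doc[group_field])] += float(doc[value_field])
    return dict(totals)


def run_against_real_services(url: str) -> Dict[str, float]:
    """Wire the steps to Requests and PyMongo (assumes a local MongoDB server)."""
    import requests
    from pymongo import MongoClient

    def fetch_json(target: str) -> Any:
        response = requests.get(target, timeout=10)
        response.raise_for_status()
        return response.json()

    client = MongoClient("mongodb://localhost:27017")  # assumed address
    collection = client["bigdata101"]["raw_records"]   # hypothetical names
    load(collection, extract(fetch_json, url))
    return transform(collection, "category", "amount")  # hypothetical fields
```

The order of the calls is what makes this ELT rather than ETL: the raw records land in the collection first, and the reshaping happens afterwards on the stored data.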

Requirements

Software Requirements:

  • Python 3.x
  • MongoDB
  • Jupyter Notebook (optional but recommended for interactive learning)

Python Libraries:

  • Pandas
  • Requests
  • CSV (the csv module in the Python standard library; no separate installation needed)
  • PyMongo
  • Matplotlib

Installation

  1. Python:

  2. MongoDB:

    • Install MongoDB by following the instructions on mongodb.com.
  3. Python Libraries:

    • Open a terminal or command prompt.

    • Run the following command to install the required libraries:

      pip install pandas requests pymongo matplotlib
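
To confirm that the installation worked, a small check like the one below can help. It is only a sketch: the helper name and the module list are assumptions based on the libraries listed in this README, and csv is included because it ships with Python.

```python
import importlib.util

# Import names for the workshop libraries; csv is part of the standard library.
WORKSHOP_MODULES = ["pandas", "requests", "csv", "pymongo", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules(WORKSHOP_MODULES) returns an empty list when everything is installed. Any name it does return can be passed to pip install.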

Workshop Files

  1. Presentation Slides:

Getting Started

  1. Clone the repository:
    git clone https://github.com/your-username/BigData101.git
  2. Navigate to the project directory:
    cd BigData101
  3. Open Jupyter Notebook (optional):
    jupyter notebook
    • Open the code/notebook.ipynb notebook to follow along with the practical implementation.
  4. Run the Workshop Code:
    • Execute the code cells in the Jupyter Notebook to run the workshop code.
    • Alternatively, explore the Python scripts in the code/ directory using your preferred IDE or text editor.
  5. Explore the Presentation Slides:
    • Open the presentation slides in the slides/ directory to review the theoretical concepts covered in the workshop.
  6. Experiment with Practice Data:
    • Find sample data used for practice in the data/ directory.
  7. Contribute and Share:
    • If you find ways to improve the workshop or want to share your own experiences, we encourage you to contribute! Check the Contributions section for guidance.
  8. Enjoy Learning and Happy Coding! 🚀
