Skip to content

This Project demonstrates how a Datawarehousing solution was implemented for Instacart Supermarket so as to allow the BI team extract useful and actionable insights from the data to drive business goals and profitability

Notifications You must be signed in to change notification settings

jaykay04/Implementing-Data-Modeling-And-Warehousing-Solution-Using-PostgreSQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Implementing-Data-Modeling-And-Warehousing-Solution-Using-PostgreSQL

This Project demonstrates how a Datawarehousing solution was implemented for Instacart Supermarket so as to allow the BI guys to extract useful and actionable insights from the data to drive business goals and profitability

Languages and Tools

  • SQL
  • Python
  • Postgres Database
  • Jupyter Notebook

Situation

Instacart Supermarket is large e-commerce store that generates millions of datasets daily. The datasets generated are;

  • Aisles data
  • Department data
  • Orders data
  • Order_products data
  • Products data
    Due to the lack of a central repo to hold and connect these data, the Data Analysts are finding it extremely difficult to run analytics on the data being generated thereby leading to an inability to be able to extract useful and actionable insights that will be used to make data driven decisions..

Task

As the Data Engineer, my task was to quickly implement a datawarehousing solution which will act as a central repository for all the daatsets that are being generated by Instacart.

Datawarehouses makes it very convenient for Data Analysts and Scientist to run analyitcs on datasets being generated which is very important for the growth of any organization.

Approach

The approach i took was to study the five datasets and create an Entity Relationship diagram which shows the relationship and cardinality of the datasets as shown below.

The next step was to extract the various datasets into jupyter notebook by converting it into dataframes so that it becomes lot more easier to perform transformations where necessary.

The next approach is to create a connection to the database using psycopg2 and sqlalchemy as shown below;

After creating the connection to the database, the cursor was used to create the tables accordingly and commited for the execution to materialize.

Next, the data from dataframes was copied to its individual table using the to_sql function as shown;

To test run it, i ran a SELECT statement on the tables

Now that the datawarehouse is ready, we can start running some analytics query on the data

  • Create a temporary table that joins the orders, order_products and products tables to get information about each other, including the products that were purchased and their department and aisle information.

  • Create a temporary table that groups the orders by product and finds the total number of times each product was purchased, the total number of times each product was reordered, and the average number of times each product was added to a cart.

  • Create a temporary table that groups the orders by department and finds the total number of products purchased, the total number of unique products purchased, the total number of products purchased on weekdays vs weekends, and the average time of day that products in each department are ordered.

  • Create a temporary table that groups the orders by aisle and finds the top 10 most popular aisles, including the total number of products purchased and the total number of unique products purchased from each aisle.

  • Combine the information from the previous temporary tables into a final table that shows the product ID, product name, department ID, department name, aisle ID, aisle name, total number of times purchased, total number of times reordered, average number of times added to cart, total number of products purchased, total number of unique products purchased, total number of products purchased on weekdays, total number of products purchased on weekends, and average time of day products are ordered in each department.

About

This Project demonstrates how a Datawarehousing solution was implemented for Instacart Supermarket so as to allow the BI team extract useful and actionable insights from the data to drive business goals and profitability

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published