This Project demonstrates how a Datawarehousing solution was implemented for Instacart Supermarket so as to allow the BI guys to extract useful and actionable insights from the data to drive business goals and profitability
- SQL
- Python
- Postgres Database
- Jupyter Notebook
Instacart Supermarket is large e-commerce store that generates millions of datasets daily. The datasets generated are;
- Aisles data
- Department data
- Orders data
- Order_products data
- Products data
Due to the lack of a central repo to hold and connect these data, the Data Analysts are finding it extremely difficult to run analytics on the data being generated thereby leading to an inability to be able to extract useful and actionable insights that will be used to make data driven decisions..
As the Data Engineer, my task was to quickly implement a datawarehousing solution which will act as a central repository for all the daatsets that are being generated by Instacart.
Datawarehouses makes it very convenient for Data Analysts and Scientist to run analyitcs on datasets being generated which is very important for the growth of any organization.
The approach i took was to study the five datasets and create an Entity Relationship diagram which shows the relationship and cardinality of the datasets as shown below.
The next step was to extract the various datasets into jupyter notebook by converting it into dataframes so that it becomes lot more easier to perform transformations where necessary.
The next approach is to create a connection to the database using psycopg2 and sqlalchemy as shown below;
After creating the connection to the database, the cursor was used to create the tables accordingly and commited for the execution to materialize.
Next, the data from dataframes was copied to its individual table using the to_sql function as shown;
To test run it, i ran a SELECT statement on the tables
Now that the datawarehouse is ready, we can start running some analytics query on the data
-
Create a temporary table that joins the orders, order_products and products tables to get information about each other, including the products that were purchased and their department and aisle information.
-
Create a temporary table that groups the orders by product and finds the total number of times each product was purchased, the total number of times each product was reordered, and the average number of times each product was added to a cart.
-
Create a temporary table that groups the orders by department and finds the total number of products purchased, the total number of unique products purchased, the total number of products purchased on weekdays vs weekends, and the average time of day that products in each department are ordered.
-
Create a temporary table that groups the orders by aisle and finds the top 10 most popular aisles, including the total number of products purchased and the total number of unique products purchased from each aisle.
-
Combine the information from the previous temporary tables into a final table that shows the product ID, product name, department ID, department name, aisle ID, aisle name, total number of times purchased, total number of times reordered, average number of times added to cart, total number of products purchased, total number of unique products purchased, total number of products purchased on weekdays, total number of products purchased on weekends, and average time of day products are ordered in each department.