This README provides an overview of the analysis conducted on the Yelp dataset, focusing on profiling, understanding, and deriving insights from the data. The analysis consists of two parts:
- Data Profiling and Understanding: In this part, various SQL queries were used to profile the dataset by finding total records, distinct records, checking for null values, analyzing specific columns, and exploring distributions.
- Inferences and Analysis: This part involves conducting specific analyses on the dataset, such as grouping businesses, comparing open and closed businesses, and conducting predictive analysis.
In the Data Profiling and Understanding part, the following tasks were performed:
- Profiled the data by finding the total number of records for each table.
- Found the total distinct records by foreign key or primary key for each table.
- Checked for columns with null values in the Users table.
- Analyzed specific columns to find the smallest, largest, and average values for selected fields.
- Listed cities with the most reviews in descending order.
- Found the distribution of star ratings for businesses in selected cities.
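The profiling tasks above can be sketched with a few aggregate queries. This is a minimal, hedged illustration rather than the queries actually used: the table names (`business`, `users`), the column names (`id`, `city`, `stars`, `review_count`, `name`), and all rows are hypothetical, and the SQL is run through Python's built-in `sqlite3` module against an in-memory database so the example is self-contained.

```python
import sqlite3

# Hypothetical toy schema and rows; the real Yelp tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id TEXT PRIMARY KEY, city TEXT, stars REAL, review_count INTEGER);
CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, review_count INTEGER);
INSERT INTO business VALUES
  ('b1', 'Las Vegas', 4.0, 120),
  ('b2', 'Las Vegas', 3.5, 80),
  ('b3', 'Phoenix',   4.0, 50),
  ('b4', 'Phoenix',   2.0, 10);
INSERT INTO users VALUES
  ('u1', 'Ana', 12),
  ('u2', NULL,  3),
  ('u3', 'Raj', 40);
""")

def query(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Total records and distinct primary keys for one table.
total_records = query("SELECT COUNT(*) FROM business")[0][0]
distinct_keys = query("SELECT COUNT(DISTINCT id) FROM business")[0][0]

# Null check on one column of the users table.
null_names = query("SELECT COUNT(*) FROM users WHERE name IS NULL")[0][0]

# Smallest, largest, and average value of a selected field.
review_stats = query(
    "SELECT MIN(review_count), MAX(review_count), AVG(review_count) FROM users"
)[0]

# Cities ranked by total review count, highest first.
cities_by_reviews = query("""
    SELECT city, SUM(review_count) AS reviews
    FROM business
    GROUP BY city
    ORDER BY reviews DESC
""")

# Distribution of star ratings for businesses in one selected city.
star_distribution = query("""
    SELECT stars, COUNT(*) AS businesses
    FROM business
    WHERE city = 'Phoenix'
    GROUP BY stars
    ORDER BY stars
""")
```

Comparing `total_records` with `distinct_keys` is a quick way to spot duplicate keys: when the two counts match, every row has a unique identifier.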
In the Inferences and Analysis part, the following analyses were conducted:
- Grouped businesses based on open and closed status, and compared differences.
- Conducted predictive analysis to calculate the average star rating for each business based on historical ratings.
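Both analyses above reduce to grouped aggregates, which the sketch below illustrates. It is an assumption-laden example, not the original code: the `business` and `review` tables, the `is_open` flag (1 for open, 0 for closed), the `business_id` column, and all rows are hypothetical, and the SQL again runs through Python's `sqlite3` module on an in-memory database.

```python
import sqlite3

# Hypothetical toy schema and rows; the real Yelp tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (id TEXT PRIMARY KEY, is_open INTEGER, stars REAL, review_count INTEGER);
CREATE TABLE review (id TEXT PRIMARY KEY, business_id TEXT, stars INTEGER);
INSERT INTO business VALUES
  ('b1', 1, 4.0, 100),
  ('b2', 1, 3.0, 60),
  ('b3', 0, 2.5, 20);
INSERT INTO review VALUES
  ('r1', 'b1', 5), ('r2', 'b1', 3),
  ('r3', 'b2', 4),
  ('r4', 'b3', 2), ('r5', 'b3', 3);
""")

# Compare closed (is_open = 0) and open (is_open = 1) businesses
# by count, average star rating, and average review count.
open_vs_closed = conn.execute("""
    SELECT is_open, COUNT(*), AVG(stars), AVG(review_count)
    FROM business
    GROUP BY is_open
    ORDER BY is_open
""").fetchall()

# Average historical star rating per business, computed from its reviews.
avg_stars = conn.execute("""
    SELECT business_id, AVG(stars)
    FROM review
    GROUP BY business_id
    ORDER BY business_id
""").fetchall()
```

A per-business historical average is only a simple baseline for what a future rating might be; it ignores trends over time and the number of reviews behind each average.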
The analysis shows how reviews are distributed across cities, how businesses differ by star rating, and how open and closed businesses compare. The per-business average of historical ratings also serves as a simple baseline for expected future ratings.
For detailed SQL queries and analysis results, please refer to the code provided in the respective sections of this README.