Probability distributions can model many kinds of event sequences, so they can be used to simulate events and system behavior, or even to predict user actions. Different distributions suit different questions. In this project, several questions about the dataset are answered using probability distributions.
The dataset consists of user actions on an online market, with the following fields:
- Action: Click/load
- Time
- Event ID
- Device ID
- Page Offset
- Post List
- Post ID
1- Probability of clicking on the ad?
Clicking on an ad can be modeled with a Bernoulli distribution: for each ad shown, the user either clicks or does not.
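As a minimal sketch of the Bernoulli model, the click probability can be estimated as the fraction of trials that ended in a click. The function name `bernoulli_mle` and the 0/1 encoding of outcomes are assumptions made for illustration; how a "trial" maps onto the Click/load records in the dataset is not specified here.

```python
from typing import Iterable


def bernoulli_mle(outcomes: Iterable[int]) -> float:
    """Estimate the Bernoulli parameter p as clicks / trials.

    `outcomes` is assumed to hold one value per displayed ad:
    1 if the ad was clicked, 0 if it was not (hypothetical encoding).
    """
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


# Illustrative data only: 10 displayed ads, 2 of them clicked.
sample = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
p_hat = bernoulli_mle(sample)
print(f"estimated click probability: {p_hat:.2f}")
```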
2- The rank of the first ad that is clicked?
The rank of the first clicked ad can be modeled with the geometric distribution, which counts the number of trials up to and including the first success.
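The sketch below illustrates the geometric model under the simplifying assumption that every ad has the same click probability p, independent of its rank. The function names and the list of first-click ranks are hypothetical and not taken from the project code.

```python
from typing import Iterable


def geometric_pmf(k: int, p: float) -> float:
    """P(first click is on the ad at rank k) = (1 - p)^(k - 1) * p, k >= 1."""
    if k < 1:
        raise ValueError("rank must be at least 1")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return (1.0 - p) ** (k - 1) * p


def geometric_mle(first_click_ranks: Iterable[int]) -> float:
    """Estimate p as 1 / (mean rank of the first clicked ad)."""
    ranks = list(first_click_ranks)
    if not ranks or min(ranks) < 1:
        raise ValueError("need at least one rank, and all ranks must be >= 1")
    return len(ranks) / sum(ranks)


# Illustrative data only: rank of the first clicked ad in four page loads.
ranks = [1, 2, 3, 2]
p_hat = geometric_mle(ranks)
print(f"estimated p: {p_hat}")
print(f"P(first click at rank 2): {geometric_pmf(2, p_hat)}")
```

The constant-p assumption is a strong one for ranked lists, so comparing the fitted probabilities with the observed rank frequencies would be a sensible check.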
3- Average distance between clicks?
To calculate the average distance between a user's clicks, the exponential distribution is a good fit. It models the distance between consecutive events that occur independently at a constant average rate, such as the gap between clicks.
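A minimal sketch of this estimate follows, assuming that "distance" means the time gap between consecutive clicks and that click times are available as numbers (for example, seconds). The function name `mean_gap_and_rate` and the sample timestamps are hypothetical.

```python
from typing import Iterable, Tuple


def mean_gap_and_rate(click_times: Iterable[float]) -> Tuple[float, float]:
    """Return (mean gap between consecutive clicks, exponential rate).

    For an exponential distribution the maximum-likelihood rate is
    1 / (mean gap), so the mean gap is also the fitted expected distance.
    """
    times = sorted(click_times)
    if len(times) < 2:
        raise ValueError("need at least two clicks to measure a gap")
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if mean_gap == 0:
        raise ValueError("all clicks share the same timestamp")
    return mean_gap, 1.0 / mean_gap


# Illustrative data only: click times in seconds for one device.
mean_gap, rate = mean_gap_and_rate([0, 10, 30, 60])
print(f"mean gap: {mean_gap} s, rate: {rate} clicks per second")
```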
4- The probability of clicking on the first 3 ads?
To calculate the probability that the first click falls on one of the first 3 results, the cumulative geometric distribution is suitable: P(K <= 3) = 1 - (1 - p)^3, where p is the per-ad click probability.
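The cumulative probability can be sketched as follows. The function name `geometric_cdf` and the value p = 0.2 are assumptions chosen only for illustration; the closed form is checked against the sum of the individual rank probabilities.

```python
def geometric_cdf(k: int, p: float) -> float:
    """P(first click occurs within the first k ads) = 1 - (1 - p)^k."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return 1.0 - (1.0 - p) ** k


p = 0.2  # hypothetical per-ad click probability
within_first_three = geometric_cdf(3, p)

# Same quantity as a sum of P(K = 1) + P(K = 2) + P(K = 3).
summed = sum((1.0 - p) ** (rank - 1) * p for rank in range(1, 4))

print(f"P(first click within first 3 ads): {within_first_three:.3f}")
```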
Some exploratory analysis of the dataset is presented below to give a better understanding of it. In addition, a Sankey diagram is used to show the dataset's characteristics.
The Sankey diagram with Plotly:
The HTML file is here.