Data Sources: https://corona.jakarta.go.id/id
- Install packages and load the libraries
install.packages("tidyverse") #data manipulation
install.packages("cluster") #clustering algorithm
install.packages("factoextra") #clustering algorithm & data visualization
library(tidyverse)
library(cluster)
library(factoextra)
- Import data set
dataset <- read.csv(file.choose())
- Check for and remove missing values
dataset <- na.omit(dataset) #drop every row that contains NA
summary(dataset) #confirm that no NA values remain
- Select the X variables (columns 2 to 7)
dataCovid <- data.frame(dataset[2:7])
head(dataCovid)
- Standardize the data
dataCovidNew <- scale(dataCovid)
head(dataCovidNew)
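To make the standardization step concrete, here is a minimal sketch of what `scale()` does with its default arguments: each column is centered on its mean and divided by its sample standard deviation. The `toy` data frame and its values are made up for illustration and are not taken from the Jakarta data set.

```r
# Toy data standing in for dataCovid (values are invented for illustration)
toy <- data.frame(suspect  = c(10, 20, 30, 40),
                  positive = c(1, 3, 5, 11))

scaled <- scale(toy) #center each column to mean 0, scale to sd 1

# Manual z-score: (x - column mean) / column standard deviation
manual <- sapply(toy, function(x) (x - mean(x)) / sd(x))

# Both approaches give the same numbers
print(all.equal(as.numeric(scaled), as.numeric(manual)))

print(round(colMeans(scaled), 10)) #column means after scaling
print(apply(scaled, 2, sd))        #column standard deviations after scaling
```

Standardizing matters here because k-means uses Euclidean distance, so a column with large counts would otherwise dominate the clustering.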
- Find the optimal K
- Elbow method
fviz_nbclust(dataCovidNew, kmeans, method = "wss")
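The elbow plot is based on the total within-cluster sum of squares for each candidate K. The sketch below computes that curve by hand with base R only. The `toy` matrix is synthetic (three artificial groups), the seed is arbitrary, and `nstart = 25` is an assumption of this sketch, not a setting used in the script above.

```r
set.seed(123) #arbitrary seed, only to make this sketch reproducible

# Synthetic stand-in for dataCovidNew: three separated groups, two columns
toy <- scale(rbind(
  matrix(rnorm(60, mean = 0),  ncol = 2),
  matrix(rnorm(60, mean = 5),  ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
))

# Total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out
plot(1:8, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```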
- Silhouette method
fviz_nbclust(dataCovidNew, kmeans, method = "silhouette")
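The silhouette criterion can also be sketched by hand: for each K, run k-means and average the silhouette widths returned by `silhouette()` from the `cluster` package. The data below are synthetic and the names `toy`, `avg_sil`, and `best_k` are illustrative only. The K with the largest average width is the candidate this criterion suggests.

```r
library(cluster)

set.seed(123) #arbitrary seed, only to make this sketch reproducible

# Synthetic stand-in for dataCovidNew: three separated groups, two columns
toy <- scale(rbind(
  matrix(rnorm(60, mean = 0),  ncol = 2),
  matrix(rnorm(60, mean = 5),  ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
))
d <- dist(toy) #pairwise Euclidean distances

# Average silhouette width for k = 2..8 (not defined for k = 1)
avg_sil <- sapply(2:8, function(k) {
  km <- kmeans(toy, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- 2:8

best_k <- as.integer(names(which.max(avg_sil)))
print(avg_sil)
print(best_k)
```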
- Gap statistic method
set.seed(484) #fix the random seed so the result is reproducible
gap_stat <- clusGap(dataCovidNew, FUN = kmeans, nstart = 25, K.max = 10, B = 150)
fviz_gap_stat(gap_stat)
- K-means clustering
set.seed(4848)
covidCluster <- kmeans(dataCovidNew, 3)
print(covidCluster)
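Because k-means starts from randomly chosen centers, a single run can settle in a poor local optimum. The sketch below compares one random start with several on synthetic data. It is an optional variation shown under stated assumptions (toy data, arbitrary seed, `nstart = 25`), not the setting that produced the results reported in this README.

```r
# Synthetic stand-in for dataCovidNew: three separated groups, two columns
set.seed(123) #arbitrary seed for the toy data
toy <- scale(rbind(
  matrix(rnorm(60, mean = 0),  ncol = 2),
  matrix(rnorm(60, mean = 5),  ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
))

set.seed(4848)
single <- kmeans(toy, centers = 3) #one random start (the default)

set.seed(4848)
multi <- kmeans(toy, centers = 3, nstart = 25) #keep the best of 25 starts

# Lower total within-cluster sum of squares means a tighter solution
print(c(single = single$tot.withinss, multi = multi$tot.withinss))
```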
- Visualization of Clustering
fviz_cluster(covidCluster, data = dataCovidNew)
- Add Jakarta maps [coming soon]
Cluster | Suspect | Probable | Travel Person | Direct Contact | Discarded | Positive |
---|---|---|---|---|---|---|
1 | 0.2015917 | 0.0936914 | -0.04231536 | 0.1538247 | 0.08338454 | 0.2365707 |
2 | 1.8271435 | 0.9853687 | 1.19529068 | 1.9311284 | 1.48988598 | 1.7963892 |
3 | -0.8958647 | -0.4592895 | -0.30962376 | -0.8528384 | -0.60196562 | -0.9416806 |
1st Cluster: 146 sub-districts (Mid)
2nd Cluster: 29 sub-districts (High)
3rd Cluster: 92 sub-districts (Low)
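The cluster sizes and the standardized centers in the table above are both stored in the object returned by `kmeans()`. The sketch below shows where to find them and how to get per-cluster means on the original scale, which are often easier to interpret. The `raw` data frame is synthetic and its column names are placeholders, not the real data set.

```r
set.seed(123) #arbitrary seed, only to make this sketch reproducible

# Synthetic stand-in for dataCovid with two artificial groups
raw <- data.frame(
  suspect  = c(rnorm(30, mean = 50, sd = 5), rnorm(30, mean = 200, sd = 10)),
  positive = c(rnorm(30, mean = 20, sd = 3), rnorm(30, mean = 120, sd = 8))
)

km <- kmeans(scale(raw), centers = 2, nstart = 25)

print(km$size)    #number of rows assigned to each cluster
print(km$centers) #cluster centers in standardized units

# Per-cluster means of the unscaled columns
profile <- aggregate(raw, by = list(cluster = km$cluster), FUN = mean)
print(profile)
```

A positive standardized center means the cluster sits above the overall average for that variable, and a negative one means it sits below, which is how the Mid, High, and Low labels can be read off the table.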
Stay at home and stay healthy, everyone!