Cluster COVID-19 in Jakarta

K-Means Clustering Method

Data source: https://corona.jakarta.go.id/id

Data retrieved on 10 November 2020 (10:00 GMT+7)

Cluster Analysis in R

Install Packages and Load the Libraries

install.packages("tidyverse")   # data manipulation
install.packages("cluster")     # clustering algorithms, including clusGap() for the gap statistic
install.packages("factoextra")  # fviz_nbclust(), fviz_gap_stat(), fviz_cluster() for choosing k and plotting

library(tidyverse)
library(cluster)
library(factoextra)

Import the dataset

dataset <- read.csv(file.choose())  # opens a file dialog to pick the CSV file

Remove rows with missing values and summarize the data

dataset <- na.omit(dataset)
summary(dataset)
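Before dropping rows, it can help to see how many missing values each column has, so you know how much data `na.omit()` removes. A minimal sketch on a small hypothetical data frame (`toy` and its columns are made up for illustration, not taken from the real CSV):

```r
# Hypothetical toy data standing in for the real CSV
toy <- data.frame(
  subdistrict = c("A", "B", "C", "D"),
  Suspect     = c(10, NA, 7, 3),
  Positive    = c(5, 2, NA, 1)
)

# Number of missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# na.omit() keeps only rows with no missing value in any column
toy_complete <- na.omit(toy)
cat("Rows before:", nrow(toy), " after:", nrow(toy_complete), "\n")
```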

Select the variables for clustering

dataCovid <- data.frame(dataset[2:7])  # columns 2 to 7 hold the six case-count variables
head(dataCovid)

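Standardize the data, so that variables with larger counts do not dominate the Euclidean distances used by k-means

dataCovidNew <- scale(dataCovid)  # z-scores: each column gets mean 0 and standard deviation 1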
head(dataCovidNew)
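`scale()` computes z-scores, (x - mean) / sd, column by column. The short check below compares it with the manual formula on hypothetical numbers (the data frame `x` is illustrative only):

```r
# Hypothetical two-column example
x <- data.frame(Suspect = c(2, 4, 6, 8), Positive = c(10, 20, 30, 100))

z_scale  <- scale(x)
# Manual z-scores: subtract the column mean, divide by the column standard deviation
z_manual <- sapply(x, function(col) (col - mean(col)) / sd(col))

print(all.equal(as.vector(z_scale), as.vector(z_manual)))
print(round(colMeans(z_scale), 10))  # each column now has mean 0
print(apply(z_scale, 2, sd))         # and standard deviation 1
```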

Find the Optimal k

Elbow Method

fviz_nbclust(dataCovidNew, kmeans, method = "wss")
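With `method = "wss"`, `fviz_nbclust()` plots the total within-cluster sum of squares against k. You can compute the same curve directly with `kmeans()`. This sketch uses the built-in `USArrests` data as a stand-in for `dataCovidNew`, and `nstart = 25` is an assumed setting:

```r
# Stand-in data: the built-in USArrests set, standardized like dataCovidNew
x <- scale(USArrests)

set.seed(123)
k_values <- 1:10
# Total within-cluster sum of squares for each k
wss <- sapply(k_values, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

plot(k_values, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")
```

The "elbow" is the k after which adding clusters stops reducing the within-cluster sum of squares by much.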

Silhouette Method

fviz_nbclust(dataCovidNew, kmeans, method = "silhouette")
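The silhouette method picks the k with the largest average silhouette width. For each k, that width can be computed with `cluster::silhouette()`. Again, `USArrests` stands in for `dataCovidNew`:

```r
library(cluster)

x <- scale(USArrests)  # stand-in for dataCovidNew
d <- dist(x)           # Euclidean distances between observations

set.seed(123)
# Silhouette width is undefined for k = 1, so start at k = 2
avg_sil <- sapply(2:10, function(k) {
  km  <- kmeans(x, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, d)
  mean(sil[, "sil_width"])  # average silhouette width for this k
})
names(avg_sil) <- 2:10

print(round(avg_sil, 3))
best_k <- as.integer(names(which.max(avg_sil)))
cat("k with the largest average silhouette width:", best_k, "\n")
```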

Gap Statistic Method

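set.seed(484)  # fix the random number generator so the gap statistic is reproducible
gap_stat <- clusGap(dataCovidNew, FUN = kmeans, nstart = 25, K.max = 10, B = 150)  # k = 1..10, 150 reference data sets; nstart is passed on to kmeans()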
fviz_gap_stat(gap_stat)
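By default, `fviz_gap_stat()` marks the k chosen by `cluster::maxSE()` with the "firstSEmax" rule. That rule picks the smallest k whose gap value is within one standard error of the first local maximum of the gap curve. The same choice can be read numerically from the `clusGap()` result. This sketch uses `USArrests` as stand-in data and a smaller `B = 50` only to keep it fast:

```r
library(cluster)

x <- scale(USArrests)  # stand-in for dataCovidNew

set.seed(484)
# B = 50 reference data sets instead of 150, only to keep the example quick
gap <- clusGap(x, FUNcluster = kmeans, nstart = 25, K.max = 10, B = 50)

tab   <- gap$Tab  # one row per k; columns include "gap" and "SE.sim"
k_opt <- maxSE(tab[, "gap"], tab[, "SE.sim"], method = "firstSEmax", SE.factor = 1)

print(round(tab[, c("gap", "SE.sim")], 3))
cat("k chosen by the firstSEmax rule:", k_opt, "\n")
```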

Chosen number of clusters: k = 3

K-Means Clustering

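set.seed(4848)  # fix the random initial centers so the result is reproducible
covidCluster <- kmeans(dataCovidNew, centers = 3)  # k = 3, default nstart = 1 (a single random start)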
print(covidCluster)
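`print(covidCluster)` reports the cluster centers on the standardized scale. To describe each cluster in the original case counts, you can attach the cluster labels to the unscaled data and average within each cluster. This sketch uses `USArrests` as a stand-in for `dataCovid`. `nstart = 25` is an assumption here, because the code above uses a single start: several random starts make the result less dependent on the initial centers.

```r
x_raw <- USArrests     # stand-in for dataCovid (unscaled)
x     <- scale(x_raw)  # stand-in for dataCovidNew

set.seed(4848)
km <- kmeans(x, centers = 3, nstart = 25)

print(km$size)  # number of observations in each cluster

# Mean of each original (unscaled) variable within each cluster
profile <- aggregate(x_raw, by = list(cluster = km$cluster), FUN = mean)
print(profile)
```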

Cluster Visualization

fviz_cluster(covidCluster, data = dataCovidNew)
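With six variables, `fviz_cluster()` cannot plot the data directly. Instead, it projects the observations onto the first two principal components and colours them by cluster. Below is a base-R sketch of that projection with `prcomp()`, using `USArrests` as stand-in data:

```r
x <- scale(USArrests)  # stand-in for dataCovidNew
set.seed(4848)
km <- kmeans(x, centers = 3, nstart = 25)

pca    <- prcomp(x)     # data are already centered and scaled
scores <- pca$x[, 1:2]  # coordinates on PC1 and PC2
var_explained <- summary(pca)$importance["Proportion of Variance", 1:2]

plot(scores, col = km$cluster, pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
legend("topright", legend = paste("Cluster", 1:3), col = 1:3, pch = 19)
```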

Adding Jakarta Maps (coming soon)

Conclusion

Summary of the clusters (cluster centers on the standardized scale)

Cluster	Suspect	Probable	Travel Person	Direct Contact	Discarded	Positive
1	0.2015917	0.0936914	-0.04231536	0.1538247	0.08338454	0.2365707
2	1.8271435	0.9853687	1.19529068	1.9311284	1.48988598	1.7963892
3	-0.8958647	-0.4592895	-0.30962376	-0.8528384	-0.60196562	-0.9416806
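The values in this table are in standard-deviation units. For example, 1.83 means 1.83 standard deviations above the mean across all sub-districts. To convert them back to case counts, use the mean and standard deviation that `scale()` stores as attributes. This sketch uses `USArrests` as stand-in data:

```r
x_raw <- USArrests     # stand-in for dataCovid
x     <- scale(x_raw)  # stand-in for dataCovidNew

set.seed(4848)
km <- kmeans(x, centers = 3, nstart = 25)

mu    <- attr(x, "scaled:center")  # column means used by scale()
sigma <- attr(x, "scaled:scale")   # column standard deviations used by scale()

# Undo the standardization column by column: center * sd + mean
centers_raw <- sweep(sweep(km$centers, 2, sigma, "*"), 2, mu, "+")
print(round(centers_raw, 2))
```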

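There are 3 clusters (Low, Mid, High). The labels follow the cluster centers: cluster 2 lies well above the average on every variable, cluster 3 lies below it, and cluster 1 is close to it.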

1st Cluster: 146 sub-districts (Mid)
2nd Cluster: 29 sub-districts (High)
3rd Cluster: 92 sub-districts (Low)
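The labels can also be assigned programmatically from the summary table. This sketch ranks clusters by the mean of their standardized centers; that ranking rule is an assumption, since the labels above may have been assigned by inspection. It also checks consistency: because the data were standardized, the size-weighted average of each center column should be close to 0.

```r
# Cluster centers on the standardized scale, copied from the summary table above
centers <- rbind(
  c( 0.2015917,  0.0936914, -0.04231536,  0.1538247,  0.08338454,  0.2365707),
  c( 1.8271435,  0.9853687,  1.19529068,  1.9311284,  1.48988598,  1.7963892),
  c(-0.8958647, -0.4592895, -0.30962376, -0.8528384, -0.60196562, -0.9416806)
)
colnames(centers) <- c("Suspect", "Probable", "TravelPerson",
                       "DirectContact", "Discarded", "Positive")
sizes <- c(146, 29, 92)  # sub-districts per cluster

# Rank clusters by their mean standardized center (lowest = "Low")
score  <- rowMeans(centers)
labels <- c("Low", "Mid", "High")[rank(score)]
print(data.frame(cluster = 1:3, score = round(score, 3), label = labels, size = sizes))

# Size-weighted mean of each standardized column; should be close to 0
print(round(colSums(centers * sizes) / sum(sizes), 6))
```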

Stay at home and stay healthy, everyone!

Repository: afifadayu/covid-19