Skip to content

This repository contains all materials from my term project of CPE393-Text Analytics. The purpose of this project is to display a heatmap of location which has the road accident from JS100 official account's tweet.

Notifications You must be signed in to change notification settings

PaweekornS/JS100

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 

Repository files navigation

JS100

This repository contains all materials from my term project, CPE393-Text Analytics. The purpose of this project is to display a heatmap of locations with road accidents from the JS100 official account's tweets.

Web scraping

We have done 3 approaches for scraping the data from twitter:

  1. Twitter API v2 (fail): can call the API for only 15 mins so, we won't get enough data for our task
  2. NTScraper (fail): can't use the Thai account to scrape data
  3. Selenium (not a legal way but work!)

Extracting Location

When we did the data exploration, we separated our dataset into 2 types which are hashtag and none-hashtag data. We used different approaches to extract the locations of each kind:

  1. Hashtag data
  • Almost of the tweets contains the location at the second position of # e.g. #อุบัติเหตุ ... #. From these insight, we decided to use a regular expression (regex) for extracting the location.
  1. None-hashtag data
  • using thaiNER2.0 model from pythaiNLP package for extracting the location

Data modelling and EDA

  • Top 5 tags from JS100 tweet: #อุบัตเหตุ, #รถติด, #JS100, #ข่าวPNC, #JS100
  • Top 5 accident locations from our extracted result: ถนนกาญจนาภิเษก, ถนนพหลโยธิน, ถนนเทพรัตน, ถนนจรัญสนิทวงศ์, ถนนพระราม2

Heatmap creation

  • using geopy package to extract the geo-coordinates from our extracted location
  • using folium package for plotting the geo-coordiantes into the map image

Team members

Paweekorn Soratyathorn
Sawit Koseeyaumporn
Mawin Srichat
Pichawat Adulwiitayakorn

About

This repository contains all materials from my term project of CPE393-Text Analytics. The purpose of this project is to display a heatmap of location which has the road accident from JS100 official account's tweet.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published