# Build
```shell
mix deps.get
mix escript.build
```
# Run
```shell
./triads_extractor -i path/to/input.csv -o path/to/output.txt
```
# Scrape
Tweets are scraped with Twint. Since Twint needs a keyword, I have to split the scraping into multiple processes.
- Install Twint
  ```shell
  pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint
  ```
- Scrape Thai tweets (as many as you want)
  ```shell
  # Scrape by geolocation
  twint -g="13.736717,100.523186,500km" --csv -o tweet-geo-thailand.csv --lang th

  # Scrape until date
  export DATE=2020-10-02 ; twint -g="13.736717,100.523186,500km" --csv -o tweet-geo-thailand-$DATE.csv --lang th --until $DATE
  export DATE=2021-01-01 ; twint -g="13.736717,100.523186,500km" --csv -o tweet-geo-thailand-$DATE.csv --lang th --until $DATE

  # Scrape @sugree tweets!
  twint -u=sugree --csv -o tweet-sugree.csv
  ```
- Combine & remove duplicates
  ```shell
  # Merge all scraped CSVs; sort -u keeps a single copy of identical lines
  cat tweet-*.csv | sort -r -u > tweet-combined-uniq.csv
  ```
- Clean up gibberish data
- Output as triads (JSON/CSV) with frequencies
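The per-date scraping commands above can also run side by side, one background process per `--until` date, which is one way to do the split into multiple processes. Below is a minimal bash sketch. The function name `scrape_until_dates` and the `TWINT` override (for example `TWINT=echo` for a dry run) are hypothetical. The Twint flags are the same ones used above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Start one twint process per --until date, all in parallel.
# TWINT is a hypothetical override (e.g. TWINT=echo for a dry run);
# by default the real `twint` CLI is called.
scrape_until_dates() {
  local geo="13.736717,100.523186,500km"  # same location/radius as above
  local date
  for date in "$@"; do
    # Each date writes to its own file, so parallel processes never share one.
    "${TWINT:-twint}" -g="$geo" --csv -o "tweet-geo-thailand-$date.csv" \
      --lang th --until "$date" &
  done
  wait  # block until every background scrape has exited
}
```

For example, `scrape_until_dates 2020-10-02 2021-01-01` starts both scrapes at once. A bare `wait` always returns 0, so it does not report whether an individual scrape failed. If that matters, check the output files or wait on each PID separately.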
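`sort -u` only drops lines that are byte-for-byte identical. If the same tweet appears in two overlapping scrapes, fields such as the like and retweet counts can change between runs, so both copies survive. The following sketch keeps the first row per tweet id instead. It assumes that the tweet id is the first CSV column and that no field contains an embedded newline. The function name `dedupe_by_id` is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Merge CSV files, keeping only the first row seen for each tweet id.
# Assumptions: the id is the first column, and no field contains a newline
# (awk reads line by line, so a multi-line field would be split).
dedupe_by_id() {
  # -F, splits on commas; a numeric id in column 1 never contains one.
  # seen[$1]++ evaluates to 0 (false) the first time an id appears, so
  # !seen[$1]++ prints each id's first row exactly once. The header row
  # ("id,...") repeats in every file and is likewise printed only once.
  awk -F, '!seen[$1]++' "$@"
}
```

Usage: `dedupe_by_id tweet-geo-*.csv tweet-sugree.csv > combined-uniq.csv`. Unlike the `sort` pipeline, `awk` keeps rows in input order. Writing to a name outside the `tweet-*.csv` glob also keeps a rerun from reading its own previous output back in.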