CS5540_PrinciplesOfBigDataManagement

1. Tweets Collection

Collected tweets using Tweepy and stored them in a text file
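
As a rough illustration of the storage step, the sketch below appends tweets to a text file as one JSON object per line. It is not the project's collection script: `save_tweets`, the file name `tweets.txt`, and the sample records are hypothetical, and the Tweepy call that would supply the tweets is left out because its API differs between Tweepy versions.

```python
import json
from typing import Iterable


def save_tweets(tweets: Iterable[dict], path: str) -> int:
    """Append each tweet as one JSON object per line; return how many were written."""
    written = 0
    # Append mode, so repeated collection runs add to the same file.
    with open(path, "a", encoding="utf-8") as out:
        for tweet in tweets:
            # ensure_ascii=False keeps emoji and accented text readable in the file.
            out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical sample records; in the project these would come from Tweepy.
    sample = [
        {"id": 1, "text": "Learning #Spark today"},
        {"id": 2, "text": "Big data, big fun"},
    ]
    save_tweets(sample, "tweets.txt")
```

Writing one record per line keeps the file easy to load line by line later, which suits line-oriented tools such as Spark's text file reader.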

2. IntelliJ - Spark

Stored the collected tweets in HDFS
Ran the word count in Scala on Spark
Ran SQL queries to analyze the data
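
The project ran its word count in Scala on Spark, and that job is not reproduced here. As a hedged illustration of the same split-and-count logic, here is a plain-Python stand-in that runs locally without Spark. It assumes the text file holds one JSON tweet per line with a `text` field, which is an assumption about the file layout; `tweet_texts` and `word_count` are hypothetical names.

```python
import json
import re
from collections import Counter
from typing import Iterable, Iterator

# A word is a run of letters, digits, or underscores, optionally led by # or @
# so that hashtags and mentions are counted as their own tokens.
WORD = re.compile(r"[#@]?\w+")


def tweet_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each tweet, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        text = tweet.get("text") if isinstance(tweet, dict) else None
        if isinstance(text, str) and text:
            yield text


def word_count(lines: Iterable[str]) -> Counter:
    """Count lower-cased words across all tweet texts."""
    counts: Counter = Counter()
    for text in tweet_texts(lines):
        counts.update(WORD.findall(text.lower()))
    return counts


if __name__ == "__main__":
    # tweets.txt is a hypothetical file name for the collected tweets.
    with open("tweets.txt", encoding="utf-8") as f:
        for word, n in word_count(f).most_common(10):
            print(word, n)
```

On Spark the same idea is usually expressed as a split into words followed by a per-word sum; this local version is only meant for checking the logic on a small sample before running the cluster job.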

3. Apache Zeppelin to visualize the results

Apache Zeppelin Installation

Download the binary package with “all” interpreters and extract it locally.
On Windows, open a command prompt in the bin directory and run zeppelin.cmd to start the Zeppelin server.
Once the server has started, go to http://localhost:8080, open the “Notebook” drop-down, and create a new note.

4. Project Demo

https://youtu.be/-MdhgmMMEiw

5. Project report and procedure can be found at

https://github.com/Ruthvicp/CS5540_PrinciplesOfBigDataManagement/blob/master/Phase2/Phase2_Report_RPMGZ.pdf