
CS5540_PrinciplesOfBigDataManagement


1. Tweets Collection

Collected tweets using Tweepy and stored them in a text file.
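
The storage step can be sketched roughly as below. This is a minimal illustration, not the project's actual collector: it assumes each tweet is already available as a Python dict (for example, the JSON payload returned by the Twitter API) and that the text file holds one JSON object per line. The helper names `save_tweets` and `load_tweet_texts` and the `text` field are hypothetical, and the Tweepy calls that fetch the tweets are left out.

```python
import json
from pathlib import Path
from typing import Iterable, List


def save_tweets(tweets: Iterable[dict], path: str) -> int:
    """Append each tweet as one JSON object per line; return how many were written.

    Assumption: the one-JSON-object-per-line layout is chosen for this sketch;
    the README does not state the file format the project used.
    """
    count = 0
    with Path(path).open("a", encoding="utf-8") as out:
        for tweet in tweets:
            # ensure_ascii=False keeps emoji and non-Latin text readable in the file
            out.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_tweet_texts(path: str, field: str = "text") -> List[str]:
    """Read the file back and return the given field of every tweet that has it."""
    texts = []
    with Path(path).open(encoding="utf-8") as src:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if field in record:
                texts.append(record[field])
    return texts
```

A line-per-tweet layout is convenient here because later steps can read the file line by line without parsing it as a whole.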

2. IntelliJ - Spark

  • Stored the collected tweets in HDFS

  • Ran the word count in Scala-Spark

  • Ran SQL queries to analyze the data
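
To show what the word-count step computes, here is a small local stand-in written in plain Python rather than Scala-Spark. It is only a sketch under assumptions: the tokenization rule (lowercase, with hashtags and mentions kept whole) and the name `word_count` are hypothetical and not taken from the project. The comments map each step to its usual Spark counterpart.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed tokenization: a word, optionally prefixed by '#' or '@'
TOKEN = re.compile(r"[#@]?\w+")


def word_count(lines: Iterable[str]) -> Counter:
    """Count how often each token occurs across all lines."""
    counts: Counter = Counter()
    for line in lines:
        # Comparable to flatMap: split every line into lowercase tokens
        for token in TOKEN.findall(line.lower()):
            # Comparable to map + reduceByKey: add 1 to the token's running total
            counts[token] += 1
    return counts
```

Calling `word_count(["Big data #Spark", "big #spark rocks"])` gives a count of 2 for both `big` and `#spark`, and 1 for `data` and `rocks`. On a cluster, the same logic would run over the tweet file stored in HDFS instead of an in-memory list.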

3. Apache Zeppelin to visualize the results

Apache Zeppelin Installation

  • Download the binary package with “all” interpreters and extract it locally.

  • In a Windows command prompt, go to the bin directory and run zeppelin.cmd to start the Zeppelin server.

  • Once the server has started, go to http://localhost:8080, click the “Notebook” drop-down, and select “Create new note”.

4. Project Demo

https://youtu.be/-MdhgmMMEiw

5. Project report and procedure

https://github.com/Ruthvicp/CS5540_PrinciplesOfBigDataManagement/blob/master/Phase2/Phase2_Report_RPMGZ.pdf