Ruthvicp/CS5540_PrinciplesOfBigDataManagement

CS5540_PrinciplesOfBigDataManagement

1. Tweets Collection

Collected tweets using Tweepy and stored them in a text file.
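The storage half of this step can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual collector: the function names `save_tweets` and `load_tweets` are hypothetical, the one-JSON-object-per-line layout is an assumed file format, and the tweets are taken as plain dictionaries so that the sketch runs without API credentials or a client library.

```python
import json
from pathlib import Path


def save_tweets(tweets, path):
    """Append each tweet as one JSON object per line; return the number written.

    `tweets` is any iterable of dicts (hypothetical stand-in for API results).
    """
    count = 0
    with Path(path).open("a", encoding="utf-8") as fh:
        for tweet in tweets:
            # ensure_ascii=False keeps emoji and accented text readable in the file.
            fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            count += 1
    return count


def load_tweets(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A line-oriented file like this is convenient for the later steps, since tools that read text files line by line can treat each line as one record.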

2. IntelliJ - Spark

  • Stored the collected tweets in HDFS

  • Ran a word count in Scala with Spark

  • Ran SQL queries to analyze the data
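The two analysis steps above can be illustrated with a small local sketch. It is not the project's Scala/Spark code: plain Python stands in for Spark's word count, and the standard-library `sqlite3` module stands in for Spark SQL. The function names, the `author` and `text` fields, and the tokenizing pattern are all assumptions made for the example.

```python
import re
import sqlite3
from collections import Counter


def word_count(lines):
    """Count lowercase tokens across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Same shape as a Spark word count:
        # flatMap(split into words) -> map((word, 1)) -> reduceByKey(add).
        counts.update(re.findall(r"[a-z0-9_#@']+", line.lower()))
    return counts


def tweets_per_author(tweets):
    """Return (author, tweet_count) rows, most active first, using SQL GROUP BY.

    `tweets` is an iterable of dicts with hypothetical keys "author" and "text".
    """
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a Spark SQL table
    try:
        conn.execute("CREATE TABLE tweets (author TEXT, text TEXT)")
        conn.executemany(
            "INSERT INTO tweets (author, text) VALUES (?, ?)",
            [(t["author"], t["text"]) for t in tweets],
        )
        return conn.execute(
            "SELECT author, COUNT(*) AS n FROM tweets "
            "GROUP BY author ORDER BY n DESC, author ASC"
        ).fetchall()
    finally:
        conn.close()
```

The pattern keeps `#` and `@` inside tokens so that hashtags and mentions are counted as their own words, which is usually what a tweet analysis wants.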

3. Apache Zeppelin to visualize the results

Apache Zeppelin Installation

  • Download the binary package with “all” interpreters and extract it locally.

  • In a Windows command prompt, go to the bin folder and run zeppelin.cmd to start the Zeppelin server.

  • Once the server has started, go to http://localhost:8080, open the “Notebook” drop-down, and select “Create new note”.

4. Project Demo

https://youtu.be/-MdhgmMMEiw

5. Project report and procedure can be found at

https://github.com/Ruthvicp/CS5540_PrinciplesOfBigDataManagement/blob/master/Phase2/Phase2_Report_RPMGZ.pdf

About

A system to collect, analyze and visualize the tweets
