- Install Hadoop, HBase, Kafka, Flume, and Spark with Homebrew.
- Write the configuration files:
  - Kafka: zookeeper.properties, server.properties
  - Flume: streaming_project_kafka.conf
  - Hadoop: core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml, yarn-site.xml (a minimal example follows this list)
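For reference, a minimal pseudo-distributed (single-node) sketch of two of these Hadoop files is shown below; the host/port and property values are assumptions for a local setup, not necessarily this project's actual settings.

```xml
<!-- core-site.xml: point the default filesystem at a local HDFS NameNode
     (hdfs://localhost:9000 is an assumed address) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: one replica per block is enough on a single node -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```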
- Use crontab to generate streaming logs (detailed steps below).
- Go into the Logs_Generation folder, make the generator script executable, and tail the log to verify output:
chmod u+x log_generator.sh
tail -200f access.log
- Use crontab to run the log generator every minute:
crontab -e
*/1 * * * * sh /Users/..../log_generator.sh
:wq
- To edit, list, or remove cron entries:
crontab -e
crontab -l
crontab -r
- Edit the file "streaming_project_kafka.conf"
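The file in this repository is the source of truth; as a rough sketch only, an exec → memory → Kafka agent whose name matches the --name exec-memory-kafka used below could look like this (the access.log path, capacities, and batch size are assumptions):

```properties
# Agent name must match the --name argument passed to flume-ng below
exec-memory-kafka.sources = exec-source
exec-memory-kafka.channels = memory-channel
exec-memory-kafka.sinks = kafka-sink

# Exec source: tail the generated access log (path is an example)
exec-memory-kafka.sources.exec-source.type = exec
exec-memory-kafka.sources.exec-source.command = tail -F /Users/..../access.log
exec-memory-kafka.sources.exec-source.shell = /bin/sh -c

# Memory channel buffering events between source and sink
exec-memory-kafka.channels.memory-channel.type = memory
exec-memory-kafka.channels.memory-channel.capacity = 10000
exec-memory-kafka.channels.memory-channel.transactionCapacity = 1000

# Kafka sink: publish each log line to the topic created in the next step
exec-memory-kafka.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
exec-memory-kafka.sinks.kafka-sink.kafka.bootstrap.servers = localhost:9092
exec-memory-kafka.sinks.kafka-sink.kafka.topic = kafka_streaming_topic
exec-memory-kafka.sinks.kafka-sink.kafka.flumeBatchSize = 20

# Wire source and sink to the channel
exec-memory-kafka.sources.exec-source.channels = memory-channel
exec-memory-kafka.sinks.kafka-sink.channel = memory-channel
```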
- Start ZooKeeper and Kafka, then create the Kafka topic:
zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties &
kafka-server-start -daemon /usr/local/etc/kafka/server.properties &
kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kafka_streaming_topic
- Go into the Flume installation folder and start the agent:
bin/flume-ng agent -Xms1000m -Xmx1000m --conf conf --conf-file conf_files/streaming_project_kafka.conf --name exec-memory-kafka -Dflume.root.logger=INFO,console
- Verify the data is coming through with a Kafka console consumer:
kafka-console-consumer --bootstrap-server localhost:9092 --topic kafka_streaming_topic --from-beginning
- Start the Hadoop daemons:
cd /usr/local/Cellar/hadoop/3.2.1_1/sbin
./start-all.sh
- Start HBase and open the HBase shell:
cd /usr/local/Cellar/hbase/1.3.5/bin
./start-hbase.sh
./hbase shell
- Create the result table with a single column family 'info':
create 'language_search_count', 'info'
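An optional sanity check before streaming data in, using standard HBase shell commands:
list
describe 'language_search_count'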
- Use Spark Streaming to consume the Kafka topic and save the results to HBase: run LogsStreaming in the Logs_Analysis folder in IntelliJ (a rough sketch of such a job is shown below).
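The real LogsStreaming code lives in Logs_Analysis; purely as an illustration, a Spark Streaming job (Kafka 0.10 direct stream) that counts keywords per batch and increments counters in the 'language_search_count' table could be sketched as follows. The class name, consumer group, batch interval, log-line format, and row-key/qualifier choices are assumptions, not the project's actual code.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object LogsStreamingSketch {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("LogsStreamingSketch")
    val ssc = new StreamingContext(sparkConf, Seconds(10)) // 10s micro-batches (assumption)

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "logs_streaming_group",       // assumed consumer group
      "auto.offset.reset"  -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Subscribe to the topic created earlier
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("kafka_streaming_topic"), kafkaParams))

    // Assume each log line carries the searched keyword/language as its last
    // whitespace-separated field; count occurrences per batch.
    val counts = stream
      .map(record => record.value())
      .map(line => line.trim.split("\\s+").last)
      .map(keyword => (keyword, 1L))
      .reduceByKey(_ + _)

    // Write each batch's partial counts into HBase by incrementing a counter
    // column, so totals accumulate across batches.
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val hbaseConf = HBaseConfiguration.create() // reads hbase-site.xml from the classpath
        val connection = ConnectionFactory.createConnection(hbaseConf)
        val table = connection.getTable(TableName.valueOf("language_search_count"))
        partition.foreach { case (keyword, count) =>
          table.incrementColumnValue(
            Bytes.toBytes(keyword),  // row key: the keyword itself (assumption)
            Bytes.toBytes("info"),   // column family created above
            Bytes.toBytes("count"),  // qualifier (assumption)
            count)
        }
        table.close()
        connection.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```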
- Run WebVisualizationApplication in the Web_Visualization folder in IntelliJ.
- Open http://localhost:9999/gatech/echarts in a browser to view the charts.