This guide walks through setting up a sample Data Prepper pipeline for log ingestion.
It covers the steps required to create a simple log ingestion pipeline from
Fluent Bit → Data Prepper → OpenSearch. This log ingestion flow is shown in the diagram below.
The flow uses the following components:
- An OpenSearch domain running through Docker
- A FluentBit agent running through Docker
- Data Prepper, which includes a log_pipeline.yaml
- An Apache Log Generator in the form of a Python script
- Take a look at the docker-compose.yaml. This docker-compose.yaml will pull the FluentBit and OpenSearch Docker images and run them in the data-prepper_opensearch-net Docker network.
- Now take a look at the fluent-bit.conf. This config tells FluentBit to tail the /var/log/test.log file for logs and to use the FluentBit http output plugin to forward these logs to the http source of Data Prepper, which runs by default on port 2021. The fluent-bit.conf file is mounted as a Docker volume through the docker-compose.yaml.
- An empty file named test.log has been created. This file is also mounted through the docker-compose.yaml, and it is the file FluentBit tails to collect logs.
- Now that you understand a bit more about how FluentBit and OpenSearch are set up, run them with:
docker-compose --project-name data-prepper up
- Pull down the latest Data Prepper Docker image.
docker pull opensearchproject/data-prepper:2
- Take a look at log_pipeline.yaml. This configuration takes logs sent to the http source, processes them with the Grok Processor by matching against the COMMONAPACHELOG pattern, and sends the processed logs to a local OpenSearch sink, in an index named apache_logs.
- Run the Data Prepper Docker image with the log_pipeline.yaml from the previous step passed in. This command attaches the Data Prepper Docker container to the Docker network data-prepper_opensearch-net so that FluentBit is able to send logs to the http source of Data Prepper.
Run the following to start Data Prepper:
docker run --name data-prepper -v ${PWD}/log_pipeline.yaml:/usr/share/data-prepper/pipelines/log_pipeline.yaml --network "data-prepper_opensearch-net" opensearchproject/data-prepper:2
If Data Prepper is running correctly, you should see something similar to the following line as the latest output in your terminal.
INFO org.opensearch.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
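To make the Grok step in log_pipeline.yaml more concrete, the sketch below approximates in Python what matching against COMMONAPACHELOG extracts from one log line. It is an illustration only: the regular expression is a simplified stand-in, not the actual Grok pattern definition, and the helper name parse_common_log is hypothetical. The field names follow the ones the classic Grok pattern is generally documented to produce (clientip, verb, request, response, bytes, and so on).

```python
import re

# Simplified approximation of the Apache Common Log Format.
# Assumption: this is NOT the real Grok definition, only a readable stand-in.
APACHE_COMMON = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common_log(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = APACHE_COMMON.match(line)
    return match.groupdict() if match else None


sample = ('63.173.168.120 - - [04/Nov/2021:15:07:25 -0500] '
          '"GET /search/tag/list HTTP/1.0" 200 5003')
fields = parse_common_log(sample)
```

All values come back as strings here; any type conversion performed by the real pipeline is outside the scope of this sketch.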
Note that if you just want to see the log ingestion workflow in action, you can simply copy and paste some logs into the test.log file yourself without using the Python Fake Apache Log Generator.
Here is a sample batch of randomly generated Apache logs if you choose to take this route.
63.173.168.120 - - [04/Nov/2021:15:07:25 -0500] "GET /search/tag/list HTTP/1.0" 200 5003
71.52.186.114 - - [04/Nov/2021:15:07:27 -0500] "GET /search/tag/list HTTP/1.0" 200 5015
223.195.133.151 - - [04/Nov/2021:15:07:29 -0500] "GET /posts/posts/explore HTTP/1.0" 200 5049
249.189.38.1 - - [04/Nov/2021:15:07:31 -0500] "GET /app/main/posts HTTP/1.0" 200 5005
36.155.45.2 - - [04/Nov/2021:15:07:33 -0500] "GET /search/tag/list HTTP/1.0" 200 5001
4.54.90.166 - - [04/Nov/2021:15:07:35 -0500] "DELETE /wp-content HTTP/1.0" 200 4965
214.246.93.195 - - [04/Nov/2021:15:07:37 -0500] "GET /apps/cart.jsp?appID=4401 HTTP/1.0" 200 5008
72.108.181.108 - - [04/Nov/2021:15:07:39 -0500] "GET /wp-content HTTP/1.0" 200 5020
194.43.128.202 - - [04/Nov/2021:15:07:41 -0500] "GET /app/main/posts HTTP/1.0" 404 4943
14.169.135.206 - - [04/Nov/2021:15:07:43 -0500] "DELETE /wp-content HTTP/1.0" 200 4985
208.0.179.237 - - [04/Nov/2021:15:07:45 -0500] "GET /explore HTTP/1.0" 200 4953
134.29.61.53 - - [04/Nov/2021:15:07:47 -0500] "GET /explore HTTP/1.0" 200 4937
213.229.161.38 - - [04/Nov/2021:15:07:49 -0500] "PUT /posts/posts/explore HTTP/1.0" 200 5092
82.41.77.121 - - [04/Nov/2021:15:07:51 -0500] "GET /app/main/posts HTTP/1.0" 200 5016
Additionally, if you just want to test a single log, you can send it to test.log directly with:
echo '63.173.168.120 - - [04/Nov/2021:15:07:25 -0500] "GET /search/tag/list HTTP/1.0" 200 5003' >> test.log
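If you would rather script this than paste or echo lines by hand, the same append can be done from Python. This is a minimal sketch; the helper name append_log_lines is hypothetical, and it assumes the path you pass is the test.log file that FluentBit is tailing.

```python
from pathlib import Path


def append_log_lines(log_path, lines):
    """Append each line to log_path, newline-terminated; return how many were written."""
    lines = list(lines)
    with Path(log_path).open("a", encoding="utf-8") as handle:
        for line in lines:
            # Strip any trailing newline first so lines are never double-spaced.
            handle.write(line.rstrip("\n") + "\n")
    return len(lines)
```

For example, append_log_lines("test.log", [some_line]) mirrors the echo command above, since opening with mode "a" appends rather than overwrites.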
To simulate an application generating logs, this guide uses a simple Python script. The script runs only with Python 2. You can download it by running:
git clone https://github.com/graytaylor0/Fake-Apache-Log-Generator.git
Note the requirements in the README of the Apache Log Generator. You must have Python 2.7 and you must run
pip install -r requirements.txt
to install the necessary dependencies.
Run the Apache log generator Python script so that it writes an Apache log to the test.log file mounted by the docker-compose.yaml every 2 seconds:
python apache-fake-log-gen.py -n 0 -s 2 -l "CLF" -o "LOG" -f "/full/path/to/test.log"
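For readers without a Python 2.7 environment, the behavior of that command can be roughly imitated in Python 3. The sketch below is not the Fake Apache Log Generator and does not reproduce its options; the function names format_clf_line and emit_lines are hypothetical, and the output format is modeled on the sample lines shown earlier in this guide.

```python
import time
from datetime import datetime, timedelta, timezone


def format_clf_line(ip, when, method, path, status, size):
    """Format one Common Log Format line like the samples in this guide.

    `when` must be timezone-aware so that %z yields an offset such as -0500.
    %b uses the process locale; the default C locale gives English month names.
    """
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{ip} - - [{timestamp}] "{method} {path} HTTP/1.0" {status} {size}'


def emit_lines(log_path, lines, interval_seconds=2.0):
    """Append one line at a time, pausing between writes like the generator's -s 2."""
    for line in lines:
        with open(log_path, "a", encoding="utf-8") as handle:
            handle.write(line + "\n")
        time.sleep(interval_seconds)


example = format_clf_line(
    "63.173.168.120",
    datetime(2021, 11, 4, 15, 7, 25, tzinfo=timezone(timedelta(hours=-5))),
    "GET", "/search/tag/list", 200, 5003,
)
```

Reopening the file for each write keeps every line flushed to disk before the pause, which suits a file that another process is tailing.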
You should now be able to check your terminal output for FluentBit and Data Prepper to verify that they are processing logs.
The following FluentBit output means that FluentBit was able to forward logs to the Data Prepper http source:
fluent-bit | [2021/10/30 17:16:39] [ info] [output:http:http.0] host.docker.internal:2021, HTTP status=200
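To see what such a forwarded request might look like, the sketch below builds a comparable HTTP request by hand. Several details are assumptions that this guide does not confirm: the /log/ingest path (believed to be the default path of the Data Prepper http source), the "log" key (the key FluentBit's tail input normally uses for each line), and that port 2021 is reachable from wherever the script runs. The function names are hypothetical.

```python
import json
import urllib.request


def build_ingest_request(log_lines, host="localhost", port=2021):
    """Build a POST request carrying a JSON array with one object per log line.

    Assumptions: the "/log/ingest" path and the "log" key are not taken from
    this guide; adjust them to match your log_pipeline.yaml.
    """
    payload = [{"log": line} for line in log_lines]
    return urllib.request.Request(
        url=f"http://{host}:{port}/log/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it lets you inspect the URL and body before anything goes over the network.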
Finally, head into OpenSearch Dashboards (http://localhost:5601) to view your processed logs.
You will need to create an index pattern for the index provided in your log_pipeline.yaml in order to see them. You can do this by selecting the Manage menu with the gear icon at the top of the home page and then the Index Patterns menu on the left side of the page. Select the Create index pattern button and then start typing the name of the index you sent logs to in the Index pattern name field (in this guide it was apache_logs). You should see that the index pattern matches 1 source. Click Next Step and then Create index pattern. Afterward, you should be able to go to the Discover page, using the link in the menu on the left, and see your processed logs.
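As an alternative to the Dashboards UI, the document count of the index can also be checked through the OpenSearch REST API. The sketch below only builds the URL for the _count endpoint and reads the count out of a response body. The base URL https://localhost:9200 is an assumption (OpenSearch's usual default port; this guide does not state which ports the docker-compose.yaml publishes), authentication and TLS settings are deliberately left out, and both function names are hypothetical.

```python
import json


def count_url(index, base_url="https://localhost:9200"):
    """Return the _count endpoint URL for an index (base_url is an assumed default)."""
    return f"{base_url.rstrip('/')}/{index}/_count"


def extract_count(response_body):
    """Pull the document count out of a _count response body (a JSON string)."""
    return json.loads(response_body)["count"]
```

You could fetch count_url("apache_logs") with any HTTP client, supply whatever credentials your OpenSearch container requires, and pass the response text to extract_count.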