Ongoing Benchmarking of system module #20877
Comments
Pinging @elastic/integrations (Team:Integrations)
Thanks a lot!
@fearful-symmetry thanks for running these benchmark tests, they are very helpful. Looks like process has the largest share of the disk footprint, and reducing it from per-event to every 10s didn't result in much savings (29589 vs 30617 for the process metricset). I assume the per-event data footprint will depend on the process activity on the machine. Is that correct? If yes, it may be worth extending this test by running an application on the box and load testing it, to see if that makes a difference in the number of process docs. Also, it looks like the storage footprint for process (30617 docs -> 8.5 MB) is higher than for any other metricset. Are you able to share a raw process document? I am curious if there are ways to optimize the document size for the process metricset. FYI @sorantis in case you haven't seen this already.
The process metricset is set to 10s by default; the #4 benchmark was for putting every metricset on the same 10s period.
There's some variation in the process data depending on what's actually running on the machine.
Metricbeat here is running on the same server as ES, Kibana, and a Jupyter Lab instance I keep running, precisely so I could collect data in a somewhat more realistic environment.
Here's one of the chunkier documents. Depending on the process in question, we might get more fields or larger fields. I suspect the disk usage has more to do with the fact that it's sending a particularly large number of documents (5-7) per period.
From the Metrics UI perspective we don't need everything from the process metricset. Currently we're looking at the following details to expose in the UI. Generally, process information would need to be collected at a higher rate, because it's so dynamic. We should control the amount of information produced for each process, as well as the retention period for the process metricset, which I presume should be shorter than for other information.
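One way to prototype trimming the per-process payload today is the generic `drop_fields` processor scoped to the system module. This is only a sketch, and the field names below are illustrative examples rather than a recommendation; the real candidates would need to come out of looking at raw process documents:

```yaml
# Sketch only: trim per-process fields with drop_fields.
# The fields listed are hypothetical examples of what could be dropped.
- module: system
  metricsets: [process]
  period: 10s
  processors:
    - drop_fields:
        fields: ["system.process.cgroup", "system.process.cmdline"]
        ignore_missing: true
```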
@sorantis I think you're on the right track. Speaking as someone who used the stack extensively at my last job, the
Thanks again for the data @fearful-symmetry. With this, we could say that an upper bound with the current defaults is around 40 MB/day (single replica) per host.
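(For reference, that upper bound is roughly the benchmark 1 total extrapolated linearly to a full day:

$$18.5\ \text{MB} \times \frac{1440\ \text{min}}{710\ \text{min}} \approx 37.5\ \text{MB/day}$$

which rounds to about 40 MB/day per host.)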
Closing this, since we have all the data we need.
Over the weekend I've started benchmarking the disk usage of the system module in metricbeat, based on conversations with @andresrc and @mukeshelastic. We're hoping this data can better inform our decisions about where to take the system Integration in the next few releases. Keep in mind this is ongoing, and I'm running more tests as I'm writing this.
Parameters
Each test was run for 710 minutes on the same server with 64GB RAM, a 6TB main disk, and two 6-core Xeons. In addition to Metricbeat, the server was running ES and Kibana, since we're not as interested in benchmarking storage on idle servers. We might want to run some additional tests with different runtimes in the future.
Benchmark 1: All Defaults
This test is just metricbeat OOTB. Nothing changed except for connectivity settings.
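For context, the default `modules.d/system.yml` that ships with Metricbeat looks roughly like this (abridged, and the exact defaults vary a bit between versions):

```yaml
# Approximate out-of-the-box system module config (7.x-era defaults).
- module: system
  period: 10s
  metricsets:
    - cpu
    - load
    - memory
    - network
    - process
    - process_summary
    - socket_summary
  process.include_top_n:
    by_cpu: 5
    by_memory: 5

- module: system
  period: 1m
  metricsets:
    - filesystem
    - fsstat

- module: system
  period: 15m
  metricsets:
    - uptime
```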
After 710 minutes, Metricbeat had used 18.5 MB over 75096 docs:
Here's a breakdown of documents by metricset, as obtained by a terms aggregation:
Here's a nice pie chart:
Process and Network alone take up more than half of the space.
Benchmark 2: Disabling metadata processors
This test was the same as above, but I disabled the `add_*` metadata processors to see how much they're tacking on. In this case, it looks like the metadata processors added about 2.4 MB after 710 minutes.
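For reference, the default processors block in `metricbeat.yml` (the one disabled here) ships looking something like this:

```yaml
# Approximate default metadata processors in metricbeat.yml;
# disabling these is what this benchmark measured.
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
```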
Benchmark 3: Separate indices per-metricset
For this test, I disabled ILM and used the `metricset.name` field to create dynamic indices per metricset. I wanted this to be a separate test, since I wasn't sure how this would affect any on-disk compression or storage.
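A sketch of the kind of output configuration that produces per-metricset indices with ILM disabled; the hosts and index pattern here are illustrative, not the exact values used in the test:

```yaml
# Illustrative sketch: route events into one index per metricset.
setup.ilm.enabled: false

# Overriding the index name requires setting the template name/pattern explicitly.
setup.template.name: "metricbeat"
setup.template.pattern: "metricbeat-*"

output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "metricbeat-%{[metricset.name]}-%{+yyyy.MM.dd}"
```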
Or, presented in a more useful format, with size in KB:
Some of this data is pretty interesting, and suggests that there's some kind of document compression going on behind the scenes that I don't quite understand, as it looks like larger indices are compressed more efficiently. Also, the total for all indices comes out to 14.7 MB, which is a few MB larger than the benchmark in test 1. Regardless, it demonstrates that `process` and `network` are the biggest offenders by a wide margin. Keep in mind that `process` reports per-event (one document per process), and `network` per-interface.

Benchmark 4: Separate indices per-metricset, period the same across all metricsets
This is the same as benchmark 3, but all metricsets are now on a 10-second period. In the default settings, `uptime`, `fsstat`, and `filesystem` are on 15-minute and 1-minute periods.

Here's another more useful chart:
It's a bit easier to see that `process`'s outsize disk usage is down to the number of events it's sending: five per period by default. The high usage of `memory` is due to the extra metrics added on Linux.
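If the per-period document count is the main driver, one knob that already exists is the system module's `process.include_top_n` setting, which caps how many processes get reported; the stock config sets it to the top 5 by CPU and top 5 by memory, which is where the "five per period" above comes from. A sketch with tighter, arbitrary limits:

```yaml
# Sketch only: cap the number of process documents per period.
# The limits below are arbitrary examples, not a recommendation.
- module: system
  period: 10s
  metricsets: [process]
  process.include_top_n:
    enabled: true
    by_cpu: 3      # only the top 3 processes by CPU
    by_memory: 3   # plus the top 3 by memory
```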