Next generation pipeline #4340
Conversation
I've done some benchmarking, parsing Apache logs with the cloud benchmarker and indexing them into ES. The results are pretty interesting. The upshot is that, for this benchmark, the ng_pipeline was 28.67% faster than master. This is a preliminary benchmark of just Apache log file parsing (with grok / geoip / useragent) using the cloud benchmarker; I anticipate other benchmarks will behave differently. The comments below discuss its performance on this task. To start with, the new pipeline seems quite a bit faster, BUT the tuning is all different. There are three parameters to tune the new pipeline (see the flag sketch below).
The key takeaways were:
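A minimal sketch (not from the PR itself) of those three tuning parameters as command-line flags, assuming the -w / -b / -u flag names that shipped alongside the new pipeline; apache.conf is a placeholder config name:

# -w / --pipeline-workers      number of parallel filter+output worker threads
# -b / --pipeline-batch-size   events each worker pulls from the queue per batch
# -u / --pipeline-batch-delay  ms to wait for a batch to fill before flushing it
bin/logstash -f apache.conf -w 8 -b 1000 -u 5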
I should mention that, watching these benchmarks in htop, it seemed that the classic pipeline didn't saturate the cores as effectively and used more … This was partially measured in other benchmarks in #4254 by running the two pipelines within the …
I just completed a round of tests & benchmarks using 2 different configs and sets of inputs. I will put my results in a table, but first I wanted to report a problem I encountered using many workers and a bigger batch size, so that others can see if they can reproduce it. I am seeing the following error output and getting various output delays (output completely stops for seconds).
The config & data used are in https://gist.github.com/colinsurprenant/8143a0dfd30830d57b03, and the command line where I started noticing these errors is $ while true; do cat syslog.dat; done | bin/logstash -w 8 -b 8000 -f syslog.conf using …
@andrewvc your most recent patch (the while loop timer) LGTM, but I haven't tested it.
Force-pushed from 2ae204a to c02dd53
@andrewvc what was the issue with the task?
Force-pushed from c02dd53 to 77c1ee0
@andrewvc perfect! I much prefer this simpler flushing implementation, plus it uses … I haven't been able to reproduce the error I reported above. Before I post my benchmark results: I realize that the high-workers + bigger-batch runs might have been slowed down by heap memory pressure, so I will re-run them and post the results!
Thanks @colinsurprenant! In retrospect, the benchmarks I ran above probably saw declining numbers at larger batch sizes due to memory pressure more than CPU. That said, I think I'll hold off on running further benchmarks until a future date, since the main point is that the new pipeline is much more efficient than the old one given similar resources.
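A hedged sketch of how the memory-pressure hypothesis could be re-checked, reusing the LS_HEAP_SIZE variable from the benchmark command further down; the heap sizes are illustrative and each line is meant as a separate run:

# Same workload at high -w / -b, small heap vs. big heap; if the big-heap run is
# markedly faster, the earlier slowdown was memory pressure rather than CPU.
while true; do cat syslog.dat; done | LS_HEAP_SIZE=1g bin/logstash -w 8 -b 8000 -f syslog.conf
while true; do cat syslog.dat; done | LS_HEAP_SIZE=4g bin/logstash -w 8 -b 8000 -f syslog.conf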
def busy_workers
  @worker_queue.size
end
Whoa, cleaner. +1
Here are my results. The benchmark command was:
$ yes '{"foo":"testfoo","bar":"testbar"}' | LS_HEAP_SIZE=4g bin/logstash -w X -b Y -f bench.conf | pv -Wlart > /dev/null
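For readability, a hedged expansion of that pipeline with the X / Y placeholders filled in with example values (not the values behind the reported numbers):

# yes ...        emits the same JSON line forever, feeding the stdin input / json codec
# LS_HEAP_SIZE   sets the Logstash JVM heap
# -w 8 -b 1000   example worker count and batch size under test
# pv -Wlart      reports line throughput: -W wait for first byte, -l line mode,
#                -a average rate, -r current rate, -t elapsed time
yes '{"foo":"testfoo","bar":"testbar"}' \
  | LS_HEAP_SIZE=4g bin/logstash -w 8 -b 1000 -f bench.conf \
  | pv -Wlart > /dev/null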
LGTM
Did another run, LGTM. Nice work @andrewvc!
Thanks for putting in so much time reviewing and finding bugs in this PR, @colinsurprenant and @ph!
One observation: as we can see, there isn't much of a difference from the previous performance numbers; for the same workers and default batch size, the ng-pipeline is slightly faster. I will not post all the results, but I performed a few similar tests using the config at https://gist.github.com/colinsurprenant/8143a0dfd30830d57b03 and the performance gain is higher, for example from 5.5k/s on master to 6.6k/s with the ng-pipeline, which is a lot more significant. In the previous benchmarks the input side had more to do because of the JSON parsing by the codec, and that input is not parallelized, so I think the input was the actual choke point. In the other tests the input is a straight stdin line input and everything is done in the filters, so the hard work lands in the filter stage, which benefits from the ng batching (see the side-by-side sketch below).
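A hedged side-by-side of the two feeding styles being compared, assembled from the commands already shown in this thread (the -w / -b values are illustrative):

# Input-bound case: the single input thread also does the JSON parsing via the codec,
# so it becomes the choke point and the parallel filter workers gain less.
yes '{"foo":"testfoo","bar":"testbar"}' | bin/logstash -w 8 -b 1000 -f bench.conf

# Filter-bound case: plain lines go straight through the stdin line input and the
# work happens in the filters, where the ng batching and parallelism pay off more.
while true; do cat syslog.dat; done | bin/logstash -w 8 -b 1000 -f syslog.conf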
This is predicated on the fact that, with the ng_pipeline, workers are expected to spend a significant amount of time in iowait due to outputs like the Elasticsearch output. In benchmarks based on real-world Apache log files, the best performance came from scenarios where pipeline_workers > num_cpu_cores. Setting the default to the number of cores is a defensive decision that should handle cases where users have particularly IO-heavy inputs. For most users we should recommend raising the number of workers until performance starts to decrease.
Previous benchmark information: #4340
Fixes #4414
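A hedged illustration of that recommendation; nproc, the 2x multiplier, and pipeline.conf are assumptions for the sketch, not anything prescribed by the commit:

# Default is defensive: one pipeline worker per CPU core.
CORES=$(nproc)
bin/logstash -f pipeline.conf -w "$CORES"
# For iowait-bound outputs (e.g. Elasticsearch), oversubscribing can help; raise -w
# step by step and keep the highest value before throughput starts to drop.
bin/logstash -f pipeline.conf -w "$((CORES * 2))"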
This replaces #4254, which got just a bit too big. It's squashed and rebased off master.