Next generation pipeline #4340

Closed
andrewvc wants to merge 2 commits into master from squashed_ng_pipeline

Conversation

andrewvc
Contributor

This replaces #4254 which got just a bit too big. It's squashed and rebased off master.

@andrewvc andrewvc mentioned this pull request Dec 11, 2015
@andrewvc
Contributor Author

I've done some benchmarking parsing apache logs with the cloud benchmarker and indexing them to ES. The results are pretty interesting. The upshot is, for this benchmark, the ng_pipeline was 28.67% faster than master.

This is a preliminary benchmark of just apache log file parsing (with grok / geoip / useragent) with the cloud benchmarker. I expect other benchmarks to behave differently. The comments below discuss its performance on this task.

To start with, the new pipeline seems quite a bit faster, BUT the tuning is all different. There are three parameters to tune the new pipeline:

  • -w number of workers
  • -b batch size
  • -u batch pop latency

The key takeaways were:

  • Have more workers than cores for either pipeline. -w 8 seemed to be the sweet spot.
  • The -u setting doesn't change much if anything. We should leave it in, just in case, but the default of 5ms works fine.
  • Tuning -b is very important! For the apache logs a larger batch size yielded significant gains. We should strongly consider size-based dynamic batches to help optimize this (see the sketch after this list). Currently it's hard to know the size of an event, but with the persistent pipeline that should be easier.
  • Adjusting master's SizedQueue size did nothing
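
For illustration, here is a minimal sketch of what size-based batching could look like (this is not the pipeline's actual code; read_batch, the caps, and the JSON-based size estimate are assumptions for the example):

require "json"

# Stop filling a batch when either an event-count cap or a cumulative
# byte cap is hit, whichever comes first.
def read_batch(queue, max_events: 125, max_bytes: 1_000_000)
  batch = []
  bytes = 0
  while batch.size < max_events && bytes < max_bytes
    event = begin
      queue.pop(true)          # non-blocking pop; raises ThreadError when empty
    rescue ThreadError
      nil
    end
    break if event.nil?
    batch << event
    bytes += event.to_json.bytesize   # rough stand-in for a real event-size measure
  end
  batch
end

# Example with a plain Ruby Queue:
q = Queue.new
1_000.times { |i| q << { "message" => "event #{i}" } }
p read_batch(q).size   # => 125, the event-count cap

The batch closes on whichever cap is hit first, so a burst of unusually large events can't inflate memory the way a fixed event count can.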

[Chart: pipeline benchmarks (Google Sheets)]

@andrewvc
Copy link
Contributor Author

I should mention that, watching these benchmarks in htop, it seemed the classic pipeline didn't saturate the cores as effectively and used more system time. I think there are some constant-time operations in it that impede thread liveness.

This was partially measured in other benchmarks in #4254 by running the two pipelines within the time command. There's probably an even better way to get at this impact.
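
For what it's worth, a rough sketch of capturing the child process's user vs. system CPU time from Ruby instead of the shell's time builtin (the command line and input file are placeholders, not the exact benchmark setup):

require "benchmark"

# Run the pipeline as a child process; Benchmark::Tms exposes the child's
# user and system CPU time via cutime and cstime.
tms = Benchmark.measure do
  system("bin/logstash -w 8 -b 500 -f apache.conf < apache_logs.txt > /dev/null")
end
puts "child user CPU:   #{'%.1f' % tms.cutime}s"
puts "child system CPU: #{'%.1f' % tms.cstime}s"
puts "wall clock:       #{'%.1f' % tms.real}s"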

@colinsurprenant
Contributor

I just completed a round of tests & benchmarks using 2 different configs and sets of inputs. I will put my results in a table, but first I wanted to report a problem I encountered using many workers and a bigger batch size, so that others can see if they can reproduce it.

I am seeing the following error output and various output delays (the output completely stops for seconds at a time):

Concurrent::TimeoutError {:class=>"Concurrent::TimeoutError", :backtrace=>nil, :level=>:warn}

The config & data used are in https://gist.github.com/colinsurprenant/8143a0dfd30830d57b03 and the command line where I started noticing these errors is:

$ while true; do cat syslog.dat; done | bin/logstash -w 8 -b 8000 -f syslog.conf

using syslog.dat and syslog.conf in the gist ^^

@jordansissel
Contributor

@andrewvc your most recent patch (the while loop timer) LGTM, but I haven't tested it.

@andrewvc andrewvc force-pushed the squashed_ng_pipeline branch 2 times, most recently from 2ae204a to c02dd53 on December 14, 2015 23:06
@ph
Contributor

ph commented Dec 15, 2015

@andrewvc what was the issue with the task?

@colinsurprenant
Contributor

@andrewvc perfect! I much prefer this simpler flushing implementation, plus it uses Stud.stoppable_sleep which is preferable for handling the shutdown sequence.
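
For reference, a minimal sketch of the kind of flush loop this enables (assuming Stud.stoppable_sleep from the stud gem; PeriodicFlusher is a made-up class for the example, not the actual patch):

require "stud/interval"

# Flush on a fixed interval, but wake early once shutdown is requested
# so the pipeline can stop promptly instead of waiting out the sleep.
class PeriodicFlusher
  def initialize(interval = 5)
    @interval = interval
    @shutdown = false
  end

  def start(&flush)
    @thread = Thread.new do
      until @shutdown
        # sleeps up to @interval seconds, checking the block every 0.25s
        Stud.stoppable_sleep(@interval, 0.25) { @shutdown }
        flush.call unless @shutdown
      end
    end
    self
  end

  def shutdown
    @shutdown = true
    @thread && @thread.join
  end
end

# flusher = PeriodicFlusher.new(5).start { pipeline.flush }  # pipeline.flush is a placeholder
# ...
# flusher.shutdown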

I haven't been able to reproduce the Concurrent::TimeoutError exception 👍

Before I post my benchmark results, I realize that the high workers + bigger batch runs might have been slowed down by heap memory pressure, so I will re-run them and post results!

@andrewvc
Contributor Author

Thanks @colinsurprenant ! In retrospect, the benchmarks I ran above probably saw declining numbers with greater batch sizes due to memory pressure rather than CPU. That said, I think I'll hold off on running further benchmarks until a later date, since the main point is that the new pipeline is much more efficient than the old one given similar resources.

    def busy_workers
      @worker_queue.size
    end
  end
end
Contributor


woa cleaner +1

@colinsurprenant
Contributor

Here are my results.

  • MBP 13r, 2.8 GHz Intel Core i7, 16GB
  • Java 8 1.8.0_45-b14
  • logstash config:
input {
  stdin {
    codec => json_lines
  }
}

filter {
  mutate {
    add_field => ["[fields][date]",  "%{+YYYY-MM-dd:HH:mm:ss}"]
    add_field => ["[fields][mixed]",  "foo %{+YYYY-MM-dd} %{foo}"]
    add_field => ["[fields][foo]",  "%{foo}"]
    add_field => ["[fields][bar]",  "%{bar}"]
    remove_field => ["foo"]
    remove_field => ["bar"]
  }
}

output {
  stdout { codec => json_lines }
}
  • base command line, adjust -w and -b per bench:
$ yes '{"foo":"testfoo","bar":"testbar"}' | LS_HEAP_SIZE=4g bin/logstash -w X -b Y -f bench.conf | pv -Wlart > /dev/null
  • results are for a 1-minute run, in average thousands of lines per second (kTPS). Note that only a single run was measured for each scenario, and there are slight variations from run to run with the same parameters.

[Screenshot: benchmark results table]

@colinsurprenant
Contributor

LGTM

@ph
Contributor

ph commented Dec 15, 2015

did another run LGTM, nice work @andrewvc

@andrewvc
Contributor Author

Thanks for putting in so much time to review and find bugs in this PR @colinsurprenant and @ph !

@elasticsearch-bot

Andrew Cholakian merged this into the following branches!

Branch: master (commits 028576b, 3a1b0f2)

@andrewvc andrewvc closed this in 028576b Dec 15, 2015
andrewvc added a commit that referenced this pull request Dec 15, 2015
@colinsurprenant
Contributor

One observation: as we can see, there isn't much of a difference from the previous performance numbers; for the same workers and default batch size the ng-pipeline is slightly faster. I will not post all the results, but I performed a few similar tests using the config at https://gist.github.com/colinsurprenant/8143a0dfd30830d57b03 and the performance gain is higher, for example from 5.5k/s on master to 6.6k/s with the ng-pipeline, which is a lot more significant.

In the previous benchmarks, the input side had more to do because of the JSON parsing done by the codec, and that input is not parallelized, so I think the input was the actual choke point. In the other tests the input is straight stdin line input and everything is done in the filters, so the hard work is effectively in the filter stage, which benefits from the ng batching.
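
To illustrate that point with plain Ruby threads (not Logstash internals): one producer feeds a bounded queue drained by several workers, and if the producer carries the expensive work (such as codec parsing) it sets the throughput ceiling no matter how many workers are added.

# A single serial producer feeding four parallel consumers via a bounded queue.
queue = SizedQueue.new(1000)

workers = 4.times.map do
  Thread.new do
    while (event = queue.pop)   # nil is the shutdown signal
      event.upcase              # stand-in for filter work
    end
  end
end

producer = Thread.new do
  100_000.times { |i| queue << "event #{i}" }   # if parsing happened here, it would be the choke point
  4.times { queue << nil }                      # tell each worker to stop
end

producer.join
workers.each(&:join)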

@suyograo suyograo mentioned this pull request Dec 16, 2015
andrewvc added a commit to andrewvc/logstash that referenced this pull request Jan 4, 2016
This is predicated on the fact that with the ng_pipeline it is expected that workers
will spend a significant amount of time in iowait due to outputs like the Elasticsearch
output. In benchmarks based on real-world Apache log files the best performance came out
of scenarios where pipeline_workers > num_cpu_cores. Setting this to default to the # of
cores is a defensive decision that should handle cases where users have particularly IO
heavy inputs.

For most users we should recommend setting the number of workers to be as high as possible
until performance decreases.

Previous benchmark information:
elastic#4340
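
For context, a minimal sketch of the default described above (assuming the concurrent-ruby gem; this is not the actual Logstash settings code):

require "concurrent"

# Default the worker count to the number of cores, but let an explicit
# -w value override it; values above the core count are allowed because
# workers are expected to block in iowait on outputs.
def default_pipeline_workers(explicit = nil)
  explicit || Concurrent.processor_count
end

puts default_pipeline_workers       # => number of cores, e.g. 8
puts default_pipeline_workers(16)   # => 16, the user opted into more workers than cores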
andrewvc added a commit to andrewvc/logstash that referenced this pull request Jan 5, 2016, with the same commit message as above

andrewvc added three commits that referenced this pull request Jan 5, 2016, each with the same commit message as above plus "Fixes #4414"