On large infrastructures we would like to store and process only the data we need. Suppose we have a cluster of Apache servers under heavy load, and from their access logs we need only the number of hits per interval and the response time.
Suppose our servers are processing more than 3 million hits/hour (across 10 servers) and we need only 4 metrics (hits, plus average, max and p90 response time).
With one summary per minute we would store only 4 x 10 x 60 = 2400 metrics/hour instead of 3 million inserts carrying a lot of unneeded data.
We can use Telegraf and logparser as the base for this work; it would also be interesting as a way to get log processing on Windows systems.
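As a quick check of the arithmetic above, here is a small Python sketch. It assumes one summary per server per minute and counts the four summarized values mentioned (hits plus average, max and p90 response time); the variable names are only illustrative.

```python
# Rough estimate of stored data points per hour: raw events vs. summaries.
servers = 10
intervals_per_hour = 60          # assumption: one summary per minute
summary_fields = ["hits", "rt_avg", "rt_max", "rt_p90"]
raw_events_per_hour = 3_000_000  # figure taken from the example above

# One point per field, per server, per interval.
summarized_points = len(summary_fields) * servers * intervals_per_hour
reduction = raw_events_per_hour / summarized_points

print(summarized_points)  # points stored per hour when summarizing
print(reduction)          # how many times fewer inserts than raw events
```

Grouping by an extra tag (for example the HTTP response code) would multiply the number of points by the number of distinct tag values, which is still far below the raw event count.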
Feature Request
We would like a config option for each file that switches the behaviour from "send all events" to "send only summarized data", together with options for the kind of summarization, how to group the data, and what to send.
Proposal:
The configuration could look something like this:
[[inputs.logparser]]
# files should be an array of "id"-"filename"
files = [
["8080","/var/log/httpd/access8080.log"],
["80","/var/log/httpd/access80.log"],
["443","/var/log/httpd/access443.log"]
]
from_beginning = false
[inputs.logparser.grok]
custom_patterns = '''
APACHE_LOG_WITH_RT %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-)
%{COMMONAPACHELOG} %{QS:referrer} %{QS:agent}
%{INT:rt}
'''
# inputs.logparser.(id)
[inputs.logparser."8080"]
send_all_events = true
# nothing more to configure here
# inputs.logparser.(id)
[inputs.logparser."80"]
send_all_events = false
id_tag = "log_port"
# match.(grok field).(regex filter)
[match."rawrequest"."/some/url.*[a-zA-Z]$"]
# measurement where the grouped data is written
extratags = { url = "myurl", othertag = "valuetag" }
measurement = "http_stats"
groupby_grok_field = "response"
groupby_tag = "httpcode"
[field."hits_x_interval"]
# summarize_type should be one of "counter", "sum", "max", "min", "avg", "percentile(N)"
summarize_type = "counter"
summarize_grok_field = "any"
[field."rt_avg"]
summarize_type = "avg"
summarize_grok_field = "rt"
[field."rt_max"]
summarize_type = "max"
summarize_grok_field = "rt"
[field."rt_p90"]
summarize_type = "percentile(90)"
summarize_grok_field = "rt"
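To make the intended summarization concrete, here is a minimal Python sketch of what the options above could compute for one interval. This is not Telegraf code: the function names `summarize` and `percentile` are hypothetical, the events are assumed to be already parsed by grok into dictionaries, and the nearest-rank percentile is just one possible definition.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (p between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(events, group_field="response", value_field="rt"):
    """Group one interval's events by group_field and summarize value_field."""
    groups = defaultdict(list)
    for event in events:
        groups[event[group_field]].append(event[value_field])
    return {
        key: {
            "hits_x_interval": len(rts),     # summarize_type "counter"
            "rt_avg": sum(rts) / len(rts),   # summarize_type "avg"
            "rt_max": max(rts),              # summarize_type "max"
            "rt_p90": percentile(rts, 90),   # summarize_type "percentile(90)"
        }
        for key, rts in groups.items()
    }


# Hypothetical parsed events collected during one interval.
events = [
    {"response": "200", "rt": 100},
    {"response": "200", "rt": 300},
    {"response": "200", "rt": 200},
    {"response": "404", "rt": 50},
]
summary = summarize(events)
print(summary)
```

Each key of `summary` would become one point in the `http_stats` measurement, tagged with `httpcode`, instead of one point per log line.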
Desired behavior:
With this config we would get data in the form measurement [fields] tags, as follows:
We will be doing a generic solution for this for all plugins, not just the logparser. See #1419. You may also be able to do some of this already using the pass/drop filters?
Directions
This kind of summarization is already possible with collectd and its Tail plugin:
https://collectd.org/wiki/index.php/Plugin:Tail
or with collectd and the apachelog plugin:
https://github.com/toni-moreno/collectd-apachelog-plugin
What do you think?