feat: reworked varnish_cache plugin #9432
// Find the most recent 'VBE.reload_' prefix using string compare
// 'VBE.reload_20210623_170621_31083'
func findActiveReloadPrefix(countersJSON map[string]interface{}) string {
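For readers without the full diff, here is a minimal sketch of the string-compare idea. It is a hypothetical re-implementation (only the function name and signature come from the diff): it scans the counter names for the "VBE.reload_" marker and keeps the prefix that sorts highest, relying on the zero-padded timestamp embedded in reload names like the one in the comment above. A VCL loaded manually under a custom name never contains the marker, so it can never be selected; that is the limitation raised in the reply that follows.

```go
package main

import (
	"fmt"
	"strings"
)

// findActiveReloadPrefix returns the highest-sorting "VBE.reload_<ts>_<pid>"
// prefix found among the counter names (the map keys). The timestamp part is
// zero-padded (YYYYMMDD_HHMMSS), so for different reload times string order
// matches chronological order. Counters of VCLs with any other name are ignored.
func findActiveReloadPrefix(countersJSON map[string]interface{}) string {
	const marker = "VBE.reload_"
	best := ""
	for name := range countersJSON {
		if !strings.HasPrefix(name, marker) {
			continue
		}
		// The VCL part of the name ends at the first '.' after the marker.
		end := strings.IndexByte(name[len(marker):], '.')
		if end < 0 {
			continue
		}
		if prefix := name[:len(marker)+end]; prefix > best {
			best = prefix
		}
	}
	return best
}

func main() {
	counters := map[string]interface{}{
		"VBE.reload_20210623_170621_31083.default.happy": 1.0,
		"VBE.reload_20210624_090000_41000.default.happy": 1.0,
		"VBE.boot.default.happy":                         0.0,
	}
	fmt.Println(findActiveReloadPrefix(counters))
}
```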
This approach only works if you reload the service; if you load a VCL manually with a custom name, it doesn't work.
I reworked this PR: it is now part of the varnish plugin, and the varnishadm tool is used to get the active VCL.
@ismaelpuerto is this concern resolved?
Hi @rhajek, thanks for the PR! It's a significant change to the varnish plugin. I've left review comments with suggestions and some questions; it would be great if you could respond to or resolve them.
Are we trying to support a specific version of varnish with this plugin, or is it cross-version compatible?
There's also a test failure in CircleCI that needs to be resolved:
FAIL: TestJsonTypes (0.00s)
varnish_test.go:617:
Error Trace: varnish_test.go:617
Error: Not equal:
expected: float64(123.45)
actual : <nil>(<nil>)
Test: TestJsonTypes
FAIL
FAIL github.com/influxdata/telegraf/plugins/inputs/varnish
Has this been tested by someone who was experiencing the cardinality issue?
@ismaelpuerto have you had a chance to test this and see whether it works when you load a VCL manually? It would also be great if we could check that this is backwards compatible when the metric version is set to 1. I would expect this change in behavior (removal of non-active VCLs) to happen only if a user configures the plugin to use metric version 2.
plugins/inputs/varnish/varnish.go (outdated)
@@ -139,6 +244,16 @@ func (s *Varnish) Gather(acc telegraf.Accumulator) error {
continue
}

//skip not active vcls
vMetric := parseMetricV2(stat)
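The intent of the "skip not active vcls" step, together with stripping the active VCL name discussed in the replies, can be sketched as a small filter. This is a hypothetical helper, not the PR's code; it assumes backend counters are named "VBE.<vcl>.<backend>.<field>" and that the active VCL name is already known (the PR obtains it with varnishadm vcl.list).

```go
package main

import (
	"fmt"
	"strings"
)

// filterActiveVCL drops "VBE.<vcl>.<backend>.<field>" counters that belong to a
// non-active VCL and removes the VCL name from those of the active VCL, so a
// reload does not create new series. Other counters are returned unchanged.
func filterActiveVCL(name, activeVCL string) (string, bool) {
	parts := strings.SplitN(name, ".", 3)
	if len(parts) < 3 || parts[0] != "VBE" {
		return name, true
	}
	if parts[1] != activeVCL {
		return "", false // counter of an inactive (e.g. previously loaded) VCL
	}
	return parts[0] + "." + parts[2], true
}

func main() {
	for _, n := range []string{
		"MAIN.uptime",
		"VBE.boot.default.happy",
		"VBE.reload_20210623_170621_31083.default.happy",
	} {
		if out, keep := filterActiveVCL(n, "boot"); keep {
			fmt.Println(out)
		}
	}
}
```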
Why is parseMetricV2 called inside the function that processes v1 metrics? I would expect metric version 1 to keep backwards compatibility for our users (i.e., not removing non-active VCLs).
My goal was also to fix https://github.com/influxdata/EAR/issues/2289 and strip the active VCL name from the field name for v1 metrics. If we want v1 to stay fully backward compatible, we should revert this and also remove the unnecessary varnishadm vcl.list check.
I understand, but we try to keep backwards compatibility, since until now non-active VCLs were reported. The approach is to give users the option to opt in to the new behavior rather than changing behavior when they upgrade Telegraf.
Thanks for this MP. A couple of small inline comments and questions, and one higher-level question:
Can you explain why version 2 metrics include the id tag while version 1 metrics omit it? This seems to greatly increase cardinality, even on my little system running varnish + apache2 with the default configs. See my comment in the parseMetricV2 function.
Given that one purpose of this MP is to reduce cardinality, this seems to do the opposite, at least in the run described above.
Thanks!
edit: updated link to version 2 metrics
plugins/inputs/varnish/varnish.go (outdated)
//check vcl.list output
if jsonOut[0].(float64) != 2 {
return "", fmt.Errorf("unsupported varnishadm format %v", jsonOut[0])
}
Are you aware of any docs on when and how this version value changes? Even though it is 2 right now, it will be a breaking surprise when it does change.
Not required, just something for next time: if it is relatively static, it might have been nice to unmarshal this into an object, as you do below with vclStruct, to make the references below easier to follow.
The only documentation that I found:
- https://varnish-cache.org/docs/6.6/reference/varnish-cli.html#json
- https://varnish-cache.org/docs/6.6/reference/varnish-cli.html#vcl-list-j
Version "2" has been unchanged since 6.0.3, released on 9 Feb 2019.
Maybe checking for the presence of "vcl.list", "-j" is enough and we can ignore the version.
Unmarshaling a JSON array like this
[ 2, ["vcl.list", "-j"], 1631878235.913,
{
"status": "active",
"state": "auto",
"temperature": "warm",
"busy": 0,
"name": "boot"
}
]
into an object is definitely better; I will try to fix this.
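A minimal sketch of that suggestion, assuming only the array layout shown above (version, command, timestamp, then one object per VCL). The type and function names are hypothetical; the command header is checked instead of the version number, as proposed in the previous reply. Decoding into []json.RawMessage first lets each array element be decoded into its own type.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// vclEntry mirrors one VCL object from "varnishadm vcl.list -j".
type vclEntry struct {
	Status      string `json:"status"`
	State       string `json:"state"`
	Temperature string `json:"temperature"`
	Busy        int    `json:"busy"`
	Name        string `json:"name"`
}

// activeVCLName returns the name of the VCL whose status is "active".
func activeVCLName(out []byte) (string, error) {
	var raw []json.RawMessage
	if err := json.Unmarshal(out, &raw); err != nil {
		return "", err
	}
	if len(raw) < 3 {
		return "", errors.New("vcl.list -j output too short")
	}
	// Element 1 is the command line, e.g. ["vcl.list", "-j"].
	var cmd []string
	if err := json.Unmarshal(raw[1], &cmd); err != nil || len(cmd) == 0 || cmd[0] != "vcl.list" {
		return "", fmt.Errorf("unexpected command header %s", raw[1])
	}
	// Elements 3.. are the VCL objects.
	for _, r := range raw[3:] {
		var v vclEntry
		if err := json.Unmarshal(r, &v); err != nil {
			return "", err
		}
		if v.Status == "active" {
			return v.Name, nil
		}
	}
	return "", errors.New("no active VCL in vcl.list output")
}

func main() {
	out := []byte(`[ 2, ["vcl.list", "-j"], 1631878235.913,
  {"status": "active", "state": "auto", "temperature": "warm", "busy": 0, "name": "boot"}
]`)
	name, err := activeVCLName(out)
	fmt.Println(name, err)
}
```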
metric.fieldName = val
} else if val != "" {
metric.tags[sub] = val
}
This is where id is getting added as a tag, and why I am asking at a high level why we are increasing the cardinality on that value. Is that something users want to index against?
It seems concerning to add anything that doesn't match _vcl or _field as a tag, as any change to the data format is going to cause cardinality to increase.
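The mapping rule described here (only _vcl and _field are special; every other non-empty named group becomes a tag) can be illustrated with a short sketch. The metric type, parseName and the VBE rule below are hypothetical, not the plugin's actual code or defaults. The sketch discards the _vcl value, so a reload's timestamped VCL name never reaches a tag or field name, while the extra backend group shows how any additional named group turns into a tag.

```go
package main

import (
	"fmt"
	"regexp"
)

// metric is a hypothetical stand-in for the plugin's parsed metric name.
type metric struct {
	fieldName string
	tags      map[string]string
}

// vbeRule is a hypothetical rule for "VBE.<vcl>.<backend>.<field>" counters.
var vbeRule = regexp.MustCompile(`^VBE\.(?P<_vcl>[\w\-]*)\.(?P<backend>[\w\-]*)\.(?P<_field>[\w\-]*)$`)

// parseName maps named groups: "_field" becomes the field name, "_vcl" is
// discarded, and any other non-empty named group becomes a tag.
func parseName(re *regexp.Regexp, name string) (metric, bool) {
	match := re.FindStringSubmatch(name)
	if match == nil {
		return metric{}, false
	}
	m := metric{tags: map[string]string{}}
	for i, sub := range re.SubexpNames() {
		val := match[i]
		switch {
		case sub == "" || sub == "_vcl":
			// whole match or unnamed group, or the VCL name we drop
		case sub == "_field":
			m.fieldName = val
		case val != "":
			m.tags[sub] = val
		}
	}
	return m, true
}

func main() {
	m, ok := parseName(vbeRule, "VBE.reload_20210623_170621_31083.default.bereq_hdrbytes")
	fmt.Println(ok, m.fieldName, m.tags)
}
```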
Thanks for the review. The id tag is added by the id named group in the last default regexp rule:
//generic metric like MSE_STORE.store-1-1.g_aio_running_bytes_write
regexp.MustCompile(`([\w\-]*)\.(?P<id>[\w\-.]*)\.([\w\-]*)`),
My idea was not to increase the cardinality, but to make the field names correspond with https://varnish-cache.org/docs/trunk/reference/varnish-counters.html
The cardinality issue was caused by varnish adding timestamps to its metric names when reloading. This should be solved by stripping the VCL name from the metric name.
Example: we have the following varnish metrics:
"MEMPOOL.busyobj.allocs"
"MEMPOOL.ssl_buf.allocs"
"MEMPOOL.req0.allocs"
"MEMPOOL.sess0.allocs"
"MEMPOOL.req1.allocs"
"MEMPOOL.sess1.allocs"
The middle part identifies the name of the resource. There is a limited number of these, so it is suitable for tagging/indexing.
If we remove the id tag and move the middle part into the field name, we reduce the number of tags but increase the number of fields. I think the cardinality is the same in both cases:
varnish,section=MEMPOOL,id=busyobj alloc=xxxx
varnish,section=MEMPOOL,id=ssl_buf alloc=xxxx
varnish,section=MEMPOOL,id=req0 alloc=xxxx
varnish,section=MEMPOOL busyobj.alloc=xxxx
varnish,section=MEMPOOL ssl_buf.alloc=xxxx
varnish,section=MEMPOOL req0.alloc=xxxx
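Both layouts above can be derived from the generic default rule quoted earlier. The sketch below (hypothetical helper names; the placeholder value 1 stands in for the real counter value) applies that regexp and prints each counter both ways. Because the id group is greedy and may contain dots, the field is always the text after the last dot, and a two-part name such as MAIN.uptime does not match this rule at all.

```go
package main

import (
	"fmt"
	"regexp"
)

// genericRule is the last default rule quoted in the comment above.
var genericRule = regexp.MustCompile(`([\w\-]*)\.(?P<id>[\w\-.]*)\.([\w\-]*)`)

// splitCounter returns section, id and field for names like "MEMPOOL.busyobj.allocs".
func splitCounter(name string) (section, id, field string, ok bool) {
	m := genericRule.FindStringSubmatch(name)
	if m == nil {
		return "", "", "", false
	}
	return m[1], m[genericRule.SubexpIndex("id")], m[3], true
}

func main() {
	for _, name := range []string{"MEMPOOL.busyobj.allocs", "MEMPOOL.req0.allocs"} {
		s, id, f, ok := splitCounter(name)
		if !ok {
			continue
		}
		// Layout 1: id as a tag.
		fmt.Printf("varnish,section=%s,id=%s %s=1\n", s, id, f)
		// Layout 2: id folded into the field name.
		fmt.Printf("varnish,section=%s %s.%s=1\n", s, id, f)
	}
}
```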
The biggest disadvantage I see is that this approach will not be backward compatible with v1.
Varnish metrics can have various names; I tried to cover that in 'plugins/inputs/varnish/varnish_test.go:441'
@helenosheaa @ismaelpuerto @anthosz So should we remove the id tag and add the middle part to the field name?
If there's a static number of values that will be set as id=tagvalue, I don't think it will be as much of a cardinality problem as we were seeing with the VCL name, where fields included timestamps or IPs and so were constantly increasing.
However, what is the advantage of splitting it out into a tag? Unless there is added value, I think the more consistency between v1 and v2, the better.
That said, I believe the build from this PR is being tested, so shall we make a decision on this once we get feedback from that?
Splitting varnish metric names into tags can be useful when creating dashboards. Representing a newly added resource as a tag looks more natural to me; it will automatically show up in the dashboard if you use generic queries and filters.
But maybe it is more important to keep v2 as backward compatible as possible and not break existing things.
Users can always use custom regexps in the Telegraf configuration to create custom tag splitting if needed.
Looks like new artifacts were built from this PR.
Hey @ismaelpuerto, would you mind testing this build with the two different metric versions?
Hello, I just tested with both versions and it looks fine. For this test I used varnish-plus-6.0.8r4 and tested metrics like kvstore, vha and accounting. For these metrics it may make sense to have a tag with the value of <namespace>, but I understand this is a special case with a small audience and not a priority right now. Regarding tags, having only "section" is fine for me; I don't need the VCL name as a tag, because the "n_vcl" metric tells us whether a new VCL was loaded.
To add a namespace tag for accounting metrics, you can add the following regexp to the Telegraf config. Should I include this rule in the defaults?
[[inputs.varnish]]
I think instead of adding it to the defaults, since it's potentially a small audience right now, just add it as an example in the README so people can add it if they want to.
@popey - the approver is unclear
@ivankudibal @rhajek lgtm, however we are just waiting for feedback from the customer on the change before we would merge this.
@helenosheaa can we get the artifacts rebuilt again? Without them I cannot test the changes.
@rhajek can you rebase on master and push, please? Artifacts are only available for 30 days, and there are some folks who want to test this PR. Updating the PR with a rebase should trigger the tests and package builds again. If users feel comfortable building the Go binary themselves, they can also do the following:
The above will produce a binary for use on the system it was run on (i.e. if you build on
@powersj I had to upgrade Go to do it, but it worked, and I will use my compiled binary to test it.
@rhajek I still see plenty of varnishstat (varnish-6.0.6 revision 29a1a8243dbef3d973aec28dc90403188c1dc8e7)
Do we have to add some special configuration to get rid of these multiple fields?
📦 Looks like new artifacts were built from this PR.
Hi @rhajek, thank you very much for the work. I just tested with today's artifacts and it seems good: I only have the active VCLs. I have one remark concerning the "unhealthy" field: I cannot find it via varnishadm/varnishstat, and it's always set to zero (I stopped the backend and the instance became sick on the Varnish side, but the unhealthy value stays at 0).
Another thing concerning custom_arguments (https://github.com/influxdata/telegraf/blob/db40a348a3dbefafdf20557d221522d3751d141d/plugins/inputs/varnish/README.md#custom-arguments):
I fixed the README.md instructions on how to set up a custom instance name when using custom arguments.
📦 Looks like new artifacts were built from this PR.
Co-authored-by: Helen Weller <38860767+helenosheaa@users.noreply.github.com>
64a8449 to c901824
📦 Looks like new artifacts were built from this PR.
Dear @sspaink, do you know when it will be released? Best regards,
It will be included in the next feature release, which is scheduled for March 16th. If you need it sooner, I recommend using a nightly release artifact: https://github.com/influxdata/telegraf/blob/master/docs/NIGHTLIES.md
In the future, we are hoping to move to monthly releases so features are released sooner: https://community.influxdata.com/t/request-for-community-feedback-rcs-monthy-releases/23703
Hi @sspaink, I don't see any new release since the 16th (I prefer to use a "stable" version). Do you have an estimate? Best regards,
The plan is to start the release tomorrow; we decided to hold off a little so we could land some other SNMP changes.
@anthosz v1.22.0 has been released: https://github.com/influxdata/telegraf/releases/tag/v1.22.0 !
@sspaink thank you! \o/
Good job, everyone, on this PR; everything works well :)
Resolves #5622 #6832
Added a reworked version of the VarnishCache input plugin based on JSON parsing of varnishstat -j. The conversion of Varnish stats to metrics is reworked: it solves the high-cardinality issue when the Varnish server is reloaded, parses backends into tags, and supports "VBE.*" metrics.