-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(inputs.prometheus): Add internal metrics #14424
feat(inputs.prometheus): Add internal metrics #14424
Conversation
Co-authored-by: Thomas Casteleyn <thomas.casteleyn@me.com>
Co-authored-by: Thomas Casteleyn <thomas.casteleyn@me.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I would still only add this metric when enabled in the config.
Hello
If you still want conditional activation, what configuration variable? => enable_request_metrics THANKS |
No it should not be enabled by default, to keep backwards compatibility. This way users can control whether to have additional metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tguenneguez! Some small comments...
Good idea Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Good idea Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Good idea Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
Good idea Co-authored-by: Sven Rebhan <36194019+srebhan@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tguenneguez!
} | ||
} | ||
|
||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any other types of errors? What happens if there are?
} | ||
|
||
tags["result"] = resultString | ||
requestFields["result_code"] = resultCodes[resultString] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Translating the error string to a numeric value is up to the user and this should not be hard-coded.
The result string can continue to exist as a tag, but the result code field should be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand but I use the same method as here:
https://github.com/influxdata/telegraf/blob/master/plugins/inputs/http_response/http_response.go#L141
var metrics []telegraf.Metric | ||
requestFields := make(map[string]interface{}) | ||
tags := map[string]string{} | ||
u.OriginalURL.User = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tguenneguez thanks for the updates - just another ping on this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the updates - just another ping on this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the updates - just another ping on this question.
I think you meant to respond not just copy my response? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, for what line exactly have you a question ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u.OriginalURL.User = nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I was searching on line :
var metrics []telegraf.Metric
You have a keen eye ;-)
New commit
@@ -422,6 +491,7 @@ func (p *Prometheus) gatherURL(u URLAndAddress, acc telegraf.Accumulator) error | |||
|
|||
var err error | |||
var resp *http.Response | |||
start := time.Now() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be right before the client.Do and the time captured immediately after the client.Do. That way we avoid the if statements + error checking getting included in the total time.
return fmt.Errorf("error making HTTP request to %q: %w", u.URL, err) | ||
if setError(err, requestFields, tags) == nil { | ||
// Any error not recognized by `set_error` is considered a "connection_failed" | ||
setResult("connection_failed", requestFields, tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capturing the return code and response time of the request is acceptable for a new metric, however, adding all these additional error conditions is not something to start adding to all the plugins. These errors are reported via the the internal metrics already and you can you that or the lack of data as alerts.
We agreed to adding the basic support for capturing timing information. Duplicating the entire http_response plugin inside the prometheus plugin is not acceptable. See this comment and Sven's response: #13103 (comment) |
I understand your point of view, but when on a telegraf agent we have several prometheus inputs, if we base ourselves on the internal metrics, then it is difficult to remotely identify which source is defective.
The metrics was : |
…neguez/telegraf into Add-collection-information
No it was not. We should have been more explicitly, but it was in reference to what hipska had proposed previously: response_time and content_length.
You have also been insistent that you must collect timing information. Which means you are trying to resolve two things in one PR. Do one at a time. As-is this is not going to land. |
Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. 📦 Click here to get additional PR build artifactsArtifact URLs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working with us to get this in a great state.
Summary
Be able to have information about collect data off this plugin.
Checklist
Related issues
resolves #13103