Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shutdown if iptables-save fails #1

Open
perlun opened this issue Nov 6, 2019 · 3 comments
Open

Shutdown if iptables-save fails #1

perlun opened this issue Nov 6, 2019 · 3 comments

Comments

@perlun
Copy link

perlun commented Nov 6, 2019

Hi,

Thanks for an interesting exporter. We are currently investigating using it for one of our deployments.

When developing the Ansible role for deploying it to a server, I discovered that the error handling works in a slightly unusual way: if the capabilities for the process described in https://github.com/retailnext/iptables_exporter#required-permissions are not available, each request to /metrics will result in an exit status 1 error being logged to the syslog. The process will remain running.

I think this is not perhaps an ideal error handling strategy. It would be better if iptables-save would be called right on start, and if it fails, abort the process (exit with a non-zero return code). That way, the semantics are much more clearer to the user that "something is not working as it should".

Your thoughts on this, @eriksw?

Nov  6 13:43:00 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:00Z" level=info msg="Starting iptables_exporter (version=0.1.0, branch=master, revision=9172cd2acb4afc4ba77d48c85a46fbef36d5cb80)" source="iptables_exporter.go:153"
Nov  6 13:43:00 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:00Z" level=info msg="Build context (go=go1.10.3, user=root@f2efa59b1ef9, date=20180628-04:29:29)" source="iptables_exporter.go:154"
Nov  6 13:43:00 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:00Z" level=info msg="Listening on 0.0.0.0:9455" source="iptables_exporter.go:170"
Nov  6 13:43:46 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:43:46Z" level=error msg="exit status 1" source="iptables_exporter.go:95"
Nov  6 13:44:23 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:44:23Z" level=error msg="exit status 1" source="iptables_exporter.go:95"
Nov  6 13:44:24 scalability-test-centre iptables_exporter[71596]: time="2019-11-06T13:44:24Z" level=error msg="exit status 1" source="iptables_exporter.go:95"
@eriksw
Copy link
Collaborator

eriksw commented Nov 6, 2019

One concern I have with exiting out is if there are any circumstances where an exit status of 1 from iptables-save could be transient, but I think it's entirely reasonable to expect that the exporter is run from something that will restart it.

A related thought I have is that iptables_scrape_success might not be a good idea, that it might be better if the request from Prometheus gets an http error response, so that the exporter being in an unhealthy state (unable to run iptables-save) will trip a generic "master caution" alert (up == 0) within Prometheus to draw attention to the issue.

This project isn't a high priority for me at the moment, but I'd be happy to accept a PR improving on this in either or both ways. (Also eratta like switching to go modules and bringing in a more up to date prometheus lib, etc...)

@perlun
Copy link
Author

perlun commented Nov 7, 2019

Thanks for a prompt reply. Yes, we run the exporter from systemd (like I presume most people would do), so automatic restarts are simple in that sense.

A related thought I have is that iptables_scrape_success might not be a good idea, that it might be better if the request from Prometheus gets an http error response, so that the exporter being in an unhealthy state (unable to run iptables-save) will trip a generic "master caution" alert (up == 0) within Prometheus to draw attention to the issue.

Yep, that also sounds reasonable. (We have alerts for that specific kind of scenario, as a generic fallback in case we don't have more specific alerts for a given service)

This project isn't a high priority for me at the moment, but I'd be happy to accept a PR improving on this in either or both ways.

Would be an interesting experiment, but I don't really have really any Go experience from beforehand, and it's not really very high on my priority list either. 🙂 (= other work is more important at the moment) Given that the exporter typically works (if the capabilities are there), this is mostly for me about being able to properly handle edge cases.

I'd suggest we leave the issue open for some time, if not for anything else that it keeps track of this discussion having taken place. Maybe it will scratch an itch for someone else with more time/urgency to get this improved. 😇

@jwalzer
Copy link

jwalzer commented Oct 7, 2020

Thanks @eriksw for that exporter. I was about to write an uggly shellscript myself, when I found this project.

Concerning the issue, I may add, that the semantics of prometheus exporters have never been, to change the HTTP-returncode, when an error occurs for the scraping.

Signaling config- or scrape- errors in a dedicated metric is what node_exporter does for example (ie for textcollector metrics).

Having an error metric and a last-successfull-scrape metric would be suggestion, if anyone diggs into that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants