
Add linux UDP buffer telemetry using procfs #187

Closed

Conversation

SpencerMalone
Contributor

We've had some problems in setups where we can't deploy a statsd exporter on localhost: our exporters had misleading rate data because the statsd exporter wasn't keeping up with events and the Linux UDP buffer was overflowing. A major part of fixing that is making it easier to track the UDP buffer overflow rate (and, while I was at it, the current UDP buffer depth). This PR adds that.

Alternatives I explored:
I originally wanted to use https://github.com/google/cadvisor to collect this data, but was scared off by a warning on one of their GitHub issues proclaiming that "tcp/udp create an _enormous_ number of additional metric streams compared with the basic metrics." I didn't want a large number of metrics, I just wanted a few.

I had also hoped to use the procfs library seen in the prom org, but could not find support in that library for these metrics, so I moved away from that.

I excluded non-linux OSes from this data collection because I don't have a great way to test them.

Very open to suggestions, lemme know your thoughts.

Signed-off-by: SpencerMalone <malone.spencer@gmail.com>
@brian-brazil
Contributor

This is machine-level monitoring, which is the responsibility of the node exporter. I think you want the UDP InErrors metric, which is already supported.

@SpencerMalone
Contributor Author

Hrm, any idea how I could get that data out of a container? I noticed that both netstat and the node exporters that are run on the same hosts as these containers have conflicting UDP drop rate information and seem unable to track when UDP buffer overflows happen in the containers themselves. I'm guessing maybe there's some permissions hijinx? Lemme poke around with some of our systems peeps.

@SuperQ
Member

SuperQ commented Feb 28, 2019

@brian-brazil I need to check for sure, but I think this would be the correct way to do this. I think the kernel is monitoring per-PID stats.

I would like to see this parsing added to the procfs library, as it would be useful for things other than the statsd_exporter.

@SpencerMalone
Contributor Author

SpencerMalone commented Feb 28, 2019

Lemme spend some more time swimming around in that codebase and I can see about opening a PR to the procfs lib.

@brian-brazil
Contributor

I think the kernel is monitoring per-PID stats.

It's only namespaced, not per-PID.

Contributor

@matthiasr matthiasr left a comment


In general, I'm in favor – dropped UDP packets are a big issue in the statsd world, and it makes sense for the exporter to do what it can to monitor itself.

Q: we know the exact listening socket because we create it, can we somehow get stats about that?

I agree that any parsing of /proc should be in procfs, and then we can use that here.

t.Fatalf("Should be able to write a procfs-like file: %s", err)
}

queued, dropped, err := parseProcfsNetFile(filename)
Contributor


if you make this function take a reader (or a []byte even), you don't need to write an actual file in unit tests
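
For illustration, a minimal sketch of that refactor, assuming a /proc/net/udp-style layout (rx_queue is the hex half of the tx_queue:rx_queue field, drops is the last column) – the name and return values here just mirror the PR's test, not final code:

```go
package main

import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

// parseProcfsNet sums rx_queue and drops across all sockets in a
// /proc/net/udp-style listing. Taking an io.Reader means unit tests can
// feed in a strings.Reader instead of writing a real file.
func parseProcfsNet(r io.Reader) (queued, dropped uint64, err error) {
	scanner := bufio.NewScanner(r)
	scanner.Scan() // skip the header line
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 13 {
			continue
		}
		// Field 4 is "tx_queue:rx_queue" (hex); the last field is drops (decimal).
		if queues := strings.SplitN(fields[4], ":", 2); len(queues) == 2 {
			if rx, perr := strconv.ParseUint(queues[1], 16, 64); perr == nil {
				queued += rx
			}
		}
		if d, perr := strconv.ParseUint(fields[len(fields)-1], 10, 64); perr == nil {
			dropped += d
		}
	}
	return queued, dropped, scanner.Err()
}
```

A test then becomes parseProcfsNet(strings.NewReader(fixture)) with a couple of hand-written lines, no temp files.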

@@ -181,6 +182,10 @@ func main() {

ul := &StatsDUDPListener{conn: uconn}
go ul.Listen(events)

if runtime.GOOS == "linux" {
Contributor


If at all possible, I would prefer that code that doesn't work on non-linux platforms doesn't get compiled in, not just skipped with an if.
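
A sketch of what that could look like with Go build constraints (file and function names here are hypothetical; in current Go you would also add a //go:build line):

```go
// udp_buffer_linux.go

// +build linux

package main

// watchUDPBuffer starts the Linux-only UDP buffer telemetry.
func watchUDPBuffer() {
	// procfs / socket polling lives here and only gets compiled on Linux
}
```

```go
// udp_buffer_other.go

// +build !linux

package main

// watchUDPBuffer is a no-op on platforms where we can't read these stats.
func watchUDPBuffer() {}
```

main() then calls watchUDPBuffer() unconditionally and the runtime.GOOS check goes away.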

@matthiasr
Contributor

It's only namespaced, not per-PID

IFF we cannot get any per-socket stats, I would be okay with explaining the limitations of this metric but exposing it anyway. Then users can make their own judgment calls about whether anything else on the machine could be contributing UDP drops.

I don't think this can be solved satisfactorily with the node exporter (because there is no way to get the per-namespace stats if the exporter is containerized) or cAdvisor (which would include too much detail from monitoring all sockets on all network namespaces, while we already know exactly which one we're interested in).

@SpencerMalone
Contributor Author

Q: we know the exact listening socket because we create it, can we somehow get stats about that?

I had the same thought poking around last night! We can, we have the technology.

Sounds like I need a todo list. Lemme know if y'all feel that more needs to be on here:

  • Add a procfs change for net/protocol, which for now will include udp and udp6. More would be nice here, but honestly I probably don't have the patience to map out every protocol in procfs right now, since each one is different.
  • Implement procfs changes in this PR
  • Track only the drop rate of ports we expose
  • Skip compiling udp buffer watching code on non-linux platforms
  • Figure out our error handling scenarios. Maybe bail if we fail to load the files the first time, and after that if we run into any exceptions, wait a few seconds and try again?

@matthiasr
Contributor

matthiasr commented Mar 1, 2019 via email

@brian-brazil
Contributor

because there is no way to get the per-namespace stats if the exporter is containerized

/proc/net/snmp is not namespaced. In addition, I'd presume that /proc/net/udp has the same issues as /proc/net/tcp and is n^2 or worse performance-wise.

If you're going to pull in data here it should be from the socket, not anything from /proc that may include other processes.

@SpencerMalone
Contributor Author

@brian-brazil - This is a little deeper than I had anticipated going, but if it's a thing, it sounds like a good thing. I think I can read the max buffer size with a syscall using SO_RCVBUF, but I am having trouble figuring out how to pull out the dropped packet count. Any pointers on where to head for docs around this?
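
For reference, reading SO_RCVBUF off the listening socket looks roughly like this (a sketch using golang.org/x/sys/unix – it only gives the buffer size, not the drop count, which is the part still missing):

```go
package main

import (
	"net"

	"golang.org/x/sys/unix"
)

// rcvbufSize returns the kernel receive buffer size (SO_RCVBUF) of a UDP socket.
// Note that Linux reports double the requested value, because the kernel
// accounts for its own bookkeeping overhead in the same budget.
func rcvbufSize(conn *net.UDPConn) (int, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return 0, err
	}
	var size int
	var sockErr error
	if err := raw.Control(func(fd uintptr) {
		size, sockErr = unix.GetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVBUF)
	}); err != nil {
		return 0, err
	}
	return size, sockErr
}
```

Called once against the exporter's UDP connection at startup, that gives the max-buffer-size side of the comparison.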

@brian-brazil
Contributor

I thought I'd found one previously, but can't find it again now. The kernel has the data anyway.

@rtreffer

rtreffer commented Mar 4, 2019

For linux it looks like some data is available through sock_diag: http://man7.org/linux/man-pages/man7/sock_diag.7.html

There should be a way to get the queued data in the receive path (I couldn't easily find it in the golang api). If you know SO_RCVBUF and the maximum packet size you'd be able to prove no packet drop in the UDP layer if queued data + max packet size < maximum buffer size.

Another simple (but not perfect) option would be to move the kernel queue draining into its own goroutine. That goroutine would read from the socket as quickly as possible and forward all packets to an internal fixed size channel in a non-blocking manner. The main assumption is that you should always be able to read the data fast enough if you do not do any processing on top. The side assumption is that you are probably game over if that does not hold.

You can then expose the failed channel writes as a metric. This metric would describe a lower bound on dropped packets.
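
A rough sketch of that drain goroutine (buffer sizes and names are made up, and in the exporter the counter would be a Prometheus metric rather than a bare uint64):

```go
package main

import (
	"net"
	"sync/atomic"
)

// drainUDP reads from the socket as fast as possible and hands packets to a
// bounded channel without ever blocking; when the channel is full the packet
// is counted as dropped instead of stalling the read loop.
func drainUDP(conn *net.UDPConn, packets chan<- []byte, dropped *uint64) {
	buf := make([]byte, 65535)
	for {
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			return
		}
		pkt := make([]byte, n)
		copy(pkt, buf[:n])
		select {
		case packets <- pkt:
		default:
			// Processing is too slow; this gives a lower bound on dropped packets.
			atomic.AddUint64(dropped, 1)
		}
	}
}
```

The processing goroutine then just ranges over packets, and the existing parsing code stays where it is.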

@brian-brazil
Contributor

That goroutine would read from the socket as quickly as possible and forward all packets to an internal fixed size channel in a non-blocking manner.

That's possibly a good idea in any case.

@SpencerMalone
Contributor Author

SpencerMalone commented Mar 4, 2019 via email

@matthiasr
Contributor

That would be a massive improvement! I shied away from it so far because we rely on the implicit serialization from the single processing goroutine a lot – if you're willing to tackle that it would be great!

What do you think about starting this incrementally by first splitting the network-receive from the processing goroutine – that way we would have the metrics to see when processing is too slow? My assumption would be that a plain receive-and-stuff-into-channel will always be faster than doing the actual processing, but if you're still concerned we could also add the network receive buffer depth monitoring that @rtreffer mentioned.

@SpencerMalone
Contributor Author

Bleh, I fell off on this. Here's where I got, and where I am going:
I could not for the life of me get a good UDP linux buffer overflow count without procfs. I poked at using netlink / sock_diag, got that working, and can get queued data as @rtreffer outlined, but I'm not sure that knowing "some data was lost" is enough.

I personally strongly prefer being able to say "less than 0.05% of UDP messages received are dropped". If we only have the vague knowledge of "some data was lost", we can only say "we got all the data" or "we lost an unknown amount of data", and anything in between is impossible, which pushes people towards setting unmaintainable goals such as "never drop UDP packets". Furthermore, I think you would either have to do that netlink check before you read data each time, or accept that you may have untracked buffer drops (because if you did it on a timer, the reads between collections may drain the buffer enough that you go from "being at max buffer" -> "being under max buffer" before you look). Please correct me if I'm wrong here.

In this case, my preference is definitely accuracy over performance, and we're not looking at slamming procfs constantly. With those in mind, I'm gonna head down the procfs path and put this behind an optional command line flag? If there ends up being cases where the performance is unbearable, we could revisit, but worst case they could just disable the logic for now.

@matthiasr
Contributor

I think in this case aiming for 0 dropped packets is not entirely unreasonable, since drops on the network path cannot be considered for this count anyway.

I understood the sock_diag approach to include checking before every read. If we combine this with the separate reader goroutine, we can

  • prove that we did not drop any packets before they hit user space
  • keep track of the ratio of handled vs. "dropped" (by overflowing the Go channel buffer) packets

I think this is going to give us more information than looking at (netns-)global drop counts, because those will usually include something else.

You said you have a version with the separate read goroutine, and working code for the netlink approach. Could you submit these two (combined is fine, since the code is going to be intertwined)? I think with these two we are in a much better position for everyone, and it doesn't require anyone to make a decision or be confronted with a non-process-scoped metric.

If you think it's still necessary then, I would be willing to accept an optional procfs check as well, but if the above gives enough insight I would prefer not to add this complexity.
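
On the metrics side, the handled-vs-dropped bookkeeping could be as simple as two counters (a sketch with client_golang; the metric names here are made up, not anything the exporter actually ships):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	udpPackets = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "statsd_exporter_udp_packets_total",
		Help: "Number of UDP packets read off the socket.",
	})
	udpPacketDrops = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "statsd_exporter_udp_packet_drops_total",
		Help: "Number of UDP packets dropped because the internal channel was full.",
	})
)

func init() {
	prometheus.MustRegister(udpPackets, udpPacketDrops)
}
```

The ratio is then a straightforward rate() division of the two in PromQL.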

@SpencerMalone
Contributor Author

Closing for #196
