
Add linux UDP buffer telemetry using procfs #187

Closed

Conversation

SpencerMalone
Contributor

We've had some problems in setups where we can't deploy a statsd exporter on localhost: our exporters had misleading rate data because the statsd exporter wasn't keeping up with events and the Linux UDP buffer was overflowing. A major part of fixing that is making it easier to track the UDP buffer overflow rate (and, while I was at it, the current UDP buffer depth). This PR adds that.

Alternatives I explored:
I originally wanted to use https://github.com/google/cadvisor to collect this data, but was scared off by a warning on one of their GitHub issues proclaiming that "tcp/udp create an _enormous_ number of additional metric streams compared with the basic metrics." I didn't want a large number of metrics, I just wanted a few.

I had also hoped to use the procfs library seen in the prom org, but could not find support in that library for these metrics, so I moved away from that.

I excluded non-linux OSes from this data collection because I don't have a great way to test them.

Very open to suggestions, lemme know your thoughts.

Signed-off-by: SpencerMalone <malone.spencer@gmail.com>
@brian-brazil
Contributor

This is machine-level monitoring, which is the responsibility of the node exporter. I think you want the UDP InErrors metric, which is already supported.

@SpencerMalone
Contributor Author

Hrm, any idea how I could get that data out of a container? I noticed that both netstat and the node exporters that are run on the same hosts as these containers have conflicting UDP drop rate information and seem unable to track when UDP buffer overflows happen in the containers themselves. I'm guessing maybe there's some permissions hijinx? Lemme poke around with some of our systems peeps.

@SuperQ
Member

SuperQ commented Feb 28, 2019

@brian-brazil I need to check for sure, but I think this would be the correct way to do this. I think the kernel is monitoring per-PID stats.

I would like to see this parsing added to the procfs library, as it would be useful for things other than the statsd_exporter.

@SpencerMalone
Contributor Author

SpencerMalone commented Feb 28, 2019

Lemme spend some more time swimming around in that codebase and I can see about opening a PR to the procfs lib.

@brian-brazil
Contributor

I think the kernel is monitoring per-PID stats.

It's only namespaced, not per-PID.

Contributor

@matthiasr matthiasr left a comment


In general, I'm in favor – dropped UDP packets are a big issue in the statsd world, and it makes sense for the exporter to do what it can to monitor itself.

Q: we know the exact listening socket because we create it, can we somehow get stats about that?

I agree that any parsing of /proc should be in procfs, and then we can use that here.

t.Fatalf("Should be able to write a procfs-like file: %s", err)
}

queued, dropped, err := parseProcfsNetFile(filename)
Contributor


if you make this function take a reader (or a []byte even), you don't need to write an actual file in unit tests
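
For illustration, a minimal sketch of that refactor, assuming a /proc/net/udp-style layout (rx_queue is the hex half of the tx_queue:rx_queue field, drops is the last column) – the name and return values here just mirror the PR's test, not final code:

```go
package main

import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

// parseProcfsNet sums rx_queue and drops across all sockets in a
// /proc/net/udp-style listing. Taking an io.Reader means unit tests can
// feed in a strings.Reader instead of writing a real file.
func parseProcfsNet(r io.Reader) (queued, dropped uint64, err error) {
	scanner := bufio.NewScanner(r)
	scanner.Scan() // skip the header line
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 13 {
			continue
		}
		// Field 4 is "tx_queue:rx_queue" (hex); the last field is drops (decimal).
		if queues := strings.SplitN(fields[4], ":", 2); len(queues) == 2 {
			if rx, perr := strconv.ParseUint(queues[1], 16, 64); perr == nil {
				queued += rx
			}
		}
		if d, perr := strconv.ParseUint(fields[len(fields)-1], 10, 64); perr == nil {
			dropped += d
		}
	}
	return queued, dropped, scanner.Err()
}
```

A test then becomes parseProcfsNet(strings.NewReader(fixture)) with a couple of hand-written lines, no temp files.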

@@ -181,6 +182,10 @@ func main() {

ul := &StatsDUDPListener{conn: uconn}
go ul.Listen(events)

if runtime.GOOS == "linux" {
Contributor


If at all possible, I would prefer that code that doesn't work on non-linux platforms doesn't get compiled in, not just skipped with an if.
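
A sketch of what that could look like with Go build constraints (file and function names here are hypothetical; in current Go you would also add a //go:build line):

```go
// udp_buffer_linux.go

// +build linux

package main

// watchUDPBuffer starts the Linux-only UDP buffer telemetry.
func watchUDPBuffer() {
	// procfs / socket polling lives here and only gets compiled on Linux
}
```

```go
// udp_buffer_other.go

// +build !linux

package main

// watchUDPBuffer is a no-op on platforms where we can't read these stats.
func watchUDPBuffer() {}
```

main() then calls watchUDPBuffer() unconditionally and the runtime.GOOS check goes away.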

@matthiasr
Contributor

It's only namespaced, not per-PID

IFF we cannot get any per-socket stats, I would be okay with explaining the limitations of this metric but exposing it anyway. Then users can make their own judgment calls about whether anything else on the machine could be contributing UDP drops.

I don't think this can be solved satisfactorily with the node exporter (because there is no way to get the per-namespace stats if the exporter is containerized) or cAdvisor (which would include too much detail from monitoring all sockets on all network namespaces, while we already know exactly which one we're interested in).

@SpencerMalone
Contributor Author

Q: we know the exact listening socket because we create it, can we somehow get stats about that?

I had the same thought poking around last night! We can, we have the technology.

Sounds like I need a todo list. Lemme know if y'all feel that more needs to be on here:

  • Add a procfs change for net/protocol, which for now will include udp and udp6. More would be nice here, but honestly I probably don't have the patience to map out every protocol in procfs right now, since each one is different.
  • Implement procfs changes in this PR
  • Track only the drop rate of ports we expose
  • Skip compiling udp buffer watching code on non-linux platforms
  • Figure out our error handling scenarios. Maybe bail if we fail to load the files the first time, and after that if we run into any exceptions, wait a few seconds and try again?

@matthiasr
Contributor

matthiasr commented Mar 1, 2019 via email

@brian-brazil
Contributor

because there is no way to get the per-namespace stats if the exporter is containerized

/proc/net/snmp is not namespaced. In addition, I'd presume that /proc/net/udp has the same issues as /proc/net/tcp and is n^2 or worse performance-wise.

If you're going to pull in data here it should be from the socket, not anything from /proc that may include other processes.

@SpencerMalone
Contributor Author

@brian-brazil - This is a little deeper than I had anticipated going, but if it's a thing, it sounds like a good thing. I think I can read the max buffer size with a syscall using SO_RCVBUF, but I am having trouble figuring out how to pull out the dropped packet count. Any pointers on where to head for docs around this?
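
For reference, reading SO_RCVBUF off the listening socket looks roughly like this (a sketch using golang.org/x/sys/unix – it only gives the buffer size, not the drop count, which is the part still missing):

```go
package main

import (
	"net"

	"golang.org/x/sys/unix"
)

// rcvbufSize returns the kernel receive buffer size (SO_RCVBUF) of a UDP socket.
// Note that Linux reports double the requested value, because the kernel
// accounts for its own bookkeeping overhead in the same budget.
func rcvbufSize(conn *net.UDPConn) (int, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return 0, err
	}
	var size int
	var sockErr error
	if err := raw.Control(func(fd uintptr) {
		size, sockErr = unix.GetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_RCVBUF)
	}); err != nil {
		return 0, err
	}
	return size, sockErr
}
```

Called once against the exporter's UDP connection at startup, that gives the max-buffer-size side of the comparison.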

@brian-brazil
Contributor

I thought I'd found one previously, but can't find it again now. The kernel has the data anyway.

@rtreffer

rtreffer commented Mar 4, 2019

For linux it looks like some data is available through sock_diag: http://man7.org/linux/man-pages/man7/sock_diag.7.html

There should be a way to get the queued data in the receive path (I couldn't easily find it in the golang api). If you know SO_RCVBUF and the maximum packet size you'd be able to prove no packet drop in the UDP layer if queued data + max packet size < maximum buffer size.

Another simple (but not perfect) option would be to move the kernel queue draining into its own goroutine. That goroutine would read from the socket as quickly as possible and forward all packets to an internal fixed size channel in a non-blocking manner. The main assumption is that you should always be able to read the data fast enough if you do not do any processing on top. The side assumption is that you are probably game over if that does not hold.

You can then expose the failed channel writes as a metric. This metric would describe a lower bound on dropped packets.
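
A rough sketch of that drain goroutine (buffer sizes and names are made up, and in the exporter the counter would be a Prometheus metric rather than a bare uint64):

```go
package main

import (
	"net"
	"sync/atomic"
)

// drainUDP reads from the socket as fast as possible and hands packets to a
// bounded channel without ever blocking; when the channel is full the packet
// is counted as dropped instead of stalling the read loop.
func drainUDP(conn *net.UDPConn, packets chan<- []byte, dropped *uint64) {
	buf := make([]byte, 65535)
	for {
		n, _, err := conn.ReadFromUDP(buf)
		if err != nil {
			return
		}
		pkt := make([]byte, n)
		copy(pkt, buf[:n])
		select {
		case packets <- pkt:
		default:
			// Processing is too slow; this gives a lower bound on dropped packets.
			atomic.AddUint64(dropped, 1)
		}
	}
}
```

The processing goroutine then just ranges over packets, and the existing parsing code stays where it is.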

@brian-brazil
Contributor

That goroutine would read from the socket as quickly as possible and forward all packets to an internal fixed size channel in a non-blocking manner.

That's possibly a good idea in any case.

@SpencerMalone
Contributor Author

SpencerMalone commented Mar 4, 2019 via email

@matthiasr
Contributor

That would be a massive improvement! I shied away from it so far because we rely on the implicit serialization from the single processing goroutine a lot – if you're willing to tackle that it would be great!

What do you think about starting this incrementally by first splitting the network-receive from the processing goroutine – that way we would have the metrics to see when processing is too slow? My assumption would be that a plain receive-and-stuff-into-channel will always be faster than doing the actual processing, but if you're still concerned we could also add the network receive buffer depth monitoring that @rtreffer mentioned.

@SpencerMalone
Contributor Author

Bleh, I fell off on this. Here's where I got, and where I am going:
I could not for the life of me get a good UDP linux buffer overflow count without procfs. I poked at using netlink / sock_diag, got that working, and can get queued data as @rtreffer outlined, but I'm not sure that knowing "some data was lost" is enough.

I personally strongly prefer being able to say "less than 0.05% of UDP messages received are dropped". If we only have the vague knowledge of "some data was lost", we can only say "we got all the data" or "we lost an unknown amount of data", and anything in between is impossible, which pushes people towards setting unmaintainable goals such as "never drop UDP packets". Furthermore, I think you would either have to do that netlink check before you read data each time, or accept that you may have untracked buffer drops (because if you did it on a timer, the reads between collections may drain the buffer enough that you go from "being at max buffer" -> "being under max buffer" before you look). Please correct me if I'm wrong here.

In this case, my preference is definitely accuracy over performance, and we're not looking at slamming procfs constantly. With those in mind, I'm gonna head down the procfs path and put this behind an optional command line flag? If there ends up being cases where the performance is unbearable, we could revisit, but worst case they could just disable the logic for now.

@matthiasr
Contributor

I think in this case aiming for 0 dropped packets is not entirely unreasonable, since drops on the network path cannot be considered for this count anyway.

I understood the sock_diag approach to include checking before every read. If we combine this with the separate reader goroutine, we can

  • prove that we did not drop any packets before they hit user space
  • keep track of the ratio of handled vs. "dropped" (by overflowing the Go channel buffer) packets

I think this is going to give us more information than looking at (netns-)global drop counts, because those will usually include something else.

You said you have a version with the separate read goroutine, and working code for the netlink approach. Could you submit these two (combined is fine, since the code is going to be intertwined)? I think with these two we are in a much better position for everyone, and it doesn't require anyone to make a decision or be confronted with a non-process-scoped metric.

If you think it's still necessary then, I would be willing to accept an optional procfs check as well, but if the above gives enough insight I would prefer not to add this complexity.
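
On the metrics side, the handled-vs-dropped bookkeeping could be as simple as two counters (a sketch with client_golang; the metric names here are made up, not anything the exporter actually ships):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

var (
	udpPackets = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "statsd_exporter_udp_packets_total",
		Help: "Number of UDP packets read off the socket.",
	})
	udpPacketDrops = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "statsd_exporter_udp_packet_drops_total",
		Help: "Number of UDP packets dropped because the internal channel was full.",
	})
)

func init() {
	prometheus.MustRegister(udpPackets, udpPacketDrops)
}
```

The ratio is then a straightforward rate() division of the two in PromQL.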

@SpencerMalone
Contributor Author

Closing for #196
