
Performance improvement - process udp packets in separate goroutine in order to minimize packet drops #511

Merged
merged 6 commits into prometheus:master on Oct 23, 2023

Conversation

kullanici0606
Contributor

Summary of this PR

In this PR I introduce a queue for raw statsd UDP payloads and a worker that processes them in a separate goroutine.

Problem

As discussed in grafana/k6#2044, when both the number of arriving packets and the packet sizes are large, statsd_exporter starts to drop UDP packets because processing cannot keep up with the packet arrival rate.

Solution

In the past I implemented a similar UDP packet processing application in Java, and the best practice there was to read packets in one thread and process them in another thread (or threads). I introduced the same logic here: a queue / channel (chan []byte) lets the goroutine that reads packets from the UDP socket enqueue them, minimizing the time spent per packet, while another goroutine reads from the queue / channel and processes the packets.
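A minimal sketch of this read/process split, assuming a hypothetical handlePacket function, listen port, and queue size (none of these are the exporter's actual identifiers or settings):

package main

import (
    "log"
    "net"
)

// handlePacket stands in for the statsd line parsing; it is only a placeholder here.
func handlePacket(packet []byte) {
    _ = packet
}

func main() {
    conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 9125})
    if err != nil {
        log.Fatal(err)
    }

    // A buffered channel decouples the fast reader from the slower parser.
    queue := make(chan []byte, 2048)

    // Worker goroutine: drains the queue and does the expensive processing.
    go func() {
        for payload := range queue {
            handlePacket(payload)
        }
    }()

    buf := make([]byte, 65535)
    for {
        n, _, err := conn.ReadFromUDP(buf)
        if err != nil {
            log.Println("read error:", err)
            return
        }
        // Copy before enqueueing, because buf is reused by the next read.
        packetCopy := make([]byte, n)
        copy(packetCopy, buf[:n])
        queue <- packetCopy
    }
}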

Result

Packet drop rates decreased: most of the time no packets are dropped, whereas in the old version packets were always dropped for the case discussed in grafana/k6#2044.

Before:

$ curl -s http://localhost:9102/metrics | grep udp
statsd_exporter_udp_packets_total 21270

packets captured by tcpdump (udp.dstport == 9125): 21477

207 packets were dropped.

After:

$ curl -s http://localhost:9102/metrics | grep udp
statsd_exporter_udp_packets_total 21384

packets captured by tcpdump (udp.dstport == 9125): 21384

No packets were dropped.

Future TODOs
Memory benchmarks discussed in #459

kullanici0606 added 4 commits September 20, 2023 10:37
…ot dropped

Signed-off-by: kullanici0606 <yakup.turgut@btk.gov.tr>
Signed-off-by: kullanici0606 <yakup.turgut@btk.gov.tr>
Signed-off-by: kullanici0606 <yakup.turgut@btk.gov.tr>
…t in packet read in order not to drop packets

Signed-off-by: kullanici0606 <yakup.turgut@btk.gov.tr>
Contributor

@matthiasr matthiasr left a comment

Thank you for this, and sorry for the delay, I was out for a few weeks and am still catching up.

Overall I really like the simplicity of this approach. I think it's ready to go with two minor edits. I am also wondering whether we should use this opportunity to handle the case where the queue is full explicitly, but I am happy to leave that to a future PR (or never, if it has undesirable side effects).

@@ -64,12 +66,26 @@ func (l *StatsDUDPListener) Listen() {
level.Error(l.Logger).Log("error", err)
return
}
l.HandlePacket(buf[0:n])
// avoid making copies of slices since we need to minimize the time spent here in order not to drop packets
matthiasr (Contributor)

This comment is confusing to me – EnqueueUdpPacket immediately and synchronously goes on to make a copy of the packet?

matthiasr (Contributor)

If we really wanted to avoid allocations in the hot path, I guess we would need a rotating pool of buffers, but I'm not sure the memory overhead is worth it, so I would not go there right now. I think the best way here is to delete the confusing comment and accept that we are making a copy.
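For illustration, the rotating buffer pool mentioned above could look roughly like the sketch below (assumed names, not code from this PR); the pool's internal locking and the need for the consumer to return buffers are the synchronization overhead mentioned in the reply that follows.

package listener

import (
    "net"
    "sync"
)

// bufPool hands out reusable 64 KiB read buffers.
var bufPool = sync.Pool{
    New: func() interface{} { return make([]byte, 65535) },
}

// readLoop reuses pooled buffers instead of allocating a copy per packet.
func readLoop(conn *net.UDPConn, queue chan<- []byte) {
    for {
        buf := bufPool.Get().([]byte)
        n, _, err := conn.ReadFromUDP(buf)
        if err != nil {
            bufPool.Put(buf)
            return
        }
        queue <- buf[:n]
        // The consumer must eventually return the buffer with
        // bufPool.Put(buf[:cap(buf)]), which couples the reader and the
        // worker and adds synchronization cost.
    }
}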

kullanici0606 (Contributor, Author)

Thank you for taking the time to review my PR. You are right, the comment is confusing, so I removed the misleading comment. I also tried pooling, but I think it made things worse since it needs some kind of synchronization.

l.UDPPackets.Inc()
packetCopy := make([]byte, n)
copy(packetCopy, packet)
l.UdpPacketQueue <- packetCopy
matthiasr (Contributor)

What happens when the queue is full? Do we want to apply back pressure here, or would it be useful to drop the packet and increment a counter when this happens? For historical context, we had this idea before. The upside would be that it's easier to keep track of the dropped packets that way, instead of having to determine it from the kernel. At the time, we ended up not doing that in favor of other internal performance improvements.

Since it seems that you are still running into drops, I was wondering if this would be a good opportunity to revive that approach, especially since it looks like in your single-processor approach it would be fairly easy to do this here?

kullanici0606 (Contributor, Author)

After your comment, I added a counter for UDP packet drops that is incremented when the channel is full. Could you please review it?
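The resulting pattern, a non-blocking enqueue plus a drop counter, can be sketched as follows; the function and counter names here are assumptions for illustration, not necessarily the exact identifiers merged in this PR.

package listener

import "github.com/prometheus/client_golang/prometheus"

// enqueueOrDrop tries a non-blocking send; when the queue is full it drops
// the packet and increments a counter so the drop shows up in the exporter's
// own metrics instead of only in kernel statistics.
func enqueueOrDrop(queue chan<- []byte, drops prometheus.Counter, packet []byte) {
    select {
    case queue <- packet:
        // Handed off to the processing goroutine.
    default:
        drops.Inc()
    }
}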

main.go: outdated review comment, resolved
kullanici0606 and others added 2 commits October 23, 2023 10:01
…g comment

Signed-off-by: kullanici0606 <yakup.turgut@btk.gov.tr>
Co-authored-by: Matthias Rampke <matthias.rampke@googlemail.com>
Signed-off-by: kullanici0606 <35795498+kullanici0606@users.noreply.github.com>
Contributor

@matthiasr matthiasr left a comment

This looks great, thank you!

@matthiasr matthiasr merged commit 12d14bb into prometheus:master Oct 23, 2023
2 checks passed