Major performance improvement in UDP processing by reusing buffer #33

cezarsa · 2016-10-07T21:04:42Z

This pull-request has two major parts. The first one is adding benchmarks to check performance of simply receiving new messages without any further processing by parsers and formatters.
The second part is a very simple change that improve UDP processing throughput by 6 times.

These are the results comparing benchmark results before and after the proposed change:

benchmark                           old ns/op     new ns/op     delta
BenchmarkDatagramNoFormatting-4     16172         2668          -83.50%
BenchmarkTCPNoFormatting-4          7682          7644          -0.49%

benchmark                           old MB/s     new MB/s     speedup
BenchmarkDatagramNoFormatting-4     2.91         17.62        6.05x
BenchmarkTCPNoFormatting-4          6.12         6.15         1.00x

benchmark                           old allocs     new allocs     delta
BenchmarkDatagramNoFormatting-4     5              4              -20.00%
BenchmarkTCPNoFormatting-4          6              6              +0.00%

benchmark                           old bytes     new bytes     delta
BenchmarkDatagramNoFormatting-4     65905         368           -99.44%
BenchmarkTCPNoFormatting-4          464           464           +0.00%

Receiving messages from a UDP connection used to be very slow, slower even than using TCP which was a little strange. By simply reusing the buffer used to receive datagrams we can see a major improvement in processing time and memory allocations.

I came across this change after profiling https://github.com/tsuru/bs in a production environment and coming across this output:

(pprof) top
55.43GB of 59.56GB total (93.06%)
Dropped 350 nodes (cum <= 0.30GB)
Showing top 10 nodes out of 53 (cum >= 0.37GB)
      flat  flat%   sum%        cum   cum%
   50.77GB 85.24% 85.24%    50.87GB 85.42%  github.com/tsuru/bs/vendor/gopkg.in/mcuadros/go-syslog%2ev2.(*Server).goReceiveDatagrams.func1
    0.87GB  1.47% 86.71%     1.02GB  1.71%  github.com/tsuru/bs/log.(*LenientParser).Parse
    0.61GB  1.03% 87.74%     0.61GB  1.03%  github.com/tsuru/bs/vendor/golang.org/x/net/websocket.(*hybiFrameWriter).Write
    0.55GB  0.93% 88.67%     0.61GB  1.03%  encoding/xml.(*Decoder).rawToken
    0.51GB  0.86% 89.53%     0.54GB  0.91%  net.(*dnsMsg).Pack
    0.49GB  0.83% 90.36%     0.80GB  1.35%  io.copyBuffer
    0.49GB  0.82% 91.18%     0.52GB  0.87%  github.com/tsuru/bs/log.(*LenientParser).Dump
    0.38GB  0.64% 91.82%     0.77GB  1.29%  net.unpackStruct
    0.38GB  0.63% 92.45%     0.38GB  0.63%  net.unpackDomainName
    0.37GB  0.61% 93.06%     0.37GB  0.61%  bytes.makeSlice

The reason we can't simply use ListenUDP() and Dial() in the datagram benchmark is that we wouldn't be able to ensure that all messages would arrive on the server, even for a udp connection to localhost. By using the fakePacketConn we ensure that all messages are delivered making the datagram benchmark reliable.

As we can see in the added benchmarks receiving messages from a UDP connection used to be very slow, slower than using TCP which was a little strange. By simply reusing the buffer used to receive datagrams we can see a major improvement in processing time and memory allocations. benchmark old ns/op new ns/op delta BenchmarkDatagramNoFormatting-4 16172 2668 -83.50% BenchmarkTCPNoFormatting-4 7682 7644 -0.49% benchmark old MB/s new MB/s speedup BenchmarkDatagramNoFormatting-4 2.91 17.62 6.05x BenchmarkTCPNoFormatting-4 6.12 6.15 1.00x benchmark old allocs new allocs delta BenchmarkDatagramNoFormatting-4 5 4 -20.00% BenchmarkTCPNoFormatting-4 6 6 +0.00% benchmark old bytes new bytes delta BenchmarkDatagramNoFormatting-4 65905 368 -99.44% BenchmarkTCPNoFormatting-4 464 464 +0.00%

mcuadros · 2016-10-09T10:16:15Z

@cezarsa thanks a lot, highly appreciated

cezarsa added 3 commits October 7, 2016 17:35

Connections as PacketConn slice, avoid runtime type assertions

bdac4a5

mcuadros merged commit 8487663 into mcuadros:master Oct 9, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major performance improvement in UDP processing by reusing buffer #33

Major performance improvement in UDP processing by reusing buffer #33

cezarsa commented Oct 7, 2016

mcuadros commented Oct 9, 2016

Major performance improvement in UDP processing by reusing buffer #33

Major performance improvement in UDP processing by reusing buffer #33

Conversation

cezarsa commented Oct 7, 2016

mcuadros commented Oct 9, 2016