Major performance improvement in UDP processing by reusing buffer #33

Merged 3 commits on Oct 9, 2016

@cezarsa cezarsa commented Oct 7, 2016

This pull request has two major parts. The first adds benchmarks that measure the cost of simply receiving new messages, without any further processing by parsers or formatters.
The second is a very simple change that improves UDP processing throughput roughly sixfold.

These are the benchmark results before and after the proposed change:

benchmark                           old ns/op     new ns/op     delta
BenchmarkDatagramNoFormatting-4     16172         2668          -83.50%
BenchmarkTCPNoFormatting-4          7682          7644          -0.49%

benchmark                           old MB/s     new MB/s     speedup
BenchmarkDatagramNoFormatting-4     2.91         17.62        6.05x
BenchmarkTCPNoFormatting-4          6.12         6.15         1.00x

benchmark                           old allocs     new allocs     delta
BenchmarkDatagramNoFormatting-4     5              4              -20.00%
BenchmarkTCPNoFormatting-4          6              6              +0.00%

benchmark                           old bytes     new bytes     delta
BenchmarkDatagramNoFormatting-4     65905         368           -99.44%
BenchmarkTCPNoFormatting-4          464           464           +0.00%

Receiving messages from a UDP connection used to be very slow, slower even than TCP, which was a little strange. Simply reusing the buffer used to receive datagrams yields a major improvement in processing time (about 83% less per operation) and in memory allocations (about 99% fewer bytes per operation).
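To illustrate the idea, here is a minimal sketch of a receive loop that reuses one buffer. It is not the exact change made in this pull request. The names `receiveAll`, `datagramReader`, `sliceReader`, and `maxDatagramSize` are hypothetical, and the 64 KiB buffer size is an assumption. The loop allocates the large buffer once, then copies only the bytes actually received into a right-sized slice before handing the message on.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// maxDatagramSize is an assumed upper bound for a single UDP payload.
const maxDatagramSize = 64 * 1024

// datagramReader is the one net.PacketConn method this sketch needs.
type datagramReader interface {
	ReadFrom(p []byte) (n int, addr net.Addr, err error)
}

// receiveAll reads datagrams until the source returns io.EOF.
// The large receive buffer is allocated once and reused for every read;
// each message is copied into a slice sized to the bytes actually received.
func receiveAll(conn datagramReader, handle func([]byte)) error {
	buf := make([]byte, maxDatagramSize) // allocated once, outside the loop
	for {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			if err == io.EOF {
				return nil
			}
			return err
		}
		msg := make([]byte, n) // only n bytes per message
		copy(msg, buf[:n])     // the copy keeps msg valid after the next read
		handle(msg)
	}
}

// sliceReader is a hypothetical in-memory source used only to exercise
// receiveAll without touching the network.
type sliceReader struct{ pending [][]byte }

func (r *sliceReader) ReadFrom(p []byte) (int, net.Addr, error) {
	if len(r.pending) == 0 {
		return 0, nil, io.EOF
	}
	n := copy(p, r.pending[0])
	r.pending = r.pending[1:]
	return n, nil, nil
}

func main() {
	src := &sliceReader{pending: [][]byte{[]byte("<34>first"), []byte("<34>second")}}
	var got []string
	err := receiveAll(src, func(m []byte) { got = append(got, string(m)) })
	if err != nil {
		fmt.Println("receive error:", err)
		return
	}
	fmt.Println(got)
}
```

The copy matters because the handler may keep the message after the loop has overwritten the shared buffer with the next datagram. The drop from 65905 to 368 bytes per operation in the table above would be consistent with a buffer of roughly 64 KiB no longer being allocated for every datagram, though that reading is an inference from the numbers.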

I found this change after profiling https://github.com/tsuru/bs in a production environment and seeing this output:

(pprof) top
55.43GB of 59.56GB total (93.06%)
Dropped 350 nodes (cum <= 0.30GB)
Showing top 10 nodes out of 53 (cum >= 0.37GB)
      flat  flat%   sum%        cum   cum%
   50.77GB 85.24% 85.24%    50.87GB 85.42%  github.com/tsuru/bs/vendor/gopkg.in/mcuadros/go-syslog%2ev2.(*Server).goReceiveDatagrams.func1
    0.87GB  1.47% 86.71%     1.02GB  1.71%  github.com/tsuru/bs/log.(*LenientParser).Parse
    0.61GB  1.03% 87.74%     0.61GB  1.03%  github.com/tsuru/bs/vendor/golang.org/x/net/websocket.(*hybiFrameWriter).Write
    0.55GB  0.93% 88.67%     0.61GB  1.03%  encoding/xml.(*Decoder).rawToken
    0.51GB  0.86% 89.53%     0.54GB  0.91%  net.(*dnsMsg).Pack
    0.49GB  0.83% 90.36%     0.80GB  1.35%  io.copyBuffer
    0.49GB  0.82% 91.18%     0.52GB  0.87%  github.com/tsuru/bs/log.(*LenientParser).Dump
    0.38GB  0.64% 91.82%     0.77GB  1.29%  net.unpackStruct
    0.38GB  0.63% 92.45%     0.38GB  0.63%  net.unpackDomainName
    0.37GB  0.61% 93.06%     0.37GB  0.61%  bytes.makeSlice
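For readers who want to gather similar data, the following is a minimal sketch of capturing a heap profile with the standard `runtime/pprof` package. How the profile above was actually collected is not stated here, so this is only one possible approach. The helper name `captureHeapProfile` is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureHeapProfile returns a heap profile in the format read by
// `go tool pprof`. Running a GC first brings the statistics up to date.
func captureHeapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	profile, err := captureHeapProfile()
	if err != nil {
		fmt.Println("profile error:", err)
		return
	}
	// A real program would write these bytes to a file for later analysis.
	fmt.Printf("captured heap profile: %d bytes\n", len(profile))
}
```

Once saved to a file, such a profile can be inspected with `go tool pprof`, where selecting the `alloc_space` sample index shows cumulative bytes allocated per function.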

The datagram benchmark can't simply use ListenUDP() and Dial(), because there would be no guarantee that every message arrives at the server, even over a UDP connection to localhost. Using the fakePacketConn ensures that all messages are delivered, which makes the datagram benchmark reliable.

As the added benchmarks show, receiving messages from a UDP connection used to be very slow, slower even than TCP, which was a little strange. Simply reusing the buffer used to receive datagrams yields a major improvement in processing time and memory allocations.
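The idea of a fake packet connection can be sketched as an in-memory type that satisfies `net.PacketConn`. This is not the `fakePacketConn` from this pull request. The type `memPacketConn`, its constructor, and its channel-based design are assumptions made for illustration. Writes block when the queue is full instead of dropping packets, which is what gives a benchmark lossless delivery.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

var errClosed = errors.New("memPacketConn: closed")

// memPacketConn is a hypothetical in-memory net.PacketConn.
// Packets travel over a buffered channel, so none are lost.
type memPacketConn struct {
	packets chan []byte
	done    chan struct{}
	once    sync.Once
}

// Compile-time check that the interface is fully implemented.
var _ net.PacketConn = (*memPacketConn)(nil)

func newMemPacketConn(depth int) *memPacketConn {
	return &memPacketConn{
		packets: make(chan []byte, depth),
		done:    make(chan struct{}),
	}
}

// ReadFrom blocks until a packet is available or the connection is closed.
func (c *memPacketConn) ReadFrom(p []byte) (int, net.Addr, error) {
	select {
	case pkt := <-c.packets:
		return copy(p, pkt), c.LocalAddr(), nil
	case <-c.done:
		return 0, nil, errClosed
	}
}

// WriteTo queues a copy of p, blocking while the queue is full.
func (c *memPacketConn) WriteTo(p []byte, _ net.Addr) (int, error) {
	pkt := append([]byte(nil), p...)
	select {
	case c.packets <- pkt:
		return len(p), nil
	case <-c.done:
		return 0, errClosed
	}
}

// Close unblocks pending reads and writes; it is safe to call repeatedly.
func (c *memPacketConn) Close() error {
	c.once.Do(func() { close(c.done) })
	return nil
}

func (c *memPacketConn) LocalAddr() net.Addr {
	return &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)}
}

// Deadlines are not supported by this sketch and are accepted as no-ops.
func (c *memPacketConn) SetDeadline(time.Time) error      { return nil }
func (c *memPacketConn) SetReadDeadline(time.Time) error  { return nil }
func (c *memPacketConn) SetWriteDeadline(time.Time) error { return nil }

func main() {
	conn := newMemPacketConn(8)
	defer conn.Close()

	for _, m := range []string{"one", "two", "three"} {
		if _, err := conn.WriteTo([]byte(m), nil); err != nil {
			fmt.Println("write error:", err)
			return
		}
	}
	buf := make([]byte, 64)
	for i := 0; i < 3; i++ {
		n, _, err := conn.ReadFrom(buf)
		if err != nil {
			fmt.Println("read error:", err)
			return
		}
		fmt.Println(string(buf[:n]))
	}
}
```

A benchmark built on such a type can count exactly as many received messages as it sent, so the ns/op and MB/s figures reflect the receive path rather than packet loss.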

mcuadros commented Oct 9, 2016

@cezarsa thanks a lot, highly appreciated

@mcuadros mcuadros merged commit 8487663 into mcuadros:master Oct 9, 2016