Properly calculate compressed message lengths #833

Merged: 23 commits, Nov 29, 2018

tmthrgd
Collaborator

@tmthrgd tmthrgd commented Nov 27, 2018

This entirely replaces the rather hacky and very confusing dance done in compressionLenSlice with a new approach that is far more like how packing is performed. It fixes a number of different bugs (some listed at the end).
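
For readers unfamiliar with how a compressed length can be computed in a packing-like way, here is a minimal sketch. It is not the code from this PR: `nameLen` and `maxOffset` are hypothetical names, and the sketch assumes names are fully qualified, already lower-cased and free of escape sequences. It walks a name suffix by suffix, stops at the first suffix already seen (which becomes a two-octet pointer), and only records suffixes that a 14-bit pointer could address.

```go
package main

import (
	"fmt"
	"strings"
)

// maxOffset is the first offset that a 14-bit compression pointer cannot
// address (RFC 1035, section 4.1.4). The name is an assumption of this sketch.
const maxOffset = 1 << 14

// nameLen is a hypothetical helper, not the library's implementation. It
// returns the number of octets name occupies when written at offset off and
// records each new suffix in compression so later names can point back at it.
// It assumes name is fully qualified, lower-cased and contains no escapes.
func nameLen(name string, off int, compression map[string]struct{}) int {
	if name == "" || name == "." {
		return 1 // the root is a single zero octet and is never compressed
	}
	n := 0
	for s := name; s != ""; {
		if _, ok := compression[s]; ok {
			return n + 2 // the rest of the name becomes a two-octet pointer
		}
		if off+n < maxOffset {
			compression[s] = struct{}{}
		}
		i := strings.IndexByte(s, '.')
		if i < 0 {
			// Defensive: no trailing dot, so treat the remainder as the last label.
			n += 1 + len(s)
			break
		}
		n += 1 + i // length octet plus the label itself
		s = s[i+1:]
	}
	return n + 1 // terminating zero octet
}

func main() {
	compression := make(map[string]struct{})
	off := 12 // a DNS header is 12 octets
	for _, name := range []string{"www.example.com.", "mail.example.com.", "."} {
		l := nameLen(name, off, compression)
		fmt.Printf("%-18s %d octets\n", name, l)
		off += l
	}
}
```

In this sketch the first name costs its full 17 octets, the second costs 7 (one new label plus a pointer to `example.com.`), and the root costs 1. A single pass both measures the name and populates the map, which is the property the description is after.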

The new code has two allocations that aren't strictly needed but are hard to avoid (MsgLength and MsgLengthMassive should have 0 and 9 allocations respectively). The problem is that compression map[string]struct{} is passed to RR.len, where RR is an interface. This interacts badly with Go's escape analysis (see golang/go#19361 and https://npat-efault.github.io/programming/2016/10/10/escape-analysis-and-interfaces.html), forcing the map to be heap allocated instead of stack allocated. It is possible to force these calls to be devirtualised with a massive type switch statement, but that worsens performance overall for MsgLengthMassive while eliminating the loss for MsgLength.
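
The interface issue can be illustrated with a small sketch. The `lener` interface and `record` type below are hypothetical stand-ins for RR and a concrete record type, not the library's definitions. `viaInterface` passes the map through an interface method call, while `viaTypeSwitch` dispatches on the concrete type first so the compiler sees a direct call. Whether the map actually stays on the stack depends on the compiler version; building with `go build -gcflags=-m` prints the escape analysis decisions.

```go
package main

import "fmt"

// lener is a hypothetical stand-in for the RR interface; only the shape of
// the len method matters for this sketch.
type lener interface {
	len(off int, compression map[string]struct{}) int
}

// record is a hypothetical concrete type with a single compressible name.
type record struct{ name string }

// len returns the offset after this record's name, using a two-octet pointer
// if the whole name was seen before. It assumes a fully qualified name
// without escapes, so the uncompressed length is len(name)+1.
func (r record) len(off int, compression map[string]struct{}) int {
	if _, ok := compression[r.name]; ok {
		return off + 2
	}
	compression[r.name] = struct{}{}
	return off + len(r.name) + 1
}

// viaInterface calls len through the interface. The compiler cannot see the
// callee here, so it may have to assume the map escapes to the heap.
func viaInterface(rs []lener) int {
	compression := make(map[string]struct{})
	off := 0
	for _, r := range rs {
		off = r.len(off, compression)
	}
	return off
}

// viaTypeSwitch dispatches on the concrete type so each call is direct.
// A real version would need one case per record type.
func viaTypeSwitch(rs []lener) int {
	compression := make(map[string]struct{})
	off := 0
	for _, r := range rs {
		switch r := r.(type) {
		case record:
			off = r.len(off, compression)
		default:
			panic(fmt.Sprintf("unhandled type %T", r))
		}
	}
	return off
}

func main() {
	rs := []lener{record{"example.com."}, record{"example.com."}}
	fmt.Println(viaInterface(rs), viaTypeSwitch(rs))
}
```

Both functions compute the same result; the difference is only in what the compiler can prove about the map, which is why the type switch trades code size and dispatch cost for a possible saved allocation.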

The performance loss in MsgLengthNoCompression and MsgLengthOnlyQuestion is (I believe) due to the call to domainNameLen, which can't currently be inlined. Though this is unfortunate, the overhead is necessary as it fixes a bug where Len could previously be off by one for uncompressed messages (see the existing TestMsgLength2, which is now fixed). Inlining changes are coming in Go 1.12, so this can be revisited then and it may be possible to recover this performance loss.
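
As an illustration of how an uncompressed length can end up off by one, consider the root name. The sketch below is an assumption made for illustration and not the library's previous code: `naiveNameLen` applies the "string length plus one" shortcut to every name, while the hypothetical `uncompressedNameLen` special-cases the root, which occupies a single zero octet on the wire. Both assume fully qualified names without escape sequences.

```go
package main

import "fmt"

// naiveNameLen is an illustrative shortcut (not the library's old code):
// every character of a fully qualified name maps to one octet, plus one.
func naiveNameLen(name string) int {
	return len(name) + 1
}

// uncompressedNameLen is a hypothetical corrected version. The root name is
// just the terminating zero octet, so it needs a special case.
func uncompressedNameLen(name string) int {
	if name == "" || name == "." {
		return 1
	}
	return len(name) + 1
}

func main() {
	for _, name := range []string{".", "example.com."} {
		fmt.Printf("%-13s naive=%d corrected=%d\n",
			name, naiveNameLen(name), uncompressedNameLen(name))
	}
}
```

The two functions agree for ordinary names and differ only for the root, where the shortcut reports 2 octets instead of 1.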

Some of the benchmark deltas below (like LenLabels and MuxMatch) are just noise. I don't quite understand how they survived benchstat and ten runs, but they could come down to the compiler laying out code differently and cache interactions. TL;DR: they're a distraction.

When taken along with #820, MsgLengthMassive is now twice as fast, with half the allocated bytes and 92% fewer allocations.

$ benchstat {old,new}.bench
name                            old time/op    new time/op    delta
MsgLength-12                       336ns ± 0%     545ns ± 6%   +62.24%  (p=0.001 n=6+8)
MsgLengthNoCompression-12         9.95ns ± 3%   26.64ns ± 2%  +167.79%  (p=0.000 n=10+10)
MsgLengthPack-12                  2.20µs ±15%    2.30µs ±23%      ~     (p=0.315 n=10+10)
MsgLengthMassive-12               54.0µs ± 5%    31.8µs ± 6%   -41.04%  (p=0.000 n=10+9)
MsgLengthOnlyQuestion-12          5.53ns ± 1%    9.19ns ± 1%   +66.15%  (p=0.000 n=10+10)
PackDomainName-12                  211ns ± 6%     204ns ± 7%      ~     (p=0.064 n=10+10)
UnpackDomainName-12                150ns ±14%     157ns ±14%      ~     (p=0.383 n=10+10)
UnpackDomainNameUnprintable-12     145ns ± 3%     150ns ± 4%      ~     (p=0.089 n=9+10)
Copy-12                            880ns ±20%     742ns ±55%      ~     (p=0.247 n=10+10)
PackA-12                          41.1ns ± 2%    40.9ns ± 1%    -0.56%  (p=0.044 n=10+9)
UnpackA-12                         222ns ±21%     208ns ±41%      ~     (p=0.726 n=10+10)
PackMX-12                         75.1ns ± 1%    73.4ns ± 1%    -2.24%  (p=0.000 n=10+10)
UnpackMX-12                        278ns ± 8%     278ns ±16%      ~     (p=0.913 n=8+10)
PackAAAAA-12                      40.6ns ± 2%    40.2ns ± 4%      ~     (p=0.055 n=10+10)
UnpackAAAA-12                      243ns ±12%     230ns ± 9%      ~     (p=0.129 n=9+7)
PackMsg-12                        1.57µs ± 8%    1.57µs ±14%      ~     (p=0.853 n=10+10)
PackMsgOnlyQuestion-12             234ns ± 1%     237ns ± 2%    +1.14%  (p=0.046 n=9+9)
UnpackMsg-12                      1.38µs ±29%    1.37µs ±19%      ~     (p=0.853 n=10+10)
IdGeneration-12                   15.7ns ± 2%    15.7ns ± 0%      ~     (p=0.666 n=10+7)
Generate-12                        153µs ± 2%     153µs ± 1%      ~     (p=0.739 n=10+10)
SplitLabels-12                    69.6ns ±12%    62.0ns ±29%      ~     (p=0.063 n=10+10)
LenLabels-12                      27.2ns ± 1%    20.1ns ± 1%   -25.96%  (p=0.000 n=10+8)
CompareDomainName-12               166ns ±17%     161ns ±11%      ~     (p=0.343 n=10+10)
IsSubDomain-12                     529ns ± 3%     493ns ±20%      ~     (p=0.327 n=8+10)
UnpackString-12                    137ns ± 6%     131ns ± 9%      ~     (p=0.089 n=9+10)
Dedup-12                          1.94µs ± 8%    1.86µs ±11%      ~     (p=0.128 n=10+10)
NewRR-12                          2.64µs ± 5%    2.62µs ± 5%      ~     (p=0.289 n=10+10)
ReadRR-12                         4.28µs ±20%    4.10µs ±19%      ~     (p=0.684 n=10+10)
ParseZone-12                       123µs ±14%     115µs ±48%      ~     (p=0.971 n=10+10)
ZoneParser-12                     10.4µs ± 1%    10.6µs ± 4%      ~     (p=0.243 n=9+10)
MuxMatch/lowercase-12             69.7ns ± 2%    67.9ns ± 2%    -2.65%  (p=0.000 n=10+10)
MuxMatch/uppercase-12              120ns ± 4%     125ns ± 3%    +4.26%  (p=0.000 n=10+10)
Serve-12                          56.5µs ± 8%    56.4µs ± 4%      ~     (p=0.853 n=10+10)
Serve6-12                         57.3µs ± 5%    57.7µs ± 5%      ~     (p=0.684 n=10+10)
ServeCompress-12                  57.8µs ± 2%    58.5µs ± 3%      ~     (p=0.800 n=2+3)
SprintName-12                      218ns ± 2%     206ns ± 2%    -5.42%  (p=0.000 n=10+10)
SprintTxtOctet-12                  234ns ± 5%     233ns ±12%      ~     (p=0.858 n=9+10)
SprintTxt-12                       267ns ± 8%     277ns ± 7%      ~     (p=0.147 n=10+10)
 
name                            old alloc/op   new alloc/op   delta
MsgLength-12                       32.0B ± 0%    192.0B ± 0%  +500.00%  (p=0.000 n=10+10)
MsgLengthNoCompression-12          0.00B          0.00B           ~     (all equal)
MsgLengthPack-12                    896B ± 0%      896B ± 0%      ~     (all equal)
MsgLengthMassive-12               14.8kB ± 0%    10.9kB ± 0%   -26.59%  (p=0.000 n=10+10)
MsgLengthOnlyQuestion-12           0.00B          0.00B           ~     (all equal)
PackDomainName-12                  64.0B ± 0%     64.0B ± 0%      ~     (all equal)
UnpackDomainName-12                64.0B ± 0%     64.0B ± 0%      ~     (all equal)
UnpackDomainNameUnprintable-12     48.0B ± 0%     48.0B ± 0%      ~     (all equal)
Copy-12                             432B ± 0%      432B ± 0%      ~     (all equal)
PackA-12                           0.00B          0.00B           ~     (all equal)
UnpackA-12                          100B ± 0%      100B ± 0%      ~     (all equal)
PackMX-12                          0.00B          0.00B           ~     (all equal)
UnpackMX-12                         116B ± 0%      116B ± 0%      ~     (all equal)
PackAAAAA-12                       0.00B          0.00B           ~     (all equal)
UnpackAAAA-12                       100B ± 0%      100B ± 0%      ~     (all equal)
PackMsg-12                          576B ± 0%      576B ± 0%      ~     (all equal)
PackMsgOnlyQuestion-12             64.0B ± 0%     64.0B ± 0%      ~     (all equal)
UnpackMsg-12                        592B ± 0%      592B ± 0%      ~     (all equal)
IdGeneration-12                    0.00B          0.00B           ~     (all equal)
Generate-12                       31.9kB ± 0%    31.9kB ± 0%      ~     (p=1.000 n=10+10)
SplitLabels-12                     32.0B ± 0%     32.0B ± 0%      ~     (all equal)
LenLabels-12                       0.00B          0.00B           ~     (all equal)
CompareDomainName-12               64.0B ± 0%     64.0B ± 0%      ~     (all equal)
IsSubDomain-12                      192B ± 0%      192B ± 0%      ~     (all equal)
UnpackString-12                    48.0B ± 0%     48.0B ± 0%      ~     (all equal)
Dedup-12                            624B ± 0%      624B ± 0%      ~     (all equal)
NewRR-12                            784B ± 0%      784B ± 0%      ~     (all equal)
ReadRR-12                         1.79kB ± 0%    1.79kB ± 0%      ~     (all equal)
ParseZone-12                      84.0kB ± 0%    84.0kB ± 0%      ~     (all equal)
ZoneParser-12                     1.57kB ± 0%    1.57kB ± 0%      ~     (all equal)
MuxMatch/lowercase-12              0.00B          0.00B           ~     (all equal)
MuxMatch/uppercase-12              32.0B ± 0%     32.0B ± 0%      ~     (all equal)
Serve-12                          3.36kB ± 0%    3.36kB ± 0%      ~     (all equal)
Serve6-12                         3.20kB ± 0%    3.20kB ± 0%      ~     (all equal)
ServeCompress-12                  3.62kB ± 0%    3.62kB ± 0%      ~     (all equal)
SprintName-12                      48.0B ± 0%     48.0B ± 0%      ~     (all equal)
SprintTxtOctet-12                  80.0B ± 0%     80.0B ± 0%      ~     (all equal)
SprintTxt-12                       80.0B ± 0%     80.0B ± 0%      ~     (all equal)
 
name                            old allocs/op  new allocs/op  delta
MsgLength-12                        1.00 ± 0%      2.00 ± 0%  +100.00%  (p=0.000 n=10+10)
MsgLengthNoCompression-12           0.00           0.00           ~     (all equal)
MsgLengthPack-12                    8.00 ± 0%      8.00 ± 0%      ~     (all equal)
MsgLengthMassive-12                  138 ± 0%        11 ± 0%   -92.03%  (p=0.000 n=10+10)
MsgLengthOnlyQuestion-12            0.00           0.00           ~     (all equal)
PackDomainName-12                   1.00 ± 0%      1.00 ± 0%      ~     (all equal)
UnpackDomainName-12                 1.00 ± 0%      1.00 ± 0%      ~     (all equal)
UnpackDomainNameUnprintable-12      1.00 ± 0%      1.00 ± 0%      ~     (all equal)
Copy-12                             8.00 ± 0%      8.00 ± 0%      ~     (all equal)
PackA-12                            0.00           0.00           ~     (all equal)
UnpackA-12                          3.00 ± 0%      3.00 ± 0%      ~     (all equal)
PackMX-12                           0.00           0.00           ~     (all equal)
UnpackMX-12                         4.00 ± 0%      4.00 ± 0%      ~     (all equal)
PackAAAAA-12                        0.00           0.00           ~     (all equal)
UnpackAAAA-12                       3.00 ± 0%      3.00 ± 0%      ~     (all equal)
PackMsg-12                          7.00 ± 0%      7.00 ± 0%      ~     (all equal)
PackMsgOnlyQuestion-12              1.00 ± 0%      1.00 ± 0%      ~     (all equal)
UnpackMsg-12                        12.0 ± 0%      12.0 ± 0%      ~     (all equal)
IdGeneration-12                     0.00           0.00           ~     (all equal)
Generate-12                        1.55k ± 0%     1.55k ± 0%      ~     (all equal)
SplitLabels-12                      1.00 ± 0%      1.00 ± 0%      ~     (all equal)
LenLabels-12                        0.00           0.00           ~     (all equal)
CompareDomainName-12                2.00 ± 0%      2.00 ± 0%      ~     (all equal)
IsSubDomain-12                      6.00 ± 0%      6.00 ± 0%      ~     (all equal)
UnpackString-12                     2.00 ± 0%      2.00 ± 0%      ~     (all equal)
Dedup-12                            31.0 ± 0%      31.0 ± 0%      ~     (all equal)
NewRR-12                            15.0 ± 0%      15.0 ± 0%      ~     (all equal)
ReadRR-12                           17.0 ± 0%      17.0 ± 0%      ~     (all equal)
ParseZone-12                        92.0 ± 0%      92.0 ± 0%      ~     (all equal)
ZoneParser-12                       81.0 ± 0%      81.0 ± 0%      ~     (all equal)
MuxMatch/lowercase-12               0.00           0.00           ~     (all equal)
MuxMatch/uppercase-12               1.00 ± 0%      1.00 ± 0%      ~     (all equal)
Serve-12                            54.0 ± 0%      54.0 ± 0%      ~     (all equal)
Serve6-12                           51.0 ± 0%      51.0 ± 0%      ~     (all equal)
ServeCompress-12                    56.0 ± 0%      56.0 ± 0%      ~     (all equal)
SprintName-12                       2.00 ± 0%      2.00 ± 0%      ~     (all equal)
SprintTxtOctet-12                   2.00 ± 0%      2.00 ± 0%      ~     (all equal)
SprintTxt-12                        2.00 ± 0%      2.00 ± 0%      ~     (all equal)

Updates #709
Fixes #821
Fixes #824
Closes #826
Fixes #829

/cc @pierresouchay (who changed much of this in #668).

This wasn't used anywhere but TestCompressionLenSearch, and was very wrong.
This replaces the confusing and complicated compressionLenSlice function.
This leaves the len() functions unused and they'll soon be removed.

This also fixes the off-by-one error of compressedLen when a (Q)NAME is ".".
This eliminates the need to loop over the domain name twice when we're compressing the name.
These are the only RRs with multiple compressible names within the same RR, and they were previously broken.
It also handles the length of uncompressed domain names.
This should allow us to avoid the call overhead of compressionLenMapInsert in certain limited cases and may result in a slight performance increase.

compressionLenMapInsert still has a maxCompressionOffset check inside the for loop.
This better reflects that it also calculates the uncompressed length.
They're both testing the same thing.
@tmthrgd tmthrgd requested a review from miekg November 27, 2018 23:28
@codecov-io

codecov-io commented Nov 27, 2018

Codecov Report

Merging #833 into master will increase coverage by 0.33%.
The diff coverage is 46.82%.


@@            Coverage Diff             @@
##           master     #833      +/-   ##
==========================================
+ Coverage    57.6%   57.94%   +0.33%     
==========================================
  Files          43       42       -1     
  Lines       10839    10661     -178     
==========================================
- Hits         6244     6177      -67     
+ Misses       3505     3396     -109     
+ Partials     1090     1088       -2
Impacted Files    Coverage Δ
privaterr.go      67.56% <0%> (-2.86%) ⬇️
dnssec.go         58.27% <100%> (ø) ⬆️
msg.go            78.14% <100%> (-0.46%) ⬇️
sig0.go           65.51% <100%> (ø) ⬆️
dns.go            62.5% <100%> (ø) ⬆️
tsig.go           41.5% <100%> (ø) ⬆️
edns.go           25.09% <100%> (ø) ⬆️
ztypes.go         45.48% <33.33%> (+0.79%) ⬆️
types.go          73.63% <76.92%> (+0.07%) ⬆️
server.go         65.96% <0%> (-0.24%) ⬇️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pierresouchay
Contributor

I had a quick look, it seems reasonable, and removing my crappy hacks is good news ;)
I'll try to bench the performance effects in Consul once merged

compressionLenSearch does everything compressionLenMapInsert did anyway.
The last two commits worsened the performance of domainNameLen
noticeably; this change restores its original performance.

name                            old time/op    new time/op    delta
MsgLength-12                       550ns ±13%     510ns ±21%    ~     (p=0.050 n=10+10)
MsgLengthNoCompression-12         26.9ns ± 2%    27.0ns ± 1%    ~     (p=0.198 n=9+10)
MsgLengthPack-12                  2.30µs ±12%    2.26µs ±16%    ~     (p=0.739 n=10+10)
MsgLengthMassive-12               32.9µs ± 7%    32.0µs ±10%    ~     (p=0.243 n=9+10)
MsgLengthOnlyQuestion-12          9.60ns ± 1%    9.20ns ± 1%  -4.16%  (p=0.000 n=9+9)
@miekg
Owner

miekg commented Nov 28, 2018 via email

@miekg
Owner

miekg commented Nov 28, 2018

this looks reasonable; I'll have to do a local checkout to take a closer look though

@tmthrgd tmthrgd merged commit 778aa4f into miekg:master Nov 29, 2018
@tmthrgd tmthrgd deleted the new-comp-len branch November 29, 2018 23:33
Successfully merging this pull request may close these issues.

compressedLen "one more than needed"; SOA & MINFO compression length mismatch; Question compression
4 participants