-
Notifications
You must be signed in to change notification settings - Fork 39
Change the representation of IP to a pair of uint64s. #63
Conversation
a96d362
to
f9acd0b
Compare
f9acd0b
to
e8df004
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a cursory first pass over this. If @bradfitz is OK with the general idea of moving to uint64s (I like it), then I'll do a second, more careful review.
Also, needs gofmt.
This passes on master but fails in inetaf#63 (at the moment). This makes it interesting enough to keep.
Fuzz testing found a bug. See #66. Time for me to get the fuzz function committed so that you can use it too. PR coming shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't review this on mobile. Will check in the evening.
32012c7
to
70a5f1b
Compare
Fixed your feedback and the failure found by fuzz testing (the test for Is4in6 was incorrect). |
0a650c7
to
45d1fc8
Compare
I'm wrapping up for the day and week. Looking forward to reviewing next week. This is lovely. |
For reference, golang/go#41663 is the reason that I didn't get the same performance improvement when I kept [16]byte and did a BigEndian read-twiddle-write where needed. You might expect the resulting code to be the same, but it breaks down into byte-wise twiddling. Whereas casting the bytes into uint64 from the get-go compiles efficiently in both sites: the BigEndian read/write are a MOVQ+BSWAPQ, and the twiddling ops are 64-bit register math. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, and it's a red diff! Beautiful.
45d1fc8
to
ef1513e
Compare
d382ab4
to
0ca0d35
Compare
Managed to dump SSA, and as expected the compiler inlines, but doesn't do any intelligent coalescing. The check
Compare to
So, the generated code is definitely worse, and in line with what I saw when I tried to use encoding/binary to Uint64+twiddle+PutUint64. Based on that, I would prefer to continue operating on larger chunks of the address, rather than break it down bytewise. Maybe something like |
I'd prefer to err on the side of explicit & readable code until there's a performance reason to micro-optimize Out of curiosity, does Go do any better at |
This allows IP manipulation operations to operate on IPs as two registers, which empirically leads to significant speedups, in particular on OOO superscalars where the two halves can be processed in parallel. You might expect that we could keep the representation as [16]byte, and do a cycle of BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient code. Unfortunately, due to a variety of missing optimizations in the Go compiler, that is not the case and that code turns into byte-wise operations. On the other hand, converting to a uint64 pair at construction results in efficient construction (a pair of MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit pairs). See also: - golang/go#41663 - golang/go#41684 - golang/go#42958 - The discussion in #63 name old time/op new time/op delta StdIPv4-8 146ns ± 2% 141ns ± 2% -3.42% (p=0.016 n=5+5) IPv4-8 120ns ± 1% 107ns ± 2% -10.65% (p=0.008 n=5+5) IPv4_inline-8 120ns ± 0% 118ns ± 1% -1.67% (p=0.016 n=4+5) StdIPv6-8 211ns ± 2% 215ns ± 1% +2.18% (p=0.008 n=5+5) IPv6-8 281ns ± 1% 252ns ± 1% -10.19% (p=0.008 n=5+5) IPv4Contains-8 11.8ns ± 4% 4.7ns ± 2% -60.00% (p=0.008 n=5+5) ParseIPv4-8 68.1ns ± 4% 78.8ns ± 1% +15.74% (p=0.008 n=5+5) ParseIPv6-8 419ns ± 1% 409ns ± 0% -2.40% (p=0.016 n=4+5) StdParseIPv4-8 73.7ns ± 1% 88.8ns ± 2% +20.50% (p=0.008 n=5+5) StdParseIPv6-8 132ns ± 2% 134ns ± 1% ~ (p=0.079 n=5+5) IPPrefixMasking/IPv4_/32-8 36.3ns ± 3% 4.8ns ± 4% -86.72% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/17-8 39.0ns ± 0% 4.8ns ± 3% -87.78% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/0-8 36.9ns ± 2% 4.8ns ± 4% -87.07% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/128-8 32.7ns ± 1% 4.7ns ± 2% -85.47% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/65-8 39.8ns ± 1% 4.7ns ± 1% -88.13% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/0-8 40.7ns ± 1% 4.7ns ± 2% -88.41% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/128-8 136ns ± 3% 5ns ± 2% -96.53% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-8 142ns ± 2% 5ns ± 1% -96.65% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-8 143ns ± 2% 5ns ± 3% -96.67% (p=0.008 n=5+5) IPSetFuzz-8 22.7µs ± 2% 16.4µs ± 2% -27.84% (p=0.008 n=5+5) name old alloc/op new alloc/op delta StdIPv4-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPv4-8 0.00B 0.00B ~ (all equal) IPv4_inline-8 0.00B 0.00B ~ (all equal) StdIPv6-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPv6-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPv4Contains-8 0.00B 0.00B ~ (all equal) ParseIPv4-8 0.00B 0.00B ~ (all equal) ParseIPv6-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) StdParseIPv4-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) StdParseIPv6-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPPrefixMasking/IPv4_/32-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv4_/17-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv4_/0-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_/128-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_/65-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_/0-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_zone_/128-8 16.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-8 16.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-8 16.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5) IPSetFuzz-8 2.60kB ± 0% 2.60kB ± 0% ~ (p=1.000 n=5+5) name old allocs/op new allocs/op delta StdIPv4-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPv4-8 0.00 0.00 ~ (all equal) IPv4_inline-8 0.00 0.00 ~ (all equal) StdIPv6-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPv6-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPv4Contains-8 0.00 0.00 ~ (all equal) ParseIPv4-8 0.00 0.00 ~ (all equal) ParseIPv6-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) StdParseIPv4-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) StdParseIPv6-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPPrefixMasking/IPv4_/32-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv4_/17-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv4_/0-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_/128-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_/65-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_/0-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_zone_/128-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) IPSetFuzz-8 33.0 ± 0% 33.0 ± 0% ~ (all equal) Signed-off-by: David Anderson <dave@natulte.net>
0ca0d35
to
b7d6c09
Compare
Not by much. The compare becomes at 16-byte CMPW, but the load is still dumb (loads ip.lo twice into two different registers, shifts each one by a different amount, then ORs them together). Keeping at bytewise loads for now. Will see in a followup if I can make it generate slightly less rubbish code without compromising readability. |
LGTM at b7d6c09. Thanks! Merge away. |
This passes on master but fails in inetaf#63 (at the moment). This makes it interesting enough to keep. Signed-off-by: Stefan Majer <stefan.majer@gmail.com>
This allows IP manipulation operations to operate on IPs as two registers, which empirically leads to significant speedups, in particular on OOO superscalars where the two halves can be processed in parallel. You might expect that we could keep the representation as [16]byte, and do a cycle of BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient code. Unfortunately, due to a variety of missing optimizations in the Go compiler, that is not the case and that code turns into byte-wise operations. On the other hand, converting to a uint64 pair at construction results in efficient construction (a pair of MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit pairs). See also: - golang/go#41663 - golang/go#41684 - golang/go#42958 - The discussion in inetaf#63 name old time/op new time/op delta StdIPv4-8 146ns ± 2% 141ns ± 2% -3.42% (p=0.016 n=5+5) IPv4-8 120ns ± 1% 107ns ± 2% -10.65% (p=0.008 n=5+5) IPv4_inline-8 120ns ± 0% 118ns ± 1% -1.67% (p=0.016 n=4+5) StdIPv6-8 211ns ± 2% 215ns ± 1% +2.18% (p=0.008 n=5+5) IPv6-8 281ns ± 1% 252ns ± 1% -10.19% (p=0.008 n=5+5) IPv4Contains-8 11.8ns ± 4% 4.7ns ± 2% -60.00% (p=0.008 n=5+5) ParseIPv4-8 68.1ns ± 4% 78.8ns ± 1% +15.74% (p=0.008 n=5+5) ParseIPv6-8 419ns ± 1% 409ns ± 0% -2.40% (p=0.016 n=4+5) StdParseIPv4-8 73.7ns ± 1% 88.8ns ± 2% +20.50% (p=0.008 n=5+5) StdParseIPv6-8 132ns ± 2% 134ns ± 1% ~ (p=0.079 n=5+5) IPPrefixMasking/IPv4_/32-8 36.3ns ± 3% 4.8ns ± 4% -86.72% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/17-8 39.0ns ± 0% 4.8ns ± 3% -87.78% (p=0.008 n=5+5) IPPrefixMasking/IPv4_/0-8 36.9ns ± 2% 4.8ns ± 4% -87.07% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/128-8 32.7ns ± 1% 4.7ns ± 2% -85.47% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/65-8 39.8ns ± 1% 4.7ns ± 1% -88.13% (p=0.008 n=5+5) IPPrefixMasking/IPv6_/0-8 40.7ns ± 1% 4.7ns ± 2% -88.41% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/128-8 136ns ± 3% 5ns ± 2% -96.53% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-8 142ns ± 2% 5ns ± 1% -96.65% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-8 143ns ± 2% 5ns ± 3% -96.67% (p=0.008 n=5+5) IPSetFuzz-8 22.7µs ± 2% 16.4µs ± 2% -27.84% (p=0.008 n=5+5) name old alloc/op new alloc/op delta StdIPv4-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPv4-8 0.00B 0.00B ~ (all equal) IPv4_inline-8 0.00B 0.00B ~ (all equal) StdIPv6-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPv6-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPv4Contains-8 0.00B 0.00B ~ (all equal) ParseIPv4-8 0.00B 0.00B ~ (all equal) ParseIPv6-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) StdParseIPv4-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) StdParseIPv6-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) IPPrefixMasking/IPv4_/32-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv4_/17-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv4_/0-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_/128-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_/65-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_/0-8 0.00B 0.00B ~ (all equal) IPPrefixMasking/IPv6_zone_/128-8 16.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-8 16.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-8 16.0B ± 0% 0.0B -100.00% (p=0.008 n=5+5) IPSetFuzz-8 2.60kB ± 0% 2.60kB ± 0% ~ (p=1.000 n=5+5) name old allocs/op new allocs/op delta StdIPv4-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPv4-8 0.00 0.00 ~ (all equal) IPv4_inline-8 0.00 0.00 ~ (all equal) StdIPv6-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPv6-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPv4Contains-8 0.00 0.00 ~ (all equal) ParseIPv4-8 0.00 0.00 ~ (all equal) ParseIPv6-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) StdParseIPv4-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) StdParseIPv6-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IPPrefixMasking/IPv4_/32-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv4_/17-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv4_/0-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_/128-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_/65-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_/0-8 0.00 0.00 ~ (all equal) IPPrefixMasking/IPv6_zone_/128-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/65-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) IPPrefixMasking/IPv6_zone_/0-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5) IPSetFuzz-8 33.0 ± 0% 33.0 ± 0% ~ (all equal) Signed-off-by: David Anderson <dave@natulte.net> Signed-off-by: Stefan Majer <stefan.majer@gmail.com>
This allows IP manipulation operations to operate on IPs as two registers, which empirically
leads to significant speedups, in particular on OOO superscalars where the two halves can
be processed in parallel.
Signed-off-by: David Anderson dave@natulte.net