Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

types: {Int, Uint}.MarshalTo use an expensive (math/big.Int).Bytes() to detect if isZero instead of the cheaper .BitLen() #8575

Closed
4 tasks done
odeke-em opened this issue Feb 12, 2021 · 0 comments · Fixed by #8580
Assignees
Labels
T: Performance Performance improvements

Comments

@odeke-em
Copy link
Collaborator

Summary of Bug

While auditing I noticed this code

cosmos-sdk/types/uint.go

Lines 160 to 176 in f8a6987

func (u *Uint) MarshalTo(data []byte) (n int, err error) {
if u.i == nil {
u.i = new(big.Int)
}
if len(u.i.Bytes()) == 0 {
copy(data, []byte{0x30})
return 1, nil
}
bz, err := u.Marshal()
if err != nil {
return 0, err
}
copy(data, bz)
return len(bz), nil
}
for Int.MarshalTo and Uint.MarshalTo, but notice that it tries to check if a value is zero. That code invokes .Bytes() and checks the length, an allocation, just to only check if a value is zero. The math/big package provides a way to trivially check if a value is zero with .BitLen() == 0

Results

With just this small change, we see some good benefits
image

and then when we apply benchmarks over the entire project there are benefits

name                                                      old time/op    new time/op    delta
UintMarshal-8                                               11.0µs ±56%     8.1µs ±16%  -25.84%  (p=0.004 n=10+10)
RejectUnknownFields_parallel-8                               156ns ± 4%     158ns ± 1%   +1.87%  (p=0.043 n=10+7)
BcryptGenerateFromPassword/benchmark-security-param-9-8     30.6ms ± 1%    29.8ms ± 2%   -2.61%  (p=0.000 n=10+9)
BcryptGenerateFromPassword/benchmark-security-param-10-8    61.1ms ± 1%    58.9ms ± 0%   -3.68%  (p=0.000 n=10+8)
BcryptGenerateFromPassword/benchmark-security-param-11-8     123ms ± 1%     118ms ± 1%   -3.61%  (p=0.000 n=10+9)
BcryptGenerateFromPassword/benchmark-security-param-12-8     245ms ± 0%     236ms ± 1%   -3.41%  (p=0.000 n=9+10)
BcryptGenerateFromPassword/benchmark-security-param-13-8     489ms ± 0%     476ms ± 3%   -2.58%  (p=0.001 n=8+10)
BcryptGenerateFromPassword/benchmark-security-param-14-8     979ms ± 1%     940ms ± 1%   -4.00%  (p=0.000 n=10+9)
BcryptGenerateFromPassword/benchmark-security-param-15-8     1.97s ± 2%     1.90s ± 3%   -3.61%  (p=0.000 n=10+10)
CacheKVStoreIterator1000-8                                  76.0µs ± 0%   124.4µs ±40%  +63.63%  (p=0.003 n=8+10)
CacheKVStoreIterator100000-8                                33.3ms ± 2%    29.7ms ±13%  -10.78%  (p=0.004 n=9+8)
CacheKVStoreGetNoKeyFound-8                                  733ns ± 5%    1284ns ± 8%  +75.12%  (p=0.000 n=9+8)
IAVLIteratorNext-8                                           487ns ± 3%     524ns ±13%   +7.59%  (p=0.001 n=9+10)
MultistoreSnapshot100K-8                                     3.12s ± 1%     3.17s ± 1%   +1.48%  (p=0.000 n=8+10)
MultistoreSnapshot1M-8                                       31.5s ± 0%     31.8s ± 2%   +0.91%  (p=0.004 n=8+10)
MultistoreSnapshotRestore100K-8                              5.17s ± 1%     5.21s ± 1%   +0.71%  (p=0.016 n=8+10)
CoinsAdditionIntersect/sizes:_A_1,_B_1000-8                 14.0µs ± 2%    13.8µs ± 0%   -1.65%  (p=0.000 n=10+9)
CoinsAdditionIntersect/sizes:_A_2,_B_1000-8                 14.2µs ± 1%    14.0µs ± 2%   -1.18%  (p=0.021 n=8+9)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1-8                  48.3ns ± 1%    48.7ns ± 1%   +0.98%  (p=0.003 n=9+9)
CoinsAdditionNoIntersect/sizes:_A_5,_B_5-8                   496ns ± 1%     505ns ± 2%   +1.76%  (p=0.000 n=9+9)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1000-8               14.2µs ± 3%    14.0µs ± 2%   -1.72%  (p=0.014 n=10+10)
Bech32ifyPubKey-8                                           9.09µs ± 1%    9.39µs ± 2%   +3.36%  (p=0.000 n=10+10)
GetPubKeyFromBech32-8                                       14.2µs ± 2%    14.7µs ± 4%   +3.44%  (p=0.001 n=10+10)
ParseCoin-8                                                 8.51µs ± 0%    8.69µs ± 3%   +2.13%  (p=0.000 n=9+10)
AccountMapperSetAccount-8                                   4.86µs ± 1%    4.92µs ± 3%   +1.38%  (p=0.008 n=9+9)
BlockProvision-8                                             391ns ± 0%     396ns ± 2%   +1.27%  (p=0.001 n=9+9)
NextInflation-8                                             1.34µs ± 0%    1.34µs ± 0%   +0.32%  (p=0.031 n=8+9)

name                                                      old speed      new speed      delta
UintMarshal-8                                              680kB/s ±40%   872kB/s ±17%  +28.24%  (p=0.003 n=10+10)
RejectUnknownFields_parallel-8                             605MB/s ± 4%   593MB/s ± 1%   -1.87%  (p=0.043 n=10+7)

name                                                      old alloc/op   new alloc/op   delta
UintMarshal-8                                                 440B ± 0%      392B ± 0%  -10.91%  (p=0.000 n=10+10)
CacheKVStoreIterator1000-8                                  17.5kB ± 0%    17.5kB ± 0%   +0.15%  (p=0.028 n=10+10)
CacheKVStoreIterator100000-8                                6.46MB ± 1%    6.31MB ±11%   -2.36%  (p=0.022 n=10+9)
CacheKVStoreGetNoKeyFound-8                                   220B ± 1%      258B ± 0%  +17.21%  (p=0.000 n=9+8)

name                                                      old allocs/op  new allocs/op  delta
UintMarshal-8                                                 31.0 ± 0%      25.0 ± 0%  -19.35%  (p=0.000 n=10+10)
CacheKVStoreIterator1000-8                                    23.0 ± 0%      23.6 ± 3%   +2.61%  (p=0.011 n=10+10)
CacheKVStoreIterator100000-8                                 32.2k ± 5%     29.4k ±43%   -8.44%  (p=0.014 n=10+9)

Let's use the cheaper and more efficient alternative


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@odeke-em odeke-em added the T: Performance Performance improvements label Feb 12, 2021
odeke-em added a commit that referenced this issue Feb 14, 2021
Instead of using len((*math/big.Int).Bytes()) == 0, which expensively
creates a byte slice and marshals a value, on the majority hot path,
instead use the cheaper method .BitLen() to check if 0.

Benchmarking results, just from types:

name                                           old time/op    new time/op    delta
CoinsAdditionIntersect/sizes:_A_1,_B_1-8          132ns ± 2%     126ns ±13%   -4.55%  (p=0.050 n=10+10)
CoinsAdditionIntersect/sizes:_A_5,_B_5-8         1.41µs ± 3%    1.41µs ± 2%     ~     (p=1.000 n=10+10)
CoinsAdditionIntersect/sizes:_A_5,_B_20-8        2.30µs ± 1%    2.27µs ± 3%     ~     (p=0.066 n=10+10)
CoinsAdditionIntersect/sizes:_A_1,_B_1000-8      30.9µs ± 3%    30.7µs ± 1%     ~     (p=0.218 n=10+10)
CoinsAdditionIntersect/sizes:_A_2,_B_1000-8      31.4µs ± 3%    30.8µs ± 2%   -1.94%  (p=0.015 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1-8        116ns ± 1%     114ns ± 4%     ~     (p=0.142 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_5,_B_5-8       1.11µs ± 1%    1.08µs ± 3%   -2.36%  (p=0.003 n=8+10)
CoinsAdditionNoIntersect/sizes:_A_5,_B_20-8      1.85µs ± 2%    1.82µs ± 1%   -1.38%  (p=0.001 n=10+9)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1000-8    30.7µs ± 1%    30.6µs ± 3%     ~     (p=0.393 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_2,_B_1000-8    31.1µs ± 1%    30.7µs ± 2%   -1.32%  (p=0.015 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_1000,_B_2-8    31.0µs ± 2%    30.7µs ± 2%     ~     (p=0.190 n=10+10)
Bech32ifyPubKey-8                                28.8µs ± 5%    28.8µs ± 3%     ~     (p=0.965 n=10+8)
GetPubKeyFromBech32-8                            38.8µs ± 3%    39.4µs ± 2%   +1.70%  (p=0.013 n=9+10)
ParseCoin-8                                      16.7µs ± 6%    15.8µs ± 4%   -5.21%  (p=0.001 n=10+10)
MarshalTo-8                                       521ns ± 5%     508ns ± 3%   -2.56%  (p=0.029 n=10+10)
UintMarshal-8                                    3.10µs ±17%    2.56µs ± 3%  -17.45%  (p=0.000 n=10+9)
IntMarshal-8                                     2.52µs ±10%    1.94µs ± 2%  -23.10%  (p=0.000 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Bech32ifyPubKey-8                                4.02kB ± 0%    4.02kB ± 0%     ~     (all equal)
GetPubKeyFromBech32-8                            2.48kB ± 0%    2.48kB ± 0%     ~     (all equal)
ParseCoin-8                                      2.21kB ± 0%    2.21kB ± 0%     ~     (all equal)
MarshalTo-8                                       80.0B ± 0%     80.0B ± 0%     ~     (all equal)
UintMarshal-8                                      440B ± 0%      392B ± 0%  -10.91%  (p=0.000 n=10+10)
IntMarshal-8                                       216B ± 0%      168B ± 0%  -22.22%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
Bech32ifyPubKey-8                                  25.0 ± 0%      25.0 ± 0%     ~     (all equal)
GetPubKeyFromBech32-8                              85.0 ± 0%      85.0 ± 0%     ~     (all equal)
ParseCoin-8                                        71.0 ± 0%      71.0 ± 0%     ~     (all equal)
MarshalTo-8                                        2.00 ± 0%      2.00 ± 0%     ~     (all equal)
UintMarshal-8                                      31.0 ± 0%      25.0 ± 0%  -19.35%  (p=0.000 n=10+10)
IntMarshal-8                                       24.0 ± 0%      18.0 ± 0%  -25.00%  (p=0.000 n=10+10)

name                                           old speed      new speed      delta
UintMarshal-8                                  2.27MB/s ±15%  2.75MB/s ± 2%  +20.87%  (p=0.000 n=10+8)
IntMarshal-8                                   2.78MB/s ± 9%  3.60MB/s ± 2%  +29.69%  (p=0.000 n=10+10)

Fixes #8575
@odeke-em odeke-em self-assigned this Feb 14, 2021
odeke-em added a commit that referenced this issue Feb 15, 2021
Instead of using len((*math/big.Int).Bytes()) == 0, which expensively
creates a byte slice and marshals a value, on the majority hot path,
instead use the cheaper method .BitLen() to check if 0.

Benchmarking results, just from types:

name                                           old time/op    new time/op    delta
CoinsAdditionIntersect/sizes:_A_1,_B_1-8          132ns ± 2%     126ns ±13%   -4.55%  (p=0.050 n=10+10)
CoinsAdditionIntersect/sizes:_A_5,_B_5-8         1.41µs ± 3%    1.41µs ± 2%     ~     (p=1.000 n=10+10)
CoinsAdditionIntersect/sizes:_A_5,_B_20-8        2.30µs ± 1%    2.27µs ± 3%     ~     (p=0.066 n=10+10)
CoinsAdditionIntersect/sizes:_A_1,_B_1000-8      30.9µs ± 3%    30.7µs ± 1%     ~     (p=0.218 n=10+10)
CoinsAdditionIntersect/sizes:_A_2,_B_1000-8      31.4µs ± 3%    30.8µs ± 2%   -1.94%  (p=0.015 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1-8        116ns ± 1%     114ns ± 4%     ~     (p=0.142 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_5,_B_5-8       1.11µs ± 1%    1.08µs ± 3%   -2.36%  (p=0.003 n=8+10)
CoinsAdditionNoIntersect/sizes:_A_5,_B_20-8      1.85µs ± 2%    1.82µs ± 1%   -1.38%  (p=0.001 n=10+9)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1000-8    30.7µs ± 1%    30.6µs ± 3%     ~     (p=0.393 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_2,_B_1000-8    31.1µs ± 1%    30.7µs ± 2%   -1.32%  (p=0.015 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_1000,_B_2-8    31.0µs ± 2%    30.7µs ± 2%     ~     (p=0.190 n=10+10)
Bech32ifyPubKey-8                                28.8µs ± 5%    28.8µs ± 3%     ~     (p=0.965 n=10+8)
GetPubKeyFromBech32-8                            38.8µs ± 3%    39.4µs ± 2%   +1.70%  (p=0.013 n=9+10)
ParseCoin-8                                      16.7µs ± 6%    15.8µs ± 4%   -5.21%  (p=0.001 n=10+10)
MarshalTo-8                                       521ns ± 5%     508ns ± 3%   -2.56%  (p=0.029 n=10+10)
UintMarshal-8                                    3.10µs ±17%    2.56µs ± 3%  -17.45%  (p=0.000 n=10+9)
IntMarshal-8                                     2.52µs ±10%    1.94µs ± 2%  -23.10%  (p=0.000 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Bech32ifyPubKey-8                                4.02kB ± 0%    4.02kB ± 0%     ~     (all equal)
GetPubKeyFromBech32-8                            2.48kB ± 0%    2.48kB ± 0%     ~     (all equal)
ParseCoin-8                                      2.21kB ± 0%    2.21kB ± 0%     ~     (all equal)
MarshalTo-8                                       80.0B ± 0%     80.0B ± 0%     ~     (all equal)
UintMarshal-8                                      440B ± 0%      392B ± 0%  -10.91%  (p=0.000 n=10+10)
IntMarshal-8                                       216B ± 0%      168B ± 0%  -22.22%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
Bech32ifyPubKey-8                                  25.0 ± 0%      25.0 ± 0%     ~     (all equal)
GetPubKeyFromBech32-8                              85.0 ± 0%      85.0 ± 0%     ~     (all equal)
ParseCoin-8                                        71.0 ± 0%      71.0 ± 0%     ~     (all equal)
MarshalTo-8                                        2.00 ± 0%      2.00 ± 0%     ~     (all equal)
UintMarshal-8                                      31.0 ± 0%      25.0 ± 0%  -19.35%  (p=0.000 n=10+10)
IntMarshal-8                                       24.0 ± 0%      18.0 ± 0%  -25.00%  (p=0.000 n=10+10)

name                                           old speed      new speed      delta
UintMarshal-8                                  2.27MB/s ±15%  2.75MB/s ± 2%  +20.87%  (p=0.000 n=10+8)
IntMarshal-8                                   2.78MB/s ± 9%  3.60MB/s ± 2%  +29.69%  (p=0.000 n=10+10)

Fixes #8575
@mergify mergify bot closed this as completed in #8580 Feb 15, 2021
mergify bot added a commit that referenced this issue Feb 15, 2021
Instead of using len((*math/big.Int).Bytes()) == 0, which expensively
creates a byte slice and marshals a value, on the majority hot path,
instead use the cheaper method .BitLen() to check if 0.

Benchmarking results, just from types:

name                                           old time/op    new time/op    delta
CoinsAdditionIntersect/sizes:_A_1,_B_1-8          132ns ± 2%     126ns ±13%   -4.55%  (p=0.050 n=10+10)
CoinsAdditionIntersect/sizes:_A_5,_B_5-8         1.41µs ± 3%    1.41µs ± 2%     ~     (p=1.000 n=10+10)
CoinsAdditionIntersect/sizes:_A_5,_B_20-8        2.30µs ± 1%    2.27µs ± 3%     ~     (p=0.066 n=10+10)
CoinsAdditionIntersect/sizes:_A_1,_B_1000-8      30.9µs ± 3%    30.7µs ± 1%     ~     (p=0.218 n=10+10)
CoinsAdditionIntersect/sizes:_A_2,_B_1000-8      31.4µs ± 3%    30.8µs ± 2%   -1.94%  (p=0.015 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1-8        116ns ± 1%     114ns ± 4%     ~     (p=0.142 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_5,_B_5-8       1.11µs ± 1%    1.08µs ± 3%   -2.36%  (p=0.003 n=8+10)
CoinsAdditionNoIntersect/sizes:_A_5,_B_20-8      1.85µs ± 2%    1.82µs ± 1%   -1.38%  (p=0.001 n=10+9)
CoinsAdditionNoIntersect/sizes:_A_1,_B_1000-8    30.7µs ± 1%    30.6µs ± 3%     ~     (p=0.393 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_2,_B_1000-8    31.1µs ± 1%    30.7µs ± 2%   -1.32%  (p=0.015 n=10+10)
CoinsAdditionNoIntersect/sizes:_A_1000,_B_2-8    31.0µs ± 2%    30.7µs ± 2%     ~     (p=0.190 n=10+10)
Bech32ifyPubKey-8                                28.8µs ± 5%    28.8µs ± 3%     ~     (p=0.965 n=10+8)
GetPubKeyFromBech32-8                            38.8µs ± 3%    39.4µs ± 2%   +1.70%  (p=0.013 n=9+10)
ParseCoin-8                                      16.7µs ± 6%    15.8µs ± 4%   -5.21%  (p=0.001 n=10+10)
MarshalTo-8                                       521ns ± 5%     508ns ± 3%   -2.56%  (p=0.029 n=10+10)
UintMarshal-8                                    3.10µs ±17%    2.56µs ± 3%  -17.45%  (p=0.000 n=10+9)
IntMarshal-8                                     2.52µs ±10%    1.94µs ± 2%  -23.10%  (p=0.000 n=10+10)

name                                           old alloc/op   new alloc/op   delta
Bech32ifyPubKey-8                                4.02kB ± 0%    4.02kB ± 0%     ~     (all equal)
GetPubKeyFromBech32-8                            2.48kB ± 0%    2.48kB ± 0%     ~     (all equal)
ParseCoin-8                                      2.21kB ± 0%    2.21kB ± 0%     ~     (all equal)
MarshalTo-8                                       80.0B ± 0%     80.0B ± 0%     ~     (all equal)
UintMarshal-8                                      440B ± 0%      392B ± 0%  -10.91%  (p=0.000 n=10+10)
IntMarshal-8                                       216B ± 0%      168B ± 0%  -22.22%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
Bech32ifyPubKey-8                                  25.0 ± 0%      25.0 ± 0%     ~     (all equal)
GetPubKeyFromBech32-8                              85.0 ± 0%      85.0 ± 0%     ~     (all equal)
ParseCoin-8                                        71.0 ± 0%      71.0 ± 0%     ~     (all equal)
MarshalTo-8                                        2.00 ± 0%      2.00 ± 0%     ~     (all equal)
UintMarshal-8                                      31.0 ± 0%      25.0 ± 0%  -19.35%  (p=0.000 n=10+10)
IntMarshal-8                                       24.0 ± 0%      18.0 ± 0%  -25.00%  (p=0.000 n=10+10)

name                                           old speed      new speed      delta
UintMarshal-8                                  2.27MB/s ±15%  2.75MB/s ± 2%  +20.87%  (p=0.000 n=10+8)
IntMarshal-8                                   2.78MB/s ± 9%  3.60MB/s ± 2%  +29.69%  (p=0.000 n=10+10)

Fixes #8575

Co-authored-by: Alessio Treglia <alessio@tendermint.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T: Performance Performance improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants