basenc: perform faster, streaming encoding #6719

Merged (17 commits) on Oct 5, 2024

andrewliebenow
Contributor

Improve the performance, both in memory and time, of the encoding performed by the basenc (except in --z85 mode), base32, and base64 programs.

These programs now perform encoding in a buffered/streaming manner, so encoding is not constrained by the amount of available memory.
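
As a rough illustration of what "buffered/streaming" encoding means here, the following is a minimal sketch and not the code from this PR. The names `encode_group` and `encode_stream` and the 3 KiB chunk size are assumptions made for the example. The point it shows is that reading in chunks whose size is a multiple of 3 lets every chunk be encoded independently, so memory use stays constant regardless of input size.

```rust
use std::io::{self, Read, Write};

const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode up to three input bytes into four Base64 characters, padding with '='.
fn encode_group(group: &[u8], out: &mut Vec<u8>) {
    let b0 = group[0];
    let b1 = *group.get(1).unwrap_or(&0);
    let b2 = *group.get(2).unwrap_or(&0);
    out.push(ALPHABET[(b0 >> 2) as usize]);
    out.push(ALPHABET[(((b0 & 0x03) << 4) | (b1 >> 4)) as usize]);
    out.push(if group.len() > 1 {
        ALPHABET[(((b1 & 0x0f) << 2) | (b2 >> 6)) as usize]
    } else {
        b'='
    });
    out.push(if group.len() > 2 {
        ALPHABET[(b2 & 0x3f) as usize]
    } else {
        b'='
    });
}

/// Stream `input` to `output` as unwrapped Base64 using a fixed-size buffer.
pub fn encode_stream<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    // Hypothetical chunk size: a multiple of 3, so only the final chunk can need padding.
    const CHUNK: usize = 3 * 1024;
    let mut buf = [0u8; CHUNK];
    let mut encoded = Vec::with_capacity(CHUNK / 3 * 4);
    loop {
        // Fill the buffer completely unless the input ends first.
        let mut filled = 0;
        while filled < CHUNK {
            let n = input.read(&mut buf[filled..])?;
            if n == 0 {
                break;
            }
            filled += n;
        }
        if filled == 0 {
            break;
        }
        encoded.clear();
        for group in buf[..filled].chunks(3) {
            encode_group(group, &mut encoded);
        }
        output.write_all(&encoded)?;
        if filled < CHUNK {
            // A short chunk means the input is exhausted.
            break;
        }
    }
    output.flush()
}
```

Calling `encode_stream(std::io::stdin().lock(), std::io::stdout().lock())` would encode standard input with a working set of a few kilobytes. The sketch does not retry reads that fail with `ErrorKind::Interrupted`, which a production implementation would likely want to handle.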

@andrewliebenow
Contributor Author

Setup

❯ dd if=/dev/urandom of=/dev/shm/one-random-gibibyte bs=1024 count=1048576

❯ du -k /dev/shm/one-random-gibibyte
1048576 /dev/shm/one-random-gibibyte

❯ cargo build --bin coreutils --features base64 --no-default-features --profile release

No wrapping

New implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.70
Maximum resident set size (kbytes): 2716

❯ /usr/bin/time --verbose -- ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.56
        System time (seconds): 0.13
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.70
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2716
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 119
        Voluntary context switches: 1
        Involuntary context switches: 1
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Existing implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.99
Maximum resident set size (kbytes): 2452180

❯ /usr/bin/time --verbose -- coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.61
        System time (seconds): 1.36
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.99
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2452180
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 611915
        Voluntary context switches: 1
        Involuntary context switches: 157
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

GNU Core Utilities's implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.72
Maximum resident set size (kbytes): 1996

❯ /usr/bin/time --verbose -- /usr/bin/base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "/usr/bin/base64 --wrap 0 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.61
        System time (seconds): 0.10
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.72
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1996
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 103
        Voluntary context switches: 2
        Involuntary context switches: 14
        Swaps: 0
        File system inputs: 16
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Default wrapping (76 characters)

New implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.40
Maximum resident set size (kbytes): 2592

❯ /usr/bin/time --verbose -- ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 3.22
        System time (seconds): 0.16
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.40
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2592
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 122
        Voluntary context switches: 1
        Involuntary context switches: 29
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Existing implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.04
Maximum resident set size (kbytes): 2452504

❯ /usr/bin/time --verbose -- coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "coreutils base64 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 3.21
        System time (seconds): 4.78
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.04
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2452504
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 611918
        Voluntary context switches: 1
        Involuntary context switches: 491
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

GNU Core Utilities's implementation

Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.99
Maximum resident set size (kbytes): 1912

❯ /usr/bin/time --verbose -- /usr/bin/base64 -- /dev/shm/one-random-gibibyte 1>/dev/null
        Command being timed: "/usr/bin/base64 -- /dev/shm/one-random-gibibyte"
        User time (seconds): 0.79
        System time (seconds): 0.19
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.99
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1912
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 102
        Voluntary context switches: 1
        Involuntary context switches: 52
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@sylvestre
Contributor

Please use hyperfine for benchmarking. time isn't reliable enough for performance measurements.

@andrewliebenow
Contributor Author

(Was having trouble getting poop to pipe output to /dev/null without using a shell script)

benchmark.sh:

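#!/bin/sh

# -C: refuse to overwrite existing files via redirection
# -e: exit on the first failing command
# -f: disable pathname expansion (globbing)
# -u: treat unset variables as errors
# -x: trace each command before running it
set -Cefux

# BASE_SIX_FOUR_BINARY is intentionally unquoted so that a value such as
# './target/release/coreutils base64' splits into the program and its subcommand.
${BASE_SIX_FOUR_BINARY:?} \
	--wrap \
	"${BASE_SIX_FOUR_WRAP_ARGUMENT:?}" \
	-- \
	/dev/shm/one-random-gibibyte \
	1>/dev/null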

No wrapping

poop

New implementation

❯ BASE_SIX_FOUR_BINARY='./target/release/coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' poop --duration 10000 './benchmark.sh'
Benchmark 1 (14 runs): ./benchmark.sh
  measurement          mean ± σ            min … max           outliers
  wall_time           724ms ± 8.12ms     711ms …  736ms          0 ( 0%)        
  peak_rss           3.48MB ± 99.1KB    3.33MB … 3.65MB          0 ( 0%)        
  cpu_cycles         2.19G  ± 12.2M     2.18G  … 2.22G           0 ( 0%)        
  instructions       9.55G  ±  167      9.55G  … 9.55G           0 ( 0%)        
  cache_references   35.7M  ± 1.44M     33.8M  … 38.1M           0 ( 0%)        
  cache_misses        328K  ± 24.6K      306K  …  385K           0 ( 0%)        
  branch_misses      2.33M  ± 3.70K     2.32M  … 2.33M           0 ( 0%)        

Existing implementation

❯ BASE_SIX_FOUR_BINARY='coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' poop --duration 10000 './benchmark.sh' 
Benchmark 1 (5 runs): ./benchmark.sh
  measurement          mean ± σ            min … max           outliers
  wall_time          2.01s  ± 66.5ms    1.95s  … 2.13s           0 ( 0%)        
  peak_rss           2.51GB ±  108KB    2.51GB … 2.51GB          0 ( 0%)        
  cpu_cycles         2.38G  ± 27.6M     2.36G  … 2.42G           0 ( 0%)        
  instructions       9.40G  ±  370      9.40G  … 9.40G           0 ( 0%)        
  cache_references    116M  ±  320K      115M  …  116M           0 ( 0%)        
  cache_misses       1.44M  ± 46.4K     1.40M  … 1.51M           0 ( 0%)        
  branch_misses       372K  ±  294       372K  …  373K           0 ( 0%)        

hyperfine

New implementation

❯ BASE_SIX_FOUR_BINARY='./target/release/coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' hyperfine -- './benchmark.sh'         
Benchmark 1: ./benchmark.sh
  Time (mean ± σ):     714.1 ms ±  11.2 ms    [User: 535.9 ms, System: 172.8 ms]
  Range (min … max):   699.6 ms … 731.7 ms    10 runs

Existing implementation

❯ BASE_SIX_FOUR_BINARY='coreutils base64' BASE_SIX_FOUR_WRAP_ARGUMENT='0' hyperfine -- './benchmark.sh'        
Benchmark 1: ./benchmark.sh
  Time (mean ± σ):      1.975 s ±  0.042 s    [User: 0.565 s, System: 1.396 s]
  Range (min … max):    1.929 s …  2.084 s    10 runs

@sylvestre
Contributor

You should call hyperfine once with both the old and the new implementation;
it will then compare the results itself.

@andrewliebenow
Contributor Author

No wrapping

❯ hyperfine ' ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null ' ' coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null '
Benchmark 1:  ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):     733.6 ms ±  21.8 ms    [User: 545.9 ms, System: 182.4 ms]
  Range (min … max):   710.7 ms … 772.3 ms    10 runs
 
Benchmark 2:  coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      2.051 s ±  0.022 s    [User: 0.565 s, System: 1.469 s]
  Range (min … max):    2.013 s …  2.082 s    10 runs
 
Summary
   ./target/release/coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null  ran
    2.80 ± 0.09 times faster than  coreutils base64 --wrap 0 -- /dev/shm/one-random-gibibyte 1>/dev/null 

Default wrapping (76 characters)

❯ hyperfine ' ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null ' ' coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null '           
Benchmark 1:  ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      3.288 s ±  0.157 s    [User: 3.093 s, System: 0.177 s]
  Range (min … max):    3.023 s …  3.461 s    10 runs
 
Benchmark 2:  coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      8.115 s ±  0.067 s    [User: 3.234 s, System: 4.831 s]
  Range (min … max):    8.006 s …  8.220 s    10 runs
 
Summary
   ./target/release/coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null  ran
    2.47 ± 0.12 times faster than  coreutils base64 -- /dev/shm/one-random-gibibyte 1>/dev/null 


GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

Same as previous changes, just applied to decoding
@andrewliebenow
Contributor Author

Decoding performance data

Setup

❯ cargo \
        build \
        --bin \
        coreutils \
        --features \
        base32 \
        --features \
        base64 \
        --features \
        basenc \
        --no-default-features \
        --profile release

❯ dd if=/dev/urandom of=/dev/stdout bs=1024 count=1048576 | /usr/bin/base64 --wrap 0 -- - 1>/dev/shm/one-random-gibibyte-base-six-four-encoded

❯ du -k -- /dev/shm/one-random-gibibyte-base-six-four-encoded
1398104 /dev/shm/one-random-gibibyte-base-six-four-encoded
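
The size reported by du can be checked by hand. The sketch below is only illustrative arithmetic; the function names are made up for the example, and the 4096-byte allocation unit is an assumption based on the page size shown in the /usr/bin/time output earlier in this thread. Padded Base64 emits 4 characters per 3 input bytes, rounded up to a whole group, and du counts whole allocation units.

```rust
/// Length in bytes of padded, unwrapped Base64 output for `input_len` input bytes.
pub fn base64_encoded_len(input_len: u64) -> u64 {
    // Every started group of 3 input bytes produces 4 output characters.
    (input_len + 2) / 3 * 4
}

/// Size in KiB after rounding `len` bytes up to whole blocks of `block_size` bytes.
/// The block size is an assumption about the filesystem (4096 on the machine above).
pub fn allocated_kib(len: u64, block_size: u64) -> u64 {
    (len + block_size - 1) / block_size * block_size / 1024
}
```

For the one-gibibyte input, `base64_encoded_len(1 << 30)` is 1431655768 bytes, and `allocated_kib(1431655768, 4096)` is 1398104, matching the du output.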

hyperfine

❯ hyperfine \
        -- \
        ' ./target/release/coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null ' \
        ' coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null ' \
        ' /usr/bin/base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null '

Benchmark 1:  ./target/release/coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null 
  Time (mean ± σ):     906.3 ms ±  10.7 ms    [User: 631.3 ms, System: 269.8 ms]
  Range (min … max):   893.4 ms … 920.2 ms    10 runs
 
Benchmark 2:  coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null 
  Time (mean ± σ):      2.815 s ±  0.033 s    [User: 1.274 s, System: 1.524 s]
  Range (min … max):    2.775 s …  2.871 s    10 runs
 
Benchmark 3:  /usr/bin/base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null 
  Time (mean ± σ):      1.652 s ±  0.036 s    [User: 1.338 s, System: 0.305 s]
  Range (min … max):    1.611 s …  1.733 s    10 runs
 
Summary
   ./target/release/coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null  ran
    1.82 ± 0.04 times faster than  /usr/bin/base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null 
    3.11 ± 0.05 times faster than  coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded 1>/dev/null 

poop

❯ poop \
        ' ./target/release/coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded ' \
        ' coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded ' \
        ' /usr/bin/base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded '

Benchmark 1 (6 runs): ./target/release/coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           905ms ± 13.7ms     893ms …  920ms          0 ( 0%)        0%
  peak_rss           2.73MB ± 45.7KB    2.68MB … 2.79MB          0 ( 0%)        0%
  cpu_cycles         2.49G  ± 3.69M     2.49G  … 2.50G           0 ( 0%)        0%
  instructions       11.1G  ±  196      11.1G  … 11.1G           0 ( 0%)        0%
  cache_references   20.9M  ±  896K     20.1M  … 22.1M           0 ( 0%)        0%
  cache_misses        519K  ± 23.9K      494K  …  556K           0 ( 0%)        0%
  branch_misses      2.88M  ± 2.87K     2.87M  … 2.88M           0 ( 0%)        0%
Benchmark 2 (3 runs): coreutils base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.81s  ± 30.0ms    2.78s  … 2.84s           0 ( 0%)        💩+211.0% ±  3.7%
  peak_rss           2.51GB ±  144KB    2.51GB … 2.51GB          0 ( 0%)        💩+92020.0% ±  5.3%
  cpu_cycles         5.29G  ± 30.4M     5.26G  … 5.32G           0 ( 0%)        💩+112.2% ±  1.1%
  instructions       22.2G  ± 37.1      22.2G  … 22.2G           0 ( 0%)        💩+ 99.5% ±  0.0%
  cache_references    118M  ±  513K      118M  …  119M           0 ( 0%)        💩+465.3% ±  6.4%
  cache_misses        946K  ± 1.87K      945K  …  948K           0 ( 0%)        💩+ 82.4% ±  6.5%
  branch_misses       272K  ±  812       271K  …  273K           0 ( 0%)        ⚡- 90.5% ±  0.1%
Benchmark 3 (4 runs): /usr/bin/base64 --decode -- /dev/shm/one-random-gibibyte-base-six-four-encoded
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.63s  ± 40.5ms    1.60s  … 1.69s           0 ( 0%)        💩+ 79.7% ±  4.5%
  peak_rss           2.00MB ± 45.5KB    1.97MB … 2.06MB          0 ( 0%)        ⚡- 26.7% ±  2.5%
  cpu_cycles         5.39G  ±  151M     5.29G  … 5.61G           0 ( 0%)        💩+116.2% ±  5.5%
  instructions       25.3G  ±  108      25.3G  … 25.3G           0 ( 0%)        💩+127.8% ±  0.0%
  cache_references   7.61M  ±  629K     7.10M  … 8.48M           0 ( 0%)        ⚡- 63.7% ±  5.7%
  cache_misses        518K  ± 2.22K      515K  …  520K           1 (25%)          -  0.1% ±  5.4%
  branch_misses      4.08M  ± 15.1K     4.06M  … 4.10M           0 ( 0%)        💩+ 41.8% ±  0.5%


GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@andrewliebenow
Contributor Author

Last one:

❯ poop \
        ' ./target/release/coreutils base64 --decode --ignore-garbage -- /dev/shm/one-random-gibibyte-base-six-four-encoded ' \
        ' coreutils base64 --decode --ignore-garbage -- /dev/shm/one-random-gibibyte-base-six-four-encoded ' \
        ' /usr/bin/base64 --decode --ignore-garbage -- /dev/shm/one-random-gibibyte-base-six-four-encoded '

Benchmark 1 (4 runs): ./target/release/coreutils base64 --decode --ignore-garbage -- /dev/shm/one-random-gibibyte-base-six-four-encoded
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.47s  ± 12.7ms    1.46s  … 1.49s           0 ( 0%)        0%
  peak_rss           2.73MB ± 41.1KB    2.67MB … 2.76MB          1 (25%)        0%
  cpu_cycles         4.88G  ± 35.6M     4.86G  … 4.93G           0 ( 0%)        0%
  instructions       21.1G  ±  272      21.1G  … 21.1G           0 ( 0%)        0%
  cache_references   25.9M  ± 1.17M     24.7M  … 27.5M           0 ( 0%)        0%
  cache_misses        484K  ± 19.0K      457K  …  499K           1 (25%)        0%
  branch_misses      3.05M  ± 1.69K     3.05M  … 3.05M           0 ( 0%)        0%
Benchmark 2 (3 runs): coreutils base64 --decode --ignore-garbage -- /dev/shm/one-random-gibibyte-base-six-four-encoded
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          31.6s  ± 60.1ms    31.5s  … 31.6s           0 ( 0%)        💩+2045.4% ±  5.2%
  peak_rss           2.51GB ± 35.5KB    2.51GB … 2.51GB          0 ( 0%)        💩+91805.4% ±  2.8%
  cpu_cycles          125G  ± 82.8M      125G  …  125G           0 ( 0%)        💩+2455.1% ±  2.4%
  instructions        174G  ±  492       174G  …  174G           0 ( 0%)        💩+723.0% ±  0.0%
  cache_references    121M  ±  433K      120M  …  121M           0 ( 0%)        💩+365.6% ±  7.2%
  cache_misses       1.32M  ± 52.3K     1.26M  … 1.36M           0 ( 0%)        💩+172.3% ± 14.7%
  branch_misses      2.35G  ± 13.8M     2.34G  … 2.36G           0 ( 0%)        💩+76749.9% ± 559.8%
Benchmark 3 (3 runs): /usr/bin/base64 --decode --ignore-garbage -- /dev/shm/one-random-gibibyte-base-six-four-encoded
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          2.30s  ± 58.5ms    2.24s  … 2.36s           0 ( 0%)        💩+ 56.6% ±  5.1%
  peak_rss           1.99MB ± 52.2KB    1.96MB … 2.05MB          0 ( 0%)        ⚡- 27.1% ±  3.3%
  cpu_cycles         8.11G  ±  235M     7.86G  … 8.32G           0 ( 0%)        💩+ 66.2% ±  6.1%
  instructions       36.8G  ±  128      36.8G  … 36.8G           0 ( 0%)        💩+ 74.1% ±  0.0%
  cache_references   10.5M  ± 1.26M     9.63M  … 12.0M           0 ( 0%)        ⚡- 59.3% ±  9.2%
  cache_misses        629K  ±  104K      532K  …  738K           0 ( 0%)        💩+ 29.8% ± 27.2%
  branch_misses      4.46M  ± 25.9K     4.45M  … 4.49M           0 ( 0%)        💩+ 46.2% ±  1.1%
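
The thread does not say why --ignore-garbage was so slow before this PR, nor how the new code filters input. Purely as an illustration of one cheap way to drop non-alphabet bytes from each chunk before decoding, here is a sketch using a 256-entry lookup table. The function names are hypothetical and this is not claimed to be the PR's approach.

```rust
/// Build a 256-entry table marking bytes that are Base64 alphabet characters or padding.
pub fn build_keep_table() -> [bool; 256] {
    let mut table = [false; 256];
    for &b in b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=" {
        table[b as usize] = true;
    }
    table
}

/// Append to `kept` only those bytes of `chunk` that the table marks as valid.
pub fn retain_alphabet(chunk: &[u8], table: &[bool; 256], kept: &mut Vec<u8>) {
    // One table lookup per byte; no per-byte search through the alphabet.
    kept.extend(chunk.iter().copied().filter(|&b| table[b as usize]));
}
```

The table is built once and reused for every chunk, so the filtering cost per input byte is a single indexed load.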

@andrewliebenow
Contributor Author

Encoding with line wrapping enabled was the last remaining scenario I'm aware of in which GNU Core Utilities was still faster, and now that's been fixed:

hyperfine

❯ hyperfine \
        ' ./target/release/coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null ' \
        ' coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null ' \
        ' /usr/bin/base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null '
Benchmark 1:  ./target/release/coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):     774.4 ms ±  18.6 ms    [User: 603.9 ms, System: 165.0 ms]
  Range (min … max):   761.5 ms … 821.8 ms    10 runs
 
Benchmark 2:  coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      8.064 s ±  0.146 s    [User: 3.241 s, System: 4.766 s]
  Range (min … max):    7.875 s …  8.266 s    10 runs
 
Benchmark 3:  /usr/bin/base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null 
  Time (mean ± σ):      1.038 s ±  0.022 s    [User: 0.840 s, System: 0.188 s]
  Range (min … max):    1.017 s …  1.091 s    10 runs
 
Summary
   ./target/release/coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null  ran
    1.34 ± 0.04 times faster than  /usr/bin/base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null 
   10.41 ± 0.31 times faster than  coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte 1>/dev/null 

poop

❯ poop \
    ' coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte ' \
    ' /usr/bin/base64 --wrap 76 -- /dev/shm/one-random-gibibyte ' \
    ' ./target/release/coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte '

Benchmark 1 (3 runs): coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          7.99s  ± 72.5ms    7.93s  … 8.07s           0 ( 0%)        0%
  peak_rss           2.51GB ±  207KB    2.51GB … 2.51GB          0 ( 0%)        0%
  cpu_cycles         10.4G  ± 72.8M     10.3G  … 10.4G           0 ( 0%)        0%
  instructions       20.8G  ±  306      20.8G  … 20.8G           0 ( 0%)        0%
  cache_references    120M  ±  496K      120M  …  121M           0 ( 0%)        0%
  cache_misses       1.96M  ± 91.7K     1.86M  … 2.02M           0 ( 0%)        0%
  branch_misses       143M  ±  117K      142M  …  143M           0 ( 0%)        0%
Benchmark 2 (5 runs): /usr/bin/base64 --wrap 76 -- /dev/shm/one-random-gibibyte
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.03s  ± 32.4ms    1.00s  … 1.08s           0 ( 0%)        ⚡- 87.1% ±  1.1%
  peak_rss           2.05MB ± 35.9KB    2.00MB … 2.08MB          0 ( 0%)        ⚡- 99.9% ±  0.0%
  cpu_cycles         3.35G  ± 98.7M     3.29G  … 3.52G           0 ( 0%)        ⚡- 67.8% ±  1.6%
  instructions       14.5G  ± 70.1      14.5G  … 14.5G           0 ( 0%)        ⚡- 30.3% ±  0.0%
  cache_references   72.7M  ± 3.23M     69.7M  … 77.8M           0 ( 0%)        ⚡- 39.5% ±  3.9%
  cache_misses        524K  ±  151K      339K  …  720K           0 ( 0%)        ⚡- 73.3% ± 12.2%
  branch_misses      3.64M  ± 7.71K     3.63M  … 3.65M           0 ( 0%)        ⚡- 97.4% ±  0.1%
Benchmark 3 (7 runs): ./target/release/coreutils base64 --wrap 76 -- /dev/shm/one-random-gibibyte
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           778ms ± 14.8ms     763ms …  807ms          0 ( 0%)        ⚡- 90.3% ±  0.8%
  peak_rss           2.84MB ± 67.6KB    2.70MB … 2.90MB          0 ( 0%)        ⚡- 99.9% ±  0.0%
  cpu_cycles         2.46G  ± 31.4M     2.44G  … 2.53G           0 ( 0%)        ⚡- 76.3% ±  0.7%
  instructions       9.60G  ±  289      9.60G  … 9.60G           0 ( 0%)        ⚡- 53.7% ±  0.0%
  cache_references   93.3M  ± 2.01M     91.3M  … 97.4M           0 ( 0%)        ⚡- 22.4% ±  2.3%
  cache_misses        397K  ± 43.8K      339K  …  476K           0 ( 0%)        ⚡- 79.8% ±  4.8%
  branch_misses      2.58M  ± 11.1K     2.56M  … 2.60M           0 ( 0%)        ⚡- 98.2% ±  0.1%
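
Wrapping interacts with streaming because a line break can fall in the middle of an encoded chunk. The sketch below shows one way to handle that, assuming a hypothetical `LineWrapper` type that is not taken from the PR. It remembers the current column between chunks and copies whole runs of characters at once instead of one byte at a time.

```rust
/// Carries the current column across chunks so that line breaks land every
/// `width` characters no matter where the chunk boundaries fall.
pub struct LineWrapper {
    width: usize,
    column: usize,
}

impl LineWrapper {
    pub fn new(width: usize) -> Self {
        assert!(width > 0, "a width of 0 means no wrapping and is not handled here");
        Self { width, column: 0 }
    }

    /// Append `encoded` to `out`, inserting '\n' after every `width` characters.
    pub fn wrap(&mut self, mut encoded: &[u8], out: &mut Vec<u8>) {
        while !encoded.is_empty() {
            let room = self.width - self.column;
            let take = room.min(encoded.len());
            // Copy a whole run at once instead of pushing byte by byte.
            out.extend_from_slice(&encoded[..take]);
            encoded = &encoded[take..];
            self.column += take;
            if self.column == self.width {
                out.push(b'\n');
                self.column = 0;
            }
        }
    }

    /// Terminate a trailing partial line once all input has been encoded.
    pub fn finish(&mut self, out: &mut Vec<u8>) {
        if self.column > 0 {
            out.push(b'\n');
            self.column = 0;
        }
    }
}
```

With a width of 4, feeding `abcde` and then `fghij` followed by `finish` yields `abcd`, `efgh`, and `ij`, each terminated by a newline, even though the chunk boundary fell inside the second line.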


GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@BenWiederhake
Collaborator

Not directly related, but I'm mentioning it for completeness: #6008

Collaborator

@BenWiederhake BenWiederhake left a comment

The code looks good, and your performance comparison looks awesome, thank you!

I have some suggestions about some names and style decisions that I feel are way too unusual and surprising, and would like to ask you to change them.

Also, Codecov complains about lines being untested, and there seem to be no unit tests for the hundreds of new lines. Please fix that.

This PR improves base* substantially, and I'm looking forward to seeing it merged!

Review comments (resolved) on:
src/uu/base32/src/base_common.rs
src/uucore/src/lib/features/encoding.rs
tests/by-util/test_basenc.rs
@andrewliebenow
Contributor Author

I have added property testing of the core decoding/encoding logic to the PR. Doing so helped catch two bugs. The first was one I introduced in cc9d92cecb7446942579196e6c8d46c2f9824149 while extracting some logic into a helper function (I accidentally transposed two arguments). The existing unit tests did not catch it, because it only manifested on large-ish inputs.

The other bug was related to my misunderstanding of input length validation for Z85.

Anyway, this definitely sold me on property testing. I have tuned the parameters so that the property tests for base64/base32/basenc complete within a minute (on my machine, at least).
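
For readers unfamiliar with the idea, a property test checks an invariant over many generated inputs rather than a few hand-picked ones. The sketch below is not the test suite added in this PR: it uses a hand-rolled xorshift generator in place of a property-testing crate, and the one-shot `encode` and `decode` functions are stand-ins written for the example. The property checked is that decoding an encoded input returns the original bytes, including for inputs of several kilobytes.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// One-shot padded Base64 encoder, standing in for the code under test.
pub fn encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity((input.len() + 2) / 3 * 4);
    for group in input.chunks(3) {
        let b = [group[0], *group.get(1).unwrap_or(&0), *group.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        for i in 0..4 {
            if i <= group.len() {
                out.push(ALPHABET[((n >> (18 - 6 * i)) & 0x3f) as usize]);
            } else {
                out.push(b'=');
            }
        }
    }
    out
}

/// One-shot decoder; returns `None` for a bad length or a character outside the alphabet.
pub fn decode(input: &[u8]) -> Option<Vec<u8>> {
    if input.len() % 4 != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(input.len() / 4 * 3);
    for quad in input.chunks(4) {
        let pad = quad.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 {
            return None;
        }
        let mut n = 0u32;
        for &c in &quad[..4 - pad] {
            let v = ALPHABET.iter().position(|&a| a == c)?;
            n = (n << 6) | v as u32;
        }
        // Shift in zero bits for the padded positions.
        n <<= 6 * pad as u32;
        let bytes = [(n >> 16) as u8, (n >> 8) as u8, n as u8];
        out.extend_from_slice(&bytes[..3 - pad]);
    }
    Some(out)
}

/// Small xorshift generator standing in for a property-testing crate's input generation.
/// The seed must be nonzero.
pub struct XorShift(pub u64);

impl XorShift {
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Check the round-trip property on `cases` random inputs of up to `max_len` bytes.
pub fn round_trip_holds(seed: u64, cases: usize, max_len: usize) -> bool {
    let mut rng = XorShift(seed);
    (0..cases).all(|_| {
        let len = (rng.next_u64() % (max_len as u64 + 1)) as usize;
        let data: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        decode(&encode(&data)).as_deref() == Some(data.as_slice())
    })
}
```

Raising `max_len` above the implementation's internal buffer size is what gives such a test a chance of catching bugs that only appear on larger inputs, at the cost of a longer run time. A real property-testing crate would additionally shrink a failing input to a minimal counterexample, which this sketch does not attempt.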


GNU testsuite comparison:

Congrats! The gnu test tests/timeout/timeout is no longer failing!

@sylvestre
Contributor

Could you please add your methodology into src/uu/base64/BENCHMARKING.md like we are doing for other programs?

thanks


GNU testsuite comparison:

Congrats! The gnu test tests/timeout/timeout is no longer failing!

@andrewliebenow
Contributor Author


GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@sylvestre
Contributor

great doc, it is a nice read :)


GNU testsuite comparison:

Congrats! The gnu test tests/timeout/timeout is no longer failing!

@andrewliebenow andrewliebenow force-pushed the basenc-faster-encoding branch 2 times, most recently from 26782d5 to 9fa405f Compare October 4, 2024 00:27

github-actions bot commented Oct 4, 2024

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?


github-actions bot commented Oct 5, 2024

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?

@sylvestre sylvestre merged commit d41d51a into uutils:main Oct 5, 2024
67 of 68 checks passed