Fix foreach(f, other arguments..., product(A, B)) #113

Merged (3 commits) on Jun 28, 2020
Conversation

tkf (Owner) commented Jun 28, 2020

tkf changed the title from "Fix" to "Fix foreach(f, other arguments..., product(A, B))" on Jun 28, 2020
codecov bot commented Jun 28, 2020

Codecov Report

Merging #113 into master will decrease coverage by 1.46%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #113      +/-   ##
==========================================
- Coverage   80.34%   78.88%   -1.47%     
==========================================
  Files           8        8              
  Lines         407      412       +5     
==========================================
- Hits          327      325       -2     
- Misses         80       87       +7     
| Impacted Files | Coverage Δ |
|---|---|
| src/foreach.jl | 81.81% <0.00%> (-3.04%) ⬇️ |
| src/utils.jl | 73.21% <0.00%> (-1.34%) ⬇️ |
| src/countingsort.jl | 11.47% <0.00%> (-1.03%) ⬇️ |
| src/map.jl | 85.71% <0.00%> (-0.96%) ⬇️ |
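
The headline percentages in the coverage diff can be recomputed from the hit and line counts it lists. The sketch below is plain arithmetic on those figures, not Codecov's implementation; the function name `coverage` is made up for illustration.

```python
def coverage(hits, lines):
    """Coverage in percent: covered lines divided by total tracked lines."""
    return 100 * hits / lines

# Counts copied from the coverage diff (master vs. #113).
master = coverage(hits=327, lines=407)
pr = coverage(hits=325, lines=412)
delta = pr - master  # negative means coverage went down

print(f"master {master:.2f}%  #113 {pr:.2f}%  change {delta:+.2f}")
```

Computed this way the change rounds to -1.46 points, matching the summary sentence; the -1.47 in the diff table presumably comes from a different rounding step inside Codecov.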

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

github-actions (Contributor)

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmarks:
    • Target: 28 Jun 2020 - 08:31
    • Baseline: 28 Jun 2020 - 08:36
  • Package commits:
    • Target: 1545a9
    • Baseline: 7362ea
  • Julia commits:
    • Target: 44fa15
    • Baseline: 44fa15
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2
    • Baseline: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).
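
As a rough illustration of the rule above, the sketch below recomputes a few time ratios from the target and baseline timings listed later in this report. It assumes a result is flagged when the ratio leaves the band 1 ± tolerance (5% for time); the function name `judge` and the verdict strings are chosen for this example and are not taken from the benchmarking package.

```python
TIME_TOLERANCE = 0.05  # the "(5%)" shown next to each time ratio

def judge(target, baseline, tolerance):
    """Return (ratio, verdict) for one benchmark; ratio = target / baseline."""
    ratio = target / baseline
    if ratio > 1 + tolerance:
        verdict = "regression"
    elif ratio < 1 - tolerance:
        verdict = "improvement"
    else:
        verdict = "invariant"
    return ratio, verdict

# (target, baseline) timings copied from the two result tables.
# Units only need to agree within a pair: the first three are in μs, the last in ns.
samples = {
    ("findfirst", "10%", "base"): (59.000, 101.700),
    ("findfirst", "30%", "base"): (265.702, 262.902),
    ("findfirst", "50%", "base"): (508.604, 478.704),
    ("foreach_seq_double", "linear", "tx", ":simd => :ivdep"): (43.769, 0.001),
}

results = {key: judge(t, b, TIME_TOLERANCE) for key, (t, b) in samples.items()}
for key, (ratio, verdict) in results.items():
    print(key, f"{ratio:.2f}", verdict)
```

The last entry shows where the five-digit ratios come from: the baseline run reported 0.001 ns for that benchmark, so dividing a timing of about 44 ns by it gives a ratio near 43769.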

| ID | time ratio | memory ratio |
|---|---|---|
| ["findfirst", "0%", "tx-noterm"] | 0.88 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "0%", "tx-seq"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "10%", "base"] | 0.58 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "10%", "tx"] | 0.74 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "10%", "tx-noterm"] | 0.68 (5%) ✅ | 0.93 (1%) ✅ |
| ["findfirst", "10%", "tx-seq"] | 0.86 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "20%", "base"] | 0.58 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "20%", "tx"] | 0.70 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "20%", "tx-noterm"] | 0.66 (5%) ✅ | 0.89 (1%) ✅ |
| ["findfirst", "20%", "tx-seq"] | 0.86 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "30%", "tx"] | 0.78 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "30%", "tx-noterm"] | 0.70 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "40%", "base"] | 0.91 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "40%", "tx"] | 0.73 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "40%", "tx-noterm"] | 0.64 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "40%", "tx-seq"] | 0.86 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "50%", "base"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["findfirst", "50%", "tx"] | 0.75 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "50%", "tx-noterm"] | 0.64 (5%) ✅ | 1.00 (1%) |
| ["foreach", "base", "A .= B .+ C"] | 1.08 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq", "base", "Transpose"] | 0.86 (5%) ✅ | 1.00 (1%) |
| ["foreach_seq", "tx", "Matrix"] | 0.87 (5%) ✅ | 1.00 (1%) |
| ["foreach_seq", "tx", "Vector"] | 0.87 (5%) ✅ | 1.00 (1%) |
| ["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] | 43769.00 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_double", "linear", "tx", ":simd => false"] | 51061.68 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_double", "linear", "tx", ":simd => true"] | 49292.93 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "man"] | 1.50 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] | 1.31 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"] | 1.16 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] | 1.16 (5%) ❌ | 1.00 (1%) |
| ["sort", "F64 (wide)", "Base"] | 1.14 (5%) ❌ | 1.00 (1%) |
| ["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"] | 1.09 (5%) ❌ | 1.00 (1%) |
| ["sort", "I64 (wide)", "ThreadsX.MergeSort"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["sort", "I64 (wide)", "ThreadsX.QuickSort"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["sort", "reversed", "Base"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["sort", "sorted", "ThreadsX.MergeSort"] | 0.93 (5%) ✅ | 1.00 (1%) |
| ["sort", "sorted", "ThreadsX.QuickSort"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["sort", "sorted", "ThreadsX.StableQuickSort"] | 0.93 (5%) ✅ | 1.00 (1%) |
| ["unique", "rand(1:10, 1000000)", "base"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["unique", "rand(1:10, 1000000)", "tx"] | 0.91 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["findfirst", "0%"]
  • ["findfirst", "10%"]
  • ["findfirst", "20%"]
  • ["findfirst", "30%"]
  • ["findfirst", "40%"]
  • ["findfirst", "50%"]
  • ["foreach", "base"]
  • ["foreach", "broadcast"]
  • ["foreach", "tx"]
  • ["foreach_seq", "base"]
  • ["foreach_seq", "tx"]
  • ["foreach_seq_double", "cartesian"]
  • ["foreach_seq_double", "cartesian", "tx"]
  • ["foreach_seq_double", "linear"]
  • ["foreach_seq_double", "linear", "tx"]
  • ["foreach_seq_sum_many", ":nvecs => 8"]
  • ["foreach_seq_sum_many", ":nvecs => 8", "tx"]
  • ["sort", "F64 (narrow)"]
  • ["sort", "F64 (wide)"]
  • ["sort", "I64 (narrow)"]
  • ["sort", "I64 (wide)"]
  • ["sort", "reversed"]
  • ["sort", "sorted"]
  • ["unique", "rand(1:10, 1000000)"]
  • ["unique", "rand(1:1000, 1000000)"]

Julia versioninfo

Target

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      46684 s          0 s       2635 s      34840 s          0 s
       #2  2095 MHz      58232 s          0 s       3022 s      22685 s          0 s
       
  Memory: 6.764884948730469 GB (2069.7109375 MB free)
  Uptime: 861.0 sec
  Load Avg:  1.2509765625  1.3212890625  0.89208984375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

Baseline

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      66797 s          0 s       3172 s      49089 s          0 s
       #2  2095 MHz      84898 s          0 s       3826 s      30137 s          0 s
       
  Memory: 6.764884948730469 GB (2471.28125 MB free)
  Uptime: 1211.0 sec
  Load Avg:  1.2265625  1.33251953125  1.04248046875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

Target result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmark: 28 Jun 2020 - 8:31
  • Package commit: 1545a9
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
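
To make the ID structure concrete, here is a small sketch that treats a benchmark suite as nested dictionaries and walks an ID from the outermost group down to the leaf. The `suite` dictionary and the `lookup` helper are hypothetical stand-ins written for this example; only the IDs and numbers are taken from the table.

```python
from functools import reduce

# Hypothetical nested stand-in for a benchmark suite. Times are in ns;
# the empty allocation cell for "base" is recorded as zero.
suite = {
    "findfirst": {
        "0%": {
            "base": {"time_ns": 2.6, "allocations": 0},
            "tx": {"time_ns": 21900.0, "allocations": 219},
        },
    },
}

def lookup(suite, benchmark_id):
    """Follow [parent_group, child_group, ..., key] one level at a time."""
    return reduce(lambda group, name: group[name], benchmark_id, suite)

entry = lookup(suite, ["findfirst", "0%", "tx"])  # a single benchmark
group = lookup(suite, ["findfirst", "0%"])        # a whole child group

print(entry["allocations"], sorted(group))
```

A shorter ID returns the whole group, which is how the entries in the group list relate to the full IDs in the results table.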

ID time GC time memory allocations
["findfirst", "0%", "base"] 2.600 ns (5%)
["findfirst", "0%", "tx"] 21.900 μs (5%) 11.97 KiB (1%) 219
["findfirst", "0%", "tx-noterm"] 18.700 μs (5%) 11.97 KiB (1%) 218
["findfirst", "0%", "tx-seq"] 208.870 ns (5%) 544 bytes (1%) 14
["findfirst", "10%", "base"] 59.000 μs (5%)
["findfirst", "10%", "tx"] 66.800 μs (5%) 14.36 KiB (1%) 266
["findfirst", "10%", "tx-noterm"] 181.701 μs (5%) 30.58 KiB (1%) 561
["findfirst", "10%", "tx-seq"] 87.900 μs (5%) 560 bytes (1%) 15
["findfirst", "20%", "base"] 118.701 μs (5%)
["findfirst", "20%", "tx"] 118.801 μs (5%) 21.33 KiB (1%) 393
["findfirst", "20%", "tx-noterm"] 182.701 μs (5%) 37.61 KiB (1%) 694
["findfirst", "20%", "tx-seq"] 175.801 μs (5%) 560 bytes (1%) 15
["findfirst", "30%", "base"] 265.702 μs (5%)
["findfirst", "30%", "tx"] 173.001 μs (5%) 28.27 KiB (1%) 520
["findfirst", "30%", "tx-noterm"] 203.802 μs (5%) 28.28 KiB (1%) 520
["findfirst", "30%", "tx-seq"] 263.402 μs (5%) 560 bytes (1%) 15
["findfirst", "40%", "base"] 359.403 μs (5%)
["findfirst", "40%", "tx"] 229.702 μs (5%) 35.31 KiB (1%) 651
["findfirst", "40%", "tx-noterm"] 228.702 μs (5%) 35.33 KiB (1%) 651
["findfirst", "40%", "tx-seq"] 351.002 μs (5%) 560 bytes (1%) 15
["findfirst", "50%", "base"] 508.604 μs (5%)
["findfirst", "50%", "tx"] 263.203 μs (5%) 37.70 KiB (1%) 698
["findfirst", "50%", "tx-noterm"] 291.702 μs (5%) 51.56 KiB (1%) 950
["findfirst", "50%", "tx-seq"] 438.603 μs (5%) 560 bytes (1%) 15
["foreach", "base", "A .= B .+ B'"] 275.542 ms (5%) 23.109 ms 305.18 MiB (1%) 16000002
["foreach", "base", "A .= B .+ C"] 220.599 ms (5%) 23.058 ms 305.18 MiB (1%) 16000001
["foreach", "broadcast", "A .= B .+ B'"] 7.328 ms (5%)
["foreach", "broadcast", "A .= B .+ C"] 6.316 ms (5%)
["foreach", "tx", "A .= B .+ B'"] 3.923 ms (5%) 25.94 KiB (1%) 360
["foreach", "tx", "A .= B .+ C"] 3.254 ms (5%) 12.73 KiB (1%) 123
["foreach_seq", "base", "Matrix"] 720.506 μs (5%)
["foreach_seq", "base", "Transpose"] 1.761 ms (5%)
["foreach_seq", "base", "Vector"] 724.706 μs (5%)
["foreach_seq", "tx", "Matrix"] 628.105 μs (5%)
["foreach_seq", "tx", "Transpose"] 1.128 ms (5%) 16 bytes (1%) 1
["foreach_seq", "tx", "Vector"] 624.404 μs (5%)
["foreach_seq_double", "cartesian", "man"] 19.700 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"] 20.500 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => false"] 23.800 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => true"] 19.600 μs (5%)
["foreach_seq_double", "linear", "man"] 49.393 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] 43.769 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => false"] 51.062 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => true"] 49.293 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "man"] 1.051 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] 1.047 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"] 2.678 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] 2.678 μs (5%)
["sort", "F64 (narrow)", "Base"] 2.137 ms (5%)
["sort", "F64 (narrow)", "ThreadsX.MergeSort"] 2.668 ms (5%) 1.19 MiB (1%) 533
["sort", "F64 (narrow)", "ThreadsX.QuickSort"] 566.004 μs (5%) 965.13 KiB (1%) 1227
["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"] 588.705 μs (5%) 1.02 MiB (1%) 1247
["sort", "F64 (wide)", "Base"] 6.167 ms (5%)
["sort", "F64 (wide)", "ThreadsX.MergeSort"] 5.268 ms (5%) 1.19 MiB (1%) 565
["sort", "F64 (wide)", "ThreadsX.QuickSort"] 3.418 ms (5%) 1.01 MiB (1%) 2149
["sort", "F64 (wide)", "ThreadsX.StableQuickSort"] 4.021 ms (5%) 1.39 MiB (1%) 2196
["sort", "I64 (narrow)", "Base"] 115.301 μs (5%) 160 bytes (1%) 1
["sort", "I64 (narrow)", "ThreadsX.MergeSort"] 102.301 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.QuickSort"] 102.101 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"] 114.301 μs (5%) 864 bytes (1%) 17
["sort", "I64 (wide)", "Base"] 6.271 ms (5%)
["sort", "I64 (wide)", "ThreadsX.MergeSort"] 4.520 ms (5%) 1.19 MiB (1%) 554
["sort", "I64 (wide)", "ThreadsX.QuickSort"] 3.299 ms (5%) 1.01 MiB (1%) 2236
["sort", "I64 (wide)", "ThreadsX.StableQuickSort"] 4.012 ms (5%) 1.40 MiB (1%) 2272
["sort", "reversed", "Base"] 600.605 μs (5%)
["sort", "reversed", "ThreadsX.MergeSort"] 1.112 ms (5%) 1.18 MiB (1%) 435
["sort", "reversed", "ThreadsX.QuickSort"] 898.107 μs (5%) 998.77 KiB (1%) 1872
["sort", "reversed", "ThreadsX.StableQuickSort"] 1.259 ms (5%) 1.36 MiB (1%) 1902
["sort", "sorted", "Base"] 568.405 μs (5%)
["sort", "sorted", "ThreadsX.MergeSort"] 821.306 μs (5%) 1.18 MiB (1%) 431
["sort", "sorted", "ThreadsX.QuickSort"] 876.507 μs (5%) 998.77 KiB (1%) 1872
["sort", "sorted", "ThreadsX.StableQuickSort"] 994.507 μs (5%) 1.36 MiB (1%) 1902
["unique", "rand(1:10, 1000000)", "base"] 7.655 ms (5%) 832 bytes (1%) 8
["unique", "rand(1:10, 1000000)", "tx"] 4.320 ms (5%) 50.98 KiB (1%) 882
["unique", "rand(1:1000, 1000000)", "base"] 7.317 ms (5%) 65.95 KiB (1%) 27
["unique", "rand(1:1000, 1000000)", "tx"] 5.035 ms (5%) 1.07 MiB (1%) 1186

Baseline result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmark: 28 Jun 2020 - 8:36
  • Package commit: 7362ea
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
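
One way to read the noise tolerance is as a band around the reported value. The sketch below assumes the band is symmetric, that is reported × (1 ± tolerance); the helper name `tolerance_band` is invented for this example.

```python
def tolerance_band(reported, tolerance):
    """Return the (low, high) range implied by a relative noise tolerance."""
    return reported * (1 - tolerance), reported * (1 + tolerance)

# Baseline time for ["findfirst", "0%", "tx"]: 22.900 μs with a 5% tolerance.
low, high = tolerance_band(22.900, 0.05)

# The target run reported 21.900 μs for the same benchmark.
target_within_band = low <= 21.900 <= high

print(f"{low:.3f} μs .. {high:.3f} μs, target inside: {target_within_band}")
```

Under that reading the target timing falls inside the baseline's band, which is consistent with this benchmark not appearing in the judge table.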

ID time GC time memory allocations
["findfirst", "0%", "base"] 2.600 ns (5%)
["findfirst", "0%", "tx"] 22.900 μs (5%) 11.97 KiB (1%) 219
["findfirst", "0%", "tx-noterm"] 21.300 μs (5%) 11.97 KiB (1%) 218
["findfirst", "0%", "tx-seq"] 220.002 ns (5%) 544 bytes (1%) 14
["findfirst", "10%", "base"] 101.700 μs (5%)
["findfirst", "10%", "tx"] 90.101 μs (5%) 14.36 KiB (1%) 266
["findfirst", "10%", "tx-noterm"] 265.302 μs (5%) 32.91 KiB (1%) 606
["findfirst", "10%", "tx-seq"] 102.100 μs (5%) 560 bytes (1%) 15
["findfirst", "20%", "base"] 203.501 μs (5%)
["findfirst", "20%", "tx"] 168.801 μs (5%) 21.33 KiB (1%) 393
["findfirst", "20%", "tx-noterm"] 276.402 μs (5%) 42.13 KiB (1%) 772
["findfirst", "20%", "tx-seq"] 204.201 μs (5%) 560 bytes (1%) 15
["findfirst", "30%", "base"] 262.902 μs (5%)
["findfirst", "30%", "tx"] 221.202 μs (5%) 28.27 KiB (1%) 520
["findfirst", "30%", "tx-noterm"] 289.502 μs (5%) 28.30 KiB (1%) 521
["findfirst", "30%", "tx-seq"] 263.602 μs (5%) 560 bytes (1%) 15
["findfirst", "40%", "base"] 396.003 μs (5%)
["findfirst", "40%", "tx"] 312.702 μs (5%) 35.31 KiB (1%) 651
["findfirst", "40%", "tx-noterm"] 358.502 μs (5%) 35.33 KiB (1%) 651
["findfirst", "40%", "tx-seq"] 407.803 μs (5%) 560 bytes (1%) 15
["findfirst", "50%", "base"] 478.704 μs (5%)
["findfirst", "50%", "tx"] 351.402 μs (5%) 37.67 KiB (1%) 696
["findfirst", "50%", "tx-noterm"] 455.003 μs (5%) 51.64 KiB (1%) 955
["findfirst", "50%", "tx-seq"] 438.803 μs (5%) 560 bytes (1%) 15
["foreach", "base", "A .= B .+ B'"] 267.397 ms (5%) 28.425 ms 305.18 MiB (1%) 16000002
["foreach", "base", "A .= B .+ C"] 205.010 ms (5%) 28.236 ms 305.18 MiB (1%) 16000001
["foreach", "broadcast", "A .= B .+ B'"] 7.259 ms (5%)
["foreach", "broadcast", "A .= B .+ C"] 6.423 ms (5%)
["foreach", "tx", "A .= B .+ B'"] 3.960 ms (5%) 25.92 KiB (1%) 359
["foreach", "tx", "A .= B .+ C"] 3.278 ms (5%) 12.73 KiB (1%) 123
["foreach_seq", "base", "Matrix"] 724.905 μs (5%)
["foreach_seq", "base", "Transpose"] 2.047 ms (5%)
["foreach_seq", "base", "Vector"] 726.605 μs (5%)
["foreach_seq", "tx", "Matrix"] 725.906 μs (5%)
["foreach_seq", "tx", "Transpose"] 1.158 ms (5%) 16 bytes (1%) 1
["foreach_seq", "tx", "Vector"] 721.705 μs (5%)
["foreach_seq_double", "cartesian", "man"] 20.100 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"] 19.900 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => false"] 23.900 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => true"] 20.200 μs (5%)
["foreach_seq_double", "linear", "man"] 49.393 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] 0.001 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => false"] 0.001 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => true"] 0.001 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "man"] 700.000 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] 800.000 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"] 2.300 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] 2.300 μs (5%)
["sort", "F64 (narrow)", "Base"] 2.121 ms (5%)
["sort", "F64 (narrow)", "ThreadsX.MergeSort"] 2.659 ms (5%) 1.19 MiB (1%) 535
["sort", "F64 (narrow)", "ThreadsX.QuickSort"] 556.505 μs (5%) 965.13 KiB (1%) 1227
["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"] 595.204 μs (5%) 1.02 MiB (1%) 1247
["sort", "F64 (wide)", "Base"] 5.388 ms (5%)
["sort", "F64 (wide)", "ThreadsX.MergeSort"] 5.056 ms (5%) 1.19 MiB (1%) 564
["sort", "F64 (wide)", "ThreadsX.QuickSort"] 3.406 ms (5%) 1.01 MiB (1%) 2147
["sort", "F64 (wide)", "ThreadsX.StableQuickSort"] 3.922 ms (5%) 1.39 MiB (1%) 2197
["sort", "I64 (narrow)", "Base"] 117.101 μs (5%) 160 bytes (1%) 1
["sort", "I64 (narrow)", "ThreadsX.MergeSort"] 103.800 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.QuickSort"] 103.401 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"] 104.601 μs (5%) 864 bytes (1%) 17
["sort", "I64 (wide)", "Base"] 6.254 ms (5%)
["sort", "I64 (wide)", "ThreadsX.MergeSort"] 4.293 ms (5%) 1.19 MiB (1%) 554
["sort", "I64 (wide)", "ThreadsX.QuickSort"] 3.137 ms (5%) 1.01 MiB (1%) 2238
["sort", "I64 (wide)", "ThreadsX.StableQuickSort"] 3.945 ms (5%) 1.40 MiB (1%) 2272
["sort", "reversed", "Base"] 641.705 μs (5%)
["sort", "reversed", "ThreadsX.MergeSort"] 1.123 ms (5%) 1.18 MiB (1%) 435
["sort", "reversed", "ThreadsX.QuickSort"] 880.507 μs (5%) 998.72 KiB (1%) 1869
["sort", "reversed", "ThreadsX.StableQuickSort"] 1.272 ms (5%) 1.36 MiB (1%) 1903
["sort", "sorted", "Base"] 565.005 μs (5%)
["sort", "sorted", "ThreadsX.MergeSort"] 886.906 μs (5%) 1.18 MiB (1%) 432
["sort", "sorted", "ThreadsX.QuickSort"] 824.406 μs (5%) 998.75 KiB (1%) 1871
["sort", "sorted", "ThreadsX.StableQuickSort"] 1.074 ms (5%) 1.36 MiB (1%) 1904
["unique", "rand(1:10, 1000000)", "base"] 8.083 ms (5%) 832 bytes (1%) 8
["unique", "rand(1:10, 1000000)", "tx"] 4.736 ms (5%) 50.98 KiB (1%) 882
["unique", "rand(1:1000, 1000000)", "base"] 7.465 ms (5%) 65.95 KiB (1%) 27
["unique", "rand(1:1000, 1000000)", "tx"] 4.934 ms (5%) 1.07 MiB (1%) 1186

Runtime information

| Runtime Info | |
|---|---|
| BLAS #threads | 2 |
| BLAS.vendor() | openblas64 |
| Sys.CPU_THREADS | 2 |

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Stepping:            4
CPU MHz:             2095.078
BogoMIPS:            4190.15
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves

| Cpu Property | Value |
|---|---|
| Brand | Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz |
| Vendor | :Intel |
| Architecture | :Skylake |
| Model | Family: 0x06, Model: 0x55, Stepping: 0x04, Type: 0x00 |
| Cores | 2 physical cores, 2 logical cores (on executing CPU) |
| | No Hyperthreading detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (32, 1024, 36608) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 44 bits physical |
| SIMD | 512 bit = 64 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via rdtsc |
| | TSC increased at every clock cycle (non-invariant TSC) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |

github-actions (Contributor)

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmarks:
    • Target: 28 Jun 2020 - 08:40
    • Baseline: 28 Jun 2020 - 08:46
  • Package commits:
    • Target: d6fc9d
    • Baseline: 7362ea
  • Julia commits:
    • Target: 44fa15
    • Baseline: 44fa15
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2
    • Baseline: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).
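
The "only significant results are shown" rule can be sketched as a filter over (time ratio, memory ratio) pairs: a row is kept when either ratio falls outside its own tolerance. The ratios below are copied from this judge table, except for the last row, which is a hypothetical unchanged benchmark added for contrast. The helper names `mark` and `is_shown` are invented for this example.

```python
TIME_TOLERANCE = 0.05    # "(5%)" next to time ratios
MEMORY_TOLERANCE = 0.01  # "(1%)" next to memory ratios

def mark(ratio, tolerance):
    """Return the marker the report would attach to one ratio, or ''."""
    if ratio > 1 + tolerance:
        return "❌"
    if ratio < 1 - tolerance:
        return "✅"
    return ""

def is_shown(time_ratio, memory_ratio):
    """A row is listed if its time or its memory ratio is significant."""
    return bool(mark(time_ratio, TIME_TOLERANCE) or mark(memory_ratio, MEMORY_TOLERANCE))

rows = {
    ("findfirst", "20%", "tx-noterm"): (0.93, 0.86),
    ("findfirst", "40%", "tx-noterm"): (1.10, 1.00),
    ("findfirst", "50%", "tx-noterm"): (1.00, 1.17),
    ("hypothetical", "unchanged"): (1.00, 1.00),  # not from the report
}

shown = [key for key, (t, m) in rows.items() if is_shown(t, m)]
print(shown)
```

Rows whose printed ratio sits exactly on a boundary, such as 0.95 or 1.05, are left out of this sketch because the report classifies the unrounded ratio, which the two-decimal value cannot reproduce.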

| ID | time ratio | memory ratio |
|---|---|---|
| ["findfirst", "20%", "tx"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "20%", "tx-noterm"] | 0.93 (5%) ✅ | 0.86 (1%) ✅ |
| ["findfirst", "30%", "tx-noterm"] | 1.05 (5%) ❌ | 1.00 (1%) |
| ["findfirst", "40%", "tx-noterm"] | 1.10 (5%) ❌ | 1.00 (1%) |
| ["findfirst", "50%", "tx-noterm"] | 1.00 (5%) | 1.17 (1%) ❌ |
| ["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] | 1.07 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_double", "linear", "tx", ":simd => false"] | 1.08 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "man"] | 1.29 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] | 1.29 (5%) ❌ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["findfirst", "0%"]
  • ["findfirst", "10%"]
  • ["findfirst", "20%"]
  • ["findfirst", "30%"]
  • ["findfirst", "40%"]
  • ["findfirst", "50%"]
  • ["foreach", "base"]
  • ["foreach", "broadcast"]
  • ["foreach", "tx"]
  • ["foreach_seq", "base"]
  • ["foreach_seq", "tx"]
  • ["foreach_seq_double", "cartesian"]
  • ["foreach_seq_double", "cartesian", "tx"]
  • ["foreach_seq_double", "linear"]
  • ["foreach_seq_double", "linear", "tx"]
  • ["foreach_seq_sum_many", ":nvecs => 8"]
  • ["foreach_seq_sum_many", ":nvecs => 8", "tx"]
  • ["sort", "F64 (narrow)"]
  • ["sort", "F64 (wide)"]
  • ["sort", "I64 (narrow)"]
  • ["sort", "I64 (wide)"]
  • ["sort", "reversed"]
  • ["sort", "sorted"]
  • ["unique", "rand(1:10, 1000000)"]
  • ["unique", "rand(1:1000, 1000000)"]

Julia versioninfo

Target

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz      47726 s          0 s       2613 s      35555 s          0 s
       #2  2394 MHz      61035 s          0 s       2795 s      21971 s          0 s
       
  Memory: 6.764884948730469 GB (2177.44140625 MB free)
  Uptime: 879.0 sec
  Load Avg:  1.4306640625  1.369140625  0.93115234375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Baseline

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz      67358 s          0 s       3415 s      50795 s          0 s
       #2  2394 MHz      88436 s          0 s       3436 s      29596 s          0 s
       
  Memory: 6.764884948730469 GB (2408.34375 MB free)
  Uptime: 1238.0 sec
  Load Avg:  1.2958984375  1.37109375  1.07568359375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Target result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmark: 28 Jun 2020 - 8:40
  • Package commit: d6fc9d
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["findfirst", "0%", "base"] 3.700 ns (5%)
["findfirst", "0%", "tx"] 27.200 μs (5%) 11.97 KiB (1%) 219
["findfirst", "0%", "tx-noterm"] 23.000 μs (5%) 11.97 KiB (1%) 218
["findfirst", "0%", "tx-seq"] 275.342 ns (5%) 544 bytes (1%) 14
["findfirst", "10%", "base"] 78.200 μs (5%)
["findfirst", "10%", "tx"] 82.100 μs (5%) 14.36 KiB (1%) 266
["findfirst", "10%", "tx-noterm"] 207.100 μs (5%) 32.89 KiB (1%) 603
["findfirst", "10%", "tx-seq"] 78.700 μs (5%) 560 bytes (1%) 15
["findfirst", "20%", "base"] 155.900 μs (5%)
["findfirst", "20%", "tx"] 144.700 μs (5%) 21.33 KiB (1%) 393
["findfirst", "20%", "tx-noterm"] 198.400 μs (5%) 28.30 KiB (1%) 521
["findfirst", "20%", "tx-seq"] 156.400 μs (5%) 560 bytes (1%) 15
["findfirst", "30%", "base"] 233.800 μs (5%)
["findfirst", "30%", "tx"] 198.200 μs (5%) 28.27 KiB (1%) 520
["findfirst", "30%", "tx-noterm"] 202.300 μs (5%) 28.31 KiB (1%) 522
["findfirst", "30%", "tx-seq"] 234.400 μs (5%) 560 bytes (1%) 15
["findfirst", "40%", "base"] 311.900 μs (5%)
["findfirst", "40%", "tx"] 284.000 μs (5%) 35.31 KiB (1%) 651
["findfirst", "40%", "tx-noterm"] 279.900 μs (5%) 35.33 KiB (1%) 651
["findfirst", "40%", "tx-seq"] 312.200 μs (5%) 560 bytes (1%) 15
["findfirst", "50%", "base"] 389.600 μs (5%)
["findfirst", "50%", "tx"] 308.700 μs (5%) 37.72 KiB (1%) 699
["findfirst", "50%", "tx-noterm"] 341.200 μs (5%) 46.89 KiB (1%) 864
["findfirst", "50%", "tx-seq"] 390.000 μs (5%) 560 bytes (1%) 15
["foreach", "base", "A .= B .+ B'"] 388.840 ms (5%) 40.886 ms 305.18 MiB (1%) 16000002
["foreach", "base", "A .= B .+ C"] 246.144 ms (5%) 27.698 ms 305.18 MiB (1%) 16000001
["foreach", "broadcast", "A .= B .+ B'"] 18.161 ms (5%)
["foreach", "broadcast", "A .= B .+ C"] 7.867 ms (5%)
["foreach", "tx", "A .= B .+ B'"] 9.362 ms (5%) 25.94 KiB (1%) 360
["foreach", "tx", "A .= B .+ C"] 6.410 ms (5%) 12.75 KiB (1%) 124
["foreach_seq", "base", "Matrix"] 747.600 μs (5%)
["foreach_seq", "base", "Transpose"] 2.519 ms (5%)
["foreach_seq", "base", "Vector"] 747.500 μs (5%)
["foreach_seq", "tx", "Matrix"] 753.300 μs (5%)
["foreach_seq", "tx", "Transpose"] 1.156 ms (5%) 16 bytes (1%) 1
["foreach_seq", "tx", "Vector"] 747.401 μs (5%)
["foreach_seq_double", "cartesian", "man"] 26.500 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"] 26.400 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => false"] 26.200 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => true"] 26.500 μs (5%)
["foreach_seq_double", "linear", "man"] 108.155 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] 107.403 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => false"] 107.709 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => true"] 104.828 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "man"] 2.200 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] 2.200 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"] 3.425 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] 3.425 μs (5%)
["sort", "F64 (narrow)", "Base"] 2.760 ms (5%)
["sort", "F64 (narrow)", "ThreadsX.MergeSort"] 2.980 ms (5%) 1.19 MiB (1%) 535
["sort", "F64 (narrow)", "ThreadsX.QuickSort"] 1.743 ms (5%) 965.09 KiB (1%) 1225
["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"] 1.765 ms (5%) 1.02 MiB (1%) 1245
["sort", "F64 (wide)", "Base"] 6.538 ms (5%)
["sort", "F64 (wide)", "ThreadsX.MergeSort"] 5.553 ms (5%) 1.19 MiB (1%) 563
["sort", "F64 (wide)", "ThreadsX.QuickSort"] 5.301 ms (5%) 1.01 MiB (1%) 2145
["sort", "F64 (wide)", "ThreadsX.StableQuickSort"] 6.208 ms (5%) 1.39 MiB (1%) 2194
["sort", "I64 (narrow)", "Base"] 162.200 μs (5%) 160 bytes (1%) 1
["sort", "I64 (narrow)", "ThreadsX.MergeSort"] 168.000 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.QuickSort"] 167.500 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"] 168.100 μs (5%) 864 bytes (1%) 17
["sort", "I64 (wide)", "Base"] 6.558 ms (5%)
["sort", "I64 (wide)", "ThreadsX.MergeSort"] 4.762 ms (5%) 1.19 MiB (1%) 554
["sort", "I64 (wide)", "ThreadsX.QuickSort"] 4.449 ms (5%) 1.01 MiB (1%) 2235
["sort", "I64 (wide)", "ThreadsX.StableQuickSort"] 5.295 ms (5%) 1.40 MiB (1%) 2269
["sort", "reversed", "Base"] 869.000 μs (5%)
["sort", "reversed", "ThreadsX.MergeSort"] 1.309 ms (5%) 1.18 MiB (1%) 435
["sort", "reversed", "ThreadsX.QuickSort"] 1.227 ms (5%) 998.78 KiB (1%) 1873
["sort", "reversed", "ThreadsX.StableQuickSort"] 1.724 ms (5%) 1.36 MiB (1%) 1904
["sort", "sorted", "Base"] 816.800 μs (5%)
["sort", "sorted", "ThreadsX.MergeSort"] 965.701 μs (5%) 1.18 MiB (1%) 431
["sort", "sorted", "ThreadsX.QuickSort"] 1.256 ms (5%) 998.77 KiB (1%) 1872
["sort", "sorted", "ThreadsX.StableQuickSort"] 1.404 ms (5%) 1.36 MiB (1%) 1904
["unique", "rand(1:10, 1000000)", "base"] 10.560 ms (5%) 832 bytes (1%) 8
["unique", "rand(1:10, 1000000)", "tx"] 5.561 ms (5%) 50.98 KiB (1%) 882
["unique", "rand(1:1000, 1000000)", "base"] 9.741 ms (5%) 65.95 KiB (1%) 27
["unique", "rand(1:1000, 1000000)", "tx"] 5.859 ms (5%) 1.07 MiB (1%) 1186

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["findfirst", "0%"]
  • ["findfirst", "10%"]
  • ["findfirst", "20%"]
  • ["findfirst", "30%"]
  • ["findfirst", "40%"]
  • ["findfirst", "50%"]
  • ["foreach", "base"]
  • ["foreach", "broadcast"]
  • ["foreach", "tx"]
  • ["foreach_seq", "base"]
  • ["foreach_seq", "tx"]
  • ["foreach_seq_double", "cartesian"]
  • ["foreach_seq_double", "cartesian", "tx"]
  • ["foreach_seq_double", "linear"]
  • ["foreach_seq_double", "linear", "tx"]
  • ["foreach_seq_sum_many", ":nvecs => 8"]
  • ["foreach_seq_sum_many", ":nvecs => 8", "tx"]
  • ["sort", "F64 (narrow)"]
  • ["sort", "F64 (wide)"]
  • ["sort", "I64 (narrow)"]
  • ["sort", "I64 (wide)"]
  • ["sort", "reversed"]
  • ["sort", "sorted"]
  • ["unique", "rand(1:10, 1000000)"]
  • ["unique", "rand(1:1000, 1000000)"]

Julia versioninfo

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz      47726 s          0 s       2613 s      35555 s          0 s
       #2  2394 MHz      61035 s          0 s       2795 s      21971 s          0 s
       
  Memory: 6.764884948730469 GB (2177.44140625 MB free)
  Uptime: 879.0 sec
  Load Avg:  1.4306640625  1.369140625  0.93115234375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Baseline result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmark: 28 Jun 2020 - 8:46
  • Package commit: 7362ea
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
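
As a rough illustration of the two points above (hierarchical IDs and noise tolerances), the sketch below uses a plain nested `Dict` as a stand-in for a benchmark suite. The `suite` contents and the `lookup` and `tolerance_interval` helpers are hypothetical names introduced here for illustration, not part of PkgBenchmark or BenchmarkTools. The 3.700 ns figure is the `["findfirst", "0%", "base"]` time from the table that follows.

```julia
# Hypothetical stand-in for a benchmark suite: nested Dicts keyed the same way
# as the ID column, [parent_group, child_group, ..., key].
suite = Dict(
    "findfirst" => Dict(
        "0%" => Dict("base" => 3.700),  # time in ns, taken from the results table
    ),
)

# Walk the nested groups one ID component at a time.
lookup(group, id) = foldl((g, k) -> g[k], id; init = group)

# Interval implied by a relative noise tolerance (0.05 for the "(5%)" annotation).
tolerance_interval(value, tol) = (value * (1 - tol), value * (1 + tol))

t = lookup(suite, ["findfirst", "0%", "base"])
lo, hi = tolerance_interval(t, 0.05)
println("time = $t ns, expected range ≈ [$lo, $hi] ns")
```

Under this reading, two reported times whose intervals overlap substantially should probably not be treated as meaningfully different.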

ID time GC time memory allocations
["findfirst", "0%", "base"] 3.700 ns (5%)
["findfirst", "0%", "tx"] 26.500 μs (5%) 11.95 KiB (1%) 218
["findfirst", "0%", "tx-noterm"] 23.300 μs (5%) 11.97 KiB (1%) 218
["findfirst", "0%", "tx-seq"] 269.315 ns (5%) 544 bytes (1%) 14
["findfirst", "10%", "base"] 78.101 μs (5%)
["findfirst", "10%", "tx"] 82.200 μs (5%) 14.36 KiB (1%) 266
["findfirst", "10%", "tx-noterm"] 204.701 μs (5%) 32.88 KiB (1%) 601
["findfirst", "10%", "tx-seq"] 78.700 μs (5%) 560 bytes (1%) 15
["findfirst", "20%", "base"] 156.000 μs (5%)
["findfirst", "20%", "tx"] 153.000 μs (5%) 21.34 KiB (1%) 394
["findfirst", "20%", "tx-noterm"] 213.400 μs (5%) 32.95 KiB (1%) 606
["findfirst", "20%", "tx-seq"] 156.500 μs (5%) 560 bytes (1%) 15
["findfirst", "30%", "base"] 234.100 μs (5%)
["findfirst", "30%", "tx"] 201.600 μs (5%) 28.27 KiB (1%) 520
["findfirst", "30%", "tx-noterm"] 191.800 μs (5%) 28.30 KiB (1%) 521
["findfirst", "30%", "tx-seq"] 234.400 μs (5%) 560 bytes (1%) 15
["findfirst", "40%", "base"] 311.800 μs (5%)
["findfirst", "40%", "tx"] 284.200 μs (5%) 35.31 KiB (1%) 651
["findfirst", "40%", "tx-noterm"] 254.700 μs (5%) 35.30 KiB (1%) 649
["findfirst", "40%", "tx-seq"] 312.200 μs (5%) 560 bytes (1%) 15
["findfirst", "50%", "base"] 389.800 μs (5%)
["findfirst", "50%", "tx"] 320.700 μs (5%) 37.70 KiB (1%) 698
["findfirst", "50%", "tx-noterm"] 341.400 μs (5%) 40.06 KiB (1%) 744
["findfirst", "50%", "tx-seq"] 390.100 μs (5%) 560 bytes (1%) 15
["foreach", "base", "A .= B .+ B'"] 388.426 ms (5%) 37.784 ms 305.18 MiB (1%) 16000002
["foreach", "base", "A .= B .+ C"] 250.874 ms (5%) 37.376 ms 305.18 MiB (1%) 16000001
["foreach", "broadcast", "A .= B .+ B'"] 18.082 ms (5%)
["foreach", "broadcast", "A .= B .+ C"] 7.881 ms (5%)
["foreach", "tx", "A .= B .+ B'"] 9.438 ms (5%) 25.94 KiB (1%) 360
["foreach", "tx", "A .= B .+ C"] 6.641 ms (5%) 12.75 KiB (1%) 124
["foreach_seq", "base", "Matrix"] 747.001 μs (5%)
["foreach_seq", "base", "Transpose"] 2.475 ms (5%)
["foreach_seq", "base", "Vector"] 747.500 μs (5%)
["foreach_seq", "tx", "Matrix"] 752.301 μs (5%)
["foreach_seq", "tx", "Transpose"] 1.103 ms (5%) 16 bytes (1%) 1
["foreach_seq", "tx", "Vector"] 747.200 μs (5%)
["foreach_seq_double", "cartesian", "man"] 26.400 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"] 26.500 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => false"] 26.200 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => true"] 26.600 μs (5%)
["foreach_seq_double", "linear", "man"] 108.476 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] 100.000 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => false"] 100.000 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => true"] 100.000 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "man"] 1.700 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] 1.700 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"] 3.300 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] 3.300 μs (5%)
["sort", "F64 (narrow)", "Base"] 2.752 ms (5%)
["sort", "F64 (narrow)", "ThreadsX.MergeSort"] 2.959 ms (5%) 1.19 MiB (1%) 535
["sort", "F64 (narrow)", "ThreadsX.QuickSort"] 1.767 ms (5%) 965.11 KiB (1%) 1226
["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"] 1.778 ms (5%) 1.02 MiB (1%) 1245
["sort", "F64 (wide)", "Base"] 6.532 ms (5%)
["sort", "F64 (wide)", "ThreadsX.MergeSort"] 5.506 ms (5%) 1.19 MiB (1%) 564
["sort", "F64 (wide)", "ThreadsX.QuickSort"] 5.282 ms (5%) 1.01 MiB (1%) 2146
["sort", "F64 (wide)", "ThreadsX.StableQuickSort"] 6.046 ms (5%) 1.39 MiB (1%) 2194
["sort", "I64 (narrow)", "Base"] 164.100 μs (5%) 160 bytes (1%) 1
["sort", "I64 (narrow)", "ThreadsX.MergeSort"] 166.300 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.QuickSort"] 164.200 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"] 164.301 μs (5%) 864 bytes (1%) 17
["sort", "I64 (wide)", "Base"] 6.543 ms (5%)
["sort", "I64 (wide)", "ThreadsX.MergeSort"] 4.670 ms (5%) 1.19 MiB (1%) 554
["sort", "I64 (wide)", "ThreadsX.QuickSort"] 4.488 ms (5%) 1.01 MiB (1%) 2236
["sort", "I64 (wide)", "ThreadsX.StableQuickSort"] 5.239 ms (5%) 1.40 MiB (1%) 2271
["sort", "reversed", "Base"] 870.402 μs (5%)
["sort", "reversed", "ThreadsX.MergeSort"] 1.343 ms (5%) 1.18 MiB (1%) 435
["sort", "reversed", "ThreadsX.QuickSort"] 1.266 ms (5%) 998.70 KiB (1%) 1868
["sort", "reversed", "ThreadsX.StableQuickSort"] 1.740 ms (5%) 1.36 MiB (1%) 1904
["sort", "sorted", "Base"] 817.701 μs (5%)
["sort", "sorted", "ThreadsX.MergeSort"] 979.302 μs (5%) 1.18 MiB (1%) 431
["sort", "sorted", "ThreadsX.QuickSort"] 1.272 ms (5%) 998.75 KiB (1%) 1871
["sort", "sorted", "ThreadsX.StableQuickSort"] 1.427 ms (5%) 1.36 MiB (1%) 1903
["unique", "rand(1:10, 1000000)", "base"] 10.488 ms (5%) 832 bytes (1%) 8
["unique", "rand(1:10, 1000000)", "tx"] 5.515 ms (5%) 50.98 KiB (1%) 882
["unique", "rand(1:1000, 1000000)", "base"] 9.665 ms (5%) 65.95 KiB (1%) 27
["unique", "rand(1:1000, 1000000)", "tx"] 5.841 ms (5%) 1.07 MiB (1%) 1186

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["findfirst", "0%"]
  • ["findfirst", "10%"]
  • ["findfirst", "20%"]
  • ["findfirst", "30%"]
  • ["findfirst", "40%"]
  • ["findfirst", "50%"]
  • ["foreach", "base"]
  • ["foreach", "broadcast"]
  • ["foreach", "tx"]
  • ["foreach_seq", "base"]
  • ["foreach_seq", "tx"]
  • ["foreach_seq_double", "cartesian"]
  • ["foreach_seq_double", "cartesian", "tx"]
  • ["foreach_seq_double", "linear"]
  • ["foreach_seq_double", "linear", "tx"]
  • ["foreach_seq_sum_many", ":nvecs => 8"]
  • ["foreach_seq_sum_many", ":nvecs => 8", "tx"]
  • ["sort", "F64 (narrow)"]
  • ["sort", "F64 (wide)"]
  • ["sort", "I64 (narrow)"]
  • ["sort", "I64 (wide)"]
  • ["sort", "reversed"]
  • ["sort", "sorted"]
  • ["unique", "rand(1:10, 1000000)"]
  • ["unique", "rand(1:1000, 1000000)"]

Julia versioninfo

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2394 MHz      67358 s          0 s       3415 s      50795 s          0 s
       #2  2394 MHz      88436 s          0 s       3436 s      29596 s          0 s
       
  Memory: 6.764884948730469 GB (2408.34375 MB free)
  Uptime: 1238.0 sec
  Load Avg:  1.2958984375  1.37109375  1.07568359375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Runtime information

| Runtime Info | |
|:--|:--|
| BLAS #threads | 2 |
| BLAS.vendor() | openblas64 |
| Sys.CPU_THREADS | 2 |

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Stepping:            2
CPU MHz:             2394.455
BogoMIPS:            4788.91
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear
| Cpu Property | Value |
|:--|:--|
| Brand | Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz |
| Vendor | :Intel |
| Architecture | :Haswell |
| Model | Family: 0x06, Model: 0x3f, Stepping: 0x02, Type: 0x00 |
| Cores | 2 physical cores, 2 logical cores (on executing CPU) |
| | No Hyperthreading detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (32, 256, 30720) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 44 bits physical |
| SIMD | 256 bit = 32 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via rdtsc |
| | TSC increased at every clock cycle (non-invariant TSC) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |

@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmarks:
    • Target: 28 Jun 2020 - 08:50
    • Baseline: 28 Jun 2020 - 08:56
  • Package commits:
    • Target: 515f02
    • Baseline: 7362ea
  • Julia commits:
    • Target: 44fa15
    • Baseline: 44fa15
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2
    • Baseline: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).
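
The classification rule described above can be sketched as follows. This is a minimal illustration, assuming the tolerance is applied symmetrically around a ratio of 1.0; `classify` is a hypothetical helper written for this example, not the benchmarking package's own function. The input numbers are taken from the target and baseline result tables of this report.

```julia
# Classify a target/baseline ratio against a relative tolerance.
# `classify` is a hypothetical helper; the symmetric threshold is an assumption.
function classify(ratio::Real, tolerance::Real)
    if ratio > 1 + tolerance
        return :regression      # shown as ❌ in the report
    elseif ratio < 1 - tolerance
        return :improvement     # shown as ✅ in the report
    else
        return :invariant       # omitted from the judge table
    end
end

# ["findfirst", "0%", "base"]: target 4.200 ns, baseline 3.700 ns
time_ratio = 4.200 / 3.700
println(round(time_ratio; digits = 2), " => ", classify(time_ratio, 0.05))

# ["findfirst", "50%", "tx-noterm"]: memory 53.89 KiB (target) vs 40.06 KiB (baseline)
mem_ratio = 53.89 / 40.06
println(round(mem_ratio; digits = 2), " => ", classify(mem_ratio, 0.01))
```

A row appears in the table when either its time ratio or its memory ratio falls outside the tolerance band, which is why some rows carry a mark in only one column.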

| ID | time ratio | memory ratio |
|:--|:--|:--|
| ["findfirst", "0%", "base"] | 1.14 (5%) ❌ | 1.00 (1%) |
| ["findfirst", "0%", "tx-noterm"] | 1.11 (5%) ❌ | 1.00 (1%) |
| ["findfirst", "0%", "tx-seq"] | 1.08 (5%) ❌ | 1.00 (1%) |
| ["findfirst", "10%", "tx-noterm"] | 0.99 (5%) | 1.07 (1%) ❌ |
| ["findfirst", "20%", "tx"] | 1.06 (5%) ❌ | 1.00 (1%) |
| ["findfirst", "30%", "tx-noterm"] | 0.88 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "40%", "tx-noterm"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["findfirst", "50%", "tx-noterm"] | 1.03 (5%) | 1.35 (1%) ❌ |
| ["foreach", "base", "A .= B .+ B'"] | 0.92 (5%) ✅ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "man"] | 1.29 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] | 1.29 (5%) ❌ | 1.00 (1%) |
| ["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] | 0.93 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["findfirst", "0%"]
  • ["findfirst", "10%"]
  • ["findfirst", "20%"]
  • ["findfirst", "30%"]
  • ["findfirst", "40%"]
  • ["findfirst", "50%"]
  • ["foreach", "base"]
  • ["foreach", "broadcast"]
  • ["foreach", "tx"]
  • ["foreach_seq", "base"]
  • ["foreach_seq", "tx"]
  • ["foreach_seq_double", "cartesian"]
  • ["foreach_seq_double", "cartesian", "tx"]
  • ["foreach_seq_double", "linear"]
  • ["foreach_seq_double", "linear", "tx"]
  • ["foreach_seq_sum_many", ":nvecs => 8"]
  • ["foreach_seq_sum_many", ":nvecs => 8", "tx"]
  • ["sort", "F64 (narrow)"]
  • ["sort", "F64 (wide)"]
  • ["sort", "I64 (narrow)"]
  • ["sort", "I64 (wide)"]
  • ["sort", "reversed"]
  • ["sort", "sorted"]
  • ["unique", "rand(1:10, 1000000)"]
  • ["unique", "rand(1:1000, 1000000)"]

Julia versioninfo

Target

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      50303 s          0 s       2679 s      35156 s          0 s
       #2  2397 MHz      60772 s          0 s       2980 s      23899 s          0 s
       
  Memory: 6.764884948730469 GB (2103.734375 MB free)
  Uptime: 897.0 sec
  Load Avg:  1.21337890625  1.24462890625  0.8720703125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Baseline

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      69254 s          0 s       3260 s      52741 s          0 s
       #2  2397 MHz      90252 s          0 s       3936 s      30531 s          0 s
       
  Memory: 6.764884948730469 GB (2363.20703125 MB free)
  Uptime: 1271.0 sec
  Load Avg:  1.255859375  1.29833984375  1.0263671875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Target result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmark: 28 Jun 2020 - 8:50
  • Package commit: 515f02
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["findfirst", "0%", "base"] 4.200 ns (5%)
["findfirst", "0%", "tx"] 27.702 μs (5%) 11.95 KiB (1%) 218
["findfirst", "0%", "tx-noterm"] 26.601 μs (5%) 11.97 KiB (1%) 218
["findfirst", "0%", "tx-seq"] 280.412 ns (5%) 544 bytes (1%) 14
["findfirst", "10%", "base"] 78.103 μs (5%)
["findfirst", "10%", "tx"] 81.203 μs (5%) 14.36 KiB (1%) 266
["findfirst", "10%", "tx-noterm"] 214.308 μs (5%) 35.30 KiB (1%) 652
["findfirst", "10%", "tx-seq"] 78.503 μs (5%) 560 bytes (1%) 15
["findfirst", "20%", "base"] 156.006 μs (5%)
["findfirst", "20%", "tx"] 156.506 μs (5%) 21.33 KiB (1%) 393
["findfirst", "20%", "tx-noterm"] 213.808 μs (5%) 28.31 KiB (1%) 522
["findfirst", "20%", "tx-seq"] 156.405 μs (5%) 560 bytes (1%) 15
["findfirst", "30%", "base"] 233.610 μs (5%)
["findfirst", "30%", "tx"] 201.709 μs (5%) 28.27 KiB (1%) 520
["findfirst", "30%", "tx-noterm"] 195.109 μs (5%) 28.30 KiB (1%) 521
["findfirst", "30%", "tx-seq"] 234.010 μs (5%) 560 bytes (1%) 15
["findfirst", "40%", "base"] 311.412 μs (5%)
["findfirst", "40%", "tx"] 286.811 μs (5%) 35.31 KiB (1%) 651
["findfirst", "40%", "tx-noterm"] 248.310 μs (5%) 35.30 KiB (1%) 649
["findfirst", "40%", "tx-seq"] 311.812 μs (5%) 560 bytes (1%) 15
["findfirst", "50%", "base"] 389.217 μs (5%)
["findfirst", "50%", "tx"] 322.613 μs (5%) 37.70 KiB (1%) 698
["findfirst", "50%", "tx-noterm"] 353.914 μs (5%) 53.89 KiB (1%) 992
["findfirst", "50%", "tx-seq"] 389.616 μs (5%) 560 bytes (1%) 15
["foreach", "base", "A .= B .+ B'"] 406.510 ms (5%) 27.848 ms 305.18 MiB (1%) 16000002
["foreach", "base", "A .= B .+ C"] 251.400 ms (5%) 27.699 ms 305.18 MiB (1%) 16000001
["foreach", "broadcast", "A .= B .+ B'"] 19.073 ms (5%)
["foreach", "broadcast", "A .= B .+ C"] 8.076 ms (5%)
["foreach", "tx", "A .= B .+ B'"] 10.113 ms (5%) 25.94 KiB (1%) 360
["foreach", "tx", "A .= B .+ C"] 6.423 ms (5%) 12.75 KiB (1%) 124
["foreach_seq", "base", "Matrix"] 746.520 μs (5%)
["foreach_seq", "base", "Transpose"] 2.448 ms (5%)
["foreach_seq", "base", "Vector"] 746.919 μs (5%)
["foreach_seq", "tx", "Matrix"] 751.520 μs (5%)
["foreach_seq", "tx", "Transpose"] 1.090 ms (5%) 16 bytes (1%) 1
["foreach_seq", "tx", "Vector"] 746.919 μs (5%)
["foreach_seq_double", "cartesian", "man"] 26.501 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"] 26.601 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => false"] 26.501 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => true"] 26.601 μs (5%)
["foreach_seq_double", "linear", "man"] 107.627 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] 104.312 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => false"] 104.164 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => true"] 104.533 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "man"] 2.189 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] 2.189 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"] 3.425 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] 3.425 μs (5%)
["sort", "F64 (narrow)", "Base"] 2.751 ms (5%)
["sort", "F64 (narrow)", "ThreadsX.MergeSort"] 2.994 ms (5%) 1.19 MiB (1%) 535
["sort", "F64 (narrow)", "ThreadsX.QuickSort"] 1.760 ms (5%) 965.11 KiB (1%) 1226
["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"] 1.780 ms (5%) 1.02 MiB (1%) 1244
["sort", "F64 (wide)", "Base"] 6.527 ms (5%)
["sort", "F64 (wide)", "ThreadsX.MergeSort"] 5.554 ms (5%) 1.19 MiB (1%) 563
["sort", "F64 (wide)", "ThreadsX.QuickSort"] 5.305 ms (5%) 1.01 MiB (1%) 2143
["sort", "F64 (wide)", "ThreadsX.StableQuickSort"] 6.302 ms (5%) 1.39 MiB (1%) 2195
["sort", "I64 (narrow)", "Base"] 163.905 μs (5%) 160 bytes (1%) 1
["sort", "I64 (narrow)", "ThreadsX.MergeSort"] 168.005 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.QuickSort"] 166.205 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"] 168.505 μs (5%) 864 bytes (1%) 17
["sort", "I64 (wide)", "Base"] 6.549 ms (5%)
["sort", "I64 (wide)", "ThreadsX.MergeSort"] 4.769 ms (5%) 1.19 MiB (1%) 554
["sort", "I64 (wide)", "ThreadsX.QuickSort"] 4.471 ms (5%) 1.01 MiB (1%) 2237
["sort", "I64 (wide)", "ThreadsX.StableQuickSort"] 5.320 ms (5%) 1.40 MiB (1%) 2269
["sort", "reversed", "Base"] 870.025 μs (5%)
["sort", "reversed", "ThreadsX.MergeSort"] 1.320 ms (5%) 1.18 MiB (1%) 434
["sort", "reversed", "ThreadsX.QuickSort"] 1.245 ms (5%) 998.77 KiB (1%) 1872
["sort", "reversed", "ThreadsX.StableQuickSort"] 1.735 ms (5%) 1.36 MiB (1%) 1903
["sort", "sorted", "Base"] 816.123 μs (5%)
["sort", "sorted", "ThreadsX.MergeSort"] 968.927 μs (5%) 1.18 MiB (1%) 431
["sort", "sorted", "ThreadsX.QuickSort"] 1.256 ms (5%) 998.77 KiB (1%) 1872
["sort", "sorted", "ThreadsX.StableQuickSort"] 1.426 ms (5%) 1.36 MiB (1%) 1904
["unique", "rand(1:10, 1000000)", "base"] 10.539 ms (5%) 832 bytes (1%) 8
["unique", "rand(1:10, 1000000)", "tx"] 5.516 ms (5%) 50.98 KiB (1%) 882
["unique", "rand(1:1000, 1000000)", "base"] 9.725 ms (5%) 65.95 KiB (1%) 27
["unique", "rand(1:1000, 1000000)", "tx"] 5.873 ms (5%) 1.07 MiB (1%) 1186

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["findfirst", "0%"]
  • ["findfirst", "10%"]
  • ["findfirst", "20%"]
  • ["findfirst", "30%"]
  • ["findfirst", "40%"]
  • ["findfirst", "50%"]
  • ["foreach", "base"]
  • ["foreach", "broadcast"]
  • ["foreach", "tx"]
  • ["foreach_seq", "base"]
  • ["foreach_seq", "tx"]
  • ["foreach_seq_double", "cartesian"]
  • ["foreach_seq_double", "cartesian", "tx"]
  • ["foreach_seq_double", "linear"]
  • ["foreach_seq_double", "linear", "tx"]
  • ["foreach_seq_sum_many", ":nvecs => 8"]
  • ["foreach_seq_sum_many", ":nvecs => 8", "tx"]
  • ["sort", "F64 (narrow)"]
  • ["sort", "F64 (wide)"]
  • ["sort", "I64 (narrow)"]
  • ["sort", "I64 (wide)"]
  • ["sort", "reversed"]
  • ["sort", "sorted"]
  • ["unique", "rand(1:10, 1000000)"]
  • ["unique", "rand(1:1000, 1000000)"]

Julia versioninfo

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      50303 s          0 s       2679 s      35156 s          0 s
       #2  2397 MHz      60772 s          0 s       2980 s      23899 s          0 s
       
  Memory: 6.764884948730469 GB (2103.734375 MB free)
  Uptime: 897.0 sec
  Load Avg:  1.21337890625  1.24462890625  0.8720703125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Baseline result

Benchmark Report for /home/runner/work/ThreadsX.jl/ThreadsX.jl

Job Properties

  • Time of benchmark: 28 Jun 2020 - 8:56
  • Package commit: 7362ea
  • Julia commit: 44fa15
  • Julia command flags: None
  • Environment variables: OMP_NUM_THREADS => 1 JULIA_NUM_THREADS => 2

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["findfirst", "0%", "base"] 3.700 ns (5%)
["findfirst", "0%", "tx"] 28.202 μs (5%) 11.95 KiB (1%) 218
["findfirst", "0%", "tx-noterm"] 24.002 μs (5%) 11.97 KiB (1%) 218
["findfirst", "0%", "tx-seq"] 260.818 ns (5%) 544 bytes (1%) 14
["findfirst", "10%", "base"] 78.206 μs (5%)
["findfirst", "10%", "tx"] 83.106 μs (5%) 14.36 KiB (1%) 266
["findfirst", "10%", "tx-noterm"] 217.016 μs (5%) 32.92 KiB (1%) 604
["findfirst", "10%", "tx-seq"] 78.506 μs (5%) 560 bytes (1%) 15
["findfirst", "20%", "base"] 155.912 μs (5%)
["findfirst", "20%", "tx"] 147.711 μs (5%) 21.33 KiB (1%) 393
["findfirst", "20%", "tx-noterm"] 212.815 μs (5%) 28.30 KiB (1%) 521
["findfirst", "20%", "tx-seq"] 156.311 μs (5%) 560 bytes (1%) 15
["findfirst", "30%", "base"] 233.820 μs (5%)
["findfirst", "30%", "tx"] 198.617 μs (5%) 28.27 KiB (1%) 520
["findfirst", "30%", "tx-noterm"] 221.718 μs (5%) 28.31 KiB (1%) 522
["findfirst", "30%", "tx-seq"] 234.120 μs (5%) 560 bytes (1%) 15
["findfirst", "40%", "base"] 311.325 μs (5%)
["findfirst", "40%", "tx"] 285.122 μs (5%) 35.31 KiB (1%) 651
["findfirst", "40%", "tx-noterm"] 274.621 μs (5%) 35.31 KiB (1%) 650
["findfirst", "40%", "tx-seq"] 311.824 μs (5%) 560 bytes (1%) 15
["findfirst", "50%", "base"] 389.132 μs (5%)
["findfirst", "50%", "tx"] 315.926 μs (5%) 37.70 KiB (1%) 698
["findfirst", "50%", "tx-noterm"] 343.528 μs (5%) 40.06 KiB (1%) 744
["findfirst", "50%", "tx-seq"] 389.532 μs (5%) 560 bytes (1%) 15
["foreach", "base", "A .= B .+ B'"] 440.789 ms (5%) 38.303 ms 305.18 MiB (1%) 16000002
["foreach", "base", "A .= B .+ C"] 257.689 ms (5%) 39.561 ms 305.18 MiB (1%) 16000001
["foreach", "broadcast", "A .= B .+ B'"] 18.617 ms (5%)
["foreach", "broadcast", "A .= B .+ C"] 7.956 ms (5%)
["foreach", "tx", "A .= B .+ B'"] 9.789 ms (5%) 25.94 KiB (1%) 360
["foreach", "tx", "A .= B .+ C"] 6.534 ms (5%) 12.75 KiB (1%) 124
["foreach_seq", "base", "Matrix"] 746.338 μs (5%)
["foreach_seq", "base", "Transpose"] 2.452 ms (5%)
["foreach_seq", "base", "Vector"] 746.937 μs (5%)
["foreach_seq", "tx", "Matrix"] 751.537 μs (5%)
["foreach_seq", "tx", "Transpose"] 1.092 ms (5%) 16 bytes (1%) 1
["foreach_seq", "tx", "Vector"] 747.036 μs (5%)
["foreach_seq_double", "cartesian", "man"] 26.601 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => :ivdep"] 26.401 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => false"] 26.302 μs (5%)
["foreach_seq_double", "cartesian", "tx", ":simd => true"] 26.601 μs (5%)
["foreach_seq_double", "linear", "man"] 108.292 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => :ivdep"] 100.000 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => false"] 100.000 ns (5%)
["foreach_seq_double", "linear", "tx", ":simd => true"] 100.000 ns (5%)
["foreach_seq_sum_many", ":nvecs => 8", "man"] 1.700 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => :ivdep"] 1.700 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => false"] 3.300 μs (5%)
["foreach_seq_sum_many", ":nvecs => 8", "tx", ":simd => true"] 3.700 μs (5%)
["sort", "F64 (narrow)", "Base"] 2.745 ms (5%)
["sort", "F64 (narrow)", "ThreadsX.MergeSort"] 2.967 ms (5%) 1.19 MiB (1%) 534
["sort", "F64 (narrow)", "ThreadsX.QuickSort"] 1.764 ms (5%) 965.11 KiB (1%) 1226
["sort", "F64 (narrow)", "ThreadsX.StableQuickSort"] 1.799 ms (5%) 1.02 MiB (1%) 1246
["sort", "F64 (wide)", "Base"] 6.523 ms (5%)
["sort", "F64 (wide)", "ThreadsX.MergeSort"] 5.489 ms (5%) 1.19 MiB (1%) 560
["sort", "F64 (wide)", "ThreadsX.QuickSort"] 5.343 ms (5%) 1.01 MiB (1%) 2143
["sort", "F64 (wide)", "ThreadsX.StableQuickSort"] 6.220 ms (5%) 1.39 MiB (1%) 2193
["sort", "I64 (narrow)", "Base"] 162.110 μs (5%) 160 bytes (1%) 1
["sort", "I64 (narrow)", "ThreadsX.MergeSort"] 165.310 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.QuickSort"] 163.410 μs (5%) 864 bytes (1%) 17
["sort", "I64 (narrow)", "ThreadsX.StableQuickSort"] 164.010 μs (5%) 864 bytes (1%) 17
["sort", "I64 (wide)", "Base"] 6.546 ms (5%)
["sort", "I64 (wide)", "ThreadsX.MergeSort"] 4.685 ms (5%) 1.19 MiB (1%) 555
["sort", "I64 (wide)", "ThreadsX.QuickSort"] 4.524 ms (5%) 1.01 MiB (1%) 2235
["sort", "I64 (wide)", "ThreadsX.StableQuickSort"] 5.343 ms (5%) 1.40 MiB (1%) 2270
["sort", "reversed", "Base"] 869.249 μs (5%)
["sort", "reversed", "ThreadsX.MergeSort"] 1.338 ms (5%) 1.18 MiB (1%) 435
["sort", "reversed", "ThreadsX.QuickSort"] 1.288 ms (5%) 998.75 KiB (1%) 1871
["sort", "reversed", "ThreadsX.StableQuickSort"] 1.745 ms (5%) 1.36 MiB (1%) 1905
["sort", "sorted", "Base"] 817.245 μs (5%)
["sort", "sorted", "ThreadsX.MergeSort"] 988.152 μs (5%) 1.18 MiB (1%) 430
["sort", "sorted", "ThreadsX.QuickSort"] 1.289 ms (5%) 998.75 KiB (1%) 1871
["sort", "sorted", "ThreadsX.StableQuickSort"] 1.450 ms (5%) 1.36 MiB (1%) 1904
["unique", "rand(1:10, 1000000)", "base"] 10.600 ms (5%) 832 bytes (1%) 8
["unique", "rand(1:10, 1000000)", "tx"] 5.523 ms (5%) 50.98 KiB (1%) 882
["unique", "rand(1:1000, 1000000)", "base"] 9.835 ms (5%) 65.95 KiB (1%) 27
["unique", "rand(1:1000, 1000000)", "tx"] 5.892 ms (5%) 1.07 MiB (1%) 1186

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["findfirst", "0%"]
  • ["findfirst", "10%"]
  • ["findfirst", "20%"]
  • ["findfirst", "30%"]
  • ["findfirst", "40%"]
  • ["findfirst", "50%"]
  • ["foreach", "base"]
  • ["foreach", "broadcast"]
  • ["foreach", "tx"]
  • ["foreach_seq", "base"]
  • ["foreach_seq", "tx"]
  • ["foreach_seq_double", "cartesian"]
  • ["foreach_seq_double", "cartesian", "tx"]
  • ["foreach_seq_double", "linear"]
  • ["foreach_seq_double", "linear", "tx"]
  • ["foreach_seq_sum_many", ":nvecs => 8"]
  • ["foreach_seq_sum_many", ":nvecs => 8", "tx"]
  • ["sort", "F64 (narrow)"]
  • ["sort", "F64 (wide)"]
  • ["sort", "I64 (narrow)"]
  • ["sort", "I64 (wide)"]
  • ["sort", "reversed"]
  • ["sort", "sorted"]
  • ["unique", "rand(1:10, 1000000)"]
  • ["unique", "rand(1:1000, 1000000)"]

Julia versioninfo

Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.3.0-1028-azure #29~18.04.1-Ubuntu SMP Fri Jun 5 14:32:34 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz: 
              speed         user         nice          sys         idle          irq
       #1  2397 MHz      69254 s          0 s       3260 s      52741 s          0 s
       #2  2397 MHz      90252 s          0 s       3936 s      30531 s          0 s
       
  Memory: 6.764884948730469 GB (2363.20703125 MB free)
  Uptime: 1271.0 sec
  Load Avg:  1.255859375  1.29833984375  1.0263671875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Stepping:            2
CPU MHz:             2397.223
BogoMIPS:            4794.44
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt md_clear
| Cpu Property       | Value                                                   |
|:------------------ |:------------------------------------------------------- |
| Brand              | Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz               |
| Vendor             | :Intel                                                  |
| Architecture       | :Haswell                                                |
| Model              | Family: 0x06, Model: 0x3f, Stepping: 0x02, Type: 0x00   |
| Cores              | 2 physical cores, 2 logical cores (on executing CPU)    |
|                    | No Hyperthreading detected                              |
| Clock Frequencies  | Not supported by CPU                                    |
| Data Cache         | Level 1:3 : (32, 256, 30720) kbytes                     |
|                    | 64 byte cache line size                                 |
| Address Size       | 48 bits virtual, 44 bits physical                       |
| SIMD               | 256 bit = 32 byte max. SIMD vector size                 |
| Time Stamp Counter | TSC is accessible via rdtsc                             |
|                    | TSC increased at every clock cycle (non-invariant TSC)  |
| Perf. Monitoring   | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor         | Yes, Microsoft                                          |

@mergify mergify bot merged commit 6727601 into master Jun 28, 2020
@mergify mergify bot deleted the fix branch June 28, 2020 08:57
tkf added a commit that referenced this pull request Jun 28, 2020
To really fix `foreach(f, other arguments..., product(A, B))` (#113).
tkf added a commit that referenced this pull request Jun 28, 2020
To really fix `foreach(f, other arguments..., product(A, B))` (#113).