Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make unzip faster: seq[i]=val can be 7X faster than seq.add(val) #13448

Merged
merged 2 commits into from
Feb 21, 2020

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Feb 20, 2020

this PR makes unzip (#13429) up to 7X faster by simply using []= instead of add

general performance note: avoid newSeqOfCap anti-pattern

when the final size is known in advance, newSeqOfCap(n) + seq.add(elem) should not be used; it can be 7X slower than this pattern:
newSeq(n) + seq[i]=elem.

and it makes sense, because add has to do more work (both with and without -d:danger): to increment length, and also to compare new length against capacity. So while reallocation is avoided in both newSeq(n) and newSeqOfCap(n) cases thanks to preallocation, the 1st pattern is a lot slower because of the extra work.

benchmark code

import times

proc unzip1[S, T](s: openArray[(S, T)]): (seq[S], seq[T]) =
  result[0] = newSeqOfCap[S](s.len)
  result[1] = newSeqOfCap[T](s.len)
  for elem in s:
    result[0].add(elem[0])
    result[1].add(elem[1])

proc unzip2[S, T](s: openArray[(S, T)]): (seq[S], seq[T]) =
  result[0] = newSeq[S](s.len)
  result[1] = newSeq[T](s.len)
  for i in 0..<s.len:
    result[0][i]=s[i][0]
    result[1][i]=s[i][1]

var ret = 0
proc main()=
  let n = 100_000
  let numIter = 1000
  var s=newSeq[(int,int)](n)
  for i in 0..<n: s[i] = (i,i)
  var t1, t2: float
  template perf(fun): float =
    let t = epochTime()
    for j in 0..<numIter:
      let s2 = fun(s)
      ret += s2[1][^1]
      ret += s2[0][5]
    var dt = epochTime() - t
    echo (astToStr(fun), dt, ret)
    dt

  for i in 0..<5:
    t1 += perf(unzip1)
    t2 += perf(unzip2)
  echo (t1, t2)
  doAssert unzip1(s) == unzip2(s)
main()

benchmark times

nim c -r main.nim

("unzip1", 6.597026109695435, 100004000)
("unzip2", 1.029499053955078, 200008000)
("unzip1", 6.262022018432617, 300012000)
("unzip2", 0.9490630626678467, 400016000)
("unzip1", 6.228638172149658, 500020000)
("unzip2", 0.9355859756469727, 600024000)
("unzip1", 6.221283912658691, 700028000)
("unzip2", 0.9702279567718506, 800032000)
("unzip1", 6.275908946990967, 900036000)
("unzip2", 0.9814701080322266, 1000040000)
(31.58487915992737, 4.865846157073975)

nim c -r -d:danger main.nim

("unzip1", 1.104756832122803, 100004000)
("unzip2", 0.1451900005340576, 200008000)
("unzip1", 0.9864020347595215, 300012000)
("unzip2", 0.1396539211273193, 400016000)
("unzip1", 0.9924249649047852, 500020000)
("unzip2", 0.1378440856933594, 600024000)
("unzip1", 1.028185844421387, 700028000)
("unzip2", 0.1567389965057373, 800032000)
("unzip1", 1.103832960128784, 900036000)
("unzip2", 0.1695449352264404, 1000040000)
(5.21560263633728, 0.7489719390869141)

note

i haven't tried with {.noinit.} ; this could have potential concerns depending on the type (eg if it's a reference type and something throws in the middle); this is very related to this: dlang/phobos#6178 (comment) from an old D PR:

This function has a pernicious bug: if construction throws an exception, ret will be destroyed and destructors will be called for uninitialized objects. Recall that statically-sized arrays on the stack insert one destructor call for each member.

@Vindaar
Copy link
Contributor

Vindaar commented Feb 20, 2020

Wow, TIL.

edit:

EDIT: I had an embarassing bug in prior implementation, the speedup is lower than I previously reported)

phew, ok... Now I'm a little bit relieved. :P

@timotheecour timotheecour force-pushed the pr_fix_unzip_performance branch from e9f426b to 9864156 Compare February 20, 2020 20:46
@timotheecour timotheecour changed the title make unzip faster: newSeq(n)/seq[i]=elem is 5-15X faster than newSeqOfCap(n)/seq.add(elem) in general make unzip a bit faster: make unzip a bit faster: seq[i] a bit faster than seq.add Feb 20, 2020
@timotheecour timotheecour changed the title make unzip a bit faster: make unzip a bit faster: seq[i] a bit faster than seq.add make unzip 1.4X faster: seq[i]=val is faster than seq.add(elem) Feb 20, 2020
@timotheecour timotheecour changed the title make unzip 1.4X faster: seq[i]=val is faster than seq.add(elem) make unzip faster: seq[i]=val can be 7X faster than seq.add(elem) Feb 20, 2020
@timotheecour timotheecour force-pushed the pr_fix_unzip_performance branch from 9864156 to ef460da Compare February 20, 2020 21:21
@timotheecour
Copy link
Member Author

phew, ok... Now I'm a little bit relieved. :P

not so quick: see benchmark above, showing 7X speedup

@kaushalmodi
Copy link
Contributor

TIL.. may the fastest implementation win!

@timotheecour
Copy link
Member Author

failure unrelated (see #13449) but will re-run CI

@Vindaar
Copy link
Contributor

Vindaar commented Feb 20, 2020

TIL.. may the fastest implementation win!

I mean yeah, definitely. I'm just a little shocked, that's all.

@timotheecour
Copy link
Member Author

same in C++ std::vector although less drastic (but it's not apples to apples comparison, the tests are a bit different):

clang++ -O3 -o /tmp/z01 $timn_D/tests/nim/all/t10234.cpp
time /tmp/z01

/tmp/z01  8.39s user 1.21s system 98% cpu 9.732 total
/tmp/z01  3.40s user 1.21s system 96% cpu 4.794 total
#include <iostream>
#include <stdio.h>
#include <vector>

static const int n = 1000000;
int total = 0;
void fun1(){
  std::vector<int> x;
  x.reserve(n);
  for(int i=0;i<n;i++){
    x.push_back(i);
  }
  total+=x[n/2];
}

void fun2(){
  std::vector<int> x(n);
  for(int i=0;i<n;i++){
    x[i]=i;
  }
  total+=x[n/2];
}

int main (int argc, char *argv[]) {
  int numIter = 10000;
  for(int i=0;i<numIter;i++){
    fun1();
    // fun2();
  }
  std::cout << total << std::endl;
  return 0;
}

@Araq Araq merged commit e05aca8 into nim-lang:devel Feb 21, 2020
@timotheecour timotheecour changed the title make unzip faster: seq[i]=val can be 7X faster than seq.add(elem) make unzip faster: seq[i]=val can be 7X faster than seq.add(val) Feb 21, 2020
@timotheecour timotheecour deleted the pr_fix_unzip_performance branch February 21, 2020 19:42
This was referenced Nov 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants