The timings below may look as if they were obtained in a single session, but each one was actually run in a fresh session, with sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches' executed in between.
library(data.table) ## 1.13.5
setDTthreads(0L) ## 40
set.seed(108)
N = 1e9L
K = 1e2L
DT = list()
DT[["id3"]] = factor(sample(sprintf("id%010d",1:(N/K)), N, TRUE))
DT[["v3"]] = round(runif(N,max=100),6)
setDT(DT)
system.time(naf<-DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.615s elapsed (00:01:39 cpu)
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.074s cpu)
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#1.037s elapsed (2.888s cpu)
#lapply optimization is on, j unchanged as 'list(mean(v3))'
#GForce optimized j to 'list(gmean(v3))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.319
#gforce assign high and low took 4.399
#This gsum took (narm=FALSE) ... gather took ... 2.107s
#2.322s
#gforce eval took 2.339
#8.738s elapsed (00:02:39 cpu)
#   user  system elapsed
#261.852  67.723  15.498
system.time(nat<-DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.799s elapsed (00:01:42 cpu)
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.090s elapsed (0.074s cpu)
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#2.608s elapsed (3.275s cpu)
#lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
#GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.346
#gforce assign high and low took 4.978
#gforce eval took 33.515
#40.2s elapsed (00:02:24 cpu)
#   user  system elapsed
#250.858  68.804  48.679
Timings on #4851 (elapsed):
na.rm=TRUE: 48.6s down to 14.3s
na.rm=FALSE: 15.5s down to 14.7s
> system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
5.198s elapsed (00:01:35 cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.075s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.479s elapsed (2.959s cpu)
lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.321
gforce assign high and low took 4.868
This gmean took (narm=TRUE) ... gather took ... 2.068s
2.298s
gforce eval took 2.300
8.537s elapsed (00:02:43 cpu)
user system elapsed
262.668 63.634 14.322
## drop caches in another session
> system.time(naf <- DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
6.565s elapsed (00:01:35 cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.093s elapsed (0.085s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.601s elapsed (5.789s cpu)
lapply optimization is on, j unchanged as 'list(mean(v3))'
GForce optimized j to 'list(gmean(v3))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.314
gforce assign high and low took 4.880
This gmean took (narm=FALSE) ... gather took ... 1.717s
1.931s
gforce eval took 1.931
7.467s elapsed (00:02:39 cpu)
user system elapsed
261.039 61.257 14.738
This is actually mentioned in #3202.
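For anyone who wants to try the comparison without a machine that can hold 1e9 rows, here is a scaled-down sketch of the same benchmark. The smaller N is my assumption, not part of the original run, and absolute timings at this size will not be comparable to the numbers above; it only shows the two calls side by side and checks that they agree when v3 has no NAs.

```r
library(data.table)
set.seed(108)
N = 1e6L  # assumption: scaled down from 1e9L so it fits in ordinary RAM
K = 1e2L
DT = data.table(
  id3 = factor(sample(sprintf("id%010d", 1:(N/K)), N, TRUE)),
  v3  = round(runif(N, max=100), 6)
)

# same two queries as in the report, without verbose output
t_naf = system.time(naf <- DT[, .(v3=mean(v3)), by=id3])
t_nat = system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3])

# v3 contains no NAs, so both results should match up to floating point tolerance
stopifnot(isTRUE(all.equal(naf$v3, nat$v3)))

# one row per call: user, system and elapsed times
print(rbind(na.rm.FALSE = t_naf, na.rm.TRUE = t_nat)[, 1:3])
```

To see the per-stage breakdown as in the logs above, add verbose=TRUE to either query.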