Roll had stopped respecting fractions in double, closes #1904. Also c…

…loses #1867, ASAN error in bmerge.c
Rdatatable · Nov 23, 2016 · fe2d0c8 · fe2d0c8
1 parent 5de3372
commit fe2d0c8
Show file tree

Hide file tree

Showing 4 changed files with 49 additions and 35 deletions.
diff --git a/CRAN_Release.cmd b/CRAN_Release.cmd
@@ -117,7 +117,7 @@ Rdevel CMD INSTALL ~/data.table_1.9.7.tar.gz
 Rdevel --vanilla
 require(data.table)
 require(bit64)
-test.data.table()           # slower than usual of course due to UBSAN and ASAN
+test.data.table()     # slower than usual of course due to UBSAN and ASAN. Too slow to run R CMD check.
 # Throws /0 errors on R's summary.c (tests 648 and 1185.2) but ignore those: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16000
 test.data.table(verbose=TRUE)
 q("no")

diff --git a/NEWS.md b/NEWS.md
@@ -343,6 +343,8 @@
   40. Following latest recommended testthat practices and to avoid a warning that it now issues, `inst/tests/testthat` has been moved to `/tests/testthat`. This means that testthat tests won't be installed for use by users by default and that `test_package("data.table")` will now fail with error `No matching test file in dir` and also a warning `Placing tests in inst/tests/ is deprecated. Please use tests/testthat/ instead`. (That warning seems to be misleading since we already have made that move.) To install testthat tests (and this applies to all packages using testthat not just data.table) you need to follow the [deleted instructions](https://github.com/hadley/testthat/commit/0a7d27bb9ea545be7da1a10e511962928d888302) in testthat's README; i.e., reinstall data.table either with `--install-tests` passed to `R CMD INSTALL` or `INSTALL_opts = "--install-tests"` passed to `install.packages()`. After that, `test_package("data.table")` will work. However, the main test suite of data.table (5,000+ tests) doesn't use testthat at all. Those tests are always installed so that `test.data.table()` can always be run by users at any time to confirm your installation on your platform is working correctly. Sometimes when supporting you, you may be asked to run `test.data.table()` and provide the output. Particularly now that data.table uses OpenMP. The file `/tests/tests.R` (which just calls `test.data.table()`) has been renamed to `/tests/main.R` to make this clearer to those looking at the GitHub repository and a comment has been added to `/tests/main.R` pointing to `/inst/tests/tests.Rraw` where those tests live. Some of these tests test data.table's compability with other packages and that is the reason those packages are listed in `DESCRIPTION:Suggests`. If you don't have some of those packages installed, `test.data.table()` will print output that it has skipped tests of compatibility with those packages. On CRAN all Suggests packages are available and data.table's tests of compatibility with them are tested by CRAN every day.
 
   41. The license field is changed from "GPL (>= 2)" to "GPL-3 | file LICENSE" due to independent communication from two users of data.table at Google. The lack of an explicit license file was preventing them from contributing patches to data.table. Further, Google lawyers require the full text of the license and not a URL to the license. Since this requirement appears to require the choice of one license, we opted for GPL-3 and we checked the GPL-3 is fine by Google for them to use and contribute to. Accordingly, data.table's LICENSE file is an exact duplicate copy of the canonical GPL-3.
+
+  42. Thanks to @rrichmond for finding and reporting a regression in dev before release with `roll` not respecting fractions in type double, [#1904](https://github.com/Rdatatable/data.table/issues/1904). For example dates like `zoo::as.yearmon("2016-11")` which is stored as `double` value 2016.833. Fixed and test added.
 
 
 ### Changes in v1.9.6  (on CRAN 19 Sep 2015)

diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
@@ -9721,6 +9721,15 @@ test(1743.3, fread("a,b\n1,a", colClasses=c(NA, TRUE)), error="when colClasses i
 # also unknown issue in mixed character/factor output and colClasses vector
 test(1743.4, sapply(fread("a,b\n1,a", colClasses=c("character", "factor")), class), c(a="character", b="factor"))
 
+# rolling join stopped working for double with fractions, #1904
+DT = data.table(A=c(1999.917,2000.417,2000.917,2001.417,2001.917))
+setkey(DT,A)
+x = c(2000.167,2000.417,2000.667,2000.917,2001.167)
+test(1744.1, DT[.(x),roll=FALSE,which=TRUE], INT(NA,2,NA,3,NA))
+test(1744.2, DT[.(x),roll=TRUE, which=TRUE], INT(1,2,2,3,3))
+test(1744.3, DT[.(x),roll=1/12, which=TRUE], INT(NA,2,NA,3,NA))
+
+
 ##########################
 
 # TODO: Tests involving GForce functions needs to be run with optimisation level 1 and 2, so that both functions are tested all the time.

diff --git a/src/bmerge.c b/src/bmerge.c
@@ -26,8 +26,6 @@ Differences over standard binary search (e.g. bsearch in stdlib.h) :
 #define GE 4
 #define GT 5
 
-typedef unsigned long long ull;
-
 static SEXP i, x, nqgrp;
 static int ncol, *icols, *xcols, *o, *xo, *retFirst, *retLength, *retIndex, *allLen1, *allGrp1, *rollends, ilen, anslen;
 static int *op, nqmaxgrp, *tmpptr, scols;
@@ -200,9 +198,10 @@ SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SE
 static union {
   int i;
   double d;
-  unsigned long long ll;
+  unsigned long long ull;
+  long long ll;
   SEXP s;
-} ival, xval; // last two are for #1405 for use in roll condition check
+} ival, xval;
 
 static int mid, tmplow, tmpupp;  // global to save them being added to recursive stack. Maybe optimizer would do this anyway.
 static SEXP ic, xc;
@@ -224,8 +223,7 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
 // new: col starts with -1 for non-equi joins, which gathers rows from nested id group counter 'thisgrp'
 {
     int xlow=xlowIn, xupp=xuppIn, ilow=ilowIn, iupp=iuppIn, j, k, ir, lir, tmp;
-    SEXP class;
-    Rboolean isInt64;
+    Rboolean isInt64=FALSE;
     ir = lir = ilow + (iupp-ilow)/2;           // lir = logical i row.
     if (o) ir = o[lir]-1;                      // ir = the actual i row if i were ordered
     if (col>-1) {
@@ -339,29 +337,28 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
         }
         break;
     case REALSXP :
-        class = getAttrib(xc, R_ClassSymbol);
-        isInt64 = isString(class) && STRING_ELT(class, 0)==char_integer64;
+        isInt64 = INHERITS(xc, char_integer64);
         twiddle = isInt64 ? &i64twiddle : &dtwiddle;
-        ival.ll = twiddle(DATAPTR(ic), ir, 1);
+        ival.ull = twiddle(DATAPTR(ic), ir, 1);
         while(xlow < xupp-1) {
             mid = xlow + (xupp-xlow)/2;
-            xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
-            if (xval.ll<ival.ll) {
+            xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
+            if (xval.ull<ival.ull) {
                 xlow=mid;
-            } else if (xval.ll>ival.ll) {
+            } else if (xval.ull>ival.ull) {
                 xupp=mid;
-            } else { // xval.ll == ival.ll) 
+            } else { // xval.ull == ival.ull) 
                 tmplow = mid;
                 tmpupp = mid;
                 while(tmplow<xupp-1) {
                     mid = tmplow + (xupp-tmplow)/2;
-                    xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
-                    if (xval.ll == ival.ll) tmplow=mid; else xupp=mid;
+                    xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
+                    if (xval.ull == ival.ull) tmplow=mid; else xupp=mid;
                 }
                 while(xlow<tmpupp-1) {
                     mid = xlow + (tmpupp-xlow)/2;
-                    xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
-                    if (xval.ll == ival.ll) tmpupp=mid; else xlow=mid;
+                    xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
+                    if (xval.ull == ival.ull) tmpupp=mid; else xlow=mid;
                 }
                 break;
             }
@@ -390,13 +387,13 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
         if (col>-1) {
             while(tmplow<iupp-1) {
                 mid = tmplow + (iupp-tmplow)/2;
-                xval.ll = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
-                if (xval.ll == ival.ll) tmplow=mid; else iupp=mid;
+                xval.ull = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
+                if (xval.ull == ival.ull) tmplow=mid; else iupp=mid;
             }
             while(ilow<tmpupp-1) {
                 mid = ilow + (tmpupp-ilow)/2;
-                xval.ll = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
-                if (xval.ll == ival.ll) tmpupp=mid; else ilow=mid;
+                xval.ull = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
+                if (xval.ull == ival.ull) tmpupp=mid; else ilow=mid;
             }
         }
         // ilow and iupp now surround the group in ic, too
@@ -485,27 +482,33 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
                 retLength[ir] = 1;
             }
         } else {
-            // To fix #1405 we've to compute the difference in unsigned long long and typecase back to 
-            // double; to handle integer64 cases as well. Previous implementation segfaulted since it 
-            // accessed values beyond xupp/xlow, #1650. Fix for that involves integrating the logic 
-            // directly inside. "ull" is typedef for unsigned long long
-            // TODO: incorporate the twiddle logic for roll as well instead of tolerance?
+            // Regular roll=TRUE|+ve|-ve
+            // Fixed issues: #1405, #1650, #1007
+            // TODO: incorporate the twiddle logic for roll as well instead of tolerance?  
             if ( (   (roll>0.0 && (!lowmax || xlow>xlowIn) && (xupp<xuppIn || !uppmax || rollends[1]))
                   || (roll<0.0 && xupp==xuppIn && uppmax && rollends[1]) )
-              && (   (TYPEOF(ic)==REALSXP && 
-                   ((double)((ull)REAL(ic)[ir]-(ull)REAL(xc)[XIND(xlow)])-rollabs<1e-6 || 
-                    (double)((ull)REAL(ic)[ir]-(ull)REAL(xc)[XIND(xlow)]) == rollabs)) // #1007 fix
-                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(ic)[ir]-INTEGER(xc)[XIND(xlow)])-rollabs<1e-6 ) 
+              && (   (TYPEOF(ic)==REALSXP &&
+                      (ival.d = REAL(ic)[ir], xval.d = REAL(xc)[XIND(xlow)], 1) &&
+                     (( !isInt64 &&
+                        (ival.d-xval.d-rollabs < 1e-6 || 
+                         ival.d-xval.d == rollabs /*#1007*/))
+                   || ( isInt64 &&
+                        (double)(ival.ll-xval.ll)-rollabs < 1e-6 ) ))  // cast to double for when rollabs==Inf
+                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(ic)[ir]-INTEGER(xc)[XIND(xlow)])-rollabs < 1e-6 )
                   || (TYPEOF(ic)==STRSXP)   )) {
                 retFirst[ir] = xlow+1;
                 retLength[ir] = 1;
             } else if
                (  (  (roll<0.0 && (!uppmax || xupp<xuppIn) && (xlow>xlowIn || !lowmax || rollends[0]))
                   || (roll>0.0 && xlow==xlowIn && lowmax && rollends[0]) )
-              && (   (TYPEOF(ic)==REALSXP && 
-                    ((double)((ull)REAL(xc)[XIND(xupp)]-(ull)REAL(ic)[ir])-rollabs<1e-6 || 
-                     (double)((ull)REAL(xc)[XIND(xupp)]-(ull)REAL(ic)[ir]) == rollabs)) // 1007 fix
-                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir])-rollabs<1e-6 )
+              && (   (TYPEOF(ic)==REALSXP &&
+                      (ival.d = REAL(ic)[ir], xval.d = REAL(xc)[XIND(xupp)], 1) &&
+                     (( !isInt64 &&
+                        (xval.d-ival.d-rollabs < 1e-6 || 
+                         xval.d-ival.d == rollabs /*#1007*/))
+                   || ( isInt64 &&
+                        (double)(xval.ll-ival.ll)-rollabs < 1e-6 ) ))
+                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir])-rollabs < 1e-6 )
                   || (TYPEOF(ic)==STRSXP)   )) {
                 retFirst[ir] = xupp+1;   // == xlow+2
                 retLength[ir] = 1;