Skip to content

Commit

Permalink
Roll had stopped respecting fractions in double, closes #1904. Also c…
Browse files Browse the repository at this point in the history
…loses #1867, ASAN error in bmerge.c
  • Loading branch information
mattdowle committed Nov 23, 2016
1 parent 5de3372 commit fe2d0c8
Show file tree
Hide file tree
Showing 4 changed files with 49 additions and 35 deletions.
2 changes: 1 addition & 1 deletion CRAN_Release.cmd
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ Rdevel CMD INSTALL ~/data.table_1.9.7.tar.gz
Rdevel --vanilla
require(data.table)
require(bit64)
test.data.table() # slower than usual of course due to UBSAN and ASAN
test.data.table() # slower than usual of course due to UBSAN and ASAN. Too slow to run R CMD check.
# Throws /0 errors on R's summary.c (tests 648 and 1185.2) but ignore those: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16000
test.data.table(verbose=TRUE)
q("no")
Expand Down
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -343,6 +343,8 @@
40. Following latest recommended testthat practices and to avoid a warning that it now issues, `inst/tests/testthat` has been moved to `/tests/testthat`. This means that testthat tests won't be installed for use by users by default and that `test_package("data.table")` will now fail with error `No matching test file in dir` and also a warning `Placing tests in inst/tests/ is deprecated. Please use tests/testthat/ instead`. (That warning seems to be misleading since we already have made that move.) To install testthat tests (and this applies to all packages using testthat not just data.table) you need to follow the [deleted instructions](https://github.com/hadley/testthat/commit/0a7d27bb9ea545be7da1a10e511962928d888302) in testthat's README; i.e., reinstall data.table either with `--install-tests` passed to `R CMD INSTALL` or `INSTALL_opts = "--install-tests"` passed to `install.packages()`. After that, `test_package("data.table")` will work. However, the main test suite of data.table (5,000+ tests) doesn't use testthat at all. Those tests are always installed so that `test.data.table()` can always be run by users at any time to confirm your installation on your platform is working correctly. Sometimes when supporting you, you may be asked to run `test.data.table()` and provide the output. Particularly now that data.table uses OpenMP. The file `/tests/tests.R` (which just calls `test.data.table()`) has been renamed to `/tests/main.R` to make this clearer to those looking at the GitHub repository and a comment has been added to `/tests/main.R` pointing to `/inst/tests/tests.Rraw` where those tests live. Some of these tests test data.table's compability with other packages and that is the reason those packages are listed in `DESCRIPTION:Suggests`. If you don't have some of those packages installed, `test.data.table()` will print output that it has skipped tests of compatibility with those packages. On CRAN all Suggests packages are available and data.table's tests of compatibility with them are tested by CRAN every day.

41. The license field is changed from "GPL (>= 2)" to "GPL-3 | file LICENSE" due to independent communication from two users of data.table at Google. The lack of an explicit license file was preventing them from contributing patches to data.table. Further, Google lawyers require the full text of the license and not a URL to the license. Since this requirement appears to require the choice of one license, we opted for GPL-3 and we checked the GPL-3 is fine by Google for them to use and contribute to. Accordingly, data.table's LICENSE file is an exact duplicate copy of the canonical GPL-3.

42. Thanks to @rrichmond for finding and reporting a regression in dev before release with `roll` not respecting fractions in type double, [#1904](https://github.com/Rdatatable/data.table/issues/1904). For example dates like `zoo::as.yearmon("2016-11")` which is stored as `double` value 2016.833. Fixed and test added.


### Changes in v1.9.6 (on CRAN 19 Sep 2015)
Expand Down
9 changes: 9 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -9721,6 +9721,15 @@ test(1743.3, fread("a,b\n1,a", colClasses=c(NA, TRUE)), error="when colClasses i
# also unknown issue in mixed character/factor output and colClasses vector
test(1743.4, sapply(fread("a,b\n1,a", colClasses=c("character", "factor")), class), c(a="character", b="factor"))

# rolling join stopped working for double with fractions, #1904
DT = data.table(A=c(1999.917,2000.417,2000.917,2001.417,2001.917))
setkey(DT,A)
x = c(2000.167,2000.417,2000.667,2000.917,2001.167)
test(1744.1, DT[.(x),roll=FALSE,which=TRUE], INT(NA,2,NA,3,NA))
test(1744.2, DT[.(x),roll=TRUE, which=TRUE], INT(1,2,2,3,3))
test(1744.3, DT[.(x),roll=1/12, which=TRUE], INT(NA,2,NA,3,NA))


##########################

# TODO: Tests involving GForce functions needs to be run with optimisation level 1 and 2, so that both functions are tested all the time.
Expand Down
71 changes: 37 additions & 34 deletions src/bmerge.c
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,6 @@ Differences over standard binary search (e.g. bsearch in stdlib.h) :
#define GE 4
#define GT 5

typedef unsigned long long ull;

static SEXP i, x, nqgrp;
static int ncol, *icols, *xcols, *o, *xo, *retFirst, *retLength, *retIndex, *allLen1, *allGrp1, *rollends, ilen, anslen;
static int *op, nqmaxgrp, *tmpptr, scols;
Expand Down Expand Up @@ -200,9 +198,10 @@ SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SE
static union {
int i;
double d;
unsigned long long ll;
unsigned long long ull;
long long ll;
SEXP s;
} ival, xval; // last two are for #1405 for use in roll condition check
} ival, xval;

static int mid, tmplow, tmpupp; // global to save them being added to recursive stack. Maybe optimizer would do this anyway.
static SEXP ic, xc;
Expand All @@ -224,8 +223,7 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
// new: col starts with -1 for non-equi joins, which gathers rows from nested id group counter 'thisgrp'
{
int xlow=xlowIn, xupp=xuppIn, ilow=ilowIn, iupp=iuppIn, j, k, ir, lir, tmp;
SEXP class;
Rboolean isInt64;
Rboolean isInt64=FALSE;
ir = lir = ilow + (iupp-ilow)/2; // lir = logical i row.
if (o) ir = o[lir]-1; // ir = the actual i row if i were ordered
if (col>-1) {
Expand Down Expand Up @@ -339,29 +337,28 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
}
break;
case REALSXP :
class = getAttrib(xc, R_ClassSymbol);
isInt64 = isString(class) && STRING_ELT(class, 0)==char_integer64;
isInt64 = INHERITS(xc, char_integer64);
twiddle = isInt64 ? &i64twiddle : &dtwiddle;
ival.ll = twiddle(DATAPTR(ic), ir, 1);
ival.ull = twiddle(DATAPTR(ic), ir, 1);
while(xlow < xupp-1) {
mid = xlow + (xupp-xlow)/2;
xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
if (xval.ll<ival.ll) {
xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
if (xval.ull<ival.ull) {
xlow=mid;
} else if (xval.ll>ival.ll) {
} else if (xval.ull>ival.ull) {
xupp=mid;
} else { // xval.ll == ival.ll)
} else { // xval.ull == ival.ull)
tmplow = mid;
tmpupp = mid;
while(tmplow<xupp-1) {
mid = tmplow + (xupp-tmplow)/2;
xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
if (xval.ll == ival.ll) tmplow=mid; else xupp=mid;
xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
if (xval.ull == ival.ull) tmplow=mid; else xupp=mid;
}
while(xlow<tmpupp-1) {
mid = xlow + (tmpupp-xlow)/2;
xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
if (xval.ll == ival.ll) tmpupp=mid; else xlow=mid;
xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
if (xval.ull == ival.ull) tmpupp=mid; else xlow=mid;
}
break;
}
Expand Down Expand Up @@ -390,13 +387,13 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
if (col>-1) {
while(tmplow<iupp-1) {
mid = tmplow + (iupp-tmplow)/2;
xval.ll = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
if (xval.ll == ival.ll) tmplow=mid; else iupp=mid;
xval.ull = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
if (xval.ull == ival.ull) tmplow=mid; else iupp=mid;
}
while(ilow<tmpupp-1) {
mid = ilow + (tmpupp-ilow)/2;
xval.ll = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
if (xval.ll == ival.ll) tmpupp=mid; else ilow=mid;
xval.ull = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
if (xval.ull == ival.ull) tmpupp=mid; else ilow=mid;
}
}
// ilow and iupp now surround the group in ic, too
Expand Down Expand Up @@ -485,27 +482,33 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisg
retLength[ir] = 1;
}
} else {
// To fix #1405 we've to compute the difference in unsigned long long and typecase back to
// double; to handle integer64 cases as well. Previous implementation segfaulted since it
// accessed values beyond xupp/xlow, #1650. Fix for that involves integrating the logic
// directly inside. "ull" is typedef for unsigned long long
// TODO: incorporate the twiddle logic for roll as well instead of tolerance?
// Regular roll=TRUE|+ve|-ve
// Fixed issues: #1405, #1650, #1007
// TODO: incorporate the twiddle logic for roll as well instead of tolerance?
if ( ( (roll>0.0 && (!lowmax || xlow>xlowIn) && (xupp<xuppIn || !uppmax || rollends[1]))
|| (roll<0.0 && xupp==xuppIn && uppmax && rollends[1]) )
&& ( (TYPEOF(ic)==REALSXP &&
((double)((ull)REAL(ic)[ir]-(ull)REAL(xc)[XIND(xlow)])-rollabs<1e-6 ||
(double)((ull)REAL(ic)[ir]-(ull)REAL(xc)[XIND(xlow)]) == rollabs)) // #1007 fix
|| (TYPEOF(ic)<=INTSXP && (double)(INTEGER(ic)[ir]-INTEGER(xc)[XIND(xlow)])-rollabs<1e-6 )
&& ( (TYPEOF(ic)==REALSXP &&
(ival.d = REAL(ic)[ir], xval.d = REAL(xc)[XIND(xlow)], 1) &&
(( !isInt64 &&
(ival.d-xval.d-rollabs < 1e-6 ||
ival.d-xval.d == rollabs /*#1007*/))
|| ( isInt64 &&
(double)(ival.ll-xval.ll)-rollabs < 1e-6 ) )) // cast to double for when rollabs==Inf
|| (TYPEOF(ic)<=INTSXP && (double)(INTEGER(ic)[ir]-INTEGER(xc)[XIND(xlow)])-rollabs < 1e-6 )
|| (TYPEOF(ic)==STRSXP) )) {
retFirst[ir] = xlow+1;
retLength[ir] = 1;
} else if
( ( (roll<0.0 && (!uppmax || xupp<xuppIn) && (xlow>xlowIn || !lowmax || rollends[0]))
|| (roll>0.0 && xlow==xlowIn && lowmax && rollends[0]) )
&& ( (TYPEOF(ic)==REALSXP &&
((double)((ull)REAL(xc)[XIND(xupp)]-(ull)REAL(ic)[ir])-rollabs<1e-6 ||
(double)((ull)REAL(xc)[XIND(xupp)]-(ull)REAL(ic)[ir]) == rollabs)) // 1007 fix
|| (TYPEOF(ic)<=INTSXP && (double)(INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir])-rollabs<1e-6 )
&& ( (TYPEOF(ic)==REALSXP &&
(ival.d = REAL(ic)[ir], xval.d = REAL(xc)[XIND(xupp)], 1) &&
(( !isInt64 &&
(xval.d-ival.d-rollabs < 1e-6 ||
xval.d-ival.d == rollabs /*#1007*/))
|| ( isInt64 &&
(double)(xval.ll-ival.ll)-rollabs < 1e-6 ) ))
|| (TYPEOF(ic)<=INTSXP && (double)(INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir])-rollabs < 1e-6 )
|| (TYPEOF(ic)==STRSXP) )) {
retFirst[ir] = xupp+1; // == xlow+2
retLength[ir] = 1;
Expand Down

0 comments on commit fe2d0c8

Please sign in to comment.