Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

as.Date can result in different underlying types #6602

Closed
ben-schwen opened this issue Nov 3, 2024 · 13 comments · Fixed by #6603
Closed

as.Date can result in different underlying types #6602

ben-schwen opened this issue Nov 3, 2024 · 13 comments · Fixed by #6603
Milestone

Comments

@ben-schwen
Copy link
Member

R CMD check is currently failing on windows-devel with typeof x.START_DATE (integer) != typeof i.DATE (double)

The responsible code is explicitly casting to as.Date, so I guess this a bug in R-devel?

Seems related to this recent change in R-devel:
https://bugs.r-project.org/show_bug.cgi?id=18782

@MichaelChirico
Copy link
Member

Could you link the data.table code since you found it?

Id say yes it's likely related to that Bugzilla report + associated fix. Seems R-core want to move more towards greedily allowing Date type to have underlying integer storage sometimes, not sure what exactly we should do in light of that.

@ben-schwen
Copy link
Member Author

ben-schwen commented Nov 3, 2024

It is test case 1848.1
https://github.com/Rdatatable/data.table/blob/6a15f8617de121a406cee97b22e83e0c2c4bb034/inst/tests/tests.Rraw#L12219-L12233

As you mentioned, with R-devel r87285 on Windows we have typeof(seq(as.Date('2015-01-01'), as.Date('2015-01-05'), by="day")) returning integer while the previous behavior was that it always returned double.

The strange part is that typeof(DT1$DATE) and typeof(DT2$START_DATE) both return "integer" on R-devel r87285, so bmerge should work.

@ben-schwen
Copy link
Member Author

ben-schwen commented Nov 3, 2024

Update: Turning on verbose tells us the problem here.
We do a join where a single column, is in two join conditions.

i.RANDOM_STRING has same type (character) as x.RANDOM_STRING. No coercion needed.
i.DATE has same type (integer) as x.START_DATE. No coercion needed.
Coercing integer column i.DATE to type double for join to match type of x.EXPIRY_DATE.

Minimal reprex:

date_int = seq(as.Date('2015-01-01'), as.Date('2015-01-01'), by="day")
x = data.table(a=date_int, b=1)
y = data.table(c=date_int, d=as.Date('2015-01-01'))
y[x, on=.(c == a, d == a)]

@tdhock
Copy link
Member

tdhock commented Nov 11, 2024

I can reproduce using R-devel

hoct2726@DINF-THOCK-01A MINGW64 ~/R/data.table (fix-ci-nov-2024)
$ R --version
R Under development (unstable) (2024-11-11 r87319 ucrt) -- "Unsuffered Consequences"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
https://www.gnu.org/licenses/.

(base) �]0;MINGW64:/c/Users/hoct2726/R/data.table�
hoct2726@DINF-THOCK-01A MINGW64 ~/R/data.table (fix-ci-nov-2024)
$ R CMD INSTALL . && R -e 'library(data.table);test.data.table()'
* installing to library 'C:/Users/hoct2726/AppData/Local/R/win-library/4.5'
* installing *source* package 'data.table' ...
** using staged installation

   **********************************************
   WARNING: this package has a configure script
         It probably needs manual configuration
   **********************************************


** libs
make: Warning: File 'C:/PROGRA~1/R/R-devel/etc/x64/Makeconf' has modification time 6.9 s in the future
make: warning:  Clock skew detected.  Your build may be incomplete.
using C compiler: 'gcc.exe (GCC) 13.2.0'
make: Warning: File 'C:/PROGRA~1/R/R-devel/etc/x64/Makeconf' has modification time 6 s in the future
gcc -shared -s -static-libgcc -o data.table.dll tmp.def assign.o between.o bmerge.o chmatch.o cj.o coalesce.o dogroups.o fastmean.o fcast.o fifelse.o fmelt.o forder.o frank.o fread.o freadR.o froll.o frollR.o frolladaptive.o fsort.o fwrite.o fwriteR.o gsumm.o idatetime.o ijoin.o init.o inrange.o nafill.o negate.o nqrecreateindices.o openmp-utils.o programming.o quickselect.o rbindlist.o reorder.o shift.o snprintf.o subset.o transpose.o types.o uniqlist.o utils.o vecseq.o wrappers.o -fopenmp -lz -LC:/rtools44/x86_64-w64-mingw32.static.posix/lib/x64 -LC:/rtools44/x86_64-w64-mingw32.static.posix/lib -LC:/PROGRA~1/R/R-devel/bin/x64 -lR
mv data.table.dll data_table.dll
make: warning:  Clock skew detected.  Your build may be incomplete.
installing to C:/Users/hoct2726/AppData/Local/R/win-library/4.5/00LOCK-data.table/00new/data.table/libs/x64
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (data.table)

R Under development (unstable) (2024-11-11 r87319 ucrt) -- "Unsuffered Consequences"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(data.table);test.data.table()
getDTthreads(verbose=TRUE):
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            6
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          6
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 3 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: C:/Users/hoct2726/AppData/Local/R/win-library/4.5/data.table/tests/tests.Rraw
Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  : 
  typeof x.START_DATE (integer) != typeof i.DATE (double)

Mon Nov 11 16:05:25 2024  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ==unset, Sys.timezone()=='America/Toronto', Sys.getlocale()=='LC_COLLATE=English_United States.utf8;LC_CTYPE=English_United States.utf8;LC_MONETARY=English_United States.utf8;LC_NUMERIC=C;LC_TIME=English_United States.utf8', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codepage=65001; system.codepage=65001', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==6; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==6; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 3 threads with throttle==1024. See ?setDTthreads.', .libPaths()=='C:/Users/hoct2726/AppData/Local/R/win-library/4.5','C:/Program Files/R/R-devel/library', zlibVersion()==1.3.1 ZLIB_VERSION==1.3.1
Error in test.data.table() : 
  Failed in 55.3s elapsed (21.1s cpu) after test 1847.3 before the next test() call in C:/Users/hoct2726/AppData/Local/R/win-library/4.5/data.table/tests/tests.Rraw
Calls: test.data.table -> stopf -> raise_condition -> signal
Execution halted
(base) �]0;MINGW64:/c/Users/hoct2726/R/data.table�
hoct2726@DINF-THOCK-01A MINGW64 ~/R/data.table (fix-ci-nov-2024)
$ 

@ben-schwen
Copy link
Member Author

This now also errors on all devel versions on the CRAN checks

@TysonStanley
Copy link
Member

Unless something changed, CRAN reached out a few weeks ago and suggested it was a bug on devel regarding coercion of dates. They said they weren't planning on making an actual change here. Not sure if it's the same issue tho, can look into it.

@MichaelChirico
Copy link
Member

We do a join where a single column, is in two join conditions.

Could you lay out why that's an issue? Maybe coercion is ignored for the first time the column is used, but then done the second time, which would require also having coerced the first time?

@ben-schwen
Copy link
Member Author

We do a join where a single column, is in two join conditions.

Could you lay out why that's an issue? Maybe coercion is ignored for the first time the column is used, but then done the second time, which would require also having coerced the first time?

Suppose you have the tables, A: i1 and B: i2, d1.
If we now join A and B and i1==i2 and i1==d1, then data.table recognizes that for the first condition, both have the same type and no coercion is needed and for the second condition, it coerces i1 to a double. The error now happens when trying to merge i1 (which is coerced to a double) with i2 which stays as integer.

@MichaelChirico
Copy link
Member

Hmm, that makes some sense, but then why are Dates involved? It sounds like we could reproduce with only atomic columns, which I tried without success:

DT1 = data.table(i = 1:10)
DT2 = data.table(d1 = c(-1, 5, 11), i2 = c(3L, -2L, 12L))

dim(DT1[DT2, on = .(i >= d1, i <= i2), verbose=TRUE])
Coercing double column i.i1 (which contains no fractions) to type integer to match type of x.i.
# ...
# [1] 5 2

dim(DT1[DT2, on = .(i <= i2, i >= d1), verbose=TRUE])
# i.i2 has same type (integer) as x.i. No coercion needed.
# Coercing double column i.i1 (which contains no fractions) to type integer to match type of x.i.
# ...
# [1] 5 2

@ben-schwen
Copy link
Member Author

Good point and indeed the error also appears when swapping right for left join

DT1 = data.table(i = 1:10)
DT2 = data.table(d1 = c(-1, 5, 11), i2 = c(3L, -2L, 12L))
DT2[DT1, on = .(i2 <= i, d1 >= i), verbose=TRUE]
# i.i has same type (integer) as x.i2. No coercion needed.
# Coercing integer column i.i to type double for join to match type of x.d1.
# Assigning to all 10 rows
# RHS_list_of_columns == false
# Direct plonk of unnamed RHS, no copy. MAYBE_REFERENCED==1, MAYBE_SHARED==0
# Non-equi join operators detected ...
#   forder took ... forder.c received 3 rows and 2 columns
# forderReuseSorting: opt=-1, took 0.000s
# 0.000s elapsed (0.000s cpu)
#   Generating non-equi group ids ... done in 0.005s elapsed (0.006s cpu)
#   Recomputing forder with non-equi ids ... Assigning to all 3 rows
# RHS_list_of_columns == false
# RHS for item 1 has been duplicated because MAYBE_REFERENCED==1 MAYBE_SHARED==1, but then is being plonked. length(values)==3; length(cols)==1)
# forder.c received 3 rows and 3 columns
# forderReuseSorting: opt=-1, took 0.000s
# done in 0.005s elapsed (0.006s cpu)
#   Found 2 non-equi groups ...
# Starting bmerge ...
# Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult,  :
#   typeof x.i2 (integer) != typeof i.i (double)

@MichaelChirico
Copy link
Member

Nice... that points to a more fundamental issue about joins involving the same column in multiple conditions that we should fix regardless of the r-devel behavior

@MichaelChirico
Copy link
Member

@TysonStanley this is fixed in a patch branch & ready to be shipped I believe:

https://github.com/Rdatatable/data.table/tree/patch-1.16.4

Thank you!

@TysonStanley
Copy link
Member

Thanks all. Submitted the patch to CRAN today, awaiting review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants