Commit

version 1.16.4
TysonStanley authored and cran-robot committed Dec 6, 2024
1 parent fb36734 commit dada9ea
Showing 21 changed files with 190 additions and 148 deletions.
6 changes: 3 additions & 3 deletions DESCRIPTION
@@ -1,5 +1,5 @@
Package: data.table
Version: 1.16.2
Version: 1.16.4
Title: Extension of `data.frame`
Depends: R (>= 3.3.0)
Imports: methods
@@ -94,7 +94,7 @@ Authors@R: c(
person("Ivan", "Krylov", role="ctb")
)
NeedsCompilation: yes
Packaged: 2024-10-09 21:37:41 UTC; tysonbarrett
Packaged: 2024-12-04 23:18:02 UTC; tysonbarrett
Author: Tyson Barrett [aut, cre] (<https://orcid.org/0000-0002-2137-1391>),
Matt Dowle [aut],
Arun Srinivasan [aut],
@@ -175,4 +175,4 @@ Author: Tyson Barrett [aut, cre] (<https://orcid.org/0000-0002-2137-1391>),
Ivan Krylov [ctb]
Maintainer: Tyson Barrett <t.barrett88@gmail.com>
Repository: CRAN
Date/Publication: 2024-10-10 16:10:06 UTC
Date/Publication: 2024-12-06 15:10:10 UTC
40 changes: 20 additions & 20 deletions MD5
@@ -1,12 +1,12 @@
e3d451e8f4700529c150f6759be147dd *DESCRIPTION
126108daee164f24fb19fb189ccd1ac1 *DESCRIPTION
815ca599c9df247a0c7f619bab123dad *LICENSE
7832e68f8a32867afa2cb74a71ce1e27 *NAMESPACE
37a0e791d6e5034d9d26cecbdeb7e8fb *NEWS.md
e532614e0c32436a632553a8a3b8f0ab *NEWS.md
794ced8294de6d37183148a1f1f34c1e *R/AllS4.R
b2710011d3c2250883c3f21dfab28509 *R/IDateTime.R
c69ff420be9c165e9c3b87f9ed000ca1 *R/as.data.table.R
09e6cb1b32ecc761c5270d17be53553d *R/between.R
97029150b01739872757797cd5afc651 *R/bmerge.R
8328407eeb9e2787c5e1a12df589f2c4 *R/bmerge.R
64706528f788194122c5b347019fe14d *R/cedta.R
6ac810dd4ab69038b7688572b97cd3c9 *R/data.table.R
55202f3124b9c06468785d3a51f9e401 *R/devel.R
@@ -43,38 +43,38 @@ a2e830d2c88984acd3e707699cd29b0e *R/uniqlist.R
ac7ebd71d09719a70cd007673c19ca74 *R/wrappers.R
733673a85a287d3c71e2612d015a8cd9 *R/xts.R
273e41d947f46d05be29454dad105bc4 *README.md
3a1d53e1c41019f0cae9ce017e80eabb *build/vignette.rds
03659726e1f2b8817cea2097b7f57d07 *build/vignette.rds
6071edd604dbeb75308cfbedc7790398 *cleanup
1d71eb324d5e288488d95e478a502591 *configure
9504815ba38eeb382a22ea5d0242d61c *inst/cc
bcb9a54307774c931eb842db9cb82141 *inst/doc/datatable-benchmarking.Rmd
1e014c93bcd61ffa4624604b0e3b8a04 *inst/doc/datatable-benchmarking.html
71b785046b92ed45bc360dc6518ad1e5 *inst/doc/datatable-benchmarking.html
78af996e3599dcbb8a25d36a8aa33995 *inst/doc/datatable-faq.R
f32ff091fd55792f2af43d52e687b4c6 *inst/doc/datatable-faq.Rmd
ef813eb3abf6dac552788655a102c2ad *inst/doc/datatable-faq.html
519fbf8f0211b4a6b6c5bc96ce8b63f9 *inst/doc/datatable-faq.html
99ac6f160e64166a6d25efb11dc16a09 *inst/doc/datatable-importing.Rmd
e7ce474ddfa147d2cb2f747589fbd807 *inst/doc/datatable-importing.html
9a9354e73ef9740af8541e0891904d6f *inst/doc/datatable-intro.R
47cc55ef600bae6610ef1cca69dcafeb *inst/doc/datatable-importing.html
44ecabc60fdee95386d49cc7f2f526e1 *inst/doc/datatable-intro.R
e5c464c2bf8479c5c6a8550572b2211f *inst/doc/datatable-intro.Rmd
1cb5a458f5bf428e2e4d9c8c8f0f397c *inst/doc/datatable-intro.html
fb475abee93018578d2552b22009487c *inst/doc/datatable-keys-fast-subset.R
cb7d7440e3a2099ba1a81d8957a3147c *inst/doc/datatable-intro.html
8ccdb27464c43b2d1f2dbada3b8d2fed *inst/doc/datatable-keys-fast-subset.R
ba464c4c714060af80fff9fdc2cc8e5f *inst/doc/datatable-keys-fast-subset.Rmd
5b05c8158cf68f8998992035c1ef8bf1 *inst/doc/datatable-keys-fast-subset.html
3061f05eca1f587e716d0d7ff6da526c *inst/doc/datatable-keys-fast-subset.html
7726e71097ea329f4e81fbb2336332ed *inst/doc/datatable-programming.R
4dfded8c1accfa827f2b971505a55014 *inst/doc/datatable-programming.Rmd
28ec9aec49a2c007197165acb8ede4f5 *inst/doc/datatable-programming.html
40fdae694040bd304e5fc54c2e044d38 *inst/doc/datatable-reference-semantics.R
f18b81216b8e8f2c7e9ae764b8a0c679 *inst/doc/datatable-programming.html
739f4fcccd120f7b0f6a9efb5d3130d0 *inst/doc/datatable-reference-semantics.R
d209714399d215c6573a34b172a042b0 *inst/doc/datatable-reference-semantics.Rmd
7a30b553bd3ba94a7a14ffd4a0f52cb6 *inst/doc/datatable-reference-semantics.html
0b78408fdc4baba4fb478701ab4732b2 *inst/doc/datatable-reference-semantics.html
90b767499a017638c55887d09b132462 *inst/doc/datatable-reshape.R
248d3d8c846f14e3d1cf07f1732a465c *inst/doc/datatable-reshape.Rmd
04d4107ad8e453ee61cf9625561b79e6 *inst/doc/datatable-reshape.html
2f4816ce333772316971853338e83a34 *inst/doc/datatable-reshape.html
f3ab4569b4d2c146524f510ff527e290 *inst/doc/datatable-sd-usage.R
e661e1e906266ba302ab924fba797a89 *inst/doc/datatable-sd-usage.Rmd
fb482ddbd7e5d17d69a6e50ab8fbe744 *inst/doc/datatable-sd-usage.html
367dbcaa8178ce1725045614b4b5eadd *inst/doc/datatable-secondary-indices-and-auto-indexing.R
0c916352de0537e9c1f18572d023b290 *inst/doc/datatable-sd-usage.html
f271212899911bf027b46a50f6ef021c *inst/doc/datatable-secondary-indices-and-auto-indexing.R
1e46840f69c117e6b0929655924424dc *inst/doc/datatable-secondary-indices-and-auto-indexing.Rmd
afea495fd6cd2e14ba1b768d3afd1982 *inst/doc/datatable-secondary-indices-and-auto-indexing.html
b4537adb106481832ebffe60d500c34b *inst/doc/datatable-secondary-indices-and-auto-indexing.html
49735cc61476d7bf8c15d9a7fb15e007 *inst/include/datatableAPI.h
db43b498843b1100f6c995e79ca53c7f *inst/po/en@quot/LC_MESSAGES/R-data.table.mo
be55865ffbb8ebbbe8defe19863eed25 *inst/po/en@quot/LC_MESSAGES/data.table.mo
@@ -162,7 +162,7 @@ dca082d791806aadd3127436883a84f1 *inst/tests/test1372-1.Rdata
773f2947ecaefab52f049698be623178 *inst/tests/test1372.Rdata
5ce8f8dbc49189e6980f070c6db5f2c6 *inst/tests/test2224.Rdata
ea4eaa4b3c3136fdf7831621f64f6bcf *inst/tests/test2233-43.Rdata
153bb2fcbe4d1c7ea09180d290fa3026 *inst/tests/tests.Rraw.bz2
46e6092035c85a279365e18ad77db667 *inst/tests/tests.Rraw.bz2
864110fc78037e10545091c75dec24c7 *inst/tests/types.Rraw.bz2
161e66cd8341ec95e4fb2c2dfb20010e *inst/tests/unescaped.csv
5281b3033fc0932f09393789435b7744 *inst/tests/utf16be.txt
@@ -269,7 +269,7 @@ b462d48eecd96a8221e385824d9bfccc *src/freadR.h
7d1d51051e1421801ca3cb1fe4a60ce6 *src/gsumm.c
81d144e2be715b15f65a1fbe24cbcbc4 *src/idatetime.c
07ae56049cf5315d0d149b37267bf2a6 *src/ijoin.c
70ce02ce07ace74d638bd5bbb7d4fa02 *src/init.c
73cc3279e28532250de365a1143468a3 *src/init.c
bf34375cfb057ac0749fa10ee92a7e0a *src/inrange.c
998a0d5163e537e7b4040ccfe488782c *src/myomp.h
08bc819801a97494ff15539664fd0f84 *src/nafill.c
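Each entry in the MD5 file pairs a 32-character hex digest with a package-relative path prefixed by `*`. As an illustration only, the base R sketch below parses lines in that format and compares them with digests computed by `tools::md5sum()`. The helper names `parse_md5_lines()` and `verify_md5()` are hypothetical, and the package directory is a placeholder; for an installed package, base R's `tools::checkMD5sums()` performs a similar check.

```r
# Hypothetical helper: parse "<md5> *<path>" lines into a named character
# vector (names = paths, values = digests). Malformed lines are dropped.
parse_md5_lines <- function(lines) {
  m <- regmatches(lines, regexec("^([0-9a-f]{32}) \\*(.+)$", lines))
  m <- m[lengths(m) == 3L]                 # full match + 2 capture groups
  hashes <- vapply(m, function(x) x[2L], character(1))
  paths  <- vapply(m, function(x) x[3L], character(1))
  setNames(hashes, paths)
}

# Hypothetical helper: TRUE where the file under pkg_dir matches its recorded
# digest, FALSE on a mismatch, NA if the file is missing.
verify_md5 <- function(expected, pkg_dir) {
  actual <- tools::md5sum(file.path(pkg_dir, names(expected)))
  setNames(unname(actual) == unname(expected), names(expected))
}

entries <- parse_md5_lines(c(
  "126108daee164f24fb19fb189ccd1ac1 *DESCRIPTION",
  "73cc3279e28532250de365a1143468a3 *src/init.c"
))
# verify_md5(entries, "path/to/data.table")  # placeholder source directory
```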
6 changes: 6 additions & 0 deletions NEWS.md
@@ -1,5 +1,11 @@
**If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.**

# data.table [v1.16.4](https://github.com/Rdatatable/data.table/milestone/36) 4 December 2024

## BUG FIXES

1. Joins on multiple columns, such as `x[y, on=c("x1==y1", "x2==y1")]`, could fail during implicit type coercions if `x1` and `x2` had different but still compatible types, [#6602](https://github.com/Rdatatable/data.table/issues/6602). This was particularly unexpected when columns `x1`, `x2`, and `y1` were all of the same class, e.g. `Date`, but differed in their underlying storage types. Thanks to Benjamin Schwendinger for the report and the fix.

# data.table [v1.16.2](https://github.com/Rdatatable/data.table/milestone/35) (9 October 2024)

## BUG FIXES
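The bug fix above concerns a single `i` column joined to several `x` columns whose classes match but whose storage types differ: a `Date` can be stored as integer or double, so `x1` and `x2` may both be `Date` yet need different coercions. The base R sketch below is our reading of the integer/double rule in the patched `bmerge()`, not data.table code. `has_fractions()` (a simplified stand-in for data.table's internal `isReallyReal()`) and `common_join_type()` are hypothetical names, and factor, character, logical and integer64 handling is deliberately left out.

```r
# Simplified stand-in for isReallyReal(): TRUE when a double vector holds at
# least one non-NA value with a fractional part.
has_fractions <- function(v) {
  v <- unclass(v)                         # drop classes such as Date
  is.double(v) && any(v != trunc(v), na.rm = TRUE)
}

# Hypothetical sketch: storage type shared by one i column and all x columns
# joined to it, e.g. on=c("x1==y1", "x2==y1"). Integer/double inputs only.
common_join_type <- function(i_col, x_cols) {
  x_types <- vapply(x_cols, typeof, character(1))
  if (typeof(i_col) == "integer") {
    # an integer i column is widened to double if any joined x column is double
    return(if (any(x_types == "double")) "double" else "integer")
  }
  # i is double: nothing to coerce unless some joined x column is integer
  if (!any(x_types == "integer")) return("double")
  # narrow to integer only if neither i nor any double x column has fractions
  doubles <- c(list(i_col), x_cols[x_types == "double"])
  if (any(vapply(doubles, has_fractions, logical(1)))) "double" else "integer"
}

# Same class, same day, different storage types
d_int <- structure(19000L, class = "Date")   # integer storage
d_dbl <- as.Date("2022-01-08")                # double storage
c(typeof(d_int), typeof(d_dbl))               # "integer" "double"
common_join_type(d_dbl, list(d_int, d_dbl))   # "integer"
```

The point of checking every `x` column joined to the same `i` column is that coercing `i` for one pair must not leave it mismatched against another pair.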
144 changes: 90 additions & 54 deletions R/bmerge.R
@@ -1,4 +1,25 @@


mergeType = function(x) {
ans = typeof(x)
if (ans=="integer") { if (is.factor(x)) ans = "factor" }
else if (ans=="double") { if (inherits(x, "integer64")) ans = "integer64" }
# do not call isReallyReal(x) yet because i) if both types are double we don't need to coerce even if one or both sides
# are int-as-double, and ii) to save calling it until we really need it
ans
}

cast_with_atts = function(x, as.f) {
ans = as.f(x)
if (!is.null(attributes(x))) attributes(ans) = attributes(x)
ans
}

coerce_col = function(dt, col, from_type, to_type, from_name, to_name, verbose_msg=NULL) {
if (!is.null(verbose_msg)) catf(verbose_msg, from_type, from_name, to_type, to_name, domain=NULL)
set(dt, j=col, value=cast_with_atts(dt[[col]], match.fun(paste0("as.", to_type))))
}

bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbose)
{
callersi = i
@@ -25,95 +46,110 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos

supported = c(ORDERING_TYPES, "factor", "integer64")

getClass = function(x) {
ans = typeof(x)
if (ans=="integer") { if (is.factor(x)) ans = "factor" }
else if (ans=="double") { if (inherits(x, "integer64")) ans = "integer64" }
# do not call isReallyReal(x) yet because i) if both types are double we don't need to coerce even if one or both sides
# are int-as-double, and ii) to save calling it until we really need it
ans
}

if (nrow(i)) for (a in seq_along(icols)) {
# - check that join columns have compatible types
# - do type coercions if necessary on just the shallow local copies for the purpose of join
# - handle factor columns appropriately
# Note that if i is keyed, if this coerces i's key gets dropped by set()
ic = icols[a]
xc = xcols[a]
xclass = getClass(x[[xc]])
iclass = getClass(i[[ic]])
xname = paste0("x.", names(x)[xc])
iname = paste0("i.", names(i)[ic])
if (!xclass %chin% supported) stopf("%s is type %s which is not supported by data.table join", xname, xclass)
if (!iclass %chin% supported) stopf("%s is type %s which is not supported by data.table join", iname, iclass)
if (xclass=="factor" || iclass=="factor") {
icol = icols[a]
xcol = xcols[a]
x_merge_type = mergeType(x[[xcol]])
i_merge_type = mergeType(i[[icol]])
xname = paste0("x.", names(x)[xcol])
iname = paste0("i.", names(i)[icol])
if (!x_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", xname, x_merge_type)
if (!i_merge_type %chin% supported) stopf("%s is type %s which is not supported by data.table join", iname, i_merge_type)
if (x_merge_type=="factor" || i_merge_type=="factor") {
if (roll!=0.0 && a==length(icols))
stopf("Attempting roll join on factor column when joining %s to %s. Only integer, double or character columns may be roll joined.", xname, iname)
if (xclass=="factor" && iclass=="factor") {
if (x_merge_type=="factor" && i_merge_type=="factor") {
if (verbose) catf("Matching %s factor levels to %s factor levels.\n", iname, xname)
set(i, j=ic, value=chmatch(levels(i[[ic]]), levels(x[[xc]]), nomatch=0L)[i[[ic]]]) # nomatch=0L otherwise a level that is missing would match to NA values
set(i, j=icol, value=chmatch(levels(i[[icol]]), levels(x[[xcol]]), nomatch=0L)[i[[icol]]]) # nomatch=0L otherwise a level that is missing would match to NA values
next
} else {
if (xclass=="character") {
if (x_merge_type=="character") {
if (verbose) catf("Coercing factor column %s to type character to match type of %s.\n", iname, xname)
set(i, j=ic, value=val<-as.character(i[[ic]]))
set(callersi, j=ic, value=val) # factor in i joining to character in x will return character and not keep x's factor; e.g. for antaresRead #3581
set(i, j=icol, value=val<-as.character(i[[icol]]))
set(callersi, j=icol, value=val) # factor in i joining to character in x will return character and not keep x's factor; e.g. for antaresRead #3581
next
} else if (iclass=="character") {
} else if (i_merge_type=="character") {
if (verbose) catf("Matching character column %s to factor levels in %s.\n", iname, xname)
newvalue = chmatch(i[[ic]], levels(x[[xc]]), nomatch=0L)
if (anyNA(i[[ic]])) newvalue[is.na(i[[ic]])] = NA_integer_ # NA_character_ should match to NA in factor, #3809
set(i, j=ic, value=newvalue)
newvalue = chmatch(i[[icol]], levels(x[[xcol]]), nomatch=0L)
if (anyNA(i[[icol]])) newvalue[is.na(i[[icol]])] = NA_integer_ # NA_character_ should match to NA in factor, #3809
set(i, j=icol, value=newvalue)
next
}
}
stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, xclass, iname, iclass)
stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, x_merge_type, iname, i_merge_type)
}
if (xclass == iclass) {
if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, xclass, xname)
# we check factors first to cater for the case when trying to do rolling joins on factors
if (x_merge_type == i_merge_type) {
if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, x_merge_type, xname)
next
}
if (xclass=="character" || iclass=="character" ||
xclass=="logical" || iclass=="logical" ||
xclass=="factor" || iclass=="factor") {
if (anyNA(i[[ic]]) && allNA(i[[ic]])) {
if (verbose) catf("Coercing all-NA %s (%s) to type %s to match type of %s.\n", iname, iclass, xclass, xname)
set(i, j=ic, value=match.fun(paste0("as.", xclass))(i[[ic]]))
cfl = c("character", "logical", "factor")
if (x_merge_type %chin% cfl || i_merge_type %chin% cfl) {
msg = if(verbose) gettext("Coercing all-NA %s column %s to type %s to match type of %s.\n") else NULL
if (anyNA(i[[icol]]) && allNA(i[[icol]])) {
coerce_col(i, icol, i_merge_type, x_merge_type, iname, xname, msg)
next
}
else if (anyNA(x[[xc]]) && allNA(x[[xc]])) {
if (verbose) catf("Coercing all-NA %s (%s) to type %s to match type of %s.\n", xname, xclass, iclass, iname)
set(x, j=xc, value=match.fun(paste0("as.", iclass))(x[[xc]]))
if (anyNA(x[[xcol]]) && allNA(x[[xcol]])) {
coerce_col(x, xcol, x_merge_type, i_merge_type, xname, iname, msg)
next
}
stopf("Incompatible join types: %s (%s) and %s (%s)", xname, xclass, iname, iclass)
stopf("Incompatible join types: %s (%s) and %s (%s)", xname, x_merge_type, iname, i_merge_type)
}
if (xclass=="integer64" || iclass=="integer64") {
if (x_merge_type=="integer64" || i_merge_type=="integer64") {
nm = c(iname, xname)
if (xclass=="integer64") { w=i; wc=ic; wclass=iclass; } else { w=x; wc=xc; wclass=xclass; nm=rev(nm) } # w is which to coerce
if (x_merge_type=="integer64") { w=i; wc=icol; wclass=i_merge_type; } else { w=x; wc=xcol; wclass=x_merge_type; nm=rev(nm) } # w is which to coerce
if (wclass=="integer" || (wclass=="double" && !isReallyReal(w[[wc]]))) {
if (verbose) catf("Coercing %s column %s%s to type integer64 to match type of %s.\n", wclass, nm[1L], if (wclass=="double") " (which contains no fractions)" else "", nm[2L])
set(w, j=wc, value=bit64::as.integer64(w[[wc]]))
} else stopf("Incompatible join types: %s is type integer64 but %s is type double and contains fractions", nm[2L], nm[1L])
} else {
# just integer and double left
if (iclass=="double") {
if (!isReallyReal(i[[ic]])) {
ic_idx = which(icol == icols) # check if on is joined on multiple conditions, #6602
if (i_merge_type=="double") {
coerce_x = FALSE
if (!isReallyReal(i[[icol]])) {
coerce_x = TRUE
# common case of ad hoc user-typed integers missing L postfix joining to correct integer keys
# we've always coerced to int and returned int, for convenience.
if (verbose) catf("Coercing double column %s (which contains no fractions) to type integer to match type of %s.\n", iname, xname)
val = as.integer(i[[ic]])
if (!is.null(attributes(i[[ic]]))) attributes(val) = attributes(i[[ic]]) # to retain Date for example; 3679
set(i, j=ic, value=val)
set(callersi, j=ic, value=val) # change the shallow copy of i up in [.data.table to reflect in the result, too.
} else {
if (verbose) catf("Coercing integer column %s to type double to match type of %s which contains fractions.\n", xname, iname)
set(x, j=xc, value=as.double(x[[xc]]))
if (length(ic_idx)>1L) {
xc_idx = xcols[ic_idx]
for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "double")]) {
if (isReallyReal(x[[xb]])) {
coerce_x = FALSE
break
}
}
}
if (coerce_x) {
msg = if (verbose) gettext("Coercing %s column %s (which contains no fractions) to type %s to match type of %s.\n") else NULL
coerce_col(i, icol, "double", "integer", iname, xname, msg)
set(callersi, j=icol, value=i[[icol]]) # change the shallow copy of i up in [.data.table to reflect in the result, too.
if (length(ic_idx)>1L) {
xc_idx = xcols[ic_idx]
for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "double")]) {
coerce_col(x, xb, "double", "integer", paste0("x.", names(x)[xb]), xname, msg)
}
}
}
}
if (!coerce_x) {
msg = if (verbose) gettext("Coercing %s column %s to type %s to match type of %s which contains fractions.\n") else NULL
coerce_col(x, xcol, "integer", "double", xname, iname, msg)
}
} else {
if (verbose) catf("Coercing integer column %s to type double for join to match type of %s.\n", iname, xname)
set(i, j=ic, value=as.double(i[[ic]]))
msg = if (verbose) gettext("Coercing %s column %s to type %s for join to match type of %s.\n") else NULL
coerce_col(i, icol, "integer", "double", iname, xname, msg)
if (length(ic_idx)>1L) {
xc_idx = xcols[ic_idx]
for (xb in xc_idx[which(vapply_1c(.shallow(x, xc_idx), mergeType) == "integer")]) {
coerce_col(x, xb, "integer", "double", paste0("x.", names(x)[xb]), xname, msg)
}
}
}
}
}
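The refactor moves type classification into `mergeType()` and routes coercions through `coerce_col()`, which relies on `cast_with_atts()` to restore attributes after casting. In the old code only the double-to-integer branch did this (the comment "to retain Date for example; 3679"). The short session below reuses the two helpers as they appear in the diff to show why it matters: a bare `as.integer()` drops the `Date` class. It is an illustration, not part of the package's tests.

```r
# mergeType() and cast_with_atts() as defined in the new R/bmerge.R above.
mergeType = function(x) {
  ans = typeof(x)
  if (ans=="integer") { if (is.factor(x)) ans = "factor" }
  else if (ans=="double") { if (inherits(x, "integer64")) ans = "integer64" }
  ans
}

cast_with_atts = function(x, as.f) {
  ans = as.f(x)
  if (!is.null(attributes(x))) attributes(ans) = attributes(x)
  ans
}

d = as.Date("2022-01-08")
mergeType(d)                 # "double": the storage type, not the class
mergeType(factor("a"))       # "factor"
class(as.integer(d))         # "integer": a plain cast drops the Date class
d_int = cast_with_atts(d, as.integer)
c(class(d_int), typeof(d_int))  # "Date" "integer"
```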
Binary file modified build/vignette.rds
Binary file not shown.
2 changes: 1 addition & 1 deletion inst/doc/datatable-benchmarking.html
@@ -150,7 +150,7 @@
<div class="frontmatter">
<div class="title"><h1>Benchmarking data.table</h1></div>
<div class="author"><h2></h2></div>
<div class="date"><h3>2024-10-09</h3></div>
<div class="date"><h3>2024-12-04</h3></div>
</div>
<div class="body">
<div id="TOC">
2 changes: 1 addition & 1 deletion inst/doc/datatable-faq.html
@@ -151,7 +151,7 @@
<div class="frontmatter">
<div class="title"><h1>Frequently Asked Questions about data.table</h1></div>
<div class="author"><h2></h2></div>
<div class="date"><h3>2024-10-09</h3></div>
<div class="date"><h3>2024-12-04</h3></div>
</div>
<div class="body">
<div id="TOC">
2 changes: 1 addition & 1 deletion inst/doc/datatable-importing.html
@@ -142,7 +142,7 @@
<div class="frontmatter">
<div class="title"><h1>Importing data.table</h1></div>
<div class="author"><h2></h2></div>
<div class="date"><h3>2024-10-09</h3></div>
<div class="date"><h3>2024-12-04</h3></div>
</div>
<div class="body">
<style>