set function crashes the R session #4824

Closed
clerousset opened this issue Nov 30, 2020 · 3 comments · Fixed by #4829

@clerousset

A misuse of set systematically crashes the R session.

# Minimal reproducible example

library(data.table)
n<-1e6
dt<-
  data.table(
    fact=factor(rep("A",n)),
    case=rep(TRUE,n),
    coeff=rep(5,n))
x<-"fact"
 set(
    dt,
    i= which(dt$case==TRUE),
    j= "fact",
    value=dt[(case),..x]*dt$coeff
    )

The crash happens only when all of the following hold:

  • n is large (n=1e4: no crash; n=1e5: crash)
  • dt$fact is a factor
  • the i argument of set is used
  • j is exactly "fact", so that the column "fact" is overwritten
  • the column is selected with ..x rather than "fact" in dt[(case),..x]
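One way to read these conditions: arithmetic on a factor is not defined in R, so the value expression in the example cannot produce anything that fits a factor column. The base-R sketch below illustrates this with a plain factor vector standing in for the data.table column; the names f and res are illustrative only.

```r
# Illustrative stand-in for dt$fact; a small vector is enough to show the types.
f <- factor(rep("A", 3))

# Multiplying a factor is not meaningful: R warns and returns logical NA.
res <- suppressWarnings(f * 5)

print(res)          # NA NA NA
print(typeof(res))  # "logical"
```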

# Output of sessionInfo()

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.13.0

loaded via a namespace (and not attached):
[1] compiler_4.0.0 tools_4.0.0

@jangorecki
Member

jangorecki commented Nov 30, 2020

Thank you for reporting. Are you able to reproduce it on a recent version?
On Linux with recent devel I do not get a crash, but the result is a corrupted object that errors when printed, so this needs a fix anyway.
Worth noting that set is meant to be a low-overhead function that can easily be looped a million or more times, so thorough validation of value is probably not possible.

library(data.table)
n<-1e6
dt<-
  data.table(
    fact=factor(rep("A",n)),
    case=rep(TRUE,n),
    coeff=rep(5,n))
x<-"fact"
 set(
    dt,
    i= which(dt$case==TRUE),
    j= "fact",
    value=dt[(case),..x]*dt$coeff
    )
#Warning message:
#In Ops.factor(left, right) : '*' not meaningful for factors
dt
#Error in as.character.factor(x) : malformed factor
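Since set is meant to stay cheap, one option is for the caller to validate value once, before any loop. The sketch below is a hypothetical helper in base R, not part of data.table; the name value_matches_column and its rules are assumptions made for illustration, and they are deliberately stricter than what set may accept.

```r
# Hypothetical helper (not a data.table function): returns TRUE when `value`
# looks safe to write into `column`, FALSE otherwise.
value_matches_column <- function(column, value) {
  if (is.factor(column)) {
    # For a factor column, accept only factor or character input whose
    # entries are NA or already among the column's levels.
    (is.factor(value) || is.character(value)) &&
      all(is.na(value) | as.character(value) %in% levels(column))
  } else {
    # For other columns, require the same storage type.
    identical(typeof(column), typeof(value))
  }
}

fact <- factor(rep("A", 3))
print(value_matches_column(fact, "A"))         # TRUE
print(value_matches_column(fact, rep(NA, 3)))  # FALSE: logical NA, not a factor
print(value_matches_column(c(1, 2), 3))        # TRUE: both double
```

Running a check like this once outside the loop leaves each individual set call as fast as before.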

@shrektan
Member

shrektan commented Nov 30, 2020

I can reproduce the crash on Windows 10 with the dev version of data.table. A simpler repro could be this:

library(data.table)
n <- 1e6
dt <- data.table(fact = factor(rep('a', n)))
x <- "fact"
set(dt,
    i = seq_len(1e5),
    j = "fact",
    value = rep(NA, 1e5))

The root cause, IMHO, is that the logical value NA was assigned to an integer vector (factors are stored as integers); it should first be coerced to the integer NA (NA_INTEGER in C, NA_integer_ in R).
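The storage types involved can be checked in base R alone. This is only a sketch of the mismatch and of the coercion step, not of the actual fix in data.table; the variable names are illustrative.

```r
# A factor is an integer vector of level codes plus a "levels" attribute.
f <- factor(rep("a", 5))
print(typeof(f))           # "integer"

# A bare NA is logical, so rep(NA, n) is a logical vector.
na_logical <- rep(NA, 5)
print(typeof(na_logical))  # "logical"

# Coercing first yields integer NAs that match the factor's storage type.
na_integer <- as.integer(na_logical)
print(typeof(na_integer))  # "integer"
print(identical(na_integer, rep(NA_integer_, 5)))  # TRUE
```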

@shrektan shrektan self-assigned this Nov 30, 2020