segfault/unexpected results with fread and colClasses #3143

Closed
kszela24 opened this issue Nov 12, 2018 · 2 comments

kszela24 commented Nov 12, 2018

The gist of the issue: when colClasses asks fread to read a character column as integer, fread segfaults under some conditions and returns unexpected values under others, instead of quietly returning the column uncast as character. The main examples were all run on 1.11.8. The same code worked on 1.10.4, and that output is shown as the expected result.

Not sure if each of these belongs in a separate issue, since they seem to be somewhat related. If they deserve separate issues, please let me know and I'll be happy to create them.

# Minimal reproducible examples
[1.11.8] A segfault occurs when colClasses requests integer for a character column that should be returned as character:

testDT <- data.table::data.table(a = c(1.0, 2.0, 3.0, 4.0, 5.1)
                                 , b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(b = "integer"))
readDT
 *** caught segfault ***
address 0x14576766a, cause 'memory not mapped'

Traceback:
 1: data.table::fread("~/test_dt.csv", colClasses = c(b = "integer"))

[1.10.4] Expected result (a is numeric, b is character):

     a b
1: 1.0 1
2: 2.0 2
3: 3.0 E
4: 4.0 4
5: 5.1 5

[1.11.8] When the values in the decimal column are all whole numbers, no segfault occurs, but every value in column b comes back the same. In this case, a is read in as integer and b as character:

testDT <- data.table::data.table(a = c(1.0, 2.0, 3.0, 4.0, 5.0)
                                 , b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(b = "integer"))
readDT
   a b
1: 1 5
2: 2 5
3: 3 5
4: 4 5
5: 5 5

[1.10.4] Expected result (a is integer, b is character):

   a b
1: 1 1
2: 2 2
3: 3 E
4: 4 4
5: 5 5

[1.11.8] The same behavior occurs when column a does not exist (column b is output as character):

testDT <- data.table::data.table(b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(b = "integer"))
readDT
   b
1: 5
2: 5
3: 5
4: 5
5: 5

[1.10.4] Expected result (b is character):

   b
1: 1
2: 2
3: E
4: 4
5: 5

[1.11.8] Lastly, if I additionally cast column a (detected as integer) back to numeric via colClasses, column b comes back containing mostly empty strings. In this case, a is numeric and b is character:

testDT <- data.table::data.table(a = c(1.0, 2.0, 3.0, 4.0, 5.0)
                                 , b = c("1", "2", "E", "4", "5"))
data.table::fwrite(testDT, "~/test_dt.csv")
readDT <- data.table::fread("~/test_dt.csv"
                            , colClasses = c(a = "numeric"
                                             , b = "integer"))
readDT
   a b
1: 1  
2: 2  
3: 3  
4: 4  
5: 5 5

[1.10.4] Expected result (a is numeric, b is character):

   a b
1: 1 1
2: 2 2
3: 3 E
4: 4 4
5: 5 5
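The four reproductions above can be folded into a single check. This is only a sketch: `b_survives` is a hypothetical helper, not part of data.table, and it assumes that a fixed version returns column b unchanged as character (possibly with a warning, which is suppressed here). Because the issue reports a segfault on 1.11.8, the script refuses to run on versions older than 1.12.0.

```r
library(data.table)

# Guard: the first case is reported to segfault on 1.11.8, so stop early on
# anything older than the version reported as fixed.
stopifnot(packageVersion("data.table") >= "1.12.0")

# Hypothetical helper (not part of data.table): write `dt` to a temporary CSV,
# read it back with the requested colClasses, and report whether column b
# came back as character with its original values.
b_survives <- function(dt, col_classes) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  fwrite(dt, path)
  # Assumption: a fixed version may warn that the override was not applied.
  res <- suppressWarnings(fread(path, colClasses = col_classes))
  is.character(res$b) && identical(res$b, dt$b)
}

b <- c("1", "2", "E", "4", "5")

results <- c(
  decimal_a   = b_survives(data.table(a = c(1, 2, 3, 4, 5.1), b = b),
                           c(b = "integer")),
  whole_a     = b_survives(data.table(a = c(1, 2, 3, 4, 5), b = b),
                           c(b = "integer")),
  b_only      = b_survives(data.table(b = b),
                           c(b = "integer")),
  both_forced = b_survives(data.table(a = c(1, 2, 3, 4, 5), b = b),
                           c(a = "numeric", b = "integer"))
)
print(results)
```

Each entry in `results` corresponds to one of the cases above. An entry that is FALSE means column b was altered or returned as a different type.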

# Output of sessionInfo()

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.8

loaded via a namespace (and not attached):
[1] compiler_3.4.4 tools_3.4.4    yaml_2.1.18
@jangorecki jangorecki added this to the 1.12.0 milestone Nov 19, 2018
@mattdowle mattdowle modified the milestones: 1.12.0, 1.12.2 Jan 6, 2019
@mattdowle
Member

This was just fixed in v1.12.0, now on CRAN, but I omitted to link this issue to that fix. I just ran your tests and confirmed. I'll add them verbatim to the test suite and add a thank-you for the report to NEWS item 5 for v1.12.0.
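For anyone who cannot upgrade yet, one possible workaround is to read the column as character and convert it afterwards only when that is safe. This is a sketch under assumptions: `safe_as_integer` is a hypothetical helper, not part of data.table, and it assumes that `colClasses = c(b = "character")` is not affected by the bug, which this thread does not confirm.

```r
library(data.table)

# Hypothetical helper (not part of data.table): convert a character vector to
# integer only if every non-missing, non-empty value parses as a whole number
# within integer range; otherwise return the input unchanged.
safe_as_integer <- function(x) {
  parsed  <- suppressWarnings(as.numeric(x))
  present <- !is.na(x) & nzchar(x)
  p <- parsed[present]
  ok <- !is.na(p) & p == floor(p) & abs(p) <= .Machine$integer.max
  if (all(ok)) as.integer(parsed) else x
}

csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(a = c(1, 2, 3, 4, 5.1),
                  b = c("1", "2", "E", "4", "5")), csv_path)

# Ask for character (assumed safe), then attempt the conversion in R.
dt <- fread(csv_path, colClasses = c(b = "character"))
dt[, b := safe_as_integer(b)]
str(dt)
```

Here column b stays character because "E" does not parse as a number. A column holding only values such as "1" and "2" would be converted to integer.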

@mattdowle
Member

Closed by 084242c

@mattdowle mattdowle modified the milestones: 1.12.2, 1.12.0 Jan 15, 2019