`data.table::fread` #2

ghost · 2017-02-27T19:16:02Z

Until fread supports compressed files, or there is a cross-platform way to uncompress files, use a parameter zipfile = FALSE to select fread, and fall back to read.csv when a zip file needs to be read.

For a 160M CSV file, fread took 2.64s while read.csv took 21s.

xhdong-umd · 2017-02-27T20:10:24Z

The ideal solution is to also use fread for zip files. There are several approaches:

It's a popular request in data.table.
Uncompress the file to stdout with fread(input = 'zcat < data.gz'). However, Windows doesn't have zcat or gzip installed by default, so it's difficult to create a simple cross-platform solution that doesn't require the user to install software first.
Uncompress the file to a temp file, read it, then delete the temp file. The problem here is the complexity of zip files. There are multiple possible compression methods, including files created by tar, which cannot be recognized by R's internal unzip function. R.utils uses R connections to uncompress files to disk, but you still need to identify the compression method first.
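
The temp-file approach can be sketched in base R as follows. This is only a minimal illustration for gzip input, not code from the package: the helper name read_gz_via_tempfile and its reader argument are hypothetical, and it assumes the input can be opened by gzfile(). The default reader is utils::read.csv so the sketch runs without extra packages; data.table::fread could be passed as reader instead.

```r
# Hypothetical helper (not part of any package): decompress a gzip file
# to a temp file, read the temp file, and delete it afterwards.
read_gz_via_tempfile <- function(path, reader = utils::read.csv, ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the temp file even on error

  con_in  <- gzfile(path, open = "rb")  # decompressing binary connection
  con_out <- file(tmp, open = "wb")
  tryCatch({
    repeat {
      chunk <- readBin(con_in, what = "raw", n = 65536L)
      if (length(chunk) == 0L) break  # end of input
      writeBin(chunk, con_out)
    }
  }, finally = {
    # close both connections before the temp file is read or deleted
    close(con_in)
    close(con_out)
  })

  reader(tmp, ...)
}

# Usage: write a small gzip-compressed CSV, then read it back.
gz_path <- tempfile(fileext = ".csv.gz")
gz_con <- gzfile(gz_path, open = "w")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), gz_con, row.names = FALSE)
close(gz_con)

df <- read_gz_via_tempfile(gz_path)
```

The sketch handles only what gzfile() can open, so it does not solve the problem of identifying the compression method for zip or tar archives.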

Right now the parameter method is the simplest, since it doesn't need much change to the existing code. We can improve this further depending on new usage or developments in related packages.

chfleming · 2017-02-27T21:06:11Z

I think the most important thing is that as.telemetry "just work" with default arguments. I put in some code that checks to see if the filename looks like a CSV, then attempts fread. If the filename doesn't look like a CSV or fread fails, then the slower read.table is used instead.

  data <- NULL
  # fread doesn't work on compressed files yet
  if(endsWith(tolower(object),".csv"))
  { data <- try(data.table::fread(object,data.table=FALSE,check.names=TRUE,...)) }
  # if fread fails (or was skipped), then fall back on read.csv
  if(!is.data.frame(data))
  { data <- utils::read.csv(object,...) }

We could add in more logic for different compression formats, but I don't know that the command & pipe notation is the same across platforms.

xhdong-umd · 2017-02-27T21:11:10Z

@chfleming This is a much better solution than an extra parameter.

I think there is no need to check compression formats since there are many possibilities and platform compatibility problems.

xhdong-umd · 2017-02-27T21:22:29Z

@chfleming I think we can actually just fread the first 5 rows without the file name check. It's possible for a CSV file to have a different extension (I have seen .txt before). How about this:

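# probe: try to fread only the first 5 rows, suppressing the error message
data <- try(data.table::fread(object, data.table = FALSE, check.names = TRUE, nrows = 5),
            silent = TRUE)
if (is.data.frame(data)) {
  # probe succeeded, so read the whole file with fread
  data <- data.table::fread(object, data.table = FALSE, check.names = TRUE, ...)
} else {
  # probe failed (e.g. compressed file), so fall back on read.csv
  data <- utils::read.csv(object, ...)
}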

I think the direct read test should be fast enough to be comparable to the file name check, and it will handle all possible cases without complex logic.
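
For illustration, the probe-then-fallback idea could be packaged as a standalone function. This is a rough sketch under stated assumptions, not the code that was pushed: the name read_fast_or_fallback is hypothetical, the probe uses tryCatch instead of a class check, and the requireNamespace guard is added only so the example also runs where data.table is not installed.

```r
# Hypothetical wrapper (not part of any package): probe the file with
# fread on a few rows, then read it fully with fread or fall back.
read_fast_or_fallback <- function(path, ...) {
  has_dt <- requireNamespace("data.table", quietly = TRUE)

  # Probe: TRUE only if fread can parse the first 5 rows without an error.
  probe_ok <- has_dt && tryCatch({
    data.table::fread(path, data.table = FALSE, nrows = 5)
    TRUE
  }, error = function(e) FALSE)

  if (probe_ok) {
    data.table::fread(path, data.table = FALSE, check.names = TRUE, ...)
  } else {
    utils::read.csv(path, ...)  # slower, but handles more inputs
  }
}

# Usage: a CSV saved with a .txt extension is still read correctly.
txt_path <- tempfile(fileext = ".txt")
write.csv(data.frame(id = 1:3, value = c(0.5, 1.5, 2.5)),
          txt_path, row.names = FALSE)

df <- read_fast_or_fallback(txt_path)
```

Either branch returns a plain data.frame, so callers do not need to know which reader was used.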

chfleming · 2017-02-27T21:38:01Z

That seems to work well. Pushed.

xhdong-umd mentioned this issue Feb 27, 2017

added parameter “zipfile = FALSE” to as.telemetry.character #1

Closed

xhdong-umd added the to review label Feb 27, 2017

dracodoc closed this as completed Feb 27, 2017

xhdong-umd added Done and removed to review labels Feb 28, 2017

xhdong-umd mentioned this issue Mar 2, 2017

decompress zip to temp file for fread #7

Merged

chfleming added a commit that referenced this issue Feb 2, 2024

mean.ctmm dim fix #2

5c2299a
