implement frev as fast base::rev alternative #5907

Open
Wants to merge 73 commits into base: master
Changes from 19 commits

73 commits
d8db6e3
add macro version
ben-schwen Jan 12, 2024
c468589
write explicit parallel version
ben-schwen Jan 13, 2024
95208ac
copy attributes
ben-schwen Jan 13, 2024
fe474d7
add tests
ben-schwen Jan 13, 2024
ed845d3
Merge branch 'master' into frev
ben-schwen Jan 13, 2024
8ba2d2d
add to NAMESPACE
ben-schwen Jan 13, 2024
30ae580
add to tests
ben-schwen Jan 13, 2024
9839ef5
copy names
ben-schwen Jan 13, 2024
3b6fa52
add man page
ben-schwen Jan 13, 2024
1d1b0df
update man
ben-schwen Jan 13, 2024
320678d
fix typos
ben-schwen Jan 13, 2024
6943ebd
update tests
ben-schwen Jan 13, 2024
67bd0c9
add coverage
ben-schwen Jan 13, 2024
812a854
add benchmark example
ben-schwen Jan 13, 2024
529028a
coverage
ben-schwen Jan 13, 2024
73d2fdb
NEWS
ben-schwen Jan 13, 2024
b9e167c
trim NEWS
ben-schwen Jan 13, 2024
59b59ab
update NEWS
ben-schwen Jan 13, 2024
88d1ff9
add bit64
ben-schwen Jan 13, 2024
f85922a
update naming in NEWS
ben-schwen Jan 14, 2024
e4324cf
1.15.0 on CRAN. Bump to 1.15.99
MichaelChirico Jan 6, 2024
18a7209
Fix transform slowness (#5493)
OfekShilon Jan 6, 2024
b6bd964
Improvements to the introductory vignette (#5836)
Anirban166 Jan 6, 2024
68f0e41
Vignette typo patch (#5402)
davidbudzynski Jan 6, 2024
7e1a950
Improved handling of list columns with NULL entries (#4250)
sritchie73 Jan 7, 2024
d9d17a7
clarify that list input->unnamed list output (#5383)
MichaelChirico Jan 8, 2024
da24f85
fix subsetting issue in split.data.table (#5368)
MichaelChirico Jan 8, 2024
58608a2
switch to 3.2.0 R dep (#5905)
MichaelChirico Jan 12, 2024
c84a123
Allow early exit from check for eval/evalq in cedta (#5660)
MichaelChirico Jan 12, 2024
513f20f
frollmax1: frollmax, frollmax adaptive, left adaptive support (#5889)
jangorecki Jan 12, 2024
daee139
Friendlier error in assignment with trailing comma (#5467)
MichaelChirico Jan 14, 2024
f5ef168
Link to ?read.delim in ?fread to give a closer analogue of expected b…
MLopez-Ibanez Jan 13, 2024
f658ff4
Run GHA jobs on 1-15-99 dev branch (#5909)
MichaelChirico Jan 14, 2024
53149ed
prohibit matrix
ben-schwen Jan 14, 2024
a99d32f
readd deleted line
ben-schwen Jan 14, 2024
a56b796
Make declarations static for covr (#5910)
MichaelChirico Jan 15, 2024
1bef92c
reorder code
ben-schwen Jan 15, 2024
6d6d1cd
Merge branch 'frev' of github.com:Rdatatable/data.table into frev
ben-schwen Jan 15, 2024
a6907ad
return invisible if inplace
ben-schwen Jan 15, 2024
1e9f481
cut to 1 line
ben-schwen Jan 15, 2024
07fbea8
use isTRUE for copy=NA
ben-schwen Jan 15, 2024
a285661
speedup strings and lists
ben-schwen Jan 15, 2024
4318bb7
add Hughs comments
ben-schwen Jan 16, 2024
86d3d59
add coverage
ben-schwen Jan 16, 2024
c507fa5
dedup INTSXP LGLSXP
ben-schwen Jan 16, 2024
08b3591
make tests lighter
ben-schwen Jan 16, 2024
97ea3ff
rm altrep include
ben-schwen Jan 16, 2024
df4f160
change testnum
ben-schwen Jan 17, 2024
461a97a
Merge branch '1-15-99' into frev
ben-schwen Jan 17, 2024
025a3c5
remove altrep
ben-schwen Jan 17, 2024
48ded0b
remove duplicated tests
ben-schwen Jan 17, 2024
526a4ed
Merge branch 'master' into frev
MichaelChirico Feb 22, 2024
be50528
mostly fix botched merge
MichaelChirico Feb 22, 2024
f15ae3c
migrate NEWS item
MichaelChirico Feb 22, 2024
976d3ba
revert bad search+replace
MichaelChirico Feb 22, 2024
796828d
update NEWS wording
ben-schwen Mar 15, 2024
181957e
add small body
ben-schwen Mar 15, 2024
c751124
Merge branch 'master' into frev
ben-schwen Mar 15, 2024
d02df36
add additional test cases
ben-schwen Mar 18, 2024
2001816
rerun benchmarks single threaded
ben-schwen Mar 18, 2024
3cc839c
update doc
ben-schwen Mar 18, 2024
e27a6f3
remove unnecessary assignment
ben-schwen Mar 18, 2024
276cdeb
Merge branch 'master' into frev
ben-schwen Mar 18, 2024
a7de0f8
change to frev/setrev
ben-schwen Mar 19, 2024
300ea93
add symbol for setrev
ben-schwen Mar 19, 2024
17319c6
update docs
ben-schwen Mar 19, 2024
b4fe534
update NEWS
ben-schwen Mar 20, 2024
832324c
add details about attributes
ben-schwen Mar 20, 2024
7d6aea9
Merge branch 'master' into frev
ben-schwen Mar 20, 2024
ccd9ee6
drop attributes except names, class and levels
ben-schwen May 18, 2024
6b6da26
Merge branch 'master' into frev
ben-schwen May 18, 2024
b2cde13
update docs
ben-schwen May 18, 2024
cabddd2
Merge branch 'master' into frev
ben-schwen May 18, 2024

1 change: 1 addition & 0 deletions NAMESPACE
@@ -202,3 +202,4 @@ S3method(format_list_item, default)

export(fdroplevels)
S3method(droplevels, data.table)
export(frev)
18 changes: 18 additions & 0 deletions NEWS.md
@@ -293,6 +293,24 @@

41. `tables()` is faster by default by excluding the size of character strings in R's global cache (which may be shared) and excluding the size of list column items (which also may be shared). `mb=` now accepts any function which accepts a `data.table` and returns a higher and better estimate of its size in bytes, albeit more slowly; e.g. `mb = utils::object.size`.

42. New function `frev(x, copy)`, a fast alternative to `base::rev` for atomic vectors and lists, [#5885](https://github.com/Rdatatable/data.table/issues/5885). With `copy=FALSE`, `x` is reversed in place. Thanks to Benjamin Schwendinger for suggesting and implementing.

```R
x = sample(2e8)
microbenchmark::microbenchmark(
  base = rev(x),
  frev_copy = frev(x, copy=TRUE),
  frev_inPlace = frev(x, copy=FALSE),
  times = 10L,
  unit = "s"
)
# Unit: seconds
#         expr   min    lq  mean median    uq   max neval cld
#         base 1.376 1.397 1.864  1.544 1.917 4.274    10   a
#    frev_copy 0.529 0.591 0.769  0.659 0.727 1.351    10   b
# frev_inPlace 0.064 0.065 0.066  0.066 0.067 0.070    10   c
```
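The gap between `frev_copy` and `frev_inPlace` in the benchmark is consistent with the in-place path skipping the allocation of a new vector and only swapping elements pairwise. A minimal C sketch of that pairwise swap on a plain `int` array, outside R's API (`rev_inplace` is a hypothetical name, not part of data.table):

```c
#include <stddef.h>

/* Hypothetical helper: reverse an int array in place by swapping
   element i with element n-1-i for every i < n/2. For odd n the
   middle element is left where it is; for n == 0 nothing happens. */
static void rev_inplace(int *x, size_t n) {
  for (size_t i = 0; i < n / 2; ++i) {
    size_t k = n - 1 - i;
    int tmp = x[i];
    x[i] = x[k];
    x[k] = tmp;
  }
}
```

Only n/2 swaps are performed and no second buffer is needed, whereas a copying reverse also has to allocate and fill a new vector of the same length.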

## BUG FIXES

1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries.
1 change: 0 additions & 1 deletion R/utils.R
@@ -165,4 +165,3 @@ rss = function() { #5515 #5517
round(ans / 1024, 1L) # return MB
# nocov end
}

2 changes: 2 additions & 0 deletions R/wrappers.R
@@ -16,3 +16,5 @@ isRealReallyInt = function(x) .Call(CisRealReallyIntR, x)
isReallyReal = function(x) .Call(CisReallyReal, x)

coerceAs = function(x, as, copy=TRUE) .Call(CcoerceAs, x, as, copy)

frev = function(x, copy=TRUE) .Call(Cfrev, x, copy)
47 changes: 47 additions & 0 deletions inst/tests/tests.Rraw
@@ -18260,3 +18260,50 @@ test(2243.54, dt[, .I[j], x]$V1, c(1L, 3L), output="GForce TRUE")
test(2243.55, dt[, .I[i], x]$V1, 1:4, output="GForce FALSE")
test(2243.56, dt[, .I[1:2], x]$V1, 1:4, output="GForce FALSE")
options(old)

# 5885 implement frev
d = c(NA, NaN, Inf, -Inf)
test(2244.01, frev(c(0L, NA), copy=TRUE), c(NA, 0L))
test(2244.02, frev(1:3, copy=TRUE), 3:1)
test(2244.03, frev(d, copy=TRUE), c(-Inf, Inf, NaN, NA))
test(2244.04, frev(c(NA, 1, 0+2i), copy=TRUE), c(0+2i, 1, NA))
test(2244.05, frev(as.raw(0:1), copy=TRUE), as.raw(1:0))
test(2244.06, frev(NULL, copy=TRUE), NULL)
test(2244.07, frev(character(5), copy=TRUE), character(5))
test(2244.08, frev(integer(0), copy=TRUE), integer(0))
test(2244.09, frev(list(1, "a"), copy=TRUE), list("a", 1))
test(2244.11, frev(c(0L, NA), copy=FALSE), c(NA, 0L))
test(2244.12, frev(1:3, copy=FALSE), 3:1)
test(2244.13, frev(d, copy=FALSE), c(-Inf, Inf, NaN, NA))
test(2244.14, frev(c(NA, 1, 0+2i), copy=FALSE), c(0+2i, 1, NA))
test(2244.15, frev(as.raw(0:1), copy=FALSE), as.raw(1:0))
test(2244.16, frev(NULL, copy=FALSE), NULL)
test(2244.17, frev(character(5), copy=FALSE), character(5))
test(2244.18, frev(integer(0), copy=FALSE), integer(0))
test(2244.19, frev(list(1, "a"), copy=FALSE), list("a", 1))
Member

Recommend adding a test for ALTREP vectors (e.g. 1:1e6) and long vectors (e.g. 1:1e10).

Member

Tests need to be light, so I would comment out those tests after running them and confirming they work as expected.

Member Author

ben-schwen, Jan 16, 2024

To make 1:1e10 work we would need to support ALTREP. Supporting ALTREP for copy=FALSE seems fairly straightforward, but creating new ALTREP objects seems a bit clunky.

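  // Sketch: only valid for a compact integer sequence (e.g. 1:n) that has not
  // been expanded yet, i.e. whose data2 slot is still R_NilValue.
  if (ALTREP(x) && R_altrep_data2(x) == R_NilValue) {
    // data1 holds (length, first, increment), stored as doubles
    SEXP info = R_altrep_data1(x);
    R_xlen_t LENGTH = (R_xlen_t)REAL0(info)[0];
    int FIRST = (int)REAL0(info)[1];
    int INCR = (int)REAL0(info)[2];
    // the reversed sequence starts at the old last element and steps the other way
    REAL0(info)[1] = FIRST+(LENGTH-1)*INCR;
    REAL0(info)[2] = INCR*(-1);
    R_set_altrep_data1(x, info);
    UNPROTECT(nprotect);
    return x;
  }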
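The arithmetic in the snippet above relies on a compact integer sequence being fully described by its length, first element and increment, so reversing it only needs new metadata and touches no element data. A self-contained C sketch of that idea, using a hypothetical `compact_seq` struct as a stand-in for R's ALTREP storage:

```c
#include <stddef.h>

/* Hypothetical stand-in for the (length, first, increment) triple that
   describes a compact integer sequence such as 1:10. */
typedef struct {
  size_t length;
  long first;
  long incr;
} compact_seq;

/* Reversing needs no element data: the new first element is the old last
   element, first + (length-1)*incr, and the increment flips sign. */
static compact_seq compact_seq_rev(compact_seq s) {
  compact_seq r = s;
  if (s.length > 0) {
    r.first = s.first + (long)(s.length - 1) * s.incr;
    r.incr = -s.incr;
  }
  return r;
}

/* Element i of the sequence, computed on demand instead of stored. */
static long compact_seq_elt(compact_seq s, size_t i) {
  return s.first + (long)i * s.incr;
}
```

For the step-1 sequences produced by `:`, this gives the same result as the R-level `last(x):first(x)` suggested further down in the review.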

Member

jangorecki, Jan 16, 2024

This will use a lot of memory. Matt spent quite some time making the tests memory-conservative. Any bigger tests should go into a separate script that is not run by default, or be commented out after confirming they worked at the time of the PR.
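For scale, a rough estimate for the long-vector example `1:1e10` from the review, assuming it is materialized as a double vector (its upper bound exceeds the 32-bit integer range) at 8 bytes per element and ignoring the vector header:

$$
10^{10} \times 8\ \text{bytes} = 8 \times 10^{10}\ \text{bytes} \approx 80\ \text{GB} \approx 74.5\ \text{GiB}
$$

With `copy=TRUE`, a second buffer of the same size would be needed on top of that.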

Member

HughParsonage, Jan 17, 2024

ALTREP is easiest handled at the R level (see below), but if there's currently no interface (i.e. no is_altrep(x) in data.table already), I withdraw my recommendation to include it in the test suite here.

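frev <- function(x) {
  # is_altrep() is hypothetical: data.table has no such helper yet
  if (is_altrep(x)) {
    # a compact step-1 sequence reversed is again a compact sequence
    return(last(x):first(x))
  }
  # ... otherwise fall through to the C implementation
}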

Member Author

@HughParsonage That's actually such a big-brain move. Ty for that!

Member

Until now we have always unpacked ALTREP in DT.

# copy arguments
x = 1:3
test(2244.21, {frev(x, copy=TRUE); x}, 1:3)
test(2244.22, {frev(x, copy=FALSE); x}, 3:1)
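Tests 2244.21 and 2244.22 pin down the `copy=` contract: with `copy=TRUE` the caller's `x` is left untouched, with `copy=FALSE` it is reversed by reference. A C sketch of the same contract on a plain array, where `malloc` plus `memcpy` stand in for R's `duplicate()` and `rev_int` is a hypothetical name:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of frev(x, copy): when copy is true, reverse a
   freshly allocated duplicate and leave x untouched; otherwise reverse x
   itself and return it. Returns NULL if the allocation fails. */
static int *rev_int(int *x, size_t n, bool copy) {
  int *out = x;
  if (copy) {
    out = malloc(n ? n * sizeof *out : 1);
    if (out == NULL) return NULL;
    if (n) memcpy(out, x, n * sizeof *out);
  }
  for (size_t i = 0; i < n / 2; ++i) {
    size_t k = n - 1 - i;
    int tmp = out[i];
    out[i] = out[k];
    out[k] = tmp;
  }
  return out;
}
```

The in-place branch returns the very pointer it was given, mirroring how `frev(x, copy=FALSE)` returns the modified `x`.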
# levels
f = as.factor(letters)
test(2244.31, frev(f, copy=TRUE), rev(f))
test(2244.32, frev(as.IDate(1:10), copy=TRUE), as.IDate(10:1))
test(2244.33, frev(as.IDate(1:10), copy=FALSE), as.IDate(10:1))
# names
x = c(a=1L, b=2L, c=3L)
test(2244.41, frev(x, copy=TRUE), rev(x))
test(2244.42, frev(x, copy=FALSE), c(c=3L, b=2L, a=1L))
# attributes
x = structure(1:10, class = c("IDate", "Date"), att = 1L)
test(2244.51, attr(frev(x, copy=TRUE), "att"), 1L)
test(2244.52, attr(frev(x, copy=FALSE), "att"), 1L)
# errors
test(2244.61, frev(data.table()), error= "should not be data.frame or data.table")
test(2244.62, frev(1:2, copy=NA), error= "must be TRUE or FALSE")
test(2244.63, frev(expression(1)), error= "is not supported by frev")
if (test_bit64) {
x = as.integer64(c(1, NA, 3))
test(2244.71, frev(x, copy=TRUE), rev(x))
test(2244.72, frev(x, copy=FALSE), as.integer64(c(3, NA, 1)))
}
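bit64's `integer64` class keeps 64-bit integers in the 8-byte slots of a double (`REALSXP`) vector, which is why `frev`'s C code reverses such input through an `int64_t` pointer. A standalone sketch showing that moving the raw 8-byte slots preserves each value, including bit64's `NA` (the smallest `int64_t`). The helper names are hypothetical, and `memcpy` is used instead of a pointer cast to stay within C's aliasing rules:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers mimicking integer64 storage: the bits of an int64_t
   are copied verbatim into a double-sized slot and back. */
static double int64_to_slot(int64_t v) { double d; memcpy(&d, &v, sizeof d); return d; }
static int64_t slot_to_int64(double d) { int64_t v; memcpy(&v, &d, sizeof v); return v; }

/* Reverse the slots through an int64_t view so that every bit pattern is
   moved unchanged, never interpreted as a floating-point value. */
static void rev_int64_slots(double *slots, size_t n) {
  for (size_t i = 0; i < n / 2; ++i) {
    size_t k = n - 1 - i;
    int64_t a, b;
    memcpy(&a, &slots[i], sizeof a);
    memcpy(&b, &slots[k], sizeof b);
    memcpy(&slots[i], &b, sizeof b);
    memcpy(&slots[k], &a, sizeof a);
  }
}
```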
32 changes: 32 additions & 0 deletions man/frev.Rd
@@ -0,0 +1,32 @@
\name{frev}
\alias{frev}
\alias{rev}
\title{Fast reverse}
\description{
Reverses the order of the elements of an atomic vector or list. Similar to \code{base::rev} but \emph{much faster}, and optionally in place.
}

\usage{
frev(x, copy=TRUE)
}
\arguments{
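\item{x}{ An atomic \code{vector} or \code{list}. A \code{data.frame} or \code{data.table} is not accepted. }
\item{copy}{ \code{TRUE} or \code{FALSE} (default \code{TRUE}). If \code{FALSE}, \code{x} is reversed in place, i.e. by reference, so the caller's object is modified. }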
}

\value{
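\code{x} with its elements in reverse order. Names are reversed along with the elements; all other attributes (e.g. \code{class}, \code{levels}) are kept unchanged. With \code{copy=FALSE}, the modified \code{x} itself is returned.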
}

\examples{
# on vectors
x = setNames(1:26, letters)
frev(x[1:10])
# reverse in-place
frev(x, copy=FALSE)
x

# list
frev(list(1, "a"))
}
\keyword{ data }
1 change: 1 addition & 0 deletions src/data.table.h
@@ -249,6 +249,7 @@ SEXP islockedR(SEXP x);
bool need2utf8(SEXP x);
SEXP coerceUtf8IfNeeded(SEXP x);
SEXP coerceAs(SEXP x, SEXP as, SEXP copyArg);
SEXP frev(SEXP x, SEXP copyArg);

// types.c
char *end(char *start);
1 change: 1 addition & 0 deletions src/init.c
@@ -141,6 +141,7 @@ R_CallMethodDef callMethods[] = {
{"CstartsWithAny", (DL_FUNC)&startsWithAny, -1},
{"CconvertDate", (DL_FUNC)&convertDate, -1},
{"Cnotchin", (DL_FUNC)&notchin, -1},
{"Cfrev", (DL_FUNC)&frev, -1},
{NULL, NULL, 0}
};

99 changes: 99 additions & 0 deletions src/utils.c
@@ -424,3 +424,102 @@ SEXP startsWithAny(const SEXP x, const SEXP y, SEXP start) {
return ScalarLogical(false);
}

SEXP frev(SEXP x, SEXP copyArg) {
  if (INHERITS(x, char_dataframe) || INHERITS(x, char_datatable))
    error(_("'x' should not be data.frame or data.table."));
  if (!IS_TRUE_OR_FALSE(copyArg))
    error(_("%s must be TRUE or FALSE."), "copy");
  int nprotect = 0;
  if (LOGICAL(copyArg)[0]) {
    // copy=TRUE: reverse a duplicate (attributes included) so the caller's x is untouched
    x = PROTECT(duplicate(x));
    nprotect++;
  }
  // xlength() and R_xlen_t rather than LENGTH() and int so long vectors are supported
  const R_xlen_t n = xlength(x);
  if (n==0) {
    UNPROTECT(nprotect);
    return x;
  }
  SEXP names = getAttrib(x, R_NamesSymbol); // reachable from x, so no separate PROTECT
  // Each iteration swaps the disjoint pair (i, n-1-i), so loops over raw C
  // buffers can be split across threads without data races.
  switch (TYPEOF(x)) {
  case LGLSXP: case INTSXP: {
    int *restrict xd = INTEGER(x);
    #pragma omp parallel for num_threads(getDTthreads(n, true))
    for (R_xlen_t i=0; i<n/2; ++i) {
      const R_xlen_t k = n-1-i;
      const int tmp = xd[i];
      xd[i] = xd[k];
      xd[k] = tmp;
    }
  } break;
  case REALSXP: if (INHERITS(x, char_integer64)) {
    int64_t *xd = (int64_t *)REAL(x);
    #pragma omp parallel for num_threads(getDTthreads(n, true))
    for (R_xlen_t i=0; i<n/2; ++i) {
      const R_xlen_t k = n-1-i;
      const int64_t tmp = xd[i];
      xd[i] = xd[k];
      xd[k] = tmp;
    }
  } else {
    double *xd = REAL(x);
    #pragma omp parallel for num_threads(getDTthreads(n, true))
    for (R_xlen_t i=0; i<n/2; ++i) {
      const R_xlen_t k = n-1-i;
      const double tmp = xd[i];
      xd[i] = xd[k];
      xd[k] = tmp;
    }
  } break;
  case STRSXP: {
    // SET_STRING_ELT is an R API call and not thread-safe, so this loop stays serial
    for (R_xlen_t i=0; i<n/2; ++i) {
      const R_xlen_t k = n-1-i;
      const SEXP tmp = STRING_ELT(x, i);
      SET_STRING_ELT(x, i, STRING_ELT(x, k));
      SET_STRING_ELT(x, k, tmp);
    }
  } break;
  case VECSXP: {
    // SET_VECTOR_ELT is not thread-safe either, so this loop stays serial
    for (R_xlen_t i=0; i<n/2; ++i) {
      const R_xlen_t k = n-1-i;
      const SEXP tmp = VECTOR_ELT(x, i);
      SET_VECTOR_ELT(x, i, VECTOR_ELT(x, k));
      SET_VECTOR_ELT(x, k, tmp);
    }
  } break;
  case CPLXSXP: {
    Rcomplex *xd = COMPLEX(x);
    #pragma omp parallel for num_threads(getDTthreads(n, true))
    for (R_xlen_t i=0; i<n/2; ++i) {
      const R_xlen_t k = n-1-i;
      const Rcomplex tmp = xd[i];
      xd[i] = xd[k];
      xd[k] = tmp;
    }
  } break;
  case RAWSXP: {
    Rbyte *xd = RAW(x);
    #pragma omp parallel for num_threads(getDTthreads(n, true))
    for (R_xlen_t i=0; i<n/2; ++i) {
      const R_xlen_t k = n-1-i;
      const Rbyte tmp = xd[i];
      xd[i] = xd[k];
      xd[k] = tmp;
    }
  } break;
  default:
    error(_("Type '%s' is not supported by frev"), type2char(TYPEOF(x)));
  }
  // names belong to their elements, so reverse them in place as well
  if (!isNull(names)) {
    frev(names, ScalarLogical(FALSE));
  }
  UNPROTECT(nprotect);
  return x;
}
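Each iteration of the swap loops touches only the pair (i, n-1-i), and for i < n/2 those pairs never overlap, so the iterations can be split across threads without a data race. A standalone sketch of such a loop on a `double` array, using a plain OpenMP pragma in place of data.table's `getDTthreads()` (compilers built without OpenMP ignore the pragma and run the loop serially with the same result; `rev_double_par` is a hypothetical name):

```c
#include <stddef.h>

/* Swap pairs (i, n-1-i) for i < n/2. Different i give disjoint pairs, so
   the iterations are independent and may run in parallel under OpenMP. */
static void rev_double_par(double *x, ptrdiff_t n) {
  #pragma omp parallel for
  for (ptrdiff_t i = 0; i < n / 2; ++i) {
    const ptrdiff_t k = n - 1 - i;
    const double tmp = x[i];
    x[i] = x[k];
    x[k] = tmp;
  }
}
```

This independence argument covers plain buffers (`INTEGER()`, `REAL()`, `COMPLEX()`, `RAW()`) but not `STRSXP`/`VECSXP`, whose elements are set through R API calls (`SET_STRING_ELT`, `SET_VECTOR_ELT`) that R does not guarantee to be thread-safe.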