Closes #686. Implemented 'rleid()', a convenience function.
arunsrinivasan committed Jan 7, 2015
1 parent c54cb93 commit b8c1b01
Showing 5 changed files with 63 additions and 0 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -22,6 +22,7 @@ export(frank)
export(frankv)
export(address)
export(.SD,.N,.I,.GRP,.BY,.EACHI)
export(rleid)

S3method("[", data.table)
S3method("[<-", data.table)
25 changes: 25 additions & 0 deletions R/data.table.R
@@ -2345,6 +2345,31 @@ setDT <- function(x, giveNames=TRUE, keep.rownames=FALSE) {
invisible(x)
}

# FR #686
rleid <- function(x, cols=seq_along(x)) {
    as_list <- function(x) {
        xx = vector("list", 1L)
        .Call(Csetlistelt, xx, 1L, x)
        xx
    }
    if (is.atomic(x)) {
        if (!missing(cols) && !is.null(cols))
            stop("x is a single vector, non-NULL 'cols' doesn't make sense")
        cols = 1L
        x = as_list(x)
    } else {
        if (!length(cols))
            stop("x is a list, 'cols' can not be 0-length")
        if (is.character(cols))
            cols = chmatch(cols, names(x))
        cols = as.integer(cols)
    }
    x = .shallow(x, cols) # shallow copy even if list..
    setDT(x)
    ulist = uniqlist(x)
    rep.int(seq_along(ulist), uniqlengths(ulist, nrow(x)))
}

gsum <- function(x, na.rm=FALSE) .Call(Cgsum, x, na.rm)
gmean <- function(x, na.rm=FALSE) .Call(Cgmean, x, na.rm)
gmin <- function(x, na.rm=FALSE) .Call(Cgmin, x, na.rm)
2 changes: 2 additions & 0 deletions README.md
@@ -26,6 +26,8 @@

6. `frank()` is now implemented. It's much faster than `base::rank` and does more. It accepts *vectors*, *lists* with all elements of equal lengths, *data.frames* and *data.tables*, and optionally takes a `cols` argument. In addition to implementing all the `ties.method` methods available from `base::rank`, it also implements *dense rank*. See `?frank` for more. Closes [#760](https://github.com/Rdatatable/data.table/issues/760) and [#771](https://github.com/Rdatatable/data.table/issues/771)

7. `rleid()`, a convenience function for generating a run-length type id column to be used in grouping operations, is now implemented. Closes [#686](https://github.com/Rdatatable/data.table/issues/686). See the examples section of `?rleid` for usage scenarios.

#### BUG FIXES

1. `if (TRUE) DT[,LHS:=RHS]` no longer prints, [#869](https://github.com/Rdatatable/data.table/issues/869). Tests added. To get this to work we've had to live with one downside: if a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` is typed at the prompt, nothing will be printed. A repeated `DT` will print. To avoid this: include a `DT[]` after the last `:=` in your function. If that is not possible (e.g., it's not a function you can change) then `print(DT)` and `DT[]` at the prompt are guaranteed to print. As before, adding an extra `[]` on the end of `:=` query is a recommended idiom to update and then print; e.g. `> DT[,foo:=3L][]`. Thanks to Jureiss for reporting.
5 changes: 5 additions & 0 deletions inst/tests/tests.Rraw
@@ -5716,6 +5716,11 @@ test(1463.24, shift(x,1L, 0L, type="lead"), list(as.character(c(2:5, 0L))))

# add tests for date and factor?

# FR #686
DT = data.table(grp=rep(c("A", "B", "C", "A", "B"),
c(2,2,3,1,2)), value=1:10)
test(1464, rleid(DT, "grp"), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L))

##########################


30 changes: 30 additions & 0 deletions man/rleid.Rd
@@ -0,0 +1,30 @@
\name{rleid}
\alias{rleid}
\title{ Generate run-length type group id}
\description{
A convenience function for generating a \emph{run-length} type \emph{id} column to be used in grouping operations. It accepts atomic vectors, lists, data.frames or data.tables as input.
}
\usage{
rleid(x, cols=seq_along(x))
}
\arguments{
\item{x}{ A vector, list, data.frame or data.table. }
\item{cols}{ Only meaningful for lists, data.frames or data.tables. A character vector of column names (or numbers) of x. }
}
\details{
At times, aggregation (or grouping) operations need to be performed where consecutive runs of identical values should belong to the same group (see \code{\link[base]{rle}}). The need for such a function has come up repeatedly on StackOverflow; see the \code{See Also} section. This function generates \emph{"run-length"} groups directly.
}
\value{
An integer vector with the same length as \code{NROW(x)}.
}
\examples{
DT = data.table(grp=rep(c("A", "B", "C", "A", "B"), c(2,2,3,1,2)), value=1:10)
rleid(DT, "grp") # get run-length ids
# get sum of value over run-length groups
DT[, sum(value), by=.(grp, rleid(grp))]

}
\seealso{
\code{\link{data.table}}, \url{http://stackoverflow.com/q/21421047/559784}
}
\keyword{ data }
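
For readers without this version of data.table installed, the run-length ids that the man page describes can be approximated in base R. The following is only a minimal sketch for a single atomic vector, not the implementation in this commit (which goes through `uniqlist` and `uniqlengths`). The name `rleid_sketch` is hypothetical, and `base::rle()` starts a new run at every `NA`, which may differ from how `rleid()` treats consecutive missing values.

```r
# Hypothetical base-R helper, for illustration only; not part of data.table.
rleid_sketch <- function(x) {
  runs <- rle(as.vector(x))   # lengths of consecutive runs of identical values
  # Number the runs 1, 2, ... and repeat each number over the length of its run
  rep.int(seq_along(runs$lengths), runs$lengths)
}

# Same data as the test and the man page example in this commit
grp   <- rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2))
value <- 1:10

ids <- rleid_sketch(grp)
print(ids)

# Sum `value` within each run, roughly mirroring
# DT[, sum(value), by=.(grp, rleid(grp))]
sums <- tapply(value, ids, sum)
print(sums)
```

On this input the ids match the vector expected by test 1464 above, and the second run of "A" gets its own id instead of being merged with the first, which is the point of grouping by run rather than by value.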
