Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check for index in setkey #3582

Merged
merged 21 commits into from
May 28, 2019
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 19 additions & 5 deletions R/setkey.R
Original file line number Diff line number Diff line change
Expand Up @@ -86,12 +86,26 @@ setkeyv <- function(x, cols, verbose=getOption("datatable.verbose"), physical=TR
if (!typeof(.xi) %chin% c("integer","logical","character","double")) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported as a key column type, currently.")
}
if (!is.character(cols) || length(cols)<1L) stop("Internal error. 'cols' should be character at this point in setkey; please report.") # nocov
if (verbose) {
tt = suppressMessages(system.time(o <- forderv(x, cols, sort=TRUE, retGrp=FALSE))) # system.time does a gc, so we don't want this always on, until refcnt is on by default in R
# suppress needed for tests 644 and 645 in verbose mode
cat("forder took", tt["user.self"]+tt["sys.self"], "sec\n")

# get existing index name if any
found_index <- NULL
Copy link
Member

@mattdowle mattdowle May 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer = in data.table please. I don't agree with the common and widely marketed advice to use <-. Single = is the same as C and many other languages. I sometimes hear, e.g. from Python folk, that R code using <- looks "old"/"not modern" and I see their point. I've always used =. I save <- for use when passing function arguments to do an assign and pass in one go; e.g. write(DT, file=f<-tempfile()); ... do something with f ... . I'm often swapping between C and R (which I think more people should do too since C is not as hard as some people want you to believe). When doing this, using = to mean assign and == to mean equals in two languages consistently (R and C) is nice. And by the way, the people who laugh at R for using L postfix to mean integer ... it comes from C (it's the same as C) and it makes a lot of sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for detailed insight. made the change.

if(!is.null(indices(x))){
found_index <- names(attributes(attributes(x)$index))
found_index <- gsub("^__","", found_index)
}

# forder only if index is not present
if(!identical(found_index, cols)){
Copy link
Member

@mattdowle mattdowle May 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works if there is one index, as the test tests. But there can be multiple indexes (each index is a set of columns). It needs to find if any of the indexes match cols and then use that one. The test needs expanding for cases of multiple indexes and cols both existing and not existing when there are multiple (so it tests it picks up the correct one).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it.

if (verbose) {
tt = suppressMessages(system.time(o <- forderv(x, cols, sort=TRUE, retGrp=FALSE))) # system.time does a gc, so we don't want this always on, until refcnt is on by default in R
# suppress needed for tests 644 and 645 in verbose mode
cat("forder took", tt["user.self"]+tt["sys.self"], "sec\n")
} else {
o <- forderv(x, cols, sort=TRUE, retGrp=FALSE)
}
} else {
o <- forderv(x, cols, sort=TRUE, retGrp=FALSE)
cat("using existing index for", found_index, "\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs an if (verbose) please otherwise this is always being printed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed it.

o <- attr(attributes(x)$index, which=found_index, exact = TRUE)
}
if (!physical) {
if (is.null(attr(x,"index",exact=TRUE))) setattr(x, "index", integer())
Expand Down
6 changes: 6 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -6029,6 +6029,12 @@ thisDT <- copy(DT)[, c("aaa", "b") := 2]
test(1419.58, indices(thisDT), c("a", "ab"))
test(1419.59, allIndicesValid(thisDT), TRUE)

## setkey on same col as index before
DT <- data.table(a = c(1,1,1,1,1,2,2,2,2,2),
aaa = c(1,1,2,2,2,1,1,2,2,2))
setindex(DT, a)
test(1419.60, allIndicesValid(DT), TRUE)
test(1419.61, setkey(DT, a), output="using existing index for a")

# setnames updates secondary key
DT = data.table(a=1:5,b=10:6)
Expand Down