Closes #899. rbindlist combines factor levels properly.
arunsrinivasan committed Oct 19, 2014
1 parent ee9fb8c commit 67ace19
Showing 3 changed files with 12 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -47,6 +47,8 @@

14. `subset` handles extracting duplicate columns in consistency with data.table's rule - if a column name is duplicated, then accessing that column using column number should return that column, whereas accessing by column name (due to ambiguity) will always extract the first column. Closes [#891](https://github.com/Rdatatable/data.table/issues/891). Thanks to @jjzz.

15. `rbindlist` handles combining levels of data.tables with both ordered and unordered factor columns properly. Closes [#899](https://github.com/Rdatatable/data.table/issues/899). Thanks to @ChristK.

#### NOTES

1. Clearer explanation of what `duplicated()` does (borrowed from base). Thanks to @matthieugomez for pointing out. Closes [#872](https://github.com/Rdatatable/data.table/issues/872).
7 changes: 7 additions & 0 deletions inst/tests/tests.Rraw
@@ -5408,6 +5408,13 @@ test(1391.2, subset(DT, select=c("V2", "V1")), DT[, c("V2", "V1"), with=FALSE])
DT = data.table(x=sample(c(1:2, NA), 30, TRUE), y=sample(c(1:5, NA, NaN), 30, TRUE))
test(1392, na.omit(DT), DT[!is.na(x) & !is.na(y)])

# Fix for #899. Mix of ordered and normal factors where normal factors in more than 1 data.table has identical levels.
DT1 = data.table(A = factor(INT(7,8,7,8,7)), B = factor(6:10), C = 0)
DT2 = data.table(D = ordered(1:5), A = factor(INT(1:2,1:2,1L)), C = 0)
DT3 = data.table(A = factor(INT(7:8)), C = 0)
ans = data.table(A=factor(INT(7,8,7,8,7,1,2,1,2,1,7,8), levels=c("7", "8", "1", "2")), B=factor(INT(6:10, rep(NA,7))), C=0, D=ordered(INT(rep(NA,5), 1:5, rep(NA,2))))
test(1393, rbindlist(list(DT1, DT2, DT3), fill = TRUE), ans)

##########################
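
For readers who want to see what test 1393 expects of column `A`, here is a minimal sketch in C of combining level sets in order of first appearance and remapping the integer codes. It is an illustration only, not data.table's implementation: the helper names `find_level`, `merge_levels` and `remap_code` are hypothetical, plain C strings stand in for R character vectors, a linear scan stands in for the hash table, and the special handling of ordered factors is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of data.table): 0-based position of `s`
 * in levels[0..n), or -1 if it is absent. Linear scan for clarity. */
static int find_level(const char *const *levels, size_t n, const char *s) {
    for (size_t k = 0; k < n; ++k)
        if (strcmp(levels[k], s) == 0) return (int)k;
    return -1;
}

/* Append the levels of one input to the combined set, skipping any level
 * that is already present. Returns the new number of combined levels. */
static size_t merge_levels(const char **combined, size_t n_combined, size_t cap,
                           const char *const *levels, size_t n_levels) {
    for (size_t j = 0; j < n_levels; ++j) {
        if (find_level(combined, n_combined, levels[j]) < 0) {
            assert(n_combined < cap);  /* caller must provide enough room */
            combined[n_combined++] = levels[j];
        }
    }
    return n_combined;
}

/* Map a 1-based code in an input's own levels to the 1-based code in the
 * combined levels. Returns 0 if the level is missing from the combined set. */
static int remap_code(const char *const *combined, size_t n_combined,
                      const char *const *levels, int code) {
    return find_level(combined, n_combined, levels[code - 1]) + 1;
}
```

Merging the level sets `{"7","8"}`, `{"1","2"}` and `{"7","8"}` in that order yields the four combined levels `"7","8","1","2"`, matching the `levels=` argument of `ans` in the test. The third set contributes nothing new, which is the situation the fix targets.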


3 changes: 3 additions & 0 deletions src/rbindlist.c
@@ -277,6 +277,9 @@ static SEXP combineFactorLevels(SEXP factorLevels, int * factorType, Rboolean *
while (h[idx] != NULL) {
    pl = h[idx];
    if (data.equal(VECTOR_ELT(factorLevels, pl->i), pl->j, elem, j)) {
        // Fixes #899. "rest" can have identical levels in
        // more than 1 data.table.
        if (!(pl->i == i && pl->j == j)) break;
        record = TRUE;
        do {
            // if this element was in an ordered list, it's been recorded already
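
One way to read the added guard is sketched below as standalone C. The hunk is truncated, so this rests on assumptions: that each bucket chains every occurrence of an equal level, that the chain head is the first occurrence, and that a nonzero `ordered[k]` marks an input whose levels were emitted earlier. The names `struct occ` and `should_record` are hypothetical and do not appear in `rbindlist.c`.

```c
#include <stddef.h>

/* One occurrence of a level string: input index i, position j within it. */
struct occ {
    int i, j;
    const char *str;
    struct occ *next;  /* next occurrence of an equal string, or NULL */
};

/* Decide whether occurrence (i, j) should emit its level into the result.
 * `head` is the chain of all occurrences equal to that level.
 * ordered[k] is nonzero if input k is an ordered factor (assumption: its
 * levels were already emitted in an earlier pass). */
static int should_record(const struct occ *head, int i, int j,
                         const int *ordered) {
    /* The guard: only the head occurrence speaks for the whole chain, so an
     * identical level coming from another input is not emitted again. */
    if (!(head->i == i && head->j == j)) return 0;

    /* If any equal occurrence came from an ordered input, the level has
     * been recorded already. */
    for (const struct occ *p = head; p != NULL; p = p->next)
        if (ordered[p->i]) return 0;

    return 1;
}
```

Under these assumptions, a level shared by inputs 0 and 2 is recorded when the loop visits input 0 and skipped when it reaches input 2, so the combined levels contain it exactly once.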