Properly handling the "variable" column after melting unbalanced wide data #2575
Comments
This is the same issue as #4027 and is solved by #4720, which is not merged into master yet. You can use it via:
remotes::install_github(c("Rdatatable/data.table@fix4027", "tdhock/nc@multiple-fill"))
#> Skipping install of 'data.table' from a github remote, the SHA1 (4c5810c2) has not changed since last install.
#> Use `force = TRUE` to force installation
#> Skipping install of 'nc' from a github remote, the SHA1 (11b61f8e) has not changed since last install.
#> Use `force = TRUE` to force installation
library(data.table)
set.seed(2334)
DT <- data.table(
  a_alpha = rnorm(3), a_gamma = rnorm(3),
  b_beta = rnorm(3), b_gamma = rnorm(3),
  id = c(1:3)
)
(DT.tall <- nc::capture_melt_multiple(DT, column="[ab]", "_", var=".*", fill=TRUE))
#> id var a b
#> 1: 1 alpha -0.1183107 NA
#> 2: 2 alpha 1.2370906 NA
#> 3: 3 alpha 0.8088209 NA
#> 4: 1 beta NA 1.682018318
#> 5: 2 beta NA -0.573611132
#> 6: 3 beta NA -0.057320032
#> 7: 1 gamma -0.7656264 -0.706428227
#> 8: 2 gamma -0.5919939 0.001899857
#> 9: 3 gamma 0.5279071 1.063851211
DT.tall[order(id, var)]
#> id var a b
#> 1: 1 alpha -0.1183107 NA
#> 2: 1 beta NA 1.682018318
#> 3: 1 gamma -0.7656264 -0.706428227
#> 4: 2 alpha 1.2370906 NA
#> 5: 2 beta NA -0.573611132
#> 6: 2 gamma -0.5919939 0.001899857
#> 7: 3 alpha 0.8088209 NA
#> 8: 3 beta NA -0.057320032
#> 9: 3 gamma 0.5279071 1.063851211
Let's wait for the PR to be merged, with tests added and a news item provided.
Hi again. If you want to use the new data.table::melt without nc, you could do:
remotes::install_github("Rdatatable/data.table@fix4027")
#> Skipping install of 'data.table' from a github remote, the SHA1 (4c5810c2) has not changed since last install.
#> Use `force = TRUE` to force installation
library(data.table)
set.seed(2334)
DT <- data.table(
  a_alpha = rnorm(3), a_gamma = rnorm(3),
  b_beta = rnorm(3), b_gamma = rnorm(3),
  id = c(1:3)
)
(DT.tall <- melt(DT, measure.vars=list(
  a=c("a_alpha", NA, "a_gamma"),
  b=c(NA, "b_beta", "b_gamma"))))
#> id variable a b
#> 1: 1 1 -0.1183107 NA
#> 2: 2 1 1.2370906 NA
#> 3: 3 1 0.8088209 NA
#> 4: 1 2 NA 1.682018318
#> 5: 2 2 NA -0.573611132
#> 6: 3 2 NA -0.057320032
#> 7: 1 3 -0.7656264 -0.706428227
#> 8: 2 3 -0.5919939 0.001899857
#> 9: 3 3 0.5279071 1.063851211
DT.tall[, var := c("alpha", "beta", "gamma")[variable] ]
DT.tall[order(id, var), .(id, var, a, b)]
#> id var a b
#> 1: 1 alpha -0.1183107 NA
#> 2: 1 beta NA 1.682018318
#> 3: 1 gamma -0.7656264 -0.706428227
#> 4: 2 alpha 1.2370906 NA
#> 5: 2 beta NA -0.573611132
#> 6: 2 gamma -0.5919939 0.001899857
#> 7: 3 alpha 0.8088209 NA
#> 8: 3 beta NA -0.057320032
#> 9: 3 gamma 0.5279071 1.063851211
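The hand-written measure list above can also be built from the column names, which may help when there are many prefixes and suffixes. The following is only a sketch: it assumes a data.table build that accepts NA entries in a measure.vars list (the change from #4720), and the helper names measure_cols, prefixes, suffixes and measure_list are made up for illustration.

```r
library(data.table)
set.seed(2334)
DT <- data.table(
  a_alpha = rnorm(3), a_gamma = rnorm(3),
  b_beta = rnorm(3), b_gamma = rnorm(3),
  id = 1:3
)
# Columns to melt are assumed to be exactly those containing "_".
measure_cols <- grep("_", names(DT), value = TRUE)
parts <- tstrsplit(measure_cols, "_", fixed = TRUE)
prefixes <- unique(parts[[1]])        # "a", "b"
suffixes <- sort(unique(parts[[2]]))  # "alpha", "beta", "gamma"
# One character vector per prefix, with NA where the column does not exist.
measure_list <- lapply(prefixes, function(p) {
  wanted <- paste(p, suffixes, sep = "_")
  ifelse(wanted %in% measure_cols, wanted, NA_character_)
})
tall <- melt(DT, id.vars = "id", measure.vars = measure_list,
             value.name = prefixes)
# Map the positional index in 'variable' back to the suffix label.
tall[, var := suffixes[as.integer(variable)]]
print(tall[order(id, var), .(id, var, a, b)])
```

The list is left unnamed and the output column names are passed through value.name, so the call does not rely on list names being picked up as value names.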
A pure data.table solution using #4731:
remotes::install_github("Rdatatable/data.table@melt-custom-variable")
#> Skipping install of 'data.table' from a github remote, the SHA1 (c02fa9e8) has not changed since last install.
#> Use `force = TRUE` to force installation
library(data.table)
set.seed(2334)
DT <- data.table(
  a_alpha = rnorm(3), a_gamma = rnorm(3),
  b_beta = rnorm(3), b_gamma = rnorm(3),
  id = c(1:3)
)
melt(DT, measure.vars=measure(value.name, var))
#> id var a b
#> 1: 1 alpha -0.1183107 NA
#> 2: 2 alpha 1.2370906 NA
#> 3: 3 alpha 0.8088209 NA
#> 4: 1 gamma -0.7656264 -0.706428227
#> 5: 2 gamma -0.5919939 0.001899857
#> 6: 3 gamma 0.5279071 1.063851211
#> 7: 1 beta NA 1.682018318
#> 8: 2 beta NA -0.573611132
#> 9: 3 beta NA -0.057320032
Thank you all very much for
The following code produces almost the expected result:
with the weird rows with "code" in the
Can confirm. You might need to either reopen this or open a separate issue.
I opened issue 4991, as it most likely is an issue of |
In reference to my comments at #2551, using patterns with melt on unbalanced wide or panel data results in an incorrect molten dataset.
Here's a minimal example, where "a" values are missing "beta" times, and "b" values are missing "alpha" times:
melt just sees that there are two sets of variables:
Expected behaviour:
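The reporter's original code blocks did not survive in this copy of the thread. As a rough sketch only, the behaviour described can be reproduced with the DT from the comments (an assumption; the reporter's own data may have differed). The regular expressions "^a_" and "^b_" are likewise chosen here for illustration.

```r
library(data.table)
set.seed(2334)
# Same unbalanced table as in the comments: there is no a_beta and no b_alpha.
DT <- data.table(
  a_alpha = rnorm(3), a_gamma = rnorm(3),
  b_beta = rnorm(3), b_gamma = rnorm(3),
  id = 1:3
)
# patterns() pairs columns purely by position within each group, so
# a_alpha is lined up with b_beta and a_gamma with b_gamma.
wrong <- melt(DT, id.vars = "id",
              measure.vars = patterns("^a_", "^b_"),
              value.name = c("a", "b"))
print(wrong)
```

In the result, the rows with variable equal to 1 hold "alpha" values in column a next to "beta" values in column b, which is the misalignment this issue describes.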