Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

revdep failure from dcast error if fun.aggregate returns length>1 #6032

Closed
tdhock opened this issue Mar 29, 2024 · 10 comments · Fixed by #6329
Closed

revdep failure from dcast error if fun.aggregate returns length>1 #6032

tdhock opened this issue Mar 29, 2024 · 10 comments · Fixed by #6329
Assignees
Labels
revdep Reverse dependencies
Milestone

Comments

@tdhock
Copy link
Member

tdhock commented Mar 29, 2024

after merging #5549 we get the following two new revdep check errors, not sure why, still need to investigate.

* checking examples ... ERROR
Running examples in 'mschart-Ex.R' failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: chart_data_smooth
> ### Title: Smooth series
> ### Aliases: chart_data_smooth
> 
> ### ** Examples
> 
> linec <- ms_linechart(data = iris, x = "Sepal.Length",
+   y = "Sepal.Width", group = "Species")
Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.
Execution halted

and


* checking examples ... ERROR
Running examples in 'maditr-Ex.R' failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: to_long
> ### Title: Convert data to long or to wide form
> ### Aliases: to_long to_wide
> 
> ### ** Examples
> 
> data(iris)
> 
> # 'to_long'
> 
> long_iris = iris %>%
+     to_long(keep = Species)
> 
> long_iris
       Species     variable value
        <fctr>       <fctr> <num>
  1:    setosa Sepal.Length   5.1
  2:    setosa Sepal.Length   4.9
  3:    setosa Sepal.Length   4.7
  4:    setosa Sepal.Length   4.6
  5:    setosa Sepal.Length   5.0
 ---                             
596: virginica  Petal.Width   2.3
597: virginica  Petal.Width   1.9
598: virginica  Petal.Width   2.0
599: virginica  Petal.Width   2.3
600: virginica  Petal.Width   1.8
> 
> iris_with_stat = long_iris %>%
+     take(mean = mean(value),
+          sd = sd(value),
+          n = .N*1.0,
+          by = .(Species, variable)
+     ) %>%
+     to_long(columns = c(mean, sd, n), names_in = "stat")
> 
> # 'to_wide' - table with multiple stats
> iris_with_stat %>%
+     to_wide()
Key: <Species, stat>
      Species   stat Sepal.Length Sepal.Width Petal.Length Petal.Width
       <fctr> <fctr>        <num>       <num>        <num>       <num>
1:     setosa   mean    5.0060000   3.4280000    1.4620000   0.2460000
2:     setosa     sd    0.3524897   0.3790644    0.1736640   0.1053856
3:     setosa      n   50.0000000  50.0000000   50.0000000  50.0000000
4: versicolor   mean    5.9360000   2.7700000    4.2600000   1.3260000
5: versicolor     sd    0.5161711   0.3137983    0.4699110   0.1977527
6: versicolor      n   50.0000000  50.0000000   50.0000000  50.0000000
7:  virginica   mean    6.5880000   2.9740000    5.5520000   2.0260000
8:  virginica     sd    0.6358796   0.3224966    0.5518947   0.2746501
9:  virginica      n   50.0000000  50.0000000   50.0000000  50.0000000
> 
> 
> iris_with_stat %>%
+     to_wide(names_in = c(variable, stat))
Key: <Species>
      Species Sepal.Length_mean Sepal.Length_sd Sepal.Length_n Sepal.Width_mean
       <fctr>             <num>           <num>          <num>            <num>
1:     setosa             5.006       0.3524897             50            3.428
2: versicolor             5.936       0.5161711             50            2.770
3:  virginica             6.588       0.6358796             50            2.974
   Sepal.Width_sd Sepal.Width_n Petal.Length_mean Petal.Length_sd
            <num>         <num>             <num>           <num>
1:      0.3790644            50             1.462       0.1736640
2:      0.3137983            50             4.260       0.4699110
3:      0.3224966            50             5.552       0.5518947
   Petal.Length_n Petal.Width_mean Petal.Width_sd Petal.Width_n
            <num>            <num>          <num>         <num>
1:             50            0.246      0.1053856            50
2:             50            1.326      0.1977527            50
3:             50            2.026      0.2746501            50
> 
> iris_with_stat %>%
+     to_wide(names_in = c(variable, Species))
Key: <stat>
     stat Sepal.Length_setosa Sepal.Length_versicolor Sepal.Length_virginica
   <fctr>               <num>                   <num>                  <num>
1:   mean           5.0060000               5.9360000              6.5880000
2:     sd           0.3524897               0.5161711              0.6358796
3:      n          50.0000000              50.0000000             50.0000000
   Sepal.Width_setosa Sepal.Width_versicolor Sepal.Width_virginica
                <num>                  <num>                 <num>
1:          3.4280000              2.7700000             2.9740000
2:          0.3790644              0.3137983             0.3224966
3:         50.0000000             50.0000000            50.0000000
   Petal.Length_setosa Petal.Length_versicolor Petal.Length_virginica
                 <num>                   <num>                  <num>
1:            1.462000                4.260000              5.5520000
2:            0.173664                0.469911              0.5518947
3:           50.000000               50.000000             50.0000000
   Petal.Width_setosa Petal.Width_versicolor Petal.Width_virginica
                <num>                  <num>                 <num>
1:          0.2460000              1.3260000             2.0260000
2:          0.1053856              0.1977527             0.2746501
3:         50.0000000             50.0000000            50.0000000
> 
> # 'to_wide' - aggregation function
> long_iris %>%
+     to_wide(fun = list(Mean = mean, SD = sd, N = length))
Key: <Species>
      Species value_Mean_Sepal.Length value_Mean_Sepal.Width
       <fctr>                   <num>                  <num>
1:     setosa                   5.006                  3.428
2: versicolor                   5.936                  2.770
3:  virginica                   6.588                  2.974
   value_Mean_Petal.Length value_Mean_Petal.Width value_SD_Sepal.Length
                     <num>                  <num>                 <num>
1:                   1.462                  0.246             0.3524897
2:                   4.260                  1.326             0.5161711
3:                   5.552                  2.026             0.6358796
   value_SD_Sepal.Width value_SD_Petal.Length value_SD_Petal.Width
                  <num>                 <num>                <num>
1:            0.3790644             0.1736640            0.1053856
2:            0.3137983             0.4699110            0.1977527
3:            0.3224966             0.5518947            0.2746501
   value_N_Sepal.Length value_N_Sepal.Width value_N_Petal.Length
                  <int>               <int>                <int>
1:                   50                  50                   50
2:                   50                  50                   50
3:                   50                  50                   50
   value_N_Petal.Width
                 <int>
1:                  50
2:                  50
3:                  50
> 
> # multiple variables
> iris %>%
+     to_long(list(Sepal = cols("^Sepal"), Petal = cols("^Petal"))) %>%
+     let(
+         variable = factor(variable, levels = 1:2, labels = c("Length", "Width"))
+     ) %>%
+     to_wide(values_in = c(Sepal, Petal))
Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.
Execution halted
@tdhock tdhock self-assigned this Mar 29, 2024
@tdhock tdhock added the revdep Reverse dependencies label Mar 29, 2024
@tdhock
Copy link
Member Author

tdhock commented Mar 29, 2024

in #5549 the condition for this error message was changed from
if (nrow(fill.default) != 1L) to
if (any(lengths(list.of.columns) != 1L))

@tdhock
Copy link
Member Author

tdhock commented Mar 29, 2024

looks like the issue may be with the revdeps. At least in mcchart, fun.aggregate=identity, which is indeed invalid, since identity usually returns vector with length>1, whereas fun.aggregate is supposed to return vector of length=1, as documented in the error message.

Browse[2]> fun.call
list(Sepal.Width = (function (x) 
{
    x
})(Sepal.Width))
Browse[2]> list.of.columns
$Sepal.Width
[1] 3.5 3.5 3.8 3.7 3.3 3.4 3.8 3.8

@tdhock
Copy link
Member Author

tdhock commented Mar 29, 2024

same with maditr, fun.aggregate=identity is being used.

Browse[2]> dat
       Species variable Sepal Petal
        <fctr>   <fctr> <num> <num>
  1:    setosa   Length   5.1   1.4
  2:    setosa   Length   4.9   1.4
  3:    setosa   Length   4.7   1.3
  4:    setosa   Length   4.6   1.5
  5:    setosa   Length   5.0   1.4
 ---                               
296: virginica    Width   3.0   2.3
297: virginica    Width   2.5   1.9
298: virginica    Width   3.0   2.0
299: virginica    Width   3.4   2.3
300: virginica    Width   3.0   1.8
Browse[2]> varnames
[1] "Species"  "variable"
Browse[2]> list.of.columns
$Sepal
 [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7
[20] 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9
[39] 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0

$Petal
 [1] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 1.5 1.6 1.4 1.1 1.2 1.5 1.3 1.4 1.7
[20] 1.5 1.7 1.5 1.0 1.7 1.9 1.6 1.6 1.5 1.4 1.6 1.6 1.5 1.5 1.4 1.5 1.2 1.3 1.4
[39] 1.3 1.5 1.3 1.3 1.3 1.6 1.9 1.4 1.6 1.4 1.5 1.4

Browse[2]> fun.call
list(Sepal = (function (x) 
x)(Sepal), Petal = (function (x) 
x)(Petal))

@tdhock
Copy link
Member Author

tdhock commented Mar 29, 2024

this is documented in ?dcast details, so I will file issues with revdeps.

     The aggregating function should take a vector as input and return
     a single value (or a list of length one) as output. In cases where
     ‘value.var’ is a list, the function should be able to handle a
     list input and provide a single value or list of length one as
     output.

@tdhock
Copy link
Member Author

tdhock commented Jul 16, 2024

We have a new revdep WES, Published: 2024-07-12, I will file an issue.

* checking examples ... ERROR
Running examples in 'WES-Ex.R' failed
The error most likely occurred in:

> base::assign(".ptime", proc.time(), pos = "CheckExEnv")
> ### Name: apply_delta_delta_ct
> ### Title: Apply the delta delta Ct calculation to a data.frame
> ### Aliases: apply_delta_delta_ct
> 
> ### ** Examples
> 
> 
> pae <- apply_amplification_efficiency(template_WES_standard_curve)
> 
> ddct_standard <- apply_delta_delta_ct(df = template_WES_data,
+                                       target_names = c('target_1', 'target_2', 'target_3'),
+                                       reference_names = rep('target_0', 3))
Error in maybe_err(eval(fun.call)) : 
  Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.
Calls: apply_delta_delta_ct ... [.data.table -> maybe_err -> stopf -> raise_condition -> signal
Execution halted

@tdhock tdhock changed the title revdeps maditr, mschart example failures revdep failure from dcast error if fun.aggregate returns length>1 Jul 16, 2024
@MichaelChirico
Copy link
Member

@tdhock given that this broke several downstreams, can we be sure to mention this under BREAKING CHANGES? (or possibly better, adjust our code so that it doesn't break the downstreams, e.g. a branch with if (identical(fun.aggregate, identity)) ... that starts a deprecation cycle)

Here are two more CRAN downstreams that were not caught by revdeps, because downstream usage is not in a tested region of code:

https://github.com/search?q=org%3Acran%20%2Ffun%5B.%5Daggregate%5Cs*%3D%5Cs*identity%2F&type=code

And several more such downstreams on GitHub:

https://github.com/search?q=lang%3AR+%2Ffun%5B.%5Daggregate%5Cs*%3D%5Cs*identity%2F&type=code

@tdhock
Copy link
Member Author

tdhock commented Jul 30, 2024

Thanks for your investigation Michael, that is useful.
There are possibly uses of other functions than identity, which return length>1, and would be additional breakages.
Not sure if this warrants a "deprecation" cycle since it was already documented, and this "breakage" brings the implementation up to date with the docs. But I guess we could.
Which option do you think would be more work, or less work, for us and downstreams?

@MichaelChirico
Copy link
Member

There are possibly uses of other functions than identity, which return length>1, and would be additional breakages.

Definitely possible -- have we seen any though? I would be happy special-casing this fun.aggregate=identity given how common it is empirically:

if (identical(fun.aggregate, identity)) {
  warningf("This is no longer supported, here's how to do what you wanted to... This will become an error on next release")
  return(old_behavior())
}

It looks (?) like the old behavior is equivalent to fun.aggregate = function(x) x[1L], is that right?

DT=data.table(a = rep(1:3, 2), b=rep(1:2, each=3), c=1:6)
DT = rbind(DT, DT)

dcast(DT, a ~ b, fun.aggregate=identity, value.var='c', fill=0)
# Key: <a>
#        a     1     2
#    <int> <int> <int>
# 1:     1     1     4
# 2:     2     2     5
# 3:     3     3     6

so that branch could be fun.aggregate <- function(x) x[1L] instead of return(old_behavior())

@tdhock
Copy link
Member Author

tdhock commented Jul 31, 2024

Hi Michael
I have not seen any problematic functions other than identity.
However I don't think we need to special case that.
In #6329 I propose to change the error to a warning, and that should be enough to alert revdeps, while avoiding any breaking changes.
Also I believe the old/undefined behavior was to use the last value (not first),

> library(data.table)
data.table 1.15.4 using 3 threads (see ?getDTthreads).  Latest news: r-datatable.com
> dcast(data.table(y="a",x=1:2), y ~ ., identity, fill=NA)
Using 'x' as value column. Use 'value.var' to override
Key: <y>
        y     .
   <char> <int>
1:      a     2

@tdhock
Copy link
Member Author

tdhock commented Aug 3, 2024

revdep checker confirms this is no longer an issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
revdep Reverse dependencies
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants