Transformation depends on other targets #1199

Closed
bart1 opened this issue Mar 2, 2020 · 3 comments

bart1 commented Mar 2, 2020

Please note that this code is derived from a much bigger plan. I tried to minimize it as much as possible, which is why it looks a bit clunky. I create the plan as shown below. Everything works fine as long as the target separate is not included.
But as soon as separate is included (second transformation below), the transformation assigns the wrong values to the radar column: every problem target gets the same radar attribute (see the "the problem" comment). I expected the values to be the same as before. I would like to bring these targets together later, hence the replication of names from radarSeasons.

require(drake)
#> Loading required package: drake

radars <- c('aa', 'NL')
seasons <- c('a', 'b')
months <- 1:2
radarSeasons <- expand.grid(radar = radars, season = seasons)
radarSeasons$radarSeason <-
  apply(radarSeasons, 1, paste0, collapse = '_')


plan_ut <- drake_plan(
  transform = FALSE,
  data = target(
    get_data(radar, month),
    transform = cross(radar = !!radars, month = !!months)
  ),
  toCross = target(
    {
      list(data)
    },
    transform = combine(data, .by = radar)
  ),
  problem = target(
    list(toCross, season),
    transform = cross(toCross, season = !!seasons)
  ),
  separate = target(
    list(radar, season),
    transform = map(.data = !!radarSeasons)
  )
)

plan_ut
#> # A tibble: 4 x 3
#>   target   command                transform                                
#>   <chr>    <expr>                 <expr>                                   
#> 1 data     get_data(radar, month) cross(radar = !!radars, month = !!months)
#> 2 toCross  {     list(data) }     combine(data, .by = radar)               
#> 3 problem  list(toCross, season)  cross(toCross, season = !!seasons)       
#> 4 separate list(radar, season)    map(.data = !!radarSeasons)

plan <- transform_plan(plan_ut[1:3, ], trace = TRUE)
plan[s <- grepl("problem", plan$target), ]$radar
#> [1] "\"aa\"" "\"aa\"" "\"NL\"" "\"NL\""
plan[s, ]$target
#> [1] "problem_a_toCross_aa" "problem_b_toCross_aa" "problem_a_toCross_NL"
#> [4] "problem_b_toCross_NL"
vis_drake_graph(plan, make_imports = FALSE)

plan <- transform_plan(plan_ut, trace = TRUE)
plan[s <- grepl("problem", plan$target), ]$radar # the problem
#> [1] "\"aa\"" "\"aa\"" "\"aa\"" "\"aa\""
plan[s, ]$target
#> [1] "problem_a_toCross_aa" "problem_b_toCross_aa" "problem_a_toCross_NL"
#> [4] "problem_b_toCross_NL"
vis_drake_graph(plan, make_imports = FALSE)

sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tidyselect_1.0.0 drake_7.10.0    
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.3        pillar_1.4.3      compiler_3.6.2    highr_0.8        
#>  [5] tools_3.6.2       digest_0.6.25     jsonlite_1.6.1    lubridate_1.7.4  
#>  [9] evaluate_0.14     tibble_2.1.3      pkgconfig_2.0.3   rlang_0.4.4      
#> [13] igraph_1.2.4.2    cli_2.0.1         filelock_1.0.2    yaml_2.2.1       
#> [17] parallel_3.6.2    xfun_0.12         storr_1.2.1       stringr_1.4.0    
#> [21] knitr_1.28        vctrs_0.2.3       htmlwidgets_1.5.1 webshot_0.5.2    
#> [25] glue_1.3.1        R6_2.4.1          processx_3.4.2    fansi_0.4.1      
#> [29] base64url_1.4     rmarkdown_2.1     txtq_0.2.0        purrr_0.3.3      
#> [33] callr_3.4.2       magrittr_1.5      ps_1.3.2          backports_1.1.5  
#> [37] htmltools_0.4.0   assertthat_0.2.1  utf8_1.1.4        stringi_1.4.6    
#> [41] visNetwork_2.0.9  crayon_1.3.4

Created on 2020-03-02 by the reprex package (v0.3.0)
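Because trace = TRUE adds the grouping variables as columns, this symptom can be checked programmatically. Below is a minimal base-R sketch; radar_trace_ok() is a hypothetical helper, not part of drake, and it assumes each problem target name ends in _<radar>, as it does in the outputs above. The two data frames copy the target and radar values from those two outputs; in practice you would pass the plan returned by transform_plan(..., trace = TRUE) instead.

```r
# Hypothetical helper (not part of drake): check that each "problem" target's
# traced radar value matches the radar suffix drake put into the target name.
radar_trace_ok <- function(plan, prefix = "problem") {
  rows <- startsWith(plan$target, prefix)
  # Trace columns hold deparsed values such as "\"aa\"", so drop the quotes.
  radar <- gsub('"', "", plan$radar[rows], fixed = TRUE)
  # Assumes names like "problem_a_toCross_aa", i.e. ending in "_<radar>".
  all(endsWith(plan$target[rows], paste0("_", radar)))
}

# Values copied from the two reprex outputs above.
targets <- c("problem_a_toCross_aa", "problem_b_toCross_aa",
             "problem_a_toCross_NL", "problem_b_toCross_NL")
without_separate <- data.frame(target = targets,
                               radar = c('"aa"', '"aa"', '"NL"', '"NL"'),
                               stringsAsFactors = FALSE)
with_separate <- data.frame(target = targets,
                            radar = c('"aa"', '"aa"', '"aa"', '"aa"'),
                            stringsAsFactors = FALSE)

radar_trace_ok(without_separate)  # TRUE: expansion of plan_ut[1:3, ]
radar_trace_ok(with_separate)     # FALSE: expansion with separate included
```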

@bart1 bart1 added the type: bug label Mar 2, 2020
@wlandau wlandau changed the title Transformation depents on other targets Transformation depends on other targets Mar 2, 2020
wlandau (Member) commented Mar 2, 2020

Static branching has unavoidable limitations, and I am trying to document them in https://books.ropensci.org/drake/static.html#limitations-of-grouping-variables. You are running into the issue because you are reusing the radar grouping variable for separate, which is not directly downstream of problem (or of any of the other targets that use radar).

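library(drake)

# Inputs as in the original report, so this snippet runs on its own.
radars <- c("aa", "NL")
seasons <- c("a", "b")
months <- 1:2
radar_seasons <- expand.grid(radar = radars, season = seasons, stringsAsFactors = FALSE)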

plan <- drake_plan(
  data = target(
    get_data(radar, month),
    transform = cross(radar = !!radars, month = !!months)
  ),
  to_cross = target(
    list(data),
    transform = combine(data, .by = radar)
  ),
  problem = target(
    list(to_cross, season),
    transform = cross(to_cross, season = !!seasons)
  ),
  separate = target(
    list(radar, season),
    transform = map(radar = !!radar_seasons$radar, season = !!radar_seasons$season)
  ),
  transform = FALSE
)

plot(plan)

Created on 2020-03-02 by the reprex package (v0.3.0)

To fix the issue, define new grouping variables for separate. Below, I use radar2 and season2.

library(drake)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  
radars <- c("radar1", "radar2")
seasons <- c("season1", "season2")
months <- c(1, 2)
radar_seasons <- expand.grid(radar2 = radars, season2 = seasons, stringsAsFactors = FALSE)

plan <- drake_plan(
  data = target(
    get_data(radar, month),
    transform = cross(radar = !!radars, month = !!months)
  ),
  to_cross = target(
    list(data),
    transform = combine(data, .by = radar)
  ),
  problem = target(
    list(to_cross, season),
    transform = cross(to_cross, season = !!seasons)
  ),
  separate = target(
    list(radar2, season2),
    transform = map(.data = !!radar_seasons)
  ),
  trace = TRUE
)

select(plan, target, radar, radar2)
#> # A tibble: 14 x 3
#>    target                          radar        radar2      
#>    <chr>                           <chr>        <chr>       
#>  1 data_radar1_1                   "\"radar1\""  <NA>       
#>  2 data_radar2_1                   "\"radar2\""  <NA>       
#>  3 data_radar1_2                   "\"radar1\""  <NA>       
#>  4 data_radar2_2                   "\"radar2\""  <NA>       
#>  5 to_cross_radar1                 "\"radar1\""  <NA>       
#>  6 to_cross_radar2                 "\"radar2\""  <NA>       
#>  7 problem_season1_to_cross_radar1 "\"radar1\""  <NA>       
#>  8 problem_season2_to_cross_radar1 "\"radar1\""  <NA>       
#>  9 problem_season1_to_cross_radar2 "\"radar2\""  <NA>       
#> 10 problem_season2_to_cross_radar2 "\"radar2\""  <NA>       
#> 11 separate_radar1_season1          <NA>        "\"radar1\""
#> 12 separate_radar2_season1          <NA>        "\"radar2\""
#> 13 separate_radar1_season2          <NA>        "\"radar1\""
#> 14 separate_radar2_season2          <NA>        "\"radar2\""

Created on 2020-03-02 by the reprex package (v0.3.0)

wlandau (Member) commented Mar 3, 2020

Reopening. The solution to this is actually pretty simple on reflection. We just need to restrict static transforms so they only use the upstream part of the plan. Then drake_plan() should not be confused as often. Patch forthcoming.
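To illustrate the idea only (this is a toy model in base R, not drake's actual implementation), the sketch below represents the untransformed plan from this issue as a dependency list and restricts the search for radar values to targets upstream of problem. The names deps, defines, and upstream() are hypothetical. Because separate is not upstream of problem, its definition of radar is simply never consulted.

```r
# Toy model of the untransformed plan in this issue (not drake's internals).
# deps: each target -> the targets its transform builds on.
deps <- list(
  data = character(0),
  to_cross = "data",
  problem = "to_cross",
  separate = character(0)
)
# defines: grouping variables each target's transform introduces.
defines <- list(
  data = c("radar", "month"),
  to_cross = character(0),
  problem = "season",
  separate = c("radar", "season")
)

# Hypothetical helper: every target upstream of `target` (breadth-first walk).
upstream <- function(target, deps) {
  seen <- character(0)
  frontier <- deps[[target]]
  while (length(frontier) > 0) {
    nxt <- frontier[1]
    frontier <- frontier[-1]
    if (!(nxt %in% seen)) {
      seen <- c(seen, nxt)
      frontier <- c(frontier, deps[[nxt]])
    }
  }
  seen
}

# Only upstream targets may supply radar values to problem, so separate
# (which also defines radar) is ignored.
eligible <- Filter(function(t) "radar" %in% defines[[t]], upstream("problem", deps))
eligible  # "data"
```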

wlandau (Member) commented Mar 3, 2020

But it is still risky to define the same grouping variables twice. After the patch, you will be able to do it in two different, disconnected parts of the plan, but not in plans like the one from https://books.ropensci.org/drake/static.html#limitations-of-grouping-variables.
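The rule in this comment can be approximated with a rough lint. The sketch below is hypothetical base R, not part of drake: it reuses a toy model of this issue's plan (deps maps each target to its dependencies, defines maps each target to the grouping variables its transform introduces), groups targets into connected parts, and reports any grouping variable introduced by two targets in the same part. Flagging every such reuse is a conservative assumption; the comment only says that reuse in disconnected parts will be fine after the patch. The joined target is a made-up stand-in for bringing problem and separate together later, as the reporter wants to do.

```r
# Hypothetical lint (not part of drake): flag grouping variables that two
# targets in the same connected part of the plan both introduce.
component_of <- function(deps) {
  targets <- names(deps)
  comp <- seq_along(targets)
  names(comp) <- targets
  # Min-label propagation: repeat until every edge joins two equal labels.
  repeat {
    changed <- FALSE
    for (t in targets) {
      for (d in deps[[t]]) {
        m <- min(comp[[t]], comp[[d]])
        if (comp[[t]] != m || comp[[d]] != m) {
          comp[c(t, d)] <- m
          changed <- TRUE
        }
      }
    }
    if (!changed) break
  }
  comp
}

duplicate_groupings <- function(deps, defines) {
  comp <- component_of(deps)
  hits <- character(0)
  for (v in unique(unlist(defines))) {
    owners <- names(defines)[vapply(defines, function(x) v %in% x, logical(1))]
    if (anyDuplicated(comp[owners]) > 0) hits <- c(hits, v)
  }
  hits
}

# Toy model of this issue's plan: separate is disconnected from the rest.
deps <- list(data = character(0), to_cross = "data",
             problem = "to_cross", separate = character(0))
defines <- list(data = c("radar", "month"), to_cross = character(0),
                problem = "season", separate = c("radar", "season"))
duplicate_groupings(deps, defines)  # character(0)

# Hypothetical downstream target that brings problem and separate together.
deps$joined <- c("problem", "separate")
defines$joined <- character(0)
duplicate_groupings(deps, defines)  # c("radar", "season")
```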
