
Speed improvement for roll_median #44

Closed
roaldarbol opened this issue Sep 19, 2024 · 7 comments

roaldarbol commented Sep 19, 2024

Hi @jasonjfoster!
Thanks for this amazing package first of all! I'm doing a bunch of benchmarks (I'll update once I've published them as a blog post), and roll is right up there with data.table for rolling mean and sum, and the best for min and max.

So I was surprised to see that it actually is rather slow for the rolling median - slower even than zoo for larger window widths. I've also tested the RollingWindow package, which is the fastest (an order of magnitude faster than roll), and the only one that seems not to increase time with larger window widths, so maybe it's possible to get some inspiration from their implementation to improve the speed? The package has been abandoned seemingly, and can only be installed from Github. I'm aware that it relies on roll_quantile which doesn't have an online implementation, which explains why it's slower, but it still surprised me.

I really like your philosophy of not making rolling functions for a ton of different things, but focusing on ensuring the ones you make are fast, which is why I thought it might be worthwhile exploring a re-implementation of roll_median as it's such a common function (also used in a lot of smoothing, e.g. why it's in robfilter).

Here's the reprex of my test:

library(tibble)
library(dplyr)
library(ggplot2)
library(tinytable)
library(tidyr)
library(microbenchmark)

library(zoo)
library(RollingWindow)
library(RcppRoll)
library(roll)
library(robfilter)

df <- tibble(x = rnorm(1000000))
windows <- c(5, 11, 15, 21, 31, 41, 51, 61, 71, 81, 91, 101, 201, 301)

df_median <- tibble()
for (n in windows){
  n_half <- floor(n/2)
  res <- microbenchmark(
    "RollingWindow::RollingMedian" = RollingWindow::RollingMedian(df$x, n),
    "zoo::rollmedian" = zoo::rollmedian(df$x, k = n, fill = NA),
    "roll::roll_median" = roll::roll_median(df$x, width = n),
    "RcppRoll::roll_median" = RcppRoll::roll_median(df$x, n = n, fill = NA),
    "robfilter::med.filter" = robfilter::med.filter(df$x, n, online = TRUE),
    times = 5) |> 
    mutate(window = n)
  
  df_median <- dplyr::bind_rows(df_median, res)
}

# Plot it
df_median_summarised <- df_median |> 
  group_by(expr, window) |>
  summarise(time = median(time) / 1000000)
#> `summarise()` has grouped output by 'expr'. You can override using the
#> `.groups` argument.
df_median |> 
  mutate(time = time / 1000000) |> 
  ggplot(aes(window, time, colour =  expr)) +
  geom_jitter(alpha = 0.1) +
  geom_line(data = df_median_summarised)

df_median_summarised |> 
  pivot_wider(id_cols = window,
              names_from = expr,
              values_from = time) |> 
  tt()
Median times in milliseconds:

| window | RollingWindow::RollingMedian | zoo::rollmedian | roll::roll_median | RcppRoll::roll_median | robfilter::med.filter |
|-------:|-----------------------------:|----------------:|------------------:|----------------------:|----------------------:|
| 5 | 42.61039 | 371.7149 | 21.73139 | 90.1882 | 1219.056 |
| 11 | 61.25106 | 350.1357 | 52.76195 | 228.6518 | 1213.023 |
| 15 | 56.37555 | 382.3029 | 68.21748 | 308.9450 | 1444.169 |
| 21 | 87.25495 | 435.8322 | 155.37194 | 470.8706 | 1526.992 |
| 31 | 58.60071 | 369.7318 | 290.60045 | 768.0161 | 1574.961 |
| 41 | 72.95761 | 350.7423 | 391.01673 | 1237.7658 | 2014.765 |
| 51 | 72.24422 | 354.0627 | 509.26535 | 1567.5037 | 2518.410 |
| 61 | 83.62925 | 451.1759 | 537.93466 | 1892.1354 | 2167.003 |
| 71 | 74.74441 | 403.2428 | 710.35575 | 2200.2911 | 2344.463 |
| 81 | 93.68993 | 472.4890 | 997.73558 | 3088.5824 | 2430.890 |
| 91 | 65.75451 | 459.3611 | 959.18494 | 3328.7882 | 2810.957 |
| 101 | 81.16433 | 411.2182 | 1124.10512 | 4231.1901 | 3041.096 |
| 201 | 78.60651 | 448.1946 | 2212.82532 | 7476.5215 | 3885.436 |
| 301 | 83.75928 | 451.0300 | 3043.77916 | 12806.6676 | 5035.164 |

Created on 2024-09-19 with reprex v2.1.1

@jasonjfoster (Owner) commented:

Thanks for the feedback. I recently implemented efficient algorithms for the roll_median and roll_quantile functions in the development version 1.1.8 on GitHub:

# install.packages("devtools")
devtools::install_github("jasonjfoster/roll")

Also available on r-universe:

install.packages("roll", repos = "https://jasonjfoster.r-universe.dev")

Could you rerun the benchmarks with version 1.1.8?


roaldarbol commented Sep 19, 2024

Great to hear! I just re-ran the tests, and it's definitely a lot faster for the larger windows (though slower for the smaller windows). It's still about 8 times slower than RollingWindow::RollingMedian unfortunately, and generally a bit slower than zoo::rollmedian.

library(tibble)
library(dplyr)
library(ggplot2)
library(tinytable)
library(tidyr)
library(microbenchmark)

library(zoo)
library(RollingWindow)
library(RcppRoll)
library(roll)
library(robfilter)
packageVersion("roll")
#> [1] '1.1.8'

df <- tibble(x = rnorm(1000000))
windows <- c(5, 11, 15, 21, 31, 41, 51, 61, 71, 81, 91, 101, 201, 301)

df_median <- tibble()
for (n in windows){
  n_half <- floor(n/2)
  res <- microbenchmark(
    "RollingWindow::RollingMedian" = RollingWindow::RollingMedian(df$x, n),
    "zoo::rollmedian" = zoo::rollmedian(df$x, k = n, fill = NA),
    "roll::roll_median" = roll::roll_median(df$x, width = n),
    "RcppRoll::roll_median" = RcppRoll::roll_median(df$x, n = n, fill = NA),
    "robfilter::med.filter" = robfilter::med.filter(df$x, n, online = TRUE),
    times = 5) |> 
    mutate(window = n)
  
  df_median <- dplyr::bind_rows(df_median, res)
}

# Plot it
df_median_summarised <- df_median |> 
  group_by(expr, window) |>
  summarise(time = median(time) / 1000000)
#> `summarise()` has grouped output by 'expr'. You can override using the
#> `.groups` argument.
df_median |> 
  mutate(time = time / 1000000) |> 
  ggplot(aes(window, time, colour =  expr)) +
  geom_jitter(alpha = 0.1) +
  geom_line(data = df_median_summarised)

df_median_summarised |> 
  pivot_wider(id_cols = window,
              names_from = expr,
              values_from = time) |> 
  tt()
Median times in milliseconds:

| window | RollingWindow::RollingMedian | zoo::rollmedian | roll::roll_median | RcppRoll::roll_median | robfilter::med.filter |
|-------:|-----------------------------:|----------------:|------------------:|----------------------:|----------------------:|
| 5 | 46.69491 | 399.5131 | 299.6766 | 94.96266 | 1218.382 |
| 11 | 56.33853 | 380.3425 | 352.0274 | 231.49082 | 1283.153 |
| 15 | 62.86988 | 367.6014 | 375.6445 | 314.44852 | 1317.276 |
| 21 | 53.34275 | 344.6529 | 405.4176 | 457.00026 | 1501.322 |
| 31 | 62.57914 | 345.8289 | 453.3822 | 763.11941 | 1605.367 |
| 41 | 54.52699 | 402.9282 | 468.4996 | 1050.98745 | 1715.083 |
| 51 | 74.19872 | 371.9451 | 493.9300 | 1398.44475 | 1829.274 |
| 61 | 55.09602 | 383.8024 | 517.2597 | 1806.74099 | 1924.188 |
| 71 | 60.55978 | 324.0459 | 521.4181 | 2150.97881 | 2107.127 |
| 81 | 62.74747 | 443.4228 | 516.2781 | 2520.77028 | 2229.549 |
| 91 | 60.94327 | 389.7431 | 514.7131 | 2868.48722 | 2274.508 |
| 101 | 71.25430 | 366.4016 | 522.1390 | 3259.35205 | 2456.508 |
| 201 | 70.54210 | 408.8964 | 569.4769 | 7386.22923 | 3774.311 |
| 301 | 74.42444 | 431.5904 | 585.9865 | 11901.81444 | 4849.190 |
sessionInfo()
#> R version 4.4.0 (2024-04-24)
#> Platform: x86_64-apple-darwin20
#> Running under: macOS 15.0
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] robfilter_4.1.5      lattice_0.22-6       MASS_7.3-60.2       
#>  [4] robustbase_0.99-4    roll_1.1.8           RcppRoll_0.3.1      
#>  [7] RollingWindow_0.2    zoo_1.8-12           microbenchmark_1.5.0
#> [10] tidyr_1.3.1          tinytable_0.4.0      ggplot2_3.5.1       
#> [13] dplyr_1.1.4          tibble_3.2.1        
#> 
#> loaded via a namespace (and not attached):
#>  [1] gtable_0.3.5       compiler_4.4.0     tidyselect_1.2.1   reprex_2.1.1      
#>  [5] Rcpp_1.0.13        scales_1.3.0       yaml_2.3.10        fastmap_1.2.0     
#>  [9] R6_2.5.1           labeling_0.4.3     generics_0.1.3     knitr_1.48        
#> [13] munsell_0.5.1      pillar_1.9.0       rlang_1.1.4        utf8_1.2.4        
#> [17] xfun_0.47          fs_1.6.4           RcppParallel_5.1.9 cli_3.6.3         
#> [21] withr_3.0.1        magrittr_2.0.3     digest_0.6.37      grid_4.4.0        
#> [25] rstudioapi_0.16.0  lifecycle_1.0.4    DEoptimR_1.1-3     vctrs_0.6.5       
#> [29] evaluate_0.24.0    glue_1.7.0         farver_2.1.2       fansi_1.0.6       
#> [33] colorspace_2.1-1   rmarkdown_2.28     purrr_1.0.2        tools_4.4.0       
#> [37] pkgconfig_2.0.3    htmltools_0.5.8.1

Created on 2024-09-19 with reprex v2.1.1

@jasonjfoster (Owner) commented:

Thanks for sharing the results. The RollingWindow::RollingMedian function uses a max-median-min heap structure in C (see https://gist.github.com/ashelly/5665911), and the zoo::rollmedian version (note: the default is align = "center", but align = "right", or zoo::rollmedianr, should be used for comparison purposes) is an interface to stats::runmed, which uses the efficient median implementations by Turlach and Stuetzle. Both are high-performance algorithms, but they are specialized for median calculations and don't support weights, missing values, or other quantiles.

In contrast, the roll::roll_quantile function is more flexible and uses two balanced multisets split at the quantile's probability. It handles weights, missing values, and other quantiles within the window. There is also some overhead due to parallel support via RcppParallel. It's great to see that the recent optimizations have eliminated the scaling issues with window size and that the roll package now performs on par with stats in the benchmark. However, after speaking with the data.table team, I learned that it's good practice to create benchmarks that take at least one second to run (see https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-benchmarking.html#avoid-microbenchmark-times-100; note: microbenchmark's time value is measured in nanoseconds, so divide by 1e+9 for seconds). It might be challenging to break the one-second barrier with these highly efficient algorithms!
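For readers unfamiliar with these structures, here is a minimal sketch of the two-heap idea behind running-median algorithms, in Python for illustration only (the implementations linked above are in C/C++, and this sketch handles only insertion, not the window-eviction step a true rolling median needs):

```python
import heapq

class StreamingMedian:
    """Maintain the median of a growing stream with two heaps.

    lo is a max-heap (stored as negated values) holding the lower half;
    hi is a min-heap holding the upper half. Invariant: len(lo) equals
    len(hi) or len(hi) + 1, and max(lo) <= min(hi).
    """
    def __init__(self):
        self.lo = []  # max-heap via negation
        self.hi = []  # min-heap

    def insert(self, x):
        if not self.lo or x <= -self.lo[0]:
            heapq.heappush(self.lo, -x)
        else:
            heapq.heappush(self.hi, x)
        # rebalance so the size invariant holds
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2
```

Each insertion is O(log n), which (once eviction is added) is what turns the naive O(w log w) per-window cost into something that barely grows with window width, matching the flat scaling curves in the benchmarks above.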


roaldarbol commented Sep 25, 2024

Sorry about the delay in getting back to you. 😊
I completely agree that your implementation is much more flexible. And I also agree that the most important thing, namely the scaling issue, is fixed, so I'm happy to close the issue. Just one more question: you mention the alignment; is it technically possible to implement center- and left-aligned rolls with your implementation? The reason I'm asking is that a common use of rolling medians is as a smoothing filter in time series, where the filter is usually expected to be centered to avoid a time shift. For left alignment, it could just be a matter of running the right-aligned version on the reversed vector and returning the re-reversed result. For center alignment, it could maybe be done simply by shifting the returned vector by floor(width/2)?
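The two tricks described above can be sketched generically, in Python for illustration only (this is not roll's API; `roll_median_right` stands in for any right-aligned rolling function, with `None` playing the role of NA padding):

```python
from statistics import median

def roll_median_right(x, width):
    """Naive right-aligned rolling median: out[i] is the median of
    x[i-width+1 : i+1]; the first width-1 positions are None (NA)."""
    out = [None] * len(x)
    for i in range(width - 1, len(x)):
        out[i] = median(x[i - width + 1 : i + 1])
    return out

def roll_median_center(x, width):
    """Centered alignment: shift the right-aligned result back by
    floor(width/2), so out[i] covers the window centered on i."""
    shift = width // 2
    return roll_median_right(x, width)[shift:] + [None] * shift

def roll_median_left(x, width):
    """Left alignment: run the right-aligned version on the reversed
    vector and reverse the result."""
    return roll_median_right(x[::-1], width)[::-1]
```

For odd widths the shifted result matches a truly centered window (up to the NA padding at the edges); for even widths the "center" is ambiguous and conventions differ between packages.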

Regarding benchmarking, I don't quite agree. The benchmarks I ran were on 1,000,000 observations, which is not a small number, and most users likely run on fewer observations, but across multiple groups and datasets. Sure, I could run the benchmark on 1,000,000,000 observations, but I think the smaller number is more realistic here (the page you refer to also notes that the "Main focus of benchmark should be on real use case scenarios"). I'd argue that both types of benchmarks offer different kinds of insight. And there's no need to run a ton of iterations if the difference is clear (i.e. the signal-to-noise ratio is good), but it certainly doesn't hurt either.

As an aside, it would be great to have an example in the README explaining the different ways roll handles NAs, as it's unclear from the README alone; I personally had to dig through issues and PRs to understand it.

By the way, I hope this doesn't come across wrong - it's only because I find roll to be an amazing project that I'm thrilled to have come across!

@jasonjfoster (Owner) commented:

Agreed on benchmarks and the need to consider the practicality of real use cases. This further illustrates the absolute speed improvements of the algorithms, rather than just relative gains.

For the align request, please refer to the existing discussion and suggestions in this thread: #38.

Closing this issue now as the benchmarks have demonstrated speed improvements. Feel free to share the blog post whenever it’s ready.

@roaldarbol (Author) commented:

I've written the first version of the blog post and will update it along the way. E.g. I currently don't test scaling as I did above (I'll add that next week), and I'll also try to add a table that covers the features of the different packages. Feel free to comment or open an issue if there's something you feel should be added or taken into consideration that currently isn't. :-)
