Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate possible slowdown vs 3.0.2 #1983

Closed
MichaelChirico opened this issue Jun 28, 2023 · 1 comment
Closed

Investigate possible slowdown vs 3.0.2 #1983

MichaelChirico opened this issue Jun 28, 2023 · 1 comment

Comments

@MichaelChirico
Copy link
Collaborator

See report:

#1476 (comment)

@MichaelChirico MichaelChirico added this to the 3.1.0 milestone Jun 28, 2023
@MichaelChirico
Copy link
Collaborator Author

Some insights:

linter-level comparison, among the linters present on both HEAD (new) and v3.0.2 (old):

                              linter n_lints_old time_elapsed_old n_lints_new time_elapsed_new time_delta time_delta_pct
 1:            T_and_F_symbol_linter           6           26.522           6           26.745      0.223      0.8408114
 2:                     brace_linter           9           36.251           5           30.025     -6.226    -17.1746986
 3:                    commas_linter           0           32.425           0           26.688     -5.737    -17.6931380
 4:                 equals_na_linter           0           31.066           0           25.713     -5.353    -17.2310565
 5: function_left_parentheses_linter           0           27.201           0           28.361      1.160      4.2645491
 6:              infix_spaces_linter           0           28.290           0           28.093     -0.197     -0.6963591
 7:               line_length_linter        4569           34.019        4569           27.579     -6.440    -18.9305976
 8:             object_length_linter         263           55.794          67           52.366     -3.428     -6.1440298
 9:               object_name_linter        4875           48.293        4121           53.131      4.838     10.0180150
10:              object_usage_linter         153           44.369         150           44.709      0.340      0.7663008
11:                paren_body_linter           0           25.214           0           25.582      0.368      1.4595066
12:         pipe_continuation_linter           0           24.386           0           24.889      0.503      2.0626589
13:                 semicolon_linter           0           24.386           0           24.418      0.032      0.1312228
14:                       seq_linter           8           28.301           8           28.168     -0.133     -0.4699481
15:             spaces_inside_linter           0           24.890           0           24.805     -0.085     -0.3415026
16:   spaces_left_parentheses_linter           0           25.719           0           26.413      0.694      2.6983942
17:      trailing_blank_lines_linter           0           26.115           0           26.214      0.099      0.3790925
18:       trailing_whitespace_linter           0           24.472           0           24.653      0.181      0.7396208
19:              vector_logic_linter          33           25.821          33           25.626     -0.195     -0.7551993

Ran this twice, and the first time (not saved here) old was faster than new (vs here new is faster), so I think it's up to timing noise.

I'm happy to chalk up the timing difference to the difference in linters between the two versions:

  • linters run for {mlr} on v3.0.2 but not HEAD: no_tab_linter(), single_quotes_linter()
  • linters run for {mlr} on HEAD but not v3.0.2: quotes_linter(), whitespace_linter(), indentation_linter()

Of course indentation_linter() is the only real difference in these two lists.

As one last insight, I timed lint_package() for all linters, then all linters excluding indentation_linter() on HEAD:

# all linters
   user  system elapsed 
193.823   0.344 194.172
# without indentation_linter
   user  system elapsed 
166.491   0.204 166.688

In conclusion, I feel comfortable about my initial hypothesis, namely that the major timing differences on the two branches can be attributed to changes in the set of linters used, especially indentation_linter().

library(lintr)

both <- sort(c("brace_linter", "commas_linter", "equals_na_linter", "function_left_parentheses_linter", 
"infix_spaces_linter", "line_length_linter", "object_length_linter", 
"object_name_linter", "object_usage_linter", "paren_body_linter", 
"pipe_continuation_linter", "semicolon_linter", "seq_linter", 
"spaces_inside_linter", "spaces_left_parentheses_linter", "T_and_F_symbol_linter", 
"trailing_blank_lines_linter", "trailing_whitespace_linter", 
"vector_logic_linter"))
both_funs <- lapply(both, \(f) eval(call(f), envir = asNamespace("lintr")))
names(both_funs) <- both

info <- vector("list", length(both))
names(info) <- both

for (linter in both) {
  info[[linter]]$timing <- suppressMessages(system.time({
    info[[linter]]$lints <- lint_package(linters = both_funs[[linter]], parse_settings = FALSE)
  }))
}

# linter-level timings
cbind(
  n_lints = sapply(info, \(x) length(x$lints)),
  time_elapsed = sapply(info, `[[`, c("timing", "elapsed"))
)

# copy-paste output as a string 'str', or in current session use
# data.table::as.data.table(cbind(...), keep.rownames="linter")
new_data = str |>
  strspit("\n") |>
  unlist() |>
  strsplit("\\s+") |>
  data.table::transpose() |>
  data.table::setDT() |>
  data.table::setnames(c("linter", "n_lints", "time_elapsed"))

data = merge(old_data, new_data, by = "linter", suffixes = c("_old", "_new"))
data[, time_delta := time_elapsed_new - time_elapsed_old]
data[, time_delta_pct := 100 * time_delta / time_elapsed_old]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant