Avoid //expr XPaths #1358

MichaelChirico · 2022-06-04T17:49:07Z

The following currently linters use //expr:

Original issue raised

As seen in #1353, #1340, #1310, there are subtle performance implications to the way we write our XPaths.

One thing that became clear is that writing //NODE1[NODE2] is slower than //NODE2[parent::NODE1] if NODE2 is far less frequent than NODE1.

And <expr> is by far the most common node; here's a guess at the general frequency by tabulating across r-devel and my local R packages/scripts:

library(data.table)
library(knitr)

get_r_files = function(dir) list.files(dir, pattern = "\\.R$", recursive = TRUE, full.names = TRUE)
get_freq = function(f) {
  pc = tryCatch(parse(f), error = identity, warning = identity)
  if (inherits(pc, "condition")) return(NULL)
  setDT(getParseData(pc))[, .N, by = token]
}

r_files = c(get_r_files("~/github"), get_r_files("~/svn"))
token_freq = rbindlist(lapply(r_files, get_freq))
token_freq = token_freq[, .(N = sum(N)), by = token][order(-N)]
token_freq[, pct := round(100 * N/sum(N), 1)]
knitr::kable(token_freq[pct > 1])

token	N	pct
expr	5713829	36.4
','	1514487	9.6
SYMBOL	1357662	8.6
'('	1119392	7.1
')'	1119392	7.1
NUM_CONST	1086271	6.9
SYMBOL_FUNCTION_CALL	917895	5.8
STR_CONST	375598	2.4
LEFT_ASSIGN	334988	2.1
COMMENT	307772	2.0
EQ_SUB	263926	1.7
SYMBOL_SUB	254903	1.6

i.e., //expr eliminates at most 2/3 of tokens, while other tokens typically eliminate >90% of the tree.

The trade-off here is for readability. XPaths with a lot of parent::/preceding-sibling::/following-sibling:: axes tend to be less readable -- our current XPaths are fairly readable IMO. Moreover, most of our linters are built around expression-level lints, and having a comparatively small tree is the norm in that case -- I guess the overhead of iterating over expressions is usually higher than the savings from fine-tuning XPaths, and that in the presence of cacheing, performance gains will be unnoticeable in all but edge cases.

So we should proceed gently on this issue. Some ideas:

Write some helpers that have high readability but do translation to more performant XPaths "under the hood"
Prioritize fixing it on file-level linters
Wait for Add proper benchmarking functionality to .dev/compare_branches.R #778 to help quantify which linters are performing worse and prioritize those

The text was updated successfully, but these errors were encountered:

AshesITR · 2022-06-19T13:56:01Z

~copied into the main issue body for tracking progress at a glance~

IndrajeetPatil · 2022-10-02T03:48:45Z

Just wanted to also include here a benchmark.

I am assuming the performance benefits scale with the complexity of the code tree.

library(xml2)
library(xmlparsedata)

x <- "switch(stat,
      o = {
        x <- 0.01
      },
      b = {
        x <- 0.05
      },
      # else
      {
        x <- 0.001
      }
    )"

xml <- xml_parse_data(getParseData(parse(text = x)), pretty = TRUE)

xpath1 <- "//expr[FUNCTION and @line1 != @line2 and not(expr[OP-LEFT-BRACE])]"
xpath2 <- "//FUNCTION/parent::expr[@line1 != @line2 and not(expr[OP-LEFT-BRACE])]"

xml <- as_xml_document(xml)

bench::mark(
  "with //expr"    = xml_find_all(xml, xpath1),
  "without //expr" = xml_find_all(xml, xpath2)
)
#> # A tibble: 2 × 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 with //expr      20.3µs   21.1µs    46530.    26.4KB     18.6
#> 2 without //expr   15.2µs   15.9µs    61757.        0B     30.9

^{Created on 2022-10-02 with reprex v2.0.2}

MichaelChirico · 2022-10-02T04:21:36Z

see some of the other cited issues, e.g. #1348. the issue gets way worse on complex R files.

MichaelChirico added the performance label Jun 4, 2022

AshesITR mentioned this issue Jun 8, 2022

Adds expect_that() to undesirable functions linter #1377

Closed

MichaelChirico mentioned this issue Sep 19, 2022

Add sort_linter() #1528

Merged

This was referenced Oct 2, 2022

Move more XPaths away from //expr logic #1590

Merged

Another batch of migrations away from //expr #1594

Merged

More XPaths avoiding //expr #1605

Merged

Complete switch away from //expr XPaths #1606

Merged

MichaelChirico added this to the 3.0.2 milestone Oct 3, 2022

IndrajeetPatil closed this as completed in #1606 Oct 3, 2022

MichaelChirico mentioned this issue Oct 16, 2022

implement indentation_linter #1411

Merged

MichaelChirico modified the milestones: 3.0.3, 3.1.0 Mar 20, 2023

MichaelChirico mentioned this issue Nov 24, 2023

fix detection of non-syntactic names in object_overwrite_linter() #2347

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid //expr XPaths #1358

Avoid //expr XPaths #1358

MichaelChirico commented Jun 4, 2022 •

edited by IndrajeetPatil

Loading

AshesITR commented Jun 19, 2022 •

edited by MichaelChirico

Loading

IndrajeetPatil commented Oct 2, 2022

MichaelChirico commented Oct 2, 2022

Avoid //expr XPaths #1358

Avoid //expr XPaths #1358

Comments

MichaelChirico commented Jun 4, 2022 • edited by IndrajeetPatil Loading

AshesITR commented Jun 19, 2022 • edited by MichaelChirico Loading

IndrajeetPatil commented Oct 2, 2022

MichaelChirico commented Oct 2, 2022

MichaelChirico commented Jun 4, 2022 •

edited by IndrajeetPatil

Loading

AshesITR commented Jun 19, 2022 •

edited by MichaelChirico

Loading