Avoid //expr XPaths #1358
~copied into the main issue body for tracking progress at a glance~
This was referenced Sep 27, 2022
Just wanted to include a benchmark here as well. I am assuming the performance benefits scale with the complexity of the code tree.

library(xml2)
library(xmlparsedata)
x <- "switch(stat,
o = {
x <- 0.01
},
b = {
x <- 0.05
},
# else
{
x <- 0.001
}
)"
xml <- xml_parse_data(getParseData(parse(text = x)), pretty = TRUE)
xpath1 <- "//expr[FUNCTION and @line1 != @line2 and not(expr[OP-LEFT-BRACE])]"
xpath2 <- "//FUNCTION/parent::expr[@line1 != @line2 and not(expr[OP-LEFT-BRACE])]"
xml <- as_xml_document(xml)
bench::mark(
"with //expr" = xml_find_all(xml, xpath1),
"without //expr" = xml_find_all(xml, xpath2)
)
#> # A tibble: 2 × 6
#>   expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 with //expr      20.3µs   21.1µs    46530.    26.4KB     18.6
#> 2 without //expr   15.2µs   15.9µs    61757.        0B     30.9

Created on 2022-10-02 with reprex v2.0.2
See some of the other cited issues, e.g. #1348. The issue gets much worse on complex R files.
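Before swapping one XPath form for the other, it may be worth confirming that both select the same nodes. The sketch below does that for the two expressions from the benchmark, run against a small hypothetical snippet (the `code` string and the variable names are made up for illustration) that contains one multi-line, brace-less function and one single-line function.

```r
library(xml2)
library(xmlparsedata)

# Hypothetical input: the first function spans two lines without braces,
# the second sits on a single line.
code <- "f <- function(x)
  x + 1
g <- function(y) y"

pd <- getParseData(parse(text = code, keep.source = TRUE))
xml <- read_xml(xml_parse_data(pd))

# The two XPaths from the benchmark above.
xpath_expr_first <- "//expr[FUNCTION and @line1 != @line2 and not(expr[OP-LEFT-BRACE])]"
xpath_token_first <- "//FUNCTION/parent::expr[@line1 != @line2 and not(expr[OP-LEFT-BRACE])]"

hits_expr_first <- xml_find_all(xml, xpath_expr_first)
hits_token_first <- xml_find_all(xml, xpath_token_first)

# Compare the matched nodes by their position in the tree.
same_nodes <- identical(xml_path(hits_expr_first), xml_path(hits_token_first))
n_hits <- length(hits_token_first)
```

If `same_nodes` is `TRUE` on a reasonably varied set of inputs, the rewrite is probably safe; a single snippet like this one is only a smoke test, not a proof of equivalence.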
This was referenced Oct 2, 2022
The following linters currently use `//expr`:

Original issue raised
As seen in #1353, #1340, #1310, there are subtle performance implications to the way we write our XPaths.
One thing that became clear is that writing `//NODE1[NODE2]` is slower than `//NODE2[parent::NODE1]` if `NODE2` is far less frequent than `NODE1`.

And `<expr>` is by far the most common node; here's a guess at the general frequency by tabulating across r-devel and my local R packages/scripts:

i.e., `//expr` eliminates at most 2/3 of tokens, while other tokens typically eliminate >90% of the tree.

The trade-off here is readability. XPaths with a lot of `parent::` / `preceding-sibling::` / `following-sibling::` axes tend to be less readable, and our current XPaths are fairly readable IMO. Moreover, most of our linters are built around expression-level lints, where a comparatively small tree is the norm. I guess the overhead of iterating over expressions is usually higher than the savings from fine-tuning XPaths, and that with caching in place, performance gains will be unnoticeable in all but edge cases.

So we should proceed gently on this issue. Some ideas:
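As a side note on the node-frequency argument above: a rough way to check how dominant `<expr>` is in a given piece of code is to tabulate the tag names in its XML parse tree. This is only a sketch on a tiny hypothetical snippet (the `code` string and variable names are assumptions for illustration), not the script behind the figures quoted in this issue.

```r
library(xml2)
library(xmlparsedata)

# Hypothetical input; substitute the text of a real file for more
# representative counts.
code <- "f <- function(x, y) {
  if (x > y) x - y else y - x
}"

pd <- getParseData(parse(text = code, keep.source = TRUE))
xml <- read_xml(xml_parse_data(pd))

# Collect the tag name of every element except the <exprlist> root.
node_names <- xml_name(xml_find_all(xml, "//*[not(self::exprlist)]"))

# Counts per tag, most frequent first, and the share of the tree each takes.
freq <- sort(table(node_names), decreasing = TRUE)
share <- round(prop.table(freq), 2)
print(share)
```

A tag with a small share is a good candidate for the leading step of an XPath, since starting there discards most of the tree immediately.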