Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New linter: recommend sort(x) over x[order(x)] #1512

Closed
Bisaloo opened this issue Aug 26, 2022 · 6 comments · Fixed by #1528
Closed

New linter: recommend sort(x) over x[order(x)] #1512

Bisaloo opened this issue Aug 26, 2022 · 6 comments · Fixed by #1528
Labels
feature a feature request or enhancement new-linter

Comments

@Bisaloo
Copy link
Contributor

Bisaloo commented Aug 26, 2022

I think it would be nice to have a linter to recommend using sort(x) to sort a vector instead of x[order(x)]:

  • it is more readable
  • it is faster, especially for longer vectors
library(microbenchmark)

x <- sample(LETTERS, 1e3, replace = TRUE)

microbenchmark(
  sort(x),
  x[order(x)],
  check = "identical"
)
#> Unit: microseconds
#>         expr     min       lq     mean   median       uq      max neval
#>      sort(x) 529.036 544.6405 585.4106 559.5975 597.4515  907.008   100
#>  x[order(x)] 811.534 836.0540 899.0072 865.4155 904.8205 1453.345   100

x_long <- sample(LETTERS, 1e5, replace = TRUE)

microbenchmark(
  sort(x_long),
  x_long[order(x_long)],
  check = "identical"
)
#> Unit: milliseconds
#>                   expr       min        lq      mean    median        uq
#>           sort(x_long)  59.78242  61.85322  64.85112  63.69153  66.59794
#>  x_long[order(x_long)] 150.68092 160.44464 168.89652 164.85925 175.08799
#>        max neval
#>   86.98432   100
#>  207.92772   100

Created on 2022-08-26 by the reprex package (v2.0.1.9000)

Possible issues

sort() and order() are both generic and it is theoretically possible that some packages only provide S3 methods for one of them. In this case, they could not be strictly equivalent. However, I would say it's likely a bug in the package.

@IndrajeetPatil IndrajeetPatil added the feature a feature request or enhancement label Aug 26, 2022
@Bisaloo
Copy link
Contributor Author

Bisaloo commented Aug 26, 2022

If you agree this would be a good addition to the package, I could be interested in trying to submit a PR. I can't guarantee I will manage to do so though.

I've read the relevant vignette, which I found useful but I'm still not completely sure how to write the XPath code.

@IndrajeetPatil
Copy link
Collaborator

Please feel free to open a PR and, if you get stuck somewhere, we will be more than happy to help! ☺️

Thanks a lot.

@Bisaloo
Copy link
Contributor Author

Bisaloo commented Aug 30, 2022

I'm getting sidetracked by other projects so in case I forget about this, here are some test cases for safe keeping:

test_that("sort_linter skips allowed usages", {
  expect_lint("order(y)", NULL, sort_linter())

  expect_lint("y[order(x)]", NULL, sort_linter())

  # If another function is intercalated, don't fail
  expect_lint("x[c(order(x))]", NULL, sort_linter())
})


test_that("sort_linter blocks simple disallowed usages", {
  lint_message <- rex::rex("sort(x) is better than x[order(x)].")

  expect_lint("x[order(x)]", lint_message, sort_linter())

  # Works with extra args in order()
  expect_lint("x[order(x, decreasing = TRUE)]", lint_message, sort_linter())

  # ...even in disorder
  expect_lint("x[order(decreasing = TRUE, x)]", lint_message, sort_linter())
})

@IndrajeetPatil
Copy link
Collaborator

One issue I see here is that these two options won't be equivalent in the presence of missing values:

x <- c(1, 4, NA, 2, 8)

sort(x)
#> [1] 1 2 4 8
x[order(x)]
#> [1]  1  2  4  8 NA

Created on 2022-09-16 with reprex v2.0.2

@IndrajeetPatil
Copy link
Collaborator

So I guess our suggestion should be to set na.last = TRUE:

x <- c(1, 4, NA, 2, 8)

sort(x, na.last = TRUE)
#> [1]  1  2  4  8 NA
x[order(x)]
#> [1]  1  2  4  8 NA

Created on 2022-09-16 with reprex v2.0.2

WDYT?

@Bisaloo
Copy link
Contributor Author

Bisaloo commented Sep 16, 2022

I think your proposal makes sense to ensure the behaviour is identical, even though I don't think it makes a lot of sense to order a vector with NAs (which is probably why they are dropped by default).

I'll leave some time for others to weigh in and then update the PR accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement new-linter
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants