Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Performance/SelectFirst #63

Closed
ghiculescu opened this issue Jun 10, 2019 · 12 comments · Fixed by #157
Closed

[Proposal] Performance/SelectFirst #63

ghiculescu opened this issue Jun 10, 2019 · 12 comments · Fixed by #157

Comments

@ghiculescu
Copy link
Contributor

We have built this cop internally, for identifying uses of .select {}.first or .select {}.count that could be written in a more performant way. My question is, would rubocop accept a PR to add this to the main gem (if I added tests etc) or is it too niche?

module RuboCop
  module Cop
    module TandaCustomCops
      # Don't call `.find_all { }.first`. Instead just use `.find { }`
      # The bad form iterates over every item in an array and runs the block on each.
      # The good form returns as soon as a match is found.
      class SelectFirst < Cop
        def_node_matcher :find_all_first_candidate?, <<-PATTERN
          (send (block $(send _ ${:find_all :select}) ...) ${:first} $...)
        PATTERN
        def_node_matcher :find_all_count_candidate?, <<-PATTERN
          (send (block $(send _ ${:find_all :select}) ...) ${:count} $...)
        PATTERN

        MSG_FIND = <<~MSG.freeze
          `.select { }.first` can be written as `.find { }`. This is faster because it returns as soon as a match is found.`
        MSG

        MSG_COUNT = <<~MSG.freeze
          `.select { }.count` can be written as `.count { }`. This is marginally faster.`
        MSG

        def on_send(node)
          add_offense(node, message: MSG_FIND) if find_all_first_candidate?(node)
          add_offense(node, message: MSG_COUNT) if find_all_count_candidate?(node)
        end
      end
    end
  end
end
@koic
Copy link
Member

koic commented Jun 28, 2019

Can you show a benchmark compared to alternative methods?

@ghiculescu
Copy link
Contributor Author

ghiculescu commented Jun 28, 2019

Obviously the difference scales with the size of the array.

require 'benchmark/ips'

array = 100_000.times.map { rand > 0.1 ? 0 : 1 } # 10% of the array should be 1, 90% should be 0

Benchmark.ips do |x|
  x.report("slow") { array.select(&:nonzero?).first }
  x.report("fast") { array.find(&:nonzero?) }

  x.compare!
end

=begin
Warming up --------------------------------------
                slow     3.000  i/100ms
                fast    44.899k i/100ms
Calculating -------------------------------------
                slow     35.648  (±11.2%) i/s -    177.000
                fast    579.702k (±14.2%) i/s -      2.829M

Comparison:
                fast:   579701.7 i/s
                slow:       35.6 i/s - 16262.05x slower
=end

But even with a much smaller array it's still significant:

require 'benchmark/ips'

array = 100.times.map { rand > 0.1 ? 0 : 1 } # 10% of the array should be 1, 90% should be 0

Benchmark.ips do |x|
  x.report("slow") { array.select(&:nonzero?).first }
  x.report("fast") { array.find(&:nonzero?) }

  x.compare!
end

=begin
Warming up --------------------------------------
                slow     3.172k i/100ms
                fast    29.762k i/100ms
Calculating -------------------------------------
                slow     33.738k (±13.5%) i/s -    168.116k
                fast    397.477k (±12.9%) i/s -      1.964M

Comparison:
                fast:   397477.5 i/s
                slow:    33737.8 i/s - 11.78x slower
=end

@ghiculescu
Copy link
Contributor Author

This one is less exciting, it's just neater code IMO:

require 'benchmark/ips'

array = 100_000.times.map { rand > 0.1 ? 0 : 1 } # 10% of the array should be 1, 90% should be 0

Benchmark.ips do |x|
  x.report("slow") { array.select(&:nonzero?).count }
  x.report("fast") { array.count(&:nonzero?) }

  x.compare!
end

=begin
Warming up --------------------------------------
                slow     3.000  i/100ms
                fast     3.000  i/100ms
Calculating -------------------------------------
                slow     37.756  (± 2.6%) i/s -    189.000
                fast     35.935  (±11.1%) i/s -    180.000

Comparison:
                slow:       37.8 i/s
                fast:       35.9 i/s - same-ish: difference falls within error
=end

@koic
Copy link
Member

koic commented Jul 2, 2019

Thank you for benchmarking with alternative methods. I agree to add this cop to performance cops.

I will convey some of points considered.

Why did I choose this fix out of the possible options?

I think that coding context will be maintained by replacing find_all { ... }.first with find method and replacing select { ... }.first to detect method respectively. However, it will not be possible to replace it, as it will cause an error in an Active Record model when select { ... }.first method will be auto-corrected detect method. The following are examples.

# Use `select { ... }.first`
Model.select { |model| model.attr == value }.first #=> Returns an AR model object

# Use `find { ... }`
Model.find { |model| model.attr == value } #=> Returns an AR model object

# Use `delect { ... }`
Model.delect { |model| model.attr == value } #=> NoMethodError
NoMethodError: undefined method `delect' for #<Class:0x007f81bd065ae0>
Did you mean?  delete
               select

So I considered that it should be replaced by find method.
It means that performance is prioritized over representation by API name when using this cop. Because this cop is a performance cops.

@koic
Copy link
Member

koic commented Jul 2, 2019

It may be possible to further consider whether it is okay to include select { ... }.first and select { ... }.count in the same cop. Because it is named SelectFirst cop. It looked good to me, but there may be other appropriate cop name.

Cc @rubocop-hq/rubocop-core

@bquorning
Copy link
Contributor

bquorning commented Jul 3, 2019

it will cause an error in an Active Record model

I think that was caused by a typo: delect != detect.

Both find and detect should be valid alternatives.

@koic
Copy link
Member

koic commented Jul 3, 2019

Yeah, detect is correct, not delect. This is my typo.

@ghiculescu
Copy link
Contributor Author

Is there a recommended approach to having rules only apply to specific ruby versions?

@koic
Copy link
Member

koic commented Jul 4, 2019

@rrosenblum
Copy link
Contributor

I think Performance/Detect and Performance/Count cover most if not all of this functionality. As mentioned in #69, I don't think these cops were being run for most people. There are likely some enhancements that can be made to each of the cops.

@fatkodima
Copy link
Contributor

@ghiculescu Have you started work on this?

@ghiculescu
Copy link
Contributor Author

Ahh, no I haven't. @fatkodima if you're interested please go ahead, otherwise I will try and get to it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants