Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enumerable#sort_by is not always faster than #sort #120

Open
monfresh opened this issue Jan 24, 2017 · 7 comments
Open

Enumerable#sort_by is not always faster than #sort #120

monfresh opened this issue Jan 24, 2017 · 7 comments

Comments

@monfresh
Copy link

Per the Ruby 2.4.0 docs:

The current implementation of sort_by generates an array of tuples containing the original collection element and the mapped value. This makes sort_by fairly expensive when the keysets are simple.

Here's a concrete example showing sort to be 2.70x faster in Ruby 1.9.3, 2.3.3 and 2.4.0:

require 'benchmark/ips'

Benchmark.ips do |x|
  x.time = 5
  x.warmup = 2

  ARRAY = %w{apple pear fig}

  x.report("sort_by") do
    ARRAY.sort_by(&:length)
  end

  x.report("sort") do
    ARRAY.sort { |a, b| a.length <=> b.length}
  end

  x.compare!
end
Warming up --------------------------------------
             sort_by    56.348k i/100ms
                sort   111.946k i/100ms
Calculating -------------------------------------
             sort_by    635.646k (±16.1%) i/s -      3.099M in   5.074036s
                sort      1.713M (±16.3%) i/s -      8.284M in   5.023904s

Comparison:
                sort:  1713232.1 i/s
             sort_by:   635645.9 i/s - 2.70x slower
@mblumtritt
Copy link

mblumtritt commented Jan 24, 2017

In your sample you sort the ARRAY twice - it's already sorted when the second sample runs. The results may be therefore wrong…

@monfresh
Copy link
Author

I don't follow. ARRAY is not mutated. If you call ARRAY.sort_by(&:length), and then call ARRAY again, you get the original ARRAY. You can verify this by adding .freeze to the end of the ARRAY assignment (ARRAY = %w(apple pear fig).freeze)

I get the same results when I change the order of the samples:

Warming up --------------------------------------
                sort   121.953k i/100ms
             sort_by    59.477k i/100ms
Calculating -------------------------------------
                sort      1.898M (± 3.8%) i/s -      9.512M in   5.018772s
             sort_by    704.695k (±11.5%) i/s -      3.450M in   5.002361s

Comparison:
                sort:  1898186.9 i/s
             sort_by:   704694.9 i/s - 2.69x  slower

@monfresh
Copy link
Author

The code example in this repo also does the same thing: it defines the array to be used in the samples as a constant: https://github.com/JuanitoFatas/fast-ruby/blob/master/code/enumerable/sort-vs-sort_by.rb

@avellable
Copy link
Collaborator

avellable commented Jan 25, 2017

@monfresh Thanks for pointing this out. There are two ways to resolve into the current example,

  1. Remove the example
  2. Based on the API documentation of Ruby, we should update the benchmark report to have more details.

@JuanitoFatas Any comments about above?

@mblumtritt
Copy link

@monfresh Sorry - I used in my local sample sort! and sort_by! and did ignore your code details. Sorry …

@abuisman
Copy link

I just ran into a situation where sort_by is not faster, per se, in fact 2.95x slower. But it has something to do with the number of items in the array.

 require 'benchmark/ips'

          records = SomeModel.group(:some_column).sum(:value)

          Benchmark.ips do |x|
            x.report("with #sort") do |n|
              n.times { records.sort { |x, y| y[1] <=> x[1] } }
            end

            x.report("with #sort_by") do |n|
              n.times { records.sort_by(&:last) }
            end

            # Compare the iterations per second of the various reports!
            x.compare!
          end

Yields

Warming up --------------------------------------
          with #sort    95.145k i/100ms
       with #sort_by    41.329k i/100ms
Calculating -------------------------------------
          with #sort      1.350M (±17.9%) i/s -      6.280M in   5.040178s
       with #sort_by    457.236k (±27.8%) i/s -      2.025M in   5.034366s

Comparison:
          with #sort:  1350318.7 i/s
       with #sort_by:   457235.7 i/s - 2.95x  slower

This is with 2 items in the result. When I bump this to 100:

Warming up --------------------------------------
          with #sort     1.029k i/100ms
       with #sort_by     1.365k i/100ms
Calculating -------------------------------------
          with #sort     11.344k (±14.4%) i/s -     55.566k in   5.046672s
       with #sort_by     14.750k (±14.4%) i/s -     72.345k in   5.023649s

Comparison:
       with #sort_by:    14750.3 i/s
          with #sort:    11344.5 i/s - same-ish: difference falls within error

Apparently there is a correlation between the number of items and the performance difference, which makes sense of course. Adding items to the array brings us closer to the 'true' performance difference between sort and sort_by.

So lets bump to 1000:

Calculating -------------------------------------
          with #sort    667.572  (±16.6%) i/s -      3.245k in   5.017742s
       with #sort_by    953.283  (±15.1%) i/s -      4.692k in   5.043402s

Comparison:
       with #sort_by:      953.3 i/s
          with #sort:      667.6 i/s - 1.43x  slower

So at 1000 items sort_by becomes faster.

I don't know how we'd go about this, but adding a note about the expected number of items in the array would make sense.

@luxangel777
Copy link

🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants