Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Array#concat(Indexable) #13280

Merged

Conversation

HertzDevil
Copy link
Contributor

When concatenating an Indexable to an Array, we already know the argument's number of elements in advance and this number won't change (formally this needs the wording from #13061), so we never need more than one reallocation of the array's buffer. Additionally, for containers that store their elements in a contiguous buffer (or two in the case of Deque), we can leverage Intrinsics.memcpy whenever the element types are the same. Benchmarks:

Source
require "benchmark"

class Array(T)
  def concat2(other : Indexable)
    # this PR's implementation
  end
end

{% begin %}
  M = {{ env("M").to_i }}
  N = {{ env("N").to_i }}
{% end %}
puts "M=#{M}, N=#{N}"

x = [] of Int64
y_slice = Slice(Int64).empty
y_static_array = uninitialized Int64[N]
y_deque = Deque(Int64).new

puts "Slice:"
Benchmark.ips do |b|
  b.report("ctrl") do
    x = Array(Int64).new(M, &.to_i64)
    y_slice = Slice(Int64).new(N, &.to_i64)
  end

  b.report("old") do
    x = Array(Int64).new(M, &.to_i64)
    y_slice = Slice(Int64).new(N, &.to_i64)
    x.concat(y_slice)
  end

  b.report("new") do
    x = Array(Int64).new(M, &.to_i64)
    y_slice = Slice(Int64).new(N, &.to_i64)
    x.concat2(y_slice)
  end
end

puts "StaticArray:"
Benchmark.ips do |b|
  b.report("old") do
    x = Array(Int64).new(M, &.to_i64)
    x.concat(y_static_array)
  end

  b.report("new") do
    x = Array(Int64).new(M, &.to_i64)
    x.concat2(y_static_array)
  end
end

puts "Deque:"
Benchmark.ips do |b|
  b.report("ctrl") do
    x = Array(Int64).new(M, &.to_i64)
    y_deque = Deque(Int64).new(N, &.to_i64)
  end

  b.report("old") do
    x = Array(Int64).new(M, &.to_i64)
    y_deque = Deque(Int64).new(N, &.to_i64)
    x.concat(y_deque)
  end

  b.report("new") do
    x = Array(Int64).new(M, &.to_i64)
    y_deque = Deque(Int64).new(N, &.to_i64)
    x.concat2(y_deque)
  end
end
M=1000, N=100
Slice:
ctrl 948.57k (  1.05µs) (± 0.92%)  8.86kB/op        fastest
 old 642.38k (  1.56µs) (± 0.80%)  20.1kB/op   1.48× slower
 new 668.03k (  1.50µs) (± 0.74%)  20.1kB/op   1.42× slower
StaticArray:
old 709.49k (  1.41µs) (± 0.31%)  19.1kB/op   1.05× slower
new 742.15k (  1.35µs) (± 0.78%)  19.1kB/op        fastest
Deque:
ctrl 917.24k (  1.09µs) (± 0.71%)  8.89kB/op        fastest
 old 627.34k (  1.59µs) (± 0.55%)  20.2kB/op   1.46× slower
 new 654.55k (  1.53µs) (± 0.72%)  20.2kB/op   1.40× slower

M=1000, N=1000
Slice:
ctrl 491.24k (  2.04µs) (± 0.93%)  15.7kB/op        fastest
 old 214.47k (  4.66µs) (± 0.64%)  63.6kB/op   2.29× slower
 new 283.87k (  3.52µs) (± 0.90%)  36.7kB/op   1.73× slower
StaticArray:
old 278.31k (  3.59µs) (± 0.68%)  55.8kB/op   1.37× slower
new 382.17k (  2.62µs) (± 1.30%)  28.8kB/op        fastest
Deque:
ctrl 495.76k (  2.02µs) (± 1.05%)  15.7kB/op        fastest
 old 222.66k (  4.49µs) (± 0.88%)  63.6kB/op   2.23× slower
 new 303.31k (  3.30µs) (± 0.90%)  36.7kB/op   1.63× slower

M=100, N=1000
Slice:
ctrl 968.83k (  1.03µs) (± 0.78%)  8.86kB/op        fastest
 old 400.14k (  2.50µs) (± 0.71%)  27.7kB/op   2.42× slower
 new 472.52k (  2.12µs) (± 0.59%)  20.7kB/op   2.05× slower
StaticArray:
old 614.25k (  1.63µs) (± 0.59%)  19.8kB/op   1.27× slower
new 778.45k (  1.28µs) (± 0.43%)  12.9kB/op        fastest
Deque:
ctrl 833.65k (  1.20µs) (± 0.55%)  8.89kB/op        fastest
 old 373.26k (  2.68µs) (± 0.62%)  27.7kB/op   2.23× slower
 new 436.66k (  2.29µs) (± 0.40%)  20.7kB/op   1.91× slower

The above results are collected on top of #13275; without that PR, when N ≪ M, i.e. a small Indexable is being concatenated to a large Array, the Array's buffer size would grow to the next power of 2, which is often less inefficient than the new growth policy. In contrast, the existing #concat(Enumerable) inserts elements one by one, so it already uses the new growth policy.

The old Array#concat(Array) overload is absorbed into the new implementation for Indexable arguments.

@straight-shoota straight-shoota added this to the 1.9.0 milestone Apr 4, 2023
HertzDevil added a commit to HertzDevil/crystal that referenced this pull request Apr 4, 2023
@HertzDevil
Copy link
Contributor Author

HertzDevil commented Apr 4, 2023

There is a compilation error that only affects Crystal 1.0.0? Same piece of code works on Crystal 1.1.1, and I don't recall any changes to the type inference logic in-between. Only reason I could think of is #10582

@straight-shoota straight-shoota merged commit 70bf29e into crystal-lang:master Apr 18, 2023
@HertzDevil HertzDevil deleted the perf/array-concat-indexable branch April 18, 2023 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants