This repository has been archived by the owner on May 4, 2019. It is now read-only.

Broadcast performance #32

Open
davidagold opened this issue Jul 7, 2015 · 0 comments

Comments

@davidagold
Contributor

I've begun uploading some resources for performance profiling. Here are my results from running profile_broadcast(), profile_ops1() and profile_ops2() from https://github.com/johnmyleswhite/NullableArrays.jl/blob/master/perf/broadcast.jl:

julia> profile_broadcast()
f(x, y) := x * y
Method: broadcast!(f, dest, A1, A2) (no empty entries):
  For Array{Float64}:            19.843 milliseconds (10 allocations: 256 bytes)
  For NullableArray{Float64}:   143.524 milliseconds (10 allocations: 256 bytes)
  For DataArray{Float64}:        53.028 milliseconds (12 allocations: 1221 KB)

Method: broadcast!(f, dest, A1, A2) (~half empty entries):
  For NullableArray{Float64}:   161.431 milliseconds (10 allocations: 256 bytes)
  For DataArray{Float64}:       125.379 milliseconds (12 allocations: 1221 KB)
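For context on why the mutating method allocates so little: a NullableArray keeps its data as a plain values array plus a parallel Bool isnull mask. The following is a minimal sketch, not the package's actual code, of what a null-propagating broadcast!(f, dest, A1, A2) over two same-length inputs has to do under that layout. The type SimpleNullableVector and the function nullable_broadcast! are hypothetical names used only for illustration. Because the kernel writes into preallocated buffers, the loop itself allocates nothing.

```julia
# Hypothetical stand-in for a NullableArray: plain values plus a Bool mask.
struct SimpleNullableVector{T}
    values::Vector{T}
    isnull::Vector{Bool}   # true marks an empty (null) entry
end

# Null-propagating, in-place elementwise kernel (sketch; not NullableArrays.jl code).
# Writes f(a, b) into dest wherever both inputs are present; otherwise marks dest null.
function nullable_broadcast!(f, dest::SimpleNullableVector,
                             A::SimpleNullableVector, B::SimpleNullableVector)
    n = length(dest.values)
    length(A.values) == n == length(B.values) ||
        throw(DimensionMismatch("inputs must have the same length"))
    @inbounds for i in 1:n
        anynull = A.isnull[i] | B.isnull[i]
        dest.isnull[i] = anynull
        # Skip the call on null slots; the stale value left there is never read.
        if !anynull
            dest.values[i] = f(A.values[i], B.values[i])
        end
    end
    return dest
end

A = SimpleNullableVector([1.0, 2.0, 3.0], [false, true, false])
B = SimpleNullableVector([4.0, 5.0, 6.0], [false, false, false])
dest = SimpleNullableVector(zeros(3), fill(false, 3))
nullable_broadcast!(*, dest, A, B)
```

Skipping f on null slots avoids evaluating it on meaningless values, which matters for functions that can throw. Computing f unconditionally and only OR-ing the masks would make the loop branch-free, which is often friendlier to SIMD vectorization.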

julia> profile_ops1()
Method: .+ (no empty entries)
  For Array{Float64}:            64.404 milliseconds (33 allocations: 78126 KB, 54.91% gc time)
  For NullableArray{Float64}:   170.374 milliseconds (33 allocations: 153 MB, 0.63% gc time)
  For DataArray{Float64}:        58.402 milliseconds (39 allocations: 80568 KB, 18.74% gc time)
Method: .- (no empty entries)
  For Array{Float64}:            33.202 milliseconds (33 allocations: 78126 KB, 22.81% gc time)
  For NullableArray{Float64}:   163.873 milliseconds (33 allocations: 153 MB, 0.96% gc time)
  For DataArray{Float64}:        57.270 milliseconds (39 allocations: 80568 KB, 17.04% gc time)
Method: .* (no empty entries)
  For Array{Float64}:            33.124 milliseconds (33 allocations: 78126 KB, 22.90% gc time)
  For NullableArray{Float64}:   158.847 milliseconds (33 allocations: 153 MB, 0.63% gc time)
  For DataArray{Float64}:        59.784 milliseconds (39 allocations: 80568 KB, 16.64% gc time)
Method: ./ (no empty entries)
  For Array{Float64}:            64.691 milliseconds (33 allocations: 78126 KB, 11.56% gc time)
  For NullableArray{Float64}:   163.999 milliseconds (33 allocations: 153 MB, 0.66% gc time)
  For DataArray{Float64}:        69.963 milliseconds (39 allocations: 80568 KB, 13.96% gc time)
Method: .% (no empty entries)
  For Array{Float64}:           110.246 milliseconds (33 allocations: 78126 KB, 6.83% gc time)
  For NullableArray{Float64}:   260.494 milliseconds (33 allocations: 153 MB, 0.41% gc time)
  For DataArray{Float64}:       130.343 milliseconds (39 allocations: 80568 KB, 8.13% gc time)
Method: .^ (no empty entries)
  For Array{Float64}:           931.097 milliseconds (33 allocations: 78126 KB, 0.80% gc time)
  For NullableArray{Float64}:     1.168 seconds      (33 allocations: 153 MB, 0.09% gc time)
  For DataArray{Float64}:       926.084 milliseconds (39 allocations: 80568 KB, 1.14% gc time)
Method: .>> (no empty entries)
  For Array{Float64}:            34.215 milliseconds (29 allocations: 78126 KB, 21.93% gc time)
  For NullableArray{Float64}:   193.658 milliseconds (29 allocations: 153 MB, 4.22% gc time)
  For DataArray{Float64}:        70.271 milliseconds (35 allocations: 80567 KB, 15.88% gc time)
Method: .== (no empty entries)
  For Array{Float64}:            28.650 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   235.037 milliseconds (33 allocations: 153 MB, 6.90% gc time)
  For DataArray{Float64}:        38.276 milliseconds (39 allocations: 12208 KB)
Method: .!= (no empty entries)
  For Array{Float64}:            29.436 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   226.303 milliseconds (33 allocations: 153 MB, 7.24% gc time)
  For DataArray{Float64}:        43.322 milliseconds (39 allocations: 12208 KB)
Method: .< (no empty entries)
  For Array{Float64}:            78.932 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   227.670 milliseconds (33 allocations: 153 MB, 5.31% gc time)
  For DataArray{Float64}:        39.111 milliseconds (39 allocations: 12208 KB)
Method: .> (no empty entries)
  For Array{Float64}:            72.882 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   226.093 milliseconds (33 allocations: 153 MB, 5.52% gc time)
  For DataArray{Float64}:        41.665 milliseconds (39 allocations: 12208 KB)
Method: .<= (no empty entries)
  For Array{Float64}:            71.695 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   224.360 milliseconds (33 allocations: 153 MB, 5.45% gc time)
  For DataArray{Float64}:        37.917 milliseconds (39 allocations: 12208 KB)
Method: .>= (no empty entries)
  For Array{Float64}:            76.345 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   233.169 milliseconds (33 allocations: 153 MB, 5.36% gc time)
  For DataArray{Float64}:        38.132 milliseconds (39 allocations: 12208 KB)
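The allocation sizes above can be sanity-checked with some arithmetic. Assuming each benchmark array holds 10^7 Float64 elements (inferred from the 78126 KB reported for the Array result, since 10^7 × 8 bytes = 78125 KiB), a result stored as a values array plus a Bool mask should need only about 86 MiB. An array of (flag, value) pairs, each padded to 16 bytes for alignment, needs about 153 MiB, which happens to match the NullableArray figures. This is only a consistency check, not evidence of where the allocation actually happens; NullableScalar is a hypothetical stand-in for Nullable{Float64}.

```julia
# Back-of-the-envelope check of the allocation sizes reported above.
# ASSUMPTION: each benchmark array holds 10^7 elements, inferred from the
# 78126 KB allocated for the Array{Float64} result (10^7 * 8 bytes = 78125 KiB).

# Hypothetical stand-in for Nullable{Float64}: a Bool flag next to the payload.
struct NullableScalar{T}
    isnull::Bool
    value::T
end

n_elems = 10^7
kib(bytes) = bytes / 1024
mib(bytes) = bytes / 1024^2

# Plain Array{Float64} result.
array_bytes = n_elems * sizeof(Float64)
# Columnar layout: a Float64 values array plus a parallel Bool mask.
columnar_bytes = n_elems * (sizeof(Float64) + sizeof(Bool))
# Array of (flag, value) structs; alignment pads each element to 16 bytes.
struct_bytes = n_elems * sizeof(NullableScalar{Float64})

println("Array{Float64}:          ", kib(array_bytes), " KiB")
println("values + Bool mask:      ", round(mib(columnar_bytes), digits = 1), " MiB")
println("Vector{NullableScalar}:  ", round(mib(struct_bytes), digits = 1), " MiB")
```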

julia> profile_ops2()
Method: .+ (~half empty entries)
  For Array{Float64}:            35.238 milliseconds (33 allocations: 78126 KB, 22.17% gc time)
  For NullableArray{Float64}:   210.468 milliseconds (33 allocations: 153 MB, 0.52% gc time)
  For DataArray{Float64}:       153.632 milliseconds (39 allocations: 80568 KB, 6.31% gc time)
Method: .- (~half empty entries)
  For Array{Float64}:            33.107 milliseconds (33 allocations: 78126 KB, 23.22% gc time)
  For NullableArray{Float64}:   204.025 milliseconds (33 allocations: 153 MB, 0.51% gc time)
  For DataArray{Float64}:       146.317 milliseconds (39 allocations: 80568 KB, 6.66% gc time)
Method: .* (~half empty entries)
  For Array{Float64}:            33.065 milliseconds (33 allocations: 78126 KB, 23.50% gc time)
  For NullableArray{Float64}:   204.189 milliseconds (33 allocations: 153 MB, 0.54% gc time)
  For DataArray{Float64}:       144.531 milliseconds (39 allocations: 80568 KB, 6.80% gc time)
Method: ./ (~half empty entries)
  For Array{Float64}:            66.764 milliseconds (33 allocations: 78126 KB, 11.39% gc time)
  For NullableArray{Float64}:   204.838 milliseconds (33 allocations: 153 MB, 0.51% gc time)
  For DataArray{Float64}:       145.403 milliseconds (39 allocations: 80568 KB, 6.71% gc time)
Method: .% (~half empty entries)
  For Array{Float64}:           107.475 milliseconds (33 allocations: 78126 KB, 7.29% gc time)
  For NullableArray{Float64}:   287.880 milliseconds (33 allocations: 153 MB, 0.38% gc time)
  For DataArray{Float64}:       167.173 milliseconds (39 allocations: 80568 KB, 5.92% gc time)
Method: .^ (~half empty entries)
  For Array{Float64}:           885.454 milliseconds (33 allocations: 78126 KB, 0.88% gc time)
  For NullableArray{Float64}:     1.167 seconds      (33 allocations: 153 MB, 0.09% gc time)
  For DataArray{Float64}:       372.115 milliseconds (39 allocations: 80568 KB, 2.61% gc time)
Method: .>> (~half empty entries)
  For Array{Float64}:            38.296 milliseconds (29 allocations: 78126 KB, 28.68% gc time)
  For NullableArray{Float64}:   205.965 milliseconds (29 allocations: 153 MB, 3.84% gc time)
Method: .== (~half empty entries)
  For Array{Float64}:            31.618 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   253.261 milliseconds (33 allocations: 153 MB, 6.84% gc time)
  For DataArray{Float64}:       127.952 milliseconds (39 allocations: 12208 KB)
Method: .!= (~half empty entries)
  For Array{Float64}:            29.622 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   259.534 milliseconds (33 allocations: 153 MB, 6.41% gc time)
  For DataArray{Float64}:       128.988 milliseconds (39 allocations: 12208 KB)
Method: .< (~half empty entries)
  For Array{Float64}:            73.964 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   258.533 milliseconds (33 allocations: 153 MB, 4.87% gc time)
  For DataArray{Float64}:       130.118 milliseconds (39 allocations: 12208 KB)
Method: .> (~half empty entries)
  For Array{Float64}:            75.889 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   243.750 milliseconds (33 allocations: 153 MB, 5.02% gc time)
  For DataArray{Float64}:       129.365 milliseconds (39 allocations: 12208 KB)
Method: .<= (~half empty entries)
  For Array{Float64}:            73.498 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   258.335 milliseconds (33 allocations: 153 MB, 5.02% gc time)
  For DataArray{Float64}:       129.515 milliseconds (39 allocations: 12208 KB)
Method: .>= (~half empty entries)
  For Array{Float64}:            75.257 milliseconds (34 allocations: 1226 KB)
  For NullableArray{Float64}:   245.541 milliseconds (33 allocations: 153 MB, 4.98% gc time)
  For DataArray{Float64}:       127.230 milliseconds (39 allocations: 12208 KB)

Apart from specialized implementations for a handful of element-wise comparison operators (found in src/broadcast.jl), there is no specialized broadcast code for NullableArrays. Speed is generally 2x to 5x slower than the Array implementation, with some notable exceptions (e.g. .==, .!= and the mutating broadcast!) approaching 10x. The memory allocation, though, is substantial: every non-mutating NullableArray operation allocates 153 MB, roughly twice what Array (78126 KB) and DataArray (80568 KB) allocate. I'm not sure why, especially since the mutating method (as in profile_broadcast) appears to be quite memory efficient, if substantially slower. I'll try to track allocations by line and see what I find, but if anybody has thoughts, they would be appreciated. My hope is that we can achieve some solid performance gains without requiring nearly as much infrastructure as DataArrays did.
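One quick way to check whether an intermediate array could account for extra allocation is to compare two hypothetical implementations of a non-mutating, null-propagating .+ with Julia's @allocated macro: one that materializes an intermediate vector of (flag, value) structs, and one that fills a values buffer and a Bool mask directly. The function names below are illustrative only and not part of NullableArrays.jl. For per-line attribution inside the real package code, starting Julia with --track-allocation=user writes per-line .mem files alongside the source files.

```julia
# Sketch: compare allocations of two ways to implement a non-mutating,
# null-propagating elementwise op. All names are hypothetical.

struct NullableScalar{T}
    isnull::Bool
    value::T
end

# Strategy 1: build an intermediate vector of (flag, value) structs, then split it.
function add_via_structs(xv, xn, yv, yn)
    tmp = [NullableScalar(xn[i] | yn[i], xv[i] + yv[i]) for i in eachindex(xv)]
    return [t.value for t in tmp], [t.isnull for t in tmp]
end

# Strategy 2: allocate the two output buffers once and fill them in place.
function add_columnar(xv, xn, yv, yn)
    values = similar(xv)
    isnull = similar(xn)
    @inbounds for i in eachindex(xv)
        isnull[i] = xn[i] | yn[i]
        values[i] = xv[i] + yv[i]
    end
    return values, isnull
end

n = 10^6
xv, yv = rand(n), rand(n)
xn, yn = rand(Bool, n), rand(Bool, n)

# Call once first so compilation is not counted in the measurements.
add_via_structs(xv, xn, yv, yn); add_columnar(xv, xn, yv, yn)
a1 = @allocated add_via_structs(xv, xn, yv, yn)
a2 = @allocated add_columnar(xv, xn, yv, yn)
println("via structs: ", a1 ÷ 1024, " KiB; columnar: ", a2 ÷ 1024, " KiB")
```

Both strategies produce the same values and mask; the difference is that the first one also holds a full-size array of padded structs in memory before splitting it.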
