Weighted mean with function #886

Open: wants to merge 8 commits into base: master. Showing changes from 2 commits.

src/weights.jl (25 changes: 25 additions, 0 deletions)
@@ -682,6 +682,31 @@ function mean(A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
return mean(A, dims=dims)
end

"""
mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int])
Review comment (Contributor):

Suggested change:
- mean(f, A::AbstractArray, w::AbstractWeights[, dims::Int])
+ mean(f, A::AbstractArray, w::AbstractWeights[; dims])

`dims` shouldn't be required to be an integer.


Compute the weighted mean of array `A` after transforming its
contents with the function `f`, using weight vector `w` (of type
`AbstractWeights`). If `dims` is provided, compute the
weighted mean along dimension `dims`.

# Examples
```julia
n = 20
x = rand(n)
w = rand(n)
mean(√, x, weights(w))
```
"""
mean(f, A::AbstractArray, w::AbstractWeights; dims::Union{Colon,Int}=:) =
_mean(f.(A), w, dims)
Review comment (Contributor):

Avoid the extra allocation by using `mean(f, A)` instead of `mean(f.(A))`. `f.(A)` creates a new array, which is slow: memory access is usually the biggest bottleneck on modern CPUs. `mean(f.(A))` is two separate operations: the first creates the new array `f.(A)`, and the second computes its mean. `mean(f, A)` computes the mean of `(f(x) for x in A)` directly, in a single pass, without creating a new array.

Suggested change:
- _mean(f.(A), w, dims)
+ _mean(f, A; dims)

I'd also suggest making it slightly more generic:

Suggested change:
- _mean(f.(A), w, dims)
+ _mean(f, A; kwargs...)

(See below for more details.)
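A minimal sketch of the single-pass idea applied to the weighted case; it is not the PR's implementation. `weighted_mean_f` is a hypothetical helper that takes a plain `Vector` of weights (not StatsBase's `AbstractWeights`) and only handles reduction over all elements (the `dims = :` case). The generator passed to `sum` evaluates `f` lazily, so no temporary copy of `f.(A)` is built.

```julia
# Hypothetical helper (not part of StatsBase): weighted mean of f.(A)
# computed in a single pass. Assumes A and w have the same number of
# elements and reduces over all of them (the `dims = :` case only).
function weighted_mean_f(f, A::AbstractArray, w::AbstractVector)
    length(A) == length(w) ||
        throw(DimensionMismatch("A has $(length(A)) elements, w has $(length(w))"))
    # The generator computes wi * f(x) for each pair on the fly;
    # no intermediate array holding f.(A) is allocated.
    s = sum(wi * f(x) for (x, wi) in zip(A, w))
    return s / sum(w)
end

x = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 0.5]
weighted_mean_f(sqrt, x, w)   # ≈ 1.3120956, the value used in the PR's tests
sum(w .* sqrt.(x)) / sum(w)   # same value, but allocates a temporary array
```

The last line gives the same result, but the fused broadcast `w .* sqrt.(x)` materializes an array before summing; for large inputs that extra pass over memory is what the reviewer's suggestion avoids.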


function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
Review comment (Contributor):

This is a slightly more generic version of the same signature. Forwarding `kwargs...` makes the method less likely to need maintenance if we add more keyword arguments to `mean` in the future. I suggest using this pattern when you can.

Suggested change:
- function mean(f, A::AbstractArray, w::UnitWeights; dims::Union{Colon,Int}=:)
+ function mean(f, A::AbstractArray, w::UnitWeights; kwargs...)
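A minimal sketch of the `kwargs...` forwarding pattern, using hypothetical stand-ins (`inner_mean`, `wrapper_mean`, `checked_mean`) rather than StatsBase's actual methods. It also makes one detail explicit: a method like the `UnitWeights` one, which needs `dims` for its own length check, has to read it back out of `kwargs` before forwarding everything.

```julia
using Statistics

# Hypothetical inner method standing in for the one being wrapped;
# like Statistics.mean, it takes `dims` as a keyword defaulting to `:`.
inner_mean(f, A; dims = :) = mean(f, A; dims = dims)

# Forwards every keyword unchanged: if inner_mean later gains another
# keyword, this method keeps working without edits.
wrapper_mean(f, A; kwargs...) = inner_mean(f, A; kwargs...)

# A wrapper that also needs one of the keywords reads it from `kwargs`
# (falling back to `:`) and still forwards everything.
function checked_mean(f, A, n::Integer; kwargs...)
    dims = get(kwargs, :dims, :)
    len = (dims === (:)) ? length(A) : size(A, dims)
    len == n || throw(DimensionMismatch("Inconsistent array dimension."))
    return inner_mean(f, A; kwargs...)
end

B = reshape(1.0:6.0, 2, 3)
wrapper_mean(sqrt, B)              # scalar: mean over all six elements
wrapper_mean(sqrt, B; dims = 1)    # 1×3 matrix of column means
checked_mean(sqrt, B, 3; dims = 2) # 2×1 matrix; passes since size(B, 2) == 3
```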

a = (dims === :) ? length(A) : size(A, dims)
a != length(w) && throw(DimensionMismatch("Inconsistent array dimension."))
ParadaCarleton marked this conversation as resolved.
return mean(f.(A), dims=dims)
Review comment (Contributor):

Avoid the extra allocation by using `mean(f, A)` instead of `mean(f.(A))`. `f.(A)` creates a new array, which is slow: memory access is usually the biggest bottleneck on modern CPUs. `mean(f.(A))` is two separate operations: the first creates the new array `f.(A)`, and the second computes its mean. `mean(f, A)` computes the mean of `(f(x) for x in A)` directly, in a single pass, without creating a new array.

Suggested change:
- return mean(f.(A), dims=dims)
+ return mean(f, A; dims)

I'd also suggest making it slightly more generic:

Suggested change:
- return mean(f.(A), dims=dims)
+ return mean(f, A; kwargs...)

end

##### Weighted quantile #####

"""
test/weights.jl (21 changes: 21 additions, 0 deletions)
@@ -270,6 +270,27 @@ end
@test mean(a, f(wt), dims=3) ≈ sum(a.*reshape(wt, 1, 1, length(wt)), dims=3)/sum(wt)
@test_throws ErrorException mean(a, f(wt), dims=4)
end

@test mean(√, [1:3;], f([1.0, 1.0, 0.5])) ≈ 1.3120956
@test mean(√, 1:3, f([1.0, 1.0, 0.5])) ≈ 1.3120956
@test mean(√, [1 + 2im, 4 + 5im], f([1.0, 0.5])) ≈ 1.60824421 + 0.88948688im
ParadaCarleton marked this conversation as resolved.

for wt in ([1.0, 1.0, 1.0], [1.0, 0.2, 0.0], [0.2, 0.0, 1.0])
@test mean(√, a, f(wt), dims=1) ≈ sum(sqrt.(a).*reshape(wt, length(wt), 1, 1), dims=1)/sum(wt)
@test mean(√, a, f(wt), dims=2) ≈ sum(sqrt.(a).*reshape(wt, 1, length(wt), 1), dims=2)/sum(wt)
@test mean(√, a, f(wt), dims=3) ≈ sum(sqrt.(a).*reshape(wt, 1, 1, length(wt)), dims=3)/sum(wt)
@test_throws ErrorException mean(√, a, f(wt), dims=4)
end
itsdebartha marked this conversation as resolved.

b = reshape(1.0:9.0, 3, 3)
w = UnitWeights{Float64}(3)
@test mean(√, b, w; dims=1) ≈ reshape(w, :, 3) * sqrt.(b) / sum(w)
@test mean(√, b, w; dims=2) ≈ sqrt.(b) * w / sum(w)

c = 1.0:9.0
w = UnitWeights{Float64}(9)
@test mean(√, c, w) ≈ sum(sqrt.(c)) / length(c)
@test_throws DimensionMismatch mean(√, c, UnitWeights{Float64}(6))
end
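The new `dims` tests compare the method against matrix-product references: along `dims=1` the weighted mean is a row of weights times `f.(b)` (`reshape(w, :, 3) * sqrt.(b)`), and along `dims=2` it is `sqrt.(b) * w`, each divided by `sum(w)`. A minimal sketch that cross-checks this identity with explicit loops, assuming plain `Vector` weights and a matrix input; `loop_weighted_mean` is a hypothetical helper, not part of StatsBase or the PR.

```julia
using LinearAlgebra  # for the adjoint w' and the matrix products used as references

# Hypothetical reference (not StatsBase): weighted mean of f.(B) along
# dimension `dim` (1 or 2) of a matrix, computed with explicit loops.
function loop_weighted_mean(f, B::AbstractMatrix, w::AbstractVector, dim::Int)
    dim in (1, 2) || throw(ArgumentError("dim must be 1 or 2"))
    size(B, dim) == length(w) ||
        throw(DimensionMismatch("size(B, $dim) = $(size(B, dim)) but length(w) = $(length(w))"))
    if dim == 1
        # 1×ncols result: each column is averaged over its rows with weights w
        out = zeros(1, size(B, 2))
        for j in axes(B, 2), i in axes(B, 1)
            out[1, j] += w[i] * f(B[i, j])
        end
    else
        # nrows×1 result: each row is averaged over its columns with weights w
        out = zeros(size(B, 1), 1)
        for j in axes(B, 2), i in axes(B, 1)
            out[i, 1] += w[j] * f(B[i, j])
        end
    end
    return out ./ sum(w)
end

B = reshape(1.0:9.0, 3, 3)
w = [1.0, 0.2, 0.0]
loop_weighted_mean(sqrt, B, w, 1) ≈ w' * sqrt.(B) / sum(w)       # true
vec(loop_weighted_mean(sqrt, B, w, 2)) ≈ sqrt.(B) * w / sum(w)   # true
```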

@testset "Quantile fweights" begin