Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wquantile #117

Merged
merged 1 commit into from
Oct 6, 2015
Merged

wquantile #117

merged 1 commit into from
Oct 6, 2015

Conversation

matthieugomez
Copy link
Contributor

This function implements a weighted quantile. It is a generalization of quantile type 7 as implemented in Julia base (by that I mean results are the same with w = 1).

The function follows the notation in the R help file for quantile, which hopefully will make it easy to add more quantile types. Only non negative weights are allowed.

I've added some tests and documentation.

This is my first pull request so I may be doing some things incorrectly. In particular, I'm not sure about the order of arguments (weight vs quantiles).

@johnmyleswhite
Copy link
Member

Just to clarify: you didn't read any R source code when implementing this?

@@ -158,6 +158,10 @@ Quantile and Friends
w = rand(n)
xk = median(x, weights(w))

.. function:: wquantile(x, w, q)

Compute the weighted quantile of ``x``, using weights given by a weight vector ``w`` (of type ``WeightVec``). Weights must not be negative. The weight and data vectors must have the same length `n`. Quantiles as defined as follow: assume for simplicity that weights sum to `n`. Denote :math:`x_h` the smallest element of ``x`` such that the cumulated sum up of weights up to `x_h` is strictly superior to :math:`lv * q + m`. Denote :math:`x_l` the element just before `x_h`. The sample quantile is defined as `(1-gamma) x_l + gamma x_h`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The weight and data vectors" -> "weights"
"as follow" -> "as follows"
"for simplicity": looks like you can always assume this, right? Better remove these words.
"cumulated sum up of weights up to" -> "cumulative sum of weights up to"
"gamma" -> "\gamma"
Also, looks like several expressions lack :math:.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and you say "quantile" (singular) while a vector of probabilities is supported. It would make sense to use the plural, and to say a word about the probabilities, following the vocabulary used for ?quantile.

@nalimilan
Copy link
Member

Thanks! I've only had a look at most obvious things.

Indeed, @johnmyleswhite is right that you must not have taken inspiration from the R code to write this, or it can't be licensed under MIT. It's a bit hard to use their notation without reading their code, though you can always make an implementation from scratch just based on the behaviour described in R docs.

@matthieugomez
Copy link
Contributor Author

Thanks for your comments. I changed my code accordingly. I don't understand what coverage/coveralls is so I'm not sure why it fails.

No I did not look at the R code (which does not compute weighted quantile btw).

A question about your workflow: when I modify a local copy of a package like StatsBase, how can I install this local version as a package? Also, is there a way to load every function in this local version, including non exported function?

@nalimilan
Copy link
Member

Coveralls is used to measure the portion of the code which is covered by tests. Apparently there are some issues with it at the moment.

When you modify a package in ~/.julia/Package, it's automatically used. You need to checkout the original git branch to use the official version again. I'm not sure how to load all functions from a package; maybe simply by calling include on the toplevel file of the package.

@@ -158,6 +158,9 @@ Quantile and Friends
w = rand(n)
xk = median(x, weights(w))

.. function:: wquantile(x, p, w)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q is p in the code now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather the contrary.

@StefanKarpinski
Copy link
Contributor

@matthieugomez, can you answer the question about whether you read the R source when writing this code? If you did, then I'm afraid that iterating over this patch is a waste of time since the project can't accept it in any case.

@johnmyleswhite
Copy link
Member

No I did not look at the R code (which does not compute weighted quantile btw).

He did answer it, @StefanKarpinski



###### Weighted quantile #####
# Variables m, g, low, high, correspond respectively to m, gamma, x[j], x[j+1] in the R help file for quantile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you can use γ to be closer to the math in Julia.

l = 1
wthreshold = cumulative_weight - wt[vpermute[i-1]] - index[k]
# loop over posterior elements with same value
while (i + l <= lv && v[vpermute[i+l]] == high[k])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The standard Julia style is to put no parentheses after while.

@nalimilan
Copy link
Member

The tests fail since you rebased, looks like you forgot to export the function.

@matthieugomez
Copy link
Contributor Author

I removed wquantile (instead one should use quantile(v, weight(w))). I also defined quantile for AbstractVector instead of just RealVector but I'm not sure this is correct.

I also used the weighted quantile function to improve the function describe, adding a weighted and a detailed option.

@nalimilan
Copy link
Member

I'm all for using quantile, but as long as wmedian is exported, it makes sense to export wquantile. Else people might think it's not supported. Regarding the type, I think it's better to keep RealVector, which is used everywhere at the moment. If this needs to be changed, better do that separately for all functions in a different PR.



function describe{T<:Real}(a::AbstractArray{T}; detail = false)
detail? show(detailedsummarystats(a)) : show(summarystats(a))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a space before ? (same below).

@matthieugomez
Copy link
Contributor Author

Thanks for the feedback again. What's the best way to use the code for the method of one type to define the method for another type ? Define Abstract type? Write a convert function from weighted to unweighted summary ? I don't see how this would make the code simpler.

On Jun 12, 2015, at 6:09 PM, Milan Bouchet-Valat [email protected] wrote:

I'm all for using quantile, but as long as wmedian is exported, it makes sense to export wquantile. Else people might think it's not supported. Regarding the type, I think it's better to keep RealVector, which is used everywhere at the moment. If this needs to be changed, better do that separately for all functions in a different PR.


Reply to this email directly or view it on GitHub.

@nalimilan
Copy link
Member

I think you can define an abstract type, say AbstractSummaryStats, from which both types inherit. Then the unweighted version will apply to AbstractSummaryStats, so that you can call it without any conversion from the weighted version.

@nalimilan
Copy link
Member

Even simpler: you could merge all (non-)detailed and (un-)weighted printing methods into one method parameterized on T <: AbstractSummaryStats, and just do if T == WeightedSummaryStats before the line printing the weights and if T == DetailedSummaryStats before the lines printing detailed stats.

@matthieugomez
Copy link
Contributor Author

I've simplified the wquantile code and added tests for the describe method.
What should I do to have this merged?

@andreasnoack
Copy link
Member

You'll have to rebase. @nalimilan You've done most of the review, is this ready for merging?

@aviks
Copy link
Contributor

aviks commented Sep 16, 2015

Bump?

@matthieugomez
Copy link
Contributor Author

It's been a while. Can this be merged? I've suppressed the types to display weighted summary stats and I'll write it as a different pull request.

@johnmyleswhite
Copy link
Member

This looks good to merge. What's going on with Travis?

@andreasnoack
Copy link
Member

There have been some certificate problems on the nightlies, but they should vbe fixed now. I've restarted the build and it seems to work.

johnmyleswhite added a commit that referenced this pull request Oct 6, 2015
@johnmyleswhite johnmyleswhite merged commit 7397c85 into JuliaStats:master Oct 6, 2015
@johnmyleswhite
Copy link
Member

Thanks for working on this, @matthieugomez.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants