Change the definition of `wmedian` to `wquantile(., 0.5)` #436
Codecov Report
@@ Coverage Diff @@
## master #436 +/- ##
==========================================
- Coverage 85.09% 84.97% -0.12%
==========================================
Files 18 18
Lines 1992 1963 -29
==========================================
- Hits 1695 1668 -27
+ Misses 297 295 -2
Codecov Report
@@ Coverage Diff @@
## master #436 +/- ##
==========================================
- Coverage 85.06% 84.95% -0.12%
==========================================
Files 21 21
Lines 2116 2080 -36
==========================================
- Hits 1800 1767 -33
+ Misses 316 313 -3
put median after quantile
We should revive this PR. Sorry for the delay. For future reference, can you explain why |
median currently is:
It corresponds to a nice extension of the median definition from unweighted to weighted vectors. The issue is that this generalization does not help when generalizing other quantiles, so I think it is better to use a generalization that is defined for any quantile. The other issue with the current implementation is that, with frequency weights, it does not give the same result as the unweighted median of a vector in which each value is repeated according to its weight. Both issues are solved by this PR, but they are conceptually different.
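To make the frequency-weight point concrete, here is a minimal sketch of the reference behaviour being described. It is not the PR's implementation: `expand` is a hypothetical helper, the data are made up, and only the `Statistics` standard library is used. The idea is that a frequency-weighted median (or any other quantile) should agree with the unweighted one computed on the expanded vector.

```julia
using Statistics

# Hypothetical helper (not part of StatsBase): repeat each value v[i] w[i] times.
expand(v, w) = reduce(vcat, [fill(x, n) for (x, n) in zip(v, w)])

v = [1.0, 2.0, 4.0]
w = [1, 1, 2]                         # integer frequency weights (made-up data)

expanded = expand(v, w)               # each value repeated according to its weight
reference_median = median(expanded)   # what a frequency-weighted median should match
reference_q25 = quantile(expanded, 0.25)  # the same reference works for any quantile
```

Assuming the behaviour proposed in this PR, `median(v, fweights(w))` and `quantile(v, fweights(w), 0.25)` would be expected to match these reference values.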
@deprecate wquantile(v::RealVector, w::RealVector, p::RealVector) quantile(v, weights(w), p)
@deprecate wquantile(v::RealVector, w::RealVector, p::Number) quantile(v, weights(w), [p])[1]
@deprecate wmedian(v::RealVector, w::AbstractWeights{<:Real}) median(v, w)
@deprecate wmedian(v::RealVector, w::RealVector) median(v, weights(w))
This makes me realize that we probably shouldn't accept `Weights` objects in `quantile` and `median` since their meaning is ambiguous: they could be frequency weights, or another type of weights. So here we'd better recommend using `pweights` or `fweights` (whichever is closer to the current behavior of `wmedian`).
Isn't it ok to assume that default weights are probability weights, as it is now? Honestly, most of the time these differences don't matter, and I don't want users starting to feel overwhelmed by the exact weight type they should use.
In Stata:
Each command has its own idea of the "natural" kind of weight. The command will tell you what kind of weight it is assuming and perform the request as if you specified that kind of weight.
Most of the time, but clearly not in this function! :-)
Contrary to Stata, our commands don't "tell" the user what "idea of the 'natural' kind of weight" they have. So it would really not help users to make silent assumptions behind their back. Anyway, if people have weights, they had better declare them using the right type as early as possible so that they get the right answer automatically.
It's too bad that, contrary to Stata, our definition of quantiles isn't independent of the type of weights, but at this point we'll just have to bite the bullet.
I'd like that too. I don't think it's too late.
I don't really like commands that talk to the user. Either a command succeeds and it should just do what is requested, or it doesn't and it should indicate a way to make it succeed. Printing warnings during normal operation is just annoying; we'd better explain how to do things properly. Adding `f`, `p` or `a` in front of `weights` isn't that costly, and anyway if users don't know what letter to use they won't understand the warning (and will likely do something incorrect).
Also, I don't think it is common for Julia functions to print messages like that.
I was talking about the second point ;)
Incorporated your suggestion. Sorry, I replied to your remarks as new comments; you can see them in "Files Changed".
Honestly the difference does not matter for 99.99% of vectors. For me it'd be a bit like having to specify the type of quantile (1, 2... 7) etc every time I use the |
That's not really comparable IMHO, as these quantile types all aim to estimate the same quantity (and they converge with large sample sizes). On the contrary, if you have frequency weights and you pass them without noticing they are treated as analytical weights, the interpretation is clearly incorrect. What's the point in putting so much work into finding the right definitions if in the end we don't think it matters? Also, people could be confused if they pass
Adding an error can wait for another PR, but the deprecation should suggest using |
Thanks!
Thanks! |
and returns an error for non-integer `FrequencyWeights`.
See #435
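As a rough illustration of the kind of check mentioned above, here is a minimal sketch in plain Julia. The helper name, the error message, and the decision to accept integer-valued floats are all assumptions made for illustration; this is not the code from the PR.

```julia
# Hypothetical validation helper; name and message are illustrative only.
# Assumption: weights that are numerically integers (e.g. 2.0) are accepted.
function check_frequency_weights(w::AbstractVector{<:Real})
    all(isinteger, w) || throw(ArgumentError("frequency weights must be integer-valued"))
    return w
end

check_frequency_weights([1, 2, 3])       # passes
check_frequency_weights([1.0, 2.0])      # passes: integer-valued floats
# check_frequency_weights([0.5, 1.5])    # would throw an ArgumentError
```

Failing early like this avoids silently interpolating between observations that a non-integer frequency count cannot meaningfully describe.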
I wish there was a NEWS section to document the change, because `wmedian` now gives a different result.