Weighted data matrix #12
I suppose this is possible. It would probably require a fair bit of rewriting, and some thought about how to keep the interface clean.
It might be a nice feature request. In the meantime, I got something working by hand using the output of …
I'm also interested in training using weighted data points. Could you share your code for that? Thanks.
I would think we have to add weight support to the stats() functions in stats.jl. I haven't looked at the math yet, but I suspect it will boil down to a broadcasting multiply of … We could add a parameter …
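To make the "broadcasting multiply" idea concrete, here is a minimal Python sketch (not Julia, and not the actual stats() API in stats.jl). It assumes, since the comment above is cut off, that the multiply scales each point's posterior responsibilities by that point's data weight before accumulating the zeroth-, first- and second-order statistics. `weighted_stats` is a hypothetical name, and the data are 1-D for brevity.

```python
def weighted_stats(x, resp, w):
    """Hypothetical helper (not part of GaussianMixtures.jl): per-component
    zeroth-, first- and second-order statistics for 1-D data, with each
    point's responsibilities scaled by its data weight.

    x    -- list of n scalar observations
    resp -- n x K nested list of posterior responsibilities (rows sum to 1)
    w    -- list of n non-negative data weights
    """
    n, K = len(x), len(resp[0])
    N = [0.0] * K  # sum_i w_i * gamma_ik
    F = [0.0] * K  # sum_i w_i * gamma_ik * x_i
    S = [0.0] * K  # sum_i w_i * gamma_ik * x_i**2
    for i in range(n):
        for k in range(K):
            # Assumed "broadcast multiply": scale the responsibility by the weight.
            g = w[i] * resp[i][k]
            N[k] += g
            F[k] += g * x[i]
            S[k] += g * x[i] ** 2
    return N, F, S


x = [1.0, 2.0, 3.0]
resp = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(weighted_stats(x, resp, [1.0, 2.0, 1.0]))  # ([2.0, 2.0], [3.0, 5.0], [5.0, 13.0])
```

Setting every weight to 1 reduces this to the ordinary unweighted statistics, which gives a cheap regression check when weights are added to existing code.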
Yes. For my application, I'm using importance sampling, so each data point has an associated weight. Adding a weights parameter seems like the natural way to do it. If you want to group the data and weights, then I think it would be better to use a structure of arrays rather than an array of structures. For some applications, there could be multiple sets of weights for one set of data, e.g., different weights for different choices of priors, or for different temperatures when using tempering/annealing. I'd propose that those applications are probably best handled by multiple function calls. As long as swapping out the weights can be done efficiently, I don't see a problem. Using a structure of arrays also makes it easier and more efficient to combine different packages/libraries.
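The structure-of-arrays layout with one call per weight set could look roughly like the Python sketch below. It assumes a 1-D mixture and a fixed set of responsibilities, and shows a single weighted M-step rather than a full EM loop. `weighted_m_step` and the weight-set names are hypothetical illustrations, not anything from GaussianMixtures.jl.

```python
def weighted_m_step(x, resp, w):
    """Hypothetical helper: one weighted M-step for a 1-D Gaussian mixture.

    The data x and responsibilities resp are shared across calls; only the
    weight vector w changes. Returns (mixing weights, means, variances).
    """
    n, K = len(x), len(resp[0])
    N = [sum(w[i] * resp[i][k] for i in range(n)) for k in range(K)]
    F = [sum(w[i] * resp[i][k] * x[i] for i in range(n)) for k in range(K)]
    S = [sum(w[i] * resp[i][k] * x[i] ** 2 for i in range(n)) for k in range(K)]
    total = sum(N)
    mix = [N[k] / total for k in range(K)]
    means = [F[k] / N[k] for k in range(K)]
    variances = [S[k] / N[k] - means[k] ** 2 for k in range(K)]
    return mix, means, variances


# Structure of arrays: one copy of the data, several independent weight
# vectors (e.g. one per prior or annealing temperature), one call per set.
x = [1.0, 2.0, 3.0]
resp = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weight_sets = {
    "uniform": [1.0, 1.0, 1.0],
    "importance": [1.0, 2.0, 1.0],
}
fits = {name: weighted_m_step(x, resp, w) for name, w in weight_sets.items()}
for name, (mix, means, variances) in fits.items():
    print(name, mix, means, variances)
```

Because the weights never live inside the data records, swapping in a new weight vector costs nothing beyond the call itself. The `S/N - mean**2` variance form is kept short for illustration; a more robust version would accumulate centered sums.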
In the meantime, if you're interested, you can find my hand-rolled version with weights in …
@rgiordan, thanks. My application was different enough that I ended up writing my own em! replacement that allows for weighted data (and also trains a mixture of t-distributions rather than Gaussians). I've only written what I need for my project (e.g., full covariance matrices, data in memory). If anyone's interested, those additions are at https://github.com/eford/GaussianMixtures.jl (in src/eford_extensions.jl).
I think it would be useful for GaussianMixtures.jl to expose a …
What do you mean by "expose a …"? I've gone through your diffs. It seems like quite a rewrite of the code. Was there no way to include weighting in the existing code?
There might be a way to incorporate weighting; I tried that at first. But after struggling to understand how your code worked, I decided it would be easier to rewrite the training function. Feel free to add the functionality in a more general way.
Ah, that sounds like the code isn't very transparent, which is not great. I suppose it could do with some cleanup and rewriting here and there. Anyway, once I add weighting to the original code, we will have independent code that can be used for verification.
Is there any support in GaussianMixtures for weighted rows in a data matrix? For example, if I have a dataset with many repeated observations, can I pass in a matrix of distinct points and a vector of multiplicities?
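Whether GaussianMixtures.jl accepts a multiplicity vector is the open question in this thread. The reason multiplicities work as weights can be checked in a few lines, though. The Python sketch below uses the single-Gaussian case (K = 1) and a hypothetical helper `weighted_mean_var`: fitting the distinct points with their counts as weights gives the same estimates as fitting the expanded data with unit weights. In a mixture the same holds per component, because identical points receive identical responsibilities and every sufficient statistic is a sum over points.

```python
def weighted_mean_var(x, w):
    """Hypothetical helper: weighted maximum-likelihood mean and variance of 1-D data."""
    total = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return mean, var


distinct = [1.0, 2.0, 5.0]
counts = [3, 1, 2]  # multiplicity of each distinct point

# Expand the repeated observations back out and fit with unit weights.
expanded = [xi for xi, c in zip(distinct, counts) for _ in range(c)]

print(weighted_mean_var(distinct, counts))               # (2.5, 3.25)
print(weighted_mean_var(expanded, [1] * len(expanded)))  # (2.5, 3.25)
```

With many repeated observations, the compressed form does proportionally less work per iteration while producing the same fit.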