Add essential packages for statistics #4
ManifoldLearning.jl should be forked into JuliaStats, master tagged, and added here IMO.
This makes the package useful again.
@nalimilan Great that you are pushing this. It will make things much more user-friendly. One of the open questions is how documentation should be handled. Maybe examples of analyses that use functionality across several packages would be useful, leaving the actual API documentation to the individual packages.
I'm not sure if it is an obvious candidate for inclusion here. I think the idea here is to cover the standard stuff that you'd see in stats courses.
Are manifold-based methods and t-SNE not in standard stats courses by now? I wouldn't be able to find a stats-based computational bio course without them.
Yes, that's a difficult question. Maybe the ideal would be to have a tutorial exposing the most common features of each domain, and redirecting to packages for more details. But that's a lot of work. So maybe we can just start with links to the packages' manuals on each line? Regarding ManifoldLearning.jl, I have no idea what it is so I can't really say. One good criterion would be whether other statistical environments provide it by default.
Computational bio is not statistics. At least not the flavors of it that I've seen.
Alright, I'll leave it alone. The best solution down the line is probably to add that stuff to MultivariateStats.jl, which has the other half of the commonly used dimensionality reduction methods. The others that come to mind for me are LOESS.jl and Bootstrap.jl. At least to me, anything further is probably "specialized" and those are sitting right on the cutoff line.
There are tons of flavors, to the point where computational/systems biology needs a word in front of it to really be descriptive.
Good point, I've added Bootstrap and Loess. I missed the latter because it's not listed on the website, we should update it (and remove unmaintained packages). Also, shouldn't Loess be renamed to LOESS?
Should we include RDatasets? I know people who use that for their demos.
There is CovarianceMatrices.jl. I am working on making it generic, but as it is, it is a nice complement to GLM.jl (in certain fields, these variances are the standard ones).
Want Jackknife?
Great list. What about MixedModels? Would be nice to have that really integrated into the ecosystem here. In ecology at least nobody seems to do a GLM without random effects these days.
Jackknife doesn't export anything, so you have to call them as
Btw we may want to do some serious cleanup and dedicated maintenance if we're going to fully endorse all of these packages. While I think most are fine, I don't know that anybody really tends to MultivariateStats these days.
Such an important package though.
Yeah, it's chicken and egg. I think you put it in so that way it has to be maintained. FWIW it's already widely used and right now it works. Maybe it just hasn't been touched because it's working just fine. But yes,
It has a lot of stuff in there, but at least PCA is pretty standard in most toolkits.
A few comments:
I'm happy to reconcile the exported
Want MultivariateTests? It'd just have to be registered first.
Yeah, but why not add these tests to HypothesisTests instead?
Yeah, I suppose they would work just fine there, good point. They were originally separate because it started as a project for my master's program. 😛
I think the list should be shorter rather than longer. IMO, only packages that have proved themselves useful/popular with end-users should be in this list. Otherwise, this list may give the impression that a lot of things are "done" in Julia, which is not true and which potentially stifles innovation.
Which packages above do not fulfill these criteria in your opinion?
I do not really know a lot of these packages. But it just seems safer to me to start with a small list of packages, and then expand it, rather than removing existing functionalities.
So basically you object to Bootstrap, KernelDensity, Loess, Jackknife and CovarianceMatrices? Care to elaborate on why?
Are KDE.jl and LOESS.jl reasonably complete to be worth including?
I think that MultipleTesting.jl should be included in the list of essential packages as well! The Benjamini-Hochberg procedure and the related Storey procedure are both ubiquitous in high-throughput studies. R provides some of that functionality through the
@nalimilan For what it is worth, if by KDE.jl you mean KernelDensity.jl, then whenever I needed it, it has been useful, and it seems to have the (basic) required functionality (and I think a nonparametric density estimator falls within the "essential" category).
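For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, here is a minimal sketch of the textbook step-up adjustment in plain Python. It is only an illustration of the general technique, not the implementation in MultipleTesting.jl or in R; the function name `benjamini_hochberg` is hypothetical and chosen for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Hypothetical helper for illustration; not taken from any package.
    """
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each
    # by m / rank and taking a running minimum so that the adjusted
    # values stay monotone in the original p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A hypothesis is then rejected at false discovery rate level `alpha` when its adjusted p-value is at most `alpha`, for example `[q <= 0.05 for q in benjamini_hochberg(pvals)]`.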
Hi, I'm just curious where I can learn about the plans for this package?
I'm not aware of any plans besides this PR. I think we should make a decision and merge it.
Let's merge what's here now. We can adjust later if needed. I'll do it tomorrow if nobody objects.
If anybody thinks a package should be added or removed from the list, please file a new issue.