Calculating RDA on large datasets #527
Comments
I think memory may be a bigger issue than speed: time has no limit, memory has. There is no special handling of large data sets. Another issue is that there are no safeguards for simple stats for 1e7 observations. Things like sum, mean, and variance can become unreliable with such a huge number of observations. I don't know, because the code was never developed or tested for such cases. It may be OK, or it may not be OK.
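To illustrate the concern about simple statistics, here is a minimal Python sketch (not code from the package discussed in this thread) that contrasts the textbook one-pass variance formula with Welford's running update. The function names `naive_variance` and `welford_variance` are hypothetical and chosen only for this example. The textbook formula subtracts two very large, nearly equal numbers, which is where precision can be lost when values are large or observations are many.

```python
def naive_variance(xs):
    """Textbook one-pass formula: (sum(x^2) - sum(x)^2 / n) / (n - 1).

    Prone to catastrophic cancellation when the mean is large
    relative to the spread of the data.
    """
    n = 0
    s = 0.0
    ss = 0.0
    for x in xs:
        n += 1
        s += x
        ss += x * x
    return (ss - s * s / n) / (n - 1)


def welford_variance(xs):
    """Welford's algorithm: update the mean and the sum of squared
    deviations incrementally, avoiding the large subtraction above."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)


if __name__ == "__main__":
    # Four values with a large common offset; the exact sample variance is 30.
    data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
    print("naive:  ", naive_variance(data))
    print("welford:", welford_variance(data))
```

Whether any particular routine is affected depends on how it computes these quantities internally, so a check like this is only a way to see the failure mode, not evidence about a specific implementation.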
@TonyKess: I had a look at your profile. If RDA can help in getting halibut in fishmongers, I hope you can make RDA work. Halibut is my favourite!
Thanks for this advice! We are checking out BLAS now, and looking into building some checks on internal stats for when we are using really large datasets. We've used RDA successfully on halibut, but have some other tasty species to use it on now too!
Hello,
We're using the RDA function to carry out genome scans for signals of adaptation, similar to this paper, and we are beginning to run into speed problems with very large datasets (e.g. 1e+7 x 1000 matrices). Are there any solutions for speeding up computation of the RDA for very large datasets? We have looked into parallelizing across subsets of the data, but I was curious whether other methods are available. Any advice appreciated!
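For a sense of scale, the arithmetic behind a 1e+7 x 1000 matrix can be sketched as follows. This is a rough estimate that assumes dense storage at 8 bytes per value (double precision); the helper names `dense_matrix_gib` and `rows_per_chunk` and the 4 GiB budget are hypothetical, used only for illustration.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def dense_matrix_gib(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for one dense copy of the matrix, in GiB."""
    return n_rows * n_cols * bytes_per_value / GIB


def rows_per_chunk(budget_gib, n_cols, bytes_per_value=8):
    """How many full rows fit into a given memory budget."""
    return int(budget_gib * GIB // (n_cols * bytes_per_value))


if __name__ == "__main__":
    # One dense double-precision copy of a 1e7 x 1000 matrix.
    print(round(dense_matrix_gib(10**7, 1000), 1), "GiB per copy")
    # Rows that would fit in an assumed 4 GiB working budget.
    print(rows_per_chunk(4, 1000), "rows per 4 GiB chunk")
```

Under these assumptions a single copy is roughly 75 GiB, and any working copies made during an analysis would come on top of that, which is one reason splitting the data into subsets is a natural thing to try.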