You are free to use these R functions and R scripts. This project is licensed under the MIT license - see the License file for details.
Please acknowledge the original referenced papers for their methodology as indicated within the specific R functions or R scripts under `@references`. Moreover, if you use this code in your research or find it helpful, please consider acknowledging this repository (https://github.com/bonifazi/TuttiFrutti) by citing it as:
TuttiFrutti: A Collection of Miscellaneous R Functions and R Scripts. Bonifazi R. 2023. GitHub Repository. DOI:10.5281/zenodo.11185709. Available online: https://github.com/bonifazi/TuttiFrutti
@misc{TuttiFrutti,
title={TuttiFrutti: A Collection of Miscellaneous R Functions and R Scripts},
author={Renzo Bonifazi},
year={2023},
howpublished={\url{https://github.com/bonifazi/TuttiFrutti}},
doi={https://doi.org/10.5281/zenodo.11185710}
}
Your suggestions on improving these functions and scripts are very welcome. If you have any suggestions, questions, or need support, feel free to contact [email protected], or open an issue here on GitHub.
All R functions are documented using the R `docstring` package. Read the documentation to learn how to use them and which options are available. Where possible, I provide examples to show their functionality.
To view a function's documentation, first load the `docstring` package in R, then run `docstring(fun = <function_name>)` in the console.
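As a minimal sketch of what this looks like in practice, the example below documents a small function in the style the `docstring` package expects (roxygen-style comments at the top of the function body) and then displays its help page. The function `square` and its documentation text are hypothetical and are not part of this repository.

```r
# Hypothetical example function (not from this repository), documented with
# roxygen-style comments placed at the top of the function body.
square <- function(x) {
  #' Square a number
  #'
  #' Returns the square of the input.
  #'
  #' @param x A numeric vector.
  #' @references Placeholder entry; real functions list their source papers here.
  x^2
}

# Show the help page only if the package is installed and the session is
# interactive, so the snippet also runs cleanly in batch mode.
if (interactive() && requireNamespace("docstring", quietly = TRUE)) {
  library(docstring)
  docstring(fun = square)
}

result <- square(3)
print(result)
```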
One quick way is to `source` R functions directly from GitHub. For this option, go to the function link, click on Raw, copy the URL, and paste it into `source` directly in R. For instance:
source("https://raw.githubusercontent.com/bonifazi/TuttiFrutti/main/compute_LR_stats.R")
Another way is to copy and paste the GitHub code into an .R file saved on your machine and `source` it locally.
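The local option can be sketched as follows. The file name and the function `add_one` are hypothetical stand-ins for a file copied from this repository. Sourcing into a dedicated environment is an optional design choice, shown here because it keeps the loaded functions from overwriting objects with the same name in your workspace.

```r
# Write a stand-in .R file (hypothetical content, for illustration only).
local_file <- file.path(tempdir(), "my_function.R")
writeLines("add_one <- function(x) x + 1", local_file)

# Source it into its own environment rather than the global workspace.
tf <- new.env()
source(local_file, local = tf)

# Functions defined in the file are now available through the environment.
value <- tf$add_one(1)
print(value)
```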
- Plot convergence of MiXBLUP. MiXBLUP has gnuplot code that automatically generates a convergence graph. When gnuplot is not available on your system, you can generate the same graph using this function. The output is a `ggplot2` R object.
- Convert a MiX99 parameter file into (co)variance matrices. The MiX99 parameter file for (co)variance components has a lower triangular 'long' format as "effect_number,i,j,covar_value". This function converts it into full symmetric (co)variance matrices.
- Rebase (G)EBVs. A function to rebase vector(s) of EBVs given a list of animals to be considered as the base population.
- Compute LR method statistics. A function that takes EBVs from a partial and a whole evaluation and computes LR method statistics following Legarra and Reverter, 2018, GSE, 50:53. Several options are available; the main ones are: 1) the statistics can be computed with or without providing a 'focal group' of individuals, i.e., a validation group; 2) plotting can be used to check whether the validation group is homogeneous; 3) bootstrapping with replacement can be used to compute standard errors of the estimated statistics.
- Compute general validation statistics. Function to compute general validation statistics from NextGP software (to be documented).
- Functions for the integration of international (G)EBVs into national evaluations. R functions for integrating (genomic) estimated breeding values ((G)EBVs) into national evaluations following Bonifazi et al., 2023, GSE. See the README for more information.
- ... [new functions will be added here]
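To illustrate two of the items above, here is a rough sketch in base R. It is not the repository's code. All function names (`long_to_matrices`, `lr_stats`, `boot_se`) and the toy data are hypothetical. The first part rebuilds full symmetric matrices from the lower triangular 'long' format, assuming that elements absent from the file are zero. The second part computes three LR statistics as I understand them from Legarra and Reverter (2018): bias as the difference in mean EBV (partial minus whole), dispersion as the regression slope of whole on partial EBVs, and the correlation between the two, with bootstrap standard errors.

```r
# Part 1: lower triangular 'long' table -> full symmetric matrices.
# Expects columns effect_number, i, j, covar_value (one row per element).
long_to_matrices <- function(long) {
  lapply(split(long, long$effect_number), function(d) {
    n <- max(d$i, d$j)
    m <- matrix(0, n, n)                 # assumption: missing elements are 0
    m[cbind(d$i, d$j)] <- d$covar_value  # fill the triangle as given
    m[cbind(d$j, d$i)] <- d$covar_value  # mirror it across the diagonal
    m
  })
}

# Toy input: effect 1 has a 2x2 matrix, effect 2 a single variance.
long <- data.frame(
  effect_number = c(1, 1, 1, 2),
  i             = c(1, 2, 2, 1),
  j             = c(1, 1, 2, 1),
  covar_value   = c(4.0, 1.5, 9.0, 2.0)
)
covs <- long_to_matrices(long)
print(covs[["1"]])

# Part 2: LR statistics from EBVs of the same animals in both evaluations.
lr_stats <- function(ebv_partial, ebv_whole) {
  c(
    bias       = mean(ebv_partial) - mean(ebv_whole),
    dispersion = cov(ebv_whole, ebv_partial) / var(ebv_partial),
    rho        = cor(ebv_partial, ebv_whole)
  )
}

# Bootstrap standard errors: resample animals with replacement.
boot_se <- function(ebv_partial, ebv_whole, n_boot = 200) {
  n <- length(ebv_partial)
  reps <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)
    lr_stats(ebv_partial[idx], ebv_whole[idx])
  })
  apply(reps, 1, sd)  # one standard error per statistic
}

# Simulated toy EBVs (not real data).
set.seed(1)
ebv_whole   <- rnorm(50)
ebv_partial <- 0.8 * ebv_whole + rnorm(50, sd = 0.3)
stats <- lr_stats(ebv_partial, ebv_whole)
se    <- boot_se(ebv_partial, ebv_whole)
print(round(rbind(estimate = stats, se = se), 3))
```

The repository functions offer more options (focal groups, plotting), so refer to their documentation for actual use.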
These Rscripts are to be run in a command-line style, e.g. `Rscript --vanilla script.R --option1 data --option2 output.csv`. They have a help option, which you can call using `Rscript --vanilla script.R -h` or `Rscript --vanilla script.R --help`.
- Analyse Plink ROH. A command-line Rscript to analyse genomic inbreeding from Runs of Homozygosity (ROH) obtained from Plink using the `detectRUNS` R package. This script produces .csv and .pdf files on ROH inbreeding. To run it:
Rscript --vanilla Analyse_Plink_ROH.R --plink_files plinkcleaned --plink_roh ROH.hom --group geno_BRD --pedigree ped.ped --output results_dir 2>&1 | tee logfile.log
To see a description of each argument, use: `Rscript Analyse_Plink_ROH.R --help`.
Note that the `detectRUNS` package groups the results based on the number of groups in the first column of the ROH files, which I guess can be used if you want to define sub-populations. For now, the Rscript internally overrides the ‘group’ column of the `--plink_roh` file with the label given in `--group`.
- Extract EBV and REL from asreml .sln file. A command-line Rscript to extract EBV and REL from an asreml .sln file. This script produces a .csv file with ID, EBV, REL, and (user-provided) VAR(A) for each trait in the asreml solution file (.sln). To run it:
Rscript --vanilla ExtractAsremlSolutions.R --file my_path/asreml.sln --effect_name effect5 --trait_names "Trait1, Trait2, Trait3" --varA "varA_trait1, varA_trait2, varA_trait3" --output myoutput.csv
To see a description of each argument, use: `Rscript ExtractAsremlSolutions.R --help`.
Note: inbreeding is currently ignored in the REL calculation.
- Plot postgibbsf90 convergence. Rscript to plot postgibbsf90 covariances. The script produces a .pdf file. See `--help` for usage and read the details section of the Rscript for more information on input and settings. To run it:
Rscript --vanilla plot_postgibbsf90.R --file my_path/postgibbs_samples --output postgibbs_plots.pdf
or, with more control over the output, e.g., adding trait names and making two main groups:
Rscript --vanilla plot_postgibbsf90.R -f my_path/postgibbs_samples -o postgibbs_plots.pdf -g 2 -t "BRD1_AWW, BRD2_AWW, BRD1_CE, BRD2_CE"
- ... [new Rscripts will be added here]
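For the EBV and REL extraction above, a reliability that ignores inbreeding is commonly computed as REL = 1 - SE^2 / VAR(A), where SE is the standard error of the solution (the square root of the prediction error variance). The sketch below applies that formula to toy values. It assumes this is the formula intended and that the solution file provides such standard errors. The function `compute_rel`, the column names, and all numbers are hypothetical.

```r
# Hypothetical helper: reliability from the standard error of an EBV and the
# additive genetic variance, ignoring inbreeding.
compute_rel <- function(se, var_a) {
  1 - se^2 / var_a
}

# Toy solutions for one trait (illustrative values, not real data).
sln <- data.frame(
  ID  = c("A1", "A2", "A3"),
  EBV = c(0.8, -0.3, 1.2),
  SE  = c(1.0, 1.5, 0.5)
)
var_a <- 4  # user-provided additive genetic variance for this trait

sln$REL <- compute_rel(sln$SE, var_a)
print(sln)
```

Animals with a larger standard error get a lower reliability, and REL approaches 1 as the standard error approaches 0.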
TuttiFrutti is used for products that mix different colours and flavours such as candies. My Swedish colleague often told me 'TuttiFrutti!' to show off his Italian skills. In Italian, 'Tutti Frutti' translates to 'all fruits', which is similar to what this repository is intended to be: a mix of different R things (and maybe more later on). DYK: TuttiFrutti comes with a soundtrack 😄