-
Notifications
You must be signed in to change notification settings - Fork 5
/
README.Rmd
175 lines (139 loc) · 6.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
library(knitr)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
knit_print.data.frame = function(x, ...) {
res = paste(c("", "", knitr::kable(x, output = FALSE)), collapse = "\n")
asis_output(res)
}
# register the method
registerS3method("knit_print", "data.frame", knit_print.data.frame)
library("FUNGuildR")
```
# FUNGuildR
<!-- badges: start -->
[![R-CMD-check](https://github.com/brendanf/FUNGuildR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/brendanf/FUNGuildR/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/brendanf/FUNGuildR/branch/master/graph/badge.svg)](https://codecov.io/gh/brendanf/FUNGuildR?branch=master)
<!-- badges: end -->
`FUNGuildR` is a tool for assigning trait information based on matching to a
taxonomic classification, using the
[FUNGuild database](http://www.funguild.org). In normal use, the database is
queried for each use, because FUNGuild are continually updated as new
information is submitted. However, `FUNGuildR` also includes functions to
download the database and store it as an R object to speed up repeated queries,
make queries reproducible over time, and allow local queries without internet
access.
## Installation
For the moment, `FUNGuildR` can only be installed from GitHub.
To install, be sure you have installed the [devtools](https://CRAN.R-project.org/package=devtools) package,
and then type:
``` r
devtools::install_github("brendanf/FUNGuildR")
```
## Guild assignment
The main function is `funguild_assign()`.
It takes as its only required argument a `data.frame`, which should contain
a `character` column named "Taxonomy".
It returns a version of the same `data.frame` (as a `tibble`) with additional
columns:
**taxon**: The name of the matched taxon.
**guid**: [Globally Unique Identifier](http://guid.one/guid) for the database
record.
**mbNumber**: The [MycoBank](https://www.mycobank.org) number of the taxon.
**taxonomicLevel**: A numeric representation of the taxonomic rank; higher
numbers are lower ranks.
**trophicMode**: A very general overview of how the organism gets its nutrition;
one or more of *Saprotroph*, *Pathotroph*, and *Symbiotroph*.
**guild**: One or more narrower categories for how the organism gets its
nutrition.
**confidenceRanking**: The confidence level of the guild assignment; *Possible*,
*Probable*, or *Highly Probable*.
**growthForm**: The general growth morphology of the organism (or its fruiting
body). Multiple values may be given.
**trait**: Additional traits about the organism, such as wood decay type or
toxicity.
**note**: Additional notes about the entry.
**citationSource**: Citation(s) for the information about the taxon.
That's it!
Here's how it works on a sample database (scroll to see additional columns):
```{r}
sample_fungi
sample_guilds <- funguild_assign(sample_fungi)
sample_guilds
```
For more information about the meaning of the new columns, see the [FUNGuild manual](https://github.com/UMNFuN/FUNGuild/blob/master/FUNGuild_Manual.pdf).
### Input data format
Each value in the *Taxonomy* column of the input `data.frame` should consist of
a comma-, colon-, underscore-, or semicolon-delimited list of taxa which the
organism on that row belongs to.
You can see examples in the `sample_fungi` data presented above.
Taxonomy strings which include taxonomic rank indicators in the styles used by
Sintax ("`k:`", "`p:`"...) or Unite ("`k__`, "`p__`", ...) are also accepted.
Such taxonomic classifications are frequently arranged from the most inclusive
taxon (e.g., Kingdom) to the least inclusive taxon (e.g., Species), but this
is not actually required for FUNGuild.
Not all taxonomic ranks are required; for each row, the algorithm returns
results only for the least inclusive taxon which is present in the database.
### Database caching
By default, `funguild_assign()` downloads the
FUNGuild database each time they are invoked.
In many analysis workflows, where guilds need to be assigned only once, this is
not a problem; because the databases are continuously updated, it is good to
use the most current version.
However, if you are going to call the functions many times, or if you plan to
assign guilds in a situation where you have no internet access, you can cache
the database(s) locally using the functions `get_funguild_db()`.
This returns the database as a `tibble`, which can be passed
as a second argument to `funguild_assign()`.
```r
fung <- get_funguild_db()
# This isn't necessary for a single query, but it works.
fung_guilds <- funguild_assign(sample_fungi, db = fung)
# It might be more useful in this situation
data_guilds <- lapply(many_datasets, funguild_assign, db = fung)
# Or you can save it for later offline use
saveRDS(fung, "funguild.rds")
#And then load it again
fung <- loadRDS("funguild.rds")
```
This strategy can also be used for reproduceable research, to store the same
version of the database which was used in the original analysis.
## Database queries
From the current development version, `FUNGuildR` also allows queries to the
FUNGuild web API. The fields *taxon*, *guid*, *mbNumber*, *trophicMode*,
*guild*, *growthForm*, and *trait* are searchable. For instance, to find all
fungi annotated as causing brown rot (a kind of wood decay):
```{r}
brownrotters <- funguild_query("brown rot", "trait")
nrow(brownrotters)
```
Here are the first few:
```{r}
head(brownrotters)
```
The characters "`%`" and "`*`" can be used as wildcards. For instance, we can
search for all fungi where the wood decay type is listed:
```{r}
allrotters <- funguild_query("* rot", "trait")
nrow(allrotters)
unique(allrotters$trait)
```
Queries can also be run against a locally cached database.
```r
fungi <- get_funguild_database()
allrotters <- funguild_query("* rot", "trait", db = fungi)
```
## NEMAGuild
As of April 2021, the NEMAGuild database is temporarily offline. However,
`FUNGuildR` has functions `nemaguild_assign()` and `get_nemaguild_db()` to
access it in exactly the same ways as the FUNGuild database. These functions
will not work for the time being (unless you already have a cached local copy of
NEMAGuild!) but they should work again when NEMAGuild is back online.