-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
286 lines (207 loc) · 8.52 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# extractox <img src='man/figures/extractox.png' align='right' height="139px" alt='logo' style="float:right; height:139px;">
<!-- badges: start -->
[![R-CMD-check](https://github.com/c1au6i0/extractox/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/c1au6i0/extractox/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/extractox)](https://CRAN.R-project.org/package=extractox)
<!-- [![DEV version](https://img.shields.io/badge/devel%20version-1.0.0.9000-blue.svg)](https://github.com/c1au6i0/extractox) -->
<!-- badges: end -->
`extractox` is a comprehensive R package designed to simplify querying various
chemical, toxicological, and biological databases:
* the Integrated Chemical Environment (ICE) of the National Toxicology Program (NTP).
* the Integrated Risk Information System (IRIS) of the Environmental Protection
Agency (EPA).
* the Provisional Peer-Reviewed Toxicity Values (PPRTVs) of EPA.
* the Computational Toxicology Chemicals Dashboard Resource Hub (CompTox) of EPA,
* the monographs of International Agency for Research on Cancer of the World
Health Organization (WHO).
* PubChem of the National Center for Biotechnology Information/National Institutes
Of Health (NCBI, NIH).
* the Comparative Toxicogenomics Database (CTP) of the MDI Biological Laboratory
and NC State University.
The package facilitates interaction with APIs, providing curated and user-friendly
outputs.
To communicate with Pubchem, `extractox` relies on the package`webchem`.
## Installation
Install the package `extractox` from CRAN.
```r
# From CRAN
install.packages("extractox")
```
### Development Version
To get a bug fix or to use a feature from the development version, you can
install the development version of `extractox` from GitHub.
```r
# Install pak if not already installed
if (!requireNamespace("pak", quietly = TRUE)) {
install.packages("pak")
}
# Install the package from GitHub
pak::pkg_install("c1au6i0/extractox")
```
## Features
### NTP's ICE
The ICE database provides access to a variety of data related to chemical toxicity,
exposure, and risk assessment. It includes data from high-throughput screening
assays, in vivo studies, and computational models.
`extr_ice` provides access to NTP’s ICE database for toxicological and
exposure-related data.
```{r}
library(extractox)
# assays is null so all assays are retrieved
ice_data <- extr_ice(casrn = c("50-00-0"), assays = NULL, verbose = FALSE)
names(ice_data)
```
There are more than **2000 possible assays** in ICE. The `extr_ice_assay_names()`
function allows to search for assay names that match a pattern you're interested in.
Please note that searches are case sensitive and accept regexp.
```{r}
extr_ice_assay_names("Rat Acute", verbose = FALSE) # keep empty to retrieve all
```
### EPA's IRIS
The IRIS database contains information on the health effects of exposure to various
substances found in the environment. It provides qualitative and quantitative health
risk information.
`extr_iris` provides access to EPA’s IRIS database and accepts queries CASRN or
IUPAC names of chemicals.
```{r}
iris_info <- extr_iris(c("glyphosate", "50-00-0"), verbose = FALSE)
names(iris_info)
```
### EPA's PPRTVs
The `extr_pprtv` function allows you to extract data for specified identifiers
(CASRN or chemical names) from the EPA's PPRTVs database. This function retrieves
the file containing all the chemicals and processes it, but if has an argument
(`force`) to allow users to use a cached file or force a fresh download.
```{r}
# Example usage to extract data for a specific CASRN
dat_pprtv_1 <- extr_pprtv(ids = "107-02-8", search_type = "casrn", verbose = FALSE)
# Example usage to extract data for a chemical name
dat_pprtv_2 <- extr_pprtv(ids = "Acrolein", search_type = "name", verbose = FALSE)
```
### EPA's CompTox
The CompTox Chemistry Dashboard provides access to data on chemical structures,
properties, and associated bioactivity data. It integratesdata from various sources
to support chemical safety assessments.
`extr_comptox` extracts data from CompTox using either CASRN or IUPAC names of
chemicals and returns a **list of dataframes**.
```{r}
info_comptox <- extr_comptox(ids = c("Aspirin", "50-00-0"), verbose = FALSE)
names(info_comptox)
```
### WHO's IARC
The IARC Monographs database contains evaluations of the carcinogenic risks of various
substances to humans. It provides detailed information about Monographs, including
publication volumes and years, evaluation years, and additional relevant details.
The function `extr_monograph` provides access to the WHO IARC Monographs database
and accepts queries using CASRN or the names of chemicals.
```{r}
dat <- extr_monograph(
search_type = "casrn",
ids = c("105-74-8", "120-58-1"),
verbose = FALSE
)
str(dat)
# Example usage for name search
dat2 <- extr_monograph(
search_type = "name",
ids = c("Aloe", "Schistosoma", "Styrene")
)
```
### NIH's PubChem
PubChem provides information on chemical structures, identifiers, chemical and
physical properties, biological activities, safety and toxicity information,
patents, literature citations, and more.
A series of functions that rely on the `webchem` package are used to extract
chemical information, Globally Harmonized System (`GHS`) classification data, or
flavor classification (`FEMA`) from PubChem.
The function `extr_chem_info` retrieves chemical information of IUPAC-named
chemicals. A warning is displayed if the chemical is not found.
```{r}
chem_info <- extr_chem_info(
iupac_names = c("Formaldehyde", "Aflatoxin B1"),
verbose = FALSE
)
names(chem_info)
```
Two functions are used to extract specific sections of PubChem chemical information using CASRN:
- `extr_pubchem_ghs` extracts GHS codes.
- `extr_pubchem_fema` extracts FEMA data.
```{r}
ghs_info <- extr_pubchem_ghs(casrn = c("50-00-0", "64-17-5"), verbose = FALSE)
fema_info <- extr_pubchem_fema(casrn = c("50-00-0", "123-68-2"), verbose = FALSE)
```
### Tox Info
The function `extr_tox` is a wrapper used to call all the above-mentioned
functions and retrieve a list of dataframes.
```{r}
info_tox <- extr_tox(casrn = c("Aspirin", "50-00-0"), verbose = FALSE)
```
### MDI's CTD
The CTP provides information about the interactions between chemicals, genes,
and diseases. It helps in understanding the effects of environmental exposures
on human health.
A series of functions interact with the CTP database.
`extr_ctd` extracts information related to chemical-gene or pathway associations.
```{r}
input_terms <- c("50-00-0", "64-17-5", "methanal", "ethanol")
ctd_association <- extr_ctd(
input_terms = input_terms,
category = "chem",
report_type = "genes_curated",
input_term_search_type = "directAssociations",
action_types = "ANY",
ontology = c("go_bp", "go_cc"),
verbose = FALSE
)
names(ctd_association)
# Get expresssion data
ctd_expression <- extr_ctd(
input_terms = input_terms,
report_type = "cgixns",
category = "chem",
action_types = "expression",
verbose = FALSE
)
names(ctd_expression)
```
Tetramers are computationally generated information units that interrelate four
data types from the CTP: a chemical, gene product, phenotype, and disease. They
help in understanding the complex relationships between these entities and their
combined impact on human health.
`extr_tetramer` extracts info related to tetramers from CTD.
```{r echo=FALSE}
Sys.sleep(3)
```
```{r}
tetramer_data <- extr_tetramer(
chem = c("50-00-0", "ethanol"),
disease = "",
gene = "",
go = "",
input_term_search_type = "directAssociations",
qt_match_type = "equals",
verbose = FALSE
)
names(tetramer_data)
```
## Important Note for Linux Users
Please note that functions that pull data from EPA servers may encounter issues
on some Linux systems. This is because these servers do not accept secure legacy
renegotiation. On Linux system, those functions depend on `curl` and `OpenSSL`,
which have known problems with unsafe legacy renegotiation in newer versions.
One workaround is to downgrade to `curl v7.78.0` and `OpenSSL v1.1.1`. However,
please be aware that using these older versions might introduce potential security
vulnerabilities.
Refer to [this gist](https://gist.github.com/c1au6i0/5cc2d87966340a31032ffebf1cfb657c)
for instructions on how to downgrade `curl` and `OpenSSL` on Ubuntu.