Skip to content

Commit

Permalink
docs: minor fixes to Readme + revedep [skipci]
Browse files Browse the repository at this point in the history
  • Loading branch information
c1au6i0 committed Jan 2, 2025
1 parent d64ebec commit 1cb057e
Show file tree
Hide file tree
Showing 12 changed files with 286 additions and 238 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,4 @@
^inst/img$
^CRAN-SUBMISSION$
justfile$
^revdep$
2 changes: 1 addition & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Package: extractox
Title: Extract Tox Info from Various Databases
Version: 0.1.9003
Version: 1.0.0
Authors@R: c(
person("Claudio", "Zanettini", , "[email protected]", role = c("aut", "cre", "cph"),
comment = c(ORCID = "0000-0001-5043-8033")),
Expand Down
2 changes: 1 addition & 1 deletion R/extr_monograph.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
#' internal data in the package.
#'
#' @param search_type A character string specifying the type of search to
#' perform. Valid options are "cas_rn" (CAS Registry Number) and "name"
#' perform. Valid options are "casrn" (CAS Registry Number) and "name"
#' . (name of the chemical). If `search_type` is "casrn", the function filters
#' . by the CAS Registry Number.
#' If `search_type` is "name", the function performs a partial match search
Expand Down
154 changes: 107 additions & 47 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -18,11 +18,26 @@ knitr::opts_chunk$set(
<!-- badges: start -->
[![R-CMD-check](https://github.com/c1au6i0/extractox/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/c1au6i0/extractox/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/extractox)](https://CRAN.R-project.org/package=extractox)
[![DEV version](https://img.shields.io/badge/devel%20version-0.1.9003-blue.svg)](https://github.com/c1au6i0/extractox)
<!-- [![DEV version](https://img.shields.io/badge/devel%20version-0.1.9003-blue.svg)](https://github.com/c1au6i0/extractox) -->
<!-- badges: end -->

`extractox` is a comprehensive R package designed to simplify querying various chemical, toxicological, and biological databases such as the Comparative Toxicogenomics Database (CTP) of the MDI Biological Laboratory and NC State University, the Integrated Chemical Environment (ICE) of the National Toxicology Program (NTP), the Integrated Risk Information System (IRIS) of the Environmental Protection Agency (EPA), the CompTox Chemicals Dashboard Resource Hub (CompTox) of EPA, and PubChem of the National Center for Biotechnology Information/National Institures Of Health (NCBI, NIH). The package facilitates interaction with APIs, providing curated and user-friendly outputs.

`extractox` is a comprehensive R package designed to simplify querying various
chemical, toxicological, and biological databases:

* the Integrated Chemical Environment (ICE) of the National Toxicology Program (NTP).
* the Integrated Risk Information System (IRIS) of the Environmental Protection
Agency (EPA).
* the Provisional Peer-Reviewed Toxicity Values (PPRTVs) of EPA.
* the Computational Toxicology Chemicals Dashboard Resource Hub (CompTox) of EPA,
* the monographs of International Agency for Research on Cancer of the World
Health Organization (WHO).
* PubChem of the National Center for Biotechnology Information/National Institutes
Of Health (NCBI, NIH).
* the Comparative Toxicogenomics Database (CTP) of the MDI Biological Laboratory
and NC State University.

The package facilitates interaction with APIs, providing curated and user-friendly
outputs.
To communicate with Pubchem, `extractox` relies on the package`webchem`.

## Installation
Expand All @@ -35,7 +50,8 @@ install.packages("extractox")
```
### Development Version

To get a bug fix or to use a feature from the development version, you can install the development version of `extractox` from GitHub.
To get a bug fix or to use a feature from the development version, you can
install the development version of `extractox` from GitHub.

```r
# Install pak if not already installed
Expand All @@ -49,115 +65,146 @@ pak::pkg_install("c1au6i0/extractox")

## Features

### ICE Database
### NTP's ICE

The ICE database, managed by the EPA, provides access to a variety of data related to chemical toxicity, exposure, and risk assessment. It includes data from high-throughput screening assays, in vivo studies, and computational models.
The ICE database provides access to a variety of data related to chemical toxicity,
exposure, and risk assessment. It includes data from high-throughput screening
assays, in vivo studies, and computational models.

`extr_ice` provides access to NTP’s ICE database for toxicological and exposure-related data.
`extr_ice` provides access to NTP’s ICE database for toxicological and
exposure-related data.


```{r}
library(extractox)
ice_data <- extr_ice(casrn = c("50-00-0"), assays = NULL) # assays is null so all assays are retrieved
# assays is null so all assays are retrieved
ice_data <- extr_ice(casrn = c("50-00-0"), assays = NULL, verbose = FALSE)
names(ice_data)
```

There are more than **2000 possible assays** in ICE. The `extr_ice_assay_names()` function allows to search for assay names that match a pattern you're interested in. Please note that searches are case sensitive and accept regexp.
There are more than **2000 possible assays** in ICE. The `extr_ice_assay_names()`
function allows to search for assay names that match a pattern you're interested in.
Please note that searches are case sensitive and accept regexp.

```{r}
extr_ice_assay_names("Rat Acute") # keep empty to retrieve all
extr_ice_assay_names("Rat Acute", verbose = FALSE) # keep empty to retrieve all
```


### EPA IRIS
### EPA's IRIS

The IRIS database is managed by the EPA and contains information on the health effects of exposure to various substances found in the environment. It provides qualitative and quantitative health risk information.
The IRIS database contains information on the health effects of exposure to various
substances found in the environment. It provides qualitative and quantitative health
risk information.

`extr_iris` provides access to EPA’s IRIS database and accepts queries CASRN or IUPAC names of chemicals.
`extr_iris` provides access to EPA’s IRIS database and accepts queries CASRN or
IUPAC names of chemicals.

```{r}
iris_info <- extr_iris(c("glyphosate", "50-00-0"))
iris_info <- extr_iris(c("glyphosate", "50-00-0"), verbose = FALSE)
names(iris_info)
```


### EPA PPRTVs
### EPA's PPRTVs

The `extr_pprtv` function allows you to extract data for specified identifiers (CASRN or chemical names) from the EPA's Provisional Peer-Reviewed Toxicity Values (PPRTVs) database. This function retrieves and processes data, with options to use cached files or force a fresh download if necessary.
The `extr_pprtv` function allows you to extract data for specified identifiers
(CASRN or chemical names) from the EPA's PPRTVs database. This function retrieves
the file containing all the chemicals and processes it, but if has an argument
(`force`) to allow users to use a cached file or force a fresh download.

```{r}
# Example usage to extract data for a specific CASRN
extr_pprtv(ids = "107-02-8", search_type = "casrn", verbose = TRUE)
dat_pprtv_1 <- extr_pprtv(ids = "107-02-8", search_type = "casrn", verbose = FALSE)
# Example usage to extract data for a chemical name
extr_pprtv(ids = "Acrolein", search_type = "name", verbose = TRUE, force = FALSE)
dat_pprtv_2 <- extr_pprtv(ids = "Acrolein", search_type = "name", verbose = FALSE)
```


### CompTox
### EPA's CompTox

The CompTox Chemistry Dashboard, managed by the EPA, provides access to data on chemical structures, properties, and associated bioactivity data. It integrates data from various sources to support chemical safety assessments.
The CompTox Chemistry Dashboard provides access to data on chemical structures,
properties, and associated bioactivity data. It integratesdata from various sources
to support chemical safety assessments.

`extr_comptox` extracts data from the CompTox Chemistry Dashboard using both CASRN and IUPAC names of chemicals and returns a list of **dataframes**.
`extr_comptox` extracts data from CompTox using either CASRN or IUPAC names of
chemicals and returns a **list of dataframes**.

```{r}
info_comptox <- extr_comptox(ids = c("Aspirin", "50-00-0"))
info_comptox <- extr_comptox(ids = c("Aspirin", "50-00-0"), verbose = FALSE)
names(info_comptox)
```


### WHO IARC
### WHO's IARC

The IARC Monographs database is managed by the World Health Organization (WHO) International Agency for Research on Cancer (IARC) and contains evaluations of the carcinogenic risks of various substances to humans. It provides detailed information about Monographs, including publication volumes and years, evaluation years, and additional relevant details.
The IARC Monographs database contains evaluations of the carcinogenic risks of various
substances to humans. It provides detailed information about Monographs, including
publication volumes and years, evaluation years, and additional relevant details.

The functione `extr_monograph` provides access to the WHO IARC Monographs database and accepts queries using CASRN or the names of chemicals.
The function `extr_monograph` provides access to the WHO IARC Monographs database
and accepts queries using CASRN or the names of chemicals.

```{r}
dat <- extr_monograph(search_type = "casrn", ids = c("105-74-8", "120-58-1"))
dat <- extr_monograph(search_type = "casrn",
ids = c("105-74-8", "120-58-1"),
verbose = FALSE)
str(dat)
# Example usage for name search
dat2 <- extr_monograph(search_type = "name", ids = c("Aloe", "Schistosoma", "Styrene"))
dat2 <- extr_monograph(search_type = "name",
ids = c("Aloe", "Schistosoma", "Styrene")
)
```


### PubChem Database
### NIH's PubChem

PubChem is an open chemistry database at the NIH. It provides information on chemical structures, identifiers, chemical and physical properties, biological activities, safety and toxicity information, patents, literature citations, and more.
PubChem provides information on chemical structures, identifiers, chemical and
physical properties, biological activities, safety and toxicity information,
patents, literature citations, and more.

A series of functions that rely on the `webchem` package are used to extract chemical information, Globally Harmonized System (`GHS`) classification data, or flavor classification from PubChem.
A series of functions that rely on the `webchem` package are used to extract
chemical information, Globally Harmonized System (`GHS`) classification data, or
flavor classification (`FEMA`) from PubChem.

The function `extr_chem_info` retrieves chemical information of IUPAC-named chemicals. A warning is displayed if the chemical is not found.
The function `extr_chem_info` retrieves chemical information of IUPAC-named
chemicals. A warning is displayed if the chemical is not found.

```{r}
chem_info <- extr_chem_info(iupac_names = c("Formaldehyde", "Aflatoxin B1"))
chem_info <- extr_chem_info(iupac_names = c("Formaldehyde", "Aflatoxin B1"),
verbose = FALSE)
names(chem_info)
```


Two functions are used to extract specific sections of PubChem chemical information using CASRN:

- `extr_pubchem_ghs` extracts Globally Harmonized System (GHS) codes.
- `extr_pubchem_fema` extracts flavor profile data.
- `extr_pubchem_ghs` extracts GHS codes.
- `extr_pubchem_fema` extracts FEMA data.


```{r}
ghs_info <- extr_pubchem_ghs(casrn = c("50-00-0", "64-17-5"))
fema_info <- extr_pubchem_fema(casrn = c("50-00-0", "123-68-2"))
ghs_info <- extr_pubchem_ghs(casrn = c("50-00-0", "64-17-5"), verbose = FALSE)
fema_info <- extr_pubchem_fema(casrn = c("50-00-0", "123-68-2"), verbose = FALSE)
```


### Tox Info

The function `extr_tox` is a wrapper used to call all the above-mentioned functions and retrieve a list of dataframes.
The function `extr_tox` is a wrapper used to call all the above-mentioned
functions and retrieve a list of dataframes.

```{r}
info_tox <- extr_tox(casrn = c("Aspirin", "50-00-0"))
info_tox <- extr_tox(casrn = c("Aspirin", "50-00-0"), verbose = FALSE)
```

### CTD Database
### MDI's CTD

The CTP provides information about the interactions between chemicals, genes, and diseases. It helps in understanding the effects of environmental exposures on human health.
The CTP provides information about the interactions between chemicals, genes,
and diseases. It helps in understanding the effects of environmental exposures
on human health.

A series of functions interact with the CTP database.

Expand All @@ -171,7 +218,8 @@ ctd_association <- extr_ctd(
report_type = "genes_curated",
input_term_search_type = "directAssociations",
action_types = "ANY",
ontology = c("go_bp", "go_cc")
ontology = c("go_bp", "go_cc"),
verbose = FALSE
)
Expand All @@ -180,16 +228,20 @@ ctd_expression <- extr_ctd(
input_terms = input_terms,
report_type = "cgixns",
category = "chem",
action_types = "expression"
action_types = "expression",
verbose = FALSE
)
names(ctd_expression)
```


Tetramers are computationally generated information units that interrelate four data types from the CTP: a chemical, gene product, phenotype, and disease. They help in understanding the complex relationships between these entities and their combined impact on human health.
Tetramers are computationally generated information units that interrelate four
data types from the CTP: a chemical, gene product, phenotype, and disease. They
help in understanding the complex relationships between these entities and their
combined impact on human health.

`extr_tetramer` extract info related to tetramers from CTD.
`extr_tetramer` extracts info related to tetramers from CTD.

```{r echo=FALSE}
Sys.sleep(3)
Expand All @@ -202,7 +254,8 @@ tetramer_data <- extr_tetramer(
gene = "",
go = "",
input_term_search_type = "directAssociations",
qt_match_type = "equals"
qt_match_type = "equals",
verbose = FALSE
)
names(tetramer_data)
Expand All @@ -211,8 +264,15 @@ names(tetramer_data)

## Important Note for Linux Users

Please note that functions that pull data from EPA servers may encounter issues on some Linux systems. This is because these servers do not accept secure legacy renegotiation. On Linux system, those functions depend on `curl` and `OpenSSL`, which have known problems with unsafe legacy renegotiation in newer versions. One workaround is to downgrade to `curl v7.78.0` and `OpenSSL v1.1.1`. However, please be aware that using these older versions might introduce potential security vulnerabilities.
Refer to [this gist](https://gist.github.com/c1au6i0/5cc2d87966340a31032ffebf1cfb657c) for instructions on how to downgrade `curl` and `OpenSSL` on Ubuntu.
Please note that functions that pull data from EPA servers may encounter issues
on some Linux systems. This is because these servers do not accept secure legacy
renegotiation. On Linux system, those functions depend on `curl` and `OpenSSL`,
which have known problems with unsafe legacy renegotiation in newer versions.
One workaround is to downgrade to `curl v7.78.0` and `OpenSSL v1.1.1`. However,
please be aware that using these older versions might introduce potential security
vulnerabilities.
Refer to [this gist](https://gist.github.com/c1au6i0/5cc2d87966340a31032ffebf1cfb657c)
for instructions on how to downgrade `curl` and `OpenSSL` on Ubuntu.



Expand Down
Loading

0 comments on commit 1cb057e

Please sign in to comment.