Merge pull request #8 from Boehringer-Ingelheim/rc/2.1.0
Rc/2.1.0 to main
Showing 12 changed files with 314 additions and 77 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,25 @@ | ||
Package: dv.loader | ||
Type: Package | ||
Title: Data loading module | ||
Version: 2.0.0 | ||
Version: 2.1.0 | ||
Authors@R: c( | ||
person( "Boehringer-Ingelheim Pharma GmbH & Co.KG", role = c("cph", "fnd")), | ||
person( given = "Ming", family = "Yang", role = c("aut", "cre"), email = "[email protected]"), | ||
person( given = "Steven", family = "Brooks", role = "aut", email = "[email protected]"), | ||
person( given = "Sorin", family = "Voicu", role = "aut", email = "[email protected]") | ||
) | ||
Description: This is a module for loading .RDS / .sas7bdat data files from a network file storage environment. It also allows loading data locally. | ||
Description: A package for loading multiple data files, returning a list of data frames with associated metadata, designed to integrate with the modular DaVinci framework. | ||
License: Apache License (>= 2) | ||
Encoding: UTF-8 | ||
LazyData: true | ||
Depends: R (>= 3.5.0) | ||
Imports: haven | ||
Imports: | ||
haven, | ||
checkmate | ||
Suggests: | ||
testthat, | ||
knitr, | ||
rmarkdown | ||
RoxygenNote: 7.3.0 | ||
VignetteBuilder: knitr | ||
Config/testthat/edition: 3 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,4 @@ | |
export(get_cre_path) | ||
export(get_nfs_path) | ||
export(load_data) | ||
export(load_files) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,30 +1,70 @@ | ||
# Data Loading | ||
|
||
The {dv.loader} package provides a simple interface for loading data from a network file storage folder or | ||
locally. It is designed to be used with `.RDS` and `.sas7bdat` file formats. | ||
The package provides a simple function, `load_data()`, which loads R and SAS data files into memory. | ||
Loading data from SQL databases is not yet supported. The function returns a list named by the file names passed, | ||
and containing data frames, along with metadata for that table. By default, the function will look for files in a | ||
sub-directory `sub_dir` of the base path defined by an environment variable "RXD_DATA". You can check if the base path | ||
is set by running `Sys.getenv("RXD_DATA")`. A single file or multiple files can be loaded at once. | ||
To make the loading process faster for large datasets, it is suggested that '.sas7bdat' files are converted to | ||
'.RDS' files. The function will prefer '.RDS' files over '.sas7bdat' files by default. | ||
The `dv.loader` package provides two functions for loading `.rds` and `.sas7bdat` files into R. | ||
|
||
- `load_data()`: loads data files from a specified subdirectory of the base path defined by the environment variable "RXD_DATA". This function is useful when working with data files stored in a centralized location. | ||
- `load_files()`: accepts explicit file paths to load data files from any location on your system. You can optionally provide custom names for the data frames in the returned list. | ||
|
||
## Installation | ||
|
||
The `dv.loader` package is available on GitHub. To install it, you can use the following commands: | ||
|
||
```r | ||
if (!require("remotes")) install.packages("remotes") | ||
remotes::install_github("Boehringer-Ingelheim/dv.loader") | ||
``` | ||
|
||
## Basic usage | ||
After installation, you can load the package using: | ||
|
||
```r | ||
# getting data from a network file storage folder | ||
dv.loader::load_data(sub_dir = "subdir1/subdir2", file_names = c("adsl", "adae")) | ||
library(dv.loader) | ||
``` | ||
|
||
## Basic Usage | ||
|
||
### Using `load_data()` | ||
|
||
The `load_data()` function loads data from the specified subdirectory relative to `RXD_DATA`. For the `file_names` argument, you can optionally specify the file extensions in the names. If not provided, the function will attempt to search for `.rds` and `.sas7bdat` files in the subdirectory and decide which one to load based on the `prefer_sas` argument when both file types are present. By default, `prefer_sas` is `FALSE`, meaning `.rds` files are preferred due to their smaller file size and faster loading time. | ||
|
||
```r | ||
# getting data locally (e.g., if you have file `./data/adsl.RDS`) | ||
dv.loader::load_data(sub_dir = "data", file_names = c("adsl"), use_wd = TRUE) | ||
# Set the RXD_DATA environment variable | ||
Sys.setenv(RXD_DATA = "path/to/data/folder") | ||
|
||
# Load data from path/to/data/folder/subdir1 | ||
load_data( | ||
sub_dir = "subdir1", | ||
file_names = c("file1", "file2"), | ||
prefer_sas = TRUE | ||
) | ||
|
||
# Load data from path/to/data/folder/subdir1/subdir2 | ||
load_data( | ||
sub_dir = "subdir1/subdir2", | ||
file_names = c("file1.rds", "file2.sas7bdat") | ||
) | ||
``` | ||
|
||
### Using `load_files()` | ||
|
||
The `load_files()` function requires you to provide explicit file paths including the file extensions for the data files you want to load. You can optionally provide custom names for the data frames in the returned list. | ||
|
||
|
||
```r | ||
# Load data files with default names | ||
load_files( | ||
file_paths = c( | ||
"path/to/file1.rds", | ||
"path/to/file2.sas7bdat" | ||
) | ||
) | ||
|
||
# Load data files with custom names | ||
load_files( | ||
file_paths = c( | ||
"file1 (rds)" = "path/to/file1.rds", | ||
"file2 (sas)" = "path/to/file2.sas7bdat" | ||
) | ||
) | ||
``` | ||
|
||
For more details, please refer to the package vignettes and function documentation. |
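The README text above describes how `load_data()` chooses between `.rds` and `.sas7bdat` when `file_names` carries no extension. The following is a minimal base R sketch of that selection rule, not the package's implementation: `resolve_candidate()` and the temporary demo files are hypothetical names introduced only for illustration, and the real function may treat extension case or error messages differently.

```r
# Hypothetical helper (not part of dv.loader): choose one file for a base name.
resolve_candidate <- function(dir, name, prefer_sas = FALSE) {
  # A name that already ends in a supported extension is used as given.
  if (grepl("\\.(rds|sas7bdat)$", name, ignore.case = TRUE)) {
    return(file.path(dir, name))
  }

  rds <- file.path(dir, paste0(name, ".rds"))
  sas <- file.path(dir, paste0(name, ".sas7bdat"))

  # Order the candidates by preference, then keep the first one that exists.
  ordered <- if (prefer_sas) c(sas, rds) else c(rds, sas)
  found <- ordered[file.exists(ordered)]
  if (length(found) == 0) {
    stop("No .rds or .sas7bdat file found for: ", name)
  }
  found[[1]]
}

# Throwaway files in a temporary directory, so the sketch runs anywhere.
demo_dir <- file.path(tempdir(), "resolve_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(data.frame(x = 1), file.path(demo_dir, "file1.rds"))
invisible(file.create(file.path(demo_dir, "file1.sas7bdat")))

default_choice <- basename(resolve_candidate(demo_dir, "file1"))
sas_choice <- basename(resolve_candidate(demo_dir, "file1", prefer_sas = TRUE))
```

With both file types present, `default_choice` picks the `.rds` candidate and `sas_choice` picks the `.sas7bdat` candidate, which matches the preference order the README describes for `prefer_sas`.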
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
test_that("load_files() correctly loads both RDS and SAS files", { | ||
rds_file <- "inst/extdata/dummyads1.RDS" | ||
sas_file <- "inst/extdata/dummyads2.sas7bdat" | ||
|
||
data_list <- load_files(file_paths = c(rds_file, sas_file)) | ||
|
||
# Check that default names are correctly assigned based on filenames | ||
expect_equal(names(data_list), c("dummyads1", "dummyads2")) | ||
|
||
# Verify RDS file contents match direct reading | ||
expect_equal(data_list[["dummyads1"]], readRDS(rds_file), ignore_attr = "meta") | ||
|
||
# Verify SAS file contents match direct reading | ||
expect_equal(data_list[["dummyads2"]], haven::read_sas(sas_file), ignore_attr = "meta") | ||
|
||
# Create expected metadata for comparison | ||
rds_metadata <- cbind( | ||
file.info(rds_file, extra_cols = FALSE), | ||
path = rds_file, | ||
file_name = basename(rds_file) | ||
) | ||
sas_metadata <- cbind( | ||
file.info(sas_file, extra_cols = FALSE), | ||
path = sas_file, | ||
file_name = basename(sas_file) | ||
) | ||
row.names(rds_metadata) <- NULL | ||
row.names(sas_metadata) <- NULL | ||
|
||
# Verify metadata is correctly attached to loaded data | ||
expect_equal(attr(data_list[["dummyads1"]], "meta"), rds_metadata) | ||
expect_equal(attr(data_list[["dummyads2"]], "meta"), sas_metadata) | ||
}) | ||
|
||
test_that("load_files() works with different file extensions", { | ||
# Fails on GitHub Actions with: Assertion on 'file_paths' failed: File does not exist | ||
expect_error( | ||
load_files(file_paths = c( | ||
"inst/extdata/dummyads1.rds", # file on disk is dummyads1.RDS | ||
"inst/extdata/dummyads2.SAS7BDAT" # file on disk is dummyads2.sas7bdat | ||
)) | ||
) | ||
}) | ||
|
||
test_that("load_files() properly validates file extensions", { | ||
expect_error( | ||
load_files(file_paths = c( | ||
"inst/extdata/bad_file_type.myrds", | ||
"inst/extdata/bad_file_type.txt" | ||
)) | ||
) | ||
}) | ||
|
||
test_that("load_files() can return both default and custom names for loaded data", { | ||
# Check that duplicate names are caught and error is thrown | ||
expect_error( | ||
load_files(file_paths = c( | ||
"inst/extdata/just_rds/dummyads1.RDS", | ||
"inst/extdata/just_sas/dummyads1.sas7bdat" | ||
)), | ||
"Duplicate entries detected \\(dummyads1\\). Please review `file_paths` argument." | ||
) | ||
|
||
# Loading files with default names | ||
data_list1 <- load_files( | ||
file_paths = c( | ||
"inst/extdata/just_rds/dummyads1.RDS", | ||
"inst/extdata/just_sas/dummyads2.sas7bdat" | ||
) | ||
) | ||
expect_equal(names(data_list1), c("dummyads1", "dummyads2")) | ||
|
||
# Loading files with custom names | ||
data_list2 <- load_files( | ||
file_paths = c( | ||
"rds_dummyads1" = "inst/extdata/just_rds/dummyads1.RDS", | ||
"sas_dummyads2" = "inst/extdata/just_sas/dummyads2.sas7bdat" | ||
) | ||
) | ||
expect_equal(names(data_list2), c("rds_dummyads1", "sas_dummyads2")) | ||
|
||
# Loading files with mixed naming (custom and default) | ||
data_list3 <- load_files( | ||
file_paths = c( | ||
"rds_dummyads1" = "inst/extdata/just_rds/dummyads1.RDS", | ||
"inst/extdata/dummyads2.sas7bdat" | ||
) | ||
) | ||
expect_equal(names(data_list3), c("rds_dummyads1", "dummyads2")) | ||
}) |
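The first test above expects each loaded data frame to carry a `meta` attribute made of `file.info(..., extra_cols = FALSE)` plus `path` and `file_name` columns, with row names reset. A minimal base R sketch of how such an attribute could be built and inspected follows. `attach_meta()` is a hypothetical helper written for this illustration and is not taken from the package source; it only mirrors the structure the test compares against.

```r
# Hypothetical helper (not the package's code): attach file metadata
# in the same shape the test constructs for comparison.
attach_meta <- function(data, path) {
  meta <- cbind(
    file.info(path, extra_cols = FALSE), # size, isdir, mode, mtime, ctime, atime
    path = path,
    file_name = basename(path)
  )
  row.names(meta) <- NULL
  attr(data, "meta") <- meta
  data
}

# Write a small RDS file to a temporary location and read it back.
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(id = 1:3), tmp)
loaded <- attach_meta(readRDS(tmp), tmp)

# The metadata travels with the data frame as a one-row data frame.
meta <- attr(loaded, "meta")
meta_columns <- names(meta)
```

Because the metadata lives in an attribute, comparisons of the data itself need to ignore it, which is why the tests pass `ignore_attr = "meta"` to `expect_equal()`.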