Commit

Merge branch 'main' into rewrite-reshape
etiennebacher authored Oct 10, 2022
2 parents 60b9797 + 8698506 commit bf87604
Showing 28 changed files with 206 additions and 123 deletions.
25 changes: 0 additions & 25 deletions .github/workflows/draft-pdf.yaml

This file was deleted.

5 changes: 4 additions & 1 deletion DESCRIPTION
@@ -16,14 +16,17 @@ Authors@R: c(
person("Etienne", "Bacher", , "[email protected]", role = "aut",
comment = c(ORCID = "0000-0002-9271-5075")),
person("Rémi", "Thériault", , "[email protected]", role = "ctb",
comment = c(ORCID = "0000-0003-4315-6788", Twitter = "@rempsyc"))
comment = c(ORCID = "0000-0003-4315-6788", Twitter = "@rempsyc")),
person("Thomas J.", "Faulkenberry", , "[email protected]", role = "rev"),
person("Robert", "Garrett", , "[email protected]", role = "rev")
)
Maintainer: Indrajeet Patil <[email protected]>
Description: A lightweight package to assist in key steps involved in any data
analysis workflow: (1) wrangling the raw data to get it in the needed form,
(2) applying preprocessing steps and statistical transformations, and
(3) compute statistical summaries of data properties and distributions.
It is also the data wrangling backend for packages in 'easystats' ecosystem.
References: Patil et al. (2022) <doi:10.21105/joss.04684>.
License: GPL (>= 3)
URL: https://easystats.github.io/datawizard/
BugReports: https://github.com/easystats/datawizard/issues
21 changes: 16 additions & 5 deletions NEWS.md
@@ -1,10 +1,19 @@
# datawizard (development version)
=======
# datawizard 0.6.2.1

MAJOR CHANGES

* There is a new publication about the `{datawizard}` package:
Patil et al. (2022) <doi:10.21105/joss.04684>.

* `data_to_long()` and `data_to_wide()` have had significant performance improvements,
sometimes as high as a ten-fold speedup.

MINOR CHANGES

* When column names are misspelled, most functions now suggest which
existing columns possibly could be meant.

# datawizard 0.6.2

BREAKING CHANGES
@@ -17,14 +26,16 @@ BREAKING CHANGES
`remove_empty_rows()` remove observations that completely have missing or
empty character values.

CHANGES

* `data_arrange()` now works with data frames that were grouped using
`data_group()` (#274).
MINOR CHANGES

* `data_read()` gains a `convert_factors` argument, to turn off automatic
conversion from numeric variables into factors.

BUG FIXES

* `data_arrange()` now works with data frames that were grouped using
`data_group()` (#274).

# datawizard 0.6.1

* Updates tests for upcoming changes in the `{tidyselect}` package (#267).
2 changes: 1 addition & 1 deletion R/data_arrange.R
@@ -42,7 +42,7 @@ data_arrange.default <- function(data, select = NULL, safe = TRUE) {
data <- .coerce_to_dataframe(data)

# find which vars should be decreasing
desc <- select[grepl("^-", select)]
desc <- select[startsWith(select, "-")]
desc <- gsub("^-", "", desc)
select <- gsub("^-", "", select)
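The change in this hunk swaps a regular-expression test for a fixed-prefix test. A minimal sketch of why the two select the same elements, using a made-up `select` vector (the column names are only illustrative):

```r
# Illustrative input: a leading "-" marks a column to sort in decreasing order
select <- c("-mpg", "cyl", "-hp")

# Old approach: regular expression anchored at the start of the string
desc_regex <- select[grepl("^-", select)]

# New approach: fixed-prefix test, no regular expression involved
desc_prefix <- select[startsWith(select, "-")]

# Both pick the same elements
identical(desc_regex, desc_prefix)

# The prefix is then stripped to recover the plain column names
desc <- gsub("^-", "", desc_prefix)
select_clean <- gsub("^-", "", select)
```

Note that `startsWith()` expects character vectors, so this sketch assumes `select` has already been resolved to column names at this point.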

20 changes: 16 additions & 4 deletions R/data_relocate.R
@@ -93,18 +93,30 @@ data_relocate <- function(data,
data_cols <- names(data)
position <- which(data_cols %in% cols)

# remember original values, for more informative messages
original_before <- before
original_after <- after

# Find new positions
if (!is.null(before)) {
before <- before[before %in% data_cols][1] # Take first that exists (if vector is supplied)
if (length(before) != 1) {
stop("The column passed to `before` wasn't found. Possibly mispelled.", call. = FALSE)
if (length(before) != 1 || is.na(before)) {
# guess the misspelled column
insight::format_error(
"The column passed to `before` wasn't found.",
.misspelled_string(data_cols, original_before[1], default_message = "Possibly misspelled?")
)
}
where <- min(match(before, data_cols))
position <- c(setdiff(position, where), where)
} else if (!is.null(after)) {
after <- after[after %in% data_cols][1] # Take first that exists (if vector is supplied)
if (length(after) != 1) {
stop("The column passed to `after` wasn't found. Possibly mispelled.", call. = FALSE)
if (length(after) != 1 || is.na(after)) {
# guess the misspelled column
insight::format_error(
"The column passed to `after` wasn't found.",
.misspelled_string(data_cols, original_after[1], default_message = "Possibly misspelled?")
)
}
where <- max(match(after, data_cols))
position <- c(where, setdiff(position, where))
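The added `is.na()` checks matter because subsetting an empty character vector with `[1]` yields a single `NA`, not a zero-length vector. A short sketch of that behaviour, with an invented column set and misspelling:

```r
data_cols <- c("mpg", "cyl", "hp")
before <- "cly" # illustrative misspelling of "cyl"

# Keep only existing columns, then take the first one
picked <- before[before %in% data_cols][1]

length(picked) # still 1, because character(0)[1] is NA
is.na(picked)  # TRUE

# The length test alone never triggers for a missing column ...
length_only <- length(picked) != 1

# ... whereas the combined test does
combined <- length(picked) != 1 || is.na(picked)
```

This is also why the hunk stores `original_before` and `original_after` first: once `before` has been overwritten with `NA`, the misspelled name the user typed is no longer available for the error message.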
1 change: 1 addition & 0 deletions R/datawizard-package.R
@@ -12,6 +12,7 @@
#' - compute statistical summaries of data properties and distributions.
#'
#' It is also the data wrangling backend for packages in 'easystats' ecosystem.
#' References: Patil et al. (2022) <doi:10.21105/joss.04684>.
#'
#' @docType package
#' @aliases datawizard datawizard-package
16 changes: 12 additions & 4 deletions R/select_helpers.R
@@ -260,10 +260,18 @@
from <- which(cn == from_to[1])
to <- which(cn == from_to[2])
if (!length(from)) {
stop("Could not find variable '", from_to[1], "' in data.", call. = FALSE)
# guess the misspelled column
insight::format_error(
paste0("Could not find variable \"", from_to[1], "\" in data."),
.misspelled_string(cn, from_to[1], default_message = "Possibly misspelled?")
)
}
if (!length(to)) {
stop("Could not find variable '", from_to[2], "' in data.", call. = FALSE)
# guess the misspelled column
insight::format_error(
paste0("Could not find variable \"", from_to[2], "\" in data."),
.misspelled_string(cn, from_to[2], default_message = "Possibly misspelled?")
)
}
if (negate) {
pattern <- columns[setdiff(seq_len(ncol(data)), from:to)]
@@ -298,7 +306,6 @@
exclude <- .check_pattern_and_exclude(exclude, data, ignore_case, verbose)
pattern <- setdiff(pattern, exclude)
}

pattern
}

@@ -337,7 +344,8 @@
if (!all(pattern %in% columns)) {
if (isTRUE(verbose)) {
insight::format_warning(
paste0("Following variable(s) were not found: ", paste0(setdiff(pattern, columns), collapse = ", "))
paste0("Following variable(s) were not found: ", paste0(setdiff(pattern, columns), collapse = ", ")),
.misspelled_string(columns, setdiff(pattern, columns), default_message = "Possibly misspelled?")
)
}
pattern <- intersect(pattern, columns)
62 changes: 62 additions & 0 deletions R/utils.R
@@ -54,3 +54,65 @@
.has_numeric_rownames <- function(data) {
identical(attributes(data)$row.names, seq_len(nrow(data)))
}


#' Fuzzy grep, matches patterns that are close, but not identical
#' Example:
#' colnames(iris)
#' p <- sprintf("(%s){~%i}", "Spela", 2)
#' grep(pattern = p, x = colnames(iris), ignore.case = FALSE)
#' @keywords internal
#' @noRd

.fuzzy_grep <- function(x, pattern, precision = NULL) {
if (is.null(precision)) {
precision <- round(nchar(pattern) / 3)
}
if (precision > nchar(pattern)) {
return(NULL)
}
p <- sprintf("(%s){~%i}", pattern, precision)
grep(pattern = p, x = x, ignore.case = FALSE)
}
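
For readers who want to experiment with the idea outside the package, here is a rough sketch of a comparable "close but not identical" lookup. It deliberately swaps in a different technique: whole-string Levenshtein distance via `utils::adist()` rather than the approximate regular expression used by `.fuzzy_grep()`. The helper name `suggest_columns()` and its `max_dist` argument are hypothetical; only the one-edit-per-three-characters default is borrowed from the code above.

```r
# Hypothetical helper, not part of datawizard: return the column names whose
# Levenshtein distance to `searchterm` is at most `max_dist`.
suggest_columns <- function(columns, searchterm, max_dist = NULL) {
  if (is.null(max_dist)) {
    # same default as .fuzzy_grep(): roughly one edit per three characters
    max_dist <- round(nchar(searchterm) / 3)
  }
  # adist() returns a matrix of generalized Levenshtein distances;
  # with a single search term, the first row holds one distance per column
  distances <- utils::adist(searchterm, columns)[1, ]
  columns[distances <= max_dist]
}

cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# "Sepal.Lenght" needs two substitutions to become "Sepal.Length"
close_match <- suggest_columns(cols, "Sepal.Lenght", max_dist = 2)

# nothing is within distance 0 of a misspelled name
no_match <- suggest_columns(cols, "Sepal.Lenght", max_dist = 0)
```

Because this compares whole strings while the approximate regex can match a substring, the two approaches may return different candidates for the same input.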


#' create a message string to tell user about matches that could possibly
#' be the string they were looking for
#'
#' @keywords internal
#' @noRd

.misspelled_string <- function(source, searchterm, default_message = NULL) {
if (is.null(searchterm) || length(searchterm) < 1) {
return(default_message)
}
# used for many matches
more_found <- ""
# init default
msg <- ""
# guess the misspelled string
possible_strings <- unlist(lapply(searchterm, function(s) {
source[.fuzzy_grep(source, s)]
}))
if (length(possible_strings)) {
msg <- "Did you mean "
if (length(possible_strings) > 1) {
# make sure we don't print dozens of alternatives for larger data frames
if (length(possible_strings) > 5) {
more_found <- sprintf(
" We even found %i more possible matches, not shown here.",
length(possible_strings) - 5
)
possible_strings <- possible_strings[1:5]
}
msg <- paste0(msg, "one of ", text_concatenate(possible_strings, enclose = "\"", last = " or "))
} else {
msg <- paste0(msg, "\"", possible_strings, "\"")
}
msg <- paste0(msg, "?", more_found)
} else {
msg <- default_message
}
# no double white space
insight::trim_ws(msg)
}
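
The message-building logic above can be tried in isolation. The following is a simplified sketch using only base R; `format_suggestion()` and `max_shown` are hypothetical names, and `paste()` stands in for the package's `text_concatenate()`, so the output may differ from what datawizard prints.

```r
# Hypothetical stand-alone version of the "Did you mean ...?" message builder
format_suggestion <- function(candidates, max_shown = 5,
                              default_message = "Possibly misspelled?") {
  # fall back to the default hint when there is nothing to suggest
  if (!length(candidates)) {
    return(default_message)
  }
  more_found <- ""
  # cap the list so large data frames do not produce dozens of suggestions
  if (length(candidates) > max_shown) {
    more_found <- sprintf(
      " We even found %i more possible matches, not shown here.",
      length(candidates) - max_shown
    )
    candidates <- candidates[seq_len(max_shown)]
  }
  quoted <- paste0("\"", candidates, "\"")
  if (length(quoted) > 1) {
    # join all but the last with commas, then attach the last with "or"
    n <- length(quoted)
    listed <- paste0(
      "one of ", paste(quoted[-n], collapse = ", "), " or ", quoted[n]
    )
  } else {
    listed <- quoted
  }
  paste0("Did you mean ", listed, "?", more_found)
}

single <- format_suggestion("Species")
pair <- format_suggestion(c("a", "b"))
none <- format_suggestion(character(0))
many <- format_suggestion(letters[1:7])
```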
24 changes: 12 additions & 12 deletions R/utils_data.R
@@ -45,17 +45,17 @@ rownames_as_column <- function(x, var = "rowname") {
#' @rdname rownames
#' @export
column_as_rownames <- function(x, var = "rowname") {
if (!is.character(var) & !is.numeric(var)) {
stop("Argument 'var' must be of type character or numeric.")
if (!is.character(var) && !is.numeric(var)) {
insight::format_error("Argument `var` must be of type character or numeric.")
}
if (is.character(var)) {
if (!var %in% names(x)) {
stop(paste0('Variable "', var, '" is not in the data frame.'))
insight::format_error(paste0("Variable \"", var, "\" is not in the data frame."))
}
}
if (is.numeric(var)) {
if (var > ncol(x) | var <= 0) {
stop("Column ", var, " does not exist. There are ", ncol(x), " columns in the data frame.")
if (var > ncol(x) || var <= 0) {
insight::format_error("Column ", var, " does not exist. There are ", ncol(x), " columns in the data frame.")
}
}
rownames(x) <- x[[var]]
@@ -102,10 +102,10 @@ column_as_rownames <- function(x, var = "rowname") {
#'
row_to_colnames <- function(x, row = 1, na_prefix = "x", verbose = TRUE) {
if (!is.numeric(row)) {
insight::format_error("Argument 'row' must be of type numeric.")
insight::format_error("Argument `row` must be of type numeric.")
}
if (length(row) != 1) {
insight::format_error("Argument 'row' must be of length 1.")
insight::format_error("Argument `row` must be of length 1.")
}
if (nrow(x) < row) {
insight::format_error(
@@ -129,8 +129,8 @@ row_to_colnames <- function(x, row = 1, na_prefix = "x", verbose = TRUE) {
insight::format_warning(
paste0(
"Some values of row ", row,
" were NAs. The corresponding column names are prefixed with '",
na_prefix, "'."
" were NAs. The corresponding column names are prefixed with `",
na_prefix, "`."
)
)
}
@@ -146,12 +146,12 @@ row_to_colnames <- function(x, row = 1, na_prefix = "x", verbose = TRUE) {
#' @export
colnames_to_row <- function(x, prefix = "x") {
if (length(prefix) != 1) {
insight::format_error("Argument 'prefix' must be of length 1.")
insight::format_error("Argument `prefix` must be of length 1.")
}
if (!is.character(prefix)) {
insight::format_error("Argument 'prefix' must be of type character.")
insight::format_error("Argument `prefix` must be of type character.")
}
x2 <- rbind(colnames(x), x)
colnames(x2) <- paste0(prefix, 1:ncol(x2))
colnames(x2) <- paste0(prefix, seq_len(ncol(x2)))
x2
}
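
Two of the small changes in this file, `1:ncol(x2)` to `seq_len(ncol(x2))` and `&`/`|` to `&&`/`||`, guard against edge cases. A brief sketch of both, with illustrative values that do not come from the package:

```r
# 1:n counts down when n is 0, while seq_len(n) returns an empty vector
n <- 0L
colon_seq <- 1:n        # c(1L, 0L)
safe_seq <- seq_len(n)  # integer(0)

# `&` is element-wise and evaluates both sides; with a NULL input the
# result has length zero, which cannot be used as an if() condition
var <- NULL
elementwise <- !is.null(var) & var > 0

# `&&` short-circuits: the right-hand side is never evaluated here,
# so the result is a single FALSE
scalar <- !is.null(var) && var > 0
```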
2 changes: 1 addition & 1 deletion README.Rmd
@@ -17,7 +17,7 @@ set.seed(333)
library(datawizard)
```

[![publication](https://img.shields.io/badge/Cite-Unpublished-yellow)](https://github.com/easystats/datawizard/blob/master/inst/CITATION)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.04684/status.svg)](https://doi.org/10.21105/joss.04684)
[![downloads](http://cranlogs.r-pkg.org/badges/datawizard)](https://cran.r-project.org/package=datawizard)
[![total](https://cranlogs.r-pkg.org/badges/grand-total/datawizard)](https://cranlogs.r-pkg.org/) [![status](https://tinyverse.netlify.com/badge/datawizard)](https://CRAN.R-project.org/package=datawizard) [![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)

21 changes: 11 additions & 10 deletions README.md
@@ -1,7 +1,7 @@

# `datawizard`: Easy Data Wrangling and Statistical Transformations <img src='man/figures/logo.png' align="right" height="139" />

[![publication](https://img.shields.io/badge/Cite-Unpublished-yellow)](https://github.com/easystats/datawizard/blob/master/inst/CITATION)
[![DOI](https://joss.theoj.org/papers/10.21105/joss.04684/status.svg)](https://doi.org/10.21105/joss.04684)
[![downloads](http://cranlogs.r-pkg.org/badges/datawizard)](https://cran.r-project.org/package=datawizard)
[![total](https://cranlogs.r-pkg.org/badges/grand-total/datawizard)](https://cranlogs.r-pkg.org/)
[![status](https://tinyverse.netlify.com/badge/datawizard)](https://CRAN.R-project.org/package=datawizard)
@@ -58,22 +58,23 @@ To cite the package, run the following command:
``` r
citation("datawizard")

To cite datawizard in publications use:
To cite package 'datawizard' in publications use:

Patil, Makowski, Ben-Shachar, Wiernik, Bacher, & Lüdecke (2022).
datawizard: An R Package for Easy Data Preparation and Statistical
Transformations. CRAN. Available from
https://easystats.github.io/datawizard/
Patil et al., (2022). datawizard: An R Package for Easy Data
Preparation and Statistical Transformations. Journal of Open Source
Software, 7(78), 4684, https://doi.org/10.21105/joss.04684

A BibTeX entry for LaTeX users is

@Article{,
title = {datawizard: An R Package for Easy Data Preparation and Statistical Transformations},
title = {{datawizard}: An {R} Package for Easy Data Preparation and Statistical Transformations},
author = {Indrajeet Patil and Dominique Makowski and Mattan S. Ben-Shachar and Brenton M. Wiernik and Etienne Bacher and Daniel Lüdecke},
journal = {CRAN},
journal = {Journal of Open Source Software},
year = {2022},
note = {R package},
url = {https://easystats.github.io/datawizard/},
volume = {7},
number = {78},
pages = {4684},
doi = {10.21105/joss.04684},
}
```

20 changes: 8 additions & 12 deletions inst/CITATION
@@ -1,16 +1,12 @@
bibentry(
bibtype="Article",
title="datawizard: An R Package for Easy Data Preparation and Statistical Transformations",
title="{datawizard}: An {R} Package for Easy Data Preparation and Statistical Transformations",
author=c(person("Indrajeet", "Patil"), person("Dominique", "Makowski"), person("Mattan S.", "Ben-Shachar"), person("Brenton M.", "Wiernik"), person("Etienne", "Bacher"), person("Daniel", "Lüdecke")),
journal="CRAN",
year="2022",
note="R package",
url="https://easystats.github.io/datawizard/",

textVersion =
paste("Patil, Makowski, Ben-Shachar, Wiernik, Bacher, & Lüdecke (2022). datawizard: An R Package for Easy Data Preparation and Statistical Transformations. CRAN.",
"Available from https://easystats.github.io/datawizard/"
),
mheader = "To cite datawizard in publications use:"
journal="Journal of Open Source Software",
year = 2022,
volume = 7,
number = 78,
pages = 4684,
doi = "10.21105/joss.04684",
textVersion = "Patil et al., (2022). datawizard: An R Package for Easy Data Preparation and Statistical Transformations. Journal of Open Source Software, 7(78), 4684, https://doi.org/10.21105/joss.04684"
)

3 changes: 3 additions & 0 deletions man/datawizard-package.Rd


File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Binary file added paper/Patil_et_al_2022_JOSS.pdf
Binary file not shown.