GeoJSON fixes and extended tests
coolbutuseless committed Sep 4, 2023
1 parent 735ffdd commit 13e0bcf
Showing 17 changed files with 676 additions and 220 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: yyjsonr
Type: Package
Title: Fast JSON, GeoJSON and NDJSON Parsing and Serialisation
Version: 0.1.5
Version: 0.1.6
Authors@R: c(
person("Mike", "FC", role = c("aut", "cre"), email = "[email protected]"),
person("Yao", "Yuan", role = "cph", email = "[email protected]",
Expand All @@ -17,6 +17,7 @@ Suggests:
bit64,
knitr,
rmarkdown,
jsonlite,
testthat (>= 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr
7 changes: 7 additions & 0 deletions NEWS.md
@@ -1,4 +1,11 @@

# yyjsonr 0.1.6 2023-09-04

* FEATURE: Added `promote_num_to_string` in `from_opts()` to enable
forced promotion of numerics to string
* BUGFIX: fixes for handling of geometry collection when reading and writing.
* TESTING: More tests included for output to geojson
* TESTING: Refactored testing of 'sf' objects

# yyjsonr 0.1.5 2023-08-31

20 changes: 14 additions & 6 deletions R/R-yyjson.R
Expand Up @@ -186,18 +186,26 @@ write_flag <- list(
#' @param num_specials Should JSON strings 'NA'/'Inf'/'NaN' in a numeric context
#' be converted to the \code{'special'} R numeric values
#' \code{NA, Inf, NaN}, or left as a \code{'string'}. Default: 'special'
#' @param promote_num_to_string Should numeric values be promoted to strings
#' when they occur within an array with other string values? Default: FALSE,
#' which keeps numerics as numeric values and promotes the \emph{container} to
#' a \code{list} rather than an atomic vector when types are mixed. If \code{TRUE},
#' arrays of mixed string/numeric types are promoted to all
#' string values and returned as an atomic character vector. Set this to \code{TRUE}
#' if you want to emulate the behaviour of \code{jsonlite::fromJSON()}.
#'
#' @seealso [read_flag()]
#' @return Named list of options
#' @export
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
from_opts <- function(
int64 = c('string', 'bit64'),
missing_list_elem = c('null', 'na'),
vectors_to_df = TRUE,
str_specials = c('string', 'special'),
num_specials = c('special', 'string'),
yyjson_read_flag = 0L
int64 = c('string', 'bit64'),
missing_list_elem = c('null', 'na'),
vectors_to_df = TRUE,
str_specials = c('string', 'special'),
num_specials = c('special', 'string'),
promote_num_to_string = FALSE,
yyjson_read_flag = 0L
) {

structure(
40 changes: 19 additions & 21 deletions README.Rmd
Expand Up @@ -37,7 +37,7 @@ if (FALSE) {
[![R-CMD-check](https://github.com/coolbutuseless/yyjsonr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/coolbutuseless/yyjsonr/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

`{yyjsonr}` is a fast JSON parser/serializer, which converts R data to/from JSON and NDJSON.
`{yyjsonr}` is a fast JSON parser/serializer, which converts R data to/from JSON, GeoJSON and NDJSON.

In most cases it is around 2x to 10x faster than `{jsonlite}` at both reading and writing JSON.

Expand Down Expand Up @@ -137,22 +137,6 @@ No 'digits' argument
Numeric conversion is handled within the `yyjson` C library and is not
configurable.

Numeric types retained in presence of other strings
-----------------------------------------------------------------------------

`{yyjsonr}` does not promote numeric values in arrays to strings if the array
contains a string. Instead the R container is promoted to a `list()` in order
to retain original types.

Note: this could be controlled by a flag if desired. Open an issue and let
me know what you need!

```{r}
json <- '[1,2,3,"apple"]'
jsonlite::fromJSON(json)
yyjsonr::from_json_str(json)
```


3-d arrays are parsed as multiple 2-d matrices and combined
-----------------------------------------------------------------------------
Expand All @@ -165,17 +149,31 @@ consistent within each package, but not cross-compatible between them i.e.
you cannot serialize an array in `{yyjsonr}` and re-create it exactly
using `{jsonlite}`.

The matrix handling in `{yyjsonr}` is compatible with the expectations of GeoJSON
coordinate handling.

```{r}
# A simple 3D array
mat <- array(1:12, dim = c(2,3,2))
mat
```


str <- jsonlite::toJSON(mat)
str
```{r}
# jsonlite's serialization of matrices is internally consistent and re-parses
# to the initial matrix.
str <- jsonlite::toJSON(mat, pretty = TRUE)
cat(str)
jsonlite::fromJSON(str)
```


str <- yyjsonr::to_json_str(mat)
str
```{r}
# yyjsonr's serialization of matrices is internally consistent and re-parses
# to the initial matrix.
# But note that it is *different* to what jsonlite does.
str <- yyjsonr::to_json_str(mat, pretty = TRUE)
cat(str)
yyjsonr::from_json_str(str)
```

91 changes: 56 additions & 35 deletions README.md
Expand Up @@ -12,7 +12,7 @@ yyjsonr
<!-- badges: end -->

`{yyjsonr}` is a fast JSON parser/serializer, which converts R data
to/from JSON and NDJSON.
to/from JSON, GeoJSON and NDJSON.

In most cases it is around 2x to 10x faster than `{jsonlite}` at both
reading and writing JSON.
Expand Down Expand Up @@ -140,33 +140,6 @@ from_json_str(str)
Numeric conversion is handled within the `yyjson` C library and is not
configurable.

## Numeric types retained in presence of other strings

`{yyjsonr}` does not promote numeric values in arrays to strings if the
array contains a string. Instead the R container is promoted to a
`list()` in order to retain original types.

Note: this could be controlled by a flag if desired. Open an issue and
let me know what you need!

``` r
json <- '[1,2,3,"apple"]'
jsonlite::fromJSON(json)
#> [1] "1" "2" "3" "apple"
yyjsonr::from_json_str(json)
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
#>
#> [[4]]
#> [1] "apple"
```

## 3-d arrays are parsed as multiple 2-d matrices and combined

In `{yyjsonr}` the order in which elements in an array are serialized to
Expand All @@ -178,7 +151,11 @@ consistent within each package, but not cross-compatible between them
i.e. you cannot serialize an array in `{yyjsonr}` and re-create it
exactly using `{jsonlite}`.

The matrix handling in `{yyjsonr}` is compatible with the expectations
of GeoJSON coordinate handling.
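
The two element orders described here can be reproduced with plain index arithmetic. The following is a minimal sketch in C, assuming only that R stores arrays in column-major order; the function names are hypothetical and this is not how either package is implemented. It only reproduces the orders visible in the README output for `array(1:12, dim = c(2,3,2))`.

```c
#include <stddef.h>

/* Column-major offset of element (i, j, k) in an R array with
 * dim = c(ni, nj, nk). Indices are zero-based here. */
static size_t r_offset(size_t i, size_t j, size_t k, size_t ni, size_t nj) {
  return i + ni * (j + nj * k);
}

/* Hypothetical helper: visit elements slice by slice, then row by row,
 * then column by column. This matches the nesting seen in the
 * yyjsonr output: [[[1,3,5],[2,4,6]],[[7,9,11],[8,10,12]]]. */
void order_slice_row_col(const int *x, size_t ni, size_t nj, size_t nk, int *out) {
  size_t n = 0;
  for (size_t k = 0; k < nk; k++)
    for (size_t i = 0; i < ni; i++)
      for (size_t j = 0; j < nj; j++)
        out[n++] = x[r_offset(i, j, k, ni, nj)];
}

/* Hypothetical helper: visit elements row by row, then column by column,
 * then slice by slice. This matches the nesting seen in the
 * jsonlite output: [[[1,7],[3,9],[5,11]],[[2,8],[4,10],[6,12]]]. */
void order_row_col_slice(const int *x, size_t ni, size_t nj, size_t nk, int *out) {
  size_t n = 0;
  for (size_t i = 0; i < ni; i++)
    for (size_t j = 0; j < nj; j++)
      for (size_t k = 0; k < nk; k++)
        out[n++] = x[r_offset(i, j, k, ni, nj)];
}
```

Both functions read the same memory; only the loop nesting differs, which is why each package round-trips its own output but not the other's.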

``` r
# A simple 3D array
mat <- array(1:12, dim = c(2,3,2))
mat
#> , , 1
Expand All @@ -192,10 +169,25 @@ mat
#> [,1] [,2] [,3]
#> [1,] 7 9 11
#> [2,] 8 10 12
```

str <- jsonlite::toJSON(mat)
str
#> [[[1,7],[3,9],[5,11]],[[2,8],[4,10],[6,12]]]
``` r
# jsonlite's serialization of matrices is internally consistent and re-parses
# to the initial matrix.
str <- jsonlite::toJSON(mat, pretty = TRUE)
cat(str)
#> [
#> [
#> [1, 7],
#> [3, 9],
#> [5, 11]
#> ],
#> [
#> [2, 8],
#> [4, 10],
#> [6, 12]
#> ]
#> ]
jsonlite::fromJSON(str)
#> , , 1
#>
Expand All @@ -208,11 +200,40 @@ jsonlite::fromJSON(str)
#> [,1] [,2] [,3]
#> [1,] 7 9 11
#> [2,] 8 10 12
```


str <- yyjsonr::to_json_str(mat)
str
#> [1] "[[[1,3,5],[2,4,6]],[[7,9,11],[8,10,12]]]"
``` r
# yyjsonr's serialization of matrices is internally consistent and re-parses
# to the initial matrix.
# But note that it is *different* to what jsonlite does.
str <- yyjsonr::to_json_str(mat, pretty = TRUE)
cat(str)
#> [
#> [
#> [
#> 1,
#> 3,
#> 5
#> ],
#> [
#> 2,
#> 4,
#> 6
#> ]
#> ],
#> [
#> [
#> 7,
#> 9,
#> 11
#> ],
#> [
#> 8,
#> 10,
#> 12
#> ]
#> ]
#> ]
yyjsonr::from_json_str(str)
#> , , 1
#>
8 changes: 4 additions & 4 deletions man/benchmark/benchmarks.Rmd
Expand Up @@ -301,7 +301,7 @@ res09 <- bench::mark(
```

```{r echo=FALSE}
res09$benchmark <- 'Rcppsimdjson'
res09$benchmark <- 'Rcppsimdjson benchmark'
knitr::kable(res09[, 1:5])
plot(res09) + theme_bw() + theme(legend.position = 'none')
```
Expand All @@ -328,7 +328,7 @@ res10 <- bench::mark(
```

```{r echo=FALSE}
res10$benchmark <- 'jsonify (1)'
res10$benchmark <- 'jsonify benchmark (1)'
knitr::kable(res10[,1:5])
plot(res10) + theme_bw() + theme(legend.position = 'none')
```
Expand Down Expand Up @@ -360,7 +360,7 @@ res11 <- bench::mark(
```

```{r echo=FALSE}
res11$benchmark <- 'jsonify (2)'
res11$benchmark <- 'jsonify benchmark (2)'
knitr::kable(res11[,1:5])
plot(res11) + theme_bw() + theme(legend.position = 'none')
```
Expand All @@ -383,7 +383,7 @@ res12 <- bench::mark(
```

```{r echo=FALSE}
res12$benchmark <- 'jsonify (3)'
res12$benchmark <- 'jsonify benchmark (3)'
knitr::kable(res12[,1:5])
plot(res12) + theme_bw() + theme(legend.position = 'none')
```
Binary file modified man/figures/benchmark-summary.png
9 changes: 9 additions & 0 deletions man/from_opts.Rd


19 changes: 12 additions & 7 deletions src/R-yyjson-parse.c
Expand Up @@ -20,12 +20,13 @@
parse_options create_parse_options(SEXP parse_opts_) {

parse_options opt = {
.int64 = INT64_AS_STR,
.missing_list_elem = MISSING_AS_NULL,
.vectors_to_df = true,
.str_specials = STR_SPECIALS_AS_STRING,
.num_specials = NUM_SPECIALS_AS_SPECIAL,
.yyjson_read_flag = 0
.int64 = INT64_AS_STR,
.missing_list_elem = MISSING_AS_NULL,
.vectors_to_df = true,
.str_specials = STR_SPECIALS_AS_STRING,
.num_specials = NUM_SPECIALS_AS_SPECIAL,
.promote_num_to_string = false,
.yyjson_read_flag = 0
};

if (isNull(parse_opts_) || length(parse_opts_) == 0) {
Expand Down Expand Up @@ -63,6 +64,8 @@ parse_options create_parse_options(SEXP parse_opts_) {
} else if (strcmp(opt_name, "num_specials") == 0) {
const char *val = CHAR(STRING_ELT(val_, 0));
opt.num_specials = strcmp(val, "string") == 0 ? NUM_SPECIALS_AS_STRING : NUM_SPECIALS_AS_SPECIAL;
} else if (strcmp(opt_name, "promote_num_to_string") == 0) {
opt.promote_num_to_string = asLogical(val_);
} else {
warning("Unknown option ignored: '%s'\n", opt_name);
}
Expand Down Expand Up @@ -407,7 +410,9 @@ unsigned int get_best_sexp_to_represent_type_bitset(unsigned int type_bitset, pa
// String
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
if ((type_bitset & VAL_STR) | (type_bitset & VAL_STR_INT)) {
if (type_bitset & (VAL_NONE | VAL_RAW | VAL_BOOL | VAL_INT | VAL_REAL | VAL_ARR | VAL_OBJ | VAL_INT64)) {
if ( opt->promote_num_to_string && (type_bitset & (VAL_REAL | VAL_INT | VAL_BOOL)) && !(type_bitset & (VAL_NONE | VAL_RAW | VAL_ARR | VAL_OBJ))) {
return STRSXP;
} else if (type_bitset & (VAL_NONE | VAL_RAW | VAL_BOOL | VAL_INT | VAL_REAL | VAL_ARR | VAL_OBJ | VAL_INT64)) {
return VECSXP;
} else {
return STRSXP;
1 change: 1 addition & 0 deletions src/R-yyjson-parse.h
Expand Up @@ -99,6 +99,7 @@ typedef struct {
bool vectors_to_df;
unsigned int str_specials;
unsigned int num_specials;
bool promote_num_to_string;
unsigned int yyjson_read_flag;
} parse_options;
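
The promotion rule this commit adds to `get_best_sexp_to_represent_type_bitset()` can be sketched as a standalone function. This is a minimal sketch, not the package's code: the `VAL_*` bit values, the `container_type` enum (standing in for R's `STRSXP`/`VECSXP`) and the function name `container_for_strings` are assumptions made so the example compiles without R headers. It also assumes the caller has already established that the bitset contains a string.

```c
#include <stdbool.h>

/* Hypothetical bit flags. The real values are defined in the package
 * headers and are not shown in this diff. */
enum {
  VAL_NONE    = 1u << 0,
  VAL_RAW     = 1u << 1,
  VAL_BOOL    = 1u << 2,
  VAL_INT     = 1u << 3,
  VAL_REAL    = 1u << 4,
  VAL_STR     = 1u << 5,
  VAL_ARR     = 1u << 6,
  VAL_OBJ     = 1u << 7,
  VAL_INT64   = 1u << 8,
  VAL_STR_INT = 1u << 9
};

/* Stand-ins for R's STRSXP (character vector) and VECSXP (list). */
typedef enum { CONTAINER_CHARACTER, CONTAINER_LIST } container_type;

/* Decide which container holds an array that is known to contain strings.
 * Mirrors the branch order of the conditional in the diff. */
container_type container_for_strings(unsigned int type_bitset,
                                     bool promote_num_to_string) {
  const unsigned int promotable = VAL_REAL | VAL_INT | VAL_BOOL;
  const unsigned int blockers   = VAL_NONE | VAL_RAW | VAL_ARR | VAL_OBJ;
  const unsigned int non_string = VAL_NONE | VAL_RAW | VAL_BOOL | VAL_INT |
                                  VAL_REAL | VAL_ARR | VAL_OBJ | VAL_INT64;

  /* Opt-in: scalars mixed with strings become strings, unless the array
   * also holds nulls, raw values, nested arrays or objects. */
  if (promote_num_to_string &&
      (type_bitset & promotable) &&
      !(type_bitset & blockers)) {
    return CONTAINER_CHARACTER;
  }

  /* Default: any non-string type forces a list so types are retained. */
  if (type_bitset & non_string) {
    return CONTAINER_LIST;
  }

  /* Strings only. */
  return CONTAINER_CHARACTER;
}
```

With these placeholder flags, `container_for_strings(VAL_STR | VAL_INT, false)` yields `CONTAINER_LIST`, while passing `true` yields `CONTAINER_CHARACTER`. An array that also holds a nested array or object stays a list either way.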
