RFC: prettify the default output #85

Closed
wants to merge 11 commits into from
yihui
@yihui yihui commented Mar 25, 2015

As mentioned in #84, neither the pretty nor the unpretty output looks good to me. I feel this job is much better done on the R side than handed over to yajl, since yajl knows nothing about the data structures in R.

What I did was pretty straightforward:

  1. added an indent argument to asJSON(), with indent = indent + 2 every time asJSON() is called on arrays, lists, and data frames;
  2. added an inner argument to collapse(), so we know when to add line breaks (only in the outer loops of collapsing).
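
To make the indent/inner idea concrete, here is a minimal plain-C sketch. It is not the code from this PR: it uses ordinary C strings instead of R's SEXP API, and the function name collapse_array and its signature are hypothetical. Inner collapses join elements on one line with ", "; outer collapses put each element on its own line at indent + 2 spaces and close the bracket at indent spaces.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not jsonlite's API): join n already-serialized JSON
 * values into an array. inner != 0 gives "[a, b, c]" on one line; inner == 0
 * puts each element on its own line, indented by indent + 2 spaces.
 * The caller owns the result and must free() it. */
char *collapse_array(const char **elems, size_t n, size_t indent, int inner)
{
    assert(elems != NULL || n == 0);
    if (n == 0) {
        char *empty = malloc(3);
        if (empty != NULL) strcpy(empty, "[]");
        return empty;
    }

    size_t len = 2;                       /* '[' and ']' */
    for (size_t i = 0; i < n; i++) len += strlen(elems[i]);
    if (inner) {
        len += 2 * (n - 1);               /* ", " between elements */
    } else {
        len += n - 1;                     /* ',' between elements */
        len += n * (1 + indent + 2);      /* '\n' plus child indent per element */
        len += 1 + indent;                /* '\n' plus parent indent before ']' */
    }

    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    char *p = out;
    *p++ = '[';
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            *p++ = ',';
            if (inner) *p++ = ' ';
        }
        if (!inner) {                     /* line break only in outer loops */
            *p++ = '\n';
            memset(p, ' ', indent + 2);
            p += indent + 2;
        }
        size_t l = strlen(elems[i]);
        memcpy(p, elems[i], l);
        p += l;
    }
    if (!inner) {
        *p++ = '\n';
        memset(p, ' ', indent);
        p += indent;
    }
    *p++ = ']';
    *p = '\0';
    return out;
}
```

Collapsing each matrix row with inner set and then collapsing the rows with inner unset and indent 0 would give the one-row-per-line layout shown in the matrix examples below.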

Since my C-fu is weak, I'm using the R versions of collapse() and collapse_object(). If you prefer doing collapsing in C, please feel free to do so. I imagine it is not going to be terribly hard. (Or is paste() really that slow?)

Some examples:

library(jsonlite)
toJSON = function(...) jsonlite::toJSON(..., pretty = TRUE)

x = matrix(1:12, 4)
toJSON(x)
[
  [1, 5, 9],
  [2, 6, 10],
  [3, 7, 11],
  [4, 8, 12]
] 
toJSON(x, matrix = 'columnmajor')
[
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12]
] 
x = array(1:24, c(2, 3, 4))
toJSON(x)
[
  [
    [1, 7, 13, 19],
    [3, 9, 15, 21],
    [5, 11, 17, 23]
  ],
  [
    [2, 8, 14, 20],
    [4, 10, 16, 22],
    [6, 12, 18, 24]
  ]
] 
x = list(x = NULL, y = list(), z = list(1, 3:6), a = list(list(), NULL))
toJSON(x)
{
  "x": {},
  "y": [],
  "z": [
    [1],
    [3, 4, 5, 6]
  ],
  "a": [
    [],
    {}
  ]
} 
toJSON(x, null = 'null')
{
  "x": null,
  "y": [],
  "z": [
    [1],
    [3, 4, 5, 6]
  ],
  "a": [
    [],
    null
  ]
} 
x = head(iris)
toJSON(x)
[
  {
    "Sepal.Length": 5.1,
    "Sepal.Width": 3.5,
    "Petal.Length": 1.4,
    "Petal.Width": 0.2,
    "Species": "setosa"
  },
  {
    "Sepal.Length": 4.9,
    "Sepal.Width": 3,
    "Petal.Length": 1.4,
    "Petal.Width": 0.2,
    "Species": "setosa"
  },
  {
    "Sepal.Length": 4.7,
    "Sepal.Width": 3.2,
    "Petal.Length": 1.3,
    "Petal.Width": 0.2,
    "Species": "setosa"
  },
  {
    "Sepal.Length": 4.6,
    "Sepal.Width": 3.1,
    "Petal.Length": 1.5,
    "Petal.Width": 0.2,
    "Species": "setosa"
  },
  {
    "Sepal.Length": 5,
    "Sepal.Width": 3.6,
    "Petal.Length": 1.4,
    "Petal.Width": 0.2,
    "Species": "setosa"
  },
  {
    "Sepal.Length": 5.4,
    "Sepal.Width": 3.9,
    "Petal.Length": 1.7,
    "Petal.Width": 0.4,
    "Species": "setosa"
  }
] 
toJSON(x, dataframe = 'columns')
{
  "Sepal.Length": [5.1, 4.9, 4.7, 4.6, 5, 5.4],
  "Sepal.Width": [3.5, 3, 3.2, 3.1, 3.6, 3.9],
  "Petal.Length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
  "Petal.Width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
  "Species": ["setosa", "setosa", "setosa", "setosa", "setosa", "setosa"]
} 
toJSON(x, dataframe = 'values')
[
  [5.1, 3.5, 1.4, 0.2, "setosa"],
  [4.9, 3, 1.4, 0.2, "setosa"],
  [4.7, 3.2, 1.3, 0.2, "setosa"],
  [4.6, 3.1, 1.5, 0.2, "setosa"],
  [5, 3.6, 1.4, 0.2, "setosa"],
  [5.4, 3.9, 1.7, 0.4, "setosa"]
] 
toJSON(unname(x), dataframe = 'columns')
[
  [5.1, 4.9, 4.7, 4.6, 5, 5.4],
  [3.5, 3, 3.2, 3.1, 3.6, 3.9],
  [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
  [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
  ["setosa", "setosa", "setosa", "setosa", "setosa", "setosa"]
] 
obj = list(
  x = matrix(1:12, 4),
  y = 5,
  z = list(
    a = 1:3, b = 'foo',
    c = list(
      u = 'bar',
      ii = letters[2:8]
    )
  ),
  b = head(iris,3)
)
toJSON(obj, dataframe = 'columns')
{
  "x": [
    [1, 5, 9],
    [2, 6, 10],
    [3, 7, 11],
    [4, 8, 12]
  ],
  "y": [5],
  "z": {
    "a": [1, 2, 3],
    "b": ["foo"],
    "c": {
      "u": ["bar"],
      "ii": ["b", "c", "d", "e", "f", "g", "h"]
    }
  },
  "b": {
    "Sepal.Length": [5.1, 4.9, 4.7],
    "Sepal.Width": [3.5, 3, 3.2],
    "Petal.Length": [1.4, 1.4, 1.3],
    "Petal.Width": [0.2, 0.2, 0.2],
    "Species": ["setosa", "setosa", "setosa"]
  }
} 
toJSON(obj, dataframe = 'rows')
{
  "x": [
    [1, 5, 9],
    [2, 6, 10],
    [3, 7, 11],
    [4, 8, 12]
  ],
  "y": [5],
  "z": {
    "a": [1, 2, 3],
    "b": ["foo"],
    "c": {
      "u": ["bar"],
      "ii": ["b", "c", "d", "e", "f", "g", "h"]
    }
  },
  "b": [
    {
      "Sepal.Length": 5.1,
      "Sepal.Width": 3.5,
      "Petal.Length": 1.4,
      "Petal.Width": 0.2,
      "Species": "setosa"
    },
    {
      "Sepal.Length": 4.9,
      "Sepal.Width": 3,
      "Petal.Length": 1.4,
      "Petal.Width": 0.2,
      "Species": "setosa"
    },
    {
      "Sepal.Length": 4.7,
      "Sepal.Width": 3.2,
      "Petal.Length": 1.3,
      "Petal.Width": 0.2,
      "Species": "setosa"
    }
  ]
} 
toJSON(obj, dataframe = 'values')
{
  "x": [
    [1, 5, 9],
    [2, 6, 10],
    [3, 7, 11],
    [4, 8, 12]
  ],
  "y": [5],
  "z": {
    "a": [1, 2, 3],
    "b": ["foo"],
    "c": {
      "u": ["bar"],
      "ii": ["b", "c", "d", "e", "f", "g", "h"]
    }
  },
  "b": [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3, 1.4, 0.2, "setosa"],
    [4.7, 3.2, 1.3, 0.2, "setosa"]
  ]
} 
obj$b = unname(obj$b)
toJSON(obj, dataframe = 'columns')
{
  "x": [
    [1, 5, 9],
    [2, 6, 10],
    [3, 7, 11],
    [4, 8, 12]
  ],
  "y": [5],
  "z": {
    "a": [1, 2, 3],
    "b": ["foo"],
    "c": {
      "u": ["bar"],
      "ii": ["b", "c", "d", "e", "f", "g", "h"]
    }
  },
  "b": [
    [5.1, 4.9, 4.7],
    [3.5, 3, 3.2],
    [1.4, 1.4, 1.3],
    [0.2, 0.2, 0.2],
    ["setosa", "setosa", "setosa"]
  ]
} 
toJSON(obj, dataframe = 'values')
{
  "x": [
    [1, 5, 9],
    [2, 6, 10],
    [3, 7, 11],
    [4, 8, 12]
  ],
  "y": [5],
  "z": {
    "a": [1, 2, 3],
    "b": ["foo"],
    "c": {
      "u": ["bar"],
      "ii": ["b", "c", "d", "e", "f", "g", "h"]
    }
  },
  "b": [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3, 1.4, 0.2, "setosa"],
    [4.7, 3.2, 1.3, 0.2, "setosa"]
  ]
} 
mydata = data.frame(row.names=1:2)
mydata$d = list(
  data.frame(a1=1:2, a2=3:4, a3=5:6, a4=7:8),
  data.frame(a1=11:12, a2=13:14, a3=15:16, a4=17:18)
)
mydata$m = list(
  matrix(1:6, nrow=2, ncol=3),
  matrix(6:1, nrow=2, ncol=3)
)
toJSON(mydata)
[
  {
    "d": [
      {
        "a1": 1,
        "a2": 3,
        "a3": 5,
        "a4": 7
      },
      {
        "a1": 2,
        "a2": 4,
        "a3": 6,
        "a4": 8
      }
    ],
    "m": [
      [1, 3, 5],
      [2, 4, 6]
    ]
  },
  {
    "d": [
      {
        "a1": 11,
        "a2": 13,
        "a3": 15,
        "a4": 17
      },
      {
        "a1": 12,
        "a2": 14,
        "a3": 16,
        "a4": 18
      }
    ],
    "m": [
      [6, 4, 2],
      [5, 3, 1]
    ]
  }
] 
toJSON(mydata, dataframe = 'columns')
{
  "d": [
    {
      "a1": [1, 2],
      "a2": [3, 4],
      "a3": [5, 6],
      "a4": [7, 8]
    },
    {
      "a1": [11, 12],
      "a2": [13, 14],
      "a3": [15, 16],
      "a4": [17, 18]
    }
  ],
  "m": [
    [
      [1, 3, 5],
      [2, 4, 6]
    ],
    [
      [6, 4, 2],
      [5, 3, 1]
    ]
  ]
} 
toJSON(mydata, dataframe = 'values')
[
  [[
      [1, 3, 5, 7],
      [2, 4, 6, 8]
    ], [
      [1, 3, 5],
      [2, 4, 6]
    ]],
  [[
      [11, 13, 15, 17],
      [12, 14, 16, 18]
    ], [
      [6, 4, 2],
      [5, 3, 1]
    ]]
] 

@yihui
Author

yihui commented Mar 26, 2015

I just updated the PR based on our discussion:

  • The old default behavior is preserved, i.e. pretty = FALSE
  • pretty = TRUE now means prettification is done in R instead of yajl
  • pretty = a_number still means prettification is done by yajl

Now two questions for you:

  • Do you want pretty = TRUE to be the default? I don't mind it particularly since we will almost surely use a wrapper function in shiny/htmlwidgets and we can change the default there, but I guess it might make more sense to have beautiful output by default;
  • Could you rewrite the collapse() and collapse_object() functions in C?

Thanks!

@jeroen
Owner

jeroen commented Mar 31, 2015

I've copied the branch to my repo. I intend to work on this once my schedule frees up a bit. It's not high priority but nice to have.

To answer your question: let's keep the default as it currently is (no prettification). There are many people using jsonlite in production and I don't want them to experience sudden performance regressions because things get prettified by default after updating.

I'm not sure your colleagues will agree with "we will almost surely use a wrapper function in shiny/htmlwidgets and we can change the default there". I think in the context of shiny performance is more important than prettiness, but that's up to you guys.

@yihui
Author

yihui commented Mar 31, 2015

Prettiness does not make much sense in the context of shiny, but it does make some sense in htmlwidgets, e.g. I often read the git diffs of the HTML pages generated from htmlwidgets to see exactly what I changed.

I understand this may not be of high priority. Please take your time. Thanks!

@yihui
Author

yihui commented Apr 7, 2015

@jeroenooms I have completed this PR by rewriting the two R functions in C. I have tested them as thoroughly as I could. You can also see that the PR passes R CMD check on Travis, even when I use pretty = TRUE.

Please let me know what else you want. Thanks!

@jeroen
Owner

jeroen commented Apr 7, 2015

Thanks! Can you run some performance benchmarks of your branch vs the current version? It doesn't have to be overly fine-grained, something like this:

library(microbenchmark)
library(jsonlite)
data(flights, package="nycflights13")
data(diamonds, package="ggplot2")

microbenchmark (times = 10,
  toJSON(diamonds, dataframe = "rows"),
  toJSON(diamonds, dataframe = "columns"),
  toJSON(diamonds, dataframe = "values"),

  toJSON(diamonds, dataframe = "rows", pretty = TRUE),
  toJSON(diamonds, dataframe = "columns", pretty = TRUE),
  toJSON(diamonds, dataframe = "values", pretty = TRUE),

  toJSON(flights, dataframe = "rows"),
  toJSON(flights, dataframe = "columns"),
  toJSON(flights, dataframe = "values"),

  toJSON(flights, dataframe = "rows", pretty = TRUE),
  toJSON(flights, dataframe = "columns", pretty = TRUE),
  toJSON(flights, dataframe = "values", pretty = TRUE)
)

@jeroen
Owner

jeroen commented Apr 8, 2015

Running some quick benchmarks myself, it looks like your version is faster for pretty = TRUE, but about 25% slower for pretty = FALSE. It would be nice if we can keep the case of pretty = FALSE on par with the current version. Perhaps we will have to include separate C functions C_collapse_object and C_collapse_object_indented.
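
One way to read the "separate C functions" suggestion is sketched below in plain C. This is an assumption-laden illustration, not the code that was merged: the names collapse_object, collapse_object_compact and collapse_object_indented are hypothetical, keys are assumed to be already quoted and escaped, and INT_MIN stands in for a missing indent (mirroring how R represents an integer NA). The point is only that the compact path does no indentation work at all, so pretty = FALSE pays for nothing beyond one comparison.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define INDENT_NA INT_MIN  /* assumption: sentinel meaning "no prettification" */

/* Copy s to p and return the position just past the copied bytes. */
static char *append(char *p, const char *s)
{
    size_t l = strlen(s);
    memcpy(p, s, l);
    return p + l;
}

/* Fast path: {"k":v,"k":v} with no whitespace at all. */
static char *collapse_object_compact(const char **keys, const char **vals, size_t n)
{
    size_t len = 2 + (n > 0 ? n - 1 : 0);
    for (size_t i = 0; i < n; i++) len += strlen(keys[i]) + 1 + strlen(vals[i]);
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    char *p = out;
    *p++ = '{';
    for (size_t i = 0; i < n; i++) {
        if (i > 0) *p++ = ',';
        p = append(p, keys[i]);
        *p++ = ':';
        p = append(p, vals[i]);
    }
    *p++ = '}';
    *p = '\0';
    return out;
}

/* Pretty path (n > 0): one "key": value pair per line at indent + 2 spaces. */
static char *collapse_object_indented(const char **keys, const char **vals,
                                      size_t n, size_t indent)
{
    size_t len = 2 + (n - 1) + n * (1 + indent + 2) + 1 + indent;
    for (size_t i = 0; i < n; i++) len += strlen(keys[i]) + 2 + strlen(vals[i]);
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    char *p = out;
    *p++ = '{';
    for (size_t i = 0; i < n; i++) {
        if (i > 0) *p++ = ',';
        *p++ = '\n';
        memset(p, ' ', indent + 2);
        p += indent + 2;
        p = append(p, keys[i]);
        p = append(p, ": ");
        p = append(p, vals[i]);
    }
    *p++ = '\n';
    memset(p, ' ', indent);
    p += indent;
    *p++ = '}';
    *p = '\0';
    return out;
}

/* Dispatcher: the caller frees the result. Empty objects are always "{}". */
char *collapse_object(const char **keys, const char **vals, size_t n, int indent)
{
    assert(indent == INDENT_NA || indent >= 0);
    if (indent == INDENT_NA || n == 0)
        return collapse_object_compact(keys, vals, n);
    return collapse_object_indented(keys, vals, n, (size_t)indent);
}
```

Whether splitting the functions like this actually recovers the lost speed is an empirical question; the sketch only shows the shape of the split.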

@yihui
Author

yihui commented Apr 8, 2015

Okay, let me try.

@yihui
Author

yihui commented Apr 8, 2015

It does not seem to help much to have separate C functions C_collapse_object and C_collapse_object_indent:

Current CRAN version:

Unit: milliseconds
                                                   expr        min         lq       mean     median         uq       max neval      cld
                   toJSON(diamonds, dataframe = "rows")  278.46645  280.60470  331.86477  295.28061  314.65905  637.6211    10 a c     
                toJSON(diamonds, dataframe = "columns")   62.80473   65.02979   78.78064   69.15289   73.08861  171.0639    10 a       
                 toJSON(diamonds, dataframe = "values")  227.23621  232.44414  236.83144  234.06169  237.83712  251.8907    10 ab      
    toJSON(diamonds, dataframe = "rows", pretty = TRUE)  452.72568  464.78837  495.44293  471.65152  522.23919  590.5022    10  bc     
 toJSON(diamonds, dataframe = "columns", pretty = TRUE)  138.18809  139.91890  157.26942  141.49645  161.14075  263.2036    10 a       
  toJSON(diamonds, dataframe = "values", pretty = TRUE)  315.98820  320.73633  366.02112  330.00942  435.24539  488.2774    10 a c     
                    toJSON(flights, dataframe = "rows") 2930.42581 3030.71757 3244.00721 3093.95234 3171.67655 4784.8548    10      f  
                 toJSON(flights, dataframe = "columns")  535.37370  541.35886  576.37545  550.75027  596.66761  682.2998    10   c     
                  toJSON(flights, dataframe = "values") 2297.41202 2574.43638 2654.56349 2632.62421 2712.53543 3080.3174    10     e   
     toJSON(flights, dataframe = "rows", pretty = TRUE) 4653.56588 4818.15327 4997.19558 5023.15436 5148.21457 5368.4529    10        h
  toJSON(flights, dataframe = "columns", pretty = TRUE) 1265.18546 1302.23472 1384.72703 1320.78683 1491.76271 1584.7721    10    d    
   toJSON(flights, dataframe = "values", pretty = TRUE) 3284.03811 3378.05288 3556.89061 3571.07331 3689.05229 3876.3819    10       g 

Current PR:

Unit: milliseconds
                                                   expr        min         lq       mean     median         uq       max neval    cld
                   toJSON(diamonds, dataframe = "rows")  295.69642  335.47318  401.18241  374.66715  486.00833  552.1015    10  bc   
                toJSON(diamonds, dataframe = "columns")   62.54986   65.66363   79.03077   66.47463   73.77897  175.7375    10 a     
                 toJSON(diamonds, dataframe = "values")  264.22732  271.62487  342.28364  311.08393  409.40114  473.5059    10 ab    
    toJSON(diamonds, dataframe = "rows", pretty = TRUE)  340.16996  369.37701  424.88922  433.77568  466.27462  507.4459    10  bc   
 toJSON(diamonds, dataframe = "columns", pretty = TRUE)   66.72841   67.59578   84.97684   68.38117   73.20481  225.1784    10 a     
  toJSON(diamonds, dataframe = "values", pretty = TRUE)  289.03919  291.71330  337.78984  306.10486  348.83737  482.3374    10 ab    
                    toJSON(flights, dataframe = "rows") 3146.60493 3313.32309 3516.45900 3533.35127 3763.71434 3888.5012    10     e 
                 toJSON(flights, dataframe = "columns")  559.45579  564.37603  642.30084  597.35481  705.59086  821.5219    10   c   
                  toJSON(flights, dataframe = "values") 2576.41974 2824.82537 2909.33041 2946.49904 3051.98864 3193.4622    10    d  
     toJSON(flights, dataframe = "rows", pretty = TRUE) 3398.31982 3637.51814 3865.05843 3732.23219 3861.03443 4999.8880    10      f
  toJSON(flights, dataframe = "columns", pretty = TRUE)  565.20879  576.10901  653.22169  636.04862  704.49588  782.4491    10   c   
   toJSON(flights, dataframe = "values", pretty = TRUE) 2427.43721 2707.72331 2904.64660 2810.40577 3199.98716 3466.0647    10    d  

@yihui
Author

yihui commented Apr 8, 2015

I just optimized it a bit, and the speed is closer. Not sure if you are satisfied.

@jeroen
Owner

jeroen commented Apr 8, 2015

I don't understand where the performance overhead is coming from for the case of pretty = FALSE, if all you do is pass around an additional indent = NA parameter.

@yihui
Author

yihui commented Apr 8, 2015

I don't understand it either. There isn't much difference in performance for dataframe = 'columns', though, which is what we use in htmlwidgets/shiny. It is also worth mentioning that the difference is pretty much negligible on smaller data frames, data frames with fewer columns, or matrices.

So what do you expect me to do next? :)

SEXP out = PROTECT(allocVector(STRSXP, 1));
SET_STRING_ELT(out, 0, mkCharCE(olds, CE_UTF8));
UNPROTECT(1);
free(olds);

I think you also need to free sp and sp2

Author


I cannot free them since they were not malloc'd. It will abort R if I do free them.


I'm really confused by c_spaces - you malloc() but you don't free()

Author


I see. You are probably right.
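
For readers following the malloc()/free() exchange above, here is a small plain-C sketch of the ownership rule being discussed. The actual c_spaces code from the PR is not shown in this thread, so make_spaces and indent_line below are hypothetical stand-ins: a helper that returns malloc'd memory hands ownership to its caller, and the caller must free() it exactly once. Pointers that were not obtained from malloc (for example, pointers into memory owned by something else) must not be passed to free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a c_spaces-style helper: returns a heap string
 * of n spaces. The caller owns it and must free() it. */
char *make_spaces(size_t n)
{
    char *s = malloc(n + 1);
    if (s == NULL) return NULL;
    memset(s, ' ', n);
    s[n] = '\0';
    return s;
}

/* Builds "<indent spaces><text>". The temporary from make_spaces() is freed
 * here because this function is its only owner; the result is the caller's. */
char *indent_line(const char *text, size_t indent)
{
    assert(text != NULL);
    char *sp = make_spaces(indent);
    if (sp == NULL) return NULL;

    char *out = malloc(indent + strlen(text) + 1);
    if (out != NULL) {
        memcpy(out, sp, indent);
        strcpy(out + indent, text);
    }
    free(sp);  /* sp came from malloc via make_spaces(), so free() is correct */
    return out;
}
```

In this sketch, forgetting the free(sp) call would leak indent + 1 bytes on every call, which adds up quickly when the helper runs once per JSON element.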

@jeroen
Owner

jeroen commented Apr 9, 2015

Merged a rewritten version in 6f3275f. Thanks for this contribution @yihui!

@jeroen jeroen closed this Apr 9, 2015
@yihui
Author

yihui commented Apr 9, 2015

Perfect. Thanks Jeroen!
