A better str for simple objects #2

Open
hadley opened this issue Mar 10, 2016 · 11 comments
Labels
feature a feature request or enhancement

Comments

@hadley
Member

hadley commented Mar 10, 2016

Starting to noodle on this idea because I think it will be useful for the data structures chapter in R4DS

# Atomic vectors ----------------------------------------------------------

# Compactly displays type and length
somethingstr(1:100)
#> int[100]
somethingstr(letters)
#> chr[26]

# Also displays attributes. 
# Class gets special handling
somethingstr(factor(letters))
#> int[26] <factor>
#> @ levels: chr[26]

# (I imagine this being used once you've taught basic data structures
# and purrr, so it's useful to see the details, instead of the helpful
# lies that str() tells you.)

somethingstr(Sys.time())
#> dbl[1] <POSIXct, POSIXt>
#> @ tzone: chr[1]  

# Lists -------------------------------------------------------------------

# Shows hierarchy
x <- list(
  list(
    1, 
    2
  ),
  list(
    3,
    4
  )
)
somethingstr(x)
#> list[2]
#> - 1: list[2]
#>    - 1: dbl[1]
#>    - 2: dbl[1]
#> - 2: list[2]
#>    - 1: dbl[1]
#>    - 2: dbl[1]

# Very long lists are truncated
x <- replicate(100, list(runif(5)))
somethingstr(x)

#> list[100]
#> - 1: dbl[5]
#> - 2: dbl[5]
#> - 3: dbl[5]
#> - 4: dbl[5]
#> - 5: dbl[5]
#> ...

# So are very deep lists
x <- list()
for (i in 1:100) x$x <- list(x)
somethingstr(x)

#> list[1]
#> 1. list[1]
#>    1. list[1]
#>       1. list[1]
#>          1. ...

# And length and depth interplay in some complicated way. Maybe the way
# to think about it is that you want to (say) print at most 100 lines.  
# How should you allocate those lines to best display the structure of
# the object? I don't think simple cut-offs for length vs. depth will
# work in general. 

# Think about something() on a data frame containing models etc.
# Maybe can assume unnamed lists are generally homogeneous?

# Names get special treatment
somethingstr(mtcars)
#> list[11] <data.frame>
#> $ mpg : dbl[32]
#> $ cyl : dbl[32]
#> $ disp: dbl[32]
#> $ hp  : dbl[32]
#> $ drat: dbl[32]
#> $ wt  : dbl[32]
#> $ qsec: dbl[32]
#> $ vs  : dbl[32]
#> $ am  : dbl[32]
#> $ gear: dbl[32]
#> $ carb: dbl[32]
#> @ row.names: chr[32]

# Very long names get truncated
x <- list(this_is_a_very_very_very_very_long_name = 1:10)
somethingstr(x)
#> list[1]
#> $ this_is_a_very_...: int[10]


# Environments ----------------------------------------------------------------

# Need some way to control recursion into environments. Probably don't
# want it on by default because there are too many objects that have 
# (possibly big) environments attached (e.g. formulas)
somethingstr(globalenv())
#> env[2] [R_GlobalEnv]

somethingstr(globalenv(), show_env = 1L)
#> env[2] [R_GlobalEnv]
#> $ df: list[1] <data.frame>
#>       $x: int[100]
#>       @row.names: int[1]
#> $ i:  int[10]
#> @parent.env: env[10] [tools:rstudio]

# show_env = 2L would also show the contents of parent.env.

# Functions ---------------------------------------------------------------

somethingstr(function(x = 1:10, y = x) {})
#> func[2] 
#>   $x: `1:10
#>   $y: `x
#> * env: env[4] [R_GlobalEnv]

cc @jennybc, @lionel-
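
For concreteness, here is a minimal sketch of the atomic-vector part of the mock-up above. The helper names (`type_abbr()`, `summary_line()`, `attr_lines()`) are hypothetical and only cover the "type[length] <class>" line and the "@ attribute" lines; nothing here is an actual implementation of somethingstr().

```r
# Hypothetical helper: abbreviate the underlying type the way the mock-ups do.
type_abbr <- function(x) {
  switch(typeof(x),
    integer = "int",
    double = "dbl",
    character = "chr",
    logical = "lgl",
    typeof(x) # fall back to the full type name, e.g. "list"
  )
}

# One-line summary: type[length], plus the class vector if the object has one.
summary_line <- function(x) {
  out <- sprintf("%s[%d]", type_abbr(x), length(x))
  cls <- attr(x, "class")
  if (!is.null(cls)) {
    out <- paste0(out, " <", paste(cls, collapse = ", "), ">")
  }
  out
}

# Remaining attributes (everything except class and names), summarised the same way.
attr_lines <- function(x) {
  attrs <- attributes(x)
  attrs <- attrs[setdiff(names(attrs), c("class", "names"))]
  vapply(
    names(attrs),
    function(nm) sprintf("@ %s: %s", nm, summary_line(attrs[[nm]])),
    character(1),
    USE.NAMES = FALSE
  )
}

summary_line(1:100)
#> [1] "int[100]"
summary_line(factor(letters))
#> [1] "int[26] <factor>"
attr_lines(factor(letters))
#> [1] "@ levels: chr[26]"
```

Because it reports `typeof()` rather than the class-based view, a factor shows up as an integer vector with a levels attribute, which is the "details instead of helpful lies" behaviour described above.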

@lionel-
Member

lionel- commented Mar 10, 2016

Possible short name: info()

Also displays attributes.
Class gets special handling

Really nice!

So are very deep lists [truncated]

It would be cool to have a more flexible control of the levels to display than str()'s max.level argument. For instance specify the depth from the bottom of the list, or with a range:

info(deep, 3)        # 3 first levels
info(deep, -3)       # 3 last levels
info(deep, c(5, 10)) # levels in the range [5, 10]

And length and depth interplay in some complicated way.

Right, it really depends what the user is looking for. Maybe reduce the amount of information displayed as we go down the hierarchy? Then we can use the level argument to get more complete info about deeper levels.

#> list[1]
#> 1. list[1]
#>    1. list[1]
#>       1. list[1]
#>          1. list ...(5-96)  # Some kind of hint about remaining depth?

Need some way to control recursion into environments.

Maybe treat first level objects specially. So info(env) gives the full info while info(list(env)) doesn't. If the user specifically wants to investigate the environment, she probably wants to have the details. Linked to the idea of reducing the amount of info as we go down the hierarchy.

@lionel-
Member

lionel- commented Mar 10, 2016

Ideas for displaying functions:

  • Display environment if it has a name (e.g. a namespace or the globalenv)
  • Display only parameter names, not default arguments.
  • Use 4 dots to truncate long parameter lists
l <- list(data, write.table, list)

str(l)
#> List of 3
#>  $ :function (..., list = character(), package = NULL, lib.loc = NULL, verbose = getOption("verbose"), 
#>     envir = .GlobalEnv)
#>  $ :function (x, file = "", append = FALSE, quote = TRUE, sep = " ", eol = "\n", 
#>     na = "NA", dec = ".", row.names = TRUE, col.names = TRUE, qmethod = c("escape", 
#>         "double"), fileEncoding = "")
#>  $ :function (...)
#>   ..- attr(*, "class")= chr [1:2] "namespace" "roclet"

info(l)
#> list[3]
#> - 1: function[utils]
#>      @args: ..., list, package, lib.loc, ....
#> - 2: function[utils]
#>      @args: file, append, quote, sep, eol, ....
#> - 3: function[base] <namespace, roclet>
#>      @args: ...
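
A rough sketch of how those three ideas might be computed with base R. `fn_summary()` and its `max_args` argument are hypothetical, and labelling primitives (which have no environment) as "base" is an assumption made for illustration.

```r
# Hypothetical helper following the three ideas above.
fn_summary <- function(f, max_args = 4) {
  # args() also works for most primitives, whose formals() would be NULL.
  arg_names <- names(formals(args(f)))
  if (length(arg_names) > max_args) {
    # Four dots mark truncation so it cannot be confused with a real `...`.
    arg_names <- c(arg_names[seq_len(max_args)], "....")
  }

  # Only show the environment when it has a name (namespace, global env, ...).
  env <- environment(f)
  env_name <- if (is.null(env)) "base" else environmentName(env)
  header <- if (nzchar(env_name)) sprintf("function[%s]", env_name) else "function"

  c(header, paste0("@args: ", paste(arg_names, collapse = ", ")))
}

fn_summary(write.table)
#> [1] "function[utils]"                     "@args: x, file, append, quote, ...."
fn_summary(list)
#> [1] "function[base]" "@args: ..."
```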

@jennybc
Member

jennybc commented Mar 11, 2016

Long live somethingstr()!

And length and depth interplay in some complicated way. Maybe the way
to think about it is that you want to (say) print at most 100 lines.
How should you allocate those lines to best display the structure of
the object? I don't think simple cut-offs for length vs. depth will
work in general.

This rings very true. max.level and especially list.len feel like very blunt instruments. I always thought I wanted list.len to be vectorized, but maybe you're right that that wouldn't really solve the problem.
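
To make the "blunt instruments" point concrete, a small illustration using the existing `max.level` and `list.len` arguments of `str()`. The example list is made up; the point is only that each argument is a single cap applied uniformly, so there is no way to say "show the first element in depth and the rest briefly".

```r
# A long list whose elements all share the same nested shape.
x <- lapply(1:100, function(i) list(a = runif(5), b = list(c = 1, d = "z")))

# list.len caps how many elements are shown within each level;
# max.level caps how deep str() descends. Neither adapts to the data.
str(x, max.level = 1, list.len = 3)
str(x, max.level = 2, list.len = 3)
```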

If you knew that the list had repeated structure, then you want to see detail on one element, presumably the first, and then just a note that there are 99 more things of a similar nature. But that ties back to the separate problem of detecting repeated structure. Maybe it would be good to record when you know a list has repeated structure by its very construction (i.e. df %>% group_by() %>% nest() ?%>% mutate()? or lst %>% map()). You get those for free! These sorts of lists remind me of short tandem repeats in a genome.
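
Detecting repeated structure after the fact could look something like the sketch below: compute a structural "shape" string for each element and check whether they all agree. `shape_of()` and `has_repeated_structure()` are hypothetical names, and comparing type, length, and names is just one possible definition of "similar nature".

```r
# Structural shape: type and length, recursing into lists and keeping names.
shape_of <- function(x) {
  if (is.list(x)) {
    parts <- vapply(x, shape_of, character(1))
    nms <- names(x)
    if (!is.null(nms)) parts <- paste0(nms, "=", parts)
    sprintf("list(%s)", paste(parts, collapse = ", "))
  } else {
    sprintf("%s[%d]", typeof(x), length(x))
  }
}

# A list has repeated structure when every element has the same shape.
has_repeated_structure <- function(x) {
  is.list(x) && length(unique(vapply(x, shape_of, character(1)))) <= 1L
}

has_repeated_structure(lapply(1:100, function(i) runif(5)))
#> [1] TRUE
has_repeated_structure(list(1, "a"))
#> [1] FALSE
```

A printer could use this to show the first element in detail followed by a note such as "99 more like this". Recording the fact at construction time, as suggested above, would avoid the cost of the full traversal.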

@hadley
Member Author

hadley commented Mar 11, 2016

@jennybc I think that circles back to the purrr issue. I think we need a "homogeneous" list class that just asserts that all the elements of the list are of the same type.
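
As a sketch of what such a class might look like (the constructor name `new_homog_list()` and the class name "homog_list" are hypothetical, and "same type" is taken to mean the same `typeof()`):

```r
# Hypothetical constructor: checks the claim once, then records it as a class.
new_homog_list <- function(x) {
  stopifnot(is.list(x))
  types <- unique(vapply(x, typeof, character(1)))
  if (length(types) > 1L) {
    stop("Elements have different types: ", paste(types, collapse = ", "))
  }
  structure(x, class = c("homog_list", "list"))
}

h <- new_homog_list(list(1.5, 2.5, 3.5))
inherits(h, "homog_list")
#> [1] TRUE
```

A display function could then test `inherits(x, "homog_list")` and summarise one element instead of inspecting all of them.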

@lionel- what do you imagine looking at the last 3 levels in a list would look like?

For a bit more context, I'm imagining that in the near future (< 12 months) RStudio will gain an interactive widget that lets you drill down iteratively into a deeply nested list. So this function doesn't need to solve every deeply nested navigation problem, it just needs to give a decent textual output.

@lionel-
Member

lionel- commented Mar 11, 2016

what do you imagine looking at the last 3 levels in a list would look like?

The function would figure out the depth of each branch, and only show the branches deep enough:

l <-
  list(
    list(
      list(
        list(
          4
        ),
        list(
          list(
            5
          )
        )
      )
    ),
    list(
      2
    )
  )

info(l, -3)
#> ---levels 3 to 5---
#> list[2]
#> - 1: list[1]
#>    - 1: dbl[1]
#> - 2: list[1]
#>    - 1: list[1]
#>       - 1: dbl[1]
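
The "figure out the depth of each branch" step can be sketched as follows. `list_depth()` is a hypothetical helper that counts nested lists, and the threshold of 3 mirrors the `-3` in the call above; only the top-level filtering is shown, not the printing.

```r
# Depth of a nested list: a non-list counts as 0, each enclosing list adds 1.
list_depth <- function(x) {
  if (!is.list(x)) return(0L)
  if (length(x) == 0L) return(1L)
  1L + max(vapply(x, list_depth, integer(1)))
}

l <- list(list(list(list(4), list(list(5)))), list(2))

# Depth of each top-level branch.
branch_depth <- vapply(l, list_depth, integer(1))
branch_depth
#> [1] 4 1

# Keep only the branches that are deep enough to have 3 levels to show.
deep_enough <- l[branch_depth >= 3L]
length(deep_enough)
#> [1] 1
```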

@hadley
Member Author

hadley commented Mar 21, 2016

Challenging list from @jennybc at https://gist.github.com/jennybc/12d75a88edf37cc996eb

@jennybc
Member

jennybc commented Mar 28, 2016

Another good example list: foo from foo <- test_dir("tests/testthat/"). It's (arguably?) a homogeneous list with a couple levels of nesting and simply playing with str(..., max.level = ?) doesn't produce great results.

@hadley
Member Author

hadley commented Apr 3, 2018

Two recent ideas:

  • We could call this function rts(), short for restrained tree structure (and also str() in reverse)

  • For unnamed lists we could display the first element recursively, the 2nd-5th elements in summary form, and then display "... and x more" for the rest. I think this might be a reasonable heuristic for long lists.
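
A minimal sketch of the long-list heuristic from the second bullet. `brief()` and `show_list()` are hypothetical names; for simplicity the sketch treats named and unnamed lists alike and uses full `typeof()` names rather than abbreviations.

```r
# Compact one-line summary used for elements shown "in summary form".
brief <- function(x) sprintf("%s[%d]", typeof(x), length(x))

# Hypothetical printer: element 1 in full (recursively), elements 2..n_brief
# as one-liners, then a count of what was left out.
show_list <- function(x, n_brief = 5, indent = "") {
  lines <- paste0(indent, brief(x))
  if (!is.list(x) || length(x) == 0L) return(lines)

  # First element: recurse, then relabel its header line with its index.
  first <- show_list(x[[1]], n_brief, paste0(indent, "   "))
  first[1] <- paste0(indent, "- 1: ", brief(x[[1]]))
  lines <- c(lines, first)

  # Elements 2..n_brief: summary only.
  rest <- seq_len(min(length(x), n_brief))[-1]
  for (i in rest) {
    lines <- c(lines, sprintf("%s- %d: %s", indent, i, brief(x[[i]])))
  }
  if (length(x) > n_brief) {
    lines <- c(lines, sprintf("%s... and %d more", indent, length(x) - n_brief))
  }
  lines
}

x <- lapply(1:100, function(i) list(runif(5), "a"))
writeLines(show_list(x))
#> list[100]
#> - 1: list[2]
#>    - 1: double[5]
#>    - 2: character[1]
#> - 2: list[2]
#> - 3: list[2]
#> - 4: list[2]
#> - 5: list[2]
#> ... and 95 more
```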

@hadley
Member Author

hadley commented Apr 24, 2018

Also need to think this through post-BigQuery insights - is a list-col an array or a record or a repeated record? I think we could have a method to impute the "type" of a list column, and then display arrays, records, and repeated records in different ways.
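
A very rough sketch of what imputing the "type" of a list column could mean. The function name `impute_list_type()` and all three rules are assumptions for illustration: "array" when every element is an atomic vector, "record" when every element is a named list, and "repeated record" when every element is a data frame or an unnamed list of named lists.

```r
# Hypothetical classifier; the three rules are assumptions made for illustration.
impute_list_type <- function(col) {
  stopifnot(is.list(col))

  is_record <- function(x) {
    is.list(x) && !is.null(names(x)) && !is.data.frame(x)
  }
  is_repeated <- function(x) {
    is.data.frame(x) ||
      (is.list(x) && is.null(names(x)) && length(x) > 0L &&
        all(vapply(x, is_record, logical(1))))
  }

  if (all(vapply(col, is.atomic, logical(1)))) {
    "array"
  } else if (all(vapply(col, is_record, logical(1)))) {
    "record"
  } else if (all(vapply(col, is_repeated, logical(1)))) {
    "repeated record"
  } else {
    "unknown"
  }
}

impute_list_type(list(1:3, 4:5))
#> [1] "array"
impute_list_type(list(list(a = 1, b = "x"), list(a = 2, b = "y")))
#> [1] "record"
impute_list_type(list(data.frame(a = 1:2), data.frame(a = 3L)))
#> [1] "repeated record"
```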

@hadley
Member Author

hadley commented May 30, 2018

This is particularly important for list-columns since there's no good way to see them currently.

@hadley hadley added the feature a feature request or enhancement label Dec 20, 2018
@wch
Member

wch commented Mar 4, 2021

Here's a screenshot of the printing of a nested list structure, from something I'm working on. Some of the ideas may be useful here:

[Screenshot: printed tree of a nested list structure]

Some notes about it:

  • Named lists are indicated with {}, and unnamed lists are indicated with []
  • The tree diagram makes it easier to see what's connected to what.
  • For both named and unnamed lists, it shows the names/indexes of children. This makes it easy to traverse nested objects to get to the object that you want. For example, it's easy to tell what x[[2]]$c[[3]] refers to.
  • For lists that have a small number of atomic children, it prints it all on one line.
  • Atomic types are printed in a different color.
  • The S3 class of the objects is in the braces/brackets (like {Block}). This is useful for my use case, but I'm not sure this makes a lot of sense in general.
  • One thing that may be confusing is that the named entries t and c sometimes appear on the same line (when they are both atomic), but if c is another list, it is displayed on a new branch going down.
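
The bracket convention and the index labels from the notes above can be sketched in a few lines. This is not the code behind the screenshot: `render()` is a hypothetical helper that only covers the one-line form, without the tree drawing, colours, or class labels.

```r
# Hypothetical one-line rendering: {} for named lists, [] for unnamed ones,
# with each child labelled by its name or index.
render <- function(x) {
  if (!is.list(x)) return(paste(format(x), collapse = " "))
  nms <- names(x)
  labels <- if (is.null(nms)) seq_along(x) else nms
  body <- paste0(labels, ": ", vapply(x, render, character(1)), collapse = ", ")
  if (is.null(nms)) paste0("[", body, "]") else paste0("{", body, "}")
}

x <- list(1, list(t = "a", c = list(10, 20, 30)))
render(x)
#> [1] "[1: 1, 2: {t: a, c: [1: 10, 2: 20, 3: 30]}]"
```

Reading the labels from the outside in gives the path to any element: here the value 30 is at `x[[2]]$c[[3]]`.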


4 participants