Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tree function to print nested structure list-like objects #56

Merged
merged 65 commits into from
May 17, 2021

Conversation

nstrayer
Copy link
Collaborator

@nstrayer nstrayer commented Mar 30, 2021

This PR adds the function tree() (and its helper function tree_label().

The purpose of tree is to replace str() for complex nested structures. The original inspiration was HTML tag lists with children and various other valuable information hanging off of them.

Examples

Printing simple lists

library(tree)

plain_list <- list(
  list(id = "a",
       val = 2),
  list(id = "b",
       val = 1,
       children = list(
         list(id = "b1",
              val = 2.5),
         list(id = "b2",
              val = 8,
              children = list(
                list(id = "b21",
                     val = 4)
              )))))
tree(plain_list)
#> 
#> [list]
#> ├─1:{list}
#> │ ├─id:"a"
#> │ └─val:2
#> └─2:{list}
#>   ├─id:"b"
#>   ├─val:1
#>   └─children:[list]
#>     ├─1:{list}
#>     │ ├─id:"b1"
#>     │ └─val:2.5
#>     └─2:{list}
#>       ├─id:"b2"
#>       ├─val:8
#>       └─children:[list]
#>         └─1:{list}
#>           ├─id:"b21"
#>           └─val:4

Output depth can be limited to avoid information overload

tree(plain_list, max_depth = 2)
#> 
#> [list]
#> ├─1:{list}
#> │ ├─id:"a"
#> │ └─val:2
#> └─2:{list}
#>   ├─id:"b"
#>   ├─val:1
#>   └─children:[list]
#>     ├─1:{list}
#>     └─2:{list}

You can further simplify output by avoiding printing indices for id-less
elements

tree(plain_list, index_arraylike = FALSE)
#> 
#> [list]
#> ├─{list}
#> │ ├─id:"a"
#> │ └─val:2
#> └─{list}
#>   ├─id:"b"
#>   ├─val:1
#>   └─children:[list]
#>     ├─{list}
#>     │ ├─id:"b1"
#>     │ └─val:2.5
#>     └─{list}
#>       ├─id:"b2"
#>       ├─val:8
#>       └─children:[list]
#>         └─{list}
#>           ├─id:"b21"
#>           └─val:4

Prints atomic vectors inline (up to 10 elements)

list_w_vectors <- list(
  name = "vectored list",
  num_vec = 1:10,
  char_vec = letters
)
tree(list_w_vectors)
#> 
#> {list}
#> ├─name:"vectored list"
#> ├─num_vec:1,2,3,4,5,6,7,8,9,10
#> └─char_vec:"a","b","c","d","e","f","g","h","i","j",...(n = 26)

Attempts to work with any list-like object.

For instance: html tag structures

tree(shiny::sliderInput("test", "Input Label", 0, 1, 0.5))
#> 
#> {shiny.tag}
#> ├─name:"div"
#> ├─attribs:{list}
#> │ └─class:"form-group shiny-input-container"
#> └─children:[list]
#>   ├─1:{shiny.tag}
#>   │ ├─name:"label"
#>   │ ├─attribs:{list}
#>   │ │ ├─class:"control-label"
#>   │ │ ├─id:"test-label"
#>   │ │ └─for:"test"
#>   │ └─children:[list]
#>   │   └─1:"Input Label"
#>   └─2:{shiny.tag}
#>     ├─name:"input"
#>     ├─attribs:{list}
#>     │ ├─class:"js-range-slider"
#>     │ ├─id:"test"
#>     │ ├─data-skin:"shiny"
#>     │ ├─data-min:"0"
#>     │ ├─data-max:"1"
#>     │ ├─data-from:"0.5"
#>     │ ├─data-step:"0.01"
#>     │ ├─data-grid:"true"
#>     │ ├─data-grid-num:10
#>     │ ├─data-grid-snap:"false"
#>     │ ├─data-prettify-separator:","
#>     │ ├─data-prettify-enabled:"true"
#>     │ ├─data-keyboard:"true"
#>     │ └─data-data-type:"number"
#>     └─children:[list]

Attributes can be shown (but arent by default)

list_w_attrs <- structure(
  list(
    structure(
      list(id = "a",
           val = 2),
      level = 2,
      name = "first child"
    ),
    structure(
      list(id = "b",
           val = 1,
           children = list(
             list(id = "b1",
                  val = 2.5),
             list(id = "b2",
                  val = 8,
                  children = list(
                    list(id = "b21",
                         val = 4)
                  )))),
      level = 2,
      name = "second child",
      class = "custom-class"
    )),
  level = "1",
  name = "root"
)
tree(list_w_attrs, show_attributes = TRUE)
#> 
#> [list]
#> ├─1:{list}
#> │ ├─id:"a"
#> │ ├─val:2
#> │ ├┄<attr>names:"id","val"
#> │ ├┄<attr>level:2
#> │ └┄<attr>name:"first child"
#> ├─2:{custom-class}
#> ┊ ├─id:"b"
#> ┊ ├─val:1
#> ┊ ├─children:[list]
#> ┊ ┊ ├─1:{list}
#> ┊ ┊ │ ├─id:"b1"
#> ┊ ┊ │ ├─val:2.5
#> ┊ ┊ │ └┄<attr>names:"id","val"
#> ┊ ┊ └─2:{list}
#> ┊ ┊   ├─id:"b2"
#> ┊ ┊   ├─val:8
#> ┊ ┊   ├─children:[list]
#> ┊ ┊   ┊ └─1:{list}
#> ┊ ┊   ┊   ├─id:"b21"
#> ┊ ┊   ┊   ├─val:4
#> ┊ ┊   ┊   └┄<attr>names:"id","val"
#> ┊ ┊   └┄<attr>names:"id","val","children"
#> ┊ ├┄<attr>names:"id","val","children"
#> ┊ ├┄<attr>level:2
#> ┊ ├┄<attr>name:"second child"
#> ┊ └┄<attr>class:"custom-class"
#> ├┄<attr>level:"1"
#> └┄<attr>name:"root"

You can customize the tree appearance by swapping out unicode characters

Find characters at
unicode-table.com.

tree(plain_list,
     char_vertical = "\u2551",
     char_horizontal = "\u2550",
     char_branch = "\u2560",
     char_final_branch = "\u255A")
#> 
#> [list]
#> ╠═1:{list}
#> ║ ╠═id:"a"
#> ║ ╚═val:2
#> ╚═2:{list}
#>   ╠═id:"b"
#>   ╠═val:1
#>   ╚═children:[list]
#>     ╠═1:{list}
#>     ║ ╠═id:"b1"
#>     ║ ╚═val:2.5
#>     ╚═2:{list}
#>       ╠═id:"b2"
#>       ╠═val:8
#>       ╚═children:[list]
#>         ╚═1:{list}
#>           ╠═id:"b21"
#>           ╚═val:4

You can also pass custom style functions for class and value printing
that transform text. E.g. using crayon to style.

library(crayon)
tree(plain_list, 
     class_printer = white$bgBlue,
     val_printer = underline$red)

Screenshot of output is consoles with styling support.

Screenshot of output in consoles with styling support.

Limitations/Drawbacks

Built entirely outside of lobstr at first:

  • Since I assumed no dependencies there very well may be some rlang niceties that could make the code much more maintainable.
  • No shared code with the already existing shared tree printing code that is used in functions like ast()

Testing

  • Most tests while building function were snapshot tests which are not compatible with the current testhat version for lobstr.
  • Many of the non-snapshot tests rely on the lack of some results which is a bit confusing to reason about and make sure that the tests themselves don't break. Not sure how to fix this.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comment; will be easier to respond with more details once you've changed the tests.

R/tree.R Outdated Show resolved Hide resolved
tests/testthat/test-tree.R Outdated Show resolved Hide resolved
# )
# })

test_that("Max depth can be enforced", {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pretty much all of these should be converted to snapshot tests, with the object included in the snapshot code. That'll make it much easier for me to review the precise display decisions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! For ones where I reuse the same object a few times for testing different options I just made the snapshot an expression that declared the object as a variable and passed it twice.
E.g.

  expect_snapshot({
    deep_list <- list(
      list(
        id = "b",
        val = 1,
        children = list(
          list(id = "b1",val = 2.5),
          list(
            id = "b2",
            val = 8,
            children = list(
              list(id = "b21", val = 4)
            )
          )
        )
      ),
      list(id = "a", val = 2)
    )
    tree(deep_list, max_depth = 1)
    tree(deep_list, max_depth = 2)
    tree(deep_list, max_depth = 3)

    tree(deep_list, max_length = 0)
    tree(deep_list, max_length = 2)
    tree(deep_list, max_depth = 1, max_length = 4)
  })

Works fine but didn't know if that was frowned upon.

R/tree.R Outdated Show resolved Hide resolved
R/tree.R Show resolved Hide resolved
R/tree.R Outdated Show resolved Hide resolved
R/tree.R Outdated Show resolved Hide resolved
R/tree.R Outdated Show resolved Hide resolved
R/tree.R Outdated
# Environments also tend to have the same trouble as functions. For instance
# the srcobject attached to a function's attributes is an environment but
# doesn't report as one to s3.
tree_label.environment(x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But tree_label.environment just calls format.default? I think it would be better to at least optionally recurse into environments.

Copy link
Collaborator Author

@nstrayer nstrayer Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I had put it off because it felt hard due to not knowing when to stop recusing. I think I have a good heuristic for stopping now:

...
if (is_environment(x)) {
  # If we're looking at an environment, we add its parent as child
  parent <- env_parent(x)

  already_seen_parent <- any(as.logical(lapply(counter_env$envs_seen, identical, parent)))
  # Stop recursion into environments when we get to either the calling
  # environment, the global environment, or an empty environment
  if (
    !(
      already_seen_parent |
      identical(parent, rlang::caller_env()) |
      identical(parent, rlang::empty_env()) |
      identical(parent, rlang::global_env())
    )
  ) {
    counter_env$envs_seen <- c(counter_env$envs_seen, parent)
    children <- c(children, parent = env_parent(x))
  }
}
...

E.g.:

library(rlang)
ea <- env(d = 4, e = 5)
lobstr::tree(env(ea, a = 1, b = 2, c = 3))
#> <environment: 0x7f91ba1a4ab8>
#> ├─a: 1
#> ├─b: 2
#> ├─c: 3
#> └─parent: <environment: 0x7f91aa4d42e0>
#>   ├─d: 4
#>   └─e: 5

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to show the parent -- for example, if you show the tree for shiny:::.globals (which is an environment), it'll show a bunch of environments that aren't relevant. I also think that if you do show parents, stopping at the global env or caller env can be misleading to users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add a bit more: I think it's very uncommon when using environments to care what's in the parent env. That only really is relevant in closures. In the vast majority of cases, when an environment is used as a data structure, the parent env is not relevant, since $ and [[ indexing won't look in the parent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think you need to recurse into the parent either.

Once you ignore the parent, you're less likely to encounter other special environments, but I'd suggest following the same heuristics as obj_size(): https://github.com/r-lib/lobstr/blob/master/src/size.cpp#L157-L159

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursing into parent was removed in ce4128a, also matches the forbidden environments from obj_size().

#'
#' @export
tree_label <- function(x, val_printer, class_printer, remove_newlines){
UseMethod("tree_label")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, it's not clear to me that you actually want to use S3 dispatch at all here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation for using S3 dispatch was to enable external developers to hook up custom printing functions for their classes. For instance maybe Shiny.tag wants to print as <h1/> etc..

Without that motivation, it does feel like creating unnecessary namespace exports.

It also feels like if the goal is truly to most accurately represent the structure then we wouldn't want custom printing for classes. However, if we wanted to have the function serve as a base for different tree-printing applications then maybe we should either keep the S3 dispatch or come up with another customization interface.

tests/testthat/_snaps/tree.md Outdated Show resolved Hide resolved
tests/testthat/_snaps/tree.md Outdated Show resolved Hide resolved
{list}
├─name:"vectored list"
├─num_vec:1,2,3,4,5,6,7,8,9,10
└─char_vec:"a","b","c","d","e","f","g","h","i","j",...(n = 26)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this was char_vec[26]? i.e. always put the length on the left hand side.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if each individual string is very long?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up changing the vector printing to wrap in square brackets and then (if truncated) put the length at the end. Putting the length at the start felt like it interfered with ease-of-reading and meant that we needed to put it for every vector, which if they are short feels excessive.

<list>
├─name: "vectored list"
├─num_vec: (n:10) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
└─char_vec: (n:26) "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ...

vs

<list>
├─name: "vectored list"
├─num_vec: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
└─char_vec: ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ...] n:26

I'm not deadset in this viewpoint though.

Copy link
Member

@hadley hadley Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using something like pillar::obj_sum():

├─name<chr [1]>: "vectored list"
├─num_vec<int [10]>: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
└─char_vec<chr [26]>: "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ...

I'd prefer not to use [] just because that'd be introducing a new convention which we haven't used elsewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I definitely like that better.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with one exception. I feel like adding the same <type [length]> prefix for all the scalar atomics is a bit busier than necessary so I default to printing those as before; with an option to enable more verbose behavior.

Default keeps scalar atomics nice and clean (type's are still inferable with printing behavior)

lobstr::tree(
  list(name = "vectored list", num_vec = 1:10)
)
#> <list>
#> ├─name: "vectored list"
#> └─num_vec<int [10]>: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

But the more verbose (and accurate to underyling structure) printing is available with an argument

lobstr::tree(
  list(name = "vectored list", num_vec = 1:10), 
  hide_scalar_types = FALSE
)
#> <list>
#> ├─name<chr [1]>: "vectored list"
#> └─num_vec<int [10]>: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Do you think this is a fair compromise between terseness and conveying info about the element?

tests/testthat/_snaps/tree.md Outdated Show resolved Hide resolved
tests/testthat/_snaps/tree.md Outdated Show resolved Hide resolved
tests/testthat/_snaps/tree.md Outdated Show resolved Hide resolved
tests/testthat/_snaps/tree.md Show resolved Hide resolved
tests/testthat/_snaps/tree.md Outdated Show resolved Hide resolved
{list}
├─normal string:"first element"
├─really long string:"abcdefghijklmnopqrstuvwxyzabcdef..."
└─vec of long strings:"a long
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this show \n insteadd?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using a unicode symbol to show where newlines existed instead of just wiping them out.

    {list}
     ├─normal string: "first element"
     ├─really long string: "abcdefghijklmnopqrstuvwxyzabcdef..."
     └─vec of long strings: ["a long↵and m...", "a fine length", "another long..."]

I figured it's less horizontal space taken up and a bit easier to parse on a glance. That being said I can see why we may want to simply use \n instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, interesting. Again, I don't have any concerns except that we haven't used it before elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is another example where you want to balance the length on individual elements with the number of elements displayed)

@hadley
Copy link
Member

hadley commented Apr 6, 2021

Fixes #2. Should we call it rst()?

@wch
Copy link
Member

wch commented Apr 6, 2021

Does rst stand for anything or is it just an anagram of str?

R/tree.R Outdated
Comment on lines 141 to 147
already_seen <- rlang::is_environment(x) &&
any(vapply(counter_env$envs_seen, identical, x, FUN.VALUE = NA))

if (!already_seen && rlang::is_environment(x)) {
# If this environment is new, add it to the seen
counter_env$envs_seen[[length(counter_env$envs_seen) + 1]] <- x
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clearer to have:

  already_seen <- FALSE
  if (is_environment(x)) {
	already_seen <- any(vapply(counter_env$envs_seen, identical, x, FUN.VALUE = NA))
    if (!already_seen) {
      counter_env$envs_seen[[length(counter_env$envs_seen) + 1]] <- x
    }
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for clarity FUN.VALUE = logical(1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've recently been using NA/NA_character_/NA_integer_ etc., because the literal value has less overhead than a function call. The drawback is that for logicals, it's NA instead of NA_logical_, so it's not as obvious what the type is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hadley
Copy link
Member

hadley commented Apr 13, 2021

@nstrayer if you fix the build failures, I think this is in a good place to merge, and we can continue iterating in future PRs.

…ist-like objects passing through as.list unscathed
R/tree.R Outdated
if (rlang::is_environment(x)) {
# Environments are funky as they dont have names before conversion to list
# but do after, so let them handle their conversion
return (as.list(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might need to be as.list.environment(), because if the environment has a class (like with R6 objects), as.list() will not dispatch to as.list.environment().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to consider whether or not to set all.names = TRUE

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had no idea the default was FALSE for that. Updated to TRUE. Since the purpose of tree is to show the true underlying structure of objects it seems only natural we would expose dot prefixed children.

R/tree.R Outdated
@@ -268,6 +268,22 @@ is_printable_env <- function(x) {
)
}

tree_as_list <- function(x){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of objects can this take as input? I'm asking because the removal of all attributes (other than names) may have unexpected results. For example:

x <- matrix(1:6, 2, 3)
attributes(x)
#> $dim
#> [1] 2 3

x <- data.frame(x=1:3, y=4:6)
attributes(x)
#> $names
#> [1] "x" "y"
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#> [1] 1 2 3

Copy link
Member

@hadley hadley May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stripping attributes seems like the right approach to me. The tree is fundamentally about the underlying data not the lies that S3 might be telling you about it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only used to make sure we can properly iterate over the children. We already pull off attributes before doing this if they are requested with show_attributes = TRUE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about what kind of objects this function can take as input? That'll help future readers to understand what should be happening in this function.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I realized that I only called this function once. (I had wrapped attributes(x) in it, but since attributes() always returns a list I didn't need to there.) So I moved the function inline and streamlined it a bit.

...
    # Coerce current object to a plain list. This is necessary as some s3
    # classes override `[[` and return funky stuff like themselves (see s3 class
    # "package_version")
    children <- if (is_printable_env(x)) {
      # Environments are funky as they don't have names before conversion to list
      # but do after, so let them handle their conversion.
      # We use all.names = TRUE in an effort to fully explain the object
      as.list.environment(x, all.names = TRUE)
    } else {
      # By wiping all attributes except for the names we force the object to be
      # a plain list. This is inspired by the (now depreciated) rlang::as_list().
      attributes(x) <- list(names = names(x))
      as.list(x)
    }
...

@nstrayer
Copy link
Collaborator Author

All checks are passing -- finally! So baring name changes this should be good to go.

@nstrayer nstrayer merged commit 91ff8b5 into r-lib:master May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants