-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tree
function to print nested structure list-like objects
#56
Conversation
…a warning not an error
…eaned up to avoid breaking tree vertical continuity
…ls arent awkwardly on the same line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few initial comment; will be easier to respond with more details once you've changed the tests.
tests/testthat/test-tree.R
Outdated
# ) | ||
# }) | ||
|
||
test_that("Max depth can be enforced", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think pretty much all of these should be converted to snapshot tests, with the object included in the snapshot code. That'll make it much easier for me to review the precise display decisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! For ones where I reuse the same object a few times for testing different options I just made the snapshot an expression that declared the object as a variable and passed it twice.
E.g.
expect_snapshot({
deep_list <- list(
list(
id = "b",
val = 1,
children = list(
list(id = "b1",val = 2.5),
list(
id = "b2",
val = 8,
children = list(
list(id = "b21", val = 4)
)
)
)
),
list(id = "a", val = 2)
)
tree(deep_list, max_depth = 1)
tree(deep_list, max_depth = 2)
tree(deep_list, max_depth = 3)
tree(deep_list, max_length = 0)
tree(deep_list, max_length = 2)
tree(deep_list, max_depth = 1, max_length = 4)
})
Works fine but didn't know if that was frowned upon.
R/tree.R
Outdated
# Environments also tend to have the same trouble as functions. For instance | ||
# the srcobject attached to a function's attributes is an environment but | ||
# doesn't report as one to s3. | ||
tree_label.environment(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But tree_label.environment
just calls format.default
? I think it would be better to at least optionally recurse into environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I had put it off because it felt hard due to not knowing when to stop recusing. I think I have a good heuristic for stopping now:
...
if (is_environment(x)) {
# If we're looking at an environment, we add its parent as child
parent <- env_parent(x)
already_seen_parent <- any(as.logical(lapply(counter_env$envs_seen, identical, parent)))
# Stop recursion into environments when we get to either the calling
# environment, the global environment, or an empty environment
if (
!(
already_seen_parent |
identical(parent, rlang::caller_env()) |
identical(parent, rlang::empty_env()) |
identical(parent, rlang::global_env())
)
) {
counter_env$envs_seen <- c(counter_env$envs_seen, parent)
children <- c(children, parent = env_parent(x))
}
}
...
E.g.:
library(rlang)
ea <- env(d = 4, e = 5)
lobstr::tree(env(ea, a = 1, b = 2, c = 3))
#> <environment: 0x7f91ba1a4ab8>
#> ├─a: 1
#> ├─b: 2
#> ├─c: 3
#> └─parent: <environment: 0x7f91aa4d42e0>
#> ├─d: 4
#> └─e: 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense to show the parent -- for example, if you show the tree for shiny:::.globals
(which is an environment), it'll show a bunch of environments that aren't relevant. I also think that if you do show parents, stopping at the global env or caller env can be misleading to users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add a bit more: I think it's very uncommon when using environments to care what's in the parent env. That only really is relevant in closures. In the vast majority of cases, when an environment is used as a data structure, the parent env is not relevant, since $
and [[
indexing won't look in the parent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think you need to recurse into the parent either.
Once you ignore the parent, you're less likely to encounter other special environments, but I'd suggest following the same heuristics as obj_size()
: https://github.com/r-lib/lobstr/blob/master/src/size.cpp#L157-L159
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recursing into parent was removed in ce4128a, also matches the forbidden environments from obj_size()
.
#' | ||
#' @export | ||
tree_label <- function(x, val_printer, class_printer, remove_newlines){ | ||
UseMethod("tree_label") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, it's not clear to me that you actually want to use S3 dispatch at all here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation for using S3 dispatch was to enable external developers to hook up custom printing functions for their classes. For instance maybe Shiny.tag
wants to print as <h1/>
etc..
Without that motivation, it does feel like creating unnecessary namespace exports.
It also feels like if the goal is truly to most accurately represent the structure then we wouldn't want custom printing for classes. However, if we wanted to have the function serve as a base for different tree-printing applications then maybe we should either keep the S3 dispatch or come up with another customization interface.
tests/testthat/_snaps/tree.md
Outdated
{list} | ||
├─name:"vectored list" | ||
├─num_vec:1,2,3,4,5,6,7,8,9,10 | ||
└─char_vec:"a","b","c","d","e","f","g","h","i","j",...(n = 26) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if this was char_vec[26]
? i.e. always put the length on the left hand side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if each individual string is very long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up changing the vector printing to wrap in square brackets and then (if truncated) put the length at the end. Putting the length at the start felt like it interfered with ease-of-reading and meant that we needed to put it for every vector, which if they are short feels excessive.
<list>
├─name: "vectored list"
├─num_vec: (n:10) 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
└─char_vec: (n:26) "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ...
vs
<list>
├─name: "vectored list"
├─num_vec: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
└─char_vec: ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ...] n:26
I'm not deadset in this viewpoint though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using something like pillar::obj_sum()
:
├─name<chr [1]>: "vectored list"
├─num_vec<int [10]>: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
└─char_vec<chr [26]>: "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", ...
I'd prefer not to use []
just because that'd be introducing a new convention which we haven't used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I definitely like that better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with one exception. I feel like adding the same <type [length]>
prefix for all the scalar atomics is a bit busier than necessary so I default to printing those as before; with an option to enable more verbose behavior.
Default keeps scalar atomics nice and clean (type's are still inferable with printing behavior)
lobstr::tree(
list(name = "vectored list", num_vec = 1:10)
)
#> <list>
#> ├─name: "vectored list"
#> └─num_vec<int [10]>: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
But the more verbose (and accurate to underyling structure) printing is available with an argument
lobstr::tree(
list(name = "vectored list", num_vec = 1:10),
hide_scalar_types = FALSE
)
#> <list>
#> ├─name<chr [1]>: "vectored list"
#> └─num_vec<int [10]>: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Do you think this is a fair compromise between terseness and conveying info about the element?
tests/testthat/_snaps/tree.md
Outdated
{list} | ||
├─normal string:"first element" | ||
├─really long string:"abcdefghijklmnopqrstuvwxyzabcdef..." | ||
└─vec of long strings:"a long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this show \n
insteadd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up using a unicode symbol to show where newlines existed instead of just wiping them out.
{list}
├─normal string: "first element"
├─really long string: "abcdefghijklmnopqrstuvwxyzabcdef..."
└─vec of long strings: ["a long↵and m...", "a fine length", "another long..."]
I figured it's less horizontal space taken up and a bit easier to parse on a glance. That being said I can see why we may want to simply use \n
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm, interesting. Again, I don't have any concerns except that we haven't used it before elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is another example where you want to balance the length on individual elements with the number of elements displayed)
Fixes #2. Should we call it |
Does |
…element not just those in array-like structures
… els> at the end for truncated vectors
…t to snapshot test
…an ellipsis to show more exists
R/tree.R
Outdated
already_seen <- rlang::is_environment(x) && | ||
any(vapply(counter_env$envs_seen, identical, x, FUN.VALUE = NA)) | ||
|
||
if (!already_seen && rlang::is_environment(x)) { | ||
# If this environment is new, add it to the seen | ||
counter_env$envs_seen[[length(counter_env$envs_seen) + 1]] <- x | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be clearer to have:
already_seen <- FALSE
if (is_environment(x)) {
already_seen <- any(vapply(counter_env$envs_seen, identical, x, FUN.VALUE = NA))
if (!already_seen) {
counter_env$envs_seen[[length(counter_env$envs_seen) + 1]] <- x
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for clarity FUN.VALUE = logical(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've recently been using NA
/NA_character_
/NA_integer_
etc., because the literal value has less overhead than a function call. The drawback is that for logicals, it's NA
instead of NA_logical_
, so it's not as obvious what the type is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…on into themselves
@nstrayer if you fix the build failures, I think this is in a good place to merge, and we can continue iterating in future PRs. |
…ist-like objects passing through as.list unscathed
R/tree.R
Outdated
if (rlang::is_environment(x)) { | ||
# Environments are funky as they dont have names before conversion to list | ||
# but do after, so let them handle their conversion | ||
return (as.list(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might need to be as.list.environment()
, because if the environment has a class (like with R6 objects), as.list()
will not dispatch to as.list.environment()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also need to consider whether or not to set all.names = TRUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had no idea the default was FALSE
for that. Updated to TRUE
. Since the purpose of tree
is to show the true underlying structure of objects it seems only natural we would expose dot prefixed children.
R/tree.R
Outdated
@@ -268,6 +268,22 @@ is_printable_env <- function(x) { | |||
) | |||
} | |||
|
|||
tree_as_list <- function(x){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of objects can this take as input? I'm asking because the removal of all attributes (other than names
) may have unexpected results. For example:
x <- matrix(1:6, 2, 3)
attributes(x)
#> $dim
#> [1] 2 3
x <- data.frame(x=1:3, y=4:6)
attributes(x)
#> $names
#> [1] "x" "y"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stripping attributes seems like the right approach to me. The tree is fundamentally about the underlying data not the lies that S3 might be telling you about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only used to make sure we can properly iterate over the children. We already pull off attributes before doing this if they are requested with show_attributes = TRUE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment about what kind of objects this function can take as input? That'll help future readers to understand what should be happening in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I realized that I only called this function once. (I had wrapped attributes(x)
in it, but since attributes()
always returns a list I didn't need to there.) So I moved the function inline and streamlined it a bit.
...
# Coerce current object to a plain list. This is necessary as some s3
# classes override `[[` and return funky stuff like themselves (see s3 class
# "package_version")
children <- if (is_printable_env(x)) {
# Environments are funky as they don't have names before conversion to list
# but do after, so let them handle their conversion.
# We use all.names = TRUE in an effort to fully explain the object
as.list.environment(x, all.names = TRUE)
} else {
# By wiping all attributes except for the names we force the object to be
# a plain list. This is inspired by the (now depreciated) rlang::as_list().
attributes(x) <- list(names = names(x))
as.list(x)
}
...
… more usefull comments
… would not show proper branch structure
Different versions of R have different arguments for strArgs so just define a function so we know the args wont change
…acter support breaking snapshots
All checks are passing -- finally! So baring name changes this should be good to go. |
This PR adds the function
tree()
(and its helper functiontree_label()
.The purpose of
tree
is to replacestr()
for complex nested structures. The original inspiration was HTML tag lists with children and various other valuable information hanging off of them.Examples
Printing simple lists
Output depth can be limited to avoid information overload
You can further simplify output by avoiding printing indices for id-less
elements
Prints atomic vectors inline (up to 10 elements)
Attempts to work with any list-like object.
For instance: html tag structures
Attributes can be shown (but arent by default)
You can customize the tree appearance by swapping out unicode characters
Find characters at
unicode-table.com.
You can also pass custom style functions for class and value printing
that transform text. E.g. using
crayon
to style.Screenshot of output in consoles with styling support.
Limitations/Drawbacks
Built entirely outside of
lobstr
at first:rlang
niceties that could make the code much more maintainable.ast()
Testing
testhat
version forlobstr
.