Skip computing size of child objects #631

dfalbel · 2024-11-15T15:40:44Z

This speeds up expanding objects that contain many self-references, whose sizes are very slow to compute.
Relates to posit-dev/positron#4636
Built on top of #629

lionel-

Was the performance problem because we were repeatedly computing the size for all nested objects?

I guess ideally we'd start with the leaves and compute the sizes of intermediate nodes from the sizes of their components; that way the work we do is proportional to the size of the tree?

The approach taken here seems like a reasonable workaround.
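A minimal Rust sketch of the bottom-up idea, assuming a hypothetical arena of `Node`s where `own_bytes` is the memory a node holds directly and the nodes form a strict tree (no sharing, no cycles). Each subtree total is computed once from its children's totals in a single post-order pass, so the work is proportional to the number of nodes and edges rather than repeated per nested object. Real R objects can share components, and in that case plain summation counts shared data more than once.

```rust
/// Hypothetical node type: `own_bytes` is memory held directly by this node,
/// `children` are indices into the same arena.
struct Node {
    own_bytes: u64,
    children: Vec<usize>,
}

/// Computes the total size of every subtree in one iterative post-order pass.
/// Assumes the nodes form a tree reachable from `root`.
fn subtree_sizes(nodes: &[Node], root: usize) -> Vec<u64> {
    let mut totals = vec![0u64; nodes.len()];
    // Stack of (node index, children already pushed?) pairs.
    let mut stack = vec![(root, false)];
    while let Some((i, expanded)) = stack.pop() {
        if expanded {
            // All children have been finished, so their totals are final.
            let child_total: u64 = nodes[i].children.iter().map(|&c| totals[c]).sum();
            totals[i] = nodes[i].own_bytes + child_total;
        } else {
            // Revisit this node after its children.
            stack.push((i, true));
            for &c in &nodes[i].children {
                stack.push((c, false));
            }
        }
    }
    totals
}

fn main() {
    // root(10) -> a(5) -> leaf(100); root -> b(1)
    let nodes = vec![
        Node { own_bytes: 10, children: vec![1, 3] }, // 0: root
        Node { own_bytes: 5, children: vec![2] },     // 1: a
        Node { own_bytes: 100, children: vec![] },    // 2: leaf
        Node { own_bytes: 1, children: vec![] },      // 3: b
    ];
    println!("{:?}", subtree_sizes(&nodes, 0)); // [116, 105, 100, 1]
}
```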

lionel- · 2024-11-19T13:14:04Z

crates/ark/src/variables/variable.rs

            },
        }
    }

    /**
     * Create a new Variable from an R object
     */
-    fn from(access_key: String, display_name: String, x: SEXP) -> Self {
+    fn from(access_key: String, display_name: String, x: SEXP, compute_size: bool) -> Self {


From an API standpoint, I think a builder-style approach would be cleaner, for instance a compute_size() method that you'd call after new() or from(). By default, size would be 0.

Especially since computing the size is the less common case.

Makes sense! I'll make this change.

Actually, the problem is that PositronVariable doesn't hold any reference to the RObject, so we can't call compute_size() later. We could do something like

var.size = object.size()

at the call site. What do you think of this approach?

yep sounds good
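A simplified Rust sketch of the agreed "compute the size at the call site" pattern. `RObj`, `Variable`, `list_globals`, and `list_children` are hypothetical stand-ins, not ark's actual types: the constructor no longer takes a `compute_size` flag and leaves `size` at 0, and only callers that still hold the object and actually need the size assign it.

```rust
/// Hypothetical stand-in for an R object; `size()` models an expensive traversal.
struct RObj {
    name: String,
    bytes: u64,
}

impl RObj {
    fn size(&self) -> u64 {
        self.bytes
    }
}

/// Hypothetical stand-in for the variable sent to the front end.
/// Like PositronVariable, it keeps no reference to the object it describes.
#[derive(Debug)]
struct Variable {
    display_name: String,
    size: u64, // 0 means "not computed"
}

impl Variable {
    /// No `compute_size` flag: the size defaults to 0.
    fn from(obj: &RObj) -> Self {
        Variable { display_name: obj.name.clone(), size: 0 }
    }
}

/// Top-level listing: the caller opts in to the size computation.
fn list_globals(objs: &[RObj]) -> Vec<Variable> {
    objs.iter()
        .map(|obj| {
            let mut var = Variable::from(obj);
            var.size = obj.size(); // computed at the call site
            var
        })
        .collect()
}

/// Expanding children: sizes are skipped entirely.
fn list_children(children: &[RObj]) -> Vec<Variable> {
    children.iter().map(|obj| Variable::from(obj)).collect()
}

fn main() {
    let objs = vec![RObj { name: "model".to_string(), bytes: 1_000 }];
    println!("{:?}", list_globals(&objs));
    println!("{:?}", list_children(&objs));
}
```

Keeping the expensive call out of the constructor means the cheap path is the default, which matches the observation that computing the size is the less common case.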

dfalbel · 2024-11-19T17:47:47Z

I didn't do a detailed benchmark, but when we expand the children we return a list of PositronVariables, one per child node. In the case of the big model in posit-dev/positron#4636, many of the child nodes seem to include references to the data.frame, e.g. the formula and call objects. This causes each child node to trigger an expensive size computation.

Since we can't really tell which child object ultimately owns the actual data.frame, sum(size(child) for child in children) would probably also be much larger than the size of the object itself.

E.g., if we try to size each element of a list built like this:

x <- list(
  x1 = matrix(1, ncol = 10000, nrow = 10000)
)
for (l in letters) {
  x[[l]] <- x$x1
}
x
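A minimal Rust model of why per-child sizes over-count here. It assumes R's copy-on-modify semantics: `x[[l]] <- x$x1` does not copy the matrix, so `x1` and the 26 letter entries (27 elements) all refer to one 10000 x 10000 double matrix, about 800 MB of data. `Payload`, `sum_of_child_sizes`, and `deduplicated_size` are hypothetical; `Rc` stands in for shared R vectors, and pointer identity is used to count each distinct payload once.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical stand-in for an R vector's data.
struct Payload {
    bytes: u64,
}

/// Sizing each child in isolation: every child that references the
/// payload is charged for all of it.
fn sum_of_child_sizes(children: &[Rc<Payload>]) -> u64 {
    children.iter().map(|p| p.bytes).sum()
}

/// Sizing the parent as a whole: each distinct payload (by address) counts once.
fn deduplicated_size(children: &[Rc<Payload>]) -> u64 {
    let mut seen = HashSet::new();
    children
        .iter()
        .filter(|p| seen.insert(Rc::as_ptr(*p)))
        .map(|p| p.bytes)
        .sum()
}

fn main() {
    // 10_000 * 10_000 doubles * 8 bytes each.
    let matrix = Rc::new(Payload { bytes: 10_000 * 10_000 * 8 });
    // x1 plus one entry per letter, all sharing the same matrix.
    let children: Vec<Rc<Payload>> = (0..27).map(|_| Rc::clone(&matrix)).collect();
    println!("{}", sum_of_child_sizes(&children)); // 21600000000
    println!("{}", deduplicated_size(&children)); // 800000000
}
```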

Anyway, skipping the size computation for child objects makes expanding the model take ~300 ms, compared with ~1300 ms when the sizes are computed.

dfalbel requested a review from lionel- November 15, 2024 17:16

lionel- approved these changes Nov 19, 2024


Base automatically changed from bugfix/honor-max-display-entries to main November 21, 2024 13:49

dfalbel force-pushed the children-size branch from 15b4c4f to a064f87 November 21, 2024 13:52

Compute size at the call site.

4122b8f

dfalbel force-pushed the children-size branch from a064f87 to 4122b8f November 21, 2024 14:24

dfalbel requested a review from lionel- November 21, 2024 15:26

lionel- approved these changes Nov 22, 2024


dfalbel merged commit e4630a5 into main Nov 22, 2024
6 checks passed

dfalbel deleted the children-size branch November 22, 2024 12:03

github-actions bot locked and limited conversation to collaborators Nov 22, 2024
