Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip computing size of child objects #631

Merged
merged 1 commit into from
Nov 22, 2024
Merged

Skip computing size of child objects #631

merged 1 commit into from
Nov 22, 2024

Conversation

dfalbel
Copy link
Contributor

@dfalbel dfalbel commented Nov 15, 2024

This speeds up expanding objects with a lot of self-references for which it's very slow to compute the size of.
Relates to posit-dev/positron#4636
Built on top of #629

@dfalbel dfalbel requested a review from lionel- November 15, 2024 17:16
Copy link
Contributor

@lionel- lionel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the performance problem because we were repeatedly computing the size for all nested objects?

I guess ideally we'd start with the leaves and compute sizes of intermediary nodes from the size of their components, this way the work we're doing is proportional to the size of the tree?

The apporach taken here seems like a reasonable workaround.

},
}
}

/**
* Create a new Variable from an R object
*/
fn from(access_key: String, display_name: String, x: SEXP) -> Self {
fn from(access_key: String, display_name: String, x: SEXP, compute_size: bool) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an API standpoint I think a builder-style approach would be cleaner. For instance adding a compute_size() method that you'd call after new() or from(). By default size would be 0.

Especially since computing the size is the less common case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I'll make this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the problem is that PositronVariable doesn't hold any reference to the RObject, so we can't compute_size() later. We can do something like

var.size = object.size()

at the call site, what do you think of this approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep sounds good

@dfalbel
Copy link
Contributor Author

dfalbel commented Nov 19, 2024

I didn't do a detailed benchmark but, when we expand the children, we return a list of PositronVariables, one for each child node. In the case of the big model in posit-dev/positron#4636 many of the child nodes seem to include references to the data.frame like eg, the formula, and call objects. This causes the each child node to trigger a large computation for the size.

Since we can't really tell which child object ultimatelly owns the actual data.frame, we would also probably sum(size(child) for child in children) much larger then the object size.

Eg if we try to size each element of a list built like:

x <- list(
  x1 = matrix(1, ncol = 10000, nrow = 10000)
)
for (l in letters) {
  x[[l]] <- x$x1
}
x

Anyway, skipping computing the size of child objects make expanding the model run ins ~300ms while if we compute the size, the total time is at around ~1300ms

Base automatically changed from bugfix/honor-max-display-entries to main November 21, 2024 13:49
@dfalbel dfalbel merged commit e4630a5 into main Nov 22, 2024
6 checks passed
@dfalbel dfalbel deleted the children-size branch November 22, 2024 12:03
@github-actions github-actions bot locked and limited conversation to collaborators Nov 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants