Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when rendering/expanding large R object in Variables tab #3628

Closed
jmcphers opened this issue Jun 22, 2024 · 15 comments
Closed

Error when rendering/expanding large R object in Variables tab #3628

jmcphers opened this issue Jun 22, 2024 · 15 comments
Assignees
Labels
area: variables Issues related to Variables category. bug Something isn't working lang: r support

Comments

@jmcphers
Copy link
Collaborator

Discussed in https://github.com/posit-dev/positron-beta/discussions/220

Originally posted by mikemahoney218 June 20, 2024
Hi all,

When clicking to expand a large data frame in the Variables tab, I get the following error:
image

What happens next seems to vary; I've sometimes had this freeze the console so all input is ignored, and sometimes I've had the arrow in the Variables pane then turn down, even though it hasn't been expanded:
image

If the console freezes, I need to restart Positron entirely, not just the R session, to get it back.

@jmcphers jmcphers transferred this issue from another repository Jun 22, 2024
@jmcphers
Copy link
Collaborator Author

I can reproduce this by creating an enormous dataset -- try repeatedly doubling the rows and columns of the nycflights13::flights dataset until it is about the same size as Mike's dataset. It won't expand in the Variables pane any longer.

> foo <- nycflights13::flights
> foo <- rbind(foo, foo)
> foo <- rbind(foo, foo)
> foo <- rbind(foo, foo)
> foo <- rbind(foo, foo)
> foo <- cbind(foo, foo)
image

@jmcphers jmcphers changed the title Error when expanding large data frame in Variables tab Error when expanding large R data frame in Variables tab Jun 22, 2024
@juliasilge juliasilge added the area: variables Issues related to Variables category. label Jun 22, 2024
@petetronic petetronic added this to the Release Candidate milestone Jun 24, 2024
@jmcphers
Copy link
Collaborator Author

jmcphers commented Jul 3, 2024

Reported again in #3833.

@jmcphers
Copy link
Collaborator Author

Reported on Mastodon too: https://mastodon.online/@[email protected]/112966300026506969

@juliasilge
Copy link
Contributor

Reported again in #5270 with this example:

# Need to do this first
# devtools::install_github('satijalab/seurat-data')
# SeuratData::InstallData("pbmc3k")
library(Seurat)
library(SeuratData)

pbmc3k.final <- LoadData("pbmc3k", type = "pbmc3k.final")

print("hi")

Do str(pbmc3k.final) to see this enormous, complicated object that hangs the Variables pane.

@juliasilge juliasilge changed the title Error when expanding large R data frame in Variables tab Error when rendering/expanding large R object in Variables tab Nov 7, 2024
@juliasilge juliasilge added the bug Something isn't working label Nov 7, 2024
@const-ae
Copy link

const-ae commented Nov 11, 2024

Hi, I also just wanted to add that I ran into the same issue and suggest a maybe even easier way to reproduce the problem without downloading some extra files:

system.time({
  mat <- matrix(0, nrow = 1000, ncol = 1e5)
  sce <- SingleCellExperiment::SingleCellExperiment(list(mat = mat))
})
print("hi")

The creation of the SingleCellExperiment object takes 80ms on my computer, the console then freezes for 30 seconds before printing "hi".


And also just to add, when I run the same code in a jupyter console with the ark kernel, there is no additional delay

@jmcphers
Copy link
Collaborator Author

Reported again in #5333.

@testlabauto
Copy link
Contributor

Still able to repro with Jonathan's example:

> foo <- nycflights13::flights
> foo <- rbind(foo, foo)
> foo <- rbind(foo, foo)
> foo <- rbind(foo, foo)
> foo <- rbind(foo, foo)
> foo <- cbind(foo, foo)

with:
Positron Version: 2024.12.0 (Universal) build 66

@dfalbel
Copy link
Contributor

dfalbel commented Nov 25, 2024

The existing problem with flights is that we are very slowly formatting the POSIXct columns.
We can see that removing the POSIXct columns fixes the problem:

f <- foo[, !sapply(foo, \(x) inherits(x, "POSIXct"))]

Probably because of https://github.com/posit-dev/ark/blob/f3fffe3d41bf0b33a7e34097c05f3b685b6bfaea/crates/harp/src/vector/formatted_vector.rs#L122-L129

We'll need a better way to iterate over an object that doesn't require formatting the whole column.

@ZhimingYe

This comment has been minimized.

@dfalbel
Copy link
Contributor

dfalbel commented Nov 26, 2024

There are pros and cons of using str for displaying the info. The main feature we want to support and using str makes it harder is adding options to expand nested items. For instance, in RStudio you either expand the whole list or not, while in Positron you can selectively expand nested elements too.

Another problem with str is that there's no real guarantee that it won't do a lot of work too. Since it's an s3 generic, it could be that a class didn't nicely implement it and it takes a lot of time to execute. Dependning on the implementation it may also materialize ALTREP objects, etc too.

@ZhimingYe
Copy link

@dfalbel Thank you for the clarification! I understand and respect your decision. I truly appreciate all the effort and dedication you’ve put into ARK.

That said, some of RStudio’s approaches might be worth considering, such as skipping particularly large objects, setting rendering timeouts (like the in valueFromStr setted withTimeLimit , and interrupting operations when the timeout is exceeded. While these methods are relatively straightforward and not very refined, they greatly enhance the user experience.

rstudio/rstudio#47

https://github.com/rstudio/rstudio/blob/main/src/cpp/session/modules/SessionEnvironment.R

.rs.addFunction("valueFromStr", function(val)
{
  .rs.withTimeLimit(1, fail = "<truncated>", {
    capture.output(try(str(val), silent = TRUE))
  })
  
})

...

if (is.symbol(obj))
{
  val <- as.character(obj)
}
else if (is.language(obj))
{
  val <- .rs.describeCall(obj)
}
else if (!hasNullPtr)
{
  # for large objects (> half MB), don't try to get the value, just show
  # the size. Some functions (e.g. str()) can cause the object to be
  # copied, which is slow for large objects.
  if (size > 524288)
  {
    len_desc <- if (len > 1)
    {
      paste(len, " elements, ", sep = "")
    }
    else
    {
      ""
    }
    
    # data frames are likely to be large, but a summary is still helpful
    if (is.data.frame(obj))
    {
      val <- "NO_VALUE"
      desc <- .rs.valueDescription(obj)
    }
    else
    {
      fmt <- "Large %s (%s %s)"
      val <- sprintf(fmt, class, len_desc, format(size, units = "auto", standard = "SI"))
    }
    contents_deferred <- TRUE
  }
  else
  {
    val <- .rs.valueAsString(obj)
    desc <- .rs.valueDescription(obj)
    
    # expandable object--supply contents
    if (is.list(obj) ||  is.data.frame(obj) || isS4(obj) ||
        inherits(obj, c("data.table", "ore.frame", "cast_df", "xts", "DataFrame")))
    {
      if (computeSize)
      {
        # normal object
        contents <- .rs.valueContents(obj)
      }
      else
      {
        # don't prefetch content for altreps
        val <- "NO_VALUE"
        contents_deferred <- TRUE
      }
    }
  }
}

In most cases, users only need a rough preview (especially the data hierarchy). For more detailed exploration, they are more likely to subset the object to inspect specific items or use tools like Data Explorer, rather than being limited to the variable pane. :)

@dfalbel
Copy link
Contributor

dfalbel commented Nov 27, 2024

Thanks @ZhimingYe
We definitely plan to implement similar mechanisms in Positron. See e.g. #1419 (comment). We still don't have the entire infrastructure for that though.

In the meantime, the fixes we have pushed will probably solve most of the problems, as a lot of the performance cost was caused by bugs in the formatting routine, triggering unnecessarily large computations.

dfalbel added a commit that referenced this issue Dec 18, 2024
For:

[PR 655](posit-dev/ark#655): Limits the number
of bins for histograms to avoid crashes, addressing [issue
5744](#5744).
[PR 654](posit-dev/ark#654): Adds more logging
for Help proxy errors, addressing [issue
3543](#3543).
[PR 646](posit-dev/ark#646): Refactors
FormattedVector for faster formatting of S3 objects, improving
performance when expanding large data.frames, related to
[comment](#3628 (comment)).
@testlabauto
Copy link
Contributor

Verified Fixed

Positron Version(s) : 2025.01.0-124
OS Version          : OSX

Test scenario(s)

Verified with Jonathans repro steps. Looks great.

Link(s) to TestRail test cases run or created:

@pieterjanvc
Copy link

Hello,

Just letting you know that I started getting this error after installing the latest version on my system.

Error refreshing variables: RPC timed out after 5 seconds: {"jsonrpc":"2.0","method":"list"}

Error (-32603)
Positron Version: 2025.01.0 (system setup) build 152
Code - OSS Version: 1.95.0
Commit: 66aa3fb7f98eb8d19155cb7220856154f6ede8b3
Date: 2025-01-06T02:53:20.465Z
Electron: 32.2.1
Chromium: 128.0.6613.186
Node.js: 20.18.0
V8: 12.8.374.38-electron.0
OS: Windows_NT x64 10.0.26100

I did not see this error before with the previous version

@juliasilge
Copy link
Contributor

This new error you are seeing @pieterjanvc is #5813 and the fix will be included in the next release. Thanks for reporting!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 24, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area: variables Issues related to Variables category. bug Something isn't working lang: r support
Projects
None yet
Development

No branches or pull requests

8 participants