Better scrolling for large tabular/text output #2049
I can't reproduce. Please provide a link to an example notebook.
@gnestor Did you try my example code? Maybe you just have a faster computer than me?
Yes, I did. Initially, it was throttled by the default data rate limit (which will ship with 5.0). After overriding it, I was able to fetch the data. It was slow to respond to user events at first (because it's a lot of data and a lot of DOM nodes), but then I was able to change focus with no issue (I tried the …). There isn't much we can do to resolve the notebook being slow when a ton of data is rendered, except for implementing some form of infinite scrolling that can paginate through data and, ideally, fetch new data from the kernel instead of keeping it all in memory in the browser.
@gnestor Yes, this makes sense. Although it might be a lot of work, I think this is definitely a feature we should have; I've renamed the issue accordingly.
@Carreau @takluyver Has there been any discussion around infinite scrolling for output areas? I'm not sure how much (if any) performance improvement we would get without kernel interaction (e.g. fetching new pages of data from the kernel). @themrmax If kernel interaction is required, then this feature would be implemented not in notebook but rather in ipywidgets or another "infinite scrolling output area" extension that can interact with IPython and, ideally, other kernels. This could be a good experiment for a kernel-connected mimerender extension.
Yes, and more generally about the notebook itself. Another question would be how to "recycle" things that are high up on the page. In the end it would be easier if the model lived on the server side and we did lazy loading, but that's far away. Short of infinite scrolling, we could just collapse long output and ask "would you like to display everything?".
@gnestor I really like the idea of the JupyterLab extension; I think this is key to achieving feature parity with RStudio/Spyder etc. @Carreau Do you think it would make sense to use something like https://github.com/NeXTs/Clusterize.js? It looks like they are using the fixed-height divs trick you suggest, so maybe it would be a fairly lightweight integration? EDIT: I'm going to try and update …
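The fixed-height divs trick mentioned above boils down to simple index arithmetic: render only the rows that intersect the viewport, and pad above and below with spacer elements so the scrollbar geometry stays correct. A minimal Python sketch of that arithmetic (function names and the `overscan` parameter are my own, not from Clusterize.js):

```python
def visible_range(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return (start, stop) row indices to render for a fixed-row-height list.

    `overscan` renders a few extra rows on each side to reduce flicker
    while scrolling. All heights are in pixels.
    """
    start = max(0, scroll_top // row_height - overscan)
    stop = min(total_rows, (scroll_top + viewport_height) // row_height + 1 + overscan)
    return start, stop


def spacer_heights(start, stop, row_height, total_rows):
    """Heights of the top and bottom filler elements that replace unrendered rows."""
    return start * row_height, (total_rows - stop) * row_height
```

With a 400px viewport and 20px rows, only ~20–30 DOM nodes exist at any time, no matter how many rows the dataset has.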
We are having a related discussion in JupyterLab: jupyterlab/jupyterlab#1587
@blink1073 Thanks! @themrmax I started implementing react-virtualized in jupyterlab_table! It will be good to compare the performance of both 👍
@themrmax Here is an initial implementation using react-virtualized's table: https://github.com/gnestor/jupyterlab_table/tree/react-virtualized I also have an example using its grid, which allows for horizontal scrolling but actually uses 2 different grids (one for the header) and attempts to keep their x scroll positions in sync (not very well). They're both pretty performant but could be optimized.
Nice! |
Excellent! |
@gnestor Very nice! I've gotten basically the same thing working with Clusterize (I haven't done the headers, but I think I would need a trick similar to yours): https://github.com/themrmax/jupyterlab_table/tree/clusterize A big problem I can see with both of our solutions is that when the data gets large (i.e. over a few thousand rows), it takes a very long time to load the component. As far as I can tell it's not a problem with the frameworks; the demo on https://clusterize.js.org/ instantly loads 500K rows if they're generated by JavaScript. Is it a limitation of us loading the data into the browser as a single JSON payload, and is there a way we could work around this? EDIT: Just noticed the …
@themrmax Yes. There is observable latency for 10,000+ cells for me. There is no latency when rendering as a pandas HTML table, partly because the pandas HTML is rendered kernel-side. A few thoughts:

```json
{
  "resources": [],
  "metadata": {
    "startIndex": 0,
    "stopIndex": 1000,
    "loadMoreRows": "df[startIndex:stopIndex]"
  }
}
```

Assuming that the extension can communicate with the kernel (which I know is possible but don't know how to implement), the extension could parse this …
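A minimal sketch of the kernel-side handler that such a `loadMoreRows` request might hit. The function name and the response shape are my own illustration, not an existing Jupyter API, and a plain list stands in for the pandas DataFrame so the sketch stays dependency-free (with a DataFrame the slice would be `df[start_index:stop_index]`):

```python
def load_more_rows(data, start_index, stop_index):
    """Serve one page of rows for a hypothetical loadMoreRows request.

    Only the requested slice crosses the kernel/browser boundary; the
    full dataset stays in kernel memory.
    """
    return {
        "startIndex": start_index,
        "stopIndex": min(stop_index, len(data)),
        "rows": data[start_index:stop_index],
    }
```

The frontend would issue one such request per page as the user scrolls, keeping only a window of rows in browser memory at a time.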
I'd certainly like to see it. I'm hopeful that we can make some simplified mechanics with the …
Sometimes I would like to view a large dataset inside the output of a cell. If the cell contains a lot of data, it's slow to change focus to/from the cell. For example, if I run a cell with the following, once it's completed it takes around 2 seconds to focus in or out of the cell (tested on Chrome and Firefox on a MacBook Air). This is especially frustrating when I'm trying to quickly navigate around the notebook with "J" and "K".
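The code sample referenced here did not survive the page scrape. A hypothetical cell of my own that reproduces the described symptom — one cell whose text output is a couple hundred thousand lines, which the notebook renders into the DOM all at once:

```python
# Hypothetical reproduction (the original snippet was lost in the scrape):
# a single cell emitting ~200,000 lines of text output, enough to make
# focusing in and out of the cell noticeably sluggish.
big_output = "\n".join(str(i) for i in range(200_000))
print(big_output)
```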
A related issue: if I try to display an equivalently large HTML table (i.e. the display output from a pandas dataframe), the whole browser window seizes up. Maybe this is a limitation of using HTML to display tables; however, I think we need a way to browse/view large tabular datasets to avoid the need for an external dependency on something like Excel.
EDIT: Just realized that if I toggle the output of the cell, there is no problem navigating across the cell, so maybe it's not such a big problem.