Datatree: Dynamically populate the HTML repr #9350

Open
flamingbear opened this issue Aug 13, 2024 · 2 comments
Labels
topic-DataTree Related to the implementation of a DataTree class

Comments

@flamingbear
Member

flamingbear commented Aug 13, 2024

What is your issue?

Originally posted by @TomNicholas in xarray-contrib/datatree#206

@andersy005, @jbusecke and I noticed that for big trees (hundreds or thousands of nodes) the HTML repr can become very slow to render, potentially locking up your Jupyter notebook.

We think that's because the HTML representing the whole tree is pre-rendered in one go, and then hidden by collapsing the sections by default. If your tree contains thousands of nodes, that's a lot of HTML to render.

@andersy005 suggested that perhaps the HTML repr should contain some kind of callback, so that the HTML for a node is only rendered when that node is opened.

I don't know if that's possible at all, or whether it would work for reprs rendered in non-interactive environments (such as in xarray's static docs pages).
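
To illustrate the cost difference, here is a minimal sketch that renders a tree only down to a cutoff depth and emits a placeholder for anything deeper. It is not xarray's implementation: the nested-dict tree, `render_node`, and `max_depth` are all hypothetical names made up for this example, and it does not address how the hidden nodes would be fetched later.

```python
from html import escape


def render_node(name, children, depth=0, max_depth=1):
    """Render a nested-dict tree as collapsed HTML, stopping at max_depth.

    Hypothetical helper: `children` maps child names to their own children.
    """
    summary = f"<summary>{escape(name)}</summary>"
    if depth >= max_depth and children:
        # Deeper levels are not rendered; only a count placeholder is emitted.
        return f"<details>{summary}({len(children)} children not rendered)</details>"
    inner = "".join(
        render_node(child, grandchildren, depth + 1, max_depth)
        for child, grandchildren in children.items()
    )
    return f"<details>{summary}{inner}</details>"


# 25 "files" with 20 "groups" each, as plain nested dicts.
tree = {f"file_{f}": {f"group_{g}": {} for g in range(20)} for f in range(25)}

shallow = render_node("root", tree, max_depth=1)  # root and files only
full = render_node("root", tree, max_depth=10)    # every node

print(len(shallow), len(full))
```

The shallow version produces one element per file, while the full version produces one per node, so the amount of HTML grows with the cutoff depth rather than with the size of the whole tree. A static cutoff like this would also work in non-interactive environments, at the price of never showing the deeper nodes.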

@flamingbear flamingbear added the topic-DataTree Related to the implementation of a DataTree class label Aug 13, 2024
@Illviljan
Contributor

Both the HTML repr and the normal repr struggle with large datatrees. The normal repr should probably be truncated in a similar fashion to the dataset repr:

"display_max_rows": 12,
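
As a rough sketch of what that kind of truncation could look like for a list of node or variable names: keep the first and last rows and replace the middle with an ellipsis. `truncate_rows` is a hypothetical helper written for this example, not a function from xarray, and the even split between top and bottom is an assumption.

```python
def truncate_rows(rows, max_rows=12):
    """Keep at most max_rows entries, replacing the middle with '...'."""
    if len(rows) <= max_rows:
        return list(rows)
    first = (max_rows + 1) // 2  # rows kept from the top
    last = max_rows // 2         # rows kept from the bottom
    # Slice from an explicit index so that last == 0 yields an empty tail.
    return list(rows[:first]) + ["..."] + list(rows[len(rows) - last:])


names = [f"group_{g}" for g in range(20)]
for line in truncate_rows(names, max_rows=12):
    print(line)
```

With 20 names and `max_rows=12`, this prints the first six names, a `...` line, and the last six names.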

@Illviljan
Contributor

Illviljan commented Sep 17, 2024

With this example, the HTML repr takes over 3 minutes, compared to 840 ms for the normal repr:

import numpy as np
import xarray as xr
from xarray.core.datatree import DataTree


def create_datatree(number_of_files, number_of_groups, number_of_variables):
    datasets = {}
    for f in range(number_of_files):
        for g in range(number_of_groups):
            # Create random data:
            time = np.linspace(0, 50 + f, 100 + g)
            y = f * time + g

            # Create dataset:
            ds = xr.Dataset(
                data_vars={
                    f"temperature_{g}{i}": ("time", y)
                    for i in range(number_of_variables // number_of_groups)
                },
                coords={"time": ("time", time)},
            ).chunk()

            # Prepare for Datatree:
            name = f"file_{f}/group_{g}"
            datasets[name] = ds
    dt = DataTree.from_dict(datasets)

    return dt


number_of_files = 25
number_of_groups = 20
number_of_variables = 2000

dt = create_datatree(number_of_files, number_of_groups, number_of_variables)

# %timeit dt._repr_html_()
# 3min 15s ± 4.37 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

# %timeit dt.__repr__()
# 840 ms ± 29.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


2 participants