-
-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize handling of wide datasets and Pandas indexes #1350
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1350 +/- ##
==========================================
+ Coverage 87.39% 88.87% +1.47%
==========================================
Files 50 51 +1
Lines 7490 7479 -11
==========================================
+ Hits 6546 6647 +101
+ Misses 944 832 -112 ☔ View full report in Codecov by Sentry. |
Relevant to many things, but especially #1160 |
issue noted in meeting: "When printing the ndoverlay, the vdim is not a value label.. it's the label of the last element in the ndoverlay" |
With holoviz/holoviews#6262 this should now behave well. |
I confirm this is fixed, i.e. printing import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
df = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
value_label = 'value'
group_label = 'Variable'
charts = []
for column in df.columns:
kdims = ['index']
vdims = hv.Dimension(column, label=value_label)
chart = hv.Curve(df, kdims, vdims)
charts.append((column, chart))
out = hv.NdOverlay(charts, group_label, sort=False)
out now displays:
|
The test suite ran fine (https://github.com/holoviz/hvplot/actions/runs/9448341262) with the latest dev version of HoloViews that includes holoviz/holoviews#6262. |
hvplot/converter.py
Outdated
chart = element(data, kdims, vdims).redim(**{c: self.value_label}) | ||
ydim = hv.Dimension(c, label=self.value_label) | ||
kdims, vdims = self._get_dimensions([x], [ydim]) | ||
chart = element(data, kdims, vdims) | ||
charts.append((c, chart.relabel(**self._relabel).redim(**self._redim))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe let's also skip relabel
and redim
if self._relabel
and self._redim
are empty respectively.
ready to merge? |
OP updated with a description of the changes made #1350 (comment). This is ready for review. |
Great work, all looked good to me and thanks for the new tests! Let's test this as much as possible, would love to see a new alpha release for that purpose. |
By avoiding calling
redim
on each element of anNdOverlay
which meant that every element had its own copy of the dataEdit: This PR bundles a few optimizations:
.redim
is no longer called the elements of anNdOverlay
so they can share the same data. This is demonstrated with the example below, thedata
wrapped by each element of theNdOverlay
is now the same object.326ee64: HoloViews added in Implement support for retaining Pandas index holoviews#6061 support for retaining
Pandas
indexes, which means that hvPlot no longer needs to callreset_index
in a few cases, avoiding creating internal copies of the data. This change was made in 326ee64, by skipping somereset_index
calls when the type of the dataset is a Pandas DataFrame (so far the Pandas Interface in HoloViews is the only one to have this support, so not Dask, not Geopandas, etc. which still require thereset_index
calls). The preceding commits add various tests to make sure all thereset_index
calls are at least called once in the unit tests. This commit adbddf7 updates the instantiation of aDataset
object with an explicit list of kdims instead letting HoloViews infer them, as HoloViews when given a Pandas DataFrame doesn't include its indexes in the list of automatically inferred kdims.abb485a: Only redim and/or relabel a HoloViews element when needed, as suggested in Optimize handling of wide datasets and Pandas indexes #1350 (comment) since apparently these operations return a clone even when not given any argument.
I hope these optimizations will help/fix this issue #501.
edit: closes #1292