Import Tree from dask-awkward if not in dask #164

Open: wants to merge 5 commits into main

martindurant (Collaborator) commented Jan 27, 2025

Following the removal of DataFrameTreeReduction from upstream dask, it was copied into dask-awkward, which will make dask-awkward a required dependency in the future.
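A minimal sketch of the fallback import the title describes, written as a generic helper (the name `import_first` is hypothetical) that tries `(module, attribute)` pairs in order and returns the first one that imports. The module paths in the commented usage are assumptions for illustration: `dask.layers` is assumed to be where upstream dask kept the class, and the dask-awkward location is a guess, not taken from this PR.

```python
import importlib
from typing import Any, List, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from the (module, name) pairs, in order.

    Raises ImportError listing every failed candidate if none succeeds.
    """
    errors: List[str] = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:  # also catches ModuleNotFoundError
            errors.append(f"{module_name}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{module_name} has no attribute {attr!r}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Hypothetical usage: prefer upstream dask, fall back to the copy in dask-awkward.
# Both module paths are assumptions for illustration, not taken from this PR.
# DataFrameTreeReduction = import_first([
#     ("dask.layers", "DataFrameTreeReduction"),
#     ("dask_awkward.layers", "DataFrameTreeReduction"),
# ])
```

Trying upstream first means nothing changes for users whose dask still ships the class; the fallback only kicks in once it is gone.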

cc @lgray

martindurant (Collaborator Author)

This appears to fail on some DataFrame-specific functions:

FAILED tests/test_boost.py::test_histogramdd_series - NotImplementedError: The legacy implementation is no longer supported
FAILED tests/test_boost.py::test_histogramdd_arrays_and_series - NotImplementedError: The legacy implementation is no longer supported
FAILED tests/test_boost.py::test_histogramdd_dataframe - NotImplementedError: The legacy implementation is no longer supported
FAILED tests/test_core.py::test_df_input[True] - NotImplementedError: The legacy implementation is no longer supported
FAILED tests/test_core.py::test_df_input[None] - NotImplementedError: The legacy implementation is no longer supported

I didn't even realise we supported DataFrame/Series input. I suppose we should drop support for now, as with dak.to_dataframe?

martindurant (Collaborator Author) commented Jan 27, 2025

I haven't yet figured out why dask thinks we are trying to use the pre-expr (legacy) implementation; there must be a config setting somewhere.
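One place such a setting can hide is the environment. Assuming the legacy path is selected through dask's `dataframe.query-planning` key, note that dask reads `DASK_`-prefixed environment variables into its config, with `__` separating nesting levels, so `DASK_DATAFRAME__QUERY_PLANNING` sets `dataframe.query-planning`. The sketch below (the helper name `dask_env_overrides` is hypothetical) lists such overrides without importing dask; with dask installed, `dask.config.get("dataframe.query-planning", None)` reports the effective value, including anything set in YAML config files.

```python
import os
from typing import Dict, Mapping, Optional


def dask_env_overrides(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Translate DASK_* environment variables into dotted dask config keys.

    Follows dask's convention: strip the DASK_ prefix, lowercase, and treat
    "__" as a nesting separator. Dask considers "_" and "-" equivalent in key
    names; this sketch normalizes to "-". Values are kept as raw strings
    (dask itself would parse them into Python literals).
    """
    if environ is None:
        environ = os.environ
    overrides: Dict[str, str] = {}
    for name, value in environ.items():
        if not name.startswith("DASK_"):
            continue
        parts = name[len("DASK_"):].lower().split("__")
        key = ".".join(part.replace("_", "-") for part in parts)
        overrides[key] = value
    return overrides


if __name__ == "__main__":
    # Look for an environment override of the query-planning setting.
    print(dask_env_overrides().get("dataframe.query-planning", "<not set in environment>"))
```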

Locally, I am still seeing some failures, but the following, which doesn't involve dask at all, is mystifying me:

>>> import numpy as np
>>> import boost_histogram as bh
>>> x = np.random.standard_normal(size=(3_000,))
>>> h2 = bh.Histogram(bh.axis.Regular(10, -3, 3))
>>> h2.fill(x)
>>> h2.to_numpy(dd=True, flow=True)
(array([  5.,  17.,  79., 246., 456., 711., 666., 487., 241.,  70.,  20.,
          2.]),
 [array([1.79769313e+308, 1.79769313e+308, 1.79769313e+308, 1.79769313e+308,
         1.79769313e+308, 1.79769313e+308, 1.79769313e+308, 1.79769313e+308,
         1.79769313e+308, 1.79769313e+308, 1.79769313e+308, 1.79769313e+308,
         1.79769313e+308])])

Why are the edges messed up? The values look right. Is there some copying behaviour in NumPy 2 I should care about?

This only happens with flow=True.
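A quick sanity check of what the edges should look like, as a sketch: for `bh.axis.Regular(10, -3, 3)` the eleven inner edges should be evenly spaced from -3 to 3, and with `flow=True` the returned array carries one extra edge on each side (13 edges in total, matching the 12 counts above), so the whole array should be strictly increasing. The helper name `regular_edges_look_sane` is hypothetical, and the check deliberately does not assume which sentinel values boost-histogram uses for the outer edges.

```python
import math
from typing import Sequence


def regular_edges_look_sane(
    edges: Sequence[float], bins: int, start: float, stop: float, flow: bool = True
) -> bool:
    """Check edges returned for a Regular(bins, start, stop) axis.

    With flow=True, one extra (under/overflow) edge is assumed on each side;
    only the inner bins + 1 edges are compared to the uniform spacing, and
    the full array must be strictly increasing.
    """
    values = [float(e) for e in edges]
    inner = values[1:-1] if flow else values
    if len(inner) != bins + 1:
        return False
    width = (stop - start) / bins
    expected = [start + i * width for i in range(bins + 1)]
    if not all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12) for a, b in zip(inner, expected)):
        return False
    return all(a < b for a, b in zip(values, values[1:]))
```

Applied to the output above, `regular_edges_look_sane(h2.to_numpy(dd=True, flow=True)[1][0], 10, -3, 3)` would return False, since every edge is 1.79769313e+308.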

pfackeldey commented Jan 29, 2025

I cannot reproduce this boost-histogram (v1.5.0) issue with either numpy v2.0.0 or numpy v2.2.2 (the latest). Maybe @henryiii has seen this behavior before?

martindurant (Collaborator Author)

Updating boost-histogram to v1.5.0 fixed this.
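If others hit the same edges problem, a small guard like the sketch below can report whether the installed boost-histogram is at least v1.5.0, the version that resolved it here. The helper names are hypothetical, and the version parsing is deliberately simplistic (it ignores pre-release suffixes); `packaging.version.Version` is the more robust choice when that package is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Parse leading numeric components: '1.5.0' -> (1, 5, 0), '1.5.0rc1' -> (1, 5, 0)."""
    parts: List[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at a suffix such as '0rc1' or '0+local'
    return tuple(parts)


def has_at_least(dist_name: str, minimum: str) -> bool:
    """True if dist_name is installed with a version >= minimum."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    print("boost-histogram >= 1.5.0:", has_at_least("boost-histogram", "1.5.0"))
```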
