
[BUG] in polars-gpu, group_by(maintain_order=True) is not ordered #16893

Closed
KazukiOnodera opened this issue Sep 24, 2024 · 0 comments · Fixed by #16907
Labels
bug (Something isn't working), cudf.polars (Issues specific to cudf.polars)


@KazukiOnodera

Steps/Code to reproduce bug

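```python
import numpy as np
import polars as pl

df = pl.DataFrame(
    {
        "random": np.random.rand(30_000),
        "groups": np.random.randint(100, size=30_000),
    }
)
df = df.lazy()

# GPU engine (cudf-polars): the groups do not come back in input order
print(df.group_by("groups", maintain_order=True).agg(pl.col("random").sum()).collect(engine="gpu"))

# Default CPU engine: groups appear in the order their key first occurs
print(df.group_by("groups", maintain_order=True).agg(pl.col("random").sum()).collect())
```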


Expected behavior
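The result should be the same as with the CPU engine: with maintain_order=True, groups appear in the order in which their key first occurs in the input.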
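For reference, the expected semantics of maintain_order=True for a sum aggregation can be written in plain Python. This is a minimal sketch: `groupby_sum_ordered` is a hypothetical helper, not a polars API, and it relies on Python dicts preserving insertion order, so each group lands where its key first appears.

```python
def groupby_sum_ordered(keys, values):
    """Reference for group_by(key, maintain_order=True).agg(sum).

    Hypothetical helper, not part of polars: groups are emitted in the order
    their key first occurs in the input (dicts preserve insertion order).
    """
    sums = {}
    for k, v in zip(keys, values):
        sums[k] = sums.get(k, 0.0) + v
    return list(sums.keys()), list(sums.values())


keys = [3, 1, 3, 2, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
group_keys, group_sums = groupby_sum_ordered(keys, values)
print(group_keys)  # [3, 1, 2]
print(group_sums)  # [4.0, 7.0, 4.0]
```

Against the reproducer, one could compare `result["groups"].to_list()` from each engine with the key list returned here. Comparing keys rather than sums avoids spurious mismatches caused by floating-point summation order.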

Environment details

  • cudf-cu12-24.8.3
  • cudf-polars-cu12-24.8.3
  • cupy-cuda12x-13.3.0
  • polars-1.8.1
  • rmm-cu12-24.8.2
KazukiOnodera added the bug label Sep 24, 2024
wence- added the cudf.polars label Sep 24, 2024
wence- added a commit to wence-/cudf that referenced this issue Sep 25, 2024
When we are requested to maintain order in groupby aggregations, we
must post-process the result by computing a permutation between the
wanted order (of the input keys) and the order returned by the groupby
aggregation. To do this, we can perform a join between the two unique
key tables. Previously, we assumed that the gather map returned by
this join for the left (wanted order) table was the identity. However,
this is not guaranteed: in addition to computing the match between the
wanted key order and the key order we have, we must also apply the
permutation between the left gather map order and the identity.

- Closes rapidsai#16893
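The post-processing step described in this commit message can be sketched in plain Python. This is a hypothetical illustration, not cuDF's implementation: `hash_join_gather_maps` stands in for the GPU join and, like many hash joins, emits matches in probe (right-table) order, so the left gather map is generally not the identity. The buggy variant gathers with the right map alone; the fixed variant scatters it into the positions given by the left map.

```python
def first_appearance_keys(keys):
    """Unique keys in the order they first occur in the input (the wanted order)."""
    return list(dict.fromkeys(keys))


def hash_join_gather_maps(left_keys, right_keys):
    """Inner join on unique keys, returning (left_map, right_map).

    Hypothetical stand-in for a GPU join: builds on the left and probes with
    the right, so pairs come out in right-table order and left_map need not
    be the identity.
    """
    build = {k: i for i, k in enumerate(left_keys)}
    left_map, right_map = [], []
    for j, k in enumerate(right_keys):
        if k in build:
            left_map.append(build[k])
            right_map.append(j)
    return left_map, right_map


def restore_order_buggy(wanted_keys, result_keys, result_values):
    # Assumes left_map is the identity and simply gathers rows with right_map.
    _, right_map = hash_join_gather_maps(wanted_keys, result_keys)
    return ([result_keys[j] for j in right_map],
            [result_values[j] for j in right_map])


def restore_order(wanted_keys, result_keys, result_values):
    # Also applies the permutation from left_map order to the identity:
    # result row right_map[i] belongs at wanted position left_map[i].
    # Both tables hold the same unique keys, so every slot is filled once.
    left_map, right_map = hash_join_gather_maps(wanted_keys, result_keys)
    order = [0] * len(wanted_keys)
    for i, j in zip(left_map, right_map):
        order[i] = j
    return ([result_keys[j] for j in order],
            [result_values[j] for j in order])


input_keys = [3, 1, 3, 2, 1]
wanted = first_appearance_keys(input_keys)  # [3, 1, 2]
# Simulated groupby output in some other order (here sorted by key).
out_keys, out_sums = [1, 2, 3], [7.0, 4.0, 4.0]

print(restore_order_buggy(wanted, out_keys, out_sums))  # ([1, 2, 3], [7.0, 4.0, 4.0])
print(restore_order(wanted, out_keys, out_sums))        # ([3, 1, 2], [4.0, 7.0, 4.0])
```

Scattering by the left map is equivalent to sorting the (left, right) index pairs by their left index and then gathering with the right indices; the buggy version only happens to work when the join already returns pairs in left-table order.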
GPUtester moved this from Todo to In Progress in cuDF Python Sep 25, 2024
wence- self-assigned this Sep 26, 2024
wence- added a commit to wence-/cudf that referenced this issue Sep 27, 2024
rapids-bot closed this as completed in 2b6408b Sep 30, 2024
github-project-automation moved this from In Progress to Done in cuDF Python Sep 30, 2024