[FIX]: MGPropertyGraph renumber_vertices_by_type fails when type is unspecified ('') #3058

Closed
2 tasks done
Tracked by #12 ...
alexbarghi-nv opened this issue Dec 7, 2022 · 0 comments · Fixed by rapidsai/cudf#12988 or #3352

@alexbarghi-nv
Member

Version

23.02

Which installation method(s) does this occur on?

Docker, Conda, Pip, Source

Describe the bug.

Calling add_vertex_data without specifying a type, then calling renumber_vertices_by_type, raises an IndexError:

IndexError: string index out of range
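
The traceback in the log output points at the expression `ord(divisions[col].iloc[-1][0]) + 1` in dask_cudf's `quantile_divisions`, which indexes the first character of the last division value. When the vertex type is unspecified (''), that value is presumably an empty string, so the `[0]` lookup fails. The following is a minimal pure-Python sketch of that failure mode; the helper names `bump_upper_bound` and `bump_upper_bound_guarded` and the `fallback` parameter are hypothetical illustrations, not part of dask_cudf, and the guard is only one possible approach, not necessarily the fix that was merged.

```python
def bump_upper_bound(last: str) -> str:
    """Mimic the string branch seen in the traceback: return the character
    one code point after the first character of the last division."""
    return chr(ord(last[0]) + 1)


def bump_upper_bound_guarded(last: str, fallback: str = "a") -> str:
    """Hypothetical guard: an empty string has no first character, so fall
    back to some non-empty string (any non-empty string sorts after '')."""
    if not last:
        return fallback
    return chr(ord(last[0]) + 1)


if __name__ == "__main__":
    # A non-empty type name works as intended.
    print(bump_upper_bound("person"))

    # An empty type name reproduces the error from the report.
    try:
        bump_upper_bound("")
    except IndexError as exc:
        print(f"IndexError: {exc}")

    # The guarded variant returns an upper bound that still sorts after ''.
    print(bump_upper_bound_guarded("") > "")
```

The guard keeps the property the original code appears to rely on, namely that the last division is strictly greater than every value in the column, without assuming the value has a first character.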

Minimum reproducible example

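import cudf
import cupy
import dask_cudf
from cugraph.experimental import MGPropertyGraph

# Assumes a Dask cluster and client are already set up.
pG = MGPropertyGraph()
pG.add_edge_data(
    dask_cudf.from_cudf(
        cudf.DataFrame(
            {
                "src": cupy.array([0, 0, 1, 2, 2, 3], dtype="int32"),
                "dst": cupy.array([1, 2, 4, 3, 4, 1], dtype="int32"),
            }
        ),
        npartitions=2,
    ),
    vertex_col_names=["src", "dst"],
)

# No vertex type is given here, so the type is left unspecified ('').
pG.add_vertex_data(
    dask_cudf.from_cudf(
        cudf.DataFrame(
            {
                "prop1": [100, 200, 300, 400, 500],
                "prop2": [5, 4, 3, 2, 1],
                "id": cupy.array([0, 1, 2, 3, 4], dtype="int32"),
            }
        ),
        npartitions=2,
    ),
    vertex_col_name="id",
)

# Raises IndexError: string index out of range
pG.renumber_vertices_by_type()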


Relevant log output

```shell
/opt/conda/envs/rapids/lib/python3.9/site-packages/cugraph/dask/structure/mg_property_graph.py:1313: in renumber_vertices_by_type
    df = df.reset_index().sort_values(by=TCN)
/opt/conda/envs/rapids/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
/opt/conda/envs/rapids/lib/python3.9/site-packages/dask_cudf/core.py:225: in sort_values
    df = sorting.sort_values(
/opt/conda/envs/rapids/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
/opt/conda/envs/rapids/lib/python3.9/site-packages/dask_cudf/sorting.py:272: in sort_values
    divisions = quantile_divisions(df, by, npartitions)
/opt/conda/envs/rapids/lib/python3.9/contextlib.py:79: in inner
    return func(*args, **kwds)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

df = <dask_cudf.DataFrame | 22 tasks | 2 npartitions>, by = ['_TYPE_'], npartitions = 2

    @_dask_cudf_nvtx_annotate
    def quantile_divisions(df, by, npartitions):
        qn = np.linspace(0.0, 1.0, npartitions + 1).tolist()
        divisions = _approximate_quantile(df[by], qn).compute()
        columns = divisions.columns
    
        # TODO: Make sure divisions are correct for all dtypes..
        if (
            len(columns) == 1
            and df[columns[0]].dtype != "object"
            and not is_categorical_dtype(df[columns[0]].dtype)
        ):
            dtype = df[columns[0]].dtype
            divisions = divisions[columns[0]].astype("int64")
            divisions.iloc[-1] += 1
            divisions = sorted(
                divisions.drop_duplicates().astype(dtype).to_arrow().tolist(),
                key=lambda x: (x is None, x),
            )
        else:
            for col in columns:
                dtype = df[col].dtype
                if dtype != "object":
                    divisions[col] = divisions[col].astype("int64")
                    divisions[col].iloc[-1] += 1
                    divisions[col] = divisions[col].astype(dtype)
                else:
                    divisions[col].iloc[-1] = chr(
>                       ord(divisions[col].iloc[-1][0]) + 1
                    )
E                   IndexError: string index out of range

/opt/conda/envs/rapids/lib/python3.9/site-packages/dask_cudf/sorting.py:222: IndexError
```

Environment details

Standard environment

Other/Misc.

n/a

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report

alexbarghi-nv added the "bug" and "? - Needs Triage" labels on Dec 7, 2022
alexbarghi-nv added the "Fix" label and removed the "bug" and "? - Needs Triage" labels on Dec 7, 2022
alexbarghi-nv added this to the 23.02 milestone on Dec 7, 2022
rlratzel removed their assignment on Jan 5, 2023
BradReesWork modified the milestones: 23.02, 23.04 on Jan 23, 2023
kingmesal added the "bug" label and removed the "Fix" label on Feb 9, 2023
eriknw added a commit to eriknw/cugraph that referenced this issue on Mar 22, 2023
rapids-bot pushed a commit to rapidsai/cudf that referenced this issue on Mar 22, 2023:

See test for simple MRE.

This fixes rapidsai/cugraph#3058

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #12988