[BUG]: vertex pair not properly shuffled #3001

Closed
2 tasks done
jnke2016 opened this issue Nov 30, 2022 · 0 comments · Fixed by #3002
Labels
bug Something isn't working
jnke2016 commented Nov 30, 2022

Version

22.12

Which installation method(s) does this occur on?

Docker, Conda, Pip, Source

Describe the bug.

The vertex pairs are not shuffled to the appropriate GPUs, which leads to undefined behavior such as illegal memory accesses. As a result, all the MG similarity algorithms (jaccard, sorensen, overlap) fail on 8+ GPUs.

Minimum reproducible example

Run the script below, or any MG similarity algorithm test, with 8+ GPUs.

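    import cugraph
    import cugraph.dask as dcg
    import dask_cudf
    # Assumed import path for the karate dataset in the 22.12 release
    from cugraph.experimental.datasets import karate

    # setup() and teardown() are cluster helpers that are not shown in this issue
    setup_objs = setup()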
    client = setup_objs[0]
    num_workers = len(client.scheduler_info()['workers'])

    df = karate.get_edgelist()
    
    ddf = dask_cudf.from_cudf(df, npartitions=num_workers)

    # Create MG Graph
    dg = cugraph.Graph(directed=False)
    dg.from_dask_cudf_edgelist(
        ddf, source='src', destination='dst',
        legacy_renum_only=True)

    # Get vertex_pair by computing the two_hop_neighbors
    vertex_pair = dg.get_two_hop_neighbors()
    vertex_pair = vertex_pair.compute().head()

    # Call jaccard
    df = dcg.jaccard(dg, vertex_pair)    

    teardown(*setup_objs)

Relevant log output

Exception: "RuntimeError('non-success value returned from cugraph_jaccard_coefficients: CUGRAPH_UNKNOWN_ERROR std::bad_alloc: out_of_memory: CUDA error at: /gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/include/rmm/mr/device/cuda_memory_resource.hpp')"

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
jnke2016 added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels Nov 30, 2022
BradReesWork removed the "? - Needs Triage" (Need team to review and classify) label Nov 30, 2022
BradReesWork added this to the 22.12 milestone Nov 30, 2022
rapids-bot bot pushed a commit that referenced this issue Nov 30, 2022
An illegal memory access occurs when running the MG similarity algorithms at a certain scale. It is caused by vertex pairs not being shuffled to the appropriate GPUs.
This PR:
1. Shuffles the vertex pairs based on the edge partitioning
2. Updates the vertex pair column names, since vertex pairs are not necessarily edge lists
3. Updates the docstrings, tests and notebooks accordingly
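
For readers unfamiliar with the term, the shuffle in step 1 can be pictured with a toy, CPU-only sketch. Everything in it is an assumption made for illustration: the modulo rule in `owner_of`, the helper names, and the choice to route by the first vertex of each pair are hypothetical, and cuGraph's real edge partitioning may differ from this. The point is only that each pair must land on the rank that holds the data needed to process it.

```python
from collections import defaultdict


def owner_of(vertex: int, num_gpus: int) -> int:
    """Hypothetical partitioning rule: map a vertex id to a GPU rank."""
    return vertex % num_gpus


def shuffle_vertex_pairs(pairs, num_gpus):
    """Group vertex pairs by the rank that owns the first vertex of each pair."""
    buckets = defaultdict(list)
    for first, second in pairs:
        buckets[owner_of(first, num_gpus)].append((first, second))
    # Return an entry for every rank, including ranks that receive no pairs.
    return {rank: buckets.get(rank, []) for rank in range(num_gpus)}


pairs = [(0, 5), (1, 4), (2, 3), (9, 1), (8, 2)]
shuffled = shuffle_vertex_pairs(pairs, num_gpus=8)

for rank, local_pairs in shuffled.items():
    print(rank, local_pairs)
```

In this toy model, a pair left on the wrong rank would refer to a vertex that rank does not hold, which is the CPU analogue of the out-of-bounds access described in the issue.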

closes #3001

Authors:
  - Joseph Nke (https://github.com/jnke2016)
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Chuck Hastings (https://github.com/ChuckHastings)

URL: #3002