Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA]: Topic: SG Egograph (Batched vs. Single Vertex) #4191

Open
3 of 4 tasks
nv-rliu opened this issue Feb 23, 2024 · 1 comment
Open
3 of 4 tasks

[FEA]: Topic: SG Egograph (Batched vs. Single Vertex) #4191

nv-rliu opened this issue Feb 23, 2024 · 1 comment
Assignees
Labels
feature request New feature or request
Milestone

Comments

@nv-rliu
Copy link
Contributor

nv-rliu commented Feb 23, 2024

Is this a new feature, an improvement, or a change to existing functionality?

Change

How would you describe the priority of this feature request

High

Please provide a clear description of problem this feature solves

Upon closer examination of Egonet.py, it has been discovered that there are a few inconsistencies with the behavior of ego_graph and batched_ego_graph based on what the documentation says.

According to the docs, the ego_graph function is supposed to compute the induced sub-graph of neighbors centered at a single node, n, and return the sub-graph. The batched_ego_graph function (in congruence with its design in the legacy API) is supposed to accept a List of n values, or seeds, and compute each induced sub-graph in "batch" mode (parallel).

However, currently both Python functions just call the same PLC algorithm, which raises two concerns:

  1. The PLC function doesn't actually run ego_graph in "batch" mode like it's designed to
  2. The single ego_graph function can be called with multiple n values and returns a graph that doesn't specify the cut-offs (offsets).

Describe your ideal solution

The proposed solution for the Python API is to combined both functions into a single ego_graph function, which will recognize when it is given a single n or multiple seeds to compute.

Down the line, another parameter can be added which allows for proper "batch" mode, similar to how cugraph handles batched betweenness centrality.

def ego_graph(
    G,
    n,
    radius=1,
    center=True,
    undirected=None,
    distance=None,
    batch_mode=False
):

New Parameters
    ----------
    n : integer or List, cudf.Series, cudf.DataFrame
    < update docstring to specify that it can compute a single n or multiple seeds >

    batch_mode : Boolean
    Can be set to True or False

Returns
    ----------
A Graph or Edge Lists with Offsets

Tasks:

  • current batched_ego_graphs algorithm can be given a DeprecationWarning.
  • Add true batch-mode support to ego_graph

Describe any alternatives you have considered

The two functions could also remain the same, but then ego_graph would need to not allow multiple seeds in order to be differentiated from batched_ego_graph, aka, multiple ego_graph calls.

Additional context

This was discovered while looking into this bug on the MG implementation of ego_graph

Further context may be included in the discussion down below.

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
@nv-rliu nv-rliu added feature request New feature or request ? - Needs Triage Need team to review and classify labels Feb 23, 2024
@nv-rliu nv-rliu added this to the 24.04 milestone Feb 23, 2024
@BradReesWork BradReesWork removed the ? - Needs Triage Need team to review and classify label Feb 26, 2024
rapids-bot bot pushed a commit that referenced this issue Mar 7, 2024
This PR addresses #4191 

Since the plan is to add "batched" support to the regular `ego_graph` method, a `DeprecationWarning` has been added to `batched_ego_graphs`. 

--
Minor change: use proper indexing when accessing a pd.Series value.

Authors:
  - Ralph Liu (https://github.com/nv-rliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: #4209
@ChuckHastings
Copy link
Collaborator

ChuckHastings commented May 22, 2024

I'm not sure this analysis is correct - or I'm misinterpreting what your asserting.

Looking here:

A tuple of device arrays containing the sources, destinations, edge_weights

It appears to me that the PLC function ego_graph operates on multiple seeds and does return the offsets array that allows the decomposition of the resulting graphs into separate graphs.

I do see that the python code is ignoring the offsets value in the ego_graph call and using it in the deprecated batched_ego_graph call. I see several TODO items in the current python code that are describing some of the issues you see here.

The consequence of ignoring the offsets value in ego_graph is that if you pass multiple seeds to ego_graph, the result will be a single graph that is the composite of all of the resulting graphs, which is incorrect behavior.

So I believe this is entirely a python issue to resolve. Although I'll investigate further if we can identify a PLC call that is generating an answer other than what is expected/required to handle this correctly.

@BradReesWork BradReesWork modified the milestones: 24.04, 24.12 Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants