Update egonet implementation #2874

Merged
merged 76 commits into from
Nov 17, 2022
Changes from 5 commits
Commits
725a8a6
move egonet back from legacy directory, the original implementation i…
ChuckHastings Oct 28, 2022
7c7bad3
some simple edits
ChuckHastings Oct 28, 2022
155fd24
Finished coding C API for egonet
ChuckHastings Oct 31, 2022
adada88
rename variables appropriately
jnke2016 Nov 1, 2022
7194bdc
define the 'vertex_pairs' and 'induced_subgraph' methods by leveragin…
jnke2016 Nov 1, 2022
e3059b3
define 'extract_ego' function in the PLC API by leveraging the CAPI
jnke2016 Nov 1, 2022
8647fd1
add PLC implementation of 'egonet'
jnke2016 Nov 1, 2022
11ab13c
update docstrings
jnke2016 Nov 2, 2022
5de024b
merge latest cugraph changes
jnke2016 Nov 12, 2022
2fdf310
undo changes to community_algorithms header file
jnke2016 Nov 12, 2022
61cdfcf
fix merge conflict
jnke2016 Nov 12, 2022
e7cdfa7
remove legacy egonet
jnke2016 Nov 14, 2022
770a461
temporarily skip batch_ego_graphs tests to debug
jnke2016 Nov 14, 2022
5b8e438
update python implementation of egonet
jnke2016 Nov 14, 2022
3413a7b
add helper type and function
jnke2016 Nov 14, 2022
95b7da5
temporarily cast the vertex dtype
jnke2016 Nov 14, 2022
b74179f
temporarily cast the vertex dtype
jnke2016 Nov 14, 2022
64d70e5
leverage helper functions to debug
jnke2016 Nov 14, 2022
181be1f
add temporary helper functions to debug
jnke2016 Nov 14, 2022
d137116
fix style
jnke2016 Nov 14, 2022
181ba6d
add mg egonet
jnke2016 Nov 14, 2022
31e8d4b
return the offsets array even if unused
jnke2016 Nov 14, 2022
5fb853d
add MG implementation of egonet
jnke2016 Nov 14, 2022
2dab871
fix style
jnke2016 Nov 14, 2022
f70ca2c
undo change to copyright
jnke2016 Nov 14, 2022
6178cd6
update docstrings
jnke2016 Nov 14, 2022
dd15a81
undo changes to tests
jnke2016 Nov 14, 2022
9cee9d3
update docstrings
jnke2016 Nov 14, 2022
b5111d6
remove unused datasets object
jnke2016 Nov 14, 2022
3d5dc37
undo changes to nx_factory
jnke2016 Nov 14, 2022
420777f
fix type return for weight
jnke2016 Nov 14, 2022
13eaf42
remove unnecessary checks
jnke2016 Nov 14, 2022
54f3499
match the seeds to the vertex type
jnke2016 Nov 14, 2022
89ca664
enable tests that were previously skipped
jnke2016 Nov 14, 2022
6a311e2
add dtype for offsets array
jnke2016 Nov 14, 2022
c69db7c
remove debug prints, remove temporary function calls
jnke2016 Nov 14, 2022
e6c1a3f
remove temporary debug functions, undo changes to these files
jnke2016 Nov 14, 2022
166acd1
remove debug print, match seeds to vertex dtype
jnke2016 Nov 14, 2022
b6d31f4
import missing function
jnke2016 Nov 14, 2022
8e054db
remove redundant deallocation of pointer
jnke2016 Nov 14, 2022
d97130e
map appropriate python type to offsets ctype
jnke2016 Nov 14, 2022
68bb735
fix typo
jnke2016 Nov 14, 2022
6af2299
update docstrings example
jnke2016 Nov 14, 2022
6f97292
fix style
jnke2016 Nov 14, 2022
cbaa850
fix merge conflict
jnke2016 Nov 14, 2022
5f40b3b
update docstrings, support directed graphs
jnke2016 Nov 14, 2022
7175e48
update docstrings, remove unused input parameters, support directed g…
jnke2016 Nov 14, 2022
779bf4f
fix typo
jnke2016 Nov 14, 2022
09efec3
debugging MG egonet issues
ChuckHastings Nov 15, 2022
8d45eab
Merge remote-tracking branch 'upstream/debug_egonet' into branch-22.1…
jnke2016 Nov 15, 2022
ab01f33
add an mg python implementation of egonet leveraging PLC
jnke2016 Nov 16, 2022
f921185
add python tests for mg egonet
jnke2016 Nov 16, 2022
b659eb5
add helper function, update docstrings
jnke2016 Nov 16, 2022
19c4a70
update comments
jnke2016 Nov 16, 2022
4622b4b
fix style
jnke2016 Nov 16, 2022
293b7c7
fix typo
jnke2016 Nov 16, 2022
84772b2
fix typo
jnke2016 Nov 16, 2022
456a0c3
Update build.sh
galipremsagar Nov 16, 2022
bb3542e
Update build.sh
galipremsagar Nov 16, 2022
66eb4ef
update env variable
jnke2016 Nov 16, 2022
66e19d1
undo changes
jnke2016 Nov 16, 2022
379c3bc
Merge remote-tracking branch 'upstream/patch-2' into branch-22.12_fea…
jnke2016 Nov 16, 2022
8b40446
fix destination dtype
jnke2016 Nov 16, 2022
05cca2a
update docstrings, return appropriate error message
jnke2016 Nov 16, 2022
4a3fb60
skip MG egonet tests on single GPU
jnke2016 Nov 16, 2022
c738d25
add fixme
jnke2016 Nov 16, 2022
1fdadfa
fix style
jnke2016 Nov 16, 2022
ec063a9
Merge remote-tracking branch 'upstream/branch-22.12' into branch-22.1…
jnke2016 Nov 16, 2022
556dbb7
simplify type check
jnke2016 Nov 17, 2022
9984802
simplify type check
jnke2016 Nov 17, 2022
167c8ed
fix typo
jnke2016 Nov 17, 2022
4cc3ff9
refactor consolidation function
jnke2016 Nov 17, 2022
9f8d284
reset index
jnke2016 Nov 17, 2022
eed37f2
remove unsued import and variable
jnke2016 Nov 17, 2022
2f6a5bf
undo changes to docstrings and type check
jnke2016 Nov 17, 2022
95fbd46
fix style
jnke2016 Nov 17, 2022
2 changes: 1 addition & 1 deletion cpp/src/c_api/extract_ego.cpp
@@ -128,7 +128,7 @@ struct extract_ego_functor : public cugraph::c_api::abstract_functor {

result_ = new cugraph::c_api::cugraph_induced_subgraph_result_t{
new cugraph::c_api::cugraph_type_erased_device_array_t(src, graph_->vertex_type_),
- new cugraph::c_api::cugraph_type_erased_device_array_t(dst, graph_->edge_type_),
+ new cugraph::c_api::cugraph_type_erased_device_array_t(dst, graph_->vertex_type_),
wgt ? new cugraph::c_api::cugraph_type_erased_device_array_t(*wgt, graph_->weight_type_)
: NULL,
new cugraph::c_api::cugraph_type_erased_device_array_t(edge_offsets,
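The one-line change above tags the destination array with the graph's vertex type instead of its edge type. As a rough illustration of why that tag matters, the pure-Python sketch below reinterprets a buffer of 32-bit ids as 64-bit values. It assumes a graph with 32-bit vertex ids and 64-bit edge ids, which the diff does not state, and it uses only the standard library rather than any cugraph API.

```python
import struct

# Four destination vertex ids stored as little-endian 32-bit integers
# (assumed vertex type for this illustration).
dst_ids = [1, 2, 3, 4]
raw = struct.pack("<4i", *dst_ids)

# Reading the buffer with the correct element type recovers the ids.
as_vertex_type = list(struct.unpack("<4i", raw))

# Reading the same bytes as 64-bit integers (assumed edge type) fuses
# neighbouring ids into two unrelated values.
as_edge_type = list(struct.unpack("<2q", raw))

print(as_vertex_type)
print(as_edge_type)
```

When the two types happen to have the same width the mislabel would be harmless, which may be why a bug of this kind can go unnoticed on some graphs.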
32 changes: 18 additions & 14 deletions python/cugraph/cugraph/community/egonet.py
@@ -61,7 +61,7 @@ def ego_graph(G, n, radius=1, center=True, undirected=None, distance=None):
information. Edge weights, if present, should be single or double
precision floating point values.

n : integer or cudf.DataFrame
n : integer or list, cudf.Series, cudf.DataFrame
A single node as integer or a cudf.DataFrame if nodes are
represented with multiple columns. If a cudf.DataFrame is provided,
only the first row is taken as the node input.
@@ -73,10 +73,10 @@ def ego_graph(G, n, radius=1, center=True, undirected=None, distance=None):
Defaults to True. False is not supported

undirected: bool, optional
This parameter is here for NetworkX compatibility and ignored
This parameter is here for NetworkX compatibility and is ignored

distance: key, optional (default=None)
This parameter is here for NetworkX compatibility and ignored
This parameter is here for NetworkX compatibility and is ignored

Returns
-------
@@ -93,7 +93,7 @@ def ego_graph(G, n, radius=1, center=True, undirected=None, distance=None):
"""
(G, input_type) = ensure_cugraph_obj(G, nx_weight_attr="weight")

result_graph = type(G)()
result_graph = type(G)(directed=G.is_directed())

if undirected is not None:
warning_msg = (
@@ -102,17 +102,21 @@ def ego_graph(G, n, radius=1, center=True, undirected=None, distance=None):
)
warnings.warn(warning_msg, PendingDeprecationWarning)

if n is not None:
if isinstance(n, int):
n = [n]
if isinstance(n, list):
n = cudf.Series(n)

if isinstance(n, int):
n = [n]
if isinstance(n, list):
n = cudf.Series(n)
if isinstance(n, cudf.Series):
if G.renumbered is True:
if isinstance(n, cudf.DataFrame):
n = G.lookup_internal_vertex_id(n, n.columns)
else:
n = G.lookup_internal_vertex_id(n)
n = G.lookup_internal_vertex_id(n)
elif isinstance(n, cudf.DataFrame):
if G.renumbered is True:
n = G.lookup_internal_vertex_id(n, n.columns)
else:
raise TypeError(
f"'n' must be either an integer or a list or a cudf.Series"
f" or a cudf.DataFrame, got: {type(n)}"
)

# Match the seed to the vertex dtype
n_type = G.edgelist.edgelist_df["src"].dtype
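The hunk above replaces a silent `if n is not None` guard with an explicit type dispatch that ends in a `TypeError`. A minimal sketch of that pattern follows, using plain Python lists in place of `cudf.Series`; the helper name `normalize_seeds` is hypothetical and does not appear in the PR, and the cudf.Series and cudf.DataFrame branches are omitted.

```python
def normalize_seeds(n):
    """Return the seeds as a list, mirroring the int -> list step above.

    Hypothetical helper: the real code converts the list to a cudf.Series
    and also accepts cudf.Series and cudf.DataFrame inputs.
    """
    if isinstance(n, int):
        n = [n]
    if isinstance(n, list):
        return n
    # Anything else is rejected up front instead of failing later.
    raise TypeError(
        f"'n' must be either an integer or a list, got: {type(n)}"
    )


print(normalize_seeds(3))
print(normalize_seeds([1, 2]))
```

Raising at the entry point keeps an unsupported seed type from reaching the renumbering lookup, where the resulting error would be harder to trace back to the caller's argument.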
51 changes: 33 additions & 18 deletions python/cugraph/cugraph/dask/community/egonet.py
@@ -41,6 +41,17 @@ def _call_ego_graph(


def consolidate_results(ddf, offsets, num_seeds):
"""
Each rank returns its ego_graph dataframe with its corresponding
offsets array. This is ideal if the user operates on distributed memory
but when attempting to bring the result into a single machine,
the ego_graph dataframes generated from each seed cannot be extracted
using the offsets array. This function consolidates the final result by
performing segmented copies.

Returns: consolidated ego_graph dataframe and offsets array
"""

df = cudf.DataFrame()
offset_array = [0]
for s in range(num_seeds):
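The body of `consolidate_results` is elided in this diff, so the following is only a sketch of what "segmented copies" could look like, not the PR's implementation. It assumes each rank returns a list of edges plus an offsets array with `num_seeds + 1` entries, and it uses plain Python lists instead of cudf objects; the name `consolidate` and its parameters are hypothetical.

```python
def consolidate(per_rank_edges, per_rank_offsets, num_seeds):
    """Merge per-rank results so that each seed's edges are contiguous."""
    merged = []
    merged_offsets = [0]
    for s in range(num_seeds):
        # Copy seed s's segment from every rank, one rank after another.
        for edges, offsets in zip(per_rank_edges, per_rank_offsets):
            merged.extend(edges[offsets[s]:offsets[s + 1]])
        # The running length marks where the next seed's edges begin.
        merged_offsets.append(len(merged))
    return merged, merged_offsets


# Two ranks, two seeds; each rank holds part of every seed's ego graph.
rank0 = [(0, 1), (1, 0), (2, 3)]
rank1 = [(0, 4), (2, 5), (3, 2)]
edges, offsets = consolidate([rank0, rank1], [[0, 2, 3], [0, 1, 3]], 2)
print(edges)
print(offsets)
```

After consolidation, the edges for seed `s` are `edges[offsets[s]:offsets[s + 1]]`, which is the single-machine lookup the docstring says the raw per-rank offsets cannot provide.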
@@ -94,7 +105,7 @@ def ego_graph(input_graph, n, radius=1, center=True):
information. Edge weights, if present, should be single or double
precision floating point values.

n : int, list or cudf.Series, cudf.DataFrame
n : int, list or cudf Series or DataFrame, dask_cudf Series or DataFrame
A node or a list or cudf.Series of nodes or a cudf.DataFrame if nodes
are represented with multiple columns. If a cudf.DataFrame is provided,
only the first row is taken as the node input.
@@ -119,25 +130,29 @@ def ego_graph(input_graph, n, radius=1, center=True):
# Initialize dask client
client = input_graph._client

if n is not None:
if isinstance(n, int):
n = [n]
if isinstance(n, list):
n = cudf.Series(n)
if not isinstance(n, cudf.Series):
if isinstance(n, int):
n = [n]
if isinstance(n, list):
n = cudf.Series(n)
elif not isinstance(n, (cudf.Series, dask_cudf.Series)):
if not isinstance(n, (cudf.DataFrame, dask_cudf.DataFrame)):
raise TypeError(
f"'n' must be either a list or a cudf.Series," f"got: {n.dtype}"
f"'n' must be either an integer or a list or a "
f"cudf or dask_cudf Series or DataFrame, got: {type(n)}"
)
num_seeds = len(n)
# n uses "external" vertex IDs, but since the graph has been
# renumbered, the node ID must also be renumbered.
if input_graph.renumbered:
n = input_graph.lookup_internal_vertex_id(n).compute()
n_type = input_graph.edgelist.edgelist_df.dtypes[0]
else:
n_type = input_graph.input_df.dtypes[0]

n = dask_cudf.from_cudf(n, npartitions=min(input_graph._npartitions, len(n)))

num_seeds = len(n)
# n uses "external" vertex IDs, but since the graph has been
# renumbered, the node ID must also be renumbered.
if input_graph.renumbered:
n = input_graph.lookup_internal_vertex_id(n)
n_type = input_graph.edgelist.edgelist_df.dtypes[0]
else:
n_type = input_graph.input_df.dtypes[0]

if isinstance(n, (cudf.Series, cudf.DataFrame)):
n = dask_cudf.from_cudf(n, npartitions=min(input_graph._npartitions, len(n)))

n = n.astype(n_type)

n = get_distributed_data(n)
5 changes: 2 additions & 3 deletions python/cugraph/cugraph/tests/mg/test_mg_egonet.py
@@ -16,6 +16,7 @@
import pytest
import cugraph
import dask_cudf
from cugraph.dask.common.mg_utils import is_single_gpu

# from cugraph.dask.common.mg_utils import is_single_gpu
from cugraph.testing import utils
@@ -113,9 +114,7 @@ def input_expected_output(input_combo):
# =============================================================================


# @pytest.mark.skipif(
# is_single_gpu(), reason="skipping MG testing on Single GPU system"
# )
@pytest.mark.skipif(is_single_gpu(), reason="skipping MG testing on Single GPU system")
def test_dask_ego_graphs(dask_client, benchmark, input_expected_output):

dg = input_expected_output["MGGraph"]
6 changes: 2 additions & 4 deletions python/pylibcugraph/pylibcugraph/egonet.pyx
@@ -83,7 +83,6 @@ def ego_graph(ResourceHandle resource_handle,
radius: size_t
The number of hops to go out from each source vertex


do_expensive_check : bool_t
If True, performs more extensive tests on the inputs to ensure
validity, at the expense of increased run time.
@@ -153,15 +152,14 @@ def ego_graph(ResourceHandle resource_handle,
cdef cugraph_type_erased_device_array_view_t* subgraph_offsets_ptr = \
cugraph_induced_subgraph_get_subgraph_offsets(result_ptr)

# FIXME: Get ownership of the result data instead of performing a copy
# for performance improvement
cupy_sources = copy_to_cupy_array(
c_resource_handle_ptr, sources_ptr)

cupy_destinations = copy_to_cupy_array(
c_resource_handle_ptr, destinations_ptr)

cupy_edge_weights = copy_to_cupy_array(
c_resource_handle_ptr, edge_weights_ptr)

cupy_subgraph_offsets = copy_to_cupy_array(
c_resource_handle_ptr, subgraph_offsets_ptr)
