Fix PyG Loaders by properly supporting multi_get_tensor #2860

Merged Nov 17, 2022 (95 commits)
Changes from 78 commits
Commits
2a6c9cf
PropertyGraph set index to vertex and edge ids
eriknw Aug 9, 2022
4c93f77
Update graph_store
eriknw Aug 10, 2022
2631715
flake8
eriknw Aug 10, 2022
ff0f80c
Merge branch 'branch-22.10' into pg_set_index
eriknw Sep 7, 2022
99c2e0e
Set index to vertex or edge IDs in PG for MG
eriknw Sep 14, 2022
1a1e039
Merge branch 'pg_set_index' of https://github.com/eriknw/cugraph into…
alexbarghi-nv Sep 21, 2022
9bbf048
fixes
alexbarghi-nv Sep 21, 2022
4496cba
merge
alexbarghi-nv Oct 12, 2022
ccae80b
Fix concat with different index dtypes in SG PropertyGraph
eriknw Oct 13, 2022
824d083
initial
alexbarghi-nv Oct 17, 2022
5aaa90d
initial work on remote wrappers, very rough
alexbarghi-nv Oct 18, 2022
52fe830
merge resolution
alexbarghi-nv Oct 18, 2022
3221911
additional functionality, v/e counts
alexbarghi-nv Oct 18, 2022
f097043
copyright update
alexbarghi-nv Oct 18, 2022
7d33ed6
additional functions
alexbarghi-nv Oct 18, 2022
d14ae24
quick fix
alexbarghi-nv Oct 18, 2022
10bf725
Merge branch 'sgpg_fix_concat' of https://github.com/eriknw/cugraph i…
alexbarghi-nv Oct 18, 2022
1887ce7
add definition for remote graph, tests for pg
alexbarghi-nv Oct 18, 2022
f598dbe
remove dispatch (will be added in other pr)
alexbarghi-nv Oct 18, 2022
6b34f5b
Merge branch 'branch-22.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Oct 18, 2022
8495b70
revert inadvertently changed file
alexbarghi-nv Oct 18, 2022
7ffd777
Merge branch 'branch-22.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Oct 19, 2022
5089def
initial changes
alexbarghi-nv Oct 20, 2022
c157076
update version
alexbarghi-nv Oct 20, 2022
bc400ca
Merge branch 'branch-22.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Oct 20, 2022
ab3a28e
pull in dispatch from other branch
alexbarghi-nv Oct 20, 2022
438bfff
dispatch
alexbarghi-nv Oct 21, 2022
50fd0df
fix get_vertices(), add tests
alexbarghi-nv Oct 21, 2022
7fe9b0f
tests, fixes
alexbarghi-nv Oct 21, 2022
0a88a52
fix typo
alexbarghi-nv Oct 21, 2022
ae87b94
major changes to update output array/dataframe/tensor handling, unit/…
alexbarghi-nv Oct 25, 2022
ec44561
Merge branch 'cgs-remote-wrappers' of https://github.com/alexbarghi-n…
alexbarghi-nv Oct 25, 2022
c7d7112
fix merge conflict
alexbarghi-nv Oct 25, 2022
5703d41
fix version
alexbarghi-nv Oct 25, 2022
c8379ad
infer default backend
alexbarghi-nv Oct 25, 2022
3aed33d
fix default backend for remote pg
alexbarghi-nv Oct 25, 2022
ce12b47
reverse this commit
alexbarghi-nv Oct 25, 2022
092db5e
Revert "reverse this commit"
alexbarghi-nv Oct 25, 2022
e1a3c1f
Merge branch 'branch-22.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Oct 25, 2022
a66e437
remove useless code from pg, remove print statement
alexbarghi-nv Oct 26, 2022
a18b336
move backend call to methods, add graph() factory, update tests
alexbarghi-nv Oct 26, 2022
865ca44
fix version
alexbarghi-nv Oct 26, 2022
0975038
fix get vertex/edge data with types in cgs handler, minor raii fix, u…
alexbarghi-nv Oct 26, 2022
8f28820
fix version
alexbarghi-nv Oct 26, 2022
c8289f6
update branch
alexbarghi-nv Oct 26, 2022
03a1cf2
minor fix
alexbarghi-nv Oct 26, 2022
9b2ff76
add loader fix initial code
alexbarghi-nv Oct 28, 2022
e007684
fixes
alexbarghi-nv Nov 1, 2022
e1d4b84
Resolve merge conflict
alexbarghi-nv Nov 1, 2022
bf3df8a
cleanup, fixes for renumbering
alexbarghi-nv Nov 1, 2022
7931832
support for pg api
alexbarghi-nv Nov 1, 2022
668f95f
sampling, algo calls, implicit sg, fixes for multigraph
alexbarghi-nv Nov 2, 2022
3fc56be
fix version
alexbarghi-nv Nov 2, 2022
4955f90
remove print statements
alexbarghi-nv Nov 2, 2022
5b2a917
Merge branch 'cgs-remote-sample' into loader_fix
alexbarghi-nv Nov 2, 2022
90f700f
resolve merge conflict
alexbarghi-nv Nov 7, 2022
c96be0a
fix version
alexbarghi-nv Nov 8, 2022
8dc069e
fix version
alexbarghi-nv Nov 9, 2022
64b7d82
rename columns
alexbarghi-nv Nov 9, 2022
be41c53
switch to import_optional
alexbarghi-nv Nov 9, 2022
8462a24
minor cleanup
alexbarghi-nv Nov 9, 2022
53020e2
prevent copy in numpy to numpy conversion
alexbarghi-nv Nov 9, 2022
ddfb89d
is_mg -> is_multi_gpu
alexbarghi-nv Nov 9, 2022
2548efd
point to new issue
alexbarghi-nv Nov 9, 2022
09a5be4
point to new issue
alexbarghi-nv Nov 9, 2022
260ab2e
Merge branch 'cgs-remote-sample' into loader_fix
alexbarghi-nv Nov 9, 2022
24f9c85
add fillna to property graph
alexbarghi-nv Nov 9, 2022
9716f3f
fix notebook
alexbarghi-nv Nov 9, 2022
261eb29
remove include code
alexbarghi-nv Nov 9, 2022
c6d49ba
update version
alexbarghi-nv Nov 9, 2022
091c958
test, doc updates
alexbarghi-nv Nov 9, 2022
37fc74d
add different check for sg/mg
alexbarghi-nv Nov 9, 2022
29d51f0
update is_mg calls
alexbarghi-nv Nov 9, 2022
3490c26
fix version
alexbarghi-nv Nov 9, 2022
8f8470e
Merge branch 'branch-22.12' into loader_fix
alexbarghi-nv Nov 9, 2022
43b2c5e
resolve conflict
alexbarghi-nv Nov 10, 2022
2fd134a
split fillna, remove 'inplace'
alexbarghi-nv Nov 14, 2022
e5c54d7
remove unwanted files
alexbarghi-nv Nov 14, 2022
d726c70
Merge branch 'branch-22.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Nov 15, 2022
ac300c5
restore updated comments
alexbarghi-nv Nov 15, 2022
780fe9a
remove unwanted files
alexbarghi-nv Nov 15, 2022
bd3ee5a
formatting cleanup
alexbarghi-nv Nov 15, 2022
733b557
update the pyg extension tests
alexbarghi-nv Nov 15, 2022
e35fb25
test fix
alexbarghi-nv Nov 15, 2022
f0aebcc
update pyg tests
alexbarghi-nv Nov 15, 2022
fe59a9b
update notebook to use new fillna
alexbarghi-nv Nov 15, 2022
6f0f585
remove unwanted file
alexbarghi-nv Nov 15, 2022
0a2bf88
Merge branch 'branch-22.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Nov 15, 2022
1e5c015
clarify in docstring that general Series is accepted
alexbarghi-nv Nov 16, 2022
cdd3d47
clarify in docstring general Series accepted
alexbarghi-nv Nov 16, 2022
c576d7a
clean up formatting in test_property_graph
alexbarghi-nv Nov 16, 2022
d34a1c8
clean up formatting in test_mg_property_graph
alexbarghi-nv Nov 16, 2022
cdf836e
formatting fix for test_property_graph
alexbarghi-nv Nov 16, 2022
2d8cc1b
reformat
alexbarghi-nv Nov 16, 2022
2e06c04
fix column name issue
alexbarghi-nv Nov 17, 2022
36 changes: 27 additions & 9 deletions notebooks/gnn/pyg_hetero_mag.ipynb
@@ -23,7 +23,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"import sys\n",
 "import rmm\n",
 "\n",
 "rmm.reinitialize(pool_allocator=True,initial_pool_size=5e+9, maximum_pool_size=20e+9)"
@@ -86,7 +85,7 @@
 " last_offset += num_nodes\n",
 " \n",
 " blank_df = cudf.DataFrame({'id':range(vertex_offsets[node_type], vertex_offsets[node_type] + num_nodes)})\n",
-" blank_df.id = blank_df.id.astype('int32')\n",
+" blank_df.id = blank_df.id.astype('int64')\n",
 " if isinstance(pG, MGPropertyGraph):\n",
 " blank_df = dask_cudf.from_cudf(blank_df, npartitions=2)\n",
 " pG.add_vertex_data(blank_df, vertex_col_name='id', type_name=node_type)\n",
@@ -113,11 +112,14 @@
 " feature_df = cudf.DataFrame(node_features)\n",
 " feature_df.columns = [str(c) for c in range(feature_df.shape[1])]\n",
 " feature_df['id'] = range(vertex_offset, vertex_offset + node_features.shape[0])\n",
-" feature_df.id = feature_df.id.astype('int32')\n",
+" feature_df.id = feature_df.id.astype('int64')\n",
 " if isinstance(pG, MGPropertyGraph):\n",
 " feature_df = dask_cudf.from_cudf(feature_df, npartitions=2)\n",
 "\n",
-" pG.add_vertex_data(feature_df, vertex_col_name='id', type_name=node_type)"
+" pG.add_vertex_data(feature_df, vertex_col_name='id', type_name=node_type)\n",
+"\n",
+"# Fill in an empty value for vertices without properties.\n",
+"pG.fillna(0.0)"
 ]
 },
 {
@@ -141,8 +143,8 @@
 " eidx = [n + vertex_offset_src for n in eidx[0]], [n + vertex_offset_dst for n in eidx[1]]\n",
 "\n",
 " edge_df = cudf.DataFrame({'src':eidx[0], 'dst':eidx[1]})\n",
-" edge_df.src = edge_df.src.astype('int32')\n",
-" edge_df.dst = edge_df.dst.astype('int32')\n",
+" edge_df.src = edge_df.src.astype('int64')\n",
+" edge_df.dst = edge_df.dst.astype('int64')\n",
 " edge_df['type'] = edge_type\n",
 " if isinstance(pG, MGPropertyGraph):\n",
 " edge_df = dask_cudf.from_cudf(edge_df, npartitions=2)\n",
@@ -167,7 +169,7 @@
 "source": [
 "y_df = cudf.DataFrame(data[1]['paper'], columns=['y'])\n",
 "y_df['id'] = range(vertex_offsets['paper'], vertex_offsets['paper'] + len(y_df))\n",
-"y_df.id = y_df.id.astype('int32')\n",
+"y_df.id = y_df.id.astype('int64')\n",
 "if isinstance(pG, MGPropertyGraph):\n",
 " y_df = dask_cudf.from_cudf(y_df, npartitions=2)\n",
 "\n",
@@ -219,15 +221,15 @@
 " shuffle=True,\n",
 " batch_size=50,\n",
 " node_sampler=sampler,\n",
-" input_nodes='author'\n",
+" input_nodes=('author', graph_store.get_vertex_index('author'))\n",
 ")\n",
 "\n",
 "test_loader = NodeLoader(\n",
 " data=(feature_store, graph_store),\n",
 " shuffle=True,\n",
 " batch_size=50,\n",
 " node_sampler=sampler,\n",
-" input_nodes='author'\n",
+" input_nodes=('author', graph_store.get_vertex_index('author'))\n",
 ")\n"
 ]
 },
@@ -238,6 +240,15 @@
 "### Create the Network"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"%timeit next(iter(loader))"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -357,6 +368,13 @@
 " train_acc = test()\n",
 " print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Train: {train_acc:.4f}')"
 ]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
 }
 ],
 "metadata": {
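For readers following the notebook changes: the cells build global vertex ids by giving each node type a contiguous block that starts at `vertex_offsets[node_type]`. The sketch below illustrates that numbering in plain Python. The helper name `assign_vertex_offsets` and the node counts are made up for illustration, and the line that stores each offset is an assumption, since the diff only shows `last_offset += num_nodes` and the `range(...)` call.

```python
def assign_vertex_offsets(num_nodes_by_type):
    """Give each node type a contiguous id block; return the block starts.

    Hypothetical helper: mirrors the `last_offset += num_nodes` pattern
    visible in the notebook, not code taken from the PR.
    """
    offsets = {}
    last_offset = 0
    for node_type, num_nodes in num_nodes_by_type.items():
        offsets[node_type] = last_offset  # assumed: offset recorded before advancing
        last_offset += num_nodes
    return offsets


# Made-up node counts, for illustration only.
counts = {"paper": 4, "author": 3, "institution": 2}
vertex_offsets = assign_vertex_offsets(counts)

# Same shape as the notebook's range(offset, offset + num_nodes) expression.
author_ids = list(
    range(vertex_offsets["author"], vertex_offsets["author"] + counts["author"])
)
```

With ids laid out this way, a node type is fully described by its offset and count, which is what lets the notebook build one `id` column per type without collisions.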
@@ -104,11 +104,11 @@ def uniform_neighbor_sample(
         start_list = [start_list]
 
     if isinstance(start_list, list):
-        start_list = cudf.Series(start_list, dtype="int32")
-    # FIXME: ensure other sequence types (eg. cudf Series) can be handled.
-    if start_list.dtype != "int32":
-        raise ValueError(
-            f"'start_list' must have int32 values, " f"got: {start_list.dtype}"
+        start_list = cudf.Series(
+            start_list,
+            dtype=input_graph.edgelist.edgelist_df[
+                input_graph.renumber_map.renumbered_src_col_name
+            ].dtype,
         )
 
     # fanout_vals must be a host array!
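The hunk above stops forcing seeds to `int32` and instead takes the dtype from the graph's renumbered source column. A minimal sketch of the same idea follows, using the standard-library `array` module as a CPU stand-in for `cudf.Series`. The names `as_seed_array` and `edge_src` are hypothetical and not part of cuGraph.

```python
from array import array


def as_seed_array(start_list, edge_src):
    """Coerce seeds to the element type of the edge list's source column.

    Hypothetical stand-in: `array.typecode` plays the role of a cudf dtype.
    """
    if isinstance(start_list, int):
        start_list = [start_list]
    if isinstance(start_list, list):
        # Inherit the type from the graph data instead of hard-coding one.
        return array(edge_src.typecode, start_list)
    return start_list


# "q" is a signed 64-bit integer, analogous to int64 vertex ids.
edge_src = array("q", [0, 1, 2])
seeds = as_seed_array([2, 0], edge_src)
single = as_seed_array(1, edge_src)
```

Deriving the type from the edge list means the seeds always match whatever id width the graph was built with, so callers no longer need to know it in advance.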
38 changes: 38 additions & 0 deletions python/cugraph/cugraph/dask/structure/mg_property_graph.py
@@ -750,6 +750,37 @@ def get_edge_data(self, edge_ids=None, types=None, columns=None):
 
         return None
 
+    def fillna_vertices(self, val=0):
+        """
+        Fills empty vertex property values with the given value, zero by default.
+        Fills in-place.
+
+        Parameters
+        ----------
+        val : object, cudf.Series, or dict
+            The object that will replace "na". Default = 0. If a dict or
+            Series is passed, the index or keys are the columns to fill
+            and the values are the fill value for the corresponding column.
+        """
+        self.__vertex_prop_dataframe = self.__vertex_prop_dataframe.fillna(
+            val
+        ).persist()
+
+    def fillna_edges(self, val=0):
+        """
+        Fills empty edge property values with the given value, zero by default.
+        Fills in-place.
+
+        Parameters
+        ----------
+        val : object, cudf.Series, or dict
+            The object that will replace "na". Default = 0. If a dict or
+            Series is passed, the index or keys are the columns to fill
+            and the values are the fill value for the corresponding column.
+        """
+
+        self.__edge_prop_dataframe = self.__edge_prop_dataframe.fillna(val).persist()
+
     def select_vertices(self, expr, from_previous_selection=None):
         raise NotImplementedError
 
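The `val` parameter documented above accepts either one fill value or a per-column mapping. The sketch below mimics that behaviour on a plain dict of columns, with `None` standing in for a missing value. It is an illustration only: the function `fillna_columns` is hypothetical, it does not cover the Series case, and the real methods delegate to the dataframe's own `fillna` and then call `.persist()`.

```python
def fillna_columns(table, val=0):
    """Return a copy of `table` (column name -> list) with None replaced.

    A scalar `val` fills every column; a dict fills only the columns it names.
    Hypothetical helper, not the PropertyGraph implementation.
    """
    filled = {}
    for col, values in table.items():
        if isinstance(val, dict):
            if col not in val:
                filled[col] = list(values)  # column not named: leave gaps as-is
                continue
            fill = val[col]
        else:
            fill = val
        filled[col] = [fill if v is None else v for v in values]
    return filled


# Made-up property table: two vertices lack a feature or a label.
props = {"feat": [1.5, None, 2.0], "y": [None, 3, None]}

all_zero = fillna_columns(props, 0.0)
only_y = fillna_columns(props, {"y": -1})
```

Like the methods in the diff, this produces a new table rather than mutating the old one; the graph appears to be filled "in-place" only because the attribute is rebound to the filled result.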
@@ -1167,6 +1198,13 @@ def renumber_edges_by_type(self):
         rv["stop"] -= 1  # Make inclusive
         return rv[["start", "stop"]]
 
+    def is_multi_gpu(self):
+        """
+        Return True if this is a multi-gpu graph. Always returns True for
+        MGPropertyGraph.
+        """
+        return True
+
     @classmethod
     def is_multigraph(cls, df):
         """