-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Subgraph extraction python bindings #246
Changes from 6 commits
ec780fb
ebb956a
19cd365
7364d1f
1758c49
6412436
8aa8e92
a8a4f64
624e5c8
19b593d
9c89346
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -144,15 +144,15 @@ gdf_error gdf_createGraph_nvgraph(nvgraphHandle_t nvg_handle, | |
0, | ||
settype, | ||
(float * ) gdf_G->transposedAdjList->edge_data->data)) | ||
break; | ||
break; | ||
case GDF_FLOAT64: | ||
settype = CUDA_R_64F; | ||
NVG_TRY(nvgraphAttachEdgeData(nvg_handle, | ||
*nvgraph_G, | ||
0, | ||
settype, | ||
(double * ) gdf_G->transposedAdjList->edge_data->data)) | ||
break; | ||
break; | ||
default: | ||
return GDF_UNSUPPORTED_DTYPE; | ||
} | ||
|
@@ -183,15 +183,15 @@ gdf_error gdf_createGraph_nvgraph(nvgraphHandle_t nvg_handle, | |
0, | ||
settype, | ||
(float * ) gdf_G->adjList->edge_data->data)) | ||
break; | ||
break; | ||
case GDF_FLOAT64: | ||
settype = CUDA_R_64F; | ||
NVG_TRY(nvgraphAttachEdgeData(nvg_handle, | ||
*nvgraph_G, | ||
0, | ||
settype, | ||
(double * ) gdf_G->adjList->edge_data->data)) | ||
break; | ||
break; | ||
default: | ||
return GDF_UNSUPPORTED_DTYPE; | ||
} | ||
|
@@ -515,6 +515,69 @@ gdf_error gdf_AnalyzeClustering_ratio_cut_nvgraph(gdf_graph* gdf_G, | |
return GDF_SUCCESS; | ||
} | ||
|
||
|
||
gdf_error gdf_extract_subgraph_vertex_nvgraph(gdf_graph* gdf_G, | ||
gdf_graph* result, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indentation is weird, here. The input parameter list should be updated to match the suggested changes in the header file. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Converted tabs to spaces in the file. Should address the weird indentation. |
||
gdf_column* vertices) { | ||
GDF_REQUIRE(gdf_G != nullptr, GDF_INVALID_API_CALL); | ||
GDF_REQUIRE((gdf_G->adjList != nullptr) || (gdf_G->edgeList != nullptr), GDF_INVALID_API_CALL); | ||
GDF_REQUIRE(vertices != nullptr, GDF_INVALID_API_CALL); | ||
GDF_REQUIRE(vertices->data != nullptr, GDF_INVALID_API_CALL); | ||
GDF_REQUIRE(!vertices->valid, GDF_VALIDITY_UNSUPPORTED); | ||
|
||
// Initialize Nvgraph and wrap the graph | ||
nvgraphHandle_t nvg_handle = nullptr; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and is there really a reason we are mixing nv_, nvg_, and nvgraph_ in variable names (e.g. nv_result, nvg_handle, and nvgraph_G)? I am guessing all three are for nvgraph data type variables. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed everything to use nvg_ |
||
nvgraphGraphDescr_t nvgraph_G = nullptr; | ||
NVG_TRY(nvgraphCreate(&nvg_handle)); | ||
GDF_TRY(gdf_createGraph_nvgraph(nvg_handle, gdf_G, &nvgraph_G, false)); | ||
|
||
// Create an Nvgraph graph descriptor for the result and initialize | ||
nvgraphGraphDescr_t nv_result = nullptr; | ||
NVG_TRY(nvgraphCreateGraphDescr(nvg_handle, &nv_result)); | ||
|
||
// Call Nvgraph function to get subgraph (into nv_result descriptor) | ||
NVG_TRY(nvgraphExtractSubgraphByVertex( nvg_handle, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry to complain about this trivial thing... but in function calls, cugraph code base (mostly) does not place space after opening parenthesis. So better be "(nvg_handle" instead of "( nvg_handle" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed. |
||
nvgraph_G, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Weird indentation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. weird indentation |
||
nv_result, | ||
(int*)vertices->data, | ||
vertices->size)); | ||
|
||
// Get the vertices and edges of the created subgraph to allocate memory: | ||
nvgraphCSRTopology32I_st topo; | ||
topo.source_offsets = nullptr; | ||
topo.destination_indices = nullptr; | ||
nvgraphTopologyType_t TT = NVGRAPH_CSR_32; | ||
NVG_TRY(nvgraphGetGraphStructure(nvg_handle, nv_result, (void*)&topo, &TT)); | ||
if (TT != NVGRAPH_CSR_32) | ||
return GDF_C_ERROR; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And this is not a request to make change, but it seems like you're using tab for indentation. cugraph mostly uses 2 space or 4 space for indentation in C++. We really need a rule for this, but right at this moment, I am not sure I should ask you to use either 2 space or 4 space... so you may just keep using tab or switch to 2 space or 4 space :-( |
||
int num_verts = topo.nvertices; | ||
int num_edges = topo.nedges; | ||
result->adjList = new gdf_adj_list; | ||
result->adjList->offsets = new gdf_column; | ||
result->adjList->indices = new gdf_column; | ||
result->adjList->ownership = 0; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If ownership is set to 0, who will free the memory allocated in line 560 and 561? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Output of the analytic is a graph. This space is owned by the graph, the user will need to delete the graph when they are done with it. |
||
int *offsets, *indices; | ||
CUDA_TRY(cudaMallocManaged((void**)&offsets, sizeof(int) * (num_verts + 1))); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should not directly invoke cudaMalloc/cudaMallocManaged. We are allocating memory through RMM, so we need to use RMM_ALLOC instead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed to use RMM |
||
CUDA_TRY(cudaMallocManaged((void**)&indices, sizeof(int) * num_edges)); | ||
gdf_column_view(result->adjList->offsets, | ||
offsets, | ||
nullptr, | ||
num_verts + 1, | ||
GDF_INT32); | ||
gdf_column_view(result->adjList->indices, | ||
indices, | ||
nullptr, | ||
num_edges, | ||
GDF_INT32); | ||
|
||
// Call nvgraphGetGraphStructure again to copy out the data | ||
topo.source_offsets = (int*)result->adjList->offsets->data; | ||
topo.destination_indices = (int*)result->adjList->indices->data; | ||
NVG_TRY(nvgraphGetGraphStructure(nvg_handle, nv_result, (void*)&topo, &TT)); | ||
|
||
return GDF_SUCCESS; | ||
} | ||
|
||
gdf_error gdf_triangle_count_nvgraph(gdf_graph* G, uint64_t* result) { | ||
GDF_REQUIRE(G != nullptr, GDF_INVALID_API_CALL); | ||
GDF_REQUIRE((G->adjList != nullptr) || (G->edgeList != nullptr), GDF_INVALID_API_CALL); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
# Copyright (c) 2019, NVIDIA CORPORATION. | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
from c_nvgraph cimport * | ||
from c_graph cimport * | ||
from libcpp cimport bool | ||
from libc.stdint cimport uintptr_t | ||
from libc.stdlib cimport calloc, malloc, free | ||
from libc.float cimport FLT_MAX_EXP | ||
import cudf | ||
from librmm_cffi import librmm as rmm | ||
import numpy as np | ||
|
||
cpdef subgraph(G, vertices): | ||
""" | ||
Compute a clustering/partitioning of the given graph using the spectral balanced | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. needs to be updated to reflect subgraph extraction |
||
cut method. | ||
|
||
Parameters | ||
---------- | ||
G : cuGraph.Graph | ||
cuGraph graph descriptor | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does it work both on directed and undirected graphs? We may re-use nvgraph doc There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it works for both directed and undirected graphs. It doesn't actually traverse the edges, so direction is immaterial. |
||
vertices : cudf.Series | ||
Specifies the vertices of the induced subgraph | ||
|
||
Returns | ||
------- | ||
Sg : cuGraph.Graph | ||
A graph object containing the subgraph induced by the given vertex set. | ||
|
||
Example: | ||
-------- | ||
>>> M = ReadMtxFile(graph_file) | ||
>>> sources = cudf.Series(M.row) | ||
>>> destinations = cudf.Series(M.col) | ||
>>> G = cuGraph.Graph() | ||
>>> G.add_edge_list(sources,destinations,None) | ||
>>> verts = numpy.zeros(3, dtype=np.int32) | ||
>>> verts[0] = 0 | ||
>>> verts[1] = 1 | ||
>>> verts[2] = 2 | ||
>>> sverts = cudf.Series(verts) | ||
>>> Sg = cuGraph.subgraph(G, sverts) | ||
""" | ||
|
||
cdef uintptr_t graph = G.graph_ptr | ||
cdef gdf_graph * g = < gdf_graph *> graph | ||
|
||
resultGraph = Graph() | ||
cdef uintptr_t rGraph = resultGraph.graph_ptr | ||
cdef gdf_graph* rg = <gdf_graph*>rGraph | ||
|
||
cdef gdf_column vert_col = get_gdf_column_view(vertices) | ||
|
||
err = gdf_extract_subgraph_vertex_nvgraph(g, rg, &vert_col) | ||
cudf.bindings.cudf_cpp.check_gdf_error(err) | ||
|
||
return resultGraph |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Copyright (c) 2019, NVIDIA CORPORATION. | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import cugraph | ||
import cudf | ||
import pytest | ||
import numpy as np | ||
import networkx as nx | ||
from scipy.io import mmread | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unless you have a strong reason to import in this order, we may better follow import orders in other files. Import python standard modules first, then thirdparty modules, then rapids modules. We may alphabetically sort within a group. And if NetworkX folks hasn't fixed this, importing networkx generates deprecation warnings and to avoid this, we need to do # Temporarily suppress warnings till networkX fixes deprecation warnings
# (Using or importing the ABCs from 'collections' instead of from
# 'collections.abc' is deprecated, and in 3.8 it will stop working) for
# python 3.7. Also, this import networkx needs to be relocated in the
# third-party group once this gets fixed.
import warnings
with warnings.catch_warnings():
warnings.filterwarnings("ignore", category=DeprecationWarning)
import networkx as nx See other test files (e.g. test_sssp.py) for an example. |
||
def compareEdges(cg, nxg, verts): | ||
src, dest = cg.view_edge_list() | ||
if (len(src) != nxg.size()): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why not
? Similarly in line 27 & 28,
Looks clearer to me. |
||
assert False | ||
for i in range(len(src)): | ||
if not nxg.has_edge(verts[src[i]], verts[dest[i]]): | ||
assert False | ||
return True | ||
|
||
def ReadMtxFile(mmFile): | ||
print('Reading '+ str(mmFile) + '...') | ||
return mmread(mmFile).asfptype() | ||
|
||
def cugraph_call(M, verts): | ||
G = cugraph.Graph() | ||
rows = cudf.Series(M.row) | ||
cols = cudf.Series(M.col) | ||
G.add_edge_list(rows, cols, None) | ||
cu_verts = cudf.Series(verts) | ||
Sg = cugraph.subgraph(G, cu_verts) | ||
return Sg | ||
|
||
def nx_call(M, verts): | ||
G = nx.DiGraph(M) | ||
Sg = nx.subgraph(G, verts) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See the following for PEP8 naming rules. https://www.python.org/dev/peps/pep-0008/#function-and-variable-names Sg and cu_Sg violates "Function names should be lowercase, with words separated by underscores as necessary to improve readability. Variable names follow the same convention as function names." unless "mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility." and I don't think mixed case is already the prevailing style in cugraph. |
||
return Sg | ||
|
||
datasets = ['/datasets/networks/karate.mtx', | ||
'/datasets/networks/dolphins.mtx', | ||
'/datasets/networks/netscience.mtx'] | ||
|
||
@pytest.mark.parametrize('graph_file', datasets) | ||
def test_subgraph_extraction(graph_file): | ||
M = ReadMtxFile(graph_file) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ReadMtxFile also violates the PEP8 naming rule. |
||
verts = np.zeros(3, dtype=np.int32) | ||
verts[0] = 0 | ||
verts[1] = 1 | ||
verts[2] = 17 | ||
cu_Sg = cugraph_call(M, verts) | ||
nx_Sg = nx_call(M, verts) | ||
assert compareEdges(cu_Sg, nx_Sg, verts) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. compareEdges also violate the PEP8 naming rule. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed these naming issues |
||
|
||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as I long as I understand based on comments,
gdf_graph* gdf_g and gdf_column* vertices are input parameters and gdf_graph* result is an output parameter. A function input parameter list having (input, output, input) is unorthodox. It seems like other functions in this file places input parameters before output parameters in the function input parameter lists, we should follow that convention. Also, if this function is not modifying gdf_G and vertices, we should enforce "const correctness".
So the input parameter should be
(const gdf_graph* gdf_G, const gdf_column* vertices, gdf_graph* result).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't make this const gdf_graph* gdf_G. The graph can be changed if there is no CSR representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing vertices to const makes sense. However, that would require changing the implementation in nvgraph. It seems to me like this is more than we want to address now. In fact, we probably should talk about how much we want to change nvgraph other than bug fixes.