Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Subgraph extraction python bindings #246

Merged
merged 11 commits into from
May 3, 2019
10 changes: 10 additions & 0 deletions cpp/include/nvgraph_gdf.h
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,16 @@ gdf_error gdf_AnalyzeClustering_ratio_cut_nvgraph(gdf_graph* gdf_G,
gdf_column* clustering,
float* score);

/**
* Wrapper function for Nvgraph extract subgraph by vertices
* @param gdf_G Pointer to GDF graph object, this is the input graph
* @param result Pointer to GDF graph object, this is the output must be a valid pointer
* @param vertices Pointer to GDF column object which contains the list of vertices to extract
* @return Error code
*/
gdf_error gdf_extract_subgraph_vertex_nvgraph(gdf_graph* gdf_G,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as I long as I understand based on comments,

gdf_graph* gdf_g and gdf_column* vertices are input parameters and gdf_graph* result is an output parameter. A function input parameter list having (input, output, input) is unorthodox. It seems like other functions in this file places input parameters before output parameters in the function input parameter lists, we should follow that convention. Also, if this function is not modifying gdf_G and vertices, we should enforce "const correctness".

So the input parameter should be
(const gdf_graph* gdf_G, const gdf_column* vertices, gdf_graph* result).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the order.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't make this const gdf_graph* gdf_G. The graph can be changed if there is no CSR representation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing vertices to const makes sense. However, that would require changing the implementation in nvgraph. It seems to me like this is more than we want to address now. In fact, we probably should talk about how much we want to change nvgraph other than bug fixes.

gdf_graph* result,
gdf_column* vertices);
/**
* Wrapper function for Nvgraph triangle counting
* @param G Pointer to GDF graph object
Expand Down
71 changes: 67 additions & 4 deletions cpp/src/nvgraph_gdf.cu
Original file line number Diff line number Diff line change
Expand Up @@ -144,15 +144,15 @@ gdf_error gdf_createGraph_nvgraph(nvgraphHandle_t nvg_handle,
0,
settype,
(float * ) gdf_G->transposedAdjList->edge_data->data))
break;
break;
case GDF_FLOAT64:
settype = CUDA_R_64F;
NVG_TRY(nvgraphAttachEdgeData(nvg_handle,
*nvgraph_G,
0,
settype,
(double * ) gdf_G->transposedAdjList->edge_data->data))
break;
break;
default:
return GDF_UNSUPPORTED_DTYPE;
}
Expand Down Expand Up @@ -183,15 +183,15 @@ gdf_error gdf_createGraph_nvgraph(nvgraphHandle_t nvg_handle,
0,
settype,
(float * ) gdf_G->adjList->edge_data->data))
break;
break;
case GDF_FLOAT64:
settype = CUDA_R_64F;
NVG_TRY(nvgraphAttachEdgeData(nvg_handle,
*nvgraph_G,
0,
settype,
(double * ) gdf_G->adjList->edge_data->data))
break;
break;
default:
return GDF_UNSUPPORTED_DTYPE;
}
Expand Down Expand Up @@ -515,6 +515,69 @@ gdf_error gdf_AnalyzeClustering_ratio_cut_nvgraph(gdf_graph* gdf_G,
return GDF_SUCCESS;
}


gdf_error gdf_extract_subgraph_vertex_nvgraph(gdf_graph* gdf_G,
gdf_graph* result,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is weird, here. The input parameter list should be updated to match the suggested changes in the header file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted tabs to spaces in the file. Should address the weird indentation.

gdf_column* vertices) {
GDF_REQUIRE(gdf_G != nullptr, GDF_INVALID_API_CALL);
GDF_REQUIRE((gdf_G->adjList != nullptr) || (gdf_G->edgeList != nullptr), GDF_INVALID_API_CALL);
GDF_REQUIRE(vertices != nullptr, GDF_INVALID_API_CALL);
GDF_REQUIRE(vertices->data != nullptr, GDF_INVALID_API_CALL);
GDF_REQUIRE(!vertices->valid, GDF_VALIDITY_UNSUPPORTED);

// Initialize Nvgraph and wrap the graph
nvgraphHandle_t nvg_handle = nullptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and is there really a reason we are mixing nv_, nvg_, and nvgraph_ in variable names (e.g. nv_result, nvg_handle, and nvgraph_G)? I am guessing all three are for nvgraph data type variables.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed everything to use nvg_

nvgraphGraphDescr_t nvgraph_G = nullptr;
NVG_TRY(nvgraphCreate(&nvg_handle));
GDF_TRY(gdf_createGraph_nvgraph(nvg_handle, gdf_G, &nvgraph_G, false));

// Create an Nvgraph graph descriptor for the result and initialize
nvgraphGraphDescr_t nv_result = nullptr;
NVG_TRY(nvgraphCreateGraphDescr(nvg_handle, &nv_result));

// Call Nvgraph function to get subgraph (into nv_result descriptor)
NVG_TRY(nvgraphExtractSubgraphByVertex( nvg_handle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to complain about this trivial thing... but in function calls, cugraph code base (mostly) does not place space after opening parenthesis. So better be "(nvg_handle" instead of "( nvg_handle"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

nvgraph_G,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird indentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird indentation

nv_result,
(int*)vertices->data,
vertices->size));

// Get the vertices and edges of the created subgraph to allocate memory:
nvgraphCSRTopology32I_st topo;
topo.source_offsets = nullptr;
topo.destination_indices = nullptr;
nvgraphTopologyType_t TT = NVGRAPH_CSR_32;
NVG_TRY(nvgraphGetGraphStructure(nvg_handle, nv_result, (void*)&topo, &TT));
if (TT != NVGRAPH_CSR_32)
return GDF_C_ERROR;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is not a request to make change, but it seems like you're using tab for indentation. cugraph mostly uses 2 space or 4 space for indentation in C++. We really need a rule for this, but right at this moment, I am not sure I should ask you to use either 2 space or 4 space... so you may just keep using tab or switch to 2 space or 4 space :-(

int num_verts = topo.nvertices;
int num_edges = topo.nedges;
result->adjList = new gdf_adj_list;
result->adjList->offsets = new gdf_column;
result->adjList->indices = new gdf_column;
result->adjList->ownership = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ownership is set to 0, who will free the memory allocated in line 560 and 561?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output of the analytic is a graph. This space is owned by the graph, the user will need to delete the graph when they are done with it.

int *offsets, *indices;
CUDA_TRY(cudaMallocManaged((void**)&offsets, sizeof(int) * (num_verts + 1)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not directly invoke cudaMalloc/cudaMallocManaged. We are allocating memory through RMM, so we need to use RMM_ALLOC instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use RMM

CUDA_TRY(cudaMallocManaged((void**)&indices, sizeof(int) * num_edges));
gdf_column_view(result->adjList->offsets,
offsets,
nullptr,
num_verts + 1,
GDF_INT32);
gdf_column_view(result->adjList->indices,
indices,
nullptr,
num_edges,
GDF_INT32);

// Call nvgraphGetGraphStructure again to copy out the data
topo.source_offsets = (int*)result->adjList->offsets->data;
topo.destination_indices = (int*)result->adjList->indices->data;
NVG_TRY(nvgraphGetGraphStructure(nvg_handle, nv_result, (void*)&topo, &TT));

return GDF_SUCCESS;
}

gdf_error gdf_triangle_count_nvgraph(gdf_graph* G, uint64_t* result) {
GDF_REQUIRE(G != nullptr, GDF_INVALID_API_CALL);
GDF_REQUIRE((G->adjList != nullptr) || (G->edgeList != nullptr), GDF_INVALID_API_CALL);
Expand Down
3 changes: 2 additions & 1 deletion python/cugraph/cugraph.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,5 @@ include "grmat/grmat_wrapper.pyx"
include "louvain/louvain_wrapper.pyx"
include "bfs/bfs_wrapper.pyx"
include "spectral_clustering/spectral_clustering.pyx"
include "triangle_count/triangle_count_wrapper.pyx"
include "subgraph_extraction/subgraph_extraction.pyx"
include "triangle_count/triangle_count_wrapper.pyx"
8 changes: 6 additions & 2 deletions python/cugraph/nvgraph/c_nvgraph.pxd
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,10 @@ cdef extern from "nvgraph_gdf.h":
cdef gdf_error gdf_AnalyzeClustering_ratio_cut_nvgraph(gdf_graph* gdf_G,
const int n_clusters,
gdf_column* clustering,
float* score)
float* score)

cdef gdf_error gdf_extract_subgraph_vertex_nvgraph(gdf_graph* gdf_G,
gdf_graph* result,
gdf_column* vertices)

cdef gdf_error gdf_triangle_count_nvgraph(gdf_graph* G, uint64_t* result)
cdef gdf_error gdf_triangle_count_nvgraph(gdf_graph* G, uint64_t* result)
68 changes: 68 additions & 0 deletions python/cugraph/subgraph_extraction/subgraph_extraction.pyx
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
# Copyright (c) 2019, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from c_nvgraph cimport *
from c_graph cimport *
from libcpp cimport bool
from libc.stdint cimport uintptr_t
from libc.stdlib cimport calloc, malloc, free
from libc.float cimport FLT_MAX_EXP
import cudf
from librmm_cffi import librmm as rmm
import numpy as np

cpdef subgraph(G, vertices):
"""
Compute a clustering/partitioning of the given graph using the spectral balanced
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to be updated to reflect subgraph extraction

cut method.

Parameters
----------
G : cuGraph.Graph
cuGraph graph descriptor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it work both on directed and undirected graphs? We may re-use nvgraph doc
https://docs.nvidia.com/cuda/nvgraph/index.html#function-nvgraphextractsubgraphbyvertex

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works for both directed and undirected graphs. It doesn't actually traverse the edges, so direction is immaterial.

vertices : cudf.Series
Specifies the vertices of the induced subgraph

Returns
-------
Sg : cuGraph.Graph
A graph object containing the subgraph induced by the given vertex set.

Example:
--------
>>> M = ReadMtxFile(graph_file)
>>> sources = cudf.Series(M.row)
>>> destinations = cudf.Series(M.col)
>>> G = cuGraph.Graph()
>>> G.add_edge_list(sources,destinations,None)
>>> verts = numpy.zeros(3, dtype=np.int32)
>>> verts[0] = 0
>>> verts[1] = 1
>>> verts[2] = 2
>>> sverts = cudf.Series(verts)
>>> Sg = cuGraph.subgraph(G, sverts)
"""

cdef uintptr_t graph = G.graph_ptr
cdef gdf_graph * g = < gdf_graph *> graph

resultGraph = Graph()
cdef uintptr_t rGraph = resultGraph.graph_ptr
cdef gdf_graph* rg = <gdf_graph*>rGraph

cdef gdf_column vert_col = get_gdf_column_view(vertices)

err = gdf_extract_subgraph_vertex_nvgraph(g, rg, &vert_col)
cudf.bindings.cudf_cpp.check_gdf_error(err)

return resultGraph
65 changes: 65 additions & 0 deletions python/cugraph/subgraph_extraction/test_subgraph_extraction.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
# Copyright (c) 2019, NVIDIA CORPORATION.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import cugraph
import cudf
import pytest
import numpy as np
import networkx as nx
from scipy.io import mmread

Copy link
Contributor

@seunghwak seunghwak May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you have a strong reason to import in this order, we may better follow import orders in other files.

Import python standard modules first, then thirdparty modules, then rapids modules.

We may alphabetically sort within a group. And if NetworkX folks hasn't fixed this, importing networkx generates deprecation warnings and to avoid this, we need to do

# Temporarily suppress warnings till networkX fixes deprecation warnings
# (Using or importing the ABCs from 'collections' instead of from
# 'collections.abc' is deprecated, and in 3.8 it will stop working) for
# python 3.7.  Also, this import networkx needs to be relocated in the
# third-party group once this gets fixed.
import warnings
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    import networkx as nx

See other test files (e.g. test_sssp.py) for an example.

def compareEdges(cg, nxg, verts):
src, dest = cg.view_edge_list()
if (len(src) != nxg.size()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not

assert len(src) == nxg.size()

?

Similarly in line 27 & 28,

assert nxg.has_edge(verts[src[i]], verts[dest[i]]) == True

Looks clearer to me.

assert False
for i in range(len(src)):
if not nxg.has_edge(verts[src[i]], verts[dest[i]]):
assert False
return True

def ReadMtxFile(mmFile):
print('Reading '+ str(mmFile) + '...')
return mmread(mmFile).asfptype()

def cugraph_call(M, verts):
G = cugraph.Graph()
rows = cudf.Series(M.row)
cols = cudf.Series(M.col)
G.add_edge_list(rows, cols, None)
cu_verts = cudf.Series(verts)
Sg = cugraph.subgraph(G, cu_verts)
return Sg

def nx_call(M, verts):
G = nx.DiGraph(M)
Sg = nx.subgraph(G, verts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the following for PEP8 naming rules.

https://www.python.org/dev/peps/pep-0008/#function-and-variable-names

Sg and cu_Sg violates

"Function names should be lowercase, with words separated by underscores as necessary to improve readability.

Variable names follow the same convention as function names."

unless "mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility." and I don't think mixed case is already the prevailing style in cugraph.

return Sg

datasets = ['/datasets/networks/karate.mtx',
'/datasets/networks/dolphins.mtx',
'/datasets/networks/netscience.mtx']

@pytest.mark.parametrize('graph_file', datasets)
def test_subgraph_extraction(graph_file):
M = ReadMtxFile(graph_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadMtxFile also violates the PEP8 naming rule.

verts = np.zeros(3, dtype=np.int32)
verts[0] = 0
verts[1] = 1
verts[2] = 17
cu_Sg = cugraph_call(M, verts)
nx_Sg = nx_call(M, verts)
assert compareEdges(cu_Sg, nx_Sg, verts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compareEdges also violate the PEP8 naming rule.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed these naming issues