-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Subgraph extraction python bindings #246
Conversation
…actSubgraphByVertex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build and Python style check should pass
Changelog should be updated
Looks good if you can make make the checks pass. |
rerun tests |
rerun tests |
|
||
cpdef subgraph(G, vertices): | ||
""" | ||
Compute a clustering/partitioning of the given graph using the spectral balanced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to be updated to reflect subgraph extraction
Parameters | ||
---------- | ||
G : cuGraph.Graph | ||
cuGraph graph descriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it work both on directed and undirected graphs? We may re-use nvgraph doc
https://docs.nvidia.com/cuda/nvgraph/index.html#function-nvgraphextractsubgraphbyvertex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it works for both directed and undirected graphs. It doesn't actually traverse the edges, so direction is immaterial.
* @param vertices Pointer to GDF column object which contains the list of vertices to extract | ||
* @return Error code | ||
*/ | ||
gdf_error gdf_extract_subgraph_vertex_nvgraph(gdf_graph* gdf_G, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as I long as I understand based on comments,
gdf_graph* gdf_g and gdf_column* vertices are input parameters and gdf_graph* result is an output parameter. A function input parameter list having (input, output, input) is unorthodox. It seems like other functions in this file places input parameters before output parameters in the function input parameter lists, we should follow that convention. Also, if this function is not modifying gdf_G and vertices, we should enforce "const correctness".
So the input parameter should be
(const gdf_graph* gdf_G, const gdf_column* vertices, gdf_graph* result).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't make this const gdf_graph* gdf_G. The graph can be changed if there is no CSR representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing vertices to const makes sense. However, that would require changing the implementation in nvgraph. It seems to me like this is more than we want to address now. In fact, we probably should talk about how much we want to change nvgraph other than bug fixes.
cpp/src/nvgraph_gdf.cu
Outdated
@@ -515,6 +515,69 @@ gdf_error gdf_AnalyzeClustering_ratio_cut_nvgraph(gdf_graph* gdf_G, | |||
return GDF_SUCCESS; | |||
} | |||
|
|||
|
|||
gdf_error gdf_extract_subgraph_vertex_nvgraph(gdf_graph* gdf_G, | |||
gdf_graph* result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is weird, here. The input parameter list should be updated to match the suggested changes in the header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converted tabs to spaces in the file. Should address the weird indentation.
cpp/src/nvgraph_gdf.cu
Outdated
|
||
// Call Nvgraph function to get subgraph (into nv_result descriptor) | ||
NVG_TRY(nvgraphExtractSubgraphByVertex( nvg_handle, | ||
nvgraph_G, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird indentation.
cpp/src/nvgraph_gdf.cu
Outdated
|
||
// Call Nvgraph function to get subgraph (into nv_result descriptor) | ||
NVG_TRY(nvgraphExtractSubgraphByVertex( nvg_handle, | ||
nvgraph_G, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird indentation
cpp/src/nvgraph_gdf.cu
Outdated
nvgraphTopologyType_t TT = NVGRAPH_CSR_32; | ||
NVG_TRY(nvgraphGetGraphStructure(nvg_handle, nv_result, (void*)&topo, &TT)); | ||
if (TT != NVGRAPH_CSR_32) | ||
return GDF_C_ERROR; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this is not a request to make change, but it seems like you're using tab for indentation. cugraph mostly uses 2 space or 4 space for indentation in C++. We really need a rule for this, but right at this moment, I am not sure I should ask you to use either 2 space or 4 space... so you may just keep using tab or switch to 2 space or 4 space :-(
cpp/src/nvgraph_gdf.cu
Outdated
result->adjList->indices = new gdf_column; | ||
result->adjList->ownership = 0; | ||
int *offsets, *indices; | ||
CUDA_TRY(cudaMallocManaged((void**)&offsets, sizeof(int) * (num_verts + 1))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not directly invoke cudaMalloc/cudaMallocManaged. We are allocating memory through RMM, so we need to use RMM_ALLOC instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to use RMM
cpp/src/nvgraph_gdf.cu
Outdated
NVG_TRY(nvgraphCreateGraphDescr(nvg_handle, &nv_result)); | ||
|
||
// Call Nvgraph function to get subgraph (into nv_result descriptor) | ||
NVG_TRY(nvgraphExtractSubgraphByVertex( nvg_handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to complain about this trivial thing... but in function calls, cugraph code base (mostly) does not place space after opening parenthesis. So better be "(nvg_handle" instead of "( nvg_handle"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
cpp/src/nvgraph_gdf.cu
Outdated
GDF_REQUIRE(!vertices->valid, GDF_VALIDITY_UNSUPPORTED); | ||
|
||
// Initialize Nvgraph and wrap the graph | ||
nvgraphHandle_t nvg_handle = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and is there really a reason we are mixing nv_, nvg_, and nvgraph_ in variable names (e.g. nv_result, nvg_handle, and nvgraph_G)? I am guessing all three are for nvgraph data type variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed everything to use nvg_
cpp/src/nvgraph_gdf.cu
Outdated
result->adjList = new gdf_adj_list; | ||
result->adjList->offsets = new gdf_column; | ||
result->adjList->indices = new gdf_column; | ||
result->adjList->ownership = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ownership is set to 0, who will free the memory allocated in line 560 and 561?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Output of the analytic is a graph. This space is owned by the graph, the user will need to delete the graph when they are done with it.
import numpy as np | ||
import networkx as nx | ||
from scipy.io import mmread | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless you have a strong reason to import in this order, we may better follow import orders in other files.
Import python standard modules first, then thirdparty modules, then rapids modules.
We may alphabetically sort within a group. And if NetworkX folks hasn't fixed this, importing networkx generates deprecation warnings and to avoid this, we need to do
# Temporarily suppress warnings till networkX fixes deprecation warnings
# (Using or importing the ABCs from 'collections' instead of from
# 'collections.abc' is deprecated, and in 3.8 it will stop working) for
# python 3.7. Also, this import networkx needs to be relocated in the
# third-party group once this gets fixed.
import warnings
with warnings.catch_warnings():
warnings.filterwarnings("ignore", category=DeprecationWarning)
import networkx as nx
See other test files (e.g. test_sssp.py) for an example.
|
||
def compareEdges(cg, nxg, verts): | ||
src, dest = cg.view_edge_list() | ||
if (len(src) != nxg.size()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not
assert len(src) == nxg.size()
?
Similarly in line 27 & 28,
assert nxg.has_edge(verts[src[i]], verts[dest[i]]) == True
Looks clearer to me.
|
||
def nx_call(M, verts): | ||
G = nx.DiGraph(M) | ||
Sg = nx.subgraph(G, verts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the following for PEP8 naming rules.
https://www.python.org/dev/peps/pep-0008/#function-and-variable-names
Sg and cu_Sg violates
"Function names should be lowercase, with words separated by underscores as necessary to improve readability.
Variable names follow the same convention as function names."
unless "mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility." and I don't think mixed case is already the prevailing style in cugraph.
|
||
@pytest.mark.parametrize('graph_file', datasets) | ||
def test_subgraph_extraction(graph_file): | ||
M = ReadMtxFile(graph_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadMtxFile also violates the PEP8 naming rule.
verts[2] = 17 | ||
cu_Sg = cugraph_call(M, verts) | ||
nx_Sg = nx_call(M, verts) | ||
assert compareEdges(cu_Sg, nx_Sg, verts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compareEdges also violate the PEP8 naming rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed these naming issues
After the merge of rapidsai#232, a few different tests failed in rapidsai/cuml#3891, given the timing I think it'd be best to target 232 (again) to 21.08 after triaging the issues. Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) - Brad Rees (https://github.com/BradReesWork) URL: rapidsai/raft#246
Adding in the python bindings for subgraph extraction after the fix in nvgraph was put in. Closes #60