-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Bug ext create column #232
[REVIEW] Bug ext create column #232
Conversation
…ck allocation, these objects are only used to pass the data encapsulated in Python cudf Series objects to C++ functions expecting (pointers to) C++ gdf_column objects and sizeof(gdf_column) is not large enough to blow stack, no need to involve heap allocation overhead and risk memory leak (if forget to free).
…column_view (stack-allocation for gdf_column)
…using ReadMtxFile, fixed this.
…(no spacing between < and >), cudf is inconsistent in placing a space between > and the name of the variable to be casted, so left this part as is.
…, cudf's gdf_column_view does not properly initialize col_name to nullptr, freeing col_name can result in freeing unallocated memory, this problem should be cleaned up once cudf finishes redesigning cudf::column.
…nter will break strict-aliasing rules [-Wstrict-aliasing])
…- 1 with num_vertices(), this is a better abstraction and less vulnerable to low level changes in class Graph
…so, view_transposed_adj_list can better mirror view_adj_list (except for replacing adjList with transposedAdjList)
else: | ||
value = create_column(self.adj_list_value_col) | ||
c_value_col = get_gdf_column_view(self.adj_list_value_col) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a scoping issue here? You are creating an object on the stack inside of a code block and assigning a pointer to it which is used outside of the code block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
283 cdef gdf_column c_value_col
c_value_col is defined in line 238, and it's not used after
291 err = gdf_adj_list_view(<gdf_graph*> graph,
292 &c_offset_col,
293 &c_index_col,
294 c_value_col_ptr)
Please note that the actual column array data are stored in self.adj_list_value_col (a python object), and c_value_col is a temporary C++ wrapper used to pass the column data in python object to a C++ function expecting C++ data type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that c_value_col
is still in the scope when used in gdf_adj_list_view
.
However, there is also a small consistency issue here as c_offset_col
and c_index_col
are accessed from the stack objects while a pointer to the stack object is created for the values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inconsistency is because value is optional (it can be NULL). I can either use value_col_ptr or replicate the gdf_edge_list_view call for the two different cases (if value_col is None and else). I picked to use value_col_ptr.
err = gdf_add_adj_list(g) | ||
cudf.bindings.cudf_cpp.check_gdf_error(err) | ||
|
||
col_size_off = g.adjList.offsets.size | ||
col_size_ind = g.adjList.indices.size | ||
offset_col_size = self.num_vertices() + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to just take the size of the offsets column directly rather than calling self.num_vertices() it's less complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not agree with this. We'd better hide C++ data structure (gdf_graph) internals from the functions using gdf_graph. If we use num_verticies(), it does not matter whether we change gdf_graph internals or not as long as we have num_verticies(). If some change is necessary, we just need to update num_verticies(). We are about to update graph data structures to match NetworkX, and in this case, we need to search the entire code base and replace every g.adjList.offsets.size -1 (and g.transposedAdjList.offset.size if g.adjList is not available and g.transposedAdjList is available) to match the new data structure. This will become more and more problematic as we add more and more graph functions; there basically are more places we need to track.
See Hiding data (and code) in https://herbsutter.com/2013/08/12/gotw-94-solution-aaa-style-almost-always-auto/
Hiding data structure internals is a key principle in object oriented programming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And cudf folks are working on replacing gdf_column with cudf::column to provide better abstraction (rapidsai/cudf#1443).
We need to do something similar for gdf_graph; and I think eventually most (if not all) gdf_graph member variables should become private and should not be directly accessed outside gdf_graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still disagree. While you are right that it is good to hide the internals of the graph object, the issue is that we aren't looking for the number of vertices in the graph, but for the size of the offsets array. Using self.num_vertices() + 1
to get that is still relying on information on the graph object's internals, just doing it in a more roundabout way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your reply!!!
See "The array IA is of length m + 1." from https://en.wikipedia.org/wiki/Sparse_matrix (here, m is the number of vertices).
self.num_vertices() + 1 (or m + 1 if I use notation from the above link) is from the general knowledge about the CSR format, not from the graph structure internals. Hope this helps to address your concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't think this should block the PR, I'm mostly concerned about consistency here. It is now using a mix of direct structure access and accessors. If self.num_vertices()+1
is used to set the size of the offset array, it is best to use a self.num_edges()
for indices arrays too or keep the previous, consistent, solution imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have the same concern. The thing is that we do not have the num_edges() function, and we are planning to do some major restructuring to match the NetworkX API (for the 0.8 release I guess), so I was a bit hesitant to implement the num_edges() function that will be replaced in one month. I decided to defer this work for the 0.8 release but if this inconsistency really bothers (it somewhat bothers me but I can endure if this will last for only short time), I can do the work for this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think I'd better do this now.
I will rename num_vertices() to number_of_nodes() and num_edges() to number_of_edges() to follow NetworkX (https://networkx.github.io/documentation/networkx-1.10/reference/generated/networkx.DiGraph.number_of_nodes.html and https://networkx.github.io/documentation/stable/reference/classes/generated/networkx.Graph.number_of_edges.html).
This will not be too much work and not gonna be entirely replaced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can also live with that as long as these changes are tracked in 0.8 release ;)
err = gdf_add_transposed_adj_list(g) | ||
cudf.bindings.cudf_cpp.check_gdf_error(err) | ||
|
||
off_size = g.transposedAdjList.offsets.size | ||
ind_size = g.transposedAdjList.indices.size | ||
offset_col_size = self.num_vertices() + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, getting the size of the column directly is simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree for the same reason.
python/cugraph/graph/c_graph.pyx
Outdated
off_size = g.transposedAdjList.offsets.size | ||
ind_size = g.transposedAdjList.indices.size | ||
offset_col_size = self.num_vertices() + 1 | ||
inex_col_size = g.transposedAdjList.indices.size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may have meant 'index_col_size' here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!!! I made a fix.
cdef gdf_column c_weight_col | ||
cdef gdf_column c_first_col | ||
cdef gdf_column c_second_col | ||
cdef gdf_column c_indices_col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be c_index_col
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding this!!! and I fixed this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
This PR addresses issues mentioned in rapidsai/raft#221 -- Adds grid stride based fusedL2NN kernel, this gives approx 1.85x speed up over previous version of this kernel. -- Adds support in pairwise dist base class to work for any input size by adding support for grid stride based work distribution. Authors: - Mahesh Doijade (https://github.com/mdoijade) Approvers: - Thejaswi. N. S (https://github.com/teju85) - Divye Gala (https://github.com/divyegala) - Alex Fender (https://github.com/afender) URL: rapidsai/raft#232
After the merge of rapidsai#232, a few different tests failed in rapidsai/cuml#3891, given the timing I think it'd be best to target 232 (again) to 21.08 after triaging the issues. Authors: - Dante Gama Dessavre (https://github.com/dantegd) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Divye Gala (https://github.com/divyegala) - Brad Rees (https://github.com/BradReesWork) URL: rapidsai/raft#246
Closes #186, Closes #189.