Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Do not store redundant columns in _edge_prop_dataframe and _vertex_prop_dataframe #2400

Closed
VibhuJawa opened this issue Jul 11, 2022 · 0 comments · Fixed by #2449
Closed
Assignees
Labels
improvement Improvement / enhancement to an existing function
Milestone

Comments

@VibhuJawa
Copy link
Member

VibhuJawa commented Jul 11, 2022

Describe the solution you'd like and any additional context

When we add data to Property graph via add_vertex_data and add_edge_data we store the vertex_columns along with the internal _SRC_ and _DST_ . We should drop those to save on memory.

from cugraph.experimental import PropertyGraph
import numpy as np
import cudf

pg = PropertyGraph()
df = cudf.DataFrame({'input_src':np.ones(100_000, dtype=np.int32)*2, 'input_dst':np.arange(100_000), 'edge_w':np.zeros(100_000)})
pg.add_edge_data(df,vertex_col_names=['input_src','input_dst'])
pg._edge_prop_dataframe

pg._edge_prop_dataframe.memory_usage()
_SRC_ | _DST_ | _EDGE_ID_ | _TYPE_ | input_src | input_dst | edge_w
-- | -- | -- | -- | -- | -- | --
2 | 0 | 0 | <NA> | 2 | 0 | 0.0
2 | 1 | 1 | <NA> | 2 | 1 | 0.0
2 | 2 | 2 | <NA> | 2 | 2 | 0.0
2 | 3 | 3 | <NA> | 2 | 3 | 0.0
2 | 4 | 4 | <NA> | 2 | 4 | 0.0
... | ... | ... | ... | ... | ... | ...
2 | 99995 | 99995 | <NA> | 2 | 99995 | 0.0
2 | 99996 | 99996 | <NA> | 2 | 99996 | 0.0
2 | 99997 | 99997 | <NA> | 2 | 99997 | 0.0
2 | 99998 | 99998 | <NA> | 2 | 99998 | 0.0
2 | 99999 | 99999 | <NA> | 2 | 99999 | 0.0

In this example the input_src and input_dst is redundant and we can save 30% of memory by dropping them.

@VibhuJawa VibhuJawa added improvement Improvement / enhancement to an existing function python labels Jul 11, 2022
@rlratzel rlratzel added this to the 22.08 milestone Jul 20, 2022
eriknw added a commit to eriknw/cugraph that referenced this issue Jul 27, 2022
Closes rapidsai#2400

I still need to update MG tests.

I'll also remove the in-code assertions, since they won't always be True,
because a column name could have previously been used as a property.
Nevertheless, seeing these assertions pass should give us warm-fuzzies :)
rapids-bot bot pushed a commit that referenced this issue Aug 2, 2022
The main purpose of this is to reduce memory usage.

Closes #2400

I still need to update MG tests.

I'll also remove the in-code assertions, since they won't always be True, because a column name could have previously been used as a property. Nevertheless, seeing these assertions pass should give us warm-fuzzies :)

Authors:
  - Erik Welch (https://github.com/eriknw)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Brad Rees (https://github.com/BradReesWork)

URL: #2449
@rapids-bot rapids-bot bot closed this as completed in #2449 Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants