Sampling Performance Testing #3584

Merged · 490 commits · Jan 12, 2024
Changes from 32 commits

Commits
c09bb25
bug fix
seunghwak Aug 3, 2023
4edb9ae
Merge branch 'branch-23.08' of github.com:rapidsai/cugraph into bug_mfg
seunghwak Aug 3, 2023
57fb8e5
Merge branch 'bug_mfg' of https://github.com/seunghwak/cugraph into p…
alexbarghi-nv Aug 3, 2023
3b95106
add latest updates
alexbarghi-nv Aug 3, 2023
3269a4f
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Aug 3, 2023
3e009cd
bug fix (when edge list is empty)
seunghwak Aug 3, 2023
622a17a
Merge branch 'branch-23.08' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Aug 3, 2023
e4d7796
add latest updates
alexbarghi-nv Aug 9, 2023
a226a4e
revert cpp changes
alexbarghi-nv Aug 9, 2023
5d3843f
revert plc changes
alexbarghi-nv Aug 9, 2023
36464a9
revert notebook changes
alexbarghi-nv Aug 9, 2023
c5a81c2
Revert logging change
alexbarghi-nv Aug 9, 2023
95a72ab
correction for dataset name
alexbarghi-nv Aug 9, 2023
aebe742
fix for empty batch issue
alexbarghi-nv Aug 14, 2023
449984d
do merge
alexbarghi-nv Aug 14, 2023
bdaa22f
bring in changes
alexbarghi-nv Aug 15, 2023
223dee3
remove redundant filter function
alexbarghi-nv Aug 15, 2023
0c904ae
construct cugraph graph in CSC format
alexbarghi-nv Aug 16, 2023
399976d
fixes for csc, update tests
alexbarghi-nv Aug 16, 2023
6b1169e
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Aug 16, 2023
3c9afc9
style fix, add comment explaining function
alexbarghi-nv Aug 17, 2023
88831a8
Merge branch 'cugraph-pyg-loader-improvements' of https://github.com/…
alexbarghi-nv Aug 17, 2023
246ac33
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Aug 17, 2023
2fe3fe0
improve docstring
alexbarghi-nv Aug 17, 2023
f89a3fb
Merge branch 'cugraph-pyg-loader-improvements' of https://github.com/…
alexbarghi-nv Aug 17, 2023
072b1ff
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Aug 21, 2023
85a9c88
cleanup ahead of conversion to mg
alexbarghi-nv Aug 21, 2023
53b334b
mg work
alexbarghi-nv Aug 21, 2023
f0e9f1f
move sampling related functions in graph_functions.hpp to sampling_fun…
seunghwak Aug 22, 2023
3b1fd23
draft sampling post processing function APIs
seunghwak Aug 22, 2023
5e99823
mg
alexbarghi-nv Aug 23, 2023
7e4d041
resolve merge conflict
alexbarghi-nv Aug 23, 2023
d62f4f0
update to fix hop numbering issue
alexbarghi-nv Aug 24, 2023
67f4d7b
API updates
seunghwak Aug 24, 2023
5a8194e
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Aug 24, 2023
19f66d0
Persist on host memory
alexbarghi-nv Aug 25, 2023
8f521d2
API updates
seunghwak Aug 25, 2023
da3da9b
deprecate the existing renumber_sampled_edgelist function
seunghwak Aug 25, 2023
0b87ee1
combine renumber & compression/sorting functions
seunghwak Aug 25, 2023
9b5950b
minor documentation updates
seunghwak Aug 25, 2023
5fbb177
minor documentation updates
seunghwak Aug 25, 2023
b9611ab
deprecate the existing sampling output renumber function
seunghwak Aug 27, 2023
d1c1440
improvements
alexbarghi-nv Aug 29, 2023
846d3fd
Merge branch 'cugraph-pyg-loader-improvements' of https://github.com/…
alexbarghi-nv Aug 29, 2023
e52c614
split homogeneous/heterogeneous for better performance
alexbarghi-nv Aug 29, 2023
2e5479d
Merge branch 'cugraph-pyg-loader-improvements' of https://github.com/…
alexbarghi-nv Aug 29, 2023
6463445
add e2e test, fix a lot of bugs found by test
alexbarghi-nv Aug 29, 2023
c291110
style fix
alexbarghi-nv Aug 30, 2023
e9d1fcc
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Aug 30, 2023
8f95c79
Merge branch 'cugraph-pyg-loader-improvements' of https://github.com/…
alexbarghi-nv Aug 30, 2023
29aa194
correct docstrings
alexbarghi-nv Aug 30, 2023
99b6f48
Merge branch 'cugraph-pyg-loader-improvements' of https://github.com/…
alexbarghi-nv Aug 30, 2023
ebf0d9c
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Aug 30, 2023
26d48dd
rename sampling convert function
alexbarghi-nv Aug 30, 2023
0069d9d
Merge branch 'cugraph-pyg-loader-improvements' of https://github.com/…
alexbarghi-nv Aug 30, 2023
34d6bdc
update loader with new name
alexbarghi-nv Aug 30, 2023
baa8ea8
add comments to renumbering, clarify deprecation, add warning
alexbarghi-nv Aug 30, 2023
c3ee02b
initial implementation of sampling post processing
seunghwak Aug 31, 2023
04c9105
cuda::std::atomic=>cuda::atomic
seunghwak Aug 31, 2023
bdc840c
update API documentation
seunghwak Aug 31, 2023
8c304b3
add additional input testing
seunghwak Aug 31, 2023
b16a071
replace testing for sampling output post processing
seunghwak Aug 31, 2023
09a38d7
cosmetic updates
seunghwak Aug 31, 2023
82ad8e4
bug fixes
seunghwak Aug 31, 2023
e9b39e4
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 1, 2023
d99b512
Merge branch 'fea_mfg' of https://github.com/seunghwak/cugraph into c…
alexbarghi-nv Sep 1, 2023
c15d580
the c api
alexbarghi-nv Sep 1, 2023
2ac8b86
work
alexbarghi-nv Sep 1, 2023
9135629
fix compile errors
alexbarghi-nv Sep 1, 2023
dfd1cb7
reformat
alexbarghi-nv Sep 1, 2023
6dfd4fe
rename test file from .cu to .cpp
seunghwak Sep 5, 2023
f600520
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 6, 2023
7d5821f
bug fixes
seunghwak Sep 6, 2023
58189ed
add fill wrapper
seunghwak Sep 6, 2023
39db98a
undo adding fill wrapper
seunghwak Sep 6, 2023
98c8e0a
sampling test from .cpp to .cu
seunghwak Sep 6, 2023
687d191
latest perf testing
alexbarghi-nv Sep 7, 2023
c151f95
fix a typo
seunghwak Sep 7, 2023
fc5a4f0
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into fea_mfg
seunghwak Sep 7, 2023
a7d1804
merge
alexbarghi-nv Sep 7, 2023
3cda233
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 7, 2023
0a18cde
do merge
alexbarghi-nv Sep 7, 2023
094aaf9
do not return valid nzd vertices if doubly_compress is false
seunghwak Sep 7, 2023
cf57a6d
bug fix
seunghwak Sep 8, 2023
2b48b7e
test code
seunghwak Sep 8, 2023
79acc8e
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into fea_mfg
seunghwak Sep 8, 2023
11009c6
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 8, 2023
0481bfb
Merge branch 'branch-23.10' into cugraph-sample-convert
alexbarghi-nv Sep 8, 2023
2af9333
Merge branch 'fea_mfg' of https://github.com/seunghwak/cugraph into c…
alexbarghi-nv Sep 8, 2023
23cd2c2
bug fix
seunghwak Sep 8, 2023
6eaf67e
update documentation
seunghwak Sep 8, 2023
4dc0a92
fix c api issues
alexbarghi-nv Sep 11, 2023
2947b33
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 11, 2023
0a2b2b7
C API fixes, Python/PLC API work
alexbarghi-nv Sep 11, 2023
db35940
adjust hop offsets when there is a jump in major vertex IDs between hops
seunghwak Sep 11, 2023
b8b72be
add sort only function
seunghwak Sep 12, 2023
38dd11e
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into fea_mfg
seunghwak Sep 12, 2023
2a799a6
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 12, 2023
c86ceac
various improvements
alexbarghi-nv Sep 12, 2023
37a37bf
Merge branch 'fea_mfg' of https://github.com/seunghwak/cugraph into c…
alexbarghi-nv Sep 12, 2023
002fe93
fix merge conflict
alexbarghi-nv Sep 19, 2023
5051dfc
fix bad merge
alexbarghi-nv Sep 19, 2023
6cdf92b
asdf
alexbarghi-nv Sep 19, 2023
6682cb4
clarifying comments
alexbarghi-nv Sep 19, 2023
0d12a28
t
alexbarghi-nv Sep 19, 2023
f5733f2
latest code
alexbarghi-nv Sep 19, 2023
52e2f57
bug fix
seunghwak Sep 19, 2023
befeb25
Merge branch 'branch-23.10' of github.com:rapidsai/cugraph into bug_o…
seunghwak Sep 19, 2023
8781612
additional bug fix
seunghwak Sep 19, 2023
f92b5f5
add additional checking to detect the previously neglected bugs
seunghwak Sep 19, 2023
2bd93d9
Merge branch 'bug_offsets' of https://github.com/seunghwak/cugraph in…
alexbarghi-nv Sep 19, 2023
3195298
wrap up sg API
alexbarghi-nv Sep 20, 2023
74195cb
test fix, cleanup
alexbarghi-nv Sep 20, 2023
374b103
refactor code into new shared utility
alexbarghi-nv Sep 20, 2023
bd625e3
get mg api working
alexbarghi-nv Sep 20, 2023
b2a4ed1
add offset mg test
alexbarghi-nv Sep 20, 2023
9fb7438
fix renumber map issue in C++
alexbarghi-nv Sep 20, 2023
c770a17
verify new compression formats for sg
alexbarghi-nv Sep 20, 2023
b569563
complete csr/csc tests for both sg/mg
alexbarghi-nv Sep 20, 2023
ab2a185
get the bulk sampler working again
alexbarghi-nv Sep 20, 2023
89a1b33
remove unwanted file
alexbarghi-nv Sep 20, 2023
a9d46ef
fix wrong dataframe issue
alexbarghi-nv Sep 21, 2023
17e9013
update sg bulk sampler tests
alexbarghi-nv Sep 21, 2023
c5543b2
fix mg bulk sampler tests
alexbarghi-nv Sep 21, 2023
6581f47
Merge branch 'branch-23.10' into cugraph-pyg-loader-improvements
alexbarghi-nv Sep 21, 2023
16e83bc
write draft of csr bulk sampler
alexbarghi-nv Sep 21, 2023
1e7098d
overhaul the writer methods
alexbarghi-nv Sep 22, 2023
ae94c35
remove unused method
alexbarghi-nv Sep 22, 2023
7beba4b
style
alexbarghi-nv Sep 22, 2023
16ed5ef
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 22, 2023
79e3cef
remove notebook
alexbarghi-nv Sep 22, 2023
fd5cceb
add clarifying comment to c++
alexbarghi-nv Sep 22, 2023
a47691d
add future warnings
alexbarghi-nv Sep 22, 2023
195d063
cleanup
alexbarghi-nv Sep 22, 2023
0af1750
remove print statements
alexbarghi-nv Sep 22, 2023
d65632c
fix c api bug
alexbarghi-nv Sep 22, 2023
247d8d2
revert dataloader change
alexbarghi-nv Sep 22, 2023
72bebc2
fix empty df bug
alexbarghi-nv Sep 22, 2023
4d51751
style
alexbarghi-nv Sep 22, 2023
9dfa3fa
io
alexbarghi-nv Sep 22, 2023
10c8c1f
fix test failures, remove c++ compression enum
alexbarghi-nv Sep 23, 2023
08cf3e1
remove removed api from mg tests
alexbarghi-nv Sep 23, 2023
897e6d6
change to future warning
alexbarghi-nv Sep 23, 2023
bb5e621
resolve checking issues
alexbarghi-nv Sep 23, 2023
d20e593
Merge branch 'cugraph-pyg-loader-improvements' into cugraph-pyg-mfg
alexbarghi-nv Sep 23, 2023
eb3aadc
fix wrong index + off by 1 error, add check in test
alexbarghi-nv Sep 25, 2023
a124964
Merge branch 'branch-23.10' into cugraph-sample-convert
alexbarghi-nv Sep 25, 2023
6990c23
add annotations
alexbarghi-nv Sep 25, 2023
920bed7
docstring correction
alexbarghi-nv Sep 25, 2023
f8df56f
remove empty batch check
alexbarghi-nv Sep 25, 2023
ef2ec5b
fix capi sg test
alexbarghi-nv Sep 25, 2023
8e22ab9
disable broken tests, they are too expensive to fix and redundant
alexbarghi-nv Sep 25, 2023
13bdd43
Merge branch 'cugraph-sample-convert' of https://github.com/alexbargh…
alexbarghi-nv Sep 25, 2023
c48a14b
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 25, 2023
cf612c7
update c code
alexbarghi-nv Sep 25, 2023
09a3bd8
Merge branch 'branch-23.10' into cugraph-pyg-mfg
alexbarghi-nv Sep 26, 2023
140b6e4
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 27, 2023
e4544b6
Merge branch 'branch-23.10' into cugraph-sample-convert
alexbarghi-nv Sep 27, 2023
0ee3798
Resolve merge conflict
alexbarghi-nv Sep 27, 2023
6212869
fix bad merge
alexbarghi-nv Sep 27, 2023
0f1a144
initial rewrite
alexbarghi-nv Sep 27, 2023
b369e97
fixes, more testing
alexbarghi-nv Sep 27, 2023
13be49c
fix issue with num nodes and edges
alexbarghi-nv Sep 27, 2023
185143c
e2e smoke test
alexbarghi-nv Sep 28, 2023
99efb9c
Merge branch 'branch-23.10' into cugraph-pyg-mfg
alexbarghi-nv Sep 28, 2023
bc1f30b
Merge branch 'cugraph-sample-convert' into perf-testing-v2
alexbarghi-nv Sep 28, 2023
9ea6c6b
Merge branch 'branch-23.10' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Sep 28, 2023
a127643
Merge branch 'cugraph-pyg-mfg' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Sep 28, 2023
262d1da
fix test column name issues
alexbarghi-nv Sep 29, 2023
7a05c10
Merge branch 'branch-23.10' into cugraph-pyg-mfg
alexbarghi-nv Sep 29, 2023
c440f64
resolve merge conflicts
alexbarghi-nv Sep 29, 2023
d0d0cb2
copyright
alexbarghi-nv Sep 29, 2023
b4e6d06
testing
alexbarghi-nv Sep 29, 2023
20f138c
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Sep 29, 2023
7e770ad
debugging
alexbarghi-nv Sep 29, 2023
4ac962d
perf testing
alexbarghi-nv Oct 2, 2023
55b4e84
regex
alexbarghi-nv Nov 15, 2023
0fd367a
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Nov 15, 2023
894831e
update to latest
alexbarghi-nv Nov 15, 2023
3cad3f2
fixes
alexbarghi-nv Nov 15, 2023
912d6ca
node loader
alexbarghi-nv Nov 29, 2023
ea60f94
Merge branch 'branch-23.12' of https://github.com/rapidsai/cugraph in…
alexbarghi-nv Nov 29, 2023
9972619
finish patch
alexbarghi-nv Nov 29, 2023
1c401d1
merge latest
alexbarghi-nv Dec 1, 2023
02c7210
bulk sampling
alexbarghi-nv Dec 1, 2023
b67d5ed
perf testing
alexbarghi-nv Dec 5, 2023
da389e0
minor fixes
alexbarghi-nv Dec 6, 2023
e29b4e8
get the native workflow working
alexbarghi-nv Dec 6, 2023
d358257
wrap up first version of cugraph trainer
alexbarghi-nv Dec 7, 2023
e08c46c
remove stats file
alexbarghi-nv Dec 7, 2023
a9fc5af
Fixes
alexbarghi-nv Dec 8, 2023
49094db
x
alexbarghi-nv Dec 12, 2023
b8e2354
output multiple epochs, train/test/val
alexbarghi-nv Dec 12, 2023
0fd156b
remove unwanted file
alexbarghi-nv Dec 12, 2023
663febe
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Dec 12, 2023
2a3ee5a
revert file
alexbarghi-nv Dec 12, 2023
b424e7c
remove unwanted file
alexbarghi-nv Dec 12, 2023
b727fcb
remove cmake files
alexbarghi-nv Dec 12, 2023
d37f0d7
train/test
alexbarghi-nv Dec 12, 2023
d0ca16b
reformat
alexbarghi-nv Dec 12, 2023
06dc14d
add scripts
alexbarghi-nv Dec 13, 2023
a5f1b67
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Dec 13, 2023
ad83725
reorganize, add scripts
alexbarghi-nv Dec 13, 2023
e3d28a6
init
alexbarghi-nv Dec 13, 2023
d15a4d4
update
alexbarghi-nv Dec 14, 2023
70a509a
Merge branch 'pyg-nightly-input-nodes-fix' of https://github.com/alex…
alexbarghi-nv Dec 14, 2023
ecc2db1
cugraph
alexbarghi-nv Dec 26, 2023
726c81d
loader debug
alexbarghi-nv Dec 26, 2023
c095769
fix small bugs in cugraph-pyg
alexbarghi-nv Dec 26, 2023
4be1875
c
alexbarghi-nv Dec 26, 2023
59f030d
fix fanout issues
alexbarghi-nv Dec 26, 2023
4bc7f90
remove experimental warnings
alexbarghi-nv Dec 27, 2023
a58d358
remove test files
alexbarghi-nv Dec 27, 2023
318212d
data preprocessing
alexbarghi-nv Dec 27, 2023
68ca511
commit
alexbarghi-nv Dec 27, 2023
dbbd791
Merge branch 'dlfw-patch-24.01' of https://github.com/alexbarghi-nv/c…
alexbarghi-nv Dec 27, 2023
d47c3ba
comment
alexbarghi-nv Dec 27, 2023
367c79c
fixing issues impacting accuracy
alexbarghi-nv Dec 29, 2023
ac1cfbd
add readme
alexbarghi-nv Dec 29, 2023
cc2635b
refactor
alexbarghi-nv Dec 29, 2023
f1ce3e1
Fix mixed experimental import
alexbarghi-nv Dec 29, 2023
e38fe66
update readme
alexbarghi-nv Dec 29, 2023
f3f68bd
update readme
alexbarghi-nv Dec 29, 2023
d2734c4
fix environment variables
alexbarghi-nv Dec 29, 2023
7222cba
remove unwanted file
alexbarghi-nv Dec 29, 2023
c2e8520
minor change to avoid timeout
alexbarghi-nv Dec 29, 2023
a4dad32
remove stats file
alexbarghi-nv Jan 3, 2024
2109bfb
Merge branch 'perf-testing-v2' of https://github.com/alexbarghi-nv/cu…
alexbarghi-nv Jan 3, 2024
6358f9b
switch versions of simple distributed graph for 24.02
alexbarghi-nv Jan 3, 2024
3898cb2
remove test python file
alexbarghi-nv Jan 3, 2024
3f266f5
remove mg utils dir
alexbarghi-nv Jan 3, 2024
864e55e
wait for workers
alexbarghi-nv Jan 3, 2024
67d6aa0
reformat
alexbarghi-nv Jan 3, 2024
78fc260
add copyrights
alexbarghi-nv Jan 3, 2024
d81a9a8
fix wrong file
alexbarghi-nv Jan 3, 2024
16f225a
remove stats file
alexbarghi-nv Jan 3, 2024
259ec47
Merge branch 'branch-24.02' into perf-testing-v2
alexbarghi-nv Jan 5, 2024
18571fe
fix copyright
alexbarghi-nv Jan 5, 2024
40502de
split off feature transfer time
alexbarghi-nv Jan 5, 2024
ea46748
style
alexbarghi-nv Jan 5, 2024
61f30a2
Merge branch 'branch-24.02' into perf-testing-v2
alexbarghi-nv Jan 5, 2024
89ac530
fixes to scripts
alexbarghi-nv Jan 8, 2024
77b0788
compatibility issues
alexbarghi-nv Jan 8, 2024
4e2a706
reset file
alexbarghi-nv Jan 8, 2024
18e43de
c
alexbarghi-nv Jan 8, 2024
c4c45db
copyright
alexbarghi-nv Jan 8, 2024
8ea5c92
whitespace
alexbarghi-nv Jan 8, 2024
441810c
set nthreads to 8
alexbarghi-nv Jan 9, 2024
c053ed0
Merge branch 'branch-24.02' into perf-testing-v2
alexbarghi-nv Jan 9, 2024
3039843
Merge branch 'branch-24.02' into perf-testing-v2
alexbarghi-nv Jan 11, 2024
218 changes: 158 additions & 60 deletions python/cugraph-pyg/cugraph_pyg/data/cugraph_store.py
@@ -25,6 +25,7 @@
import pandas
import cudf
import cugraph
import warnings

from cugraph.utilities.utils import import_optional, MissingModule

@@ -211,7 +212,9 @@ def __init__(
F: cugraph.gnn.FeatureStore,
G: Union[Dict[str, Tuple[TensorType]], Dict[str, int]],
num_nodes_dict: Dict[str, int],
*,
multi_gpu: bool = False,
order: str = "CSC",
):
"""
Constructs a new CuGraphStore from the provided
@@ -256,11 +259,20 @@ def __init__(
multi_gpu: bool (Optional, default = False)
Whether the store should be backed by a multi-GPU graph.
Requires dask to have been set up.
order: str (Optional ["CSR", "CSC"], default = CSC)
The order to use for sampling. Should nearly always be CSC
unless there is a specific expectation of "reverse" sampling.
It is also not uncommon to use CSR order for correctness
testing, which some cuGraph-PyG tests do.
"""

if None in G:
raise ValueError("Unspecified edge types not allowed in PyG")

if order != "CSR" and order != "CSC":
raise ValueError("invalid valid for order")

self.__vertex_dtype = torch.int64

self._tensor_attr_cls = CuGraphTensorAttr
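
A minimal sketch of how a caller picks the sampling order introduced above. The store and feature-store names mirror the tests in this PR; the toy graph, feature values, and edge list below are hypothetical.

import torch
from cugraph.gnn import FeatureStore
from cugraph_pyg.data import CuGraphStore

F = FeatureStore()
F.add_data(torch.arange(6, dtype=torch.float32), "t0", "x")  # per-node feature "x"

# One homogeneous edge type, given as [src, dst] tensors (hypothetical edges).
G = {("t0", "knows", "t0"): [torch.tensor([0, 1, 2]), torch.tensor([3, 4, 5])]}
N = {"t0": 6}

# order="CSC" (the default) flips src/dst internally so cuGraph samples over the
# CSC matrix; the tests in this PR pass order="CSR" to keep the original edge
# direction for correctness checks.
store = CuGraphStore(F, G, N, multi_gpu=False, order="CSC")
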
@@ -289,6 +301,7 @@ def __init__(
self.__features = F
self.__graph = None
self.__is_graph_owner = False
self.__order = order

if construct_graph:
if multi_gpu:
@@ -297,7 +310,9 @@ def __init__(
)

if self.__graph is None:
self.__graph = self.__construct_graph(G, multi_gpu=multi_gpu)
self.__graph = self.__construct_graph(
G, multi_gpu=multi_gpu, order=order
)
self.__is_graph_owner = True

self.__subgraphs = {}
@@ -347,6 +362,7 @@ def __construct_graph(
self,
edge_info: Dict[Tuple[str, str, str], List[TensorType]],
multi_gpu: bool = False,
order: str = "CSC",
) -> cugraph.MultiGraph:
"""
This function takes edge information and uses it to construct
@@ -363,6 +379,14 @@ def __construct_graph(
multi_gpu: bool (Optional, default=False)
Whether to construct a single-GPU or multi-GPU cugraph Graph.
Defaults to a single-GPU graph.
order: str (CSC or CSR)
Essentially whether to reverse edges so that the cuGraph
sampling algorithm operates on the CSC matrix instead of
the CSR matrix. Should nearly always be CSC unless there
is a specific expectation of reverse sampling, or correctness
testing is being performed.
Returns
-------
A newly-constructed directed cugraph.MultiGraph object.
@@ -371,6 +395,9 @@ def __construct_graph(
# Ensure the original dict is not modified.
edge_info_cg = {}

if order != "CSR" and order != "CSC":
raise ValueError("Order must be either CSC (default) or CSR!")

# Iterate over the keys in sorted order so that the created
# numerical types correspond to the lexicographic order
# of the keys, which is critical to converting the numeric
@@ -430,20 +457,43 @@ def __construct_graph(

df = pandas.DataFrame(
{
"src": pandas.Series(na_src),
"dst": pandas.Series(na_dst),
"src": pandas.Series(na_dst)
if order == "CSC"
else pandas.Series(na_src),
"dst": pandas.Series(na_src)
if order == "CSC"
else pandas.Series(na_dst),
"etp": pandas.Series(na_etp),
}
)
vertex_dtype = df.src.dtype

if multi_gpu:
nworkers = len(distributed.get_client().scheduler_info()["workers"])
df = dd.from_pandas(df, npartitions=nworkers).persist()
df = df.map_partitions(cudf.DataFrame.from_pandas)
else:
df = cudf.from_pandas(df)
df = dd.from_pandas(df, npartitions=nworkers if len(df) > 32 else 1)

# Ensure the dataframe is constructed on each partition
# instead of adding additional synchronization overhead from potential
# host to device copies.
def get_empty_df():
return cudf.DataFrame(
{
"src": cudf.Series([], dtype=vertex_dtype),
"dst": cudf.Series([], dtype=vertex_dtype),
"etp": cudf.Series([], dtype="int32"),
}
)

df = df.reset_index(drop=True)
# Have to check for empty partitions and handle them appropriately
df = df.persist()
df = df.map_partitions(
lambda f: cudf.DataFrame.from_pandas(f)
if len(f) > 0
else get_empty_df(),
meta=get_empty_df(),
).reset_index(drop=True)
else:
df = cudf.from_pandas(df).reset_index(drop=True)

graph = cugraph.MultiGraph(directed=True)
if multi_gpu:
@@ -468,6 +518,10 @@ def __construct_graph(
def _edge_types_to_attrs(self) -> dict:
return dict(self.__edge_types_to_attrs)

@property
def order(self) -> str:
return self.__order

@property
def node_types(self) -> List[NodeType]:
return list(self.__vertex_type_offsets["type"])
@@ -557,6 +611,7 @@ def _get_edge_index(self, attr: CuGraphEdgeAttr) -> Tuple[TensorType, TensorType
raise ValueError("Graph is not in memory, cannot access edge index!")

if attr.layout != EdgeLayout.COO:
# TODO support returning CSR/CSC (Issue #3802)
raise TypeError("Only COO direct access is supported!")

# Currently, graph creation enforces that input vertex ids are always of
@@ -566,12 +621,14 @@ def _get_edge_index(self, attr: CuGraphEdgeAttr) -> Tuple[TensorType, TensorType
# This may change in the future if/when renumbering or the graph
# creation process is refactored.
# See Issue #3201 for more details.
# Also note src/dst are flipped so that cuGraph sampling is done in
# CSC format rather than CSR format.
if self._is_delayed:
src_col_name = self.__graph.renumber_map.renumbered_src_col_name
dst_col_name = self.__graph.renumber_map.renumbered_dst_col_name
dst_col_name = self.__graph.renumber_map.renumbered_src_col_name
src_col_name = self.__graph.renumber_map.renumbered_dst_col_name
else:
src_col_name = self.__graph.srcCol
dst_col_name = self.__graph.dstCol
dst_col_name = self.__graph.srcCol
src_col_name = self.__graph.dstCol

# If there is only one edge type (homogeneous graph) then
# bypass the edge filters for a significant speed improvement.
@@ -785,29 +842,73 @@ def _get_renumbered_edge_groups_from_sample(
"""
row_dict = {}
col_dict = {}
if len(self.__edge_types_to_attrs) == 1:
# If there is only 1 edge type (includes heterogeneous graphs)
if len(self.edge_types) == 1:
t_pyg_type = list(self.__edge_types_to_attrs.values())[0].edge_type
src_type, _, dst_type = t_pyg_type

dst_id_table = noi_index[dst_type]
dst_id_map = (
cudf.Series(cupy.asarray(dst_id_table), name="dst")
.reset_index()
.rename(columns={"index": "new_id"})
.set_index("dst")
)
dst = dst_id_map["new_id"].loc[sampling_results.destinations]
col_dict[t_pyg_type] = torch.as_tensor(dst.values, device="cuda")

src_id_table = noi_index[src_type]
src_id_map = (
cudf.Series(cupy.asarray(src_id_table), name="src")
.reset_index()
.rename(columns={"index": "new_id"})
.set_index("src")
)
src = src_id_map["new_id"].loc[sampling_results.sources]
row_dict[t_pyg_type] = torch.as_tensor(src.values, device="cuda")
# If there is only 1 node type (homogeneous)
# This should only occur if the cuGraph loader was
# not used. This logic is deprecated.
if len(self.node_types) == 1:
warnings.warn(
"Renumbering after sampling for homogeneous graphs is deprecated.",
FutureWarning,
)

# Create a dataframe mapping old ids to new ids.
vtype = src_type
id_table = noi_index[vtype]
id_map = cudf.Series(
cupy.arange(id_table.shape[0], dtype="int32"),
name="new_id",
index=cupy.asarray(id_table),
).sort_index()

# Renumber the sources using binary search
# Step 1: get the index of the new id
ix_r = torch.searchsorted(
torch.as_tensor(id_map.index.values, device="cuda"),
torch.as_tensor(sampling_results.sources.values, device="cuda"),
)
# Step 2: Go from id indices to actual ids
row_dict[t_pyg_type] = torch.as_tensor(id_map.values, device="cuda")[
ix_r
]

# Renumber the destinations using binary search
# Step 1: get the index of the new id
ix_c = torch.searchsorted(
torch.as_tensor(id_map.index.values, device="cuda"),
torch.as_tensor(
sampling_results.destinations.values, device="cuda"
),
)
# Step 2: Go from id indices to actual ids
col_dict[t_pyg_type] = torch.as_tensor(id_map.values, device="cuda")[
ix_c
]
else:
# Handle the heterogeneous case where there is only 1 edge type
dst_id_table = noi_index[dst_type]
dst_id_map = cudf.DataFrame(
{
"dst": cupy.asarray(dst_id_table),
"new_id": cupy.arange(dst_id_table.shape[0]),
}
).set_index("dst")
dst = dst_id_map["new_id"].loc[sampling_results.destinations]
col_dict[t_pyg_type] = torch.as_tensor(dst.values, device="cuda")

src_id_table = noi_index[src_type]
src_id_map = cudf.DataFrame(
{
"src": cupy.asarray(src_id_table),
"new_id": cupy.arange(src_id_table.shape[0]),
}
).set_index("src")
src = src_id_map["new_id"].loc[sampling_results.sources]
row_dict[t_pyg_type] = torch.as_tensor(src.values, device="cuda")

else:
# This will retrieve the single string representation.
@@ -822,36 +923,18 @@ def _get_renumbered_edge_groups_from_sample(

for pyg_can_edge_type_str, ix in eoi_types.items():
pyg_can_edge_type = tuple(pyg_can_edge_type_str.split("__"))
src_type, _, dst_type = pyg_can_edge_type

# Get the de-offsetted sources
sources = torch.as_tensor(
sampling_results.sources.iloc[ix].values, device="cuda"
)
sources_ix = torch.searchsorted(
self.__vertex_type_offsets["stop"], sources
)
sources -= self.__vertex_type_offsets["start"][sources_ix]

# Create the row entry for this type
src_id_table = noi_index[src_type]
src_id_map = (
cudf.Series(cupy.asarray(src_id_table), name="src")
.reset_index()
.rename(columns={"index": "new_id"})
.set_index("src")
)
src = src_id_map["new_id"].loc[cupy.asarray(sources)]
row_dict[pyg_can_edge_type] = torch.as_tensor(src.values, device="cuda")
if self.__order == "CSR":
src_type, _, dst_type = pyg_can_edge_type
else: # CSC
dst_type, _, src_type = pyg_can_edge_type

# Get the de-offsetted destinations
dst_num_type = self._numeric_vertex_type_from_name(dst_type)
destinations = torch.as_tensor(
sampling_results.destinations.iloc[ix].values, device="cuda"
)
destinations_ix = torch.searchsorted(
self.__vertex_type_offsets["stop"], destinations
)
destinations -= self.__vertex_type_offsets["start"][destinations_ix]
destinations -= self.__vertex_type_offsets["start"][dst_num_type]

# Create the col entry for this type
dst_id_table = noi_index[dst_type]
@@ -864,6 +947,24 @@ def _get_renumbered_edge_groups_from_sample(
dst = dst_id_map["new_id"].loc[cupy.asarray(destinations)]
col_dict[pyg_can_edge_type] = torch.as_tensor(dst.values, device="cuda")

# Get the de-offsetted sources
src_num_type = self._numeric_vertex_type_from_name(src_type)
sources = torch.as_tensor(
sampling_results.sources.iloc[ix].values, device="cuda"
)
sources -= self.__vertex_type_offsets["start"][src_num_type]

# Create the row entry for this type
src_id_table = noi_index[src_type]
src_id_map = (
cudf.Series(cupy.asarray(src_id_table), name="src")
.reset_index()
.rename(columns={"index": "new_id"})
.set_index("src")
)
src = src_id_map["new_id"].loc[cupy.asarray(sources)]
row_dict[pyg_can_edge_type] = torch.as_tensor(src.values, device="cuda")

return row_dict, col_dict

def put_tensor(self, tensor, attr) -> None:
@@ -959,9 +1060,7 @@ def _get_tensor(self, attr: CuGraphTensorAttr) -> TensorType:
t = t[-1]

if isinstance(t, np.ndarray):
t = torch.as_tensor(t, device="cuda")
else:
t = t.cuda()
t = torch.as_tensor(t, device="cpu")

return t

@@ -979,7 +1078,6 @@ def _get_tensor(self, attr: CuGraphTensorAttr) -> TensorType:

t = torch.concatenate([t, u])

t = t.cuda()
return t

def _multi_get_tensor(self, attrs: List[CuGraphTensorAttr]) -> List[TensorType]:
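
The renumbering rewrite above in _get_renumbered_edge_groups_from_sample replaces the cudf join-based lookup with a two-step torch.searchsorted lookup. A minimal sketch of that technique with made-up IDs, run on the CPU for simplicity (the PR code keeps all tensors on the GPU):

import torch

# Hypothetical renumber table: sorted old vertex IDs and the new ID assigned to each.
old_ids = torch.tensor([3, 7, 12, 20])     # sorted old IDs (the map's index)
new_ids = torch.tensor([2, 0, 3, 1])       # new ID for each old ID

# Sampled sources, still expressed in old IDs.
sources = torch.tensor([7, 3, 20, 7])

# Step 1: binary-search the position of each old ID in the sorted table.
ix = torch.searchsorted(old_ids, sources)  # -> tensor([1, 0, 3, 1])
# Step 2: gather the new IDs at those positions.
renumbered = new_ids[ix]                   # -> tensor([0, 2, 1, 0])
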
107 changes: 76 additions & 31 deletions python/cugraph-pyg/cugraph_pyg/loader/cugraph_node_loader.py
@@ -23,12 +23,15 @@
from cugraph.utilities.utils import import_optional, MissingModule

from cugraph_pyg.data import CuGraphStore
from cugraph_pyg.loader.filter import _filter_cugraph_store
from cugraph_pyg.sampler.cugraph_sampler import _sampler_output_from_sampling_results
from cugraph_pyg.sampler.cugraph_sampler import (
_sampler_output_from_sampling_results_heterogeneous,
_sampler_output_from_sampling_results_homogeneous,
)

from typing import Union, Tuple, Sequence, List, Dict

torch_geometric = import_optional("torch_geometric")
torch = import_optional("torch")
InputNodes = (
Sequence
if isinstance(torch_geometric, MissingModule)
@@ -253,55 +256,97 @@ def __next__(self):

raw_sample_data = cudf.read_parquet(parquet_path)
if "map" in raw_sample_data.columns:
self.__renumber_map = raw_sample_data["map"]
num_batches = end_inclusive - self.__start_inclusive + 1

map_end = raw_sample_data["map"].iloc[num_batches]

map = torch.as_tensor(
raw_sample_data["map"].iloc[0:map_end], device="cuda"
)
raw_sample_data.drop("map", axis=1, inplace=True)

self.__renumber_map_offsets = map[0 : num_batches + 1] - map[0]
self.__renumber_map = map[num_batches + 1 :]

else:
self.__renumber_map = None

self.__data = raw_sample_data[list(columns.keys())].astype(columns)
self.__data.dropna(inplace=True)

if (
len(self.__graph_store.edge_types) == 1
and len(self.__graph_store.node_types) == 1
):
group_cols = ["batch_id", "hop_id"]
self.__data_index = self.__data.groupby(group_cols, as_index=True).agg(
{"sources": "max", "destinations": "max"}
)
self.__data_index.rename(
columns={"sources": "src_max", "destinations": "dst_max"},
inplace=True,
)
self.__data_index = self.__data_index.to_dict(orient="index")

# Pull the next set of sampling results out of the dataframe in memory
f = self.__data["batch_id"] == self.__next_batch
if self.__renumber_map is not None:
i = self.__next_batch - self.__start_inclusive
ix = self.__renumber_map.iloc[[i, i + 1]]
ix_start, ix_end = ix.iloc[0], ix.iloc[1]
current_renumber_map = self.__renumber_map.iloc[ix_start:ix_end]
if len(current_renumber_map) != ix_end - ix_start:
raise ValueError("invalid renumber map")
else:
current_renumber_map = None

sampler_output = _sampler_output_from_sampling_results(
self.__data[f], current_renumber_map, self.__graph_store
)
# this should avoid d2h copy
current_renumber_map = self.__renumber_map[
self.__renumber_map_offsets[i] : self.__renumber_map_offsets[i + 1]
]

# Get ready for next iteration
self.__next_batch += 1
else:
current_renumber_map = None

# Get and return the sampled subgraph
if isinstance(torch_geometric, MissingModule):
noi_index, row_dict, col_dict, edge_dict = sampler_output["out"]
return _filter_cugraph_store(
self.__feature_store,
if (
len(self.__graph_store.edge_types) == 1
and len(self.__graph_store.node_types) == 1
):
sampler_output = _sampler_output_from_sampling_results_homogeneous(
self.__data[f],
current_renumber_map,
self.__graph_store,
noi_index,
row_dict,
col_dict,
edge_dict,
self.__data_index,
self.__next_batch,
)
else:
out = torch_geometric.loader.utils.filter_custom_store(
self.__feature_store,
self.__graph_store,
sampler_output.node,
sampler_output.row,
sampler_output.col,
sampler_output.edge,
sampler_output = _sampler_output_from_sampling_results_heterogeneous(
self.__data[f], current_renumber_map, self.__graph_store
)

return out
# Get ready for next iteration
self.__next_batch += 1

# Create a PyG HeteroData object, loading the required features
out = torch_geometric.loader.utils.filter_custom_store(
self.__feature_store,
self.__graph_store,
sampler_output.node,
sampler_output.row,
sampler_output.col,
sampler_output.edge,
)

# Account for CSR format in cuGraph vs. CSC format in PyG
if self.__graph_store.order == "CSC":
for node_type in out.edge_index_dict:
out[node_type].edge_index[0], out[node_type].edge_index[1] = (
out[node_type].edge_index[1],
out[node_type].edge_index[0],
)

out.set_value_dict("num_sampled_nodes", sampler_output.num_sampled_nodes)
out.set_value_dict("num_sampled_edges", sampler_output.num_sampled_edges)

return out

@property
def _starting_batch_id(self):
return self.__starting_batch_id

def __iter__(self):
return self
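
The loader changes above decode the flattened "map" column in the sampler's parquet output: the first num_batches + 1 entries are offsets, and the remainder is the concatenation of every batch's renumber map, so each batch's map becomes a cheap slice instead of a host round trip. A minimal sketch of that decoding with hypothetical values:

import torch

num_batches = 2
# Layout: [offsets (num_batches + 1 entries), concatenated per-batch renumber maps]
map_col = torch.tensor([3, 7, 10,       # absolute offsets into the column
                        5, 9, 2, 1,     # renumber map for batch 0
                        8, 4, 0])       # renumber map for batch 1

offsets = map_col[0 : num_batches + 1] - map_col[0]  # -> tensor([0, 4, 7])
flat_map = map_col[num_batches + 1 :]                # -> tensor([5, 9, 2, 1, 8, 4, 0])

i = 1                                                # second batch
batch_map = flat_map[offsets[i] : offsets[i + 1]]    # -> tensor([8, 4, 0])
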
57 changes: 0 additions & 57 deletions python/cugraph-pyg/cugraph_pyg/loader/filter.py

This file was deleted.

281 changes: 174 additions & 107 deletions python/cugraph-pyg/cugraph_pyg/sampler/cugraph_sampler.py
@@ -12,26 +12,21 @@
# limitations under the License.


from typing import Sequence
from typing import Sequence, Dict, Tuple

from cugraph_pyg.data import CuGraphStore

from cugraph.utilities.utils import import_optional, MissingModule
from cugraph.utilities.utils import import_optional
import cudf

dask_cudf = import_optional("dask_cudf")
torch_geometric = import_optional("torch_geometric")

torch = import_optional("torch")
HeteroSamplerOutput = torch_geometric.sampler.base.HeteroSamplerOutput

HeteroSamplerOutput = (
None
if isinstance(torch_geometric, MissingModule)
else torch_geometric.sampler.base.HeteroSamplerOutput
)


def _count_unique_nodes(
def _get_unique_nodes(
sampling_results: cudf.DataFrame,
graph_store: CuGraphStore,
node_type: str,
@@ -54,8 +49,8 @@ def _count_unique_nodes(
Returns
-------
int
The number of unique nodes of the given node type.
cudf.Series
The unique nodes of the given node type.
"""
if node_position == "src":
edge_index = "sources"
@@ -78,12 +73,111 @@ def _count_unique_nodes(

sampling_results_node = sampling_results[f]
else:
return 0
return cudf.Series([], dtype="int64")

return sampling_results_node[edge_index].nunique()
return sampling_results_node[edge_index]


def _sampler_output_from_sampling_results(
def _sampler_output_from_sampling_results_homogeneous(
sampling_results: cudf.DataFrame,
renumber_map: torch.Tensor,
graph_store: CuGraphStore,
data_index: Dict[Tuple[int, int], Dict[str, int]],
batch_id: int,
metadata: Sequence = None,
) -> HeteroSamplerOutput:
"""
Parameters
----------
sampling_results: cudf.DataFrame
The dataframe containing sampling results.
renumber_map: torch.Tensor
The tensor containing the renumber map, or None if there
is no renumber map.
graph_store: CuGraphStore
The graph store containing the structure of the sampled graph.
data_index: Dict[Tuple[int, int], Dict[str, int]]
Dictionary where keys are the batch id and hop id,
and values are dictionaries containing the max src
and max dst node ids for the batch and hop.
batch_id: int
The current batch id, whose samples are being retrieved
from the sampling results and data index.
metadata: Tensor
The metadata for the sampled batch.
Returns
-------
HeteroSamplerOutput
"""

if len(graph_store.edge_types) > 1 or len(graph_store.node_types) > 1:
raise ValueError("Graph is heterogeneous")

hops = torch.arange(
sampling_results.hop_id.iloc[len(sampling_results) - 1] + 1, device="cuda"
)
hops = torch.searchsorted(
torch.as_tensor(sampling_results.hop_id, device="cuda"), hops
)

node_type = graph_store.node_types[0]
edge_type = graph_store.edge_types[0]

num_nodes_per_hop_dict = {node_type: torch.zeros(len(hops) + 1, dtype=torch.int64)}
num_edges_per_hop_dict = {edge_type: torch.zeros(len(hops), dtype=torch.int64)}

if renumber_map is None:
raise ValueError("Renumbered input is expected for homogeneous graphs")

noi_index = {node_type: torch.as_tensor(renumber_map, device="cuda")}

row_dict = {
edge_type: torch.as_tensor(sampling_results.sources, device="cuda"),
}

col_dict = {
edge_type: torch.as_tensor(sampling_results.destinations, device="cuda"),
}

num_nodes_per_hop_dict[node_type][0] = data_index[batch_id, 0]["src_max"] + 1
for hop in range(len(hops)):
hop_ix_start = hops[hop]
hop_ix_end = hops[hop + 1] if hop < len(hops) - 1 else len(sampling_results)

if num_nodes_per_hop_dict[node_type][hop] > 0:
max_id_hop = data_index[batch_id, hop]["dst_max"]
max_id_prev_hop = (
data_index[batch_id, hop - 1]["dst_max"]
if hop > 0
else data_index[batch_id, 0]["src_max"]
)

if max_id_hop > max_id_prev_hop:
num_nodes_per_hop_dict[node_type][hop + 1] = (
max_id_hop - max_id_prev_hop
)
else:
num_nodes_per_hop_dict[node_type][hop + 1] = 0
# will default to 0 if the previous hop was 0, since this is a PyG requirement

num_edges_per_hop_dict[edge_type][hop] = hop_ix_end - hop_ix_start

if HeteroSamplerOutput is None:
raise ImportError("Error importing from pyg")

return HeteroSamplerOutput(
node=noi_index,
row=row_dict,
col=col_dict,
edge=None,
num_sampled_nodes=num_nodes_per_hop_dict,
num_sampled_edges=num_edges_per_hop_dict,
metadata=metadata,
)


def _sampler_output_from_sampling_results_heterogeneous(
sampling_results: cudf.DataFrame,
renumber_map: cudf.Series,
graph_store: CuGraphStore,
@@ -109,7 +203,7 @@ def _sampler_output_from_sampling_results(

hops = torch.arange(sampling_results.hop_id.max() + 1, device="cuda")
hops = torch.searchsorted(
torch.as_tensor(sampling_results.hop_id.values, device="cuda"), hops
torch.as_tensor(sampling_results.hop_id, device="cuda"), hops
)

num_nodes_per_hop_dict = {}
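
Both sampler-output paths locate hop boundaries by binary-searching the sorted hop_id column, as in the hunk above. A minimal sketch of that bookkeeping with made-up values:

import torch

hop_id = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2])            # per-edge hop ids, sorted
num_hops = int(hop_id[-1]) + 1

# Index of the first edge belonging to each hop.
hops = torch.searchsorted(hop_id, torch.arange(num_hops))  # -> tensor([0, 3, 5])

# Edges per hop: consecutive differences, with the last hop running to the end.
bounds = torch.cat([hops, torch.tensor([len(hop_id)])])
edges_per_hop = bounds[1:] - bounds[:-1]                    # -> tensor([3, 2, 3])
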
@@ -119,13 +213,11 @@ def _sampler_output_from_sampling_results(
sampling_results_hop_0 = sampling_results.iloc[
0 : (hops[1] if len(hops) > 1 else len(sampling_results))
]

for node_type in graph_store.node_types:
if len(graph_store.node_types) == 1:
num_unique_nodes = sampling_results_hop_0.sources.nunique()
else:
num_unique_nodes = _count_unique_nodes(
sampling_results_hop_0, graph_store, node_type, "src"
)
num_unique_nodes = _get_unique_nodes(
sampling_results_hop_0, graph_store, node_type, "src"
).nunique()

if num_unique_nodes > 0:
num_nodes_per_hop_dict[node_type] = torch.zeros(
@@ -134,112 +226,87 @@ def _sampler_output_from_sampling_results(
num_nodes_per_hop_dict[node_type][0] = num_unique_nodes

if renumber_map is not None:
if len(graph_store.node_types) > 1 or len(graph_store.edge_types) > 1:
raise ValueError(
"Precomputing the renumber map is currently "
"unsupported for heterogeneous graphs."
)
raise ValueError(
"Precomputing the renumber map is currently "
"unsupported for heterogeneous graphs."
)

node_type = graph_store.node_types[0]
if not isinstance(node_type, str):
raise ValueError("Node types must be strings")
noi_index = {node_type: torch.as_tensor(renumber_map.values, device="cuda")}

edge_type = graph_store.edge_types[0]
if (
not isinstance(edge_type, tuple)
or not isinstance(edge_type[0], str)
or len(edge_type) != 3
):
raise ValueError("Edge types must be 3-tuples of strings")
if edge_type[0] != node_type or edge_type[2] != node_type:
raise ValueError("Edge src/dst type must match for homogeneous graphs")
row_dict = {
edge_type: torch.as_tensor(sampling_results.sources.values, device="cuda"),
}
col_dict = {
edge_type: torch.as_tensor(
sampling_results.destinations.values, device="cuda"
# Calculate nodes of interest based on unique nodes in order of appearance
# Use hop 0 sources since those are the only ones not included in destinations
# Use torch.concat based on benchmark performance (vs. cudf.concat)

if sampling_results_hop_0 is None:
sampling_results_hop_0 = sampling_results.iloc[
0 : (hops[1] if len(hops) > 1 else len(sampling_results))
]

nodes_of_interest = (
cudf.Series(
torch.concat(
[
torch.as_tensor(sampling_results_hop_0.sources, device="cuda"),
torch.as_tensor(sampling_results.destinations, device="cuda"),
]
),
}
else:
# Calculate nodes of interest based on unique nodes in order of appearance
# Use hop 0 sources since those are the only ones not included in destinations
# Use torch.concat based on benchmark performance (vs. cudf.concat)
nodes_of_interest = (
cudf.Series(
torch.concat(
[
torch.as_tensor(
sampling_results_hop_0.sources.values, device="cuda"
),
torch.as_tensor(
sampling_results.destinations.values, device="cuda"
),
]
),
name="nodes_of_interest",
)
.drop_duplicates()
.sort_index()
name="nodes_of_interest",
)
del sampling_results_hop_0
.drop_duplicates()
.sort_index()
)

# Get the grouped node index (for creating the renumbered grouped edge index)
noi_index = graph_store._get_vertex_groups_from_sample(
torch.as_tensor(nodes_of_interest.values, device="cuda")
)
del nodes_of_interest
# Get the grouped node index (for creating the renumbered grouped edge index)
noi_index = graph_store._get_vertex_groups_from_sample(
torch.as_tensor(nodes_of_interest, device="cuda")
)
del nodes_of_interest

# Get the new edge index (by type as expected for HeteroData)
# FIXME handle edge ids/types after the C++ updates
row_dict, col_dict = graph_store._get_renumbered_edge_groups_from_sample(
sampling_results, noi_index
)
# Get the new edge index (by type as expected for HeteroData)
# FIXME handle edge ids/types after the C++ updates
row_dict, col_dict = graph_store._get_renumbered_edge_groups_from_sample(
sampling_results, noi_index
)

for hop in range(len(hops)):
hop_ix_start = hops[hop]
hop_ix_end = hops[hop + 1] if hop < len(hops) - 1 else len(sampling_results)
sampling_results_hop = sampling_results.iloc[hop_ix_start:hop_ix_end]
sampling_results_to_hop = sampling_results.iloc[0:hop_ix_end]

for node_type in graph_store.node_types:
if len(graph_store.node_types) == 1:
num_unique_nodes = sampling_results_hop.destinations.nunique()
else:
num_unique_nodes = _count_unique_nodes(
sampling_results_hop, graph_store, node_type, "dst"
)
unique_nodes_hop = _get_unique_nodes(
sampling_results_to_hop, graph_store, node_type, "dst"
)

unique_nodes_0 = _get_unique_nodes(
sampling_results_hop_0, graph_store, node_type, "src"
)

num_unique_nodes = cudf.concat([unique_nodes_0, unique_nodes_hop]).nunique()

if num_unique_nodes > 0:
if node_type not in num_nodes_per_hop_dict:
num_nodes_per_hop_dict[node_type] = torch.zeros(
len(hops) + 1, dtype=torch.int64
)
num_nodes_per_hop_dict[node_type][hop + 1] = num_unique_nodes
num_nodes_per_hop_dict[node_type][hop + 1] = num_unique_nodes - int(
num_nodes_per_hop_dict[node_type][: hop + 1].sum(0)
)

if len(graph_store.edge_types) == 1:
edge_type = graph_store.edge_types[0]
if edge_type not in num_edges_per_hop_dict:
num_edges_per_hop_dict[edge_type] = torch.zeros(
numeric_etypes, counts = torch.unique(
torch.as_tensor(
sampling_results.iloc[hop_ix_start:hop_ix_end].edge_type,
device="cuda",
),
return_counts=True,
)
numeric_etypes = list(numeric_etypes)
counts = list(counts)
for num_etype, count in zip(numeric_etypes, counts):
can_etype = graph_store.numeric_edge_type_to_canonical(num_etype)
if can_etype not in num_edges_per_hop_dict:
num_edges_per_hop_dict[can_etype] = torch.zeros(
len(hops), dtype=torch.int64
)
num_edges_per_hop_dict[graph_store.edge_types[0]][hop] = len(
sampling_results_hop
)
else:
numeric_etypes, counts = torch.unique(
torch.as_tensor(sampling_results_hop.edge_type.values, device="cuda"),
return_counts=True,
)
numeric_etypes = list(numeric_etypes)
counts = list(counts)
for num_etype, count in zip(numeric_etypes, counts):
can_etype = graph_store.numeric_edge_type_to_canonical(num_etype)
if can_etype not in num_edges_per_hop_dict:
num_edges_per_hop_dict[can_etype] = torch.zeros(
len(hops), dtype=torch.int64
)
num_edges_per_hop_dict[can_etype][hop] = count
num_edges_per_hop_dict[can_etype][hop] = count

if HeteroSamplerOutput is None:
raise ImportError("Error importing from pyg")
@@ -24,7 +24,7 @@
@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_cugraph_loader_basic(dask_client, karate_gnn):
F, G, N = karate_gnn
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True)
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True, order="CSR")
loader = CuGraphNeighborLoader(
(cugraph_store, cugraph_store),
torch.arange(N["type0"] + N["type1"], dtype=torch.int64),
@@ -52,7 +52,7 @@ def test_cugraph_loader_basic(dask_client, karate_gnn):
@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_cugraph_loader_hetero(dask_client, karate_gnn):
F, G, N = karate_gnn
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True)
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True, order="CSR")
loader = CuGraphNeighborLoader(
(cugraph_store, cugraph_store),
input_nodes=("type1", torch.tensor([0, 1, 2, 5], device="cuda")),
28 changes: 15 additions & 13 deletions python/cugraph-pyg/cugraph_pyg/tests/mg/test_mg_cugraph_sampler.py
@@ -17,7 +17,9 @@
import pytest

from cugraph_pyg.data import CuGraphStore
from cugraph_pyg.sampler.cugraph_sampler import _sampler_output_from_sampling_results
from cugraph_pyg.sampler.cugraph_sampler import (
_sampler_output_from_sampling_results_heterogeneous,
)

from cugraph.gnn import FeatureStore

@@ -31,7 +33,7 @@
@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_neighbor_sample(dask_client, basic_graph_1):
F, G, N = basic_graph_1
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True)
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True, order="CSR")

batches = cudf.DataFrame(
{
@@ -56,7 +58,7 @@ def test_neighbor_sample(dask_client, basic_graph_1):
.sort_values(by=["sources", "destinations"])
)

out = _sampler_output_from_sampling_results(
out = _sampler_output_from_sampling_results_heterogeneous(
sampling_results=sampling_results,
renumber_map=None,
graph_store=cugraph_store,
@@ -84,7 +86,7 @@ def test_neighbor_sample(dask_client, basic_graph_1):

# check the hop dictionaries
assert len(out.num_sampled_nodes) == 1
assert out.num_sampled_nodes["vt1"].tolist() == [4, 4]
assert out.num_sampled_nodes["vt1"].tolist() == [4, 1]

assert len(out.num_sampled_edges) == 1
assert out.num_sampled_edges[("vt1", "pig", "vt1")].tolist() == [6]
@@ -95,7 +97,7 @@ def test_neighbor_sample(dask_client, basic_graph_1):
@pytest.mark.skip(reason="broken")
def test_neighbor_sample_multi_vertex(dask_client, multi_edge_multi_vertex_graph_1):
F, G, N = multi_edge_multi_vertex_graph_1
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True)
cugraph_store = CuGraphStore(F, G, N, multi_gpu=True, order="CSR")

batches = cudf.DataFrame(
{
@@ -119,7 +121,7 @@ def test_neighbor_sample_multi_vertex(dask_client, multi_edge_multi_vertex_graph
.compute()
)

out = _sampler_output_from_sampling_results(
out = _sampler_output_from_sampling_results_heterogeneous(
sampling_results=sampling_results,
renumber_map=None,
graph_store=cugraph_store,
@@ -144,8 +146,8 @@ def test_neighbor_sample_multi_vertex(dask_client, multi_edge_multi_vertex_graph

# check the hop dictionaries
assert len(out.num_sampled_nodes) == 2
assert out.num_sampled_nodes["black"].tolist() == [2, 2]
assert out.num_sampled_nodes["brown"].tolist() == [3, 2]
assert out.num_sampled_nodes["black"].tolist() == [2, 0]
assert out.num_sampled_nodes["brown"].tolist() == [3, 0]

assert len(out.num_sampled_edges) == 5
assert out.num_sampled_edges[("brown", "horse", "brown")].tolist() == [2]
@@ -186,7 +188,7 @@ def test_neighbor_sample_mock_sampling_results(dask_client):
torch.tensor([3.2, 2.1], dtype=torch.float32), type_name="A", feat_name="prop1"
)

graph_store = CuGraphStore(F, G, N, multi_gpu=True)
graph_store = CuGraphStore(F, G, N, multi_gpu=True, order="CSR")

# let 0, 1 be the start vertices, fanout = [2, 1, 2, 3]
mock_sampling_results = cudf.DataFrame(
@@ -198,7 +200,7 @@ def test_neighbor_sample_mock_sampling_results(dask_client):
}
)

out = _sampler_output_from_sampling_results(
out = _sampler_output_from_sampling_results_heterogeneous(
mock_sampling_results, None, graph_store, None
)

@@ -218,9 +220,9 @@ def test_neighbor_sample_mock_sampling_results(dask_client):
assert out.col[("B", "ba", "A")].tolist() == [1, 1]

assert len(out.num_sampled_nodes) == 3
assert out.num_sampled_nodes["A"].tolist() == [2, 0, 1, 0, 1]
assert out.num_sampled_nodes["B"].tolist() == [0, 2, 0, 1, 0]
assert out.num_sampled_nodes["C"].tolist() == [0, 0, 2, 0, 2]
assert out.num_sampled_nodes["A"].tolist() == [2, 0, 0, 0, 0]
assert out.num_sampled_nodes["B"].tolist() == [0, 2, 0, 0, 0]
assert out.num_sampled_nodes["C"].tolist() == [0, 0, 2, 0, 1]

assert len(out.num_sampled_edges) == 3
assert out.num_sampled_edges[("A", "ab", "B")].tolist() == [3, 0, 1, 0]
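
The updated expectations above (and the matching changes in the single-GPU tests) reflect a behavior change: num_sampled_nodes now reports only the nodes first reached in each hop, i.e. the cumulative unique count minus everything already counted, which is how the new end-to-end test feeds these counts to PyG's trim_to_layer. A small worked sketch with hypothetical counts, not the test graphs:

import torch

# Cumulative unique node counts after hops 0, 1, 2 (hypothetical).
cumulative_unique = torch.tensor([4, 5, 5])

num_sampled_nodes = torch.zeros(len(cumulative_unique), dtype=torch.int64)
num_sampled_nodes[0] = cumulative_unique[0]
for hop in range(1, len(cumulative_unique)):
    # Newly seen nodes this hop = cumulative count minus nodes counted so far.
    num_sampled_nodes[hop] = cumulative_unique[hop] - num_sampled_nodes[:hop].sum()

print(num_sampled_nodes.tolist())  # [4, 1, 0]
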
@@ -117,8 +117,8 @@ def test_get_edge_index(graph, edge_index_type, dask_client):
G[et][1] = cudf.Series(G[et][1])
elif edge_index_type == "dask-cudf":
for et in list(G.keys()):
G[et][0] = dask_cudf.from_cudf(cudf.Series(G[et][0]), npartitions=2)
G[et][1] = dask_cudf.from_cudf(cudf.Series(G[et][1]), npartitions=2)
G[et][0] = dask_cudf.from_cudf(cudf.Series(G[et][0]), npartitions=1)
G[et][1] = dask_cudf.from_cudf(cudf.Series(G[et][1]), npartitions=1)

cugraph_store = CuGraphStore(F, G, N, multi_gpu=True)

@@ -215,7 +215,7 @@ def test_renumber_vertices_multi_edge_multi_vertex(
def test_renumber_edges(abc_graph, dask_client):
F, G, N = abc_graph

graph_store = CuGraphStore(F, G, N, multi_gpu=True)
graph_store = CuGraphStore(F, G, N, multi_gpu=True, order="CSR")

# let 0, 1 be the start vertices, fanout = [2, 1, 2, 3]
mock_sampling_results = cudf.DataFrame(
158 changes: 104 additions & 54 deletions python/cugraph-pyg/cugraph_pyg/tests/test_cugraph_loader.py
@@ -26,12 +26,14 @@
from cugraph.utilities.utils import import_optional, MissingModule

torch = import_optional("torch")
torch_geometric = import_optional("torch_geometric")
trim_to_layer = import_optional("torch_geometric.utils.trim_to_layer")


@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_cugraph_loader_basic(karate_gnn):
F, G, N = karate_gnn
cugraph_store = CuGraphStore(F, G, N)
cugraph_store = CuGraphStore(F, G, N, order="CSR")
loader = CuGraphNeighborLoader(
(cugraph_store, cugraph_store),
torch.arange(N["type0"] + N["type1"], dtype=torch.int64),
@@ -57,7 +59,7 @@ def test_cugraph_loader_basic(karate_gnn):
@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_cugraph_loader_hetero(karate_gnn):
F, G, N = karate_gnn
cugraph_store = CuGraphStore(F, G, N)
cugraph_store = CuGraphStore(F, G, N, order="CSR")
loader = CuGraphNeighborLoader(
(cugraph_store, cugraph_store),
input_nodes=("type1", torch.tensor([0, 1, 2, 5], device="cuda")),
@@ -82,23 +84,29 @@ def test_cugraph_loader_hetero(karate_gnn):

@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_cugraph_loader_from_disk():
m = [2, 9, 99, 82, 9, 3, 18, 1, 12]
n = torch.arange(1, 1 + len(m), dtype=torch.int32)
x = torch.zeros(256, dtype=torch.int32)
x[torch.tensor(m, dtype=torch.int32)] = n
F = FeatureStore()
F.add_data(torch.tensor([1, 2, 3, 4, 5, 6, 7]), "t0", "x")
F.add_data(x, "t0", "x")

G = {("t0", "knows", "t0"): 7}
N = {"t0": 7}
G = {("t0", "knows", "t0"): 9080}
N = {"t0": 256}

cugraph_store = CuGraphStore(F, G, N)
cugraph_store = CuGraphStore(F, G, N, order="CSR")

bogus_samples = cudf.DataFrame(
{
"sources": [0, 1, 2, 3, 4, 5, 6],
"destinations": [6, 4, 3, 2, 2, 1, 5],
"edge_type": cudf.Series([0, 0, 0, 0, 0, 0, 0], dtype="int32"),
"edge_id": [5, 10, 15, 20, 25, 30, 35],
"hop_id": cudf.Series([0, 0, 0, 1, 1, 2, 2], dtype="int32"),
"sources": [0, 1, 2, 3, 4, 5, 6, 6],
"destinations": [5, 4, 3, 2, 2, 6, 5, 2],
"edge_type": cudf.Series([0, 0, 0, 0, 0, 0, 0, 0], dtype="int32"),
"edge_id": [5, 10, 15, 20, 25, 30, 35, 40],
"hop_id": cudf.Series([0, 0, 0, 1, 1, 1, 2, 2], dtype="int32"),
}
)
map = cudf.Series(m, name="map")
bogus_samples = bogus_samples.join(map, how="outer").sort_index()

tempdir = tempfile.TemporaryDirectory()
for s in range(256):
@@ -115,32 +123,49 @@ def test_cugraph_loader_from_disk():
for sample in loader:
num_samples += 1
assert sample["t0"]["num_nodes"] == 7
# correct vertex order is [0, 1, 2, 6, 4, 3, 5]; x = [1, 2, 3, 7, 5, 4, 6]
assert sample["t0"]["x"].tolist() == [1, 2, 3, 7, 5, 4, 6]
assert list(sample[("t0", "knows", "t0")]["edge_index"].shape) == [2, 7]
# correct vertex order is [0, 1, 2, 5, 4, 3, 6]; x = [1, 2, 3, 6, 5, 4, 7]
assert sample["t0"]["x"].tolist() == [3, 4, 5, 6, 7, 8, 9]

edge_index = sample[("t0", "knows", "t0")]["edge_index"]
assert list(edge_index.shape) == [2, 8]

assert (
edge_index[0].tolist()
== bogus_samples.sources.dropna().values_host.tolist()
)
assert (
edge_index[1].tolist()
== bogus_samples.destinations.dropna().values_host.tolist()
)

assert num_samples == 256


@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_cugraph_loader_from_disk_subset():
m = [2, 9, 99, 82, 9, 3, 18, 1, 12]
n = torch.arange(1, 1 + len(m), dtype=torch.int32)
x = torch.zeros(256, dtype=torch.int32)
x[torch.tensor(m, dtype=torch.int32)] = n
F = FeatureStore()
F.add_data(torch.tensor([1, 2, 3, 4, 5, 6, 7]), "t0", "x")
F.add_data(x, "t0", "x")

G = {("t0", "knows", "t0"): 7}
N = {"t0": 7}
G = {("t0", "knows", "t0"): 9080}
N = {"t0": 256}

cugraph_store = CuGraphStore(F, G, N)
cugraph_store = CuGraphStore(F, G, N, order="CSR")

bogus_samples = cudf.DataFrame(
{
"sources": [0, 1, 2, 3, 4, 5, 6],
"destinations": [6, 4, 3, 2, 2, 1, 5],
"edge_type": cudf.Series([0, 0, 0, 0, 0, 0, 0], dtype="int32"),
"edge_id": [5, 10, 15, 20, 25, 30, 35],
"hop_id": cudf.Series([0, 0, 0, 1, 1, 2, 2], dtype="int32"),
"sources": [0, 1, 2, 3, 4, 5, 6, 6],
"destinations": [5, 4, 3, 2, 2, 6, 5, 2],
"edge_type": cudf.Series([0, 0, 0, 0, 0, 0, 0, 0], dtype="int32"),
"edge_id": [5, 10, 15, 20, 25, 30, 35, 40],
"hop_id": cudf.Series([0, 0, 0, 1, 1, 1, 2, 2], dtype="int32"),
}
)
map = cudf.Series(m, name="map")
bogus_samples = bogus_samples.join(map, how="outer").sort_index()

tempdir = tempfile.TemporaryDirectory()
for s in range(256):
@@ -159,33 +184,45 @@ def test_cugraph_loader_from_disk_subset():
num_samples += 1
assert sample["t0"]["num_nodes"] == 7
# correct vertex order is [0, 1, 2, 6, 4, 3, 5]; x = [1, 2, 3, 7, 5, 4, 6]
assert sample["t0"]["x"].tolist() == [1, 2, 3, 7, 5, 4, 6]
assert list(sample[("t0", "knows", "t0")]["edge_index"].shape) == [2, 7]
assert sample["t0"]["x"].tolist() == [3, 4, 5, 6, 7, 8, 9]

edge_index = sample[("t0", "knows", "t0")]["edge_index"]
assert list(edge_index.shape) == [2, 8]

assert (
edge_index[0].tolist()
== bogus_samples.sources.dropna().values_host.tolist()
)
assert (
edge_index[1].tolist()
== bogus_samples.destinations.dropna().values_host.tolist()
)

assert num_samples == 100


@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_cugraph_loader_from_disk_subset_renumbered():
def test_cugraph_loader_e2e_coo():
m = [2, 9, 99, 82, 9, 3, 18, 1, 12]
x = torch.randint(3000, (256, 256)).to(torch.float32)
F = FeatureStore()
F.add_data(torch.tensor([1, 2, 3, 4, 5, 6, 7]), "t0", "x")
F.add_data(x, "t0", "x")

G = {("t0", "knows", "t0"): 7}
N = {"t0": 7}
G = {("t0", "knows", "t0"): 9999}
N = {"t0": 256}

cugraph_store = CuGraphStore(F, G, N)
cugraph_store = CuGraphStore(F, G, N, order="CSR")

bogus_samples = cudf.DataFrame(
{
"sources": [0, 1, 2, 3, 4, 5, 6],
"destinations": [6, 4, 3, 2, 2, 1, 5],
"edge_type": cudf.Series([0, 0, 0, 0, 0, 0, 0], dtype="int32"),
"edge_id": [5, 10, 15, 20, 25, 30, 35],
"hop_id": cudf.Series([0, 0, 0, 1, 1, 2, 2], dtype="int32"),
"sources": [0, 1, 2, 3, 4, 5, 6, 6],
"destinations": [5, 4, 3, 2, 2, 6, 5, 2],
"edge_type": cudf.Series([0, 0, 0, 0, 0, 0, 0, 0], dtype="int32"),
"edge_id": [5, 10, 15, 20, 25, 30, 35, 40],
"hop_id": cudf.Series([0, 0, 0, 1, 1, 1, 2, 2], dtype="int32"),
}
)

map = cudf.Series([2, 9, 0, 2, 1, 3, 4, 6, 5], name="map")
map = cudf.Series(m, name="map")
bogus_samples = bogus_samples.join(map, how="outer").sort_index()

tempdir = tempfile.TemporaryDirectory()
@@ -200,22 +237,35 @@ def test_cugraph_loader_from_disk_subset_renumbered():
input_files=list(os.listdir(tempdir.name))[100:200],
)

num_samples = 0
for sample in loader:
num_samples += 1
assert sample["t0"]["num_nodes"] == 7
# correct vertex order is [0, 2, 1, 3, 4, 6, 5]; x = [1, 3, 2, 4, 5, 7, 6]
assert sample["t0"]["x"].tolist() == [1, 3, 2, 4, 5, 7, 6]
convs = [
torch_geometric.nn.SAGEConv(256, 64, aggr="mean").cuda(),
torch_geometric.nn.SAGEConv(64, 8, aggr="mean").cuda(),
torch_geometric.nn.SAGEConv(8, 1, aggr="mean").cuda(),
]

edge_index = sample[("t0", "knows", "t0")]["edge_index"]
assert list(edge_index.shape) == [2, 7]
assert (
edge_index[0].tolist()
== bogus_samples.sources.dropna().values_host.tolist()
)
assert (
edge_index[1].tolist()
== bogus_samples.destinations.dropna().values_host.tolist()
)
trim = trim_to_layer.TrimToLayer()
relu = torch.nn.functional.relu
dropout = torch.nn.functional.dropout

assert num_samples == 100
for hetero_data in loader:
ei = hetero_data["t0", "knows", "t0"]["edge_index"]
x = hetero_data["t0"]["x"].cuda()
num_sampled_nodes = hetero_data["t0"]["num_sampled_nodes"]
num_sampled_edges = hetero_data["t0", "knows", "t0"]["num_sampled_edges"]

print(num_sampled_nodes, num_sampled_edges)

for i in range(len(convs)):
x, ei, _ = trim(i, num_sampled_nodes, num_sampled_edges, x, ei, None)

s = x.shape[0]

x = convs[i](x, ei, size=(s, s))
x = relu(x)
x = dropout(x, p=0.5)
print(x.shape)

print(x.shape)
x = x.narrow(dim=0, start=0, length=x.shape[0] - num_sampled_nodes[1])

assert list(x.shape) == [3, 1]
28 changes: 15 additions & 13 deletions python/cugraph-pyg/cugraph_pyg/tests/test_cugraph_sampler.py
@@ -17,7 +17,9 @@
import pytest

from cugraph_pyg.data import CuGraphStore
from cugraph_pyg.sampler.cugraph_sampler import _sampler_output_from_sampling_results
from cugraph_pyg.sampler.cugraph_sampler import (
_sampler_output_from_sampling_results_heterogeneous,
)

from cugraph.utilities.utils import import_optional, MissingModule
from cugraph import uniform_neighbor_sample
@@ -29,7 +31,7 @@
@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_neighbor_sample(basic_graph_1):
F, G, N = basic_graph_1
cugraph_store = CuGraphStore(F, G, N)
cugraph_store = CuGraphStore(F, G, N, order="CSR")

batches = cudf.DataFrame(
{
@@ -49,7 +51,7 @@ def test_neighbor_sample(basic_graph_1):
return_offsets=False,
).sort_values(by=["sources", "destinations"])

out = _sampler_output_from_sampling_results(
out = _sampler_output_from_sampling_results_heterogeneous(
sampling_results=sampling_results,
renumber_map=None,
graph_store=cugraph_store,
@@ -77,7 +79,7 @@ def test_neighbor_sample(basic_graph_1):

# check the hop dictionaries
assert len(out.num_sampled_nodes) == 1
assert out.num_sampled_nodes["vt1"].tolist() == [4, 4]
assert out.num_sampled_nodes["vt1"].tolist() == [4, 1]

assert len(out.num_sampled_edges) == 1
assert out.num_sampled_edges[("vt1", "pig", "vt1")].tolist() == [6]
@@ -87,7 +89,7 @@ def test_neighbor_sample(basic_graph_1):
@pytest.mark.skipif(isinstance(torch, MissingModule), reason="torch not available")
def test_neighbor_sample_multi_vertex(multi_edge_multi_vertex_graph_1):
F, G, N = multi_edge_multi_vertex_graph_1
cugraph_store = CuGraphStore(F, G, N)
cugraph_store = CuGraphStore(F, G, N, order="CSR")

batches = cudf.DataFrame(
{
@@ -107,7 +109,7 @@ def test_neighbor_sample_multi_vertex(multi_edge_multi_vertex_graph_1):
with_batch_ids=True,
).sort_values(by=["sources", "destinations"])

out = _sampler_output_from_sampling_results(
out = _sampler_output_from_sampling_results_heterogeneous(
sampling_results=sampling_results,
renumber_map=None,
graph_store=cugraph_store,
@@ -132,8 +134,8 @@ def test_neighbor_sample_multi_vertex(multi_edge_multi_vertex_graph_1):

# check the hop dictionaries
assert len(out.num_sampled_nodes) == 2
assert out.num_sampled_nodes["black"].tolist() == [2, 2]
assert out.num_sampled_nodes["brown"].tolist() == [3, 2]
assert out.num_sampled_nodes["black"].tolist() == [2, 0]
assert out.num_sampled_nodes["brown"].tolist() == [3, 0]

assert len(out.num_sampled_edges) == 5
assert out.num_sampled_edges[("brown", "horse", "brown")].tolist() == [2]
@@ -147,7 +149,7 @@ def test_neighbor_sample_multi_vertex(multi_edge_multi_vertex_graph_1):
def test_neighbor_sample_mock_sampling_results(abc_graph):
F, G, N = abc_graph

graph_store = CuGraphStore(F, G, N)
graph_store = CuGraphStore(F, G, N, order="CSR")

# let 0, 1 be the start vertices, fanout = [2, 1, 2, 3]
mock_sampling_results = cudf.DataFrame(
@@ -159,7 +161,7 @@ def test_neighbor_sample_mock_sampling_results(abc_graph):
}
)

out = _sampler_output_from_sampling_results(
out = _sampler_output_from_sampling_results_heterogeneous(
mock_sampling_results, None, graph_store, None
)

@@ -179,9 +181,9 @@ def test_neighbor_sample_mock_sampling_results(abc_graph):
assert out.col[("B", "ba", "A")].tolist() == [1, 1]

assert len(out.num_sampled_nodes) == 3
assert out.num_sampled_nodes["A"].tolist() == [2, 0, 1, 0, 1]
assert out.num_sampled_nodes["B"].tolist() == [0, 2, 0, 1, 0]
assert out.num_sampled_nodes["C"].tolist() == [0, 0, 2, 0, 2]
assert out.num_sampled_nodes["A"].tolist() == [2, 0, 0, 0, 0]
assert out.num_sampled_nodes["B"].tolist() == [0, 2, 0, 0, 0]
assert out.num_sampled_nodes["C"].tolist() == [0, 0, 2, 0, 1]

assert len(out.num_sampled_edges) == 3
assert out.num_sampled_edges[("A", "ab", "B")].tolist() == [3, 0, 1, 0]
@@ -199,7 +199,7 @@ def test_renumber_vertices_multi_edge_multi_vertex(multi_edge_multi_vertex_graph
def test_renumber_edges(abc_graph):
F, G, N = abc_graph

graph_store = CuGraphStore(F, G, N)
graph_store = CuGraphStore(F, G, N, order="CSR")

# let 0, 1 be the start vertices, fanout = [2, 1, 2, 3]
mock_sampling_results = cudf.DataFrame(