Updates pytest benchmarks to use synthetic data and multi-GPUs #3540

Merged
Changes from 23 commits
27 commits
aec3565
Added new CLI options for RMAT datasets, added __str__() for Dataset …
rlratzel May 4, 2023
829740b
Added ability to create a Dataset instance using a .csv file on disk.…
rlratzel May 4, 2023
7b3d65e
Added instances of Dataset objects not yet in metadata folder using .…
rlratzel May 4, 2023
3422db1
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
rlratzel May 4, 2023
af6d0be
Added Dataset.unload() to help reduce memory usage when multiple Data…
rlratzel May 5, 2023
4ec4b95
WIP: updated fixtures
rlratzel May 5, 2023
c19ca0d
Removed fixture and isntead use imported module directly since its al…
rlratzel May 5, 2023
6839b72
Added MG support to RmatDataset.
rlratzel May 6, 2023
4dd8168
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
rlratzel May 6, 2023
17ad493
Updated markers for real (.csv file) dataset params, minor code cleanup.
rlratzel May 6, 2023
347450c
WIP: refactored fixtures to keep MG runs together to more efficiently…
rlratzel May 7, 2023
98bb280
Removed usued config feature from Dataset, update test_datasets to us…
rlratzel May 8, 2023
dc7da78
Changed pytest utility to generate test IDs with characters that are …
rlratzel May 8, 2023
cb6f7e6
Added initial uniform_neighbor_sample benchmark.
rlratzel May 8, 2023
de6c4e0
Added egonet benchmark
rlratzel May 9, 2023
d1d1aa5
Changed marker names.
rlratzel May 11, 2023
c194c3b
Removed unused calls to compute adj list, fixed marker name, added tr…
rlratzel May 12, 2023
4826db5
Added comments, remove unused ETL marker.
rlratzel May 16, 2023
138a8d9
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
rlratzel May 18, 2023
7d0e38c
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
rlratzel May 18, 2023
b0bccf2
Centralizes the dask client start/stop utils into a single set of fun…
rlratzel May 18, 2023
2c3ffa9
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
rlratzel May 18, 2023
f4433f4
Adds specific datasets to download from CI scripts (new datasets for …
rlratzel May 18, 2023
cb0a502
Adds FIXMEs and a docstring.
rlratzel May 19, 2023
c35868a
Merge branch 'branch-23.06' into branch-23.06-mg_pytest_benchmarks
alexbarghi-nv May 19, 2023
e015581
Merge remote-tracking branch 'upstream/branch-23.06' into branch-23.0…
rlratzel May 19, 2023
ce2c109
Merge branch 'branch-23.06-mg_pytest_benchmarks' of https://github.co…
rlratzel May 19, 2023
@@ -39,7 +39,7 @@
 def create_graph(graph_data):
     """
     Create a graph instance based on the data to be loaded/generated.
-    """
+    """
     print("Initalize Pool on client")
     rmm.reinitialize(pool_allocator=True)
     # Assume strings are names of datasets in the datasets package
@@ -77,7 +77,7 @@ def create_graph(graph_data):
     num_nodes_dict = {'_N':num_nodes}

     gs = CuGraphStorage(num_nodes_dict=num_nodes_dict, single_gpu=True)
-    gs.add_edge_data(edgelist_df,
+    gs.add_edge_data(edgelist_df,
                      # reverse to make same graph as cugraph
                      node_col_names=['dst', 'src'],
                      canonical_etype=['_N', 'connects', '_N'])
@@ -90,11 +90,9 @@ def create_mg_graph(graph_data):
     """
     Create a graph instance based on the data to be loaded/generated.
     """
-    ## Reserving GPU 0 for client(trainer/service project)
-    n_devices = os.getenv('DASK_NUM_WORKERS', 4)
-    n_devices = int(n_devices)
+    # range starts at 1 to let let 0 be used by benchmark/client process
+    visible_devices = os.getenv("DASK_WORKER_DEVICES", "1,2,3,4")

-    visible_devices = ','.join([str(i) for i in range(1, n_devices+1)])
     cluster = LocalCUDACluster(protocol='ucx', rmm_pool_size='25GB', CUDA_VISIBLE_DEVICES=visible_devices)
     client = Client(cluster)
     Comms.initialize(p2p=True)
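The hunk above stops deriving the worker device list from `DASK_NUM_WORKERS` and instead reads it from `DASK_WORKER_DEVICES`, defaulting to `"1,2,3,4"` so that GPU 0 stays free for the benchmark/client process. A minimal sketch of that lookup in isolation follows; the helper names `get_worker_devices` and `parse_devices` are hypothetical and not part of this PR, and the parsing step is only an illustration of how the string could be inspected.

```python
import os


def get_worker_devices(default="1,2,3,4"):
    """Return the comma-separated GPU IDs intended for the Dask workers.

    Hypothetical helper: mirrors the os.getenv() call in the hunk above.
    The default skips device 0 so the client process can use it.
    """
    return os.getenv("DASK_WORKER_DEVICES", default)


def parse_devices(visible_devices):
    """Split a string such as "1,2,3,4" into a list of integer device IDs."""
    return [int(d) for d in visible_devices.split(",") if d.strip()]
```

With this shape, the set of worker GPUs can be changed per run by exporting `DASK_WORKER_DEVICES` before starting the benchmark, without editing the benchmark code.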
@@ -137,7 +135,7 @@ def create_mg_graph(graph_data):
     num_nodes_dict = {'_N':num_nodes}

     gs = CuGraphStorage(num_nodes_dict=num_nodes_dict, single_gpu=False)
-    gs.add_edge_data(edgelist_df,
+    gs.add_edge_data(edgelist_df,
                      node_col_names=['dst', 'src'],
                      canonical_etype=['_N', 'C', '_N'])
     return (gs, client, cluster)
@@ -166,7 +164,7 @@ def get_uniform_neighbor_sample_args(
         num_start_verts = int(num_verts * 0.25)
     else:
         num_start_verts = batch_size

     srcs = G.graphstore.gdata.get_edge_data()['_SRC_']
     start_list = srcs.head(num_start_verts)
     assert len(start_list) == num_start_verts
@@ -229,7 +227,7 @@ def bench_cugraph_dgl_uniform_neighbor_sample(
     fanout_val.reverse()
     sampler = dgl.dataloading.NeighborSampler(uns_args["fanout"])
     sampler_f = sampler.sample_blocks

     # Warmup
     _ = sampler_f(g=G, seed_nodes=uns_args["seed_nodes"])
     # print(f"\n{uns_args}")