[cuGRAPH] Sampling Optimizations #2

VibhuJawa · 2022-05-17T01:17:36Z

Description

This PR optimizes the sampling workflow by keeping sampling on the GPU end to end. In the profile below, that makes it roughly 2.6x faster.

Before this PR, sampling on the as-skitter dataset took 29.3 s; it now takes 11.3 s. More importantly, 99.6 % of the time is now spent in cugraph.community.egonet.batched_ego_graphs(g, current_seeds, radius=1), up from only 34.8 % before.

This means sampling time should now scale with the performance of batched_ego_graphs rather than with overhead from other libraries.
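As a quick check on these figures, the arithmetic can be reproduced from the two profiler totals and the time attributed to batched_ego_graphs in each run. The variable names below are illustrative only; the numbers are copied from the profiles in this description.

```python
# Totals and batched_ego_graphs time, in seconds, copied from the two profiles.
total_before = 29.3832
total_after = 11.3084
egonet_before = 10.229989
egonet_after = 11.258415

# End-to-end speedup of the sampler.
speedup = total_before / total_after

# Time spent outside batched_ego_graphs (the overhead this PR targets).
overhead_before = total_before - egonet_before
overhead_after = total_after - egonet_after

print(f"speedup: {speedup:.2f}x")
print(f"overhead before: {overhead_before:.2f} s, after: {overhead_after:.2f} s")
print(f"egonet share before: {egonet_before / total_before:.1%}, "
      f"after: {egonet_after / total_after:.1%}")
```

The time outside batched_ego_graphs drops from about 19 s to about 0.05 s, which is why the remaining cost is almost entirely the egonet call itself.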

Profiling:

Before the PR:

Timer unit: 1e-06 s

Total time: 29.3832 s
File: /tmp/ipykernel_62775/1201298652.py
Function: cugraphSampler_old at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def cugraphSampler_old(g, nodes, fanouts, edge_dir='in', prob=None, replace=False,
     2                                                                copy_ndata=True, copy_edata=True, _dist_training=False, exclude_edges=None):
     3                                               # from here get in a new for loop
     4                                               # ego_net return edge list
     5                                               # print("edge data")
     6                                               # print(g.edge_data)
     7         1         27.0     27.0      0.0      num_nodes = len(nodes)
     8         1         15.0     15.0      0.0      if torch.is_tensor(nodes):
     9         1       1206.0   1206.0      0.0          current_seeds = cupy.asarray(nodes)
    10         1        518.0    518.0      0.0          current_seeds = cudf.Series(current_seeds)
    11                                               else:
    12                                                   current_seeds = nodes.reindex(index = np.arange(0, num_nodes))
    13                                           
    14         1          3.0      3.0      0.0      blocks = []
    15                                               #seeds = cudf.Series(nodes.to_array())
    16                                           
    17         3          4.0      1.3      0.0      for fanout in fanouts:
    18         2   10229989.0 5114994.5     34.8          ego_edge_list, seeds_offsets = cugraph.community.egonet.batched_ego_graphs(g, current_seeds, radius = 1)
    19                                                   #print ("current_seeds", current_seeds)
    20         2        453.0    226.5      0.0          print ("fanout", fanout)
    21                                                   #all_parents = cupy.ndarray(fanout*len(current_seeds))
    22                                                   #all_children = cupy.ndarray(fanout*len(current_seeds))
    23         2        193.0     96.5      0.0          all_parents = cupy.ndarray(0)
    24         2         35.0     17.5      0.0          all_children = cupy.ndarray(0)
    25                                                   #print ("all parents", all_parents)
    26                                               # filter and get a certain size neighborhood
    27      2002       2407.0      1.2      0.0          for i in range(1, len(seeds_offsets)):
    28      2000     530286.0    265.1      1.8              pos0 = seeds_offsets.values_host[i-1]
    29      2000     424828.0    212.4      1.4              pos1 = seeds_offsets.values_host[i]
    30      2000     791732.0    395.9      2.7              edge_list = ego_edge_list[pos0:pos1]
    31                                                       # get randomness fanout
    32      2000   12797656.0   6398.8     43.6              filtered_list = edge_list[edge_list ['dst']== current_seeds[i-1]]
    33                                                        
    34                                                       # get sampled_list
    35      2000      28990.0     14.5      0.1              if len(filtered_list) > fanout:
    36       659      88522.0    134.3      0.3                  sampled_indices = random.sample(filtered_list.index.to_arrow().to_pylist(), fanout)
    37       659    3559557.0   5401.5     12.1                  filtered_list = filtered_list.reindex(index = sampled_indices)
    38                                                           
    39      2000     291726.0    145.9      1.0              children = cupy.asarray(filtered_list['src'])
    40      2000     259716.0    129.9      0.9              parents = cupy.asarray(filtered_list['dst'])
    41                                                       # copy the src and dst to cupy array
    42      2000     217204.0    108.6      0.7              all_parents = cupy.append(all_parents, parents)
    43      2000     154211.0     77.1      0.5              all_children = cupy.append(all_children, children)
    44                                                       #print (len(test_parents)) 
    45                                           
    46                                                   # generate dgl.graph and  blocks
    47         2       1971.0    985.5      0.0          sampled_graph = dgl.graph ((all_children,all_parents))
    48                                                   #print(all_parents)
    49                                                   #print(all_children)
    50                                                   #print(sampled_graph.edges())
    51                                                   #print(seeds.to_array())
    52                                                   # '_ID' is EID
    53         2          3.0      1.5      0.0          num_edges = len(all_children) 
    54         2        468.0    234.0      0.0          sampled_graph.edata['_ID'] = torch.tensor(np.arange (num_edges))
    55         2       1446.0    723.0      0.0          print(sampled_graph.edata)
    56                                                   #block =dgl.to_block(sampled_graph,current_seeds.to_array())
    57                                                   #block.edata[dgl.EID] = eid
    58                                                   #current_seeds = block.srcdata[dgl.NID]
    59                                                   #current_seeds = cudf.Series(current_seeds.cpu().detach().numpy())
    60                                           
    61                                                   #blocks.insert(0, block)
    62                                                   # end of for
    63                                           
    64         1          1.0      1.0      0.0      return sampled_graph
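Most of the time outside batched_ego_graphs in the listing above is spent in the per-seed loop (profile lines 27 to 43), where each seed gets its own slice, boolean mask and reindex. The sketch below shows the same filter written once per seed and once per batch, and checks that both give the same edges. It uses NumPy on the CPU as a stand-in for cuDF/CuPy so it can run anywhere; the function names and the toy data are assumptions made for illustration, not code from this PR.

```python
import numpy as np

def filter_per_seed(src, dst, seeds, offsets):
    """Loop version: one slice and one mask per seed, like the profiled loop."""
    keep_src, keep_dst = [], []
    for i in range(1, len(offsets)):
        lo, hi = offsets[i - 1], offsets[i]
        mask = dst[lo:hi] == seeds[i - 1]  # keep edges pointing at this seed
        keep_src.append(src[lo:hi][mask])
        keep_dst.append(dst[lo:hi][mask])
    return np.concatenate(keep_src), np.concatenate(keep_dst)

def filter_batched(src, dst, seeds, offsets):
    """Batched version: expand seeds to one entry per edge, then mask once."""
    lens = np.diff(offsets)             # number of edges in each ego graph
    dst_seeds = np.repeat(seeds, lens)  # seed that each edge row belongs to
    mask = dst == dst_seeds
    return src[mask], dst[mask]

# Toy data (hypothetical): two seeds, ego graphs of 3 and 2 edges.
seeds = np.array([10, 20])
offsets = np.array([0, 3, 5])
src = np.array([1, 10, 2, 20, 3])
dst = np.array([10, 1, 10, 3, 20])

loop_src, loop_dst = filter_per_seed(src, dst, seeds, offsets)
batch_src, batch_dst = filter_batched(src, dst, seeds, offsets)
```

The batched form issues a fixed number of array operations regardless of how many seeds there are, which matters on a GPU because every small per-seed operation pays kernel-launch and host-synchronization costs.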

After the PR:

Timer unit: 1e-06 s

Total time: 11.3084 s
File: /tmp/ipykernel_62775/3202272512.py
Function: cugraphSampler_vjawa at line 31

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    31                                           def cugraphSampler_vjawa(
    32                                               g,
    33                                               nodes,
    34                                               fanouts,
    35                                               edge_dir="in",
    36                                               prob=None,
    37                                               replace=False,
    38                                               copy_ndata=True,
    39                                               copy_edata=True,
    40                                               _dist_training=False,
    41                                               exclude_edges=None,
    42                                           ):
    43                                               # from here get in a new for loop
    44                                               # ego_net return edge list
    45                                           
    46                                               # vjawa: Below fails
    47                                               # print("edge data")
    48                                               # print(g.edge_data)
    49         1         14.0     14.0      0.0      num_nodes = len(nodes)
    50         1          4.0      4.0      0.0      if torch.is_tensor(nodes):
    51         1        809.0    809.0      0.0          current_seeds = cp.asarray(nodes)
    52         1        201.0    201.0      0.0          current_seeds = cudf.Series(current_seeds)
    53                                               else:
    54                                                   current_seeds = nodes.reindex(index=cp.arange(0, num_nodes))
    55                                           
    56                                               # blocks = []
    57                                               # seeds = cudf.Series(nodes.to_array())
    58         3          4.0      1.3      0.0      for fanout in fanouts:
    59         2          4.0      2.0      0.0          (
    60         2         44.0     22.0      0.0              ego_edge_list,
    61         2          7.0      3.5      0.0              seeds_offsets,
    62         4   11258415.0 2814603.8     99.6          ) = cugraph.community.egonet.batched_ego_graphs(
    63         2          2.0      1.0      0.0              g, current_seeds, radius=1
    64                                                   )
    65                                                   # filter and get a certain size neighborhood
    66                                                   # Step 1
    67                                                   # Get Filtered List of ego_edge_list corresponding to current_seeds
    68                                                   # We filter by creating a series of destination nodes
    69                                                   # corresponding to the offsets and filtering non matching values
    70                                           
    71         2       1838.0    919.0      0.0          seeds_offsets_s = cudf.Series(seeds_offsets).values
    72         2        495.0    247.5      0.0          offset_lens = seeds_offsets_s[1:] - seeds_offsets_s[0:-1]
    73         2       3131.0   1565.5      0.0          dst_seeds = current_seeds.repeat(offset_lens)
    74         2        156.0     78.0      0.0          dst_seeds.index = ego_edge_list.index
    75         2       3785.0   1892.5      0.0          filtered_list = ego_edge_list[ego_edge_list["dst"] == dst_seeds]
    76                                           
    77                                                   # Step 2
    78                                                   # Sample Fan Out
    79                                                   # for each dst take maximum of fanout samples
    80         2      35537.0  17768.5      0.3          filtered_list = group_sample(filtered_list, by="dst", n_samples=fanout)
    81                                           
    82         2        970.0    485.0      0.0          all_children = create_tensor_from_cupy_cudf_objs(filtered_list["src"])
    83         2        670.0    335.0      0.0          all_parents = create_tensor_from_cupy_cudf_objs(filtered_list["dst"])
    84                                           
    85         2       1582.0    791.0      0.0          sampled_graph = dgl.graph((all_children, all_parents))
    86                                           
    87                                                   # print(all_parents)
    88                                                   # print(all_children)
    89                                                   # print(sampled_graph.edges())
    90                                                   # print(seeds.to_array())
    91                                                   # '_ID' is EID
    92                                           
    93         2         15.0      7.5      0.0          num_edges = len(all_children)
    94         4        471.0    117.8      0.0          sampled_graph.edata["_ID"] = from_dlpack(
    95         2        214.0    107.0      0.0              cp.arange(num_edges).toDlpack()
    96                                                   )
    97                                                   # print(sampled_graph.edata)
    98                                                   # block =dgl.to_block(sampled_graph,current_seeds.to_array())
    99                                                   # block.edata[dgl.EID] = eid
   100                                                   # current_seeds = block.srcdata[dgl.NID]
   101                                                   # current_seeds = cudf.Series(current_seeds.cpu().detach().numpy())
   102                                           
   103                                                   # blocks.insert(0, block)
   104                                                   # end of for
   105                                           
   106         1          1.0      1.0      0.0      return sampled_graph
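The group_sample helper called on profile line 80 is not shown in this description. One way such a step could work is to shuffle the edges, group them by destination, and keep the first n rows of each group. The sketch below does that with NumPy on the CPU; its name, signature (plain arrays instead of a DataFrame with by="dst") and implementation are assumptions for illustration and may differ from the helper in this PR.

```python
import numpy as np

def group_sample_sketch(src, dst, n_samples, rng):
    """Keep at most n_samples randomly chosen edges per destination node."""
    # Shuffle so that "first n rows per group" is a random sample.
    perm = rng.permutation(len(dst))
    src, dst = src[perm], dst[perm]

    # Stable sort by destination keeps the shuffled order inside each group.
    order = np.argsort(dst, kind="stable")
    src, dst = src[order], dst[order]

    # Rank of each row within its destination group: 0, 1, 2, ...
    _, starts, counts = np.unique(dst, return_index=True, return_counts=True)
    rank = np.arange(len(dst)) - np.repeat(starts, counts)

    keep = rank < n_samples
    return src[keep], dst[keep]

# Toy data (hypothetical): node 10 has 4 in-edges, node 20 has 2, node 30 has 1.
src = np.arange(7)
dst = np.array([10, 10, 10, 10, 20, 20, 30])

rng = np.random.default_rng(0)
sampled_src, sampled_dst = group_sample_sketch(src, dst, n_samples=2, rng=rng)
```

Like the batched filter, this samples every destination in one pass, so the cost does not grow with a Python loop over seeds. It samples without replacement, matching the replace=False default in the sampler signature.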

VibhuJawa added 2 commits May 16, 2022 18:05

Sampling end to end on GPUs

ac0c4c2

Fix dataset path to /home/xiaoyunw/cugraph/datasets/

cab5f44
