
batch update and load balancing #41

Open
samuelecancellieri opened this issue May 29, 2020 · 15 comments


@samuelecancellieri

Hi,

I downloaded the repo from the master and I'm encountering the following problems when using the bfs2 algorithm.

The first problem is with the binary-search load balancing: when it is used to perform the BFS, each run returns different distances despite starting from the same root.
I worked around it by switching the balancing to vertex-based, but I'm not sure what is causing this behavior.

The second, and bigger, problem arises when I try to use update batches.
I'm able to generate batches and correctly insert them into the graph (edge insertion, for example), but when I then run any operation on the graph, some updated edges are wrong: they are present, but with an incorrect destination, all terminating at node 0.

For example, if I insert the batch (100, 200), the graph is updated with an edge from node 100 to node 200. But sometimes, when the batch is larger than a couple of thousand edges (e.g. 5000 new edges), the newly inserted edges start from the correct source but terminate at node 0; other times the behavior is inverted: all the pre-existing edges start from the correct source but terminate at 0, while all the new edges are correct.
This behavior does not affect all the nodes updated in the batch, only some of them.

I've tried different batch updates, but I can't figure out why these problems occur.

Just to give more details, here are my machine specs:
GPU: GTX 1060 with 6 GB of RAM
CUDA Toolkit: V10.2.89
NVIDIA driver: 440.82
OS: Ubuntu 18.04

Thanks a lot for any answer.

Best,
Samuele

@ogreen
Contributor

ogreen commented May 31, 2020

For the BFS code:
Can you include the name of the input and the exact command line parameters you used?
It would also be helpful if you added the output for two runs.

For the code using batch updates: can you include code for reproducibility?
Have you verified that the batches are indeed holding the expected values before performing the update operation?
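One cheap way to do that verification is a host-side sanity check before handing the arrays to the update. This is only a sketch, not part of the Hornet API: `batch_in_range` and the `vert_t` alias are illustrative helpers. An out-of-range endpoint would be one plausible explanation for edges landing on node 0:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

using vert_t = int32_t;  // assumed 32-bit vertex ID type for the batch

// Returns true if every endpoint falls inside [0, num_vertices).
// A batch that passes this check can still fail for other reasons,
// but a bad endpoint here would immediately explain corrupt edges.
bool batch_in_range(const std::vector<vert_t>& src,
                    const std::vector<vert_t>& dst,
                    vert_t num_vertices) {
    assert(src.size() == dst.size());
    for (size_t i = 0; i < src.size(); ++i) {
        if (src[i] < 0 || src[i] >= num_vertices ||
            dst[i] < 0 || dst[i] >= num_vertices) {
            std::fprintf(stderr, "bad edge %zu: (%d, %d)\n",
                         i, src[i], dst[i]);
            return false;
        }
    }
    return true;
}
```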

@samuelecancellieri
Author

Hi, thanks for the super fast reply.

The input for the first call is: ./bfs2 as-Skitter.mtx
and two consecutive runs report these results:
RUN 1: [screenshot of BFS output]
RUN 2: [screenshot of BFS output]

The code for the batch update is the following:
[screenshot of the batch-update code]

Then I iterate over the two arrays to print the sources and destinations of the batch, and as a check I use the batch print function:
[screenshot of the printed output]

The two printouts are identical, so I think the updates are passed in correctly.
The two batches are inverted because I've noticed that with undirected graphs the edge list is updated in only one direction, so as a workaround I perform a double insertion with sources and destinations reversed.
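The double insertion described above can also be done as a single batch of twice the size. A minimal host-side sketch, assuming the batch lives in two `std::vector`s (`symmetrize_batch` and `vert_t` are illustrative names, not Hornet API):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using vert_t = int32_t;  // assumed 32-bit vertex ID type

// Builds a batch of size 2*n containing each undirected edge (u, v)
// in both directions: the original (u, v) pairs followed by (v, u).
std::pair<std::vector<vert_t>, std::vector<vert_t>>
symmetrize_batch(const std::vector<vert_t>& src,
                 const std::vector<vert_t>& dst) {
    std::vector<vert_t> out_src(src), out_dst(dst);
    out_src.insert(out_src.end(), dst.begin(), dst.end());  // append reversed sources
    out_dst.insert(out_dst.end(), src.begin(), src.end());  // append reversed destinations
    return {out_src, out_dst};
}
```

A single combined insert avoids any question of ordering between the two separate insert calls.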

This is the code I use to check whether the update is present in the graph:
[screenshot of the check code]

And I call the operator with this function:
[screenshot of the calling function]

update_src and update_dst are the vectors containing the sources and destinations of the batches, and update_size is the size of a single batch.

This is an example of the behavior I've described before:

[screenshot of the adjacency output]

As you can see, node 67807 has destination 0 for all its edges but one, 121355, which is the newly inserted edge.
But this behavior is not repeated on all the updated nodes; in fact, you can see that 67646 correctly points to its destination nodes, including the newly inserted 373151.

Sorry for the very long message and thanks for the help.

@ogreen
Contributor

ogreen commented Jun 11, 2020

Can you point me to the code? I will gladly run it on my GV100 and try it out.
(Sorry for the slow response; I did not get a notification that there was a reply on GitHub.)

@samuelecancellieri
Author

Hi,
I'm attaching to this reply the code I'm using to test the batches; it's a very simple modification of the original files, BFStest2.cu and TopDown2.cuh in the hornetsnest folder.
The graph I used in the testing is the following:
https://sparse.tamu.edu/SNAP/as-Skitter
downloaded in Matrix Market format.

Thanks a lot for the help.
batch_test.zip

@ogreen
Contributor

ogreen commented Jun 12, 2020

@samuelecancellieri , I just finished updating a system with a GTX 1060 GPU to NVIDIA driver 440 and CUDA Toolkit 10.2, and I was able to recreate your problem.
On my GV100, I can't recreate the BFS issue; I get consistent results.
It seems that the bug is in the load balancer, as you outlined. The problem you are seeing with the update process could also be related (as we use the load-balancing mechanisms internally).

I just submitted a PR for a new load-balancing mechanism called LRB (Logarithmic Radix Binning) that I hope will resolve this problem:
#38

I added a flag to the BFS tester that allows controlling which of the two load-balancing mechanisms to use. LRB gives consistent results on the GTX 1060.
Can you check whether that resolves the accuracy problem with BFS?
We can move on to the update process after that.
Thanks.

@samuelecancellieri
Author

Hi,
sorry for the slow response.

I tested the new load balancing and the problem with BFS is solved; the results are now consistent across runs. But the problem with the update is still there: whenever I perform a batch update, some vertices end up connected to node 0, with no error reported by the batch update operation (batch.insert). I don't know if the problem is the load balancing; is there a way to change the load-balancing mechanism in the batch update kernel as well?

Thanks a lot for all the support.

Best,
Samuele

@ogreen
Contributor

ogreen commented Jun 19, 2020

It seems that the problem is with BinarySearchLB in XLIB.
Notice that the BinarySearch load-balancer you were using in your BFS is also used in the update process:

xlib::binarySearchLB<BLOCK_SIZE, ITEMS_PER_BLOCK / BLOCK_SIZE>

and
https://github.com/rapidsai/cuhornet/blob/master/hornet/include/Core/BatchUpdate/BatchUpdateKernels.cuh#L62

Do your batches include duplicate values?
If you can ensure that the edges in the batch are unique and that they do not already exist in the graph, you can add the following two flags:
Original: hornet_graph.insert(batch_update_src_to_dst);
New: hornet_graph.insert(batch_update_src_to_dst, false, false);
This might help resolve the problem.
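To satisfy the uniqueness precondition before calling insert with those flags, the batch can be deduplicated on the host first. A minimal sketch, assuming the batch is held in two `std::vector`s on the host (`dedup_batch` and `vert_t` are illustrative names, not Hornet functions):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using vert_t = int32_t;  // assumed 32-bit vertex ID type

// Sorts the batch by (src, dst) and removes duplicate edges in place,
// so that the arrays passed to the batch insert contain each edge
// at most once.
void dedup_batch(std::vector<vert_t>& src, std::vector<vert_t>& dst) {
    std::vector<std::pair<vert_t, vert_t>> edges(src.size());
    for (size_t i = 0; i < src.size(); ++i)
        edges[i] = {src[i], dst[i]};
    std::sort(edges.begin(), edges.end());
    edges.erase(std::unique(edges.begin(), edges.end()), edges.end());
    src.resize(edges.size());
    dst.resize(edges.size());
    for (size_t i = 0; i < edges.size(); ++i) {
        src[i] = edges[i].first;
        dst[i] = edges[i].second;
    }
}
```

Note this only guarantees uniqueness within the batch; checking that the edges do not already exist in the graph would require a separate lookup.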

@samuelecancellieri
Author

Hi,
I tried changing the insert instruction, but with no luck. Do you think it would be possible to change the load-balancing mechanism used internally by the batch update kernel? Maybe use the new LRB that solved the problem with the BFS?

Thanks,
Samuele

@ogreen
Contributor

ogreen commented Jun 25, 2020

I am not sure how easy it will be to update the batching process.
Have you taken a look at the xlib binary search? It might be an easy fix there.

@ogreen
Contributor

ogreen commented Jul 7, 2020

@samuelecancellieri , can you verify that you are not using 64-bit integers as part of the batch process? A misaligned type might cause problems.
Can you point me to a branch with the code? I can try to recreate the problem.
One more thing worth trying would be to generate the batches on the host and then manually copy them over to the device into two arrays. You would need to change your UpdatePtr class to the DEVICE type.

@samuelecancellieri
Author

Hi,
I verified the integer type used in the creation of the batch, and it occupies 4 bytes in memory, so I think there is no misalignment. Here is the code I use to create the batches:
[screenshot of the batch-creation code]
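A compile-time way to pin down this kind of concern is a `static_assert` on the batch's vertex type, so the build fails if the width ever changes. A sketch, where `vert_t` stands for whatever type the batch src/dst arrays actually use:

```cpp
#include <climits>
#include <cstdint>

using vert_t = int32_t;  // assumed type of the batch src/dst arrays

// Fails to compile if vert_t is not a 4-byte (32-bit) integer,
// which would indicate the 64-bit misalignment asked about above.
static_assert(sizeof(vert_t) == 4, "vert_t must occupy 4 bytes");
static_assert(sizeof(vert_t) * CHAR_BIT == 32, "vert_t must be 32 bits wide");
```

This moves the check from a runtime inspection to the compiler, so it cannot be forgotten on a later rebuild.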

I downloaded the code from the master branch of this repo more than a month ago. I'm not sure about the commit; if you know a way to get the number, I will provide it in my next reply.

I don't think I understood your proposed solution; these are the classes I'm currently using in the test:
[screenshot of the class definitions]
Are you suggesting to change the type to DEVICE, manually copy the arrays to the GPU, and perform a manual batch update? It would be very helpful to have an example of how to do it.

Thanks a lot for the reply.
I hope we will find a solution to this annoying problem, because the rest of the framework is working really well.

Samuele

@ogreen
Contributor

ogreen commented Jul 8, 2020

I hope that the following code will be helpful:

```c++
using UpdatePtr = hornet::BatchUpdatePtr<vert_t, hornet::EMPTY, hornet::DeviceType::DEVICE>;

generateBatch(graph, batch_size, batch_src, batch_dst, BatchGenType::INSERT, batch_gen_property::UNIQUE);

// Allocate device arrays and copy the host-generated batch over
vert_t *d_batch_src, *d_batch_dst;
cudaMalloc(&d_batch_src, sizeof(vert_t) * batch_size);
cudaMalloc(&d_batch_dst, sizeof(vert_t) * batch_size);

cudaMemcpy(d_batch_src, batch_src, sizeof(vert_t) * batch_size, cudaMemcpyHostToDevice);
cudaMemcpy(d_batch_dst, batch_dst, sizeof(vert_t) * batch_size, cudaMemcpyHostToDevice);

// Do something
```

@samuelecancellieri
Author

Hi,
I've tested the proposed solution, but with no luck.
I've also run checks with memcheck and synccheck, but no errors were reported; gdb also did not find any errors during execution.
I think the last option would be to look at the load-balancing mechanism. I've done some minor testing on that part and found nothing wrong on the surface, but I'm probably missing something.
Let me know if you need more information to help recreate the problem.

Thanks a lot again.

@z1ko

z1ko commented Apr 18, 2023

Hello, are there any news about this issue?
@samuelecancellieri did you ever manage to solve it?

Thanks in advance.

@samuelecancellieri
Author

> Hello, are there any news about this issue? @samuelecancellieri did you ever manage to solve it?
>
> Thanks in advance.

Hi, sorry, but I haven't worked on this in almost 3 years.
The only thing I remember solving the problem was using a different graphics card. I tried fixing it directly in the code, but I abandoned that difficult path.

Sorry I can't be more helpful.

Best,
