
batch update and load balancing #41

Open
samuelecancellieri opened this issue May 29, 2020 · 15 comments


@samuelecancellieri

Hi,

I downloaded the repo from the master and I'm encountering the following problems when using the bfs2 algorithm.

The first problem is with the binary-search load balancing: when it is used to perform the BFS, each run returns different distances despite starting from the same root.
I worked around it by switching the balancing to vertex-based, but I'm not sure what is causing this behavior.

The second, and bigger, problem arises when I try to use update batches.
I'm able to generate batches and correctly insert them into the graph (edge insertion, for example), but when I then run any operation on the graph, some updated edges are wrong: they are present, but with an incorrect destination, all terminating at node 0.

For example, if I insert the batch (100, 200), the graph is updated with an edge from node 100 to node 200. But sometimes, when the batch is larger than a couple of thousand edges (e.g. 5000 new edges), the newly inserted edges start from the correct source but terminate at node 0; other times the behavior is inverted: all the pre-existing edges start from the correct source but terminate at 0, while all the new edges are correct.
This behavior does not affect all the nodes updated in the batch, only some of them.

I've tried different batch updates, but I can't figure out why these problems occur.

Just to give more details, here are my machine specs:
GPU: GTX 1060 with 6 GB of RAM
CUDA Toolkit: V10.2.89
NVIDIA driver: 440.82
OS: Ubuntu 18.04

Thanks a lot for any answer.

Best,
Samuele

@ogreen
Contributor

ogreen commented May 31, 2020

For the BFS code:
Can you include the name of the input and the exact command line parameters you used?
It would also be helpful if you added the output for two runs.

For the code using batch updates: can you include code for reproducibility?
Have you verified that the batches are indeed holding the expected values before performing the update operation?
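One cheap way to do that verification is a host-side sanity check before handing the arrays to the update. This is only a sketch, not part of the Hornet API: `batch_in_range` and the `vert_t` alias are illustrative helpers. An out-of-range endpoint would be one plausible explanation for edges landing on node 0:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

using vert_t = int32_t;  // assumed 32-bit vertex ID type for the batch

// Returns true if every endpoint falls inside [0, num_vertices).
// A batch that passes this check can still fail for other reasons,
// but a bad endpoint here would immediately explain corrupt edges.
bool batch_in_range(const std::vector<vert_t>& src,
                    const std::vector<vert_t>& dst,
                    vert_t num_vertices) {
    assert(src.size() == dst.size());
    for (size_t i = 0; i < src.size(); ++i) {
        if (src[i] < 0 || src[i] >= num_vertices ||
            dst[i] < 0 || dst[i] >= num_vertices) {
            std::fprintf(stderr, "bad edge %zu: (%d, %d)\n",
                         i, src[i], dst[i]);
            return false;
        }
    }
    return true;
}
```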

@samuelecancellieri
Author

Hi, thanks for the super fast reply.

The input for the first call is: ./bfs2 as-Skitter.mtx
and two consecutive runs report these results:
RUN 1: [screenshot of BFS output]
RUN 2: [screenshot of BFS output]

The code for the batch update is the following:
[screenshot of the batch-update code]

Then I iterate over the two arrays to print the sources and destinations of the batch, and as a check I use the batch print function:
[screenshot of the printed output]

The two printouts are identical, so I think the updates are passed in correctly.
The two batches are inverted because I've noticed that with undirected graphs the edge list is updated in only one direction, so as a workaround I perform a double insertion with sources and destinations reversed.
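The double insertion described above can also be done as a single batch of twice the size. A minimal host-side sketch, assuming the batch lives in two `std::vector`s (`symmetrize_batch` and `vert_t` are illustrative names, not Hornet API):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using vert_t = int32_t;  // assumed 32-bit vertex ID type

// Builds a batch of size 2*n containing each undirected edge (u, v)
// in both directions: the original (u, v) pairs followed by (v, u).
std::pair<std::vector<vert_t>, std::vector<vert_t>>
symmetrize_batch(const std::vector<vert_t>& src,
                 const std::vector<vert_t>& dst) {
    std::vector<vert_t> out_src(src), out_dst(dst);
    out_src.insert(out_src.end(), dst.begin(), dst.end());  // append reversed sources
    out_dst.insert(out_dst.end(), src.begin(), src.end());  // append reversed destinations
    return {out_src, out_dst};
}
```

A single combined insert avoids any question of ordering between the two separate insert calls.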

This is the code I use to check whether the update is present in the graph:
[screenshot of the check code]

And I call the operator with this function:
[screenshot of the calling function]

update_src and update_dst are the vectors containing the sources and destinations of the batches, and update_size is the size of a single batch.

This is an example of the behavior I've described before:

[screenshot of the adjacency output]

As you can see, node 67807 has destination 0 for all its edges but one, 121355, which is the newly inserted edge.
But this behavior is not repeated on all the updated nodes; in fact, you can see that 67646 correctly points to its destination nodes, including the newly inserted 373151.

Sorry for the very long message and thanks for the help.

@ogreen
Contributor

ogreen commented Jun 11, 2020

Can you point me to the code? I will gladly run it on my GV100 and try it out.
(Sorry for the slow response; I did not get a notification that there was a reply on GitHub.)

@samuelecancellieri
Author

Hi,
I'm attaching to this reply the code I'm using to test the batches; it's a very simple modification of the original files, BFStest2.cu and TopDown2.cuh in the hornetsnest folder.
The graph I used in the testing is the following:
https://sparse.tamu.edu/SNAP/as-Skitter
downloaded in Matrix Market format.

Thanks a lot for the help.
batch_test.zip

@ogreen
Contributor

ogreen commented Jun 12, 2020

@samuelecancellieri , I just finished updating a system with a GTX 1060 GPU to NVIDIA driver 440 and CUDA Toolkit 10.2, and I was able to recreate your problem.
On my GV100, I can't recreate the BFS issue; I get consistent results.
It seems that the bug is in the load balancer, as you outlined. The problem you are seeing with the update process could also be related (as we use the load-balancing mechanisms internally).

I just submitted a PR for a new load-balancing mechanism called LRB (Logarithmic Radix Binning) that I hope will resolve this problem:
#38

I added a flag to the BFS tester that allows controlling which of the two load-balancing mechanisms to use. LRB gives consistent results on the GTX 1060.
Can you check whether that resolves the accuracy problem with BFS?
We can move on to the update process after that.
Thanks.

@samuelecancellieri
Author

Hi,
sorry for the slow response.

I tested the new load balancing and the problem with BFS is solved; the results are now consistent across runs. But the problem with the update is still there: whenever I perform a batch update, some vertices end up connected to node 0, with no error reported by the batch update operation (batch.insert). I don't know if the problem is the load balancing; is there a way to change the load-balancing mechanism in the batch update kernel as well?

Thanks a lot for all the support.

Best,
Samuele

@ogreen
Contributor

ogreen commented Jun 19, 2020

It seems that the problem is with BinarySearchLB in XLIB.
Notice that the BinarySearch load-balancer you were using in your BFS is also used in the update process:

xlib::binarySearchLB<BLOCK_SIZE, ITEMS_PER_BLOCK / BLOCK_SIZE>

and
https://github.com/rapidsai/cuhornet/blob/master/hornet/include/Core/BatchUpdate/BatchUpdateKernels.cuh#L62

Do your batches include duplicate values?
If you can ensure that the edges in the batch are unique and that they do not already exist in the graph, you can add the following two flags:
Original: hornet_graph.insert(batch_update_src_to_dst);
New: hornet_graph.insert(batch_update_src_to_dst, false, false);
This might help resolve the problem.
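To satisfy the uniqueness precondition before calling insert with those flags, the batch can be deduplicated on the host first. A minimal sketch, assuming the batch is held in two `std::vector`s on the host (`dedup_batch` and `vert_t` are illustrative names, not Hornet functions):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using vert_t = int32_t;  // assumed 32-bit vertex ID type

// Sorts the batch by (src, dst) and removes duplicate edges in place,
// so that the arrays passed to the batch insert contain each edge
// at most once.
void dedup_batch(std::vector<vert_t>& src, std::vector<vert_t>& dst) {
    std::vector<std::pair<vert_t, vert_t>> edges(src.size());
    for (size_t i = 0; i < src.size(); ++i)
        edges[i] = {src[i], dst[i]};
    std::sort(edges.begin(), edges.end());
    edges.erase(std::unique(edges.begin(), edges.end()), edges.end());
    src.resize(edges.size());
    dst.resize(edges.size());
    for (size_t i = 0; i < edges.size(); ++i) {
        src[i] = edges[i].first;
        dst[i] = edges[i].second;
    }
}
```

Note this only guarantees uniqueness within the batch; checking that the edges do not already exist in the graph would require a separate lookup.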

@samuelecancellieri
Author

Hi,
I tried changing the insert instruction, but with no luck. Do you think it would be possible to change the load-balancing mechanism used internally by the batch update kernel? Maybe use the new LRB that solved the problem with the BFS?

Thanks,
Samuele

@ogreen
Contributor

ogreen commented Jun 25, 2020

I am not sure how easy it will be to update the batching process.
Have you taken a look at the xlib binary search? It might be an easy fix there.

@ogreen
Contributor

ogreen commented Jul 7, 2020

@samuelecancellieri , can you verify that you are not using 64-bit integers as part of the batch process? A misaligned type might cause problems.
Can you point me to a branch with the code? I can try to recreate the problem.
One more thing worth trying would be to generate the batches on the host and then manually copy them over to the device into two arrays. You would need to change your UpdatePtr class to the DEVICE type.

@samuelecancellieri
Author

Hi,
I verified the integer type used in the creation of the batch, and it occupies 4 bytes in memory, so I think there is no misalignment. Here is the code I use to create the batches:
[screenshot of the batch-creation code]
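A compile-time way to pin down this kind of concern is a `static_assert` on the batch's vertex type, so the build fails if the width ever changes. A sketch, where `vert_t` stands for whatever type the batch src/dst arrays actually use:

```cpp
#include <climits>
#include <cstdint>

using vert_t = int32_t;  // assumed type of the batch src/dst arrays

// Fails to compile if vert_t is not a 4-byte (32-bit) integer,
// which would indicate the 64-bit misalignment asked about above.
static_assert(sizeof(vert_t) == 4, "vert_t must occupy 4 bytes");
static_assert(sizeof(vert_t) * CHAR_BIT == 32, "vert_t must be 32 bits wide");
```

This moves the check from a runtime inspection to the compiler, so it cannot be forgotten on a later rebuild.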

I downloaded the code from the master branch of this repo more than a month ago. I'm not sure about the commit; if you know a way to get the number, I will provide it in my next reply.

I don't think I understood your proposed solution; these are the classes I'm currently using in the test:
[screenshot of the class definitions]
Are you suggesting to change the type to DEVICE, manually copy the arrays to the GPU, and perform a manual batch update? It would be very helpful to have an example of how to do it.

Thanks a lot for the reply.
I hope we will find a solution to this annoying problem, because the rest of the framework is working really well.

Samuele

@ogreen
Contributor

ogreen commented Jul 8, 2020

I hope that the following code will be helpful:

```c++
using UpdatePtr = hornet::BatchUpdatePtr<vert_t, hornet::EMPTY, hornet::DeviceType::DEVICE>;

generateBatch(graph, batch_size, batch_src, batch_dst, BatchGenType::INSERT, batch_gen_property::UNIQUE);

// Allocate device arrays and copy the host-generated batch over
vert_t *d_batch_src, *d_batch_dst;
cudaMalloc(&d_batch_src, sizeof(vert_t) * batch_size);
cudaMalloc(&d_batch_dst, sizeof(vert_t) * batch_size);

cudaMemcpy(d_batch_src, batch_src, sizeof(vert_t) * batch_size, cudaMemcpyHostToDevice);
cudaMemcpy(d_batch_dst, batch_dst, sizeof(vert_t) * batch_size, cudaMemcpyHostToDevice);

// Do something
```

@samuelecancellieri
Author

Hi,
I've tested the proposed solution, but with no luck.
I've also run checks with memcheck and synccheck, but no errors were reported; gdb also did not find any errors during execution.
I think the last option would be to look at the load-balancing mechanism. I've done some minor testing on that part and found nothing wrong on the surface, but I'm probably missing something.
Let me know if you need more information to help recreate the problem.

Thanks a lot again.

@z1ko

z1ko commented Apr 18, 2023

Hello, are there any news about this issue?
@samuelecancellieri did you ever manage to solve it?

Thanks in advance.

@samuelecancellieri
Author

> Hello, are there any news about this issue? @samuelecancellieri did you ever manage to solve it?
>
> Thanks in advance.

Hi, sorry, but I haven't worked on this in almost 3 years.
The only thing I remember solving the problem was using a different graphics card. I tried fixing it directly in the code, but I abandoned that difficult path.

Sorry I can't be more helpful.

Best,
