Benchmark scripts for training time performance #139

Closed
HanYuanyuaner opened this issue Mar 18, 2019 · 7 comments
@HanYuanyuaner

❓ Questions & Help

Hi,
I ran gcn.py from the /example folder with the dataset name changed to "PubMed". The website lists a training time of 2.0s for gcn.py on this dataset, but on my server it takes only a little over 0.7s. Likewise, gat.py takes about 3s rather than 12s. My GPU is a GTX 1080 Ti, and I train for 200 epochs. Do you know the reason?

@rusty1s
Member

rusty1s commented Mar 18, 2019

Yes, you are right! Training speed performance increased once more with this PR. However, training speed may still vary on the same GPU across different PyTorch or CUDA versions. I will try to keep the performance table as up to date as possible and provide training time evaluation scripts for verification.

@rusty1s changed the title from "Question about training time with PubMed dataset" to "Benchmark scripts for training time performance" on Mar 18, 2019
@HanYuanyuaner
Author

Thank you for your response. I checked message_passing.py in my installation: the code is still the old version and does not use torch embedding. My environment is Ubuntu 16.04, torch 1.0, CUDA 9.0, and cuDNN 7.0. For the Cora and CiteSeer datasets, my results are similar to yours; only PubMed differs. I only changed the dataset name in gcn.py/gat.py. Do I need to modify any other code?

@rusty1s
Member

rusty1s commented Mar 18, 2019

I added a small script in benchmark/runtime to check current running times of model-dataset pairs. Actually, PubMed is now similar in speed to Cora and CiteSeer. I wonder what caused the delay back then. I will update running times ASAP.
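For readers who want to reproduce such measurements, the following is a minimal timing sketch, not the script from benchmark/runtime. The function name `time_training` and its parameters (`train_step`, `warmup`, `sync`) are hypothetical and chosen for illustration only. It assumes one call to `train_step` performs one full training epoch.

```python
import time
from typing import Callable, Optional


def time_training(
    train_step: Callable[[], None],
    epochs: int = 200,
    warmup: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return wall-clock seconds spent running `train_step` for `epochs` epochs."""
    # Warm-up epochs are excluded so that one-off costs (memory allocation,
    # kernel selection, caching) do not distort the measurement.
    for _ in range(warmup):
        train_step()

    # GPU work is queued asynchronously; without a synchronization point the
    # timer could stop before the queued work has finished.
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(epochs):
        train_step()
    if sync is not None:
        sync()
    return time.perf_counter() - start
```

When the model runs on a CUDA device, passing `sync=torch.cuda.synchronize` should make the reported time cover the actual GPU work. Differences in warm-up handling and synchronization are one plausible reason why two timings of the same script disagree.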

@HanYuanyuaner
Author

Thank you for your response. Another question is about distributed computation. The datasets in the examples are small, but in real cases the dataset may be too large for one worker, so how to separate graph to sub-graph may be a problem. Do you have any suggestions? Or does PyG have the potential to support distributed computation?

@rusty1s
Member

rusty1s commented Mar 19, 2019

Hi, you can always use more workers if this proves to be beneficial. We support distributed training via torch.distributed or nn.DataParallel. What do you mean by "so how to separate graph to sub-graph may be a problem"?

@HanYuanyuaner
Author

I mean: if the graph is too large to fit in the memory of one GPU, how should it be stored?

@rusty1s
Member

rusty1s commented Mar 20, 2019

Sadly, we currently do not support giant graph processing. Giant graphs are usually processed via sampling techniques. This is a rather difficult but important feature for PyG and it is definitely on my ToDo list. I will close this request in favour of #64.
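To illustrate what a sampling technique can look like, here is a minimal pure-Python sketch of uniform neighbor sampling around a set of seed nodes. It is not PyG functionality, and the thread does not say which sampling method is meant. The function `sample_subgraph`, its parameters, and the adjacency-dict graph representation are all assumptions made for this example.

```python
import random
from typing import Dict, List, Set, Tuple


def sample_subgraph(
    adj: Dict[int, List[int]],
    seeds: List[int],
    num_neighbors: int,
    num_hops: int,
    rng: random.Random,
) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Sample up to `num_neighbors` neighbors per node for `num_hops` hops."""
    nodes: Set[int] = set(seeds)
    edges: List[Tuple[int, int]] = []
    frontier = list(seeds)
    for _ in range(num_hops):
        next_frontier: List[int] = []
        for node in frontier:
            neighbors = adj.get(node, [])
            # Keep all neighbors when there are few; otherwise sample a
            # fixed number of them without replacement.
            k = min(num_neighbors, len(neighbors))
            for nbr in rng.sample(neighbors, k):
                edges.append((nbr, node))  # message flows from neighbor to node
                if nbr not in nodes:
                    nodes.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return nodes, edges
```

Only the sampled nodes and edges would need to be moved to the GPU for a given mini-batch, so the full graph can stay in host memory. The subgraph size is bounded by the number of seeds, `num_neighbors`, and `num_hops` rather than by the size of the whole graph.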

@rusty1s rusty1s closed this as completed Mar 20, 2019