num_worker training dependency #196
Comments
Hi, which example are you talking about? |
Hi, the property_prediction with csv_data_configuration. I used the regression_train.py code. |
Sorry for the late reply.
Have you eliminated all sources of randomness? By default
Yes, you can set |
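One way to check whether the variation comes from unseeded random number generators is to seed all of them before building the data loaders and the model. The helper below is a minimal sketch, not code from this repository: the name `seed_everything` is hypothetical, and NumPy and PyTorch are only seeded if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is importable (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)           # CPU generator
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass  # PyTorch not installed, nothing to seed


seed_everything(0)
first = random.random()
seed_everything(0)
second = random.random()
print(first == second)
```

Even with fixed seeds, data loader worker processes each draw from their own random stream, so runs with different worker counts may still not match exactly.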
Thanks very much for your response. I am trying to use my own split for the train/test/val sets, which is based on splitting the 0 and 1 labels separately. I have a column that holds the split assignment. Is there an easy way to do this? I added a -ttvc (--train-test-val-col) argument, which indicates the column holding the train/test/val split labels.
And here is the change I made where the data is read and the split is done:
Thanks |
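A column-driven split of the kind described above can be sketched as follows. This is only an illustration, not the poster's actual change: the function name `split_by_column`, the column name `split`, and the label values `train`/`val`/`test` are all assumptions, and the demo data is made up.

```python
import csv
import io
from collections import defaultdict


def split_by_column(rows, split_col="split"):
    """Group rows into train/val/test using the label stored in split_col."""
    subsets = defaultdict(list)
    for row in rows:
        # Normalise the label so "Train" and " train" land in the same subset
        subsets[row[split_col].strip().lower()].append(row)
    return subsets["train"], subsets["val"], subsets["test"]


# Made-up data standing in for the real .csv file
demo_csv = io.StringIO(
    "smiles,label,split\n"
    "CCO,0,train\n"
    "CCN,1,train\n"
    "CCC,0,val\n"
    "C=O,1,test\n"
)
train, val, test = split_by_column(csv.DictReader(demo_csv))
print(len(train), len(val), len(test))
```

Because the assignment is read from the file, the 0 and 1 labels can be balanced across subsets in advance, and the split stays the same from run to run.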
I actually found the SingleTaskStratifiedSplitter class, which I think will do what I need, but I did not see it among the options for the splitting method. I will try to use it. Please let me know if you think this is the correct way to do it. |
That should work. Feel free to reach out if you encounter any further issues. |
Thanks Mufei. Just wondering whether the code has ever been used for large-scale datasets (e.g., 100 million molecules). If so, what would you suggest using or changing in the code to make it scalable and memory efficient? Thanks. |
I have not tested the code for that scale. Likely you will need to check if you have enough memory to load the data at once or alternatively load the data in batches. You will also need more computational resources, e.g., multi-GPU training. The example here might help. |
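Loading the data in batches, as suggested above, could look roughly like this. It is a minimal sketch using only the standard library: `iter_batches` is a hypothetical helper, the column names are assumptions, and the in-memory file stands in for a real .csv on disk.

```python
import csv
import io
from itertools import islice


def iter_batches(handle, batch_size):
    """Yield lists of at most batch_size rows without reading the whole file."""
    reader = csv.DictReader(handle)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Made-up data: five molecules, read two at a time
demo_csv = io.StringIO(
    "smiles,label\n" + "".join(f"{'C' * (i + 1)},{i % 2}\n" for i in range(5))
)
batch_sizes = [len(batch) for batch in iter_batches(demo_csv, batch_size=2)]
print(batch_sizes)
```

Only one batch of rows is held in memory at a time, so peak memory depends on `batch_size` rather than on the size of the file.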
Hi mufeili,
I have a couple of questions that I would appreciate your help with.
- Changing the number of workers changes the number of epochs required to converge, which is not expected. Increasing the number of CPUs also increases the training time. Any advice on why this happens?
- Could we use a previously generated graph.bin file to start training without loading the graphs from a .csv file?
Thanks.