Parameter generation is very slow #51
We at TigerGraph tried this data generation for SF-1000. It took 33+ hours. However, the problem is that we could not find the parameter files under ldbc_snb_data/substitution_parameters. Has anyone successfully generated the parameters for SF-1000?
Can you look at the log files of parameter generation (parameters_bi.log and parameters_interactive.log)? Any hints there?
Continuing the comment from TigerGraph above, regarding "parameters_interactive.log": it seems that something is wrong with the memory. I added 'export HADOOP_CLIENT_OPTS="-Xmx200G"' in "run.sh", and my machine has 244 GB of memory. Do you have any suggestions?
Parameter generation is implemented using a couple of Python scripts, which is why it is so slow: its execution is not parallelized in any way. Setting HADOOP_CLIENT_OPTS will have no effect on parameter generation.
According to reports by several users, for large datasets (e.g. SF1000) the generation of parameters becomes the most expensive part of the generation process (90 minutes for generating the data, 12 hours for generating the parameters). We should rethink its implementation (maybe porting it to a Hadoop job).
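As an illustration of the parallelization idea, here is a minimal sketch using Python's multiprocessing module. All names here (generate_params_for_query, generate_all) are hypothetical stand-ins, not the actual datagen scripts, which read factor tables from disk; the point is only that each query's substitution parameters are independent and could be generated in parallel worker processes.

```python
from multiprocessing import Pool

def generate_params_for_query(query_id):
    # Hypothetical stand-in for the per-query parameter generation
    # done by the datagen's Python scripts.
    return (query_id, [f"param_{query_id}_{i}" for i in range(3)])

def generate_all(query_ids, workers=4):
    # Each query's parameters are independent of the others,
    # so the work can be spread across a pool of processes.
    with Pool(processes=workers) as pool:
        return dict(pool.map(generate_params_for_query, query_ids))

if __name__ == "__main__":
    print(generate_all(range(1, 4)))
```

Porting to a Hadoop job (as suggested above) would follow the same principle, mapping over queries or parameter partitions instead of processing them sequentially in a single Python process.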