Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there a support for dialect mysql in tpcds query generation tool? #13

Closed
krdeepakio opened this issue Jul 11, 2015 · 7 comments
Closed
Labels

Comments

@krdeepakio
Copy link

Thanks you for helping so many people with this repository. It would have been impossible to get this to work by using the source from official site.

I have the data ready in the database. During query generation I did not find any option for mysql as a dialect. I have tried ansi and sqlserver but the queries generated there are not valid mysql queries.

Any help in this regard will be highly appreciated. Thank you.

@gregrahn
Copy link
Owner

gregrahn commented Jul 15, 2015

Use the netezza.tpl. MySQL and Netezza both use "limit N" to limit the rows returned. Other adjustments may be necessary as well. Also note, MySQL does not support window functions, so you will have to use the version from the query_variants directory for those queries.

@krdeepakio
Copy link
Author

Used the netezza variant, also took help from https://github.com/cloudera/impala-tpcds-kit
I am using this sed command to convert blanks to \N, so that mysql creates that column with null value. It works fine.
sed -i -e 's/^|/\N|/' -e 's/||/|\N|/g' -e 's/||/|\N|/g' -e 's/|$/|/' *.dat

Have you looked into the data generation code also, so that we can change it to produce \N instead of blank.

@gregrahn
Copy link
Owner

gregrahn commented Aug 3, 2015

It's probably not difficult to set empty fields to \N, but it's also pretty lame that MySQL still has no option to load missing fields as NULL. There's been a feature request for this since 2006 here. An alternative is also to use the NULLIF(@colname,'') when loading.

I added #14 for this to track it.

@krdeepakio
Copy link
Author

Thank you. I am importing 100GB of data in mysql. The server is 16 core xlarge ec2 instance with 64G RAM. I created the schema (tpcds.sql) and then loaded the data. It took around 7 hours. After that I am running tpcds_ri.sql. This has been running for last 10 hours and still only 20% complete. I must be doing something wrong.

My innodb settings are as follows.
innodb_buffer_pool_size - 2G
innodb_log_file_size - 512M
innodb_log_buffer_size - 1M

My guess is that buffer log size or log file size are culprit. If you have tested with 100G data, can you share your numbers.

@krdeepakio
Copy link
Author

Changed the settings to 32G, 1G, 64M

Alter table add constraint foreign key still taking very long.

@krdeepakio
Copy link
Author

Another request which may be simple to implement. Can we add the support for 10 GB data? It will be really helpful. Thank you.

@gregrahn
Copy link
Owner

gregrahn commented Aug 3, 2015

A few things:

  • Asking for MySQL tuning help is outside the scope of this issues for this project. Sorry.
  • You can generate any size data you want, however
  • Only certain scale factors are valid for official results. See Clause 3.1 in the TPC-DS Specification
  • One topic per issue, so closing this one.

@gregrahn gregrahn closed this as completed Aug 3, 2015
sjp38 added a commit to sjp38/tpcds-mariadb that referenced this issue Sep 26, 2017
Blank cell in data becomes integer zero instead of NULL on MariaDB and
it makes tpcds_ri.sql to fail.  This commit fixes the problem by
converting the blank cells into `\N` using sed.  The solution has came
from Web[1].

[1]
gregrahn/tpcds-kit#13 (comment)

Signed-off-by: SeongJae Park <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants