Add decimal for CSVReader if user provide the schema with decimal #926

Closed
liukun4515 opened this issue Nov 9, 2021 · 4 comments · Fixed by #941 or #952
Labels
enhancement Any new improvement worthy of an entry in the changelog

Comments

@liukun4515
Contributor

liukun4515 commented Nov 9, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I need to read decimal data from a CSV file into a DecimalArray when I provide a schema that contains a decimal column.

If the CSV file is read using an inferred schema, we keep the existing rule and convert the values to Float64 or Utf8.


@liukun4515 liukun4515 added the enhancement Any new improvement worthy of an entry in the changelog label Nov 9, 2021
@jimexist
Member

jimexist commented Nov 9, 2021

Thanks for the issue @liukun4515, however I don't think this is doable as is.

By default, numbers are read into f64 rather than decimal, because a decimal needs a configured precision, and that can't be inferred from the CSV data itself. If you do need decimals, either read the column as f64 and convert it yourself (using cast), or pre-convert the data to Parquet so that the Arrow types can be mapped correctly.
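To make the "read as f64 and convert yourself" route concrete, here is a minimal std-only sketch of what such a conversion involves. It is not the arrow cast kernel: `f64_to_decimal` is a hypothetical helper, and the decimal is assumed to be stored as an integer scaled by 10^scale. The caller has to pick the precision and scale, which is exactly the information the CSV data does not carry.

```rust
/// Hypothetical helper (not an arrow API): convert an f64 into a decimal
/// stored as an i128 scaled by 10^scale.
/// Returns None when the value is not finite, the decimal type is invalid,
/// or the rounded value needs more than `precision` digits.
fn f64_to_decimal(value: f64, precision: u32, scale: u32) -> Option<i128> {
    // 38 is the largest precision a 128-bit decimal can hold.
    if !value.is_finite() || precision == 0 || precision > 38 || scale > precision {
        return None;
    }
    // Shift the decimal point and round to the nearest integer.
    let scaled = (value * 10f64.powi(scale as i32)).round();
    // Compare in f64 first, because `as i128` would silently saturate.
    let limit = 10i128.pow(precision);
    if scaled.abs() >= limit as f64 {
        return None;
    }
    Some(scaled as i128)
}
```

With precision 10 and scale 2, `f64_to_decimal(1.25, 10, 2)` gives `Some(125)`. Values that were already rounded when parsed as f64 stay rounded, so this route can lose digits that a direct text-to-decimal parse would keep.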

@liukun4515
Contributor Author

> thanks for the issue @liukun4515 however i don't think this is doable as is.
>
> by default numbers are read into f64 but not decimal because for decimal you'll need to have a precision configured. that can't be inferred from the csv data itself. if you do need to do that either read as f64 and convert yourself (using cast) or pre-convert the data as parquet so that arrow can be correctly mapped.

The precision is indeed a problem. I found this issue in DataFusion when I used the following SQL to create a table:

CREATE EXTERNAL TABLE food (a DECIMAL(10,0), b INT) STORED AS CSV LOCATION 'data.csv';

I can read data from the CSV file, so I want to implement this feature to support decimal data from CSV files.

I think that if the user provides a schema with a decimal data type, we should read the CSV data and convert it to the decimal type.
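As a rough illustration of the proposal, the sketch below parses a single CSV field into a decimal once the precision and scale are known from the user's schema, for example `DECIMAL(10,0)` in the statement above. It is std-only and is not the reader code from any pull request: `parse_decimal` is a hypothetical name, the value is assumed to be stored as an i128 scaled by 10^scale, and rejecting extra fractional digits (instead of rounding) is a choice made for this example.

```rust
/// Hypothetical helper (not an arrow API): parse one CSV field into a decimal
/// stored as an i128 scaled by 10^scale, using the precision and scale that
/// the user-provided schema declares for the column.
fn parse_decimal(field: &str, precision: u32, scale: u32) -> Result<i128, String> {
    // 38 is the largest precision a 128-bit decimal can hold.
    if precision == 0 || precision > 38 || scale > precision {
        return Err(format!("invalid decimal type ({precision}, {scale})"));
    }
    let s = field.trim();
    let (negative, digits) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    let all_digits = int_part.bytes().chain(frac_part.bytes()).all(|b| b.is_ascii_digit());
    if (int_part.is_empty() && frac_part.is_empty()) || !all_digits {
        return Err(format!("not a decimal: {field:?}"));
    }
    // Assumption for this sketch: extra fractional digits are an error, not rounded.
    if frac_part.len() > scale as usize {
        return Err(format!("{field:?} has more than {scale} fractional digits"));
    }
    // Leading zeros do not count towards the precision.
    let int_digits = int_part.trim_start_matches('0');
    if int_digits.len() > (precision - scale) as usize {
        return Err(format!("{field:?} does not fit DECIMAL({precision},{scale})"));
    }
    // At most 38 digits in total, so this cannot overflow an i128.
    let mut value: i128 = 0;
    for b in int_digits.bytes().chain(frac_part.bytes()) {
        value = value * 10 + i128::from(b - b'0');
    }
    // Pad with zeros so the stored integer always carries `scale` fractional digits.
    value *= 10i128.pow(scale - frac_part.len() as u32);
    Ok(if negative { -value } else { value })
}
```

For example, `parse_decimal("12.5", 10, 2)` returns `Ok(1250)`, while `parse_decimal("1.5", 10, 0)` is an error because the declared scale leaves no room for a fractional digit. Parsing from text avoids the intermediate f64 and the rounding that comes with it.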

@liukun4515 liukun4515 changed the title from "Add decimal for CSVReader" to "Add decimal for CSVReader if user provide the schema with decimal" Nov 11, 2021
@liukun4515
Contributor Author

If the user reads the CSV file using a schema inferred from sample data, we just use Float64 or Utf8 to read floating-point data. @jimexist
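The inferred-schema behaviour described here can be pictured with the simplified sketch below. It is an assumption-laden illustration, not the actual inference code: `InferredType`, `infer_field`, and `infer_column` are hypothetical names, and only three candidate types are modelled. The point is that sampled text alone never yields a decimal type, because no precision or scale can be read off the values.

```rust
/// Hypothetical, simplified set of types that inference can produce.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InferredType {
    Int64,
    Float64,
    Utf8,
}

/// Guess the type of a single sampled field. Decimal is never a candidate.
fn infer_field(sample: &str) -> InferredType {
    let s = sample.trim();
    if s.parse::<i64>().is_ok() {
        InferredType::Int64
    } else if s.parse::<f64>().map_or(false, |v| v.is_finite()) {
        InferredType::Float64
    } else {
        InferredType::Utf8
    }
}

/// Combine the guesses for a column: any string forces Utf8,
/// and a mix of integers and floats widens to Float64.
fn infer_column(samples: &[&str]) -> InferredType {
    let mut inferred: Option<InferredType> = None;
    for sample in samples {
        let next = infer_field(sample);
        inferred = Some(match (inferred, next) {
            (None, t) => t,
            (Some(a), b) if a == b => a,
            (Some(InferredType::Utf8), _) | (_, InferredType::Utf8) => InferredType::Utf8,
            _ => InferredType::Float64,
        });
    }
    // With no samples there is nothing to go on, so fall back to strings.
    inferred.unwrap_or(InferredType::Utf8)
}
```

Under these rules a column sampled as `["1", "2.50"]` comes out as `Float64`, so a decimal column is only possible when the schema is supplied explicitly.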

@liukun4515
Contributor Author

@jimexist please take a look at this pull request: #941
