Change default value of datafusion.catalog.has_header to true #11936

Closed
alamb opened this issue Aug 11, 2024 · 2 comments · Fixed by #11919
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Aug 11, 2024

Is your feature request related to a problem or challenge?

As @jgranduel notes in #11848:

I think most people would expect when reading this data as a csv file, we would get two columns a and b

a,b
1,2

However, DataFusion currently treats the CSV file as "without header" by default, so the header line is read as a data row and the columns get generated names:

> datafusion-cli.exe
DataFusion CLI v40.0.0
> select * from './data_dir/';
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| a        | b        |
| 1        | 2        |
+----------+----------+
4 row(s) fetched.

Describe the solution you'd like

I propose removing this "rough edge" and changing the DataFusion default behavior to what I think most people would expect:

+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+

Describe alternatives you've considered

I think we should change the default value of datafusion.catalog.has_header from false to true:

/// Default value for `format.has_header` for `CREATE EXTERNAL TABLE`
/// if not specified explicitly in the statement.
pub has_header: bool, default = false
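
To make the effect of the flag concrete, here is a minimal, std-only sketch of the two behaviors. It is not DataFusion's CSV reader: `read_csv` is a hypothetical helper written for illustration, it splits naively on commas, and the `column_N` naming simply mirrors the CLI output shown above.

```rust
/// Hypothetical helper (not DataFusion's implementation): splits CSV text into
/// column names and data rows, depending on `has_header`.
fn read_csv(text: &str, has_header: bool) -> (Vec<String>, Vec<Vec<String>>) {
    // Naive parsing: one record per line, fields separated by commas.
    let mut rows: Vec<Vec<String>> = text
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.split(',').map(|f| f.trim().to_string()).collect())
        .collect();

    // Number of columns, taken from the first record (0 for empty input).
    let width = rows.first().map_or(0, |r| r.len());

    let names = if has_header && !rows.is_empty() {
        // With a header, the first record supplies the column names
        // and is removed from the data.
        rows.remove(0)
    } else {
        // Without a header, names are generated and every record is data.
        (1..=width).map(|i| format!("column_{i}")).collect()
    };
    (names, rows)
}

fn main() {
    let csv = "a,b\n1,2\n";

    // has_header = false: the line "a,b" is kept as a data row.
    let (names, rows) = read_csv(csv, false);
    println!("{names:?} {rows:?}");

    // has_header = true: the line "a,b" becomes the column names.
    let (names, rows) = read_csv(csv, true);
    println!("{names:?} {rows:?}");
}
```

With `has_header = false` the sketch returns the generated names plus two data rows; with `has_header = true` it returns `a` and `b` as names plus one data row, which is the behavior proposed as the default.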

Additional context

No response

@korowa
Contributor

korowa commented Aug 11, 2024

I think it'll be fine, because CsvReadOptions already defaults to has_header: true (link), so the defaults for CSV handling are currently inconsistent.

@alamb
Contributor Author

alamb commented Aug 12, 2024

I think defaulting to "no headers" is unlikely to have been a conscious choice (I suspect it was a historical accident).
