Select header row number when reading CSV files #781

Closed
JonGretar opened this issue Dec 21, 2023 · 2 comments · Fixed by #782

Comments

@JonGretar
Contributor

It would be helpful to add a :header_row option for reading CSV files, separate from the :skip_rows option.
It is not uncommon, especially when working with scientific equipment, for the header to sit below the first row and for non-data rows to follow it.

As an example, consider this eddy covariance data file:

"TOA5","6843","CR3000","6843","CR3000.Std.22","CPU:CA_Flux__GOOD.CR3","24006","ts_Above"
"TIMESTAMP","RECORD","Ux","Uy","Uz","co2","h2o","Ts","press","diag_csat"
"TS","RN","m/s","m/s","m/s","mg/m^3","g/m^3","C","kPa","m/s"
"","","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp"
"2012-06-07 13:00:00.05",111868400,0.468,-0.9077501,0.1785,659.7584,9.530561,28.52527,100.1938,0
"2012-06-07 13:00:00.1",111868401,0.60275,-1.0795,0.283,660.0234,9.492132,28.51141,100.1938,0
....
  • The first row is data about the equipment.
  • The second row holds the column names.
  • The third row holds the units.
  • The fourth row is other metadata.
  • Only then does the data start.

Of course reading this is not complex. Just use skip_rows: 1 and then delete the first two rows in the dataframe. But this is such a common pattern in scientific data that it might be worth considering supporting it inside the read_csv/2 function.
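
For readers unfamiliar with the workaround, here is a minimal sketch of it in plain Python, using only the standard `csv` module rather than Explorer or Polars. The function name `load_with_workaround` is hypothetical; the data is the sample quoted above.

```python
import csv

# First lines of the sample file quoted in this issue.
SAMPLE = '''"TOA5","6843","CR3000","6843","CR3000.Std.22","CPU:CA_Flux__GOOD.CR3","24006","ts_Above"
"TIMESTAMP","RECORD","Ux","Uy","Uz","co2","h2o","Ts","press","diag_csat"
"TS","RN","m/s","m/s","m/s","mg/m^3","g/m^3","C","kPa","m/s"
"","","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp"
"2012-06-07 13:00:00.05",111868400,0.468,-0.9077501,0.1785,659.7584,9.530561,28.52527,100.1938,0
"2012-06-07 13:00:00.1",111868401,0.60275,-1.0795,0.283,660.0234,9.492132,28.51141,100.1938,0
'''


def load_with_workaround(text):
    """Hypothetical helper: skip one row, read the header, drop two rows."""
    lines = text.splitlines()[1:]       # equivalent of skip_rows: 1
    rows = list(csv.DictReader(lines))  # next line becomes the header
    return rows[2:]                     # drop the units and metadata rows


records = load_with_workaround(SAMPLE)
print(records[0]["TIMESTAMP"], records[0]["Ux"])
```

The drawback this sketch makes visible is that the units and metadata rows are parsed as data before being discarded. In a typed dataframe library that would likely push every column towards a string type, so the columns would need casting afterwards.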

Of course I would also love to be able to save the units row as a series attribute. But that is a discussion for another issue. 😉

@josevalim
Member

If this is supported in polars, then 👍 for a PR that adds this.

@JonGretar
Contributor Author

Hmmm

Polars has 'skip_rows_after_header'. I'll take a look at adding that.
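
To make the intended behaviour concrete, here is a rough sketch of how a `skip_rows` plus `skip_rows_after_header` pair could behave. This is plain Python with the standard `csv` module, not Polars or Explorer code. The function `read_table` and the tiny input are made up for illustration, and the exact semantics in Polars may differ in edge cases.

```python
import csv
import io


def read_table(text, skip_rows=0, skip_rows_after_header=0):
    """Hypothetical reader: skip rows, take the header, skip more rows."""
    reader = csv.reader(io.StringIO(text))
    for _ in range(skip_rows):
        next(reader, None)              # discard rows above the header
    header = next(reader)               # this row names the columns
    for _ in range(skip_rows_after_header):
        next(reader, None)              # discard units/metadata rows
    return header, [dict(zip(header, row)) for row in reader]


# Made-up stand-in with the same layout as the file in this issue:
# one equipment row, the header, a units row, a metadata row, then data.
TEXT = (
    "equipment info\n"
    "a,b\n"
    "m/s,kPa\n"
    ",Smp\n"
    "1,2\n"
    "3,4\n"
)

header, rows = read_table(TEXT, skip_rows=1, skip_rows_after_header=2)
print(header, rows)
```

With this shape the non-data rows are never seen by the parser as data, so no cleanup step is needed after loading.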
