Select header row number when reading CSV files #781

JonGretar · 2023-12-21T12:10:22Z

It would be helpful to add a :header_row option for reading CSV files, separate from the :skip_rows option.
It is not uncommon, especially when working with scientific equipment, for the header to be somewhere other than the first row and for non-data rows to follow it.

As an example, consider this eddy covariance data file:

"TOA5","6843","CR3000","6843","CR3000.Std.22","CPU:CA_Flux__GOOD.CR3","24006","ts_Above"
"TIMESTAMP","RECORD","Ux","Uy","Uz","co2","h2o","Ts","press","diag_csat"
"TS","RN","m/s","m/s","m/s","mg/m^3","g/m^3","C","kPa","m/s"
"","","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp"
"2012-06-07 13:00:00.05",111868400,0.468,-0.9077501,0.1785,659.7584,9.530561,28.52527,100.1938,0
"2012-06-07 13:00:00.1",111868401,0.60275,-1.0795,0.283,660.0234,9.492132,28.51141,100.1938,0
....

Here the first row is metadata about the equipment.
The second row holds the column names.
The third row holds the units.
The fourth row holds other metadata.
The data starts on the fifth row.
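
To make the layout above concrete, here is a minimal sketch in plain Python using only the standard library. It is not the Explorer or Polars implementation: `read_with_header_offsets` is a hypothetical helper, and its `skip_rows` and `skip_rows_after_header` parameters only mimic the option names discussed in this thread.

```python
import csv
import io

# The sample rows from this issue, embedded so the sketch is self-contained.
SAMPLE = '''"TOA5","6843","CR3000","6843","CR3000.Std.22","CPU:CA_Flux__GOOD.CR3","24006","ts_Above"
"TIMESTAMP","RECORD","Ux","Uy","Uz","co2","h2o","Ts","press","diag_csat"
"TS","RN","m/s","m/s","m/s","mg/m^3","g/m^3","C","kPa","m/s"
"","","Smp","Smp","Smp","Smp","Smp","Smp","Smp","Smp"
"2012-06-07 13:00:00.05",111868400,0.468,-0.9077501,0.1785,659.7584,9.530561,28.52527,100.1938,0
"2012-06-07 13:00:00.1",111868401,0.60275,-1.0795,0.283,660.0234,9.492132,28.51141,100.1938,0
'''


def read_with_header_offsets(text, skip_rows=0, skip_rows_after_header=0):
    """Hypothetical helper: skip rows before the header, read the header,
    skip rows after it, then return the remaining rows as dicts."""
    rows = csv.reader(io.StringIO(text))
    # Discard rows that precede the header (the equipment metadata row here).
    for _ in range(skip_rows):
        next(rows, None)
    # The next row supplies the column names.
    header = next(rows)
    # Discard non-data rows that follow the header (units and other metadata).
    for _ in range(skip_rows_after_header):
        next(rows, None)
    # Everything left is data; values stay as strings in this sketch.
    return [dict(zip(header, row)) for row in rows]


records = read_with_header_offsets(SAMPLE, skip_rows=1, skip_rows_after_header=2)
```

With one row skipped before the header and two after it, only the two data rows remain, keyed by the names from the second line of the file.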

Of course, reading this is not complex: use skip_rows: 1 and then delete the first two rows of the dataframe. But this pattern is so common in scientific data that it might be worth supporting inside the read_csv/2 function.

Of course I would also love to be able to save the units row as a series attribute. But that is a discussion for another issue. 😉


josevalim · 2023-12-21T12:16:39Z

If this is supported in polars, then 👍 for a PR that adds this.

JonGretar · 2023-12-21T13:52:39Z

Hmmm

Polars has 'skip_rows_after_header'. I'll take a look at adding that.

JonGretar mentioned this issue Dec 21, 2023

Support for skip_rows_after_header option in reading csv files #782

Merged

billylanchantin closed this as completed in #782 Dec 21, 2023
