Add read_markdown_table #104

sjackman · 2015-04-01T18:00:41Z

Read Markdown tables. For example:

A|B|C|D
-|-|-|-
0|1|2|3
4|5|6|7

A	B	C	D
0	1	2	3
4	5	6	7

To do:

Determine automatically whether the column names are present.
Remove leading and trailing white space around each cell.

hadley · 2015-04-02T13:21:15Z

This doesn't seem quite right to me - you're manually dropping the second line (i.e. the dashes) but the column guesser will still see it so I think every column will be imported as text. Instead I think you'll need to read the column names yourself (since they're always required in this format), then call read_delim() with those names and nskip = 2. Does that make sense?

Read Markdown tables. For example: A|B|C|D -|-|-|- 0|1|2|3 4|5|6|7 To do: Determine automatically whether the column names are present. Remove leading and trailing white space around each cell.

sjackman · 2015-04-02T16:57:39Z

Hi, Hadley. The column names are in fact optional (see below), at least for Pandoc. The row of hyphens is not optional. See Extension: pipe_tables. GitHub Markdown preview doesn't like a missing header line. Good observation about the column formats. I hadn't noticed that. I've updated the PR.

It duplicates a lot of code from read_delimited. Would you rather add an additional argument to read_delimited named skip_after_header?

-|-|-|-
0|1|2|3
4|5|6|7

0	1	2	3
4	5	6	7

sjackman · 2015-04-03T21:38:47Z

@hadley I'd love for this function to make it into your CRAN submission. I'm happy to do what's needed to get it in. How are you feeling about this PR? Would you prefer adding the option skip_after_header to read_delimited to avoid repetition of code?

hadley · 2015-09-23T11:32:21Z

I think this looks ok, but I'd rather call it read_md_table() (or maybe read_table_md()?), and it needs a few unit tests.

hadley · 2015-09-28T16:03:09Z

If you can do by Friday, I can include in the next version of readr

sjackman · 2015-09-30T05:09:00Z

Hi, Hadley. Yep, I think I can get this done by Friday. I'll need to add a skip_after_header parameter to both col_spec_standardise and read_delimited to skip the row of hyphens following the header line. Does that work for you?

hadley · 2015-09-30T12:57:53Z

Hmmm, I really want to avoid have skip before and after header parameters.

sjackman · 2015-09-30T16:10:18Z

I can duplicate the code of read_delimited—I've already done so in this PR—but I'd rather not duplicate the code of col_spec_standardise. Any alternative suggestion?

sjackman · 2015-09-30T17:35:15Z

I could see a skip_after_header option also having a use in loading the last million records of a billion-record TSV file.

hadley · 2015-10-12T14:46:11Z

Hmmm, I think that calls for a more generalised solution which I don't have time to think about at the moment. Lets try again for the next version.

jimhester · 2018-11-13T19:51:18Z

While this idea I think is interesting in practice the number of documents which contain just markdown tables and nothing else is very small, so it is hard to envision cases when this would be used.

Since it has been languishing for a number of years I think it should be closed.

sjackman · 2018-11-13T22:12:09Z

No worries at all, Jim.

We discussed reading Markdown tables using rio over at gesistsa/rio#70
The final solution was to convert Markdown tables to HTML, and then read the HTML tables, which rio already supports.
As a bonus, rio supports reading multiple tables from a single document gesistsa/rio#126

Add read_markdown_table

96e6499

Read Markdown tables. For example: A|B|C|D -|-|-|- 0|1|2|3 4|5|6|7 To do: Determine automatically whether the column names are present. Remove leading and trailing white space around each cell.

sjackman force-pushed the read_markdown_table branch from 8199a9d to 96e6499 Compare April 2, 2015 16:51

sjackman mentioned this pull request Jan 20, 2016

Import from markdown tables? gesistsa/rio#70

Closed

hadley added the in progress label Jun 2, 2016

hadley removed the in progress label Feb 10, 2017

jimhester closed this Nov 13, 2018

sjackman deleted the read_markdown_table branch November 13, 2018 22:12

sjackman restored the read_markdown_table branch November 13, 2018 22:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add read_markdown_table #104

Add read_markdown_table #104

sjackman commented Apr 1, 2015

hadley commented Apr 2, 2015

sjackman commented Apr 2, 2015

sjackman commented Apr 3, 2015

hadley commented Sep 23, 2015

hadley commented Sep 28, 2015

sjackman commented Sep 30, 2015

hadley commented Sep 30, 2015

sjackman commented Sep 30, 2015

sjackman commented Sep 30, 2015

hadley commented Oct 12, 2015

jimhester commented Nov 13, 2018

sjackman commented Nov 13, 2018

Add read_markdown_table #104

Add read_markdown_table #104

Conversation

sjackman commented Apr 1, 2015

hadley commented Apr 2, 2015

sjackman commented Apr 2, 2015

sjackman commented Apr 3, 2015

hadley commented Sep 23, 2015

hadley commented Sep 28, 2015

sjackman commented Sep 30, 2015

hadley commented Sep 30, 2015

sjackman commented Sep 30, 2015

sjackman commented Sep 30, 2015

hadley commented Oct 12, 2015

jimhester commented Nov 13, 2018

sjackman commented Nov 13, 2018