Headerless tables from DOCX #10513
-
Hi, is it possible to read a headerless table from DOCX? Converting from docx always reads the first row as a header, but in some cases this isn't what I want. Is it possible to handle different kinds of tables differently? (In this particular case, I can just run some XSL on the resulting HTML, as I'm already using an XProc post-processing pipeline. But a generic solution would be much better, and also more robust and flexible.)
-
I tried to follow the hint for headerless tables in markdown, and it works, more or less. I wrote a docx file containing a table whose first row is made of blank cells.

This is the result of converting it to markdown:

  ----------------------- ----------------------- -----------------------
  a                       b                       c
  d                       e                       f
  ----------------------- ----------------------- -----------------------

So the markdown output has no header row. This is the result of converting it to HTML:

<table>
<colgroup>
<col style="width: 33%" />
<col style="width: 33%" />
<col style="width: 33%" />
</colgroup>
<tbody>
<tr>
<td>a</td>
<td>b</td>
<td>c</td>
</tr>
<tr>
<td>d</td>
<td>e</td>
<td>f</td>
</tr>
</tbody>
</table>

The HTML output has no header row either.

Looking at the native output, you can see that the first row of blank cells is still there, so I think the headerless table is a result of the writers. And, of course, you must provide docx documents with tables whose first row is made of blank cells.
-
I think the markdown writer is, in effect, emitting the empty row, but it is being collapsed by some logic that collapses excess blank lines. If you do ... In the case of the HTML writer, there is some logic that emits empty rows: ...
-
Would it make sense to assume that a header is formatted differently, e.g. in bold? Then a table whose first row is not in bold would be treated as headerless.
-
The docx reader can distinguish between tables with and without a header row: see
-
OK, I found it. There's a setting that sets the first row as a header. Who could have guessed that... Sorry for the noise.
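
For anyone landing here later, a rough way to check that setting without opening Word. This is only a sketch under assumptions: I am assuming the setting is the table's "Header Row" style option, which WordprocessingML stores on `w:tblLook` either as a `w:firstRow` attribute or, in older files, as bit `0x0020` of a hex `w:val` bitmask. The names `has_header_row` and `SAMPLE` are made up for illustration, and the sample XML is a stripped-down fragment, not a real document.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def has_header_row(tbl):
    """Return True if the table's w:tblLook flags the first row as a header row."""
    look = tbl.find(f"{{{W}}}tblPr/{{{W}}}tblLook")
    if look is None:
        return False
    first_row = look.get(f"{{{W}}}firstRow")
    if first_row is not None:
        return first_row in ("1", "true", "on")
    # Older files keep the flags as a hex bitmask in w:val; 0x0020 is "first row".
    val = look.get(f"{{{W}}}val")
    return val is not None and (int(val, 16) & 0x0020) != 0


# Hypothetical, minimal fragment with three tables (rows and cells omitted).
SAMPLE = f"""<w:body xmlns:w="{W}">
  <w:tbl><w:tblPr><w:tblLook w:firstRow="1"/></w:tblPr></w:tbl>
  <w:tbl><w:tblPr><w:tblLook w:firstRow="0"/></w:tblPr></w:tbl>
  <w:tbl><w:tblPr><w:tblLook w:val="04A0"/></w:tblPr></w:tbl>
</w:body>"""

root = ET.fromstring(SAMPLE)
flags = [has_header_row(t) for t in root.iter(f"{{{W}}}tbl")]
print(flags)  # [True, False, True]
```

To try it on a real file, read `word/document.xml` out of the docx with `zipfile.ZipFile` and pass its contents to `ET.fromstring` in place of `SAMPLE`.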