Don't emit unnecessary classes in HTML tables #9325

ThomasSoeiro · 2024-01-10T00:53:45Z

It is currently not possible to prevent pandoc from adding attributes to the HTML it generates from Markdown input (e.g. the .header, .odd, and .even classes in the ReprEx below). The only way to drop attributes is with filters.

Since neither the CommonMark nor the GitHub Flavored Markdown spec mentions default attributes in HTML output, shouldn't these be opt-in? Or at least possible to opt out of?

ReprEx

Using e.g. this input:

| foo | bar |
| --- | --- |
| baz | bim |
| baz | bim |

And converting to HTML using pandoc --from gfm --to html5, we get:

<table>
<thead>
<tr class="header">
<th>foo</th>
<th>bar</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>baz</td>
<td>bim</td>
</tr>
<tr class="even">
<td>baz</td>
<td>bim</td>
</tr>
</tbody>
</table>


jgm · 2024-01-10T01:31:42Z

These are harmless; just don't add CSS rules that do things with them.

I don't think adding the option to avoid these is worth the increase in complexity.

jgm · 2024-01-10T01:33:10Z

Further note: the commonmark spec says

Note that not every feature of the HTML samples is mandated by the spec. For example, the spec says what counts as a link destination, but it doesn’t mandate that non-ASCII characters in the URL be percent-encoded. To use the automatic tests, implementers will need to provide a renderer that conforms to the expectations of the spec examples (percent-encoding non-ASCII characters in URLs). But a conforming implementation can use a different renderer and may choose not to percent-encode non-ASCII characters in URLs.

The spec is not about HTML output, it's about specifying how the commonmark document should be parsed into a structured document.

ThomasSoeiro · 2024-01-10T08:17:34Z

These are harmless; just don't add CSS rules that do things with them.

I agree this is not a big deal. However, these are common class names that are likely to be used elsewhere in a project. Avoiding a clash would require either dropping them in order to reuse those class names, or using less meaningful class names.

jgm · 2024-01-10T16:36:29Z

Agreed -- we could use something like table-header, even-row, odd-row.
Of course, it would be a backwards-incompatible change, so I'm not sure it's a good idea.

ThomasSoeiro · 2024-01-11T15:41:12Z

My concern was to drop "unnecessary" classes to prevent name clashes, since we can easily select .header using thead and .odd/.even using variations of tbody tr:nth-child(2n).

Anyway, if you think these classes are useful, you can close the issue as is.

Thanks a lot for your work on Pandoc!
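The redundancy argument can be checked mechanically. Below is a minimal, stdlib-only Python sketch (not part of pandoc; RowCollector and structural_class are hypothetical helper names) that parses the ReprEx output above and confirms that, for this table, every row's class follows from structure alone: header from sitting inside thead, and odd/even from the row's 1-based position in tbody, which is what tbody tr:nth-child(odd) and tbody tr:nth-child(even) (i.e. 2n+1 and 2n) select. It assumes, as in the output shown, that pandoc labels the first body row odd.

```python
from html.parser import HTMLParser

# The pandoc output from the ReprEx (gfm -> html5), copied verbatim.
PANDOC_HTML = """<table>
<thead>
<tr class="header">
<th>foo</th>
<th>bar</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>baz</td>
<td>bim</td>
</tr>
<tr class="even">
<td>baz</td>
<td>bim</td>
</tr>
</tbody>
</table>"""


class RowCollector(HTMLParser):
    """Record each <tr>'s class, its enclosing section, and its 1-based position there."""

    def __init__(self):
        super().__init__()
        self.section = None  # 'thead' or 'tbody' while inside one
        self.position = 0    # row counter within the section, like :nth-child
        self.rows = []       # (section, position, class) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self.section, self.position = tag, 0
        elif tag == "tr":
            self.position += 1
            self.rows.append((self.section, self.position, dict(attrs).get("class")))

    def handle_endtag(self, tag):
        if tag in ("thead", "tbody"):
            self.section = None


def structural_class(section, position):
    """The class a row would get if derived purely from document structure."""
    if section == "thead":
        return "header"  # CSS equivalent: thead tr
    # CSS equivalent: tbody tr:nth-child(odd) / tbody tr:nth-child(even)
    return "odd" if position % 2 == 1 else "even"


parser = RowCollector()
parser.feed(PANDOC_HTML)
for section, position, cls in parser.rows:
    print(section, position, cls, structural_class(section, position))
    assert cls == structural_class(section, position)
```

If the assertion held for every row, the classes carry no information that a thead or nth-child selector cannot already recover, which is the point being made here.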

jgm · 2024-01-11T16:40:53Z

My concern was to drop "unnecessary" classes to prevent name clashes, since we can easily select .header using thead and .odd/.even using variations of tbody tr:nth-child(2n).

This is true, though it wasn't in earlier versions of pandoc (when nth-child wasn't supported in CSS and we didn't put the header in thead!).

I think it might be worth stopping using these classes, and using the alternative you suggest instead.

tarleb · 2024-01-19T17:05:50Z

Could this be a "good first issue"?

jgm · 2024-01-19T17:16:48Z

Yes, it would be an easy one -- just have to change the HTML writer, the styles.html template, and some tests I think.

ThomasSoeiro · 2024-01-19T17:33:14Z

Could you point me to the HTML writer please?
(I'll have a look, but I don't know Pandoc internals or Haskell...)

jgm · 2024-01-19T17:38:49Z

I think that if you can't find the HTML writer yourself, you're unlikely to be able to fix this issue, so I'll leave that as an exercise to the reader. :)

gregdan3 · 2024-06-07T03:14:16Z

For what it's worth, I just ran into this issue: I was using the class name header and I didn't want or expect the class to be on every table's first tr*. I would've used a Lua filter to omit it, but I can't find a way to remove classes via a filter. I suspect that's related to #684?

But uh, my use case is incredibly silly.
I'm building my site with Pandoc, and I want it to render readably in the Nintendo DS Browser. That browser throws out CSS rules for elements it doesn't recognize, and it doesn't recognize most semantic HTML elements, including header. It does recognize CSS rules for classes, though, so I (reasonably, I thought) assigned a class header to the header element and moved all my header style rules to that class. That worked great until I spotted every first tr* with its content squashed to the left. Anyway, I'll just rename my header class for now.

Funnily enough, my silly use case is exactly the one where I'd still want the odd/even/header classes for styling, since I explicitly want to target older browsers that lack support for those CSS selectors.

I'm only familiar with pandoc as a user, not a dev, but why not shunt these classes into an extension rather than remove them entirely?

*: corrected

bpj · 2024-06-07T08:18:40Z

Can you give an example of your (markdown?) source? If I understand you correctly you are adding a class .header to your heading elements like

## Heading {.header}

which conflicts with the header class which Pandoc adds automatically to <thead> elements in HTML?

It does indeed seem like your options are

1. to use another custom class name like .heading;¹
2. to post-process your HTML, removing the header class from <thead> elements.

Obviously 1 is the easier option if possible.

¹ A class .heading has the advantage of being terminologically correct: tables have headers but sections have headings. Pandoc calling its heading class Header is a misnomer (which it is too late to change!)

bpj · 2024-06-07T08:51:35Z

This minimal Lua filter replaces the class header with heading on every Header element that carries it, leaving any other classes in place.

-- Predicate function to filter out 'header' class
local is_not_header_class = function(x)
  return 'header' ~= x
end

-- Global function to process heading elements
Header = function(head)
  -- -- Restrict to heading levels 2 through 4 (<h2>, <h3>, <h4>)
  -- if 2 > head.level or 4 < head.level then
  --   return nil -- leave unchanged
  -- end
  if head.classes:includes('header') then
    -- Make sure not to remove other classes!
    head.classes = head.classes:filter(is_not_header_class)
    head.classes:insert(1, 'heading')
    return head
  end
  -- Else leave unchanged
  return nil
end

gregdan3 · 2024-06-07T12:40:16Z

If I understand you correctly you are adding a class .header to your heading elements

Ah, sorry, I'm adding a class to my header element, which is not one that pandoc emits. I'm substituting my generated markdown into an HTML template to make the base of every page, since there's a fair amount of web specific stuff that markdown doesn't want or need to do.

My template, trimmed some:

<!DOCTYPE HTML>
<html lang="en-US">
  <head>
    <!-- metadata -->
  </head>
  <div class="body">
    <div class="-header"> 
        <!-- initial page content -->
    </div>

    <div class="article">
      $for(include-before)$ $include-before$ $endfor$ $body$
      $for(include-after)$ $include-after$ $endfor$
    </div>

    <div class="footer">
        <!-- end of page content -->
    </div>

  </div>
</html>

Example page:

---
title: test page!
date: 2024-05-30
author: gregdan3
description: a secret test page for all my formatting
---

# Tables

|     center aligned     | left aligned  | right aligned | default alignment |
| :--------------------: | :------------ | ------------: | ----------------- |
|        Item1.1         | Item2.1       |       Item3.1 | Item4.1           |
| **_bold italic item_** | Item2.2       |       Item3.2 | `mono item`       |
|        Item1.3         | **bold item** |       Item3.3 | Item4.3           |
|        Item1.4         | Item2.4       |       Item3.4 | Item4.4           |

Gluing these together:

cat pages/test.md | pandoc --lua-filter=pandoc/filters.lua --from=markdown+yaml_metadata_block+wikilinks_title_after_pipe-definition_lists-smart \
        --template=templates/default.html \
        --metadata="directory:test.md" \
        -o build/test.html

And the result:

<!DOCTYPE html>
<html lang="en-US">
  <head>
    <!-- metadata -->
  </head>
  <div class="body">
    <div class="-header">
      <!-- initial page content -->
    </div>

    <div class="article">
       <h1 id="tables">Tables</h1>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">center aligned</th>
<th style="text-align: left;">left aligned</th>
<th style="text-align: right;">right aligned</th>
<th>default alignment</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Item1.1</td>
<td style="text-align: left;">Item2.1</td>
<td style="text-align: right;">Item3.1</td>
<td>Item4.1</td>
</tr>
<tr class="even">
<td style="text-align: center;"><strong><em>bold italic
item</em></strong></td>
<td style="text-align: left;">Item2.2</td>
<td style="text-align: right;">Item3.2</td>
<td><code>mono item</code></td>
</tr>
<tr class="odd">
<td style="text-align: center;">Item1.3</td>
<td style="text-align: left;"><strong>bold item</strong></td>
<td style="text-align: right;">Item3.3</td>
<td>Item4.3</td>
</tr>
<tr class="even">
<td style="text-align: center;">Item1.4</td>
<td style="text-align: left;">Item2.4</td>
<td style="text-align: right;">Item3.4</td>
<td>Item4.4</td>
</tr>
</tbody>
</table>
      
    </div>

    <div class="footer">
      <!-- end of page content -->
    </div>
  </div>
</html>

Also, I was mistaken before; it was the first tr being given the class header, not thead.

bpj · 2024-06-07T18:32:47Z

I think you need to post-process the HTML.

I would do it with either of

Perl and Mojo::DOM (but be warned: this depends on all of Mojolicious!)
Python and BeautifulSoup.

These two have pretty similar interfaces:

Perl code:

use 5.016;
use utf8;
use strict;
use warnings;
use warnings FATAL => 'utf8';
use autodie;

use Path::Tiny qw[path];
use Mojo::DOM;

my $file = path 'test.html';

my $html = $file->slurp_utf8;

my $dom = Mojo::DOM->new($html);

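# Remove only the exact class token 'header' (not e.g. 'table-header'),
# dropping the class attribute entirely if nothing else is left.
my $fix_classes = sub {
  my($elem) = @_;
  my @keep = grep { 'header' ne $_ } split ' ', $elem->{class};
  if ( @keep ) {
    $elem->{class} = join ' ', @keep;
  }
  else {
    delete $elem->{class};
  }
};

$dom->find('tr.header')->each($fix_classes);

$file->spew_utf8($dom->to_string);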

Python code:

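from bs4 import BeautifulSoup

with open('test.html', mode='r', encoding='UTF-8') as fh:
    soup = BeautifulSoup(fh.read(), 'html.parser')

# Drop the 'header' class from table rows, keeping any other classes.
for tr in soup.select('tr.header'):
    if 1 == len(tr['class']):
        del tr['class']
    else:
        tr['class'] = [c for c in tr['class'] if 'header' != c]

# str(soup) keeps pandoc's line layout; soup.prettify() also works but
# re-indents the whole document and adds whitespace around inline elements.
with open('test.html', mode='w', encoding='UTF-8') as fh:
    fh.write(str(soup))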
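Where installing BeautifulSoup is not an option, a narrower stdlib-only sketch can work, assuming the HTML came straight from pandoc, whose table rows (before the fix) carry these classes in exactly one form: <tr class="header">, <tr class="odd">, <tr class="even">. A regex is not a general HTML parser, so this deliberately leaves alone any row whose class attribute contains anything else. strip_row_classes and fix_file are hypothetical helper names, not part of any library.

```python
import re
from pathlib import Path

# Matches only the exact <tr> tags pandoc's HTML writer emitted before the fix,
# e.g. <tr class="header">; rows with any other class value are left untouched.
# To drop only 'header' (gregdan3's case), shrink the alternation to (?:header).
PANDOC_ROW_CLASS = re.compile(r'<tr class="(?:header|odd|even)">')


def strip_row_classes(html: str) -> str:
    """Replace pandoc's generated <tr class="header|odd|even"> tags with a bare <tr>."""
    return PANDOC_ROW_CLASS.sub("<tr>", html)


def fix_file(name: str = "test.html") -> None:
    """Rewrite a pandoc-generated HTML file in place without the row classes."""
    path = Path(name)
    path.write_text(strip_row_classes(path.read_text(encoding="UTF-8")), encoding="UTF-8")
```

For the build above, this would be called as fix_file('build/test.html') after pandoc has written the file.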

ThomasSoeiro added the enhancement label Jan 10, 2024

jgm changed the title ~~Option to prevent pandoc from adding attributes to the HTML output from a Markdown input~~ Don't emit unnecessary classes in HTML tables Jan 11, 2024

jgm added format:HTML writer labels Jan 11, 2024

jgm added the good first issue label Jan 19, 2024

ThomasSoeiro mentioned this issue Jan 26, 2024

Don't emit unnecessary classes in HTML tables (#9325) #9376

Merged

jgm closed this as completed in cd15313 Jun 7, 2024

cderv mentioned this issue Jun 25, 2024

check change in Pandoc 3.2.1 regarding some CSS rules for table quarto-dev/quarto-cli#10120

Closed

cderv mentioned this issue Sep 27, 2024

Markdown tables isn't formatted in newer Quarto versions quarto-dev/quarto-cli#10904

Closed
