Don't emit unnecessary classes in HTML tables #9325
Comments
These are harmless; just don't add CSS rules that do things with them. I don't think adding the option to avoid these is worth the increase in complexity. |
Further note: the commonmark spec says
The spec is not about HTML output, it's about specifying how the commonmark document should be parsed into a structured document. |
I agree this is not a big deal. However, these are common class names that are likely to be used elsewhere in a project. One would have to either strip them in order to reuse the class names, or fall back on less meaningful class names. |
Agreed -- we could use something like table-header, even-row, odd-row. |
My concern was about dropping "unnecessary" classes to prevent name clashes, since we can easily select these rows without them. Anyway, if you think these classes are useful, you can close the issue as is. Thanks a lot for your work on Pandoc! |
This is true, though it wasn't in earlier versions of pandoc (when nth-child wasn't supported in CSS and we didn't put the header in thead!). I think it might be worth stopping using these classes, and using the alternative you suggest instead. |
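To illustrate the point that the row classes carry no information the markup lacks, here is a minimal Python sketch (standard library only) that recovers "header", "odd" and "even" purely from a row's position inside thead and tbody, roughly what CSS selectors such as `thead tr` and `tbody tr:nth-child(odd)` would match. The names `RowClassifier` and `classify_rows` and the sample table are made up for illustration; this is not pandoc's code, and it assumes tables are not nested.

```python
from html.parser import HTMLParser


class RowClassifier(HTMLParser):
    """Label each <tr> from its position alone (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.section = None   # 'thead' or 'tbody' while inside one
        self.body_index = 0   # 1-based row position within the current section
        self.labels = []      # one label per <tr>, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("thead", "tbody"):
            self.section = tag
            self.body_index = 0
        elif tag == "tr":
            if self.section == "thead":
                self.labels.append("header")
            elif self.section == "tbody":
                self.body_index += 1
                self.labels.append("odd" if self.body_index % 2 else "even")

    def handle_endtag(self, tag):
        if tag in ("thead", "tbody"):
            self.section = None


def classify_rows(html):
    """Return the positional label of every table row in the given HTML."""
    parser = RowClassifier()
    parser.feed(html)
    parser.close()
    return parser.labels


sample = (
    "<table><thead><tr><th>a</th></tr></thead>"
    "<tbody><tr><td>1</td></tr><tr><td>2</td></tr><tr><td>3</td></tr></tbody>"
    "</table>"
)
print(classify_rows(sample))
```

The labels come out the same whether or not the rows carry class attributes, which is why the classes are redundant for any browser that supports these structural selectors.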
Could this be a "good first issue"? |
Yes, it would be an easy one -- just have to change the HTML writer, the styles.html template, and some tests I think. |
Could you point me to the HTML writer please? |
I think that if you can't find the HTML writer yourself, you're unlikely to be able to fix this issue, so I'll leave that as an exercise to the reader. :) |
For what it's worth, I just ran into this issue: I was using a class name that clashes with one of these. But, uh, my use case is incredibly silly. Funnily enough, my silly use case is exactly the one where I'd still want the odd/even/header classes for styling, since I explicitly want to target older browsers that lack the CSS support. I'm only familiar with pandoc as a user, not a dev, but why not shunt these classes into an extension rather than remove them entirely? |
Can you give an example of your (markdown?) source? If I understand you correctly you are adding a class
which conflicts with the It does indeed seem like your options are
Obviously 1 is the easier option if possible. |
This minimal Lua filter will change the class on all "Header" elements.

-- Predicate function to filter out 'header' class
local is_not_header_class = function(x)
  return 'header' ~= x
end

-- Global function to process heading elements
Header = function(head)
  -- -- Restrict to heading levels 2 through 4 (<h2>, <h3>, <h4>)
  -- if 2 > head.level or 4 < head.level then
  --   return nil -- leave unchanged
  -- end
  if head.classes:includes('header') then
    -- Make sure not to remove other classes!
    head.classes = head.classes:filter(is_not_header_class)
    head.classes:insert(1, 'heading')
    return head
  end
  -- Else leave unchanged
  return nil
end
|
Ah, sorry, I'm adding a class to my
My template, trimmed some:

<!DOCTYPE HTML>
<html lang="en-US">
<head>
<!-- metadata -->
</head>
<div class="body">
<div class="-header">
<!-- initial page content -->
</div>
<div class="article">
$for(include-before)$ $include-before$ $endfor$ $body$
$for(include-after)$ $include-after$ $endfor$
</div>
<div class="footer">
<!-- end of page content -->
</div>
</div>
</html>

Example page:

---
title: test page!
date: 2024-05-30
author: gregdan3
description: a secret test page for all my formatting
---
# Tables
| center aligned | left aligned | right aligned | default alignment |
| :--------------------: | :------------ | ------------: | ----------------- |
| Item1.1 | Item2.1 | Item3.1 | Item4.1 |
| **_bold italic item_** | Item2.2 | Item3.2 | `mono item` |
| Item1.3 | **bold item** | Item3.3 | Item4.3 |
| Item1.4 | Item2.4 | Item3.4 | Item4.4 |

Gluing these together:

cat pages/test.md | pandoc --lua-filter=pandoc/filters.lua --from=markdown+yaml_metadata_block+wikilinks_title_after_pipe-definition_lists-smart \
--template=templates/default.html \
--metadata="directory:test.md" \
-o build/test.html

And the result:

<!DOCTYPE html>
<html lang="en-US">
<head>
<!-- metadata -->
</head>
<div class="body">
<div class="-header">
<!-- initial page content -->
</div>
<div class="article">
<h1 id="tables">Tables</h1>
<table>
<thead>
<tr class="header">
<th style="text-align: center;">center aligned</th>
<th style="text-align: left;">left aligned</th>
<th style="text-align: right;">right aligned</th>
<th>default alignment</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Item1.1</td>
<td style="text-align: left;">Item2.1</td>
<td style="text-align: right;">Item3.1</td>
<td>Item4.1</td>
</tr>
<tr class="even">
<td style="text-align: center;"><strong><em>bold italic
item</em></strong></td>
<td style="text-align: left;">Item2.2</td>
<td style="text-align: right;">Item3.2</td>
<td><code>mono item</code></td>
</tr>
<tr class="odd">
<td style="text-align: center;">Item1.3</td>
<td style="text-align: left;"><strong>bold item</strong></td>
<td style="text-align: right;">Item3.3</td>
<td>Item4.3</td>
</tr>
<tr class="even">
<td style="text-align: center;">Item1.4</td>
<td style="text-align: left;">Item2.4</td>
<td style="text-align: right;">Item3.4</td>
<td>Item4.4</td>
</tr>
</tbody>
</table>
</div>
<div class="footer">
<!-- end of page content -->
</div>
</div>
</html>

Also, I was mistaken before; it was the first
|
I think you need to post-process the HTML. I would do it with either Mojo::DOM (Perl) or Beautiful Soup (Python). These two have pretty similar interfaces.

Perl code:

use 5.016;
use utf8;
use strict;
use warnings;
use warnings FATAL => 'utf8';
use autodie;
use Path::Tiny qw[path];
use Mojo::DOM;

my $file = path 'test.html';
my $html = $file->slurp_utf8;
my $dom = Mojo::DOM->new($html);

# Drop the class attribute if 'header' is the only class,
# otherwise remove just 'header' and keep the other classes
my $fix_classes = sub {
    my($elem) = @_;
    if ( 'header' eq $elem->{class} ) {
        delete $elem->{class};
    }
    else {
        $elem->{class} =~ s!\bheader\b!!;
    }
};

$dom->find('tr.header')->each($fix_classes);
$file->spew_utf8($dom);

Python code:

from bs4 import BeautifulSoup

with open('test.html', mode='r', encoding='UTF-8') as fh:
    text = fh.read()

soup = BeautifulSoup(text, 'html.parser')

# Drop the class attribute if 'header' is the only class,
# otherwise remove just 'header' and keep the other classes
for tr in soup.select('tr.header'):
    if 1 == len(tr['class']):
        del tr['class']
    else:
        tr['class'] = [c for c in tr['class'] if 'header' != c]

with open('test.html', mode='w', encoding='UTF-8') as fh:
    fh.write(soup.prettify())
|
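If installing a parser library is not an option, the same post-processing idea can be sketched with Python's standard library alone. The sketch below rests on one assumption: each affected row tag carries only a class attribute, as in the `<tr class="header">` output shown earlier in this thread. The names `ROW_CLASSES`, `TR_CLASS` and `strip_row_classes` are hypothetical, and a regular expression is no substitute for a real HTML parser on arbitrary input.

```python
import re

# Classes seen on table rows in the output shown in this thread
ROW_CLASSES = {"header", "odd", "even"}

# Assumption: the <tr> tag has exactly one attribute, class="..."
TR_CLASS = re.compile(r'<tr\s+class="([^"]*)"\s*>')


def strip_row_classes(html):
    """Remove header/odd/even from <tr> class lists, keeping other classes."""

    def fix(match):
        kept = [c for c in match.group(1).split() if c not in ROW_CLASSES]
        if kept:
            return '<tr class="{}">'.format(" ".join(kept))
        return "<tr>"

    return TR_CLASS.sub(fix, html)


print(strip_row_classes('<thead><tr class="header"><th>x</th></tr></thead>'))
```

Because only the matched row tags are rewritten, everything else in the file is left as it was, whereas `soup.prettify()` re-indents the whole document.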
It is currently not possible to prevent pandoc from adding attributes to the HTML output from a Markdown input (e.g. .header, .odd, .even in the ReprEx below). It is only possible to drop attributes using filters.

Since both the CommonMark and GitHub Flavored Markdown specs do not mention default attributes in HTML output, shouldn't this be opt-in by default? Or possible to opt-out at least?
ReprEx
Using e.g. this input:
And converting to HTML using pandoc --from gfm --to html5, we get: