Header attributes #4

Closed
mironovalexey opened this issue May 15, 2019 · 9 comments

Comments

@mironovalexey
Contributor

It seems that parser.WithAttribute() processing does not always work well.

https://github.com/mironovalexey/gm-test/tree/master/hattrs

https://github.com/mironovalexey/gm-test/blob/master/hattrs/test.md

@yuin
Owner

yuin commented May 15, 2019

Thanks, I've fixed the issue.

@yuin yuin closed this as completed May 15, 2019
@mironovalexey
Contributor Author

@yuin, have you considered not requiring the closing # signs when extracting the attributes {container}?

The most common syntax is

# Title {#id ...}

In fact, I've never seen anyone use markup like # Title #. Moreover, the closing sequence is optional anyway.

https://github.com/yuin/goldmark/blob/master/parser/atx_heading.go#L96
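For reference, the optional closing sequence follows the CommonMark 0.29 ATX heading rules: a trailing run of '#' only closes the heading if whitespace precedes it (or it is the whole content), and only spaces may follow it. A minimal Go sketch of stripping it, assuming the opening '#'s have already been removed; the function name is hypothetical and this is not goldmark's code. With it, PHP Markdown Extra style `Title {#id} ##` and `Title {#id}` reduce to the same content.

```go
package main

import (
	"fmt"
	"strings"
)

// stripClosingSequence removes the optional ATX closing sequence from the
// raw heading content (the text after the opening '#'s). A trailing run of
// '#' only counts as a closing sequence if it is preceded by whitespace or
// makes up the whole content. Tabs are treated like spaces here as a
// simplification.
func stripClosingSequence(content string) string {
	s := strings.TrimRight(content, " \t")
	i := len(s)
	for i > 0 && s[i-1] == '#' {
		i--
	}
	switch {
	case i == len(s):
		return s // no trailing '#'
	case i == 0:
		return "" // e.g. "### ###" yields an empty heading
	case s[i-1] == ' ' || s[i-1] == '\t':
		return strings.TrimRight(s[:i], " \t")
	default:
		return s // e.g. "foo#" or "foo \#": the '#' is literal text
	}
}

func main() {
	for _, c := range []string{"Title {#id} ##", "Title {#id}", "foo#", "foo ### b"} {
		fmt.Printf("%q -> %q\n", c, stripClosingSequence(c))
	}
}
```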

@yuin
Owner

yuin commented May 15, 2019

@mironovalexey

I am not considering dropping the closing # for now, for two reasons:

  1. PHP Markdown Extra's attribute example uses a closing '#'.

  2. Currently, goldmark has no way to parse attributes for inline elements.
    Without the closing '#', attributes cannot be parsed during block parsing, because
    to determine whether a '{' opens an attribute list we must first parse the inline elements.

    Example:

     ## title {**{attr="this is not attributes"}**}
    
     ## title **{attr="this is attribute"}
    

I'm planning to implement attributes for other elements, but it is unclear which form is best.
People involved in CommonMark have discussed attributes for a long time, but there is no conclusion yet.

@mironovalexey
Contributor Author

## title {**{attr="this is not attributes"}**} is a really rare case, and it is always possible to escape the curly braces: ## title \{**\{attr="this is not attributes"\}**\}.
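To illustrate the escaping argument, here is a minimal Go sketch (hypothetical function names, not goldmark's implementation) that looks for a trailing {...} group at the end of heading content, matching braces from the right and skipping backslash-escaped ones. It also shows the limitation raised earlier in the thread: brace matching alone still accepts `{**{attr="x"}**}`, because deciding that this is emphasis rather than attributes requires inline parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// isEscaped reports whether s[i] is preceded by an odd number of backslashes.
func isEscaped(s string, i int) bool {
	n := 0
	for j := i - 1; j >= 0 && s[j] == '\\'; j-- {
		n++
	}
	return n%2 == 1
}

// trailingAttributes splits heading content into a title and the body of a
// trailing, unescaped {...} group. Braces are matched from the right, and
// backslash-escaped braces are treated as literal text.
func trailingAttributes(content string) (title, attrs string, ok bool) {
	s := strings.TrimRight(content, " \t")
	if !strings.HasSuffix(s, "}") || isEscaped(s, len(s)-1) {
		return s, "", false
	}
	depth := 0
	for i := len(s) - 1; i >= 0; i-- {
		if isEscaped(s, i) {
			continue
		}
		switch s[i] {
		case '}':
			depth++
		case '{':
			depth--
			if depth == 0 {
				return strings.TrimRight(s[:i], " \t"), s[i+1 : len(s)-1], true
			}
		}
	}
	return s, "", false // unbalanced braces: no attribute group
}

func main() {
	inputs := []string{
		`title {#id .cls}`,
		`title \{**\{attr="this is not attributes"\}**\}`,
		`title {**{attr="this is not attributes"}**}`,
	}
	for _, in := range inputs {
		title, attrs, ok := trailingAttributes(in)
		fmt.Printf("%q ok=%v title=%q attrs=%q\n", in, ok, title, attrs)
	}
}
```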

DITA: https://github.com/jelovirt/org.lwdita/blob/master/src/test/resources/markdown/header_attributes.md
Kramdown: https://kramdown.gettalong.org/syntax.html#specifying-a-header-id
Mmark (gomarkdown): https://mmark.nl/post/syntax/
Python: https://python-markdown.github.io/extensions/attr_list/ (in fact ':' is not needed)
Markdown-it: https://www.npmjs.com/package/markdown-it-attrs
Pandoc: https://pandoc.org/MANUAL.html#extension-header_attributes

I have seen a lot of MD docs, and most of them use the #Title {...} syntax. This behavior can be treated as a convention and switched on by an extension parameter. It gives excellent compatibility with a huge number of existing markdown documents (Hugo pages :)) and significantly simplifies the syntax.
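Most of the dialects listed above write the attribute list roughly as {#id .class key=value}, with Kramdown and Python-Markdown also using a leading ':'. As a rough illustration of what such an extension would parse once the {...} group is found, here is a minimal Go sketch; the `Attributes` type and the function names are hypothetical, and single-quoted values or escapes inside values are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Attributes is a hypothetical result type for a parsed attribute list.
type Attributes struct {
	ID      string
	Classes []string
	Pairs   map[string]string
}

// nextToken splits s at the first space or tab.
func nextToken(s string) (tok, rest string) {
	if i := strings.IndexAny(s, " \t"); i >= 0 {
		return s[:i], s[i:]
	}
	return s, ""
}

// parseAttributeList parses the inside of a {...} group such as
// `#intro .note key=value title="two words"`. A leading ':' is accepted and
// ignored. Only double-quoted values are supported in this sketch.
func parseAttributeList(s string) (Attributes, error) {
	a := Attributes{Pairs: map[string]string{}}
	s = strings.TrimPrefix(strings.TrimSpace(s), ":")
	for {
		s = strings.TrimLeft(s, " \t")
		if s == "" {
			return a, nil
		}
		switch s[0] {
		case '#':
			a.ID, s = nextToken(s[1:])
		case '.':
			var class string
			class, s = nextToken(s[1:])
			a.Classes = append(a.Classes, class)
		default:
			eq := strings.IndexAny(s, "= \t")
			if eq <= 0 || s[eq] != '=' {
				return a, fmt.Errorf("expected key=value at %q", s)
			}
			key := s[:eq]
			s = s[eq+1:]
			if strings.HasPrefix(s, `"`) {
				end := strings.IndexByte(s[1:], '"')
				if end < 0 {
					return a, fmt.Errorf("unterminated value for %q", key)
				}
				a.Pairs[key] = s[1 : end+1]
				s = s[end+2:]
			} else {
				a.Pairs[key], s = nextToken(s)
			}
		}
	}
}

func main() {
	a, err := parseAttributeList(`: #intro .note .wide data-level="two words"`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("id=%q classes=%q pairs=%q\n", a.ID, a.Classes, a.Pairs)
}
```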

This is quite an important case, which is why I'm paying so much attention to it.

@yuin
Owner

yuin commented May 15, 2019

You're right, but CommonMark includes many such "really rare cases". This is why parsers that fully follow CommonMark barely exist. And ignoring such rare cases is why markdown has so many dialects and so little compatibility. CommonMark does not accept a feature into the spec just because "it is a de facto standard". Otherwise, the discussion about attributes would not have gone on for several years.

goldmark has basically the same policy.

Your examples include attributes for other elements. If we had attributes for other elements, we could probably parse them consistently. But if we implement this feature now, it will be an ad hoc implementation.

Look, I'm not saying "I'm not planning to implement this feature". It should be designed more carefully, together with attributes for other elements, and that will probably take time.

@mironovalexey
Contributor Author

Got it, thank you. But the extra space in headings is still there:
# First # -> <h1> First</h1>

@yuin
Owner

yuin commented May 15, 2019

I've fixed invalid leading spaces.

About attributes: in addition, header IDs currently do not support multiline attributes. If we supported multiline attributes, things would get more complex.

> > ## header {#id a-long-attribute="aaaa
> > aaaaaaaaaaaaaaaaaaa"}

This is very hard to parse. Honestly, I hesitated to implement header attributes because I cannot implement attributes consistently: attributes in this style break CommonMark's core parsing strategy of parsing blocks first and inline elements afterwards.

One possible solution is to allow multiple attribute lines on one element, with each attribute on a single line.

>> ## header
>> {#id}
>> {.class}
>> {a-long-attribute="aaaaaaaaaaaaaaaaaaaaaaaaaaaa"}

Attributes in this style are easy to parse. If we find '{' at the start of a line during the block parsing phase, we try to parse that line as an attribute. If that succeeds and there are no blank lines before it, the attribute belongs to the previous sibling.
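The line-based rule described here could look roughly like this in Go. It is a toy sketch with hypothetical names, not goldmark's design: every non-blank line is treated as its own block (a real parser would continue paragraphs and handle containers), and the attribute text is kept unparsed.

```go
package main

import (
	"fmt"
	"strings"
)

// block is a hypothetical, simplified block node.
type block struct {
	text  string
	attrs []string // raw contents of attached {...} lines
}

// isAttributeLine reports whether a line starts with '{' at the line head
// and is closed by '}' on the same line.
func isAttributeLine(line string) bool {
	t := strings.TrimRight(line, " \t")
	return strings.HasPrefix(t, "{") && strings.HasSuffix(t, "}")
}

// attachAttributeLines attaches each attribute line to the previous sibling
// block, provided no blank line separates them.
func attachAttributeLines(lines []string) []block {
	var blocks []block
	prevBlank := true
	for _, line := range lines {
		if strings.TrimSpace(line) == "" {
			prevBlank = true
			continue
		}
		if isAttributeLine(line) && !prevBlank && len(blocks) > 0 {
			t := strings.TrimRight(line, " \t")
			last := &blocks[len(blocks)-1]
			last.attrs = append(last.attrs, t[1:len(t)-1])
		} else {
			blocks = append(blocks, block{text: line})
		}
		prevBlank = false
	}
	return blocks
}

func main() {
	doc := []string{
		"## header",
		"{#id}",
		"{.class}",
		`{a-long-attribute="aaaaaaaaaaaaaaaaaaaaaaaaaaaa"}`,
		"",
		"{#orphan}",
	}
	for _, b := range attachAttributeLines(doc) {
		fmt.Printf("%q attrs=%q\n", b.text, b.attrs)
	}
}
```

The line after the blank line is not attached, matching the "no blank lines before it" condition.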

@yuin
Owner

yuin commented May 16, 2019

@mironovalexey

I unexpectedly got many stars yesterday, so this project will probably be used in many projects.
(I started this project just for my personal projects. For my own usage, header IDs without the closing '#' are not necessary.)

I know that many users need header IDs without the closing '#', so I've added the feature you requested anyway.

But as described above, the attribute syntax is being discussed in the CommonMark forum.
I added a disclaimer to the README:

Attributes are being discussed in the CommonMark forum. This syntax possibly changes in the future.

@mironovalexey
Contributor Author

@yuin

You are absolutely right to think carefully about supporting attributes for every element. But header attributes are probably one of the most important cases, and probably the simplest one. A header is always a single line, and HeaderAttributes is an extension outside the CommonMark spec. This extension can be based on the agreement that a {...} (or {<pattern>}) container at the end of the line is treated as an attributes container.
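A strictly line-based reading of this agreement fits in one regular expression. The Go sketch below is hypothetical (it is not the DITA-OT implementation mentioned in this thread): it accepts only a brace-free {...} group at the end of an ATX heading line, so `{**{...}**}` is rejected, but it does not honor backslash escapes such as `\{`.

```go
package main

import (
	"fmt"
	"regexp"
)

// headingAttrs matches an ATX heading line whose content ends with a
// brace-free {...} group. Groups: level markers, title, attribute text.
var headingAttrs = regexp.MustCompile(`^(#{1,6})[ \t]+(.*?)[ \t]*\{([^{}]*)\}[ \t]*$`)

// splitHeading returns the heading level, title and raw attribute text.
func splitHeading(line string) (level int, title, attrs string, ok bool) {
	m := headingAttrs.FindStringSubmatch(line)
	if m == nil {
		return 0, "", "", false
	}
	return len(m[1]), m[2], m[3], true
}

func main() {
	for _, line := range []string{
		"# Title {#id ...}",
		"## title {**{attr=\"this is not attributes\"}**}",
		"### No attributes here",
	} {
		level, title, attrs, ok := splitHeading(line)
		fmt.Printf("%q -> ok=%v level=%d title=%q attrs=%q\n", line, ok, level, title, attrs)
	}
}
```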

https://spec.commonmark.org/0.29/#atx-heading

Look here: that's how Jarno does it for DITA-OT. I don't mean that a regex is needed.

Talking about markdown in general: Markdown is neither extensible nor structured by design. It is meant to be easy to read and easy to write.

It seems to me that a new language is needed, probably something between MD and RST.

Your syntax

## header
{#id}
{.class}
{a-long-attribute="aaaaaaaaaaaaaaaaaaaaaaaaaaaa"}

->

## header
{
id="the_id"
class="the_class"
other="other_attr"
}

->
RST (which has no ATX headings, so this example is an image directive)

.. image:: picture.jpeg
   :height: 100px
   :width: 200 px
   :scale: 50 %
   :alt: alternate text
   :align: right
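The multi-line form in the middle example (a line containing only '{', one key="value" pair per line, then a line containing only '}') can also be recognized without any inline parsing. A minimal Go sketch with hypothetical names; it strips surrounding double quotes from values, does not support escaped quotes, and needs Go 1.18+ for `strings.Cut`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseAttributeBlock parses a multi-line attribute block:
//
//	{
//	id="the_id"
//	class="the_class"
//	}
//
// It returns the key/value pairs and the number of lines consumed.
func parseAttributeBlock(lines []string) (map[string]string, int, error) {
	if len(lines) == 0 || strings.TrimSpace(lines[0]) != "{" {
		return nil, 0, errors.New("block must start with a line containing only {")
	}
	attrs := map[string]string{}
	for i := 1; i < len(lines); i++ {
		line := strings.TrimSpace(lines[i])
		if line == "}" {
			return attrs, i + 1, nil
		}
		key, value, found := strings.Cut(line, "=")
		key = strings.TrimSpace(key)
		if !found || key == "" {
			return nil, 0, fmt.Errorf("line %d: expected key=\"value\", got %q", i+1, line)
		}
		// Surrounding double quotes are stripped from the value.
		attrs[key] = strings.Trim(strings.TrimSpace(value), `"`)
	}
	return nil, 0, errors.New("missing closing }")
}

func main() {
	lines := []string{"{", `id="the_id"`, `class="the_class"`, `other="other_attr"`, "}", "next paragraph"}
	attrs, n, err := parseAttributeBlock(lines)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, attrs)
}
```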

> I've got many stars unexpectedly yesterday, so this project will possibly be used in many projects.

I expected it :)

I want to congratulate you on your success. When I saw your code, I was very impressed. And I hope that this project will have a long and happy life.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants