Allow comment tags (<!, --, and >) to be nested. #10153

Closed
RokeJulianLockhart opened this issue Feb 22, 2024 · 15 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest

Comments

@RokeJulianLockhart

RokeJulianLockhart commented Feb 22, 2024

What problem are you trying to solve?

The undermentioned correctly-used HTML comment tags:

<!-- -->

...cannot be nested like:

<!--
	<!-- -->
-->

This means that using comments to temporarily remove well-described code in order to debug it is immensely difficult in HTML.
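
To illustrate why the nested form fails today, here is a minimal Python sketch of the current behaviour, under the simplifying assumption that a comment opened by `<!--` ends at the first `-->`. The helper name `first_comment` is hypothetical, and the sketch deliberately ignores the specification's edge cases (such as `--!>` and `<!-->`).

```python
def first_comment(html: str):
    """Return (comment_text, rest) for the first comment, non-nesting rule.

    Simplified assumption: a comment runs from '<!--' to the first '-->'.
    """
    start = html.find("<!--")
    if start == -1:
        return None, html
    body_start = start + len("<!--")
    end = html.find("-->", body_start)
    if end == -1:
        # Unterminated comment: everything up to end of input is comment data.
        return html[body_start:], ""
    return html[body_start:end], html[end + len("-->"):]


# The nested example from above.
comment, rest = first_comment("<!--\n\t<!-- -->\n-->")
# The inner '-->' closes the outer comment; the final '-->' is left over as text.
print(repr(comment))
print(repr(rest))
```

Under this rule the second `<!--` is just comment data, and the trailing `-->` ends up as visible text in the page.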

What solutions exist today?

Some IDE extensions automatically break the tags in non-standardized manners, like github.com/philsinatra/NestedCommentsVSCode/blob/9c25135847af99c89e66b14e7396a3bf2b0d7cf0/README.md?plain=1#L36-L42:

<main>
  <div class="container">
    <h2>Hello World</h2>
    <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>
    <!-- <p>Lorem ipsum dolor, sit amet consectetur adipisicing elit.</p> -->
  </div>
</main>

Becomes:

<!-- <main>
  <div class="container">
    <h2>Hello World</h2>
    <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>
    <!~~ <p>Lorem ipsum dolor, sit amet consectetur adipisicing elit.</p> ~~>
  </div>
</main> -->

However, I could also use a custom tag. This may appear as if it is the immediately easier solution to this, but significant disadvantages to immediately choosing this option exist:

Choosing to add a new (for instance, <comment>) tag means that not solely must I propose and successfully implement a new element, I must additionally deprecate the previous absurdly widely-used <!-- --> element. This seems to me as if it would cause more disruption, because we shouldn't leave both in, lest they duplicate functionality.

Considering that modifying the parsing of the existent comment tag wouldn't affect HTML users - developers - in any manner I can foresee, modifying the engines to allow the comment tag to be nested sounds like a better idea.

I would like to eventually propose a <comment> tag. However, this seems to me to be a less disruptive modification to the specification.

How would you solve it?

I would permit the tags to be nested. I would like this support to be unanimous to avoid situations like stackoverflow.com/revisions/6698115/2:

If the compiler doesn't allow nesting, the first */ will terminate the opening of the multiline comment, meaning the 0 won't be commented out. Written with some spaces:

int nest = /*/*/ 0 * /**/ 1;

resulting in the code

int nest = 0 * 1; // -> 0

If it allows nesting, it will be

int nest = /*/*/0*/**/ 1;

resulting in

int nest = 1;

Anything else?

  1. For those unaware of why this isn't already possible, stackoverflow.com/revisions/12102131/6 explains it well.
  2. I've posted this to new.reddit.com/user/rokejulianlockhart/duplicates/1b69fcv, so that it might become more visible.
@RokeJulianLockhart RokeJulianLockhart added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Feb 22, 2024
@annevk
Member

annevk commented Mar 4, 2024

We cannot make a breaking change like this to how HTML is parsed. Especially with so little justification. I hope you understand.

@annevk annevk closed this as not planned Mar 4, 2024
@RokeJulianLockhart

This comment was marked as resolved.

@annevk
Member

annevk commented Mar 4, 2024

https://whatwg.org/working-mode#changes describes the process, but in this specific case I know that it's extremely unlikely you'll get implementer support as making a breaking change to how HTML is parsed is not done lightly.

I can tell you that WebKit would not want this change to be made, but maybe you can convince other implementers. I'll reopen for now given that you don't seem convinced.

@annevk annevk reopened this Mar 4, 2024
@RokeJulianLockhart
Author

#10153 (comment)

Thank you, @annevk. I'm aware that this shall be an uphill battle, considering how much time this issue has been left unfixed.

@RokeJulianLockhart
Author

#10153 (comment)

@annevk, I know solely that Google, Mozilla, and Apple are stakeholders in this, so do you know of any implementers I might have missed? Additionally, how do you suggest I contact them? I've always created issues in their respective public bug trackers - is that the standard way to propose things like this?

@kbrosnan

kbrosnan commented Mar 6, 2024

I can see your enthusiasm for this feature. This change would be a breaking change to how HTML is parsed. This would break correctly parsed pages that currently exist. It would add a significant layer of complexity when parsing HTML.

This change would need an exceptional use case to be considered. Easier debugging is not exceptional. Some compiled languages support this feature; again, not exceptional.

As for filing bugs: Anne already said that WebKit would not accept such a change. For Mozilla, it would get duplicated to an old invalid bug like 195133. I expect a similar response from Blink/Chrome developers.

I suspect that this feels circular but you are proposing a fundamental change that affects any page that uses comments. Such a change is not taken lightly.

@RokeJulianLockhart
Author

RokeJulianLockhart commented Mar 6, 2024

#10153 (comment)

@kbrosnan, thanks. Indeed, I recognise what you've stated.

@zcorpan
Member

zcorpan commented Mar 14, 2024

We will not implement this for Gecko.

As already mentioned, this would break existing pages that have this syntax and expect the current behavior. This alone is a sufficient blocker.

But I think there's also another aspect here. When changing how HTML parsing works, we need to consider whether it introduces a new XSS problem. I believe this would: let's say a website allows untrusted input but uses an HTML-parser based sanitizer to remove anything that can run scripts. Comments are considered safe. If the website's backend implements this change, but the user's browser does not (it takes a long time for all users to update even if all browsers were to ship a coordinated change), then the user is now vulnerable to XSS.
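
The mismatch described here can be sketched in Python. This is only an illustration under simplifying assumptions: both functions are hypothetical toy scanners (not any real sanitizer or browser parser), the "current" rule is reduced to "a comment ends at the first `-->`", and the payload is an invented example. Each function returns whatever that parser would treat as live markup outside comments.

```python
def live_markup_nesting(html: str) -> str:
    """Hypothetical nesting rule: '<!--' opens a level, '-->' closes one."""
    out, depth, i = [], 0, 0
    while i < len(html):
        if html.startswith("<!--", i):
            depth += 1
            i += 4
        elif depth and html.startswith("-->", i):
            depth -= 1
            i += 3
        else:
            if depth == 0:
                out.append(html[i])
            i += 1
    return "".join(out)


def live_markup_current(html: str) -> str:
    """Simplified current rule: a comment ends at the first '-->'."""
    out, i = [], 0
    while i < len(html):
        if html.startswith("<!--", i):
            end = html.find("-->", i + 4)
            # An unterminated comment swallows the rest of the input.
            i = len(html) if end == -1 else end + 3
        else:
            out.append(html[i])
            i += 1
    return "".join(out)


# Invented payload: harmless to a nesting-aware sanitizer, live in a browser
# that still uses the current rule.
payload = "<!-- <!-- --> <img src=x onerror=alert(1)> -->"
print(repr(live_markup_nesting(payload)))  # what the updated backend sees
print(repr(live_markup_current(payload)))  # what a non-updated browser sees
```

A backend using the nesting rule sees nothing but one comment and passes the payload through, while a browser using the current rule treats the `<img>` element as real markup.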

What you could do instead is to use the template element, which can be nested and makes stuff inside it inert.
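
As a rough illustration of the template suggestion, the Python sketch below tracks `<template>` nesting depth and collects only the text outside any template. It assumes the standard-library `html.parser.HTMLParser`, which is a simple tokenizer rather than a full HTML5 tree builder; the class name `VisibleText` and the sample document are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside any <template>, tracking template nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <template> elements are currently open
        self.chunks = []    # non-empty text runs found outside templates

    def handle_starttag(self, tag, attrs):
        if tag == "template":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "template" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


# Hypothetical page: an outer template disables a block that already
# contains an inner template.
doc = """
<main>
  <h2>Hello World</h2>
  <template>
    <p>Disabled paragraph.</p>
    <template><p>Already disabled earlier.</p></template>
  </template>
  <p>Still rendered.</p>
</main>
"""

parser = VisibleText()
parser.feed(doc)
parser.close()
print(parser.chunks)
```

Because each `</template>` closes only the innermost open template, wrapping a block that already contains a template does not end the outer one early, which is the property that comments lack.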

For those unaware of why this isn't already possible, https://stackoverflow.com/a/12102131/9731176 explains it well.

The SGML-style comment parsing issue was solved in the spec in 2006, so that is not relevant.

@zcorpan zcorpan closed this as completed Mar 14, 2024
@zcorpan zcorpan closed this as not planned Mar 14, 2024
@RokeJulianLockhart RokeJulianLockhart changed the title from "Nested comment tags." to "Allow comment tags (<!, --, and >) to be nested." Aug 1, 2024
@RokeJulianLockhart
Author

RokeJulianLockhart commented Aug 23, 2024

#10153 (comment)

Thanks for all of your input, and I apologize for taking so long to respond, considering how much effort you put into considering this proposal. It's quite evident to me that most renderers do not currently parse comment tags in the way they parse other tags, and that we consequently can't modify them without introducing XSS attacks to most of the web.

Instead:

  1. Would a proposal to include a <comment> tag be rejected because it would duplicate the functionality already provided by the existent tags, or would that be a feasible alternative? This is, of course, already possible if the user instantiates the tag in their webpage. However, it's rather non-standard.

  2. If that isn't feasible either, perhaps a non-normative note might be, as a comment on w3c/csswg-drafts#10768 ("Allow comment indicators (/* and */) to be nested.") mentions. It would be of use in persuading IDEs to support rendering such tags like comments.

@dmsnell
Contributor

dmsnell commented Aug 23, 2024

I know solely that Google, Mozilla, and Apple are stakeholders in this, so do you know of any implementers I might have missed?

A change to allow nested comments would pose substantial risks to content authored in WordPress over the past seven years. WordPress relies on the fact that comments cannot be nested, for both structure and security.


I don't see the need for a new tag when TEMPLATE exists and effectively does what I think you are intending to do with <comment>.

@RokeJulianLockhart
Author

RokeJulianLockhart commented Aug 23, 2024

#10153 (comment)

@dmsnell, thanks. Is all content inside it sanitised, like <!-- --> or <script type="text/plain"></script> do? If so, I'll advocate for <template>, then.

@dmsnell
Contributor

dmsnell commented Aug 23, 2024

@RokeJulianLockhart content inside of comments isn't sanitized either: a --> or --!> will break out of the comment. Inside the template you would only need to ensure that no unmatched end tag whose name is TEMPLATE appears, usually in the form of </template>. As long as that's the case it shouldn't interact with your page, and templates nest and maintain their own stack.

@RokeJulianLockhart
Author

#10153 (comment)

I have learnt of something that is very relevant to this:

The first editions of the HTML1 DTD actually included a (presumably nestable) <comment> tag, with exactly the behaviours that I requested. However, it was removed by the W3C due to SGML non-conformance (see footnote 1), despite remaining in IE8 for some time.

The WHATWG's HTML Living Standard is no longer SGML-compliant. However, the rationale given earlier in this thread for not adding such a tag remains valid regardless.

Footnotes

  1. stackoverflow.com/revisions/35074964/1

    This 2002 Usenet thread provides a nice discussion of the comment element's history. Most relevant:

    Oh, how quickly they forget. The COMMENT element was created by...
    Tim Berners-Lee. Or mabye Dan Connolly. The point is, it existed in
    an early version of HTML (back before they were making formal DTDs),
    and was quietly dropped when the W3C crew realized the COMMENT
    element was bad SGML.

    The closest to an "official word" can be found in the 1993 version of what we now call the HTML standard:

    Status: Obsolete

    A comment element used for bracketing off unneed text and comment
    has been introduced in some browsers but will be replaced by the
    SGML command feature in new implementations.

@dmsnell
Contributor

dmsnell commented Dec 8, 2024

@RokeJulianLockhart HTML was never SGML-compliant, though some HTML2/3/4 may have been. The DTDs, if I’m not wrong, came about later in an attempt to formalize the HTML rules as an SGML application, but that was never truly successful because those DTDs didn’t correspond to how actual HTML parsers worked. HTML5 is fully incompatible with SGML.

For a cursory examination I ran my parser against a list of 293,965 root-path documents from a list of domains sorted on rank (some day I hope to get a processing pipeline established for Common Crawl but today I rely on this somewhat lazier dataset). For every comment in the document, I then examined if it contained the text <!-- to see if there are potentially-nested comments. Additionally I asked if the immediately-following node is a text node containing --> or --!>.

404 (0.137%) of the results contained what looks like nested comments. Among the other metrics, there were no instances of --!> and only 9 with -->. One was an abruptly-closed comment <!-->Keywords? -->. Only two or three were the result of failed comment nesting.

One example of breakage is where I found <!--<div id="princ"><!-- début princ -->. With nested comments enabled most of the site would be hidden by a comment.

Sites also contain the old IE conditional comment,

<!--[if IE]>
<style type="text/css">
<!-- * { word-break: break-all; } -->
</style>
<![endif]-->

and in cases like this the result would be benign, thankfully.

Another type of error is when something has cut off the comment closer. It’s not evident why, but this appeared.

                        <!-- 메인페이지_광고영역
                        <div class="row" style="margin-bottom:20px;">
                                <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
                                <ins class="adsbygoogle"
                                         style="display:block"
                                         data-ad-client="ca-pub-XXXXXXX"
                                         data-ad-slot="XXXXXXX"
                                         data-ad-format="auto"
                                         data-full-width-responsive="true"></ins>
                                <script>
                                (adsbygoogle = window.adsbygoogle || []).push({});
                                </script>
                        </div>
                        <!-- 메인페이지_광고영역 끝-->

But again here, because of the missing closer, this would nest and swallow the rest of the page.

Like before, I see far fewer instances of actual nesting of comments in the wild than I do of errors caused by some other kind of improper processing or stitching-together of documents.

In any case, I wanted to share some data, because I realized that we’re all kind of spinning around some opinions but it might be helpful to know what kind of actual potential impact a change like this could have.

If you want to convince people you might start simply by attempting to quantify what impact the change would have. Because of the fact that many of these existing nested comments are the result of other errors, it may not be enough simply to ask how many pages would parse differently. Perhaps a nice metric would be like this:

  • Of all pages out there, how many would this change affect?
  • Of those, how many end the parse inside the comment (wiping the entirety of the rest of the page)?
  • Otherwise, what is the distribution of changes in comment length between the nesting and non-nesting parsers? If the broad difference is small (e.g. from <!-- <!-- comment --> -->, which is a fairly common example) then the impact might also be small. If it’s large it might suggest a missing closer, which gobbles up large swaths of the page.
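
The three metrics above could be computed per document along the following lines. This is a hedged Python sketch, not the parser used for the numbers in this comment: both scanners are simplified (the current rule is reduced to "a comment ends at the first `-->`"), and every function name, dictionary key, and sample input is hypothetical.

```python
def comment_spans_current(html: str):
    """Simplified current rule: each comment ends at the first '-->'.

    Returns (spans, ended_inside_comment), spans as (start, end) offsets.
    """
    spans, i = [], 0
    while (start := html.find("<!--", i)) != -1:
        end = html.find("-->", start + 4)
        if end == -1:
            spans.append((start, len(html)))
            return spans, True
        spans.append((start, end + 3))
        i = end + 3
    return spans, False


def comment_spans_nesting(html: str):
    """Hypothetical nesting rule: '<!--' opens a level, '-->' closes one."""
    spans, depth, start, i = [], 0, 0, 0
    while i < len(html):
        if html.startswith("<!--", i):
            if depth == 0:
                start = i
            depth += 1
            i += 4
        elif depth and html.startswith("-->", i):
            depth -= 1
            i += 3
            if depth == 0:
                spans.append((start, i))
        else:
            i += 1
    if depth:
        # Input ended while still inside a comment: the rest is swallowed.
        spans.append((start, len(html)))
    return spans, depth > 0


def impact(html: str) -> dict:
    """Compare the two rules for one document."""
    cur, _ = comment_spans_current(html)
    nest, swallowed = comment_spans_nesting(html)
    cur_len = sum(end - start for start, end in cur)
    nest_len = sum(end - start for start, end in nest)
    return {
        "affected": cur != nest,
        "ends_inside_comment": swallowed,
        "extra_commented_chars": nest_len - cur_len,
    }


benign = "<p>a</p><!-- <!-- comment --> --><p>b</p>"
broken = '<!--<div id="princ"><!-- début princ --><p>site</p>'
print(impact(benign))
print(impact(broken))
```

Aggregating `affected`, `ends_inside_comment`, and the distribution of `extra_commented_chars` over a crawl would give rough answers to the three questions, with the caveat that a real measurement should use a spec-conformant tokenizer.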

@RokeJulianLockhart

This comment was marked as duplicate.


5 participants