RFC: TSDoc-flavored-Markdown (TSFM) instead of CommonMark #29
Comments
I started prototyping this idea today. If we go this route, it will be a major simplification and should save us a lot of time on the implementation.
Something worth calling out here is how this can interact with docs.microsoft.com/DocFX. Now, I know that we are working on a standard here, but fragmentation and a bunch of custom stuff is a bit of a concern. We do have support for Markdown Extensions, so that is likely a place where we can plug in. The format you are talking about here is parser-specific; on docs.microsoft.com, we've recently switched to MarkDig, which handles CommonMark parsing much better. It would be preferable not to invent our own standard, because the rest of the documentation stack does not use it (and we have no plans to), and guiding people to one set of conventions for TS documentation contributions and another for the rest of docs seems problematic. Besides, this also adds the risk of our own parser interpreting the proposed conventions incorrectly.
@pgonzal The current examples show an input that would be treated differently by the two implementations, but only show one normalized form. Each example should be presented with two normalized forms:
While I agree with encouraging users to use a normalized input form where available, I generally disagree with the premise of this proposal (that CommonMark is problematic and/or suggesting that another Markdown flavor will serve to simplify the space). The following are specific claims which I most disagree with:
This form is widely used (it's even the default behavior when you click the quote formatting button in the GitHub editor), and everyone since the dawn of email knows that
These rules may cause some confusion for new users, but the general availability of live-preview editors helps users avoid the pitfalls.
This is confusing. Many people (myself included), only use one of these for lists.
For any case directly supported by CommonMark without the use of HTML tags, I would oppose a restriction that does not allow a normalized form of the same content to exist without using HTML tags. Markdown provides a set of features which generally allow users to avoid falling back to HTML tags, and a Markdown processor which deviates from this goal feels incomplete. As a user, it would be frustrating to be told Markdown can be used, only to find that it can only sometimes be used.
@sharwell thanks for your feedback here. As I mentioned in the initial issue description, TSDoc and Markdown have somewhat different requirements, which greatly complicates attempts to embed Markdown inside TSDoc. The two biggest hangups for me are:
Keep in mind that TSDoc is not just some English prose for humans to read. TSDoc goes in computer source code, and sometimes it contains tags that affect how a project gets built. Often it gets edited in a Git "merge conflict" editor that doesn't have any nice syntax highlighting.
That seems reasonable. But could you share an example of a realistic TypeScript code comment where someone would need to use
I agree it's frustrating. But maybe it would be less frustrating than a realization that the syntax is not predictable ("I have no idea whether the stuff I'm writing will get rendered correctly by whatever documentation tool runs in this particular repo") or not interoperable ("I marked this API as
That said, I'm not being dogmatic about this. We modeled TSDoc as an open standard specifically to solicit your input and ideas. :-) A lot of these debates seem to get settled when we switch from philosophy and design, and instead look at specific real-world documentation problems that turn up. For example #128 was fairly enlightening for me personally. By the end of 2018, our API Extractor tool will have fully implemented all the core features of TSDoc (including declaration references) and processed a fairly large corpus of Microsoft APIs. When we write up the spec proposal, I want it to include real-world examples for each design decision.
@pgonzal The only case where it's come up for me to date is here: Historically, the other thing I've used the quote syntax for is arguably improper, e.g. callouts like this:
It's the best translation I could think of at the time for what I would prefer to write with C#'s
/// <note type="warning">
/// <para>This method likely does not behave as you expect.</para>
/// </note>
Based on the issues encountered in the issue #12 thread, we are concluding that TSDoc cannot reasonably be based directly on the CommonMark spec. The goals are conflicting:
CommonMark goal: ("common" = union) Provide a standardized algorithm for parsing every familiar markup notation. It's okay if the resulting syntax rules are impossible for humans to memorize, because mistakes can be easily corrected using the editor's interactive preview. If a syntax is occasionally misinterpreted, the consequence is incorrect formatting on the web site, which is a relatively minor issue.
TSFM goal: ("common" = intersection) Provide a familiar syntax that is very easy for humans to memorize, so that a layperson can predict exactly how their markup will be rendered (by every possible downstream doc pipeline). Computer source code is handled by many different viewers which may not support interactive preview. If a syntax is occasionally misinterpreted, the consequence is that a tag such as `@beta` or `@internal` may be ignored by the parser, which could potentially cause a serious issue (e.g. an escalation from an enterprise customer whose service was interrupted because of a broken API contract).
Hypothesis: For every TSFM construct, there exists a normalized form that will be parsed identically by CommonMark and TSDoc. In "strict mode" the TSDoc library can issue warnings for expressions that are not in normalized form. Assuming the author eliminates all such warnings, a documentation pipeline can pass unmodified TSDoc content through to a backend CommonMark engine, and have confidence that the rendered output will be correct.
Below are some proposed TSFM restrictions:
Whitespace generally doesn't matter
This principle is very easy for people to remember, and eliminates a ton of edge cases.
Example 1:
Example 1 converted to normalized form (so CommonMark interprets it the same as TSDoc):
Example 2:
Example 2 converted to normalized form (so CommonMark interprets it the same as TSDoc):
Stars cannot be nested arbitrarily
TSDoc will support stars for bold/italics, based on 6 types of tokens that can be recognized by the lexical analyzer with minimal lookahead:
- `*text` is interpreted as `<i>text`
- `text*` is interpreted as `text</i>`
- `**text` is interpreted as `<b>text`
- `text**` is interpreted as `text</b>`
- `***text` is interpreted as if `<b+i>text`
- `text***` is interpreted as if `text</b+i>`

Other patterns are NOT interpreted as star tokens, e.g. `text * text *` contains literal asterisks, as does `****a****`. A letter in the middle of a word can never be styled using stars, e.g. `Toys*R*Us` contains literal asterisk characters. A single-star followed by a double-star can be closed by a triple-star (e.g. `*italics **bold+italics***` is seen as `<i>italics<b>bold+italics</b+i>`). Star markup is prohibited from spanning multiple lines.
Other characters (e.g. underscore) are NOT supported by TSDoc as synonyms for bold/italics.
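To make the star-token rules above concrete, here is a rough TypeScript sketch of a single-line lexer. It is not the TSDoc library's implementation: `tokenizeStars` and `renderPseudoHtml` are hypothetical names, and the boundary rule (an opening run must follow whitespace or the start of the line, a closing run must precede whitespace or the end of the line) is my own assumption about what "minimal lookahead" could mean. Closers followed by punctuation are not handled.

```typescript
// Hypothetical sketch of the six star tokens; not the actual TSDoc lexer.
type StarToken =
  | { kind: "text"; text: string }
  | { kind: "open" | "close"; stars: 1 | 2 | 3 };

// Assumes `line` is a single line, since star markup may not span lines.
function tokenizeStars(line: string): StarToken[] {
  const tokens: StarToken[] = [];
  let literal = "";
  const flush = (): void => {
    if (literal.length > 0) {
      tokens.push({ kind: "text", text: literal });
      literal = "";
    }
  };

  let i = 0;
  while (i < line.length) {
    if (line[i] !== "*") {
      literal += line[i];
      i++;
      continue;
    }
    // Measure the whole run of consecutive stars.
    let end = i;
    while (end < line.length && line[end] === "*") {
      end++;
    }
    const count = end - i;
    // Assumption: start and end of line behave like whitespace.
    const spaceBefore = i === 0 || /\s/.test(line[i - 1]);
    const spaceAfter = end === line.length || /\s/.test(line[end]);

    if (count <= 3 && spaceBefore && !spaceAfter) {
      flush();
      tokens.push({ kind: "open", stars: count as 1 | 2 | 3 });
    } else if (count <= 3 && !spaceBefore && spaceAfter) {
      flush();
      tokens.push({ kind: "close", stars: count as 1 | 2 | 3 });
    } else {
      // Runs of 4+ stars, isolated stars, and stars inside a word stay literal.
      literal += line.slice(i, end);
    }
    i = end;
  }
  flush();
  return tokens;
}

const TAG = { 1: "i", 2: "b", 3: "b+i" } as const;

// Renders tokens using the pseudo-tag notation from the list above.
function renderPseudoHtml(tokens: StarToken[]): string {
  return tokens
    .map((t) =>
      t.kind === "text"
        ? t.text
        : t.kind === "open"
        ? `<${TAG[t.stars]}>`
        : `</${TAG[t.stars]}>`
    )
    .join("");
}
```

With this sketch, `renderPseudoHtml(tokenizeStars("Toys*R*Us"))` returns the input unchanged, and `"*italics **bold+italics***"` produces one open-italic, one open-bold, and one closing bold+italic token. Because each decision only inspects the characters immediately around a star run, the lexer never needs to search ahead for a matching delimiter.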
Example 3:
Example 3 normalized form:
Example 4:
Example 4 normalized form:
Code spans are simplified
For TSFM, a nonescaped backtick will always start a code span and end with the next backtick. Whitespace doesn't matter.
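As a rough illustration of that rule, here is a TypeScript sketch that splits a string into text and code spans. `splitCodeSpans` is a hypothetical helper, not a TSDoc API. Two details are my own assumptions because the proposal does not spell them out: a backslash escapes a backtick only outside a code span, and an unterminated code span falls back to literal text.

```typescript
// Hypothetical sketch of the simplified code-span rule; not the TSDoc parser.
type Span = { kind: "text" | "code"; text: string };

function splitCodeSpans(input: string): Span[] {
  const spans: Span[] = [];
  let current = "";
  let inCode = false;

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    // Assumption: "\`" is an escaped (literal) backtick, but only outside code.
    if (!inCode && ch === "\\" && input[i + 1] === "`") {
      current += "`";
      i++;
      continue;
    }
    if (ch === "`") {
      // Any nonescaped backtick toggles between text and code.
      if (inCode || current.length > 0) {
        spans.push({ kind: inCode ? "code" : "text", text: current });
      }
      current = "";
      inCode = !inCode;
      continue;
    }
    // Whitespace inside a span is kept verbatim; it never changes the parse.
    current += ch;
  }

  if (inCode) {
    // Assumption: an unterminated span is treated as literal text.
    spans.push({ kind: "text", text: "`" + current });
  } else if (current.length > 0) {
    spans.push({ kind: "text", text: current });
  }
  return spans;
}
```

For example, splitting the input "a `b  c` d" gives a text span "a ", a code span "b  c" (inner whitespace untouched), and a text span " d". There is no counting of backtick runs and no whitespace stripping, which is the simplification relative to CommonMark.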
Example 5:
Example 5 normalized form:
Blocks don't nest
I want to say that ">" blockquotes should not be supported at all, since the whitespace handling for these constructs is highly counterintuitive. Instead we would recommend `<blockquote>` HTML tags for this scenario.
Lists are a very useful and common scenario. However, CommonMark lists also have a lot of counterintuitive rules regarding handling of whitespace.
A simplification would be to say that TSFM interprets any line that starts with "-" as being a list item, and the list ends with the first blank line. No other character (e.g. "*" or "+") can be used to create lists. If complicated nesting is required, then HTML tags such as `<ul>` and `<li>` should be used to avoid any confusion.
Example 6:
Example 6 normalized form:
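Separately from the examples, the simplified list rule described in this section can be sketched in TypeScript as follows. `parseBlocks` is a hypothetical helper, not part of the TSDoc library. I am assuming that leading whitespace is ignored, that a non-blank line inside a list continues the previous item, and that "strict mode" would warn when a line looks like a `*` or `+` list item.

```typescript
// Hypothetical sketch of the simplified list rule; not the TSDoc parser.
type Block =
  | { kind: "paragraph"; lines: string[] }
  | { kind: "list"; items: string[] };

interface ParseResult {
  blocks: Block[];
  warnings: string[];
}

function parseBlocks(text: string): ParseResult {
  const blocks: Block[] = [];
  const warnings: string[] = [];
  let list: string[] | undefined; // items of the currently open list
  let paragraph: string[] | undefined; // lines of the currently open paragraph
  let lineNumber = 0;

  for (const raw of text.split(/\r?\n/)) {
    lineNumber++;
    const line = raw.trim(); // assumption: indentation carries no meaning

    if (line === "") {
      // The first blank line ends whatever block is open.
      list = undefined;
      paragraph = undefined;
      continue;
    }
    if (line[0] === "-") {
      if (list === undefined) {
        list = [];
        blocks.push({ kind: "list", items: list });
        paragraph = undefined;
      }
      list.push(line.slice(1).trim());
      continue;
    }
    if (/^[*+]\s/.test(line)) {
      // Assumed strict-mode behavior: CommonMark would see a list here.
      warnings.push(`line ${lineNumber}: only "-" can start a list item`);
    }
    if (list !== undefined) {
      // Assumption: a non-blank line continues the previous list item.
      list[list.length - 1] += " " + line;
      continue;
    }
    if (paragraph === undefined) {
      paragraph = [];
      blocks.push({ kind: "paragraph", lines: paragraph });
    }
    paragraph.push(line);
  }
  return { blocks, warnings };
}
```

The parser keeps only two pieces of state (an open list or an open paragraph) and no indentation stack, so nesting is impossible by construction. That matches the recommendation to fall back to `<ul>` and `<li>` when nesting is needed.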