-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
go/doc: reconsider comment rewrites of ''
to ”
#54312
Comments
I had to put my glasses on to see the difference at my regular zoom level, so I think you're right :) And this is on a 25" monitor, let alone a tiny phone screen. |
cc @rsc |
Another example is when docs refer to the empty string, specifically JS where strings aren't necessarily double quoted, leading to undesirable doc replacements (and apparently, inserting quotes at the start/end of paragraphs that are not quotes?) as in: evanw/esbuild@296870e |
An example from #56380 (
|
I recently encountered a particularly annoying case of this behavior. I wanted to write this:
But no matter what I do, it keeps being turned into this:
This replacement makes no sense, and is not helpful at all. Really hoping this behavior can be reverted. |
So, the more I look at this, the more I think that:
I just don't think this actually makes things better. If I want Fancy Quotes, I can type them! We have full UTF-8 support. I do not think the assumption should be that, if someone writing documentation for code uses characters in it that have special significance in thousands of contexts across basically every widely-used programming language, the most likely interpretation is that they're wrong and we should fix their code for them. I think the assumption should be that if someone wrote an apostrophe, they meant apostrophe, and if they wrote two apostrophes, they meant two apostrophes, and that if they wanted a fancy close-quote, they would have typed one. There may have been a brief window during which everything could display fancy quotes, but nothing could type them or store them in files, and it thus made sense to try to "correct" them. It's not the case now. If I wanted to type those fancy quotes, I could! I could embed them in things, I could use an editor which inserted them where appropriate, I have lots of options. What I don't have options for is writing documentation for my code and not having it corrupted by |
Discussion of this problem has been going on since March 2022 (originally in other issues). It is marked as |
The Revisions to that behavior should also go through the proposal process, so I am marking this issue as a proposal now. |
You say:
but I don't agree that it was “essentially disregarded” — you received multiple replies from @ianlancetaylor, who is one of the members of the proposal review group, explaining the rationale for the behavior implemented in that proposal. |
''
to ”
''
to ”
I received two replies from Ian. The first did not address the issue by error and the second only barely touched on it. I'll have to disagree with the assessment here. |
As a point of clarification, the top comment suggests that the rewrite originated in #51082. This is not quite true. The rewrite of Part of the work we did in #51082 was have gofmt reformat doc comment text to look the same way it would in the HTML form. So the rewrite is better highlighting in text mode that these comments never had the meaning you thought they did. Overall I believe the text rewrite is a net win: it makes clear to people whether the syntax they've typed means what they think it does, which is to say whether the syntax they've typed will render to HTML as they intend. You've found out that writing I agree with the quoted comment that the right answer for f-prime is to use the Unicode code points with that meaning. If f″ (Unicode double-prime) doesn't render well, then it would also work to double a single prime: f′′. I don't see this as a workaround at all. I see it as using the Unicode code points with the actual intended meaning. I also see the problem with writing @kortschak wrote:
There are 5k+ open issues as I write this. There are not enough people paid to work on Go to handle all of them. This should have been marked as a proposal, but we mistriaged it. My apologies - mistakes happen. Now it is marked as a proposal. That said, I fear your full expectation is not far off the mark: It may be possible to make the change in future versions of Go based on the Go version, so that docs written for Go 1.21 still get the rewrite but docs written using newer versions of Go do not. That would require passing a Go version through to gofmt and also go/doc/comment.Parser, which is non-trivial new API, not to mention a major conceptual change for gofmt. I am not convinced those changes are worth the effort when both of the solutions above lead to improved documentation. |
The current rewriting also applies to unexported comments that never end up in HTML render, such as the example I ran in to (reposted here by ALTree). The difference between godoc and the other formats you mention is that the others all have a way to write an inline literal '' when need be, without starting a full code paragraph. Godoc currently lacks this. And to make matters worse it's not merely a display issue, but actually mangles the original code. This isn't an "it's a bit awkward to do this" kind of issue, it's a "it's literally impossible to do this, and my comment becomes corrupted if I try" kind of issue. Simply not applying this for unexported godoc comments might be a simple change that would go some way towards alleviating the pain this causes, although it's not a full fix. The convention of using ``quotes like this'' is an old-fashioned dying convention that only old unix greybeard-y types use and know about and would surprise many younger people, so I'd argue it's not all that "widely used". I say this as an old unix greybeard-y type that used ``quotes'' habitually for many years. I'm unconvinced it's behaviour worth keeping in the first place – especially since we're dealing with technical documentation, and I would say that accuracy trumps prettiness every time. |
This is a misreading of what I wrote. While I dislike smart quotes in webpages, I am significantly more concerned about source code rewrites. This is new (well, it was when I wrote the issue). I am concerned with how this impacts on ability to read source in editor. |
This proposal has been added to the active column of the proposals project |
It has had this meaning for a long time, yes, but the rewrite — even at the rendering level — has existed for a relatively short time. Prior to the changes to pkgsite, there was no tool that rewrote comment semantics here. On noticing that this had happened I raised #51807 and sent this change fixing breakage of |
Here is an example in the wild of gofmt mangling the documentation in @robpike's project, ivy. robpike/ivy@3d1d0ff#diff-a20b1b3b4b2bca5bef2b853a6e3f19def513381f9c7ee68d6979f0435c885dcfL210 Syntactically, string literals are very similar to those in Go, with back-quoted
raw strings and double-quoted interpreted strings. Unlike Go, single-quoted strings
are equivalent to double-quoted, a nod to APL syntax. A string with a single char
-is just a singleton char value; all others are vectors. Thus ``, "", and '' are
+is just a singleton char value; all others are vectors. Thus “, "", and ” are
empty vectors, `a`, "a", and 'a' are equivalent representations of a single char,
and `ab`, `a` `b`, "ab", "a" "b", 'ab', and 'a' 'b' are equivalent representations
of a two-char vector. |
FWIW, #61365 is a concrete example involving comments discussing SQL quoting syntax in a SQL parser.
I haven't quite followed this issue, and I don't know if this is a valid way of thinking about it, but it seems that this transform was happening for many years, but it was happening only transiently at render time (e.g., when rendering to HTML on godoc.org). And now as of Go 1.19, the transform started happening in two places:
For actively maintained source files (files that have been edited with a modern toolchain), the render-time transform is now effectively a no-op because the source code already has the smart quotes. Of course, an actively maintained project might not actively edit each source file (and not all projects do a project-wide gofmt), so some files might not get touched frequently, or ever. I wonder if in the future it might be reasonable to turn off the transform for both the HTML rendering and for code formatting, because at some point enough of the existing actively maintained code that relies on the transform would still render as “expected” by that author because their source code was already persistently transformed. Of course, people would need to learn to stop typing If that happens at some point, then for a project or file that intended smart quotes but has not been actively edited since Go 1.19 (e.g., maybe a project is "done") or when looking at pre-1.19 versions of documentation for some projects on pkg.go.dev, the render-time view would match the source code view (e.g., both showing |
I'm not completely ignoring this - I have a program running to gather data we need and haven't gotten through the results yet. |
I extracted all the doc comments containing single forward or backward quotes from my corpus of code from proxy.golang.org, totaling 90GB. I've attached a shuffled random sample of 100k comments as docquote100k.txt.gz. The sampling is just from the full collection of comments with no regard to packages or modules, so if one package or module has 10X the comments of another, that package or module's comments will be sampled at 10X the rate of the other. In 88% of those comments, the only single quotes are forward quotes between alphabetic ASCII characters, like in “don't communicate by sharing memory”. Those are served perfectly well by the existing rule but also easy to identify, so it's worth filtering them out. Let's call that the contraction form (it also includes possessives). Ignoring contraction quotes, docquote2-100k.txt.gz is a new shuffled random sample of 100k comments with non-contraction-form quotes. (The sampled comments all contain non-contraction-form quotes, but they may also contain contraction-form quotes.) Further downsampling, docquote2-1k.txt.gz is a sample of only 1k comments. Those 1000 comments break down as follows: Backquote `x` only (783)
Some are inconsistent, only putting backquotes around some names and leaving others unquoted, like 40, 74, 129, among others. Forward quote 'x' only (128)
Backquote `x` and double quote "x" (19)
Backquote `x` and forward quote 'x' (13)
Backquote and triple-backquote (6) Forward quote 'x' and double quote "x" (6) Triple backquote only (5) Double backquote ``x`` (4) Double forward quote ''x'' (2) Backquote, triple-backquote, and double-quote (1) Forward quote and triple-backquote (1) Trifecta! Forward quote 'x', backquote `x` and double quote "x" (1) Backquote `x` and HTML \"x\" (1) Smart doubled single quotes ``x'' (1) Escaped forward quotes \'x\' (1) Forward-backward paired quotes `'x`' (1) False positive due to struct tags in code display (2) Bad shuffle splitting bug or other false positive (5) Punctuation lists ( Stray quote in code block or commented-out code (5) Forward single quotes mid-syntax (1) Backquote in +kubebuilder:validation:Pattern directive (2) Literal backquote inside double-quoted string literal (1) A few observations:
In docquote2-100k.txt, using regular expressions to classify:
Sampling 100 of the 1047 and inspecting by hand:
Based on all this we can very roughly estimate about 10 out of 100k comments meaning '' as a doubled prime, or about 1 per 10,000 of the comments containing non-contraction-form quotes. So it happens, but it's not terribly common. Real Go quoted comments outnumber doubled primes 10 to 1. Grepping for comments with lines containing just a single doubled-quote in docquote2-100k (
Note that if multiple double-primes occurred on a line, they wouldn't get caught by that expression, so it may be undercounting, but probably not by a lot, and it estimates 1 in 100k (down from the previous 1 in 10k). Grepping for single ``x'' on a line in the same set finds 71, so by this estimate Go quoted comments outnumber doubled primes 70 to 1. So doubled primes are not a reason to turn off the Go quoted comment changes. It's worth noting that Go project code mentioning the quote conversion was about 2% of the samples found, suggesting that there's just not many doubled quotes in the ecosystem overall (only 50X more than Go project code). A more compelling reason might be all the misuse of ``x`` or ''x'', but again most of these seem to be in auto-generated APIs, so they are not terribly compelling. The most compelling reason is probably empty strings. |
All in all, I think this is a giant mess, mostly apparently caused by people shoehorning comment syntaxs from other languages into Go files. I don't think we have to complicate Go in response, any more than we do when people, say, translate Java literally to Go; those people are using Go wrong and should not do that. That said, I also think we can tighten up the `` and '' conversion and fix most double-primes and empty strings, without losing the intentional quotation marks:
This should correctly handle double-primes, unless they are inside `` and '' intended as quotes, but in that case just type “ and ” directly, or edit the gofmt'ed output to correct it, and it will stick. This should also correctly handle empty strings, even things like I think we could reasonably make those changes for Go 1.23. |
@rsc thank you for doing that analysis. I still disagree with the notion that it is OK to rewrite comments given that the author of the comment knows the intended semantics better than anyone else (as stated in my comment here). The proposed looks like to will reduce the harm done by these rewrites, and I think that is is possible to work around the harms that are done. Ideally, we would not be rewriting, but I think that despite the commentary here, I'm not going to convince you of that. |
It's interesting that you consider ' as a forward quote. To me, it's a straight quote, and ´ would be a forward quote, the matching partner to the backquote `.
It's strange that anyone would do this. It seems typographically incorrect and unbalanced compared to `x´. |
@gophun Most US keyboard layouts do not have a way to input ´ at all. `x' and ``x'' is a convention that arose decades ago as an approximation to proper styled quotes within ASCII (the American Standard Code for Information Interchange). Computing unfortunately has a long history of anglocentrism. |
To me, that sounds like it points to an unfortunate flaw in the analysis: it is trading off two sources of survivorship bias.
We could perhaps measure (1) by looking for unexpected (unpaired?) occurrences of the characters I don't see how we can measure (2) beyond upvotes on this issue, since that manifests as developer friction but often won't leave evidence in the source code. |
To call out what seems to me to be the elephant in the room: is there anyone who is not @rsc who affirmatively prefers the smart quote rewrites? I've followed this discussion pretty closely on this thread, on #51082, and on the many linked issues, and the sentiments I've seen from other folks range from non-committal to surprised to strongly averse. But I'm not aware of instances where someone said they like or want this behavior. Did I miss it? Is there some silent group of programmers out there who like the smart quote rewriting and will voice their displeasure if it goes away? If the entire constituency for this feature is @rsc, then ISTM that what we are trying to figure out is the minimum amount of quote-rewriting that @rsc will accept. I know this is a bit blunt, but I'm trying to clarify what exactly what is going on here. |
We have a documented behavior that has existed for over a decade. I'm trying to respect the existing usage while avoiding the problem that @kortschak ran into. #54312 (comment) seems to do that. Does anyone object to #54312 (comment) ? |
I do. |
Given that we rewrite |
Seems like this discussion was essentially pointless. I'm sorry I raised it. At the very least I'd like to see the re-writes that touch comments that are never rendered into godoc being reverted. There is zero reason to make those rewrites and the tools (as repeatedly stated above) does not know the intention of the author better than the author does. |
I'm not sure what you mean here but the rewrites only affect doc comments. Non-doc comments are never modified. If you see any non-doc comments being modified, please post an example and we will fix it. Thanks! |
Why not? Just because it's old doesn't mean it's good, or that it doesn't introduce more problems than it solves. For a very long time this was more or less invisible behaviour most people weren't aware of. I think very few people would complain if it was removed. Whether one likes it or not, using straight "quotes" is a standard way to write English, and that's been the case for quite a while now. It's fine to introduce typographical niceties, but doing that in a programming language context seems misplaced. I'm not writing for The New York Times, I'm writing Go code. A complex rewriting scheme that takes several paragraphs to explain is not something I'd consider hugely desirable, and seems the sort of hidden complexity that should be avoided if at all possible.
It's still a "doc comment" if the function is unexported, as per my earlier example. But that "doc comment" never shows in godoc. |
Thank you for correcting me. I have confirmed that this is true mod the non-rendered unexported function comments (this is a fun one which I don't think would be handled correctly by the rules above). Apologies. All up, I guess the proposed solution is acceptable. I can't say I like it, but the grosser parts of the rewrites can now be worked around with some effort. It strikes me that "but in that case just type “ and ” directly" would have been a good solution. |
@kortschak In your example https://go.dev/play/p/FLLDRxzddIW those quotes would be left alone in the new rules, since (1) the |
Thank you for clarifying that for me. |
Based on the discussion above, this proposal seems like a likely accept. Currently, in doc comments, doubled single quotes The proposal is to tighten the conversion rules, both in the doc comment reformatter and in the html conversion, to the following:
|
No change in consensus, so accepted. 🎉 Currently, in doc comments, doubled single quotes The proposal is to tighten the conversion rules, both in the doc comment reformatter and in the html conversion, to the following:
|
''
to ”
''
to ”
A belated reply to #54312 (comment), after being pinged privately about it. In general our approach is not to change documented behavior without a very good reason, in the spirit of both Chesterton's Fence and Go compatibility. I did the work of reading through and presenting lots of existing usage in #54312 (comment). The key point is "Real Go quoted comments outnumber doubled primes 10 to 1." To discard existing usage, we need a much stronger reason than "people don't like it" or even "we all agree it was a mistake". Not breaking things that work is crucial to the overall approach we take in Go. That's why I suggested the compromise of converting fewer but keeping all the real usage working. |
In ae3d890 as part of addressing #51082, a change was made to go/doc that rewrite all
''
to”
. This makes semantic changes to comments where'
is used as a prime and''
is used as double prime, common in mathematical code.This was raised in #51082 (comment) but essentially disregarded. A work around was suggested,
However on investigation with relevant fonts (the font here and the font used by pkg.go.dev) at 100%, U+2033 DOUBLE PRIME is barely distinguishable from U+2032 PRIME and worse, also barely distinguishable from other commonly used marks in the same position such as '*' (Comparison: f′ f″ f* ).
The change has made it harder to read these comments, harder to write them in a way that doesn't get mutated and easier for incorrectly formatted comments to be committed (f'' getting mutated to f” which is essentially indistinguishable from f″ at normal font sizes).
The text was updated successfully, but these errors were encountered: