-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor declaration emitter into declaration transformer #21930
Refactor declaration emitter into declaration transformer #21930
Conversation
@@ -3182,8 +3258,7 @@ namespace ts { | |||
finally { if (e) throw e.error; } | |||
} | |||
return ar; | |||
}; | |||
` | |||
};` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why all these ending backticks got moved up a line? Simple reason. I updated guessIndentation
to infer whitespace from whitespace-only lines (but not empty lines), as when converting comments into synthetic ones (as on accessors), once you ignore the first line you can often be left with just whitespace! In any case, this brought to my attention the fact that some of our helpers had their trailing backtick on a new line tabbed one less indentation level than the rest of the lines in the helper. We were a bit inconsistent about this, so I just updated the helpers that did not to have the backtick after the last nonwhitespace character.
Trailing commas are good, tho? |
Windows performance numbers are always erratic for emit since it incorporates file I/O and for some reason file I/O via libuv on Windows never has stable numbers compared to other OSes. |
This is strictly a bug fix IMO. |
src/compiler/comments.ts
Outdated
@@ -260,7 +260,21 @@ namespace ts { | |||
} | |||
} | |||
|
|||
function isJSDocLikeText(text: string, start: number) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deduplicate this with isJsDocStart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't. They behave meaningfully differently. isJSDocLikeText
matches anything that starts with **
and isn't an empty comment. isJsDocStart
only matches comments starting with **
but not ***
and doesn't filter empty comments at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does isJsDocStart
check for /***
? If we don't consider /***
to be JSDoc there, then we shouldn't allow it here either.
Assuming the last condition in isJsDocStart
is not a typo (e.g. checking for *
instead of /
), it seems like both of these could be the same function:
function isJsDocStart(content: string, start: number) {
return content.charCodeAt(start) === CharacterCodes.slash &&
content.charCodeAt(start + 1) === CharacterCodes.asterisk &&
content.charCodeAt(start + 2) === CharacterCodes.asterisk &&
content.charCodeAt(start + 3) !== CharacterCodes.asterisk &&
content.charCodeAt(start + 3) !== CharacterCodes.slash;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does isJsDocStart check for /***
Honestly no idea. It just seems as if we decided at some point that extra stars (like if, for example, you put line-length star guards around your comment) don't count as jsdoc as far as the parser is concerned. However I definitely think it's still correct to preserve such a comment in declaration emit, hence the difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've unified them both under the previous isJSDocLikeText
implementation, since it seems like the parser definition has just sat around since its introduction three years ago, and the only justification for why /*** @type {Foo} */
is any less jsdoc than /** @type {Foo} */
I've found is a remark on usejsdoc.org; but we're typically more lenient than it says (plus, closure compiler allows it (you can see the type annotation still getting applied under the advanced mode error)) cc @DanielRosenwasser it makes sense to be more lenient here, right?
src/compiler/emitter.ts
Outdated
if (result) { | ||
return result; | ||
} | ||
} | ||
} | ||
} | ||
|
||
/*@internal*/ | ||
export function getOutputPathsFor(sourceFile: SourceFile | Bundle, host: EmitHost, forceDtsPaths?: boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't mark forceDtsPaths
as optional.
src/compiler/emitter.ts
Outdated
let declarationPrinter: Printer; | ||
if (emitOnlyDtsFiles || compilerOptions.declaration) { | ||
const nonJsFiles = filter(sourceFiles, isSourceFileNotJavaScript); | ||
declarationTransform = transformNodes(resolver, host, compilerOptions, (compilerOptions.outFile || compilerOptions.out) ? [createBundle(nonJsFiles)] : nonJsFiles, [transformDeclarations], /*allowDtsFiles*/ false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: break (compilerOptions.outFile || compilerOptions.out) ? [createBundle(nonJsFiles)] : nonJsFiles
into its own variable
|
||
switch (input.kind) { | ||
case SyntaxKind.ExportDeclaration: { | ||
if (!resultHasExternalModuleIndicator && isSourceFile(input.parent)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically I don't think you need to check resultHasExternalModuleIndicator
before setting it to true
but I guess it can save some work in the cases where you need to check modifiers.
return emittedImports; | ||
} | ||
|
||
function visitDeclarationComponents(input: Node): VisitResult<Node> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name here is tripping me up. What is a declaration component? Are we just lacking a better name for a fuzzy concept?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, the things at the top level are declaration statements. What's a good name for everything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would topLevelDeclaration
/topLevelDeclarationStatement
and subDeclaration
/subDeclarationStatement
make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sub-parts aren't themselves guaranteed to be either declarations or statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I don't have an issue with visitDeclarationComponents
, perhaps visitDeclarationSubtree
might be clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't subtree just about as overloaded a term as components with respect to an ast transformer? Tbqh, neither term gives you a great idea of what it operates over.
} | ||
} | ||
|
||
function collectReferences(sourceFile: SourceFile, ret = createMap<SourceFile>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather just avoid the default parameter since it's only used in two ways.
@@ -688,6 +689,7 @@ namespace ts { | |||
Loop = 2, // Automatically generated identifier with a preference for '_i'. | |||
Unique = 3, // Unique name based on the 'text' property. | |||
Node = 4, // Unique name based on the node in the 'original' property. | |||
OptimisticUnique = 5, // Unique name based on the 'text' property, first instance won't use '_#' if there's no conflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YES. But is there any case where we don't want to just use OptimisticUnique
? Maybe we should make this the behavior of Unique
after this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aozgaa has a reasonable point that you potentially want the Unique
behavior in some cases (not here) to indicate that the output was machine-generated to some extent.
src/compiler/sourcemap.ts
Outdated
}; | ||
|
||
function setDisabled(state: boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
44f08d3
to
53571a7
Compare
53571a7
to
3adcbc7
Compare
52c9f0a
to
db8e239
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to my other review comments, I noticed when reviewing the baselines that there are a large number of differences due to the same general changes that seem unnecessary or inconsistent:
- Some imports and exports are often in a different order.
- No trailing comma on enum members. (while this is mentioned above, we don't emit enum members to JS, so we should preserve our existing emit)
- No newline in empty module blocks. (while this is mentioned above, we don't emit modules to JS, so we should preserve our existing emit)
- We sometimes (but not always) add
: any
type annotations (or: any[]
for rest parameters) where we didn't before. (while this is mentioned above, we still have inconsistencies) - Incorrect indentation of multi-line parameter lists.
A number of other baselines differ only in whitespace around binding patterns, which I'm ok with as they align with our JS emit.
src/compiler/comments.ts
Outdated
@@ -260,7 +260,21 @@ namespace ts { | |||
} | |||
} | |||
|
|||
function isJSDocLikeText(text: string, start: number) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does isJsDocStart
check for /***
? If we don't consider /***
to be JSDoc there, then we shouldn't allow it here either.
Assuming the last condition in isJsDocStart
is not a typo (e.g. checking for *
instead of /
), it seems like both of these could be the same function:
function isJsDocStart(content: string, start: number) {
return content.charCodeAt(start) === CharacterCodes.slash &&
content.charCodeAt(start + 1) === CharacterCodes.asterisk &&
content.charCodeAt(start + 2) === CharacterCodes.asterisk &&
content.charCodeAt(start + 3) !== CharacterCodes.asterisk &&
content.charCodeAt(start + 3) !== CharacterCodes.slash;
}
src/compiler/emitter.ts
Outdated
// Emit each output file | ||
performance.mark("beforePrint"); | ||
forEachEmittedFile(host, emitSourceFileOrBundle, transform.transformed, emitOnlyDtsFiles); | ||
performance.measure("printTime", "beforePrint"); | ||
|
||
// Clean up emit nodes on parse tree | ||
transform.dispose(); | ||
if (declarationTransform) declarationTransform.dispose(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're reusing the parse tree across two different transformation pipelines at the same time. This could be problematic if either the declaration transforms or javascript transforms modify the emitNode on a parse tree node in a way that would affect the other. Even if we aren't doing anything now, it would be very easy to run into this in the future.
To be safe, we might need two separate passes over the files: one full pass (transform, print, and dispose) for javascript emit and another full pass (transform, print, and dispose) for declaration emit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm I implemented it that way initially, but it changes the order we emit files in (instead of all outputs for the same file being one after another, dts outputs come after everything else), which affects almost every baseline, as it happens. It would be better to make it "safe" to run multiple transforms at the same time, since nothing can enforce the invariant that you don't do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making it "safe" might necessitate additional indirection for a lot of utility functions that interact with emitNode
. Perhaps it would be better to transform and print each source file one at a time for multi-file emit. That could also cut back on the amount of memory we need to hold onto between each file as well, but adds additional overhead to reinitialize the printer and transformer between each file.
How we enforce this invariant is that one of first things we do in transformNodes
is to clean up all emitNode
entries on the parse tree. Its highly possible that we already have a bug in these new changes due to the fact we call transformNodes
first for .js output and then for .d.ts output, erasing any emitNode
entries we need for JS emit before we ever get to the point of printing. Its likely that we just don't have a test case that covers this scenario currently, and that we might see breaks in our RWC tests due to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's possible to have a situation where it matters, unless a transformer sets an emit node on a node in a different file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said, we probably already have situations where this matters since the transformNodes
call for the declaration emit basically already disposes the transform result for the javascript emit. We just probably don't have a test for that case. The most likely causes for a parse tree node having an emitNode
revolve around custom comment or source map positions associated with identifiers. Even if we don't have an issue today, its such an obscure interaction that when we do have an issue it will be very hard to track down.
I'd be tempted to change the line in transformNodes that ensures the parse tree is disposed to instead perform a Debug.assert
if there are undisposed parse tree nodes so that we can catch these errors more readily. That would enforce that you .dispose()
a transform result before you reuse the parse tree in another call to transformNodes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that just seems like a bit of an obscure API with a bunch of implicit invariants that can't be enforced in the typesystem. It's probably better if we can just keep emit nodes in a side table associated with each transform/transform result; especially since it's super easy just to accidentally call dispose
before printing, for example, which would have silently messed up your print. (And the table already exists globally fir the implementation of dispose in the first place, iirc) If they're stored in the transform result, then dispose is just no longer necessary. (And we'd no longer be making parse tree nodes polymorphic on the presence of emit nodes, which is probably good and has potential to outweigh the extra indirection, just like inverting the checker's cache structure experimentally did)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use a side table only for the parse tree nodes, since we wouldn't want to hold onto intermediate nodes created by transforms that end up elided by a later transform as there's no point in holding onto information we don't need. The problem is that we would have to have a significant breaking change to the public API for transforms to support that (since right now those functions just look at .emitNode
). Its much easier and less likely to break the API if we just ensure that we finish the JS transformation before we start the declaration one. For single file emit that's not a problem because we work at the Bundle level anyways, but for multi-file output we would have to transform and emit a single file at a time. That is a much more palatable change (and one that I've looked at once or twice in the past when analyzing performance), since we won't need the emitNode
data between multiple files for multi-file output.
src/compiler/emitter.ts
Outdated
if (n.kind === SyntaxKind.Bundle) { | ||
return sourceFileOrBundle.kind === SyntaxKind.Bundle; | ||
} | ||
return getOriginalNode(n) === getOriginalNode(sourceFileOrBundle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No sense in repeatedly calling getOriginalNode
on sourceFileOrBundle
. I would refactor this into a variable outside of the call to find
.
src/compiler/factory.ts
Outdated
@@ -214,6 +223,68 @@ namespace ts { | |||
return <BooleanLiteral & Token<SyntaxKind.FalseKeyword>>createSynthesizedNode(SyntaxKind.FalseKeyword); | |||
} | |||
|
|||
// Modifiers | |||
|
|||
export function createAbstractModifier() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need all of these, rather than just calling createToken
directly? They'll be inlined anyways and all this does is just add more code to the compiler that needs to be parsed, compiled, and held in memory. The reason we have functions for createSuper
, createThis
, etc. is that they are for the most part reused repetitively throughout the codebase. Other than createDeclareModifier
, all of the other createXModifier
functions seem to only be used once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Public API. It's very odd to have one but not the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should add these to services since they don't need to be in tsc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we've usually inlined a single-use sensible function (and we have a lot of internal single-use functions) and copied it into services for a tiny possible benefit to tsc elsewhere. It doesn't really make sense to split up the factory functions like that, from an organizational standpoint. Plus as you said, they'll almost certainly get inlined by the JIT if it matters, and function bodies are parsed lazily in the engine anyway, so any tiny parse overhead is deferred anyway. Tbqh, if you were into that I'd say we should have a tool for automatically inlining things based on a known heuristic (like the maximum bytecode size before the containing function can be inlined, for example), rather than speculating on it, and making the API harder to maintain in the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heck, we should probably just be running a dead code elimination tool on tsc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't feel that these are worth it. If we are purely adding them as a convenience then we should have one for every token you can create that we emit, and we don't do that for things like QuestionToken
, EqualsGreaterThanToken
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those have token
in the name, which leads one to believe createToken
is the correct way to go (and should appear in quickinfo if you're searching blindly, which, given our lack of docs, is what will probably be done), while these show up in quick info when you search with modifier
, which is the name of the type where you'd use them. These also have an implied correct location in the AST (modifier positions), while a "question token" is a bit ambiguous - there's no obvious question token locations (there're all called out as .questionToken
on a parent node) and there's little in common between where they appear.
Pretty much just that while Modifier
s are effectively just tokens in the implementation, they carry some extra semantic baggage that makes it more obvious how to get a hold of one of them if they actually have individual public functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've merged them all into one method
const updated = <SourceFile>createSynthesizedNode(SyntaxKind.SourceFile); | ||
updated.flags |= node.flags; | ||
updated.statements = createNodeArray(statements); | ||
updated.endOfFileToken = node.endOfFileToken; | ||
updated.fileName = node.fileName; | ||
updated.path = node.path; | ||
updated.text = node.text; | ||
updated.isDeclarationFile = isDeclarationFile === undefined ? node.isDeclarationFile : isDeclarationFile; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I wrote this originally I made an effort to evaluate the property assignments in the same way (including order and optionality) that they are assigned by the parser to avoid excess polymorphism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are very much not in the same order right now; this is the initial ordering from the parser:
sourceFile.text = sourceText;
sourceFile.bindDiagnostics = [];
sourceFile.languageVersion = languageVersion;
sourceFile.fileName = normalizePath(fileName);
sourceFile.languageVariant = getLanguageVariant(scriptKind);
sourceFile.isDeclarationFile = isDeclarationFile;
sourceFile.scriptKind = scriptKind;
if this is important we can correct it in a separate PR.
} | ||
declare enum E12 { | ||
A = 1, | ||
B = 2, | ||
C = 4, | ||
C = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a lot of these differences due to trailing commas in enums. Would it make sense to always emit the trailing comma as we have in the past?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're reusing the original nodes in a bunch of these cases, which did not have trailing commas.
import { x, a12 as y } from "./server"; | ||
import { x11 as z } from "./server"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is a seemingly unnecessary reordering of imports and exports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, These are the aliases that aren't actually marked as visible until some time after they are traversed. (If I remember, that ties into the "actually collects declaration errors during the pass instead of after" part of the change) Inserting them in-order would require a bit more work to put them back into the location they were elided from (essentially a n-ary walk just to reinsert imports painted as required during the prior walks) - in the old declaration emitter this was done by tracking the absolute position in the ouput string it was elided from and splicing it in at the end of the walk if it was visible. I guess the similar could be done here (tracking the position in the statement list each import was elided from and the list it was elided from, then rather than inserting at the top, inserting near where the recorded original location was; or maybe substuting an associated transformer-only statement that's been flagged?), but it seems like a lot of extra work for little gain? We don't check TDZ or anything in declaration files, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given one of the reasons for .d.ts files is to document the API, it seems like it makes more sense to try to keep the declaration where it originally would have occurred in the file as it is more likely to be logically associated with the export that needed it, resulting in clearer documentation. If .d.ts were ever only intended to be machine readable, I'd be less concerned. I think it's worth preserving the old behavior in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Order should now be preserved
@@ -32,10 +32,10 @@ var cReturnVal = exports.cProp.foo(10); | |||
|
|||
|
|||
//// [internalAliasClassInsideTopLevelModuleWithoutExport.d.ts] | |||
import xc = x.c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another case of odd reordering, especially since if the source were written this way it would have failed at runtime.
|
||
|
||
//// [f1.d.ts] | ||
export declare class A { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened here? Why are these declarations now missing?
Never mind, I reread the description of the PR, so this is ok.
@@ -1,3 +1,4 @@ | |||
/// <reference path="../ref/m2.d.ts" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could cause breaks in dependent packages if you ship a .d.ts that references build inputs that aren't present in the published package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, This code is broken to begin with, as the compiler options in use are an error to use together - see OP. This is arguably more correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it the conflicting options diagnostics aren't preventing declaration emit to begin with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, options diagnostics aren't directly checked for blocking declaration emit, since they're only generated by the program
.
@rbuckton I believe I've either spoken to you about or mitigated each of your comments now? |
This is some prerequisite work we wanted to get done before tackling some of our more complex declaration emit requests (ie, declaration implementation maps). And y'know what? It's an overall reduction in the number of SLOC in the codebase 😉
Notable nonfunctional changes:
Also, some small functional changes:
any
is more aggressively appended to user supplied types which would otherwise be an error undernoImplicitAny
mode- fixes Certain types can produce implicit 'any' declaration output #21614It's worth stating, since some people have mentioned that they think this is the way to allowing declaration emit for JS files: While the transform does work on JS (though it filters js out right now to retain old behavior), it does not handle JS-specific patterns our binder picks up, like IIFEs, psuedo-namespaces, non-es6 classes, and more. It is quite decisively a TS-pattern declaration transformer. It's highly likely that the correct way to go about creating a declaration file for a JS file is to synthesize the entire declaration file directly from the shape of the exports object (instead of the original declarations), in order to smooth over all the odd patterns we recognize in JS but not in a
.d.ts
, as a separate transform (naturally, they'd need to call into one another for bundled emit).