-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trade-offs in Control Flow Analysis #9998
Comments
with persistent data structures it makes perfect practical sense, amortized time/memory consumption for a typical application would be comparable, it take's some discipline and support from the language ( #6230, #2319, #1213), but it's worth the trouble with please no more half measures based on assumptions for a couple of typical cases and endless exceptions that only worsen soundness |
Looking through the RWC tests, I see only one case where the additional narrowing performed by #9407 causes unwanted errors. However, there were scores of real bugs and inconsistencies being caught, such as checking for the same value twice, assuming that enum members with the value 0 pass truthiness checks, dead branches in BTW, the compiler itself relies on side effecting changes to a if (token === SyntaxKind.ExportKeyword) {
nextToken();
if (token === SyntaxKind.DefaultKeyword) {
// We have "export default"
}
...
} This becomes an error with #9407 because the compiler continues to think We will instead be using a function to obtain the current token: if (token() === SyntaxKind.ExportKeyword) {
nextToken();
if (token() === SyntaxKind.DefaultKeyword) {
// We have "export default"
}
...
} where the function is simply: function token(): SyntaxKind {
return currentToken;
} Since all modern JavaScript VMs inline such simple functions there is no performance penalty. I think this pattern of suppressing type narrowing by accessing mutable state using a function is a reasonable one. |
is there any possibility of a const on an argument?
In swift/objc they added the "in", "out", and "inout" meta attributes to arguments so you could see if they meant for a value to be modified. i feel like this could be useful for annotating whether a function will modify a variable it received or not. Not sure if I'm missing something. |
While I think a argument by argument modifier makes the most sense. Of course Is there any benefit in considering a deeply immutable design time keyword to avoid any confusion between |
in my stubborn defense of
i really wish it was it was considered other options may work out too and should be considered, just not a big fan of half-baked solutions |
I understand your argument, although I think what you would cover with "obviously" pure functions are not ones that suffer from CFA challenges. You example of I think your argument might have more merit if you could think of some examples where there are current CFA issues that would be solved by a pure notation that benefitted the developer. My argument isn't against the concept of writing pure functions, it is more of the argument that an "all or nothing" solution likely won't be practical. One of the biggest current issues with CFA in my opinion is in Promise callbacks. There are many situations where those might not be pure. That is why I think an argument by argument mutability notation would solve significantly more use cases and do so in a more end-developer controllable way. |
although you are right that the example doesn't target CFA issues, the answer is rather evident:
this is one merit of the pureness, among many others let me say it again, if a promise callback is pure, all reasoning that was done outside of it's scope or time is still valid inside of it you might wonder why i am so certain, let me explain, there is no notion of time in math that stands behind the idea of pureness, time doesn't exist there, it's absolutely static, everything existed forever and forever will and cannot ever change (because changes need time) to the point where a pure function can be thrown away and replaced with its result (what's called referential transparency) which for the any given arguments will always be the same (hence a question why not just use a hash-table instead to map arguments to results), so you don't really need a function in the first place, armed with such super static equivalence between results that were calculated vs hardcoded the only merit in using a function is to save some memory that otherwise would be taken by a huge static table mapping all possible arguments to all results, since math is absolute abstraction we only care when it comes to program it this is what a pure function is, this is why it will always work i agree that it's rather 'all or nothing', but is the price you have to pay for the peace of mind because in its strictness it solves the CFA problems and many others simply by giving them no place to exist |
think about it, a "pure" (aka persistent) data structure, say, a linked list, can be deeply copied just by... being assigned:
how cool is that? in my humble opinion "pure" functions and data structures can easily cover 40% of typical everyday needs without even feeling restricted as for the rest 60% (since we don't appreciate any sort of extremes) we can use good old mutable impure JavaScript but mind those 40% percent of no problem ever! that's a darn bargain, just let it happen and enjoy it |
totally agree with @Aleksey-Bykov arguments in favor of having "pure" introduced into the type checker. |
Just my .02, I think "pure" would be too abstract and too confusing to understand. I'm actually in favor of apple style semantics, "in", "out", "inout" Also I think const could lead to confusion since they would be similar in usage but different concepts. It might be easier for beginners to pick up and understand but the intent simply wouldn't be as clear. |
@RyanCavanaugh The compiler should already be able to answer the question in the example of Optimistic: Bad behavior on locals already with yes. If We can make this question insteresting again by changing function doSomething() {
if (token !== Token.Alpha) {
maybeNextToken();
// is this possible?
if (token === Token.Alpha) {
// something happens
}
}
} Now it is only possible if function nextToken() {
token = Token.Alpha;
} it would be possible if and only if |
Infer constraints about functionsHow about tracking variable changes of functions? The basic idea is that we know a constraint before a function call and that we can reason about a constraint after a function call. Note: I write a new introduced term bold in the following when it is mentioned the first time. Jump back to this position to see the definition of this term.
|
@maiermic your post was pretty complex, so I may have misunderstood it, but it looks like you are describing what control flow anaylsis already does, plus extending it to 'look into' the functions being called within guarded blocks as well. If so, how is this different to the inlining approach mentioned by @RyanCavanaugh in the OP? |
@yortus That's right, I describe what control flow analysis already does to introduce the notation I use afterwards to explain my idea. However, I'd like to clarify that I don't 'look into' a function every time it is called (if that wasn't clear so far). Instead I infer the constraint transition of a function when I look at it's definition. Therefore, I have to look into the function, but I only have to do this once, since I can carry the constraint transition in the type of the function. When I look at the function call, I can use the constraint transition that I inferred before to calculate the constraint that holds after the function call from the constraint that holds before the function call. Mitigating with (shallow) inlining / analysis I don't know how (shallow) inlining/analysis works and @RyanCavanaugh doesn't explain it. Nevertheless, he shows two examples. 1. Example // Non-null assignment can still trigger null warnings
function fn(x: string | null) {
function check1() {
x = 'still OK';
}
if (x !== null) {
check1();
// Flow: Error, x could be null
console.log(x.substr(0));
}
} Flow claims that In my approach, the constraint transition of Note: You can even infer 2. Example // Inlining is only one level deep
function fn(x: string | null) {
function check1() {
check2();
}
function check2() {
x = null;
}
if (x !== null) {
check1();
// Flow: No error
console.log(x.substr(0)); // crashes
}
} To come straight to the point, the constraint transition
Before the call of |
Quite long to fully read and understand, but this seems very interesting. It makes me think of a kind of propositional logic solver like the Prolog language. Correct me if I'm wrong. |
@maiermic hmm yes interesting. I suppose two issues would be: a) performance. @RyanCavanaugh mentions a "full inlining solution [...] wouldn't be even remotely practical", but what about this deep constraint analysis? b) what about calls to third-party functions (e.g. calls into platform, frameworks, or node dependencies) where we have no function bodies to look into (only .d.ts declarations)? |
might be related #8545 (comment) |
@Aleksey-Bykov #8545 (comment) looks like a similar approach for manually annotating constraints, even though you only refer to a specific case (object initializers) |
@yortus good questions
@RyanCavanaugh May you please explain performace issues of a full inlining solution? Without thinking too much about it, I guess deep constraint analysis might be doable in linear or linearithmic time complexity. Gathered constraints are seralizable and can be cached so that you can speed up analysis time. Manuall constraint annotations can further improve analysis time.
If you don't know the code, you cann't do flow analysis. Without .d.ts declarations you couldn't even do any type analysis. You either include constraint annotaions in .d.ts files or you have to fall back to another approach. You might choose an optimistic approach if no constraint annotaions exist, since you can add constraint annotations to .d.ts if a function has a relevant side effect. |
Initially I thought that the current unsound strategy would work, though in the previous months 1) narrowing works on more locations (with CFA) 2) narrowing works on more types (enums, numbers) and 3) narrowing works with more type guards. This has demonstrated several issues with the current strategy. A solution that has been discussed was that the compiler should track where a variable might be modified through a function call. Since JavaScript is a higher-order language (functions can be passed as arguments, stored in variables and so on), it's hard/impossible to track where a function might be called. A reassignment of a variable can also happen without a function call, for instance by a getter or setter function, or by arithmetic that calls My suggestion is basically that a variable may only be narrowed if its assignments can be tracked completely. A variable can only be narrowed if all reassignments (that is, assignments except the initializer) are inside the control-flow graph of the current function. That means that a variable declared with To reduce this restriction, the control-flow graph of the function could be extended to an inter-procedural control flow graph. The analysis stays sound, but becomes more accurate. The graph can relatively easily be extended at call expression that directly refer to a function declaration. Assignments in functions that are not used in higher-order constructs can now be tracked too. It would be interesting to extend this to higher-order functions too, but that would be more complex. It would be necessary to annotate function definitions with the control flow of their arguments some way. I'm not sure whether an abstraction over the control flow graph is needed. The graph could become a lot bigger with this inter-procedural analysis. However, the compiler already uses a compressed control flow graph, which basically contains only branch nodes and assignments. It might help to track which variables are referenced by a function to skip the whole function if the analyzed variable is not referenced there. Thoughts? Let me know if something is not clear. |
How is this code possible with let myValue: string
[1, 2, 3].forEach(x => {
if (x == 2) {
myValue = 'found it'
}
})
console.log(myValue) // error: myValue used before assignment Since the callback is not guaranteed to be executed TS sees that not all code paths assign the value. The same would be true when using a |
Either initialize it with a sentinel: let myValue: string = ''
// Dangerous if myValue is never actually set to a real value Or, if you're worried about the sentinel never getting overwritten, with let myValue: string | undefined = undefined;
...
console.log(myValue!);
// Still dangerous if myValue is never actually set to a real value, but less so since undefined is likely to fail earlier / harder Or, have a check to narrow the type: let myValue: string | undefined = undefined;
...
if (myValue === undefined) {
throw new Error();
}
console.log(myValue); // Guaranteed safety |
@Arnavion thanks! |
@RyanCavanaugh Following up here, from #11393, with a suggestion. I would be nice to have an indication of whether the optimistic heuristic was used or not when reporting an error. Something like This will allow the developer to quickly figure out if the error is certain, or if it may just be the consequence of an optimistic assumption made by the compiler. It may also reduce the flow of non-bugs reported here 😄. |
Just wanted to point out that TypeScript has no concept of this at the moment 🌹 |
I encountered this use case, I believe it belongs here. What happens: trying to make typeguard based on shape of children props. But type is not inferred correctly. type ChildItem = {id: number} | string
type Item = {
children: ChildItem[] // can be objects or strings here
}
type NarrowedItem = {
children: {id: number}[] // only objects here
}
const allItems: Item[] = [] // initially accept with any children
const filteredItems: NarrowedItem[] = allItems.filter(item => // and here only accept items
item.children.every(child => typeof child !== 'string') // if all its children are objects
) // it should be correct in runtime, but typescript thinks it is not narrowed
const filteredItemsWorkaround: NarrowedItem[] = allItems.filter((item): item is NarrowedItem => // same code, but with "item is NarrowedItem"
item.children.every(child => typeof child !== 'string')
) // it works here Error on line 13 (
|
@Arctomachine - This works in 5.5 because of pull #57465.
but not the negation
Issue #15048 would cover that, and it's mentioned as a possible extension in #57465. |
I work with Hack (also created at Facebook) which shares the same conventions as Flow, invalidating all assertions when calling impure functions. Hack & Flow's absolute approach is impractical for TypeScript, but you might want to introduce a config option for people with small projects to opt into stricter behaviour themselves. I took the config approach with Psalm (a SA tool for PHP). On a ~1MLOC legacy codebase that option was not enabled, but many smaller projects have that option enabled. |
There's a TC39 proposal, but it has been Stage-0 for ~7 years, and nobody seems to care :( There's also an ESL rule |
Here's a real world example (ask me how I know) where the promise case can bite you. The fix is simple if you're aware of this landmine, but it is easy to miss and get misled by TS trying to be "helpful". const MyComponent: FC = () => {
const canvasRef = useRef<HTMLCanvasElement>(null);
useEffect(() => {
void (async () => {
if (!canvasRef.current) {
return;
}
// this is correct - canvasRef.current is guaranteed to be non-null
canvasRef.current.getContext('2d')?.clearRect(0, 0, canvasRef.current.width, canvasRef.current.height);
// any async func that can take a while
await new Promise(r => setTimeout(r, 5000));
// This might totally blow up - canvasRef.current can be null. But TS assures me it isn't.
canvasRef.current.getContext('2d')?.rect(0, 0, 10, 10);
})();
}, []);
return (
<canvas ref={canvasRef} />
);
}; |
Some rough notes from a conversation @ahejlsberg and I had earlier about trade-offs in the control flow analysis work based on running the real-world code (RWC) tests. For obvious reasons I'll be comparing what Flow does with similar examples to compare and contrast possible outcomes.
The primary question is: When a function is invoked, what should we assume its side effects are?
One option is to be pessimistic and reset all narrowings, assuming that any function might mutate any object it could possibly get its hands on. Another option is to be optimistic and assume the function doesn't modify any state. Both of these seem to be bad.
This problem spans both locals (which might be subject to some "closed over or not" analysis) and object fields.
Optimistic: Bad behavior on locals
The TypeScript compiler has code like this:
Optimistically assuming
token
isn't modified bymaybeNextToken
incorrectly flagstoken === Token.Alpha
as an impossibility. However, in other cases, this is a good check to do! See later examples.Optimistic: Bad behavior on fields
The RWC suite picked up a "bug" that looked like this:
The
??
line here is not a bug in the user code, but we thought it was, because after the%%
block runs, the only remaining value inresult.success
's domain isfalse
.Pessimistic: Bad behavior on locals
We found actual bugs (several!) in partner code that looked like this:
Here, we detected the bug that
Kind.Good
(which has the falsy value0
) is not in the domain ofkind
at the point of thecase
label. However, if we were fully pessimistic, we couldn't know that the global functionlog
doesn't modify the global variablekind
, thus incorrectly allowing this broken code.Pessimistic: Bad behavior on fields, example 1
A question on flowtype SO is a good example of this
A smaller example that demonstrates the behavior:
The problem here is that, pessimistically, something like this might be happening:
Pessimistic: Bad behavior on fields, example 2
The TS compiler has code that looks like this (simplified):
Here, we discriminated the
Node
union type by itskind
. A pessimistic behavior would say that the second invocations are unsafe, because the call tovisit
may have mutatednode.kind
through a secondary reference and invalidated the discrimination.Mitigating with (shallow) inlining / analysis
Flow does some assignment analysis to improve the quality of these errors, but it's obviously short of a full inlining solution, which wouldn't be even remotely practical. Some examples of how to defeat the analysis:
Mitigating with
const
parametersA low-hanging piece of fruit is to allow a
const
modifier on parameters. This would allow a much faster fix for code that looks like this:Mitigating with
readonly
fieldsThe
visitChildren
example above might be mitigated by saying thatreadonly
fields retain their narrowing effects even in the presence of intervening function calls. This is technically unsound as you may have both areadonly
and non-readonly
alias to the same property, but in practice this is probably very rare.Mitigating with other options
Random ideas that got thrown out (will add to this list) but are probably bad?
pure
modifier on functions that says this function doesn't modify anything. This is a bit impractical as we'd realistically want this on the vast majority of all functions, and it doesn't really solve the problem since lots of functions only modify one thing so you'd really want to say "pure
except form
"volatile
property modifier that says this "this property will change without notice". We're not C++ and it's perhaps unclear where you'd apply this and where you wouldn't.The text was updated successfully, but these errors were encountered: