Suggestion: control flow type analysis could account for reassignments through a closure #8353

malibuzios · 2016-04-28T15:15:45Z

TypeScript Version:

1.9.0-dev.20160427

Code:

function iWantString(s: string) {
    if (typeof s !== "string")
        throw new TypeError("That is not a string!");
}

declare let x: number | string | boolean;

function func() {
    x = 1234;
}

if (typeof x === "string") {
    func();
    iWantString(x);
}

Actual behavior:
The call to func() doesn't influence the compiler's awareness on the possible types x might have after the call. It still thinks it is string and the call to iWantString fails at runtime.

Suggestion:
The compiler could account for the possibility that func may reassign x as a captured variable, by storing a list of all the captured variables func may possibly reassign throughout its execution and the types they may hold at all return positions.

For the above example, the type of x would be widened to include the additional types x may receive during the execution of func (only number in this example). Right after the function returns, the type of x would be widened to string | number and iWantString(x) would fail at compile time.

Examples:

declare let x: boolean | number | string[] | number[];

function func() {
    if (Math.random() > 0.5) {
        // Since 'x' has not been reassigned in this code path, there's no effect here
        return;
    }

    if (Math.random() > 0.5) {
        x = 4321; // no eventual effect
        x = true; // 'x' is added to the list with the possible type 'boolean'
        return;
    }

    x = 1234; // no eventual effect
    x = ["a", "b"]; // no eventual effect
    x = [1, 2]; // the entry for 'x' is extended to include type 'number[]'
}

if (typeof x === "string") {
    func();
    // type of x is widened to 'string | boolean | number[]'
}

More complex example that includes further calls within the called function itself to other functions that reassign x as a captured variable:

declare let x: boolean | number | string[] | number[];

function func1() {
    x = 4321; // no eventual effect
    x = true; // 'x' is added to the list for 'func1' with the possible type 'boolean'
}

function func2() {
    x = 1111; // 'x' is added to the list for 'func2' with the possible type 'number'
}

function func3() {
    if (Math.random() > 0.5) {
        func1(); // The list generated for 'func2' is merged into the one for 'func3'
        return;
    }
    else {
        func2(); // no eventual effect
        x = [2, 3]; // the entry for 'x' is extended to include type 'number[]' 
        return;
    }

    x = 1234; // no eventual effect
    x = ["a", "b"]; // no eventual effect
    x = [1, 2]; // the entry for 'x' is extended to include type 'number[]'
}

if (typeof x === "string") {
    func3();
    // type of 'x' is widened to 'string | boolean | number[]'
}

Implementation details:

This may look very complicated, but it isn't really compared to what has already been achieved for flow and assignment analysis (which is remarkable!). Here's one way to do it:

For any given function:

Keep a list of all captured variables that are possibly reassigned in it.
For every code path where one of these variables has been possibly reassigned, accumulate the possible types it may hold right before the function returns. Annotate these types on the list.
A variable that is indirectly reassigned through a call to a secondary function is also considered a reassignment done by the calling function and will be included in the list.

After each call to a function:

Lookup the current set of scoped variables (including captured, closure-bound ones).
Query each one of them in the list generated for the function that has just returned and merge the possible set of types specified in the list into the set of currently known ones.

That's it, nested calls naturally propagate this information backwards as this process is internally applied for them.

(I have considered whether the set specified in list should entirely replace the current one, instead of just being merged. However, there is no guarantee here that in all return positions the function has actually reassigned a particular variable. There's just the possibility it did, unless slightly more advanced analysis is performed to detect when a particular variable is known for certain to have been reassigned in every return path, and then the set can possibly be replaced).

Limitations:

Functions that are called using .call or .apply or a more complex mechanism, like a promise, cannot be easily taken into account. However, I felt that the return for this enhancement is sufficient to justify the reasonable effort needed (on a personal note - I have no capability or time to implement something like this myself - the learning curve itself is very, very steep - but I might have tried if I could).

Edits 11 May 2016: improved some phrasing corrected the terminology to 'reassignments` instead of 'mutations', and 'captured variables' instead of 'variables assigned through a closure'.

The text was updated successfully, but these errors were encountered:

mhegazy · 2016-04-28T16:53:48Z

one thing to note, just checking mutations on captured variables is not sufficient, a function can mutate a property of paramter as well.. e.g.

type MyObject = {
    a: string | number;
}

var o: MyObject = { a: "string" };

function mutator(obj: MyObject) {
    obj.a = 0;
}

if (typeof o.a === "string") { 
    mutator(o);
    o.a // is number and not string.
}

also simple function calls do not capture everything, for instance, this is a common pattern in the TS compiler code base to pass a function as an argument to an iterating function see https://github.com/Microsoft/TypeScript/blob/master/src/compiler/checker.ts#L2966

forEachChild(node, visit);

funciton visit(node) {
   /// some side effects
}

one last thought, the issue of delayed initialization pattern is a more annoying than the issue of missed mutations pattern. currently if you use a function to internalize your state, the type checker treats your variables as uninitialized, and frankly that is more annoying than it not catching a mutation as suggested by the OP. e.g.:

let value: string | undefined;

// value here is undefined

initializeValue();

// value here is still undefined

if (value) { 
    // value here is nothing, as it was undefined, and then narrowed
    value.length /// error!!
}


function initializeValue() { 
    value = "initial value";
}

mhegazy · 2016-04-28T16:54:14Z

also pinging @ahejlsberg since we talked about this yesterday.

malibuzios · 2016-04-28T17:09:46Z

Yes, I have considered most of these.

For the first: I wasn't sure if modifications of properties is included in assignment and flow analysis! If it does that would be incredible!

Second: yes I'm thinking about this. If a function is passed as an argument to another and that main function is called, one can conservatively assume that the function passed as argument was internally called at least once in the execution of the main one and account for its possible its set of mutations as well, although this would only be a heuristic in this case. [Edit: Async callbacks would be problematic here..]

Third: I believe the approach I considered should resolve this. It would analyze initializeValue and realize that in all its return positions value receives type string. After initializeValue is called it would only consider string as a possible type for value (this is a more 'advanced' aspect for the analysis I mentioned in a remark).

malibuzios · 2016-04-28T19:48:43Z

Various thoughts:

To clarify: every function is only analyzed once. There's no regard for a 'contextual' information for a particular call. Thus, there shouldn't be a significant influence on the performance of the compiler.
To clarify: there is no need to analyze every assignment (my examples were written before I wrote down the algorithm). If a captured variable is known to be reassigned at one code path, one can 'fast-forward' directly to the position right before the return statement and query its final type there.
There could be some technical complexities with analyzing circular calls where function A calls function B which in turn calls function A, but I guess that's something that compiler developers are accustomed to.. :)
My terminology could have been improved for 'reassigned' instead of 'mutated', and 'captured variable' instead of 'referenced through a closure', I might edit this out later.
Still thinking about the effects of guards on captured variables within the function.
There are some cases where it is virtually impossible to track side-effects caused by captured variables, for example, if a function is passed to a class, and the class internally calls it:

class SomeClass {
    constructor(private f: Function) {
    }

    doSomething() {
        this.f();
    }
}

declare let x: number | string;

let c = new SomeClass(() => { x = 1234; });

c.doSomething(); // No reasonable way to track the side-effects here

malibuzios · 2016-04-29T06:23:25Z

There's at least one scenario that I haven't addressed in the algorithm (there are probably many more but it's hard to cover everything just by considering examples). For example, for conditional branches that contain an assignment but don't lead to a return path:

declare let x: number | string | boolean;  

function func() {
    if (Math.random() > 0.5) {
        x = 1234;
    }

    if (Math.random() > 0.5) {
        x = "abcd";
    }
}

If "fast-forwarding" here to the return position then the analyzed type would also include boolean which from the internal perspective of the function is a useful piece of information but shouldn't be annotated in the list because the function doesn't reassign to it.

One possible way to approach this is to have a slightly different way of analyzing captured variables, which would be applied in addition to the regular one, and whose results would only be used for the captured variable assignment analysis. In this method they are assumed not to have any type initially (I believe this is currently reflected as the "nothing" type in the language service), so every additional type they may accept would only be because of side-effects caused by the function or some secondary function that has been called.

I haven't yet considered the interaction with guards over captured variables here..

zpdDG4gta8XKpMCd · 2016-04-29T22:01:33Z

related #7770

mhegazy · 2016-04-29T22:35:55Z

To follow up, we do not think we will be able to do the full analysis needed to enable this request. the ideal solution would be to inline function calls for control flow analysis purposes and observe their side effects. while doable, this is a significant investment that we are not willing to take at time being.
Issue detailed by #8381 is a more feasible one to tackle at the time being.

You can find more about this in meeting notes in #8388.

malibuzios · 2016-04-30T07:30:48Z

@mhegazy

Thank you for considering this. You are presenting it a bit like a bug report or feature request, but that was not really the intention. It was about trying to encourage an exploration of this enhancement of the current analysis, which could cover a significant aspect of it that is currently missing. So far I have spent a total of about 12 hours of my own time to try to investigate and provide the best suggestions on possible to do this, with my limited abilities and perspective. I seriously start to feel that this time has been put to waste.

I did not come with the expectations this should or must happen. I fear that just stamping this is as "Declined" "Too Complex" would present it as undesirable and discourage further attempts to address this, perhaps with more innovative or efficient methods that both myself and your team may have not considered. Overall it gives a negative impression, and I'm seriously considering to reduce the amount of time I spend on this. I don't think that you guys deserve it, honestly.

You have a group of devoted people who despite your aggressive and somewhat rude handling and responses, still, for some reason try their best to help you. I think you started taking that for granted, probably a long time ago. I also think the people who eventually stayed were the kind who either weren't aware, didn't care of or developed an immunity to these various slight insulting and patronizing ways you approach them (or became aggressive and rude themselves). They simply ignore you and try to concentrate on the more objective aspects at hand. Personally I'm simply not the kind of person who would tolerate that for long.

In any case, you need to realize and start appreciating that unlike a lot of the feedback you get here, every issue and comment and I make are very thoroughly researched. Even a 3-4 paragraph comment sometimes takes between an hour and two of work. I have to tell you directly, your system and way of handling things actively discourages this type of contributors. It sometimes feels that I, as one person have put more thought and effort into something than the whole TS team combined, and all I get is a terse one or two paragraph response that doesn't really inform me of anything I haven't thought about before or involves me in the insights and ideas of the team.

The design meeting notes are so condensed that not that much value can be extracted from them for an outside position (they may be useful for an insider as a quick reminder or log, but not that much more than that). There is no two-way process (our I would even say - it is actively discouraged) and if I would try to ask a question I'm afraid that I would be rudely dismissed and even literally be charged as intentionally trying waste your time or embarrass you.

I also think this whole 'over-sensitivity' about questioning prior design decisions is just immature. I would expect that the people on this stature would be able to handle almost anything and objectively evaluate these types of queries and even take some of the harshest criticism (assuming it has some merit and objective quality to it). If I were running a community for my own programming-language, I would welcome and encourage people to try to inspect and analyze the design as much as they can. I will even actively help them and try to improve and generalize their criticism if they weren't able to present it broadly or accurately enough. I would feel honored that someone have taken the time to evaluate my language, which I know would never be perfect or at least as good as the ones that would come after it.

To sum it up: I think you have a very useful language with some very interesting and innovative ideas. And some very talented people are involved in it. Many people admire this work and mostly on that basis are willing to participate and do their best to help you. I will still do, but I would simply try my best to actively ignore these patterns of behavior as much as I can. I just hope some day you will improve on how to run a positive and welcoming community.

mhegazy · 2016-04-30T19:07:30Z

I do apologize if my response seems rude or condescending in any way. that is definitely not the intention.

As for the suggestion, we do appreciate input, and ideas; and personally, i have read, responded to the issue, put it the design meeting agenda and personally presented it. we have reached a conclusion that this is not something we want to pursue at the time being. not sure I see what i could have done here to be more appreciative.

Again my apologizes if my response offended anyone. have a good weekend.

zpdDG4gta8XKpMCd · 2016-04-30T19:15:21Z

@malibuzios one feasible way to do what you asking for is to employ pure function (#7770) this is the only feasible way to do so otherwise it is indeed too complex, in order to understand how complex it is consider providing an algorithm that tells whether a type assertion has or has not changed for an arbitrary piece of typescript code

malibuzios · 2016-04-30T19:32:14Z

@mhegazy

That's OK, and quite nice of you to apologize (I did not expect that). I'm not actually mad or angry, and I try not to build impressions based on online interaction. I apologize myself if this made anyone feel uncomfortable. It just seems like the way that issues are handled sometimes gives the impression of you and others are being very brutal and judgmental, and also sometimes close-minded and biased.

Perhaps if GitHub had a priority system, you could just put it as priority '0' instead of closing it. I feel having a fine-grained priority system, that is also filterable by threshold, could avoid many uncomfortable situations of having to 'forcibly' close issues this way (using labels cannot really satisfy this as you cannot filter or order by threshold).

In this case you can also use a label like 'No Foreseeable Plans' instead of 'Declined' (I worked very hard at trying to find a good alternative label, including asking a question at English usage stack exchange. I also have a complete set of alternative label suggestions that I will post in the next weeks - this work took many many hours to do).

@Aleksey-Bykov

This is strictly about the analysis of possible assignments for a particular, limited, set of captured variables within a function. I have tried to develop an algorithm that is not contextual and would analyze each function only once, thus could be significantly more efficient than applying the same approach, say, for how IIFE's are analyzed (hopefully without a dramatic loss in usefulness).

I have also considered how it would extend to things like pure functions, but that is way beyond what is required here and significantly more difficult to do. Maybe the text gave a inaccurate impression as I used the misleading 'mutation' terminology, which I intended to change to 'reassignment'. However I don't feel it is really important anymore.

RyanCavanaugh · 2016-05-03T16:49:13Z

@malibuzios thank you for your feedback on our system.

Something that we struggle with is that we can't plausibly track thousands of issues if "Endlessly relitigate" is a possible status. This includes any system where someone spending time thinking about something necessitates us spending an equal amount of time engaging with them (this simply doesn't scale).

I'm going to work with @mhegazy over the next few days to revamp our labels and status system. Hopefully we can bring both clarity and ❤️ to that.

malibuzios · 2016-05-11T07:35:34Z

I have corrected the terminology used in the original post to use 'reassignments` instead of 'mutations', and 'captured variables' instead of 'variables assigned through a closure'. I hope this would help to clarify the intended meanings here for anyone coming across this at a future time.

malibuzios mentioned this issue Apr 29, 2016

Strange flow analysis for guards applied to a variable having an initial type of 'nothing' #8374

Closed

mhegazy mentioned this issue Apr 29, 2016

Design Meeting Notes, 4/29/2016 #8388

Closed

mhegazy closed this as completed Apr 29, 2016

malibuzios mentioned this issue May 1, 2016

add a modifier for pure functions #7770

Open

mhegazy mentioned this issue May 5, 2016

Global variables in local functions are not control flow analysed #8472

Closed

tinganho mentioned this issue May 10, 2016

Void type narrowing stops working in callbacks #8541

Closed

malibuzios changed the title ~~Suggestion: control flow type analysis could account for assignments through a closure~~ Suggestion: control flow type analysis could account for reassignments through a closure May 11, 2016

mhegazy mentioned this issue May 12, 2016

add object initializers #8545

Closed

microsoft locked and limited conversation to collaborators Jun 19, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Suggestion: control flow type analysis could account for reassignments through a closure #8353

Suggestion: control flow type analysis could account for reassignments through a closure #8353

malibuzios commented Apr 28, 2016 •

edited

Loading

mhegazy commented Apr 28, 2016

mhegazy commented Apr 28, 2016

malibuzios commented Apr 28, 2016 •

edited

Loading

malibuzios commented Apr 28, 2016 •

edited

Loading

malibuzios commented Apr 29, 2016 •

edited

Loading

zpdDG4gta8XKpMCd commented Apr 29, 2016

mhegazy commented Apr 29, 2016

malibuzios commented Apr 30, 2016 •

edited

Loading

mhegazy commented Apr 30, 2016

zpdDG4gta8XKpMCd commented Apr 30, 2016 •

edited

Loading

malibuzios commented Apr 30, 2016 •

edited

Loading

RyanCavanaugh commented May 3, 2016

malibuzios commented May 11, 2016

Suggestion: control flow type analysis could account for reassignments through a closure #8353

Suggestion: control flow type analysis could account for reassignments through a closure #8353

Comments

malibuzios commented Apr 28, 2016 • edited Loading

mhegazy commented Apr 28, 2016

mhegazy commented Apr 28, 2016

malibuzios commented Apr 28, 2016 • edited Loading

malibuzios commented Apr 28, 2016 • edited Loading

malibuzios commented Apr 29, 2016 • edited Loading

zpdDG4gta8XKpMCd commented Apr 29, 2016

mhegazy commented Apr 29, 2016

malibuzios commented Apr 30, 2016 • edited Loading

mhegazy commented Apr 30, 2016

zpdDG4gta8XKpMCd commented Apr 30, 2016 • edited Loading

malibuzios commented Apr 30, 2016 • edited Loading

RyanCavanaugh commented May 3, 2016

malibuzios commented May 11, 2016

malibuzios commented Apr 28, 2016 •

edited

Loading

malibuzios commented Apr 28, 2016 •

edited

Loading

malibuzios commented Apr 28, 2016 •

edited

Loading

malibuzios commented Apr 29, 2016 •

edited

Loading

malibuzios commented Apr 30, 2016 •

edited

Loading

zpdDG4gta8XKpMCd commented Apr 30, 2016 •

edited

Loading

malibuzios commented Apr 30, 2016 •

edited

Loading