Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash when undoing deletion of function definition code blocks #11220

Merged
merged 12 commits into from
Nov 4, 2020

Conversation

aparajit-pratap
Copy link
Contributor

@aparajit-pratap aparajit-pratap commented Oct 29, 2020

Purpose

Fix crash when deleting/modifying codeblocks

What is causing the crash:

  1. There is code that assumes that Lists of CodeBLocks should have their indexes correspond to the CodeBlock.codeBlockId.
    Basically code that does the following: CodeBlockList[someCodeBlockId]
    This assumptions is wrong because CodeBlockLists can be altered (items removed at any index in the list). For example when you delete a code block that contains a function declaration (see
    CompleteCodeBlockList.RemoveAll(x => x.codeBlockId == cbID);
    )
  2. Not having null checks on arrays of symbolTables or ProcedureTables (or instruction streams) that can contain null items.

More than the crash:
There is code that is error prone (multiple arrays that have to kept in sync, array indexes passed/stored in different objects).
I propose to create a new task to refactor the code - maybe add a Dictionary to hold all data that needs to be in sync. Also maybe change the data type of codeblockIds (currently integer - CodeBlock.codeBlockId) to a GUID or a dedicated class so that there will be no confusion that codeblockIds can be used as indexes in arrays or keys in Dictionaries.

My proposed solution:
Minimal changes that resole the crash.

  1. Null checks when iterating on arrays of SymbolTables, ProcedureTables and InstructionStreams.
  2. Swap out all code that assumed CodeBlockArrays can use codeblockIds as their indexes with the CoreUtil.GetCodeBlock() .
    CoreUtil.GetCodeBlock() basically iterates the codeblock arrays and tries to match each one with the code block id.

Potential issues with proposal:
CoreUtil.GetCodeBlock(codeBlockList, codeBlockId) may cause performance degradation - although, from my experiments/measurements, I only noticed insignificant increases. Only tested on hundreds of codeblocks.
--Update-- performance should not be an issue, because CompleteCodeBlocks contains a flat list of all code blocks. When searching for a code block through CoreUtil.GetCodeBlock no recursive calls should be made if the blockId is valid.

Declarations

Check these if you believe they are true

  • The codebase is in a better state after this PR
  • Is documented according to the standards
  • The level of testing this PR includes is appropriate
  • User facing strings, if any, are extracted into *.resx files
  • All tests pass using the self-service CI.
  • Snapshot of UI changes, if any.
  • Changes to the API follow Semantic Versioning and are documented in the API Changes document.
  • This PR modifies some build requirements and the readme is updated

Reviewers

(FILL ME IN) Reviewer 1 (If possible, assign the Reviewer for the PR)

(FILL ME IN, optional) Any additional notes to reviewers or testers.

FYIs

(FILL ME IN, Optional) Names of anyone else you wish to be notified of

@aparajit-pratap
Copy link
Contributor Author

aparajit-pratap commented Oct 29, 2020

@pinzart I think this is the PR you intended to make. If so, I made this on your behalf so that we can start reviewing it. Or have you made another one I'm unaware of?

@pinzart90
Copy link
Contributor

pinzart90 commented Oct 30, 2020

@aparajit-pratap this is the PR I intended to create, so thanks.
I cannot seem to find you in the reviewer list though,,,,
Oh...duh, you are the creator of the PR :)

@pinzart90 pinzart90 requested a review from sm6srw October 30, 2020 15:10
@pinzart90 pinzart90 marked this pull request as ready for review October 30, 2020 18:45
@pinzart90 pinzart90 added WIP DNM Do not merge. For PRs. labels Oct 30, 2020
@pinzart90 pinzart90 added PTAL Please Take A Look 👀 and removed DNM Do not merge. For PRs. WIP labels Nov 2, 2020
@pinzart90 pinzart90 changed the title account for deletion of nodes crash when undoing deletion of function definition code blocks Nov 2, 2020
Copy link
Contributor

@sm6srw sm6srw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! And I do support a follow up refactoring task.

continue;
}

var cb = CoreUtils.GetCodeBlock(exe.CompleteCodeBlocks, currentLangBlock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this still have the potential to be slow if the input size is large? Programs can be of arbitrarily large size can't they?

What about a version of Coreutils.GetCodeBlock that takes an initial index to check?

I guess eventually we'd want to stop using arrays and start using a sorted or balanced data structure with fast lookup?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole issue is that codeBlockIds are not equivalent to the indexes in CompleteCodeBlocks.
A version of Coreutils.GetCodeBlock that takes an initial index to check could return wrong data (even though we could guard against invalid indexes)

As I mentioned in the description, I want to refactor these data structs and replace with something more suitable to the access patterns (ex. simple Dictionionary<codeBlockId, data> )

For the first question, yes there is potential to degrade if we have graphs with a large number of codeblocks (hundreds of thousands).
There is an opportunity to optimize the existing changes I made (like moving CoreUtils.GetCodeBlock outside the for in for loops). I would still leave it as it is even if it gets a bit slower.
Otherwise we would have to change a lot of code to refactor in the current PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the explanation - for a bit further clarification - when you say have graphs with a large number of codeblocks (hundreds of thousands). you are referring to codeblocks as codeblocks in the VM/AST right, not codeblock nodes?

IE - a single node might generate many codeblocks?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IE - a single node might generate many codeblocks?
Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hoping to do the refactor as soon as possible. Do you think that because of possibility of perf degradation due to this PR would we should include the refactor here as well ?

Copy link
Contributor Author

@aparajit-pratap aparajit-pratap Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mjkkirschner yes, codeblocks here refer to a code scope. For example the body of a for, while, if, else statements are all codeblocks by themselves. Each one of these codeblocks has a symbol table, procedure table, instruction stream, etc.
For example, the following DS code snippet:

def foo(a, b)
{
	return [Imperative]
	{
		list = a..b;
		x = [];
		count = 0;
		for(i in list)
		{
			x[count] = i;
			count = count+1;
		}
		return x;
	}
};

x = true;
l1 = foo(1, 10);
l2 = foo(10, 20);

[Imperative]
{
	if(x==true)
	{
		return l1;
	}
	else
	{
		return l2;
	}
};

generates the following instruction set. Each instruction is prefixed with a number like [i.1.0], which is of the format [associative: a/imperative: i.codeblock index.instruction index]:

def foo:var[]..[](a:var, b:var)
{
        ==============Start Node==============
        [i.1.0]push a
        [i.1.1]push b
        [i.1.2]push null
        [i.1.3]push 0
        [i.1.4]push False
        [i.1.5]push False
        [i.1.6]push 0[depth]
        [i.1.7]callr %generate_range
        [i.1.8]push _rx
        [i.1.9]setexpuid 19[exprId]
        [i.1.10]pop list
        [i.1.11]setexpuid 20[exprId]
        [i.1.12]newarr 0
        [i.1.13]pop x
        [i.1.14]setexpuid 21[exprId]
        [i.1.15]push 0
        [i.1.16]pop count
        [i.1.17]setexpuid 22[exprId]
        [i.2.18]pushb 2
        [i.2.19]push null
        [i.2.20]pop i
        [i.2.21]setexpuid 23[exprId]
        [i.2.22]push list
        [i.2.23]pop %forloop_expr_0
        [i.2.24]setexpuid 24[exprId]
        [i.2.25]push 0
        [i.2.26]pop %forloop_count_0
        [i.2.27]setexpuid 25[exprId]
        [i.2.28]pushvarsize %forloop_expr_0
        [i.2.29]pop %forloop_key_0
        [i.2.30]push %forloop_key_0
        [i.2.31]push null
        [i.2.32]nq
        [i.2.33]cjmp  L(-1)
        [i.2.34]setexpuid 26[exprId]
        [i.3.35]pushb 3
        [i.3.36]push %forloop_expr_0
        [i.3.37]push %forloop_count_0
        [i.3.38]push 0[depth]
        [i.3.39]callr ValueAtIndexInForLoop
        [i.3.40]push _rx
        [i.3.41]pop i
        [i.3.42]setexpuid 27[exprId]
        [i.3.43]push %forloop_key_0
        [i.3.44]push 1
        [i.3.45]add
        [i.3.46]pop %forloop_key_0
        [i.3.47]setexpuid 28[exprId]
        [i.3.48]push %forloop_count_0
        [i.3.49]push 1
        [i.3.50]add
        [i.3.51]pop %forloop_count_0
        [i.3.52]setexpuid 29[exprId]
        [i.3.53]push i
        [i.3.54]push count
        [i.3.55]push 1[dim]
        [i.3.56]setelement x
        [i.3.57]setexpuid 30[exprId]
        [i.3.58]push count
        [i.3.59]push 1
        [i.3.60]add
        [i.3.61]pop count
        [i.3.62]setexpuid 31[exprId]
        [i.3.63]retcn
        [i.2.64]jmp  L1(30)
        [i.2.65]retcn
        [i.1.66]push x
        [i.1.67]retb
        [i.1.68]setexpuid 32[exprId]
        [a.0.2]bounce 1, 0
        [a.0.3]push _rx
        [a.0.4]return
        ==============End Node==============
}

==============Start Node==============
[a.0.219]push True
[a.0.220]pop x
[a.0.221]dep 1[ExprUID] False[SSA]
[a.0.222]jdep
==============End Node==============
==============Start Node==============
[a.0.223]pushlevel @0
[a.0.224]popguides 0
[a.0.225]push 1
[a.0.226]pushlevel @0
[a.0.227]popguides 0
[a.0.228]push 10
[a.0.229]push 0[depth]
[a.0.230]callr foo
[a.0.231]push _rx
[a.0.232]pop %t_0_bfef6a96488243a381505b36007a5c28
[a.0.233]dep 2[ExprUID] True[SSA]
[a.0.234]jdep
==============End Node==============
==============Start Node==============
[a.0.235]push %t_0_bfef6a96488243a381505b36007a5c28
[a.0.236]pop l1
[a.0.237]dep 2[ExprUID] False[SSA]
[a.0.238]jdep
==============End Node==============
==============Start Node==============
[a.0.239]pushlevel @0
[a.0.240]popguides 0
[a.0.241]push 10
[a.0.242]pushlevel @0
[a.0.243]popguides 0
[a.0.244]push 20
[a.0.245]push 0[depth]
[a.0.246]callr foo
[a.0.247]push _rx
[a.0.248]pop %t_1_bfef6a96488243a381505b36007a5c28
[a.0.249]dep 3[ExprUID] True[SSA]
[a.0.250]jdep
==============End Node==============
==============Start Node==============
[a.0.251]push %t_1_bfef6a96488243a381505b36007a5c28
[a.0.252]pop l2
[a.0.253]dep 3[ExprUID] False[SSA]
[a.0.254]jdep
==============End Node==============
==============Start Node==============
[i.4.0]push x
[i.4.1]push True
[i.4.2]eq
[i.4.3]cjmp  L(-1)
[i.4.4]setexpuid 33[exprId]
[i.5.5]pushb 5
[i.5.6]push l1
[i.5.7]retb
[i.5.8]setexpuid 34[exprId]
[i.5.9]retcn
[i.4.10]jmp  L1(-1)
[i.6.11]pushb 6
[i.6.12]push l2
[i.6.13]retb
[i.6.14]setexpuid 35[exprId]
[i.6.15]retcn
[i.4.16]push null
[i.4.17]retb
[a.0.255]bounce 4, 0
[a.0.256]push _rx
[a.0.257]pop %tempLangBlock9
[a.0.258]dep -1[ExprUID] False[SSA]
[a.0.259]jdep
==============End Node==============
[a.0.260]push null
[a.0.261]retb

@pinzart90
Copy link
Contributor

@aparajit-pratap @mjkkirschner I added the least amount of refactor changes to address the possible performance issues of iterating over CompleteCodeBlock lists.
Will continue with refactor in different PR.

@@ -214,7 +214,7 @@ public struct ErrorEntry
public List<CodeBlock> CodeBlockList { get; set; }
// The Complete Code Block list contains all the code blocks
// unlike the codeblocklist which only stores the outer most code blocks
public List<CodeBlock> CompleteCodeBlockList { get; set; }
public SortedDictionary<int, CodeBlock> CompleteCodeBlockList { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pinzart I don't think we can do this. Unfortunately, lot's of code in here appears to be unnecessarily public. Even though we are in the "engine" we do not know who is using these public members, and we have stated publicly we will follow semantic versioning.

It's entirely possible an extension could access these data structures.

What about obsoleting this member, having it return the values inside the new dictionary, and adding a new one - and lastly consider if the new member should actually be public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm..I had hoped the VM was not avavilablr to api users.
But I se you are correct, I updated the code. will add obsolete stmts

@@ -46,7 +46,7 @@ public class Executable
public TypeSystem TypeSystem { get; set; }

public List<CodeBlock> CodeBlocks { get; set; }
public List<CodeBlock> CompleteCodeBlocks { get; set; }
public SortedDictionary<int, CodeBlock> CompleteCodeBlocks { get; set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same API issue.

src/Engine/ProtoCore/Core.cs Show resolved Hide resolved
core.CompleteCodeBlockList.RemoveAll(x => x.guid == bnode.guid);// && x.AstID == bnode.OriginalAstID);
foreach (var key in core.CompleteCodeBlockDict.Where(x => x.Value.guid == bnode.guid).Select(x => x.Key).ToArray())
{
core.CompleteCodeBlockDict.Remove(key);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about doing this:

core.CompleteCodeBlockDict = core.CompleteCodeBlockDict
  .Where(kv => !(kv.Value.guid == bnode.guid))
  .ToDictionary(kv => kv.Key, kv => kv.Value);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm..your suggestions does seem to have less operations
I'll give it a try. THanks

Copy link
Contributor

@pinzart90 pinzart90 Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...actually completeCodeBlockDict is a SortedDictionary.
Your code would have to do something like
core.CompleteCodeBlockDict = new SortedDictionary<int, CodeBlock>(core.CompleteCodeBlockDict.Where(kv => !(kv.Value.guid == bnode.guid)).ToDictionary(kv => kv.Key, kv => kv.Value));
It might not be worth it (creates a standard dictionary and allocates another SortedDict) also I find it a bit less obvious than the original.

To make it even more obvious/ easy to understand maybe this would work better:
var keysToRemove = core.CompleteCodeBlockDict.Where(x => x.Value.guid == bnode.guid).Select(x => x.Key).ToArray(); keysToRemove.ForEach(key => core.CompleteCodeBlockDict.Remove(key));

I don't think performance should be an issue here...

@aparajit-pratap
Copy link
Contributor Author

LGTM!

@pinzart90 pinzart90 merged commit 7a09d0c into master Nov 4, 2020
@pinzart90 pinzart90 deleted the codeblock_list_crash branch November 4, 2020 15:07
@@ -154,15 +154,14 @@ public static int GetSymbolIndex(ClassNode classNode, string name, int classScop
return Constants.kInvalidIndex;
}

public static List<int> GetAncestorBlockIdsOfBlock(int blockId, List<CodeBlock> codeblockList)
public static List<int> GetAncestorBlockIdsOfBlock(int blockId, SortedDictionary<int, CodeBlock> codeblocks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pinzart I'm sorry I missed this - you can't change this public signature - please follow the same pattern of obsoleting this and adding an overload.

@@ -69,7 +69,7 @@ public static int GetUpcastCountTo(ClassNode from, ClassNode to, RuntimeCore run
//
// 2.3 Otherwise, classScope == kInvalidIndex && functionScope == kInvalidIndex
// Return public member in derived class, or public member in base classes
public static int GetSymbolIndex(ClassNode classNode, string name, int classScope, int functionScope, int blockId, List<CodeBlock> codeblockList, out bool hasThisSymbol, out ProtoCore.DSASM.AddressType addressType)
public static int GetSymbolIndex(ClassNode classNode, string name, int classScope, int functionScope, int blockId, SortedDictionary<int, CodeBlock> codeblocks, out bool hasThisSymbol, out ProtoCore.DSASM.AddressType addressType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pinzart I'm sorry I missed this - you can't change this public signature - please follow the same pattern of obsoleting this and adding an overload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PTAL Please Take A Look 👀
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants