Split Painless AST into a "user" tree and an "ir" tree #51278

jdconrad · 2020-01-22T00:47:10Z

Background:

Painless has accumulated years of technical debt. The current AST handles both semantic checking and code generation, and it contains a great deal of mutable state that behaves as local input and output. The semantic phase changes the AST's tree structure during type and method resolution, implicit casting, and constant folding. The result is a large amount of mutable state with numerous edge cases, which makes the current AST difficult to change or improve and particularly fragile to maintain going forward.

Goals:

By splitting the AST into two separate trees, we can eventually make the "user" tree an immutable representation of the code the user actually typed in. The "user" tree then runs through a semantic phase from which an "ir" tree is generated. The "ir" tree is mutable so that additional external phases can either add state/features or perform optimizations through "ir" tree manipulation.

Phases manipulating the "ir" tree are much easier to reason about because we will guarantee that the "ir" tree generated from the "user" tree is semantically correct. Each additional phase will run under the assumption that the "ir" tree remains semantically correct. In turn, we can decouple customized logic from the "ir" tree, such as the additional exceptions that encapsulate a script's execute method, so the "ir" tree nodes stay relatively generic. Instead, an additional phase would add exception nodes to wrap the execute method. Each of these phases has the potential to be per script context, allowing for further optimization phases such as using read-only variables to reduce hash lookups from Maps, or method folding for SQL.
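As a rough illustration of the phase idea described above, the following Java sketch shows a generic, mutable tree and a separate phase that wraps the body of an execute method in an exception-handling node. Every name here (`IrNode`, `BlockNode`, `TryCatchNode`, `FunctionNode`, `WrapExecutePhase`, `ScriptException` as a string label) is hypothetical and chosen only for illustration; none of it is the code added by this PR.

```java
import java.util.ArrayList;
import java.util.List;

// All classes below are hypothetical stand-ins, not the "ir" nodes from this PR.
abstract class IrNode {
    abstract String describe();
}

class StatementNode extends IrNode {
    private final String text;

    StatementNode(String text) {
        this.text = text;
    }

    @Override
    String describe() {
        return text;
    }
}

class BlockNode extends IrNode {
    private final List<IrNode> statementNodes = new ArrayList<>();

    void addStatementNode(IrNode statementNode) {
        statementNodes.add(statementNode);
    }

    @Override
    String describe() {
        StringBuilder builder = new StringBuilder("{");
        for (IrNode statementNode : statementNodes) {
            builder.append(' ').append(statementNode.describe()).append(';');
        }
        return builder.append(" }").toString();
    }
}

class TryCatchNode extends IrNode {
    private IrNode bodyNode;
    private String exceptionType;

    void setBodyNode(IrNode bodyNode) { this.bodyNode = bodyNode; }
    void setExceptionType(String exceptionType) { this.exceptionType = exceptionType; }

    @Override
    String describe() {
        return "try " + bodyNode.describe() + " catch (" + exceptionType + ")";
    }
}

class FunctionNode extends IrNode {
    private String name;
    private IrNode bodyNode;

    String getName() { return name; }
    void setName(String name) { this.name = name; }
    IrNode getBodyNode() { return bodyNode; }
    void setBodyNode(IrNode bodyNode) { this.bodyNode = bodyNode; }

    @Override
    String describe() {
        return name + "() " + bodyNode.describe();
    }
}

// A phase that runs after the tree is built; it only touches the execute method.
class WrapExecutePhase {
    void apply(FunctionNode functionNode) {
        if ("execute".equals(functionNode.getName()) == false) {
            return;
        }
        TryCatchNode wrapperNode = new TryCatchNode();
        wrapperNode.setBodyNode(functionNode.getBodyNode());
        wrapperNode.setExceptionType("ScriptException");
        functionNode.setBodyNode(wrapperNode);
    }
}

class PhaseSketch {
    static String run(String functionName) {
        BlockNode blockNode = new BlockNode();
        blockNode.addStatementNode(new StatementNode("return 1"));
        FunctionNode functionNode = new FunctionNode();
        functionNode.setName(functionName);
        functionNode.setBodyNode(blockNode);
        new WrapExecutePhase().apply(functionNode);
        return functionNode.describe();
    }

    public static void main(String[] args) {
        System.out.println(run("execute"));
    }
}
```

The point of the sketch is that the nodes themselves know nothing about script-specific exception handling; the wrapping lives entirely in the phase, so a different script context could skip it or substitute its own.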

While it is possible to update the existing AST to accommodate some optimizations, for the reasons outlined above it is at best incredibly difficult, as maintaining correctness becomes even harder with each additional change.

What this PR does:

This PR takes the existing Painless AST, which is responsible for both semantic checking and writing Java ASM, and splits it into two separate trees. The first tree, termed the "user" tree, is now responsible for semantic checking and generation of a second "ir" tree. The second tree, termed the "ir" tree, is responsible for generating the Java ASM bytecode. This change maps the nodes at nearly a 1:1 ratio, with the exception of some improved super classes for the "ir" nodes that help reduce boilerplate getters/setters. The Painless AST remains mutable for this PR. This change simply takes the existing AST nodes and splits each one into a "user" node and an equivalent "ir" node. Each "user" node will generate its equivalent "ir" node during a separate phase, but eventually this will be combined into a single phase.
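To make the 1:1 split more concrete, here is a minimal Java sketch of one "user" node that performs its own semantic check and then produces an equivalent "ir" node carrying only the data needed for code generation. The class names (`UserNumeric`, `IrConstantNode`) and the method names (`analyze`, `write`) are assumptions made for this example and should not be read as the exact signatures in the PR.

```java
// Hypothetical "ir" node: mutable, with plain getters and setters.
class IrConstantNode {
    private Object constant;
    private Class<?> type;

    Object getConstant() { return constant; }
    void setConstant(Object constant) { this.constant = constant; }
    Class<?> getType() { return type; }
    void setType(Class<?> type) { this.type = type; }
}

// Hypothetical "user" node: holds what the user typed plus analysis results.
class UserNumeric {
    private final String source;
    private Object resolvedConstant;
    private Class<?> resolvedType;

    UserNumeric(String source) {
        this.source = source;
    }

    // Semantic checking: reject text that is not a valid int constant.
    void analyze() {
        try {
            resolvedConstant = Integer.parseInt(source);
            resolvedType = int.class;
        } catch (NumberFormatException exception) {
            throw new IllegalArgumentException("invalid int constant [" + source + "]");
        }
    }

    // Separate step: build the equivalent "ir" node from the analysis results.
    IrConstantNode write() {
        if (resolvedType == null) {
            throw new IllegalStateException("analyze must run before write");
        }
        IrConstantNode constantNode = new IrConstantNode();
        constantNode.setConstant(resolvedConstant);
        constantNode.setType(resolvedType);
        return constantNode;
    }
}

class UserToIrSketch {
    public static void main(String[] args) {
        UserNumeric userNode = new UserNumeric("42");
        userNode.analyze();
        IrConstantNode irNode = userNode.write();
        System.out.println(irNode.getType() + " " + irNode.getConstant());
    }
}
```

In this sketch the two steps are kept separate on purpose, mirroring the description that generation currently happens in its own phase and may later be merged with semantic checking.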

This PR is excessively large; what's going on?

This is the minimum (or very close to it) required to effectively split the trees and have them actually wired together. The large increase in the number of lines is mostly boilerplate, which comes from writing the "ir" nodes in a builder-style format and then wiring each "user" node to generate an equivalent "ir" node. Each piece of tree structure and data in the "ir" nodes is available for manipulation to allow for maximum flexibility during external phases. This also allows the "user" tree to easily generate "ir" nodes as it gathers each piece of data necessary for code generation.

The code here is a bit raw, but this is an important first step to achieve the goals outlined above. The code in this PR generates scripts that are identical to scripts generated by the current code.

Given that this really is giant, one alternative strategy to break this up would be to check in 10-20 "ir" nodes at a time through 5-10 PRs leaving them isolated and inactive, then have a PR to wire the two trees together.

elasticmachine · 2020-01-22T00:47:12Z

Pinging @elastic/es-core-infra (:Core/Infra/Scripting)

rjernst

This looks pretty good. I like the new design, and while I understand this is only the start, the separation should make adding new optimizations much easier long term.

One general comment: Could we keep setters as void? I see a mix of a builder pattern for the setters, but the nodes are not builders, they are simply mutable.

jdconrad · 2020-01-22T18:25:38Z

@rjernst Thank you for the feedback. I will remove the builder pattern in this case; after speaking with you, I see how it is confusing.

jdconrad · 2020-01-23T00:41:16Z

@rjernst I've removed the builder pattern from the "ir" nodes in favor of standard getters and setters. Please take a look again when you have a chance.
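For readers following the setter discussion, the difference between the two styles can be sketched as below. Both classes are hypothetical (`FluentCallNode`, `PlainCallNode`, and their fields are invented for this example); the first returns `this` from each setter in a builder-like way, and the second uses the plain `void` setters that the review asked for.

```java
// Hypothetical node with builder-style setters that return the node itself.
class FluentCallNode {
    private String name;
    private int arity;

    FluentCallNode setName(String name) {
        this.name = name;
        return this;
    }

    FluentCallNode setArity(int arity) {
        this.arity = arity;
        return this;
    }

    String getName() { return name; }
    int getArity() { return arity; }
}

// Hypothetical node with standard void setters; the node is simply mutable.
class PlainCallNode {
    private String name;
    private int arity;

    void setName(String name) { this.name = name; }
    void setArity(int arity) { this.arity = arity; }

    String getName() { return name; }
    int getArity() { return arity; }
}

class SetterStyleSketch {
    static String fluent() {
        // Chained calls read like a builder, although nothing is ever "built".
        FluentCallNode callNode = new FluentCallNode().setName("size").setArity(0);
        return callNode.getName() + "/" + callNode.getArity();
    }

    static String plain() {
        PlainCallNode callNode = new PlainCallNode();
        callNode.setName("size");
        callNode.setArity(0);
        return callNode.getName() + "/" + callNode.getArity();
    }

    public static void main(String[] args) {
        System.out.println(fluent() + " " + plain());
    }
}
```

Both styles leave the node in the same state; the plain form is more verbose but avoids suggesting that the node is a builder producing some other object.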

modules/lang-painless/src/main/java/org/elasticsearch/painless/PainlessScriptEngine.java

stu-elastic · 2020-01-23T18:47:33Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/ir/ArgumentsNode.java

+
+    /* ---- begin tree structure ---- */
+
+    private final List<ExpressionNode> argumentNodes = new ArrayList<>();


These are the actual arguments, correct? If so, consider renaming to arguments. It's a bit confusing to have an ArgumentsNode type with field argumentNodes.

I'm open to suggestions here. I know the names are a bit confusing and clash, but they both accurately describe what each item is. I have divided the data in the "ir" nodes into two types: tree structure and local data. All tree structure members end with Node to help differentiate them from typical data. This also creates a general consistency between all nodes.

stu-elastic · 2020-01-23T18:51:56Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/ir/AssignmentNode.java

+    private boolean post;
+    private Operation operation;
+    private boolean read;
+    private boolean cat;


What is a cat here?

cat is short for concatenation for String types. These names are copied directly from their equivalent "user" node, so I would prefer to leave this for now as mechanical, but renaming should be a future change.

It's clear when reading the code further down, but a word or two as a comment would be nice.

Added a comment here and in BinaryMathNode that also does concatenations.

stu-elastic · 2020-01-23T19:05:09Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/ir/ArgumentsNode.java

+
+    /* ---- end tree structure */
+
+    public ArgumentsNode() {


Why is a default constructor insufficient?

I was planning to have more than one constructor early on in the refactor, but wanted to show it was okay to create nodes from scratch. I will remove these as they are no longer necessary.

jdconrad · 2020-01-23T20:23:16Z

@stu-elastic Thank you for the review. I have removed all the default constructors from the "ir" nodes.

stu-elastic · 2020-01-23T21:04:21Z

modules/lang-painless/src/main/java/org/elasticsearch/painless/ir/BraceSubNode.java

+
+    @Override
+    protected int accessElementCount() {
+        return 2;


Can ya comment on this?

Sure, happy to. Currently, all of the nodes that allow for the storage of values, such as variables or fields, have several additional methods attached to them, including accessElementCount. These methods are used by the AssignmentNode to do the majority of the work. AssignmentNode is also responsible for two additional items: compound assignment, and a value being read either pre or post assignment (the ++ and -- operators, or x = y = z where y must be read post assignment). To do this, all the storeable nodes follow a common pattern for compound assignment, etc. This requires knowing how many ASM stack elements have been placed on the stack in order to access the actual value, which is what accessElementCount returns.

As an example, in this case BraceSubNode may refer to an array access where the array reference and the index are placed on the stack prior to accessing the actual value. accessElementCount refers to these two values and returns 2. This allows for some shortcutting by AssignmentNode to re-read the value if necessary. Take for instance x.y.z[2] += 1;. To access z[2], we have already accessed x and y. If we didn't shortcut straight to z[2] again to write the value, we would have to access x and y twice. Hopefully, this makes some sense.

Edit: I don't want to expend too much effort here adding additional comments because the intention with further refactoring is for this specific code to go away in favor of storeable nodes getting more responsibility from assignment.
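The stack behavior described in this reply can be sketched with a small simulation. The Java below models an operand stack for `z[2] += 1`, assuming `x.y.z` has already been evaluated to an array reference; the class `CompoundAssignmentSketch` and its helper `dup` are hypothetical and only mimic the effect of the JVM's DUP2, IALOAD, IADD, and IASTORE instructions. How Painless actually emits this through ASM may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of an operand stack; not the Painless code generator.
class CompoundAssignmentSketch {

    // Duplicate the top `count` stack elements, preserving their order.
    static void dup(Deque<Object> stack, int count) {
        List<Object> top = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            top.add(0, stack.pop());
        }
        for (int pass = 0; pass < 2; pass++) {
            for (Object element : top) {
                stack.push(element);
            }
        }
    }

    static int[] run() {
        int[] z = {10, 20, 30};           // stands in for the result of x.y.z
        Deque<Object> stack = new ArrayDeque<>();
        int accessElementCount = 2;       // array reference + index

        stack.push(z);                    // stack: [z]
        stack.push(2);                    // stack: [z, 2]
        dup(stack, accessElementCount);   // stack: [z, 2, z, 2]  (like DUP2)

        // Load the current value (like IALOAD): consumes one (array, index) pair.
        int index = (Integer) stack.pop();
        int[] array = (int[]) stack.pop();
        stack.push(array[index]);         // stack: [z, 2, 30]

        // Apply the compound operation (like IADD with a constant 1).
        int current = (Integer) stack.pop();
        stack.push(current + 1);          // stack: [z, 2, 31]

        // Store (like IASTORE): uses the remaining (array, index) pair.
        int value = (Integer) stack.pop();
        index = (Integer) stack.pop();
        array = (int[]) stack.pop();
        array[index] = value;

        if (stack.isEmpty() == false) {
            throw new IllegalStateException("stack should be empty after the store");
        }
        return z;
    }

    public static void main(String[] args) {
        System.out.println(run()[2]);
    }
}
```

Because the two access elements are duplicated before the load, the store can reuse them directly, which is the shortcut that avoids evaluating `x` and `y` a second time.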

stu-elastic

Great start dude.

jdconrad · 2020-01-23T22:29:16Z

@stu-elastic Thank you again for the reviews.

jdconrad · 2020-01-24T16:12:59Z

@rjernst Thanks for the review again as well.

jdconrad added the WIP, :Core/Infra/Scripting (Scripting abstractions, Painless, and Mustache), >refactoring, and v8.0.0 labels Jan 22, 2020

jdconrad requested review from rjernst and stu-elastic January 22, 2020 00:47

jdconrad added 19 commits January 22, 2020 08:00

copy nodes to split ir and ast

ceafd7c

converted some ast nodes to ir nodes

bf21ab1

converted more nodes

71add6c

checkpoint

b6ccc78

converted more nodes

754d2d9

completion of expression node conversion

fd96614

convert all prefix nodes

b6c7d8e

partially changed node data to user getters/setters

f68872a

partially changed data to be mutable

c18a182

converted more nodes to ir

0bcc0c7

completeion of first pass of splitting nodes

58f8e6d

fixes

6349c67

fix subtle issue with unboxing def

ab969e4

move script root

86dc9a7

remove class

6149e43

add setters with covariant return types for all nodes

32b8b8b

build ir tree from ast

a9118e7

fix bugs/tests

058c416

remove bad refactor of initializer to initializerNode

e4fcce2

jdconrad force-pushed the trees1 branch from 5439c0c to e4fcce2 Compare January 22, 2020 16:00

rjernst reviewed Jan 22, 2020


remove pseudo builder setters from ir nodes

0222ff5

jdconrad added 2 commits January 22, 2020 16:30

modify user nodes to work with ir setters

fbe2745

fix instanceof missing node

8a0a65c

jdconrad removed the WIP label Jan 23, 2020

stu-elastic reviewed Jan 23, 2020

modules/lang-painless/src/main/java/org/elasticsearch/painless/PainlessScriptEngine.java

stu-elastic reviewed Jan 23, 2020


jdconrad added 2 commits January 23, 2020 12:14

Merge branch 'master' into trees1

7f123a4

remove all default constructors from ir nodes

1c03a51

stu-elastic reviewed Jan 23, 2020


added some comments for cat

56a366a

stu-elastic approved these changes Jan 23, 2020


jdconrad merged commit 70729f3 into elastic:master Jan 24, 2020

stu-elastic mentioned this pull request Mar 18, 2020

Painless Compiler Extensibility #53702

Open

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
