`RuleNode` is not typed #24

xukai92 · 2020-05-13T22:40:14Z

I'm new to this package, but I hit some performance bottleneck due to type instability. Particularly, RuleNode is not typed, so does get_executable (e.g.

ExprRules.jl/src/ExprRules.jl

Line 370 in 6c983e5

root = (rulenode._val != nothing) ?

). As a result, programs that evaluate the tree under a grammar are not typed and tends to have performance issues.

By looking at

ExprRules.jl/src/ExprRules.jl

Lines 234 to 238 in 6c983e5

    
           mutable struct RuleNode 
        
               ind::Int # index in grammar 
        
               _val::Any  #value of _() evals 
        
               children::Vector{RuleNode} 
        
           end

, it seems to me that it should be possible to correctly type RuleNode because we can know the type of _val from a grammar - if a grammar is specificed. This is actually done in the the codes below the struct definition.

Does it make sense to support such things, either in RuleNode or another new struct (e.g. TypedRuleNode?

The text was updated successfully, but these errors were encountered:

mykelk · 2020-05-14T02:27:03Z

Any thoughts @rcnlee and @mossr ?

rcnlee · 2020-05-15T17:32:54Z

The type (lefthand side) of the grammar is largely just a name and in general doesn't correspond to the actual Julia type that we would see during execution. One way to type RuleNode is to make it parameterized, i.e., RuleNode{T}. It would add some complexity, but I think it's doable. I can put something together. Would you be able to help me compare the performances?

rcnlee · 2020-05-16T07:30:27Z

OK, I parameterized RuleNode and pushed it to the dev branch. This change breaks the NodeRecycler (which is a mempool to enable reuse of RuleNodes), but all other tests pass. Can you see if this makes a huge performance gain?
There are some other experimental features for performance: SymbolTable; and NodeRecycler. Have you tried using those features (on the master branch)?

Update: My informal testing is finding master to be slightly faster than dev, so the change isn't adding any benefit.

xukai92 · 2020-05-16T22:32:55Z

Thanks for the quick response. I wil give a try on the dev branch.

There are some other experimental features for performance: SymbolTable; and NodeRecycler. Have you tried using those features (on the master branch)?

Ragrding this, I guess we in fact need to make SymbolTable typed (but not really RuleNode as I originally thought) so that interpret can know the correct type. Would it be possible to also support use a named tuple for SymbolTable (rather than a dictionary)? (also see my quick benchmark in the end)

Update: My informal testing is finding master to be slightly faster than dev, so the change isn't adding any benefit.

How are you testing now? My guess is that unless the whole pipeline (SymbolTable, get_executable, Core.eval) is typed, there would not be obvious performance gain. I did checked the upper bound by comparing the pipeline version against a pure Julia function (f(a,b,c) = a * b / c), and the pipeline version can be more than 100x slower.

Below is an example

using BenchmarkTools, ExprOptimization

const GRAMMAR = @grammar begin
    Real = m1
    Real = m2
    Real = r²
    Real = Real + Real
    Real = Real - Real
    Real = Real * Real
    Real = Real / Real
end

const tree = RuleNode(7, [RuleNode(6, [RuleNode(2), RuleNode(1)]), RuleNode(3)])
const symtab = SymbolTable(GRAMMAR)
const ex = get_executable(tree, GRAMMAR)

f(m1, m2, r²) = m1 * m2 / r²

function f_pipeline(m1, m2, r²)
    symtab[:m1] = m1
    symtab[:m2] = m2
    symtab[:r²] = r²
    return Core.eval(symtab, ex)
end

f_alloc(symtab) = symtab[:m1] * symtab[:m2] / symtab[:r²]

function f_nt(m1, m2, r²)
    symtab = (m1=m1, m2=m2, r²=r²)
    return f_alloc(symtab)
end

const symtab_fake = Dict{Symbol,Any}(:m1 => nothing, :m2 => nothing, :r² => nothing)

function f_dict(m1, m2, r²)
    symtab_fake[:m1] = m1
    symtab_fake[:m2] = m2
    symtab_fake[:r²] = r²
    return f_alloc(symtab_fake)
end

let m1=rand(), m2=rand(), r²=rand()
    @btime f($m1, $m2, $r²)
    @btime f_pipeline($m1, $m2, $r²)
    @btime f_nt($m1, $m2, $r²)
    @btime f_dict($m1, $m2, $r²)
end

gives

  0.558 ns (0 allocations: 0 bytes)
  233.022 ns (5 allocations: 80 bytes)
  2.165 ns (0 allocations: 0 bytes)
  134.421 ns (5 allocations: 80 bytes)

The first number is simply the upper bound which we cannot hit.
The second number is the performance before checking the new master.
The thrid is a named tuple version of a fake symbol table.
The forth is a dictionary version of a fake symbol table.

By comparing 3 and 4 (and see that 4 and 2 is in the same magnitude), it makes me believe the fact of symbol table is not typed might be the root cause of slowness. How do you think?

UPDATE: I update the dictionary version of fake symbol table to reuse the dictionary.

rcnlee · 2020-05-17T05:50:10Z

I was only interested in testing the effect of changing RuleNode to parameterized, so I just called rand() many times and timed that. I think the reason that change doesn't help is because children now needs to be a vector of abstract type.

Interesting analysis! I think there are a few points to keep in mind. 2 is slower than 4 because 4 has the advantage of a precompiled function (compile time is not being counted), whereas 2 is straight interpreting. A second reason is that 2, in addition to needing to walk the tree, needs to look up the function calls as well (*,/,+), which 4 isn't doing. Once things are compiled, running a function is fast, but compiling actually takes a long time. Not only that, compiling a lot of functions, which you need in expression optimization, actually increasingly slows down julia. 3 is also benefitting from the function compilation as well.

I think the main problem that is preventing an easy switch to named tuples is that the symbol table needs to be mutable. It contains all the symbols from the grammar, which are fixed (for example the operation *, /, + in your example), but argument variables need to be loaded in during execution. Expressions can also make assignments which write to the symbol table. It wouldn't make sense to recreate the entire symbol table as an immutable every time. One possible way could be to maintain two internal symbol tables, one for the fixed stuff and one for the changing variables. This would require a bit of work modifying the current interpreter and would probably inherit a cost because now every lookup might be two lookups. There may be a net benefit.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`RuleNode` is not typed #24

`RuleNode` is not typed #24

xukai92 commented May 13, 2020

mykelk commented May 14, 2020

rcnlee commented May 15, 2020

rcnlee commented May 16, 2020 •

edited

Loading

xukai92 commented May 16, 2020 •

edited

Loading

rcnlee commented May 17, 2020 •

edited

Loading

RuleNode is not typed #24

RuleNode is not typed #24

Comments

xukai92 commented May 13, 2020

mykelk commented May 14, 2020

rcnlee commented May 15, 2020

rcnlee commented May 16, 2020 • edited Loading

xukai92 commented May 16, 2020 • edited Loading

rcnlee commented May 17, 2020 • edited Loading

`RuleNode` is not typed #24

`RuleNode` is not typed #24

rcnlee commented May 16, 2020 •

edited

Loading

xukai92 commented May 16, 2020 •

edited

Loading

rcnlee commented May 17, 2020 •

edited

Loading