Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuleNode is not typed #24

Open
xukai92 opened this issue May 13, 2020 · 5 comments
Open

RuleNode is not typed #24

xukai92 opened this issue May 13, 2020 · 5 comments

Comments

@xukai92
Copy link
Contributor

xukai92 commented May 13, 2020

I'm new to this package, but I hit some performance bottleneck due to type instability. Particularly, RuleNode is not typed, so does get_executable (e.g.

root = (rulenode._val != nothing) ?
). As a result, programs that evaluate the tree under a grammar are not typed and tends to have performance issues.

By looking at

mutable struct RuleNode
ind::Int # index in grammar
_val::Any #value of _() evals
children::Vector{RuleNode}
end
, it seems to me that it should be possible to correctly type RuleNode because we can know the type of _val from a grammar - if a grammar is specificed. This is actually done in the the codes below the struct definition.

Does it make sense to support such things, either in RuleNode or another new struct (e.g. TypedRuleNode?

@mykelk
Copy link
Member

mykelk commented May 14, 2020

Any thoughts @rcnlee and @mossr ?

@rcnlee
Copy link
Collaborator

rcnlee commented May 15, 2020

The type (lefthand side) of the grammar is largely just a name and in general doesn't correspond to the actual Julia type that we would see during execution. One way to type RuleNode is to make it parameterized, i.e., RuleNode{T}. It would add some complexity, but I think it's doable. I can put something together. Would you be able to help me compare the performances?

@rcnlee
Copy link
Collaborator

rcnlee commented May 16, 2020

OK, I parameterized RuleNode and pushed it to the dev branch. This change breaks the NodeRecycler (which is a mempool to enable reuse of RuleNodes), but all other tests pass. Can you see if this makes a huge performance gain?
There are some other experimental features for performance: SymbolTable; and NodeRecycler. Have you tried using those features (on the master branch)?

Update: My informal testing is finding master to be slightly faster than dev, so the change isn't adding any benefit.

@xukai92
Copy link
Contributor Author

xukai92 commented May 16, 2020

Thanks for the quick response. I wil give a try on the dev branch.

There are some other experimental features for performance: SymbolTable; and NodeRecycler. Have you tried using those features (on the master branch)?

Ragrding this, I guess we in fact need to make SymbolTable typed (but not really RuleNode as I originally thought) so that interpret can know the correct type. Would it be possible to also support use a named tuple for SymbolTable (rather than a dictionary)? (also see my quick benchmark in the end)

Update: My informal testing is finding master to be slightly faster than dev, so the change isn't adding any benefit.

How are you testing now? My guess is that unless the whole pipeline (SymbolTable, get_executable, Core.eval) is typed, there would not be obvious performance gain. I did checked the upper bound by comparing the pipeline version against a pure Julia function (f(a,b,c) = a * b / c), and the pipeline version can be more than 100x slower.

Below is an example

using BenchmarkTools, ExprOptimization

const GRAMMAR = @grammar begin
    Real = m1
    Real = m2
    Real = r²
    Real = Real + Real
    Real = Real - Real
    Real = Real * Real
    Real = Real / Real
end

const tree = RuleNode(7, [RuleNode(6, [RuleNode(2), RuleNode(1)]), RuleNode(3)])
const symtab = SymbolTable(GRAMMAR)
const ex = get_executable(tree, GRAMMAR)

f(m1, m2, r²) = m1 * m2 /function f_pipeline(m1, m2, r²)
    symtab[:m1] = m1
    symtab[:m2] = m2
    symtab[:r²] =return Core.eval(symtab, ex)
end

f_alloc(symtab) = symtab[:m1] * symtab[:m2] / symtab[:r²]

function f_nt(m1, m2, r²)
    symtab = (m1=m1, m2=m2, r²=r²)
    return f_alloc(symtab)
end

const symtab_fake = Dict{Symbol,Any}(:m1 => nothing, :m2 => nothing, :r² => nothing)

function f_dict(m1, m2, r²)
    symtab_fake[:m1] = m1
    symtab_fake[:m2] = m2
    symtab_fake[:r²] =return f_alloc(symtab_fake)
end

let m1=rand(), m2=rand(), r²=rand()
    @btime f($m1, $m2, $r²)
    @btime f_pipeline($m1, $m2, $r²)
    @btime f_nt($m1, $m2, $r²)
    @btime f_dict($m1, $m2, $r²)
end

gives

  0.558 ns (0 allocations: 0 bytes)
  233.022 ns (5 allocations: 80 bytes)
  2.165 ns (0 allocations: 0 bytes)
  134.421 ns (5 allocations: 80 bytes)
  • The first number is simply the upper bound which we cannot hit.
  • The second number is the performance before checking the new master.
  • The thrid is a named tuple version of a fake symbol table.
  • The forth is a dictionary version of a fake symbol table.

By comparing 3 and 4 (and see that 4 and 2 is in the same magnitude), it makes me believe the fact of symbol table is not typed might be the root cause of slowness. How do you think?


UPDATE: I update the dictionary version of fake symbol table to reuse the dictionary.

@rcnlee
Copy link
Collaborator

rcnlee commented May 17, 2020

I was only interested in testing the effect of changing RuleNode to parameterized, so I just called rand() many times and timed that. I think the reason that change doesn't help is because children now needs to be a vector of abstract type.

Interesting analysis! I think there are a few points to keep in mind. 2 is slower than 4 because 4 has the advantage of a precompiled function (compile time is not being counted), whereas 2 is straight interpreting. A second reason is that 2, in addition to needing to walk the tree, needs to look up the function calls as well (*,/,+), which 4 isn't doing. Once things are compiled, running a function is fast, but compiling actually takes a long time. Not only that, compiling a lot of functions, which you need in expression optimization, actually increasingly slows down julia. 3 is also benefitting from the function compilation as well.

I think the main problem that is preventing an easy switch to named tuples is that the symbol table needs to be mutable. It contains all the symbols from the grammar, which are fixed (for example the operation *, /, + in your example), but argument variables need to be loaded in during execution. Expressions can also make assignments which write to the symbol table. It wouldn't make sense to recreate the entire symbol table as an immutable every time. One possible way could be to maintain two internal symbol tables, one for the fixed stuff and one for the changing variables. This would require a bit of work modifying the current interpreter and would probably inherit a cost because now every lookup might be two lookups. There may be a net benefit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants