Optimization ideas #1669
Note that this 100% concerns rendering, and shouldn't require many significant changes in other files, other than to update DOM nodes. |
Another very low-hanging fruit is changing the IIFE style of the wrapper. JS engines delay the actual parsing of functions until call time, to improve load times: on load, they just run a basic "validator" parse on functions, one that does not generate IR or machine code. This leads to faster load times, since most functions are often not needed on page load. They have exceptions hardcoded for some patterns; I'll have to find the exact incantations (a priori, if we use |
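A minimal sketch of the two IIFE spellings in question (the exact heuristics are engine- and version-specific, so treat the comments as assumptions rather than guarantees):

```javascript
// Paren-first IIFE: several engines' pre-parsers treat a function
// expression immediately preceded by "(" as "called right away" and
// compile it eagerly, so the body is parsed only once.
var a = (function () {
  return 1
})()

// Semantically identical, but this spelling can miss that heuristic:
// the body may be lazily pre-parsed at load, then fully parsed again
// when the call actually happens.
var b = function () {
  return 2
}()

console.log(a + b) // 3
```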
I'd appreciate it if you could wait for the (WIP) #1653 fix to be merged before optimizing. It would also be nice to have tests for the various pool scenarios before we go wild on the engine. |
Copying my comment from the gist: My two cents:
1 - That "monstrosity" is the search space reduction algorithm for list diffs. Last I checked, none of the other libraries split this down, precisely for perf reasons. Can you provide some more concrete suggestions? I don't think "break into smaller functions so the JIT can re-inline them into effectively the same thing" is a task worth pursuing.
2 - I've kept my hands off of that because AFAIK @pygy is working on it.
3 - Not sure I understand this one. You're saying we should instead copy everything from the new tree back to the old one? That makes no sense. For example, say you have data = [1,2,3] and in the view data.map(n => m("li", n)). Then you do data.shift(). This causes O(n) worth of copying when diffing, which is horrible.
4 - Again, I don't understand the proposal. Are you saying we should create twice as many objects to enforce encapsulation of properties like .events?
5 - I don't understand this one either. Why are 4+ vnode types better than one? The vast majority of vnode properties apply to all vnode types, so conceptually dividing them doesn't make any potential polymorphic types that much smaller, and it just ups the smartness requirement for the JIT engine.
6 - You keep saying vnodes are big, but they really aren't. They have 11 properties; by comparison, Inferno nodes have 8. FWIW, Dominic mentioned a while back that they went back to straight vdom, no vnode pooling. DOM pooling already happens with recycling. You're welcome to try to optimize it more, but I already spent a fair amount of time fidgeting with that trying to lower vdom bench results (where recycling improvements can be tested most easily), and the numbers are very close to Inferno already.
At this point, I'm more interested in making sure it's correct than optimizing it. Anyway, I don't mean to just come here and shoot the whole thing down, so I'm going to add my own brain dump of perf-related things:
1 - First and foremost, I have done ZERO IRHydra work on Mithril. That's the lowest hanging fruit performance-wise. |
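The data.shift() example above can be sketched without any vdom library at all (purely illustrative; no Mithril APIs are involved): after a shift, every surviving item occupies a new index, so a keyless diff or a "copy new tree onto old tree" strategy does O(n) work for a single removal.

```javascript
// Model of the point above: count how many list slots no longer match
// their old occupant after data.shift(). Each mismatch is a vnode that
// a naive diff or copy-back scheme must touch.
var oldData = [1, 2, 3, 4, 5]
var newData = oldData.slice(1)      // what data.shift() leaves behind

var touched = 0
for (var i = 0; i < newData.length; i++) {
  if (oldData[i] !== newData[i]) touched++
}

console.log(touched) // 4: all n - 1 surviving items moved
```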
I'll comment with a more detailed response on the rest later (doing this from mobile), but I'd be shocked if you can even get a useful vdom library that also validates as asm.js without resorting to something like Emscripten. And even then, goodbye to easy JS use, because the Emscripten JS API requires explicit memory management, too.
…On Thu, Mar 2, 2017, 15:44 Leo Horie wrote:
2 - Mithril doesn't implement the kivi/ivi algorithm for reducing the number of DOM shuffles in large, mostly unsorted lists. IMHO, this affects very few real use cases, but it is nonetheless a known optimization that has not been implemented.
3 - Mithril currently completely chokes on the asm.js validator. Making it not choke would likely uncover some addressable issues.
|
@lhorie 3 - You can just mutate it to reflect the current state, i.e. removing 3 from A |
@lhorie, re point 3, currently Having younger objects means that they are collected early, which leads to smaller full passes and smoother frame rates. |
@isiahmeadows re. 5, IIRC @lhorie benchmarked the cost of extra properties and it was deemed minor. |
@pygy re. 5 I'm speaking in terms of memory optimization, not speed optimization. Benchmarks aren't going to tell you that much, because they usually only render small trees. If you use the TodoMVC benchmark, you'd have to crank the list up to about 1000 items to notice anything, because the DOM structure is so small. But heavier, real-world apps would notice this more. A good example of this is incremental-dom: it is near the bottom in raw CPU benchmarks, but it still remains responsive in practice due to lower memory pressure, and it especially shines on mobile for that very reason. @lhorie Now that I'm here:
Examples?
That was pretty much my concrete suggestion. For a concrete example, consider this Acorn patch from a few years ago. It split up a large function with complex logic into smaller ones that the engine could handle more easily. I can tell you V8 isn't alone, although it's the most affected. Here are a few more concrete examples of how splitting this up could result in faster and easier-to-maintain code:
That was my plan, too. Just noted it for relevance, and I left implicit the assumption I wouldn't be alone in tackling this, if we were to progress.
No. I just mean these two assignments should be going the other way, and from
It won't double anything unless you're exclusively creating/removing vnodes, never updating them (which never occurs in practice, and is a horribly inefficient way to use any vdom library). And no, it does not enforce encapsulation, because:

const C = {
  view(vnode) {
    if (C.tree == null) C.parent = vnode
    return C.tree = m("div")
  }
}
m.render(document.body, m(C))
m.render(document.body, m(C))
assert.equal(C.parent, C.tree)
And how often do those vnodes need to be retained? If we can reduce memory usage on updates, I suspect we'll close in even more on Inferno (even if by a small amount). Plus, let's consider our mobile users, for whom memory is a much bigger concern. My big push has been trying to reduce memory where I can without slowing down overall CPU performance. And IMHO perf gains without tradeoffs are always nice. Even if CPU benchmarks are unaffected, reduced memory -> reduced GC -> improved responsiveness. |
There is a little benchmark project which includes Mithril (maybe in some previous version) and contains some memory checks; example report here: https://rawgit.com/krausest/js-framework-benchmark/master/webdriver-ts/table.html Maybe it would be a good idea to specifically create a test or benchmark for memory overhead? |
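One rough sketch of what such a memory-overhead check could look like in Node (measureHeapGrowth and the workload are made-up names for illustration, not part of any benchmark mentioned above):

```javascript
// Rough heap-growth probe: run a workload many times and report how much
// the JS heap grew. Only meaningful as a relative number between runs.
function measureHeapGrowth(workload, iterations) {
  if (global.gc) global.gc()               // available with node --expose-gc
  var before = process.memoryUsage().heapUsed
  for (var i = 0; i < iterations; i++) workload(i)
  if (global.gc) global.gc()
  return process.memoryUsage().heapUsed - before
}

// Example workload: retain a million small objects, which must show up
// as heap growth since nothing about them can be collected.
var sink = []
var grown = measureHeapGrowth(function (i) { sink.push({ i: i }) }, 1000000)
console.log(grown > 0) // true: the retained objects cost real heap
```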
Obviously. And there are some key things in the renderer that we can reasonably guess should show up somewhere, even in just the renderer tests, like the
True, but I agree with you that this really isn't worth implementing.
And one more point on 6: this could be moved to the |
Maybe, but I feel we may have to specially construct that kind of test. The closest I've found to a decent usable benchmark is this (a variant is already checked in here, live version here), but I still feel that's somewhat lacking, since that never adds/removes nodes (just updates with random changes). Here's what I'd like to see in a benchmark:
Of course, this may require significant optimization to avoid burying the frame rate, so I'd probably write my own PRNG using the xorshift32 algorithm and write parts of the benchmark code in asm.js. The problem is that everyone is making benchmarks that are too simple, while real-world uses require better stress testing than just pushing everything through the same path each time. Oh, and I expect the last test (which combines all of them) could bog down several frameworks. And the only way to optimize for it is by making an adaptive diff/patch algorithm. |
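A hypothetical sketch of the xorshift32 PRNG mentioned above (function and variable names are mine, not from the thread). It is cheap and deterministic, so repeated benchmark runs see identical "random" data:

```javascript
// Marsaglia's xorshift32: three shift-xor steps over a 32-bit state.
function xorshift32(seed) {
  var state = (seed | 0) || 1          // state must never be zero
  return function () {
    state ^= state << 13
    state ^= state >>> 17
    state ^= state << 5
    return (state >>> 0) / 4294967296  // map to [0, 1) like Math.random()
  }
}

var rand = xorshift32(1234)
var x = rand()                         // same seed => same sequence, every run
console.log(x >= 0 && x < 1) // true
```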
@isiahmeadows js-repaint-perfs is a very poor benchmark for multiple reasons. js-framework-benchmark is unambiguously better in every sense. localvoid's uibench and vdom-benchmark are also pretty thorough, but the feedback loop is harder to grok. I use dbmonster @ 0% as a quick-and-dirty test of pure framework overhead and the effect of various optimization experiments; it does help to quickly catch obvious regressions. |
@thysultan I know push and pop get O(1) complexity on a copy operation, but sort, reverse, shift, unshift, splice, etc. can easily be worst-case O(n). The claim here is that adding those potential costs in JS land will be offset by gains in GC land. This claim, to me, falls into the realm of extraordinary claims that require extraordinary evidence.
@isiahmeadows I think you got that backwards. Benchmarks usually render large trees as an attempt to make measurable times big enough to drown out confounding factors. I would argue that rendering thousands of DOM elements is something a real app should actively avoid, if at all possible, because DOM costs increase linearly with element count and no amount of optimization can ever change that. People often talk about benchmarks in mobile, but seriously, why would any real life app want to spend battery chewing through thousands of things that can't possibly fit on a phone screen?
I'm not going to hunt down all of them, but here's inferno ( https://github.com/infernojs/inferno/blob/master/packages/inferno/src/DOM/patching.ts#L457 ) and kivi ( https://github.com/localvoid/kivi/blob/master/lib/vnode.ts#L1520 ).
Honestly, branch prediction has never been an optimization target for any perf work I've done, so I have no idea what magnitude of gains we're talking about here. I've only ever seen it come up in super convoluted micro benches in C. If you want to pursue that, it would be ideal to have some way of validating before/after performance before attempting. I don't agree with breaking down at the level you highlighted (because it would require the creation of extra temp objects, AND it would make mutations non-local), but I'm amenable to breaking out the payloads when a cursor hits a dead end (e.g. https://github.com/lhorie/mithril.js/blob/next/render/render.js#L177-L180 ). With that said, it's worth remembering that not all target engines might support the types of JIT optimizations you're looking to exploit (looking at you, IE 10).
I still don't understand. Simply flipping lhs and rhs on those would clobber state rather than keep it. I think you're underestimating what it would mean for vnode to remain referentially the same on every render in the way that vnode.state is. For example, what happens w/ onbeforeupdate(vnode, old)
C.parent isn't equal to C.tree. Are you saying it should be?
In Chrome? What about Firefox or IE? You're claiming that 4 types can still be optimized at least as well as 1 AND that we get GC gains on top of that, in all engines. Again, I think this is an extraordinary claim and it needs some evidence. FWIW, I've experienced significant slowdowns in FF with as few as two identical-structure objects during my tests last year, so I'm particularly skeptical about this claim.
Here's the problem I see with memory related efforts: I can measure the impact of object allocations in hyperscript micro-benchmarks (and hence why hyperscript code has such a complex memoization scheme), but dbmonster (which is supposed to be a very high stress test) on chrome never spikes over the initial 10mb that chrome sets aside as js heap memory. If I put on my C hat, the type of optimizations I do in hyperscript basically translates to looking at incurred costs of malloc/free and then doing things to avoid calling malloc/free. Most of the discussions here are about GC heuristics and CPU branch prediction, which might as well be voodoo magic in terms of how transparent/predictable they are with regards to what the machine is actually doing today, and what it will be doing in version X of browser Y tomorrow.
The irony here is that 99% of the time people complain about perf, they say "oh I have this 5000 element thing"... Let's keep things in perspective: these proposals are all micro-optimizations, of the 99% effort for 1% gain variety. To address real app problems, you need solutions of an algorithmic / design nature; micro-optimizations don't do squat against linear complexity. |
@lhorie as a middle ground, you could keep the old vnodes but the new children arrays (and sorry for missing the point of your first remark). You'd probably still gain on the worst GC pauses. There would be a cost for moving objects around, though a priori a minor one, since both arrays would be in cache at the point where the vnodes must be transferred. |
@lhorie Okay. I stand corrected for most of my ideas. I'll remain on my point that we should at least reuse the initial vnode state through the entire component lifetime, if anything to reduce the node API's memory footprint during updates. |
Doesn't that already happen? Do you mean in a specific case? |
@lhorie I mean specifically this, which currently does not happen AFAICT (note the conditional check):

var vnode
var Comp = {
  oninit(v) { vnode = v },
  view(v) {
    if (vnode !== v) throw new Error("Wrong vnode!")
    return m("div", [v.attrs.count])
  }
}
for (var i = 0; i < 10; i++) {
  m.render(document.body, m(Comp, {count: i}))
} |
TL;DR: Let's do some heavy memory layout optimization to make GC more predictable and speed up creation/destruction. Less garbage is always nice, and engines can do some nice things if we let them.
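As a concrete example of the kind of layout discipline meant here (purely illustrative, not Mithril's actual internals): construct every vnode with the same properties in the same order, so all vnodes share one object shape and property access stays monomorphic.

```javascript
// Hypothetical vnode factory. Initializing every field, always in the
// same order, keeps all vnodes shape-identical: the JIT can use one
// hidden class for them, and the GC sees uniform, predictable objects.
function Vnode(tag, key, attrs, children, text) {
  return {
    tag: tag,
    key: key,
    attrs: attrs,
    children: children,
    text: text,
    dom: undefined,       // filled in by the renderer later,
    state: undefined,     // but declared up front so the shape
    events: undefined     // never changes after creation
  }
}

var va = Vnode("div", undefined, {}, [], undefined)
var vb = Vnode("li", "item-1", { class: "x" }, [], undefined)
// Same key set in the same order => same object shape:
console.log(Object.keys(va).join() === Object.keys(vb).join()) // true
```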
See this gist for more details.
/cc @lhorie @pygy @tivac