
Optimization: a Stack/Heap approach #22

Open
thysultan opened this issue Jun 21, 2017 · 50 comments

Comments

@thysultan

thysultan commented Jun 21, 2017

Instead of

body {
  color: red
}

producing

[
  [RULE_START, 1],
    [SELECTOR, 'body'],
    [PROPERTY, 'color'],
    [VALUE, 'red']
  [RULE_END]
]

It could instead produce

HEAP = ['body', 'color', 'red']

[
RULE_START, 0,
SELECTOR, 0,
PROPERTY, 1,
VALUE, 2,
RULE_END, 2
]

Where the integers 0, 1, 2 are indices of the corresponding values in the HEAP.

This way you can use a flat Uint32Array typed array for maximum throughput.
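To make the idea concrete, here is a minimal sketch of how a consumer might walk such a flat array. The marker values, the toCSS name and the handling of the RULE_START/RULE_END operands are assumptions made up for this example, not part of the proposal.

// Hypothetical marker constants; any small integers would do.
const RULE_START = 1, SELECTOR = 2, PROPERTY = 3, VALUE = 4, RULE_END = 5

// The heap holds the unique strings; the bytecode holds only integers.
const HEAP = ['body', 'color', 'red']
const CODE = new Uint32Array([
  RULE_START, 0,
  SELECTOR, 0,
  PROPERTY, 1,
  VALUE, 2,
  RULE_END, 2
])

// Walk the flat array two slots at a time and resolve heap indices.
function toCSS (code, heap) {
  let css = ''
  for (let i = 0; i < code.length; i += 2) {
    switch (code[i]) {
      case SELECTOR: css += heap[code[i + 1]] + '{'; break
      case PROPERTY: css += heap[code[i + 1]] + ':'; break
      case VALUE:    css += heap[code[i + 1]] + ';'; break
      case RULE_END: css += '}'; break
      // RULE_START and the RULE_END operand are ignored in this sketch.
    }
  }
  return css
}

toCSS(CODE, HEAP) // 'body{color:red;}'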

@thysultan thysultan changed the title Explore a Stack/Heap approach. Explore a Stack/Heap approach Jun 21, 2017
@kof
Member

kof commented Jun 21, 2017

Ah I got what you mean - using separate arrays for markers and props/values/selectors!

@kof
Member

kof commented Jun 21, 2017

That's an interesting idea; we should add a benchmark for this as well.

@kof
Member

kof commented Jun 21, 2017

We could in theory end up with three arrays: function refs, strings and markers. Each of them would hold a single consistent type, and the markers would even be integers.
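A rough illustration of those three arrays, assuming (purely for this sketch) that dynamic values are kept as function refs:

// Markers and heap references in one consistent integer type.
const markers = new Uint32Array([1, 0, 2, 0, 3, 1, 4, 2, 5, 0])
// Unique strings (selectors, properties, static values).
const strings = ['body', 'color', 'red']
// Function refs (dynamic values), kept apart so the other two arrays stay monomorphic.
const functions = [props => props.color]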

@thysultan
Author

thysultan commented Jun 21, 2017

It would also mean we could reuse selectors, properties and property values, i.e.

body {
  color: red;
}

h1 {
  color: red;
}

Would produce

HEAP = ['body', 'color', 'red', 'h1']

[
RULE_START, 0,
SELECTOR, 0,
PROPERTY, 1,
VALUE, 2,
RULE_END, 2,

RULE_START, 3,
SELECTOR, 3,
PROPERTY, 1,
VALUE, 2,
RULE_END, 3
]
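The deduplication itself could be a tiny interning step during compilation. A minimal sketch, where the intern helper is hypothetical:

const HEAP = []

// Return the heap index of value, appending it only if it is not there yet.
function intern (value) {
  let index = HEAP.indexOf(value)
  if (index === -1) index = HEAP.push(value) - 1
  return index
}

intern('body')  // 0
intern('color') // 1
intern('red')   // 2
intern('h1')    // 3
intern('color') // 1 again, 'color' is reused

For a large heap a Map lookup would avoid the linear indexOf scan, but the idea is the same.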

@kof
Member

kof commented Jun 21, 2017

Oh so the HEAP array would become a collection of unique strings, neat! Not sure though how much it will bring after gzip, but def. worth considering!

@thysultan
Author

Yes, gzip might make the KB benefit a hard sell, but it should reduce the memory the JS VM has to allocate at runtime and the amount of code it has to parse.

@kof
Member

kof commented Jun 21, 2017

Yeah, if it doesn't slow down parsing, it's definitely worth optimizing!

@thysultan
Author

thysultan commented Jun 21, 2017

WIP of a compiler that produces this format from CSS strings: https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html

@kof
Member

kof commented Jun 22, 2017

How is it related to this issue?

@thysultan
Author

Yes, this is not for the bench, just a proof of concept of a compiler that generates this format from a string source; related issue: #16

@kof
Member

kof commented Jun 22, 2017

Ok I have added this approach to the bench (see fromTwoTyptedArrays fn) https://esbench.com/bench/592d599e99634800a03483d8

I didn't use Uint32Array because it slows things down a lot.

It is not very fast. It is possible that with much larger test data it will show better relative performance, but I think it's mostly a memory optimization, because this approach adds more lookups.

The "From Array of strings" test performs best so far.

@thysultan
Author

For the tests to be fair, you'd need to exclude the creation of the arrays from the bench and put it into the setup phase; otherwise you're also measuring the creation of the data structure, which in this case is meant to be a preprocess operation rather than a runtime operation.

@kof
Member

kof commented Jun 22, 2017

I do it for the other tests as well, so it's comparable.

@thysultan
Author

thysultan commented Jun 22, 2017

Since the different methods use very different data structures that are created on each run, the results would not be comparable: the creation of the data structures is meant to be a compile-time operation. This applies to fromArrayOfStrings as well.

@thysultan
Author

thysultan commented Jun 22, 2017

You're also measuring the cost of creating the function on every operation, i.e. return (input) => { ... }. This could rule out any chance of the VM generating optimized code, which means we might just be measuring a baseline (unoptimized) version of what the VM could produce.
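For illustration, this is the difference being described, in a hand-rolled micro-benchmark shape. benchNaive, benchHoisted, createToString and createData are made-up names, and it assumes a browser-style performance.now().

// Measures closure and data creation on every iteration.
function benchNaive (iterations, createToString, createData) {
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    const toString = createToString() // rebuilt on every run, never gets hot
    toString(createData())            // data structure also rebuilt on every run
  }
  return performance.now() - start
}

// Hoists the one-time work into setup, so only the hot path is measured.
function benchHoisted (iterations, createToString, createData) {
  const toString = createToString()
  const data = createData()
  const start = performance.now()
  for (let i = 0; i < iterations; i++) {
    toString(data)
  }
  return performance.now() - start
}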

@thysultan
Author

For example, this bench: https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html (see the console).

It avoids all the issues I mentioned above and shows that this bytecode format with a typed array is faster.

1873 ops, from typedArray
87 ops, from stylis
1638 ops, fromSingleArrayOfStrings

@kof
Member

kof commented Jun 22, 2017

You're also measuring the cost of creating the function on every operation, i.e. return (input) => { ... }. This could rule out any chance of the VM generating optimized code, which means we might just be measuring a baseline (unoptimized) version of what the VM could produce.

Can you elaborate on this specific case?

@kof
Member

kof commented Jun 22, 2017

For example, this bench: https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html (see the console).

On my machine, in Chrome, "fromSingleArrayOfStrings" is the fastest.

@thysultan
Author

thysultan commented Jun 22, 2017

Can you elaborate on this specific case?

The operation calls a function that returns another function; then useIt calls this function.

So you're also measuring the cost of creating the function that converts the data structure to a string, which varies across implementations. Paired with the fact that it's being regenerated on every operation, this means the function will also never get labeled as hot by the VM and produce optimized code, since every run both creates and executes a new function.

On my machine "fromSingleArrayOfStrings" is the fastest.

What were the results?

@kof
Member

kof commented Jun 22, 2017

The operation is calling a function that returns another function

Did you see that the first function is self-invoking? I scoped the static internals inside the closure. At bench time there is only one fn call.

then useIt calls this function.

I am not sure it does what it should; my idea was to avoid smart optimizations that might end up doing nothing. I am still not sure whether it does the job, or whether the useIt function was actually necessary in this case, but in any case all tests use it, so they are fair.

So, you're also measuring the cost of creating the function that converts the data-structure to a string which varies across implementations

I don't think so.

What were the results?

I just refreshed a couple of times; the results vary between 2500 and 3500, and this time the "bytecode" variant was faster several times. I am not sure the way you measure is correct. Did you try benchmark.js? I guess they have bulletproof logic for measuring.

@kof
Member

kof commented Jun 22, 2017

I just noticed you are using "while" in one case and "for" in another; can you try making both implementations as similar as possible?

@kof
Member

kof commented Jun 22, 2017

Regarding not measuring the data generation time: I realize that we would get better parsing results if we didn't generate the data for each test, but this would be less realistic, because in practice we would generate the data first and then parse it. Both steps are involved in the real-world scenario, right?

@kof
Member

kof commented Jun 22, 2017

Regarding warming up and getting optimized hot code: I am not sure this is the right approach here. I am not sure that function will be called often enough in a real-world use case to become hot. It's not a web server running the same operations all the time. It will parse how many? Let's say 100 style sheets on average, and it's done. Also, if done right, it will never parse all of them at once, but rather only those which are required for the current UI state.

For more realistic statistics we could count the number of CSS rules on popular websites.

@thysultan
Author

Did you see that the first function is self invoking?

I missed that :P

Just refreshed a couple of times, the results vary between 2500 and 3500

Same here.

I am not sure the way you measure is correct.

What part is incorrect?

Regarding not measuring the data generation time: I realize that we would get better parsing results if we didn't generate the data for each test, but this would be less realistic, because in practice we would generate the data first and then parse it. Both steps are involved in the real-world scenario, right?

No, you would not do this on every operation at runtime; the data structure would be created once. For example:

// data-structure, created once
class Main extends React.Component {
  render () { return null }
}

// X number of operations
ReactDOM.render(<Main />, document.body)
ReactDOM.render(<Main />, document.body)

Just noticed you are using "while" in one case and "for" in another, can you try making both implementations as similar as possible?

Sure, I copied your implementation of fromSingleArrayOfStrings from the ESBench to compare with it.

@kof
Member

kof commented Jun 22, 2017

What part is incorrect?

It seems correct, but very simplistic; I think jsperf did a lot of work to get better/more stable results.

No, you would not do this on every operation at runtime, the data-structure would be created once at runtime.

Yes, but it would also be parsed just once, I assume. After that it's either something static or some models.

@thysultan
Author

In the bench it's being created on every operation; since the bench runs this operation more than 10,000 times, it affects the results.

@kof
Member

kof commented Jun 22, 2017

In the bench it's being created on every operation; since the bench runs this operation more than 10,000 times, it affects the results.

You mean data generation for every test? Yes, that's what I was trying to reproduce. I assume a test case includes data generation + parsing in the real world.

@kof
Member

kof commented Jun 22, 2017

We can for sure make separate tests with and without data generation, just to be sure that data generation doesn't eat up most of the performance budget. But in the end, what matters is a test which has both data-structure creation and parsing. Basically, if data generation costs too much time but parsing is faster, it doesn't matter; what matters is which approach is faster when both are done.

@kof
Member

kof commented Jun 22, 2017

What might be really important is to do tests with more realistic data, meaning instead of 1 rule, using an average number of rules from some popular sites. Larger data sets might completely change the benchmark results.

@thysultan
Author

thysultan commented Jun 22, 2017

In the real world the data structure should probably be created just once for the lifetime of the page; the toString implementation, on the other hand, could be executed more than once.

What might be really important is to do tests with more realistic data, meaning instead of 1 rule, using an average rule amount from some popular sites.

I agree.

@kof
Member

kof commented Jun 22, 2017

In the real world the data structure should probably be created just once for the lifetime of the page; the toString implementation, on the other hand, could be executed more than once.

My assumption is that this format is intermediate, meaning it will be parsed only once and transformed into some other structure with models, optimized for updates. This is basically a structure optimized for read operations.

So even if toString is called more than once, the underlying models will handle that, not this format.

Though I might be wrong; we don't know that for sure. It's an assumption I was working with, and it would strongly depend on how the format is used in the wild.

@thysultan
Author

thysultan commented Jun 22, 2017

Meaning it will be parsed only once

Yes, but the following is what the bench does, and it is not what you want to happen:

while (i++ < 100) arr = [...] // operations

but rather

arr = []
while (i++ < 100) // operations

BTW, I removed the warmup from this bench https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html and changed the for to a while to align both implementations.

392 ops fromIntTypedArray
6 ops stylis
132 ops fromSingleArrayOfStrings

@kof
Member

kof commented Jun 22, 2017

Yes but the following is not what you want to happen, which is what the bench does.

I thought that's exactly what I want to happen, because I want each test to:

  1. declare data
  2. parse data

BTW, removed the warmup from this bench

Ok looks similar on my machine.

@kof
Member

kof commented Jun 22, 2017

If I remove data generation from each test and hoist it to the top, and suddenly one implementation becomes faster than another, it would simply mean that the data format was slowing one of them down; but that's what would happen in real life.

@thysultan
Author

thysultan commented Jun 22, 2017

I thought that's exactly what I want to happen, because I want each test to:

In a real implementation the data would be created once; the bench does not mimic this.

To compare it to React components, the bench currently looks like

class Bench extends React.Component {
  render () {
    class Data extends React.Component { 
      render() {return null}
    }

    return <Data />
  }
}

vs how it should be implemented

class Data extends React.Component { 
  render() {return null}
}
class Bench extends React.Component {
  render () {
    return <Data />
  }
}

@kof
Member

kof commented Jun 22, 2017

In a real implementation the data would be created once

Yes, and the parsing would also happen only once. That's an assumption I work with. We agree to disagree at this point.

@thysultan
Author

It has nothing to do with parsing in this case. In the first example React will destroy and recreate the Data component on every render. This is how a VM would handle it as well: the array will be created and destroyed (GC'd) on every run, which is an anti-pattern in both cases.

@kof
Member

kof commented Jun 22, 2017

We are stuck. Anybody help.

@kof
Member

kof commented Jun 22, 2017

Ok, we had a chat on Gitter and now it has become clear. The point that hadn't been clearly communicated is that the heap idea uses just one heap for the entire application, meaning all sheets will be compiled out of one heap array. That means that every cycle in the bench doesn't need to contain both the heap and the pointer map; it only needs the pointer map.

This approach will probably show a much better performance/memory footprint when there are a lot of rules/sheets.
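In other words, a sketch with made-up marker constants: one module-level heap for the whole application, and only the small per-sheet pointer maps are produced per compile cycle.

const RULE_START = 1, SELECTOR = 2, PROPERTY = 3, VALUE = 4, RULE_END = 5

// One heap shared by every sheet in the application.
const HEAP = ['body', 'color', 'red', 'h1' /* ...every unique string in the app */]

// Per sheet, only a flat pointer map into that heap is built.
const sheetA = new Uint32Array([RULE_START, 0, SELECTOR, 0, PROPERTY, 1, VALUE, 2, RULE_END, 2])
const sheetB = new Uint32Array([RULE_START, 3, SELECTOR, 3, PROPERTY, 1, VALUE, 2, RULE_END, 3])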

@geelen

geelen commented Jun 23, 2017

Yep this sounds like a reasonable optimisation, but it doesn't need to change the semantics of the format itself. As I've said before, talk of optimisation seems pretty premature if we don't even know for sure the total scope of the format.

Once we have a working reference implementation, we can look at designing some more comprehensive benchmarks to better understand the performance tradeoffs. Personally, I favour a single inline data structure because it simplifies streaming responses and handling the injection of chunks of CSS due to dynamic values. Any kind of heap structure can be built up from that at runtime if needed.

For SC, I've built a pathological performance case (a 129KB JSX file) which is a conversion of the tachyons.io homepage (64KB HTML + 83KB CSS). There's nothing dynamic and I haven't refactored the CSS at all; it's just a simple stress test on a decent-sized DOM. It might be worth doing something similar (or even converting this page, if your find/replace fu is strong) to test both the parse and serialisation phases on a few hundred components at once.

@kof
Member

kof commented Jun 23, 2017

Yeah, I totally agree that we need to understand the scope and tradeoffs better. Though I would continue the performance experiments, because performance is the primary driving force for this project, and I am also genuinely curious.

Also, I think we need to refactor the bench into a more realistic scenario like your pathological case.

@kof
Member

kof commented Jun 23, 2017

Btw, @thysultan built the "standard" generator into stylis, so now we have a tool that can take a decent amount of regular CSS and produce a standard cssinjs array for us, which we can then use for benchmarks.

@geelen

geelen commented Jun 23, 2017

Btw, @thysultan built the "standard" generator into stylis

Wait, already? Is he... a wizard?

@kof
Member

kof commented Jun 23, 2017

@thysultan can you please produce ~100kb for each variant of the bench? If you post them here I will update. For e.g.

@gaearon
Member

gaearon commented Jun 24, 2017

cc @trueadm, who's been thinking about similar things on the React side and might offer some thoughts on formats

@thysultan
Author

@kof ATM the stylis plugin only handles converting to this proposed stack/heap format; I will add support for the other formats in the bench.

@kof
Member

kof commented Jun 24, 2017 via email

@kof
Member

kof commented Jul 5, 2017

@geelen I was thinking about streamability, and, well, it can only be fully streamable without JavaScript in it, because functions etc. are no good for this. We never intended this format to be fully serialized, as it sits on top of JS. On the other hand, the heap can be transferred first, similar to HTTP headers, and then the body with all the refs. So yes, the header needs to be fully loaded before the body can be parsed in chunks. Maybe that is not so bad, considering the header would contain only unique values.
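A sketch of that "heap first, then body in chunks" shape, assuming hypothetical URLs, a hypothetical decodeChunk helper, and a module context where top-level await is available; this is not an agreed wire format, just an illustration of the ordering constraint.

// The heap is small and must be fully loaded up front, like headers.
const heap = await (await fetch('/styles/heap.json')).json()

// The body is a stream of (marker, ref) pairs and can be consumed chunk by chunk,
// because every ref already resolves against the loaded heap.
const reader = (await fetch('/styles/body.bin')).body.getReader()
let css = ''
while (true) {
  const { done, value } = await reader.read()
  if (done) break
  css += decodeChunk(value, heap) // hypothetical decoder for a Uint8Array chunk
}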

I think if this gives us serious perf advantages, we should go for it. It will also make up a bit for the many commas and reduce the payload.

Also, I can't think of any applications for a streaming parser for this; we will most probably embed this format into JS files or into the JS bundle.

@kof kof changed the title Explore a Stack/Heap approach Optimization: a Stack/Heap approach Oct 8, 2017
@kof kof mentioned this issue Oct 8, 2017
@kof
Member

kof commented Oct 8, 2017

I think that the heap optimization could even be done upon consumption, at the build stage. Only the consumer knows the full list of styles used in the bundle and can produce a list of properties without any duplicates, just once, e.g. as a webpack plugin.
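A sketch of what such a build-time pass could do, independent of any particular bundler's plugin API; dedupeSheets and the input shape are assumptions for illustration.

// Input: per-sheet arrays with inline strings.
// Output: one shared heap plus per-sheet arrays that reference it by index.
function dedupeSheets (sheets) {
  const heap = []
  const seen = new Map()
  const intern = value => {
    if (!seen.has(value)) seen.set(value, heap.push(value) - 1)
    return seen.get(value)
  }
  const compiled = sheets.map(sheet =>
    sheet.map(entry => (typeof entry === 'string' ? intern(entry) : entry))
  )
  return { heap, compiled }
}

const { heap, compiled } = dedupeSheets([
  ['body', 'color', 'red'],
  ['h1', 'color', 'red']
])
// heap     -> ['body', 'color', 'red', 'h1']
// compiled -> [[0, 1, 2], [3, 1, 2]]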

@evan-scott-zocdoc

Reminds me a bit of atomic CSS: breaking apart the constituent parts into a unique'd lookup table of sorts. Cool technique!
