Optimization: a Stack/Heap approach #22
Comments
Ah I got what you mean - using separate arrays for markers and props/values/selectors! |
That's an interesting idea, we should add a benchmark for this as well. |
We could in theory end up with 3 arrays: function refs, strings and markers. Each of them would be of one consistent type, and the markers would even be integers. |
It could also mean we could reuse selectors, properties and property values, i.e.

body {
  color: red;
}
h1 {
  color: red;
}

would produce

HEAP = ['body', 'color', 'red', 'h1']

[
  RULE_START, 0,
  SELECTOR, 0,
  PROPERTY, 1,
  VALUE, 2,
  RULE_END, 2,
  RULE_START, 3,
  SELECTOR, 3,
  PROPERTY, 1,
  VALUE, 2,
  RULE_END, 3
] |
Oh so the HEAP array would become a collection of unique strings, neat! Not sure though how much it will bring after gzip, but def. worth considering! |
Yes, gzip might make the KB benefit a hard sell, but it should reduce the memory the runtime JS VM has to allocate and the amount of code it has to parse. |
Yeah, if it won't slow down parsing, definitely worth optimizing! |
WIP of a compiler that produces this from css strings. https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html |
How is it related to this issue? |
Yes, this is not for the bench, just a proof of concept of a compiler that generates this format from a string source, related issue #16 |
Ok, I have added this approach to the bench (see the fromTwoTyptedArrays fn) https://esbench.com/bench/592d599e99634800a03483d8 I didn't use Uint32Array because it slows it down a lot. It is not too fast. It is possible that with much larger test data it will show better relative performance, but mostly I think it's a memory optimization, because we add more lookups when using this approach. The "From Array of strings" test performs best so far. |
For the tests to be fair you'd need to exclude the creation of the arrays from the bench and put that into the setup phase otherwise you're also measuring the creating of the data structure, which in this case is meant to be a preprocess operation instead of a runtime operation. |
I do it for the other tests as well, so it's comparable |
Since the different methods use very different data structures that are created on each run, it would not be comparable: the creation of the data structures is meant to be a compile-time operation, this would include |
You're also measuring the cost of creating the function on every operation i.e |
For example this bench https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html (see the console). It avoids all the points I mentioned above, showing that this bytecode format with a typed array is faster.
|
Can you elaborate on this specific case? |
On my machine in Chrome, "fromSingleArrayOfStrings" is the fastest |
The operation calls a function that returns another function. So you're also measuring the cost of creating the function that converts the data structure to a string, which varies across implementations. Paired with the fact that it's being generated on every operation, this means the function will never get labeled as hot by the VM and produce optimized code, since every run both executes and creates a new function.
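The distinction being made can be sketched in a few lines. This is an illustrative example, not the bench code itself; `makeSerializer`, `coldRun` and `hotRun` are hypothetical names.

```javascript
// A factory that produces the serializer function.
function makeSerializer() {
  return (heap, addr) => heap[addr];
}

// What the bench does: the function is re-created inside the measured
// operation, so the VM sees a fresh function object on every run and
// never marks it as hot.
function coldRun(heap) {
  const serialize = makeSerializer(); // created on every operation
  return serialize(heap, 0);
}

// What a compile-time setup looks like: the function is created once,
// so the same function object runs every time and can be optimized.
const serialize = makeSerializer();
function hotRun(heap) {
  return serialize(heap, 0);
}

const heap = ['body'];
console.log(coldRun(heap) === hotRun(heap)); // true: same result, different cost profile
```

Both variants return the same value; the difference is purely in what the benchmark ends up measuring.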
What were the results? |
Did you see that the first function is self-invoking? I scoped the static internals inside of the closure. At bench time there is only one fn call. // then useIt calls this function. I am not sure it does what it should, but my idea was to avoid smart optimizations which might end up doing nothing. I am still not sure if it does the job nor if
I don't think so.
Just refreshed a couple of times; the results vary between 2500 and 3500, and this time the "bytecode" variant was faster multiple times. I am not sure the way you measure is correct. Did you try benchmark.js? I guess they have bulletproof logic for measuring. |
Just noticed you are using "while" in one case and "for" in another, can you try making both implementations as similar as possible? |
Regarding not measuring the data generation time: I realize that we would have better parsing results if we didn't generate the data for each test, but this would be less realistic, because in practice we would generate the data first, then parse. Both steps are involved in the real-world scenario, right? |
Regarding warming up and getting optimized hot code: I am not sure this is the right approach here. I am not sure that function will be called often enough in a real-world use case to become hot. It's not a web server performing the same operations all the time. How many style sheets will it parse? Let's say 100 on average, and then it's done. Also, if done right, it will never parse all of them at once, but rather only those required for the current UI state. For more realistic statistics we could count the number of CSS rules on popular websites. |
I missed that :P
Same here.
What part is incorrect?
No, you would not do this on every operation at runtime; the data structure would be created once at runtime. For example
Sure, I copied your implementation of |
It seems correct, but very simplistic. I think jsperf did a lot of work to get better/more stable results.
Yes, but it would also be parsed just once, I assume. After that it's either something static or some models. |
In the bench it's being created on every operation, since the bench runs this operation > 10,000 times it affects the results. |
You mean data generation for every test? Yes, that's what I was trying to reproduce. I assume a test case includes data generation + parsing in the real world. |
We can certainly make separate tests with and without data generation, just to be sure that data generation doesn't eat up most of the performance budget. But in the end, what's important is a test that includes both data-structure creation and parsing. Basically, if data generation costs too much time but parsing is faster, it doesn't matter; what matters is which is faster when both are done. |
What might be really important is to do tests with more realistic data: instead of 1 rule, use an average rule count from some popular sites. Larger data sets might completely change the benchmark results. |
In the real world the data-structure should probably be created just once for the lifetime of the page, the toString implementation on the other hand could get executed more than once.
I agree. |
My assumption is that this format is intermediate, meaning it will be parsed only once and transformed into some other structure with models, optimized for updates. This is basically a structure optimized for read operations. So even if toString is done more than once, the underlying models will handle that, not this format. Though I might be wrong; we don't know that for sure. It's an assumption I was working with, and it would strongly depend on how the format is used in the wild. |
Yes, but the following is not what you want to happen, which is what the bench does:

while (i++ < 100) arr = [...] // operations

but rather:

arr = []
while (i++ < 100) // operations

BTW, removed the warmup from this bench https://rawgit.com/thysultan/stylis.js/master/plugins/bytecode/tests/index.html and changed the
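The two loop shapes being argued about can be made concrete. A hedged sketch with illustrative data (`buildData` and the function names are not from the real bench):

```javascript
// Stand-in for the data-structure creation step under discussion.
function buildData() {
  return ['body', 'color', 'red'];
}

// What the bench currently does: data is built inside the measured loop,
// so array creation cost is mixed into every iteration.
function benchInsideLoop(iterations) {
  let out = '';
  for (let i = 0; i < iterations; i++) {
    const arr = buildData(); // measured on every iteration
    out = arr.join('');
  }
  return out;
}

// What is being suggested: build once in a setup phase, then measure
// only the operation itself.
function benchHoisted(iterations) {
  const arr = buildData(); // setup phase, outside the measured loop
  let out = '';
  for (let i = 0; i < iterations; i++) {
    out = arr.join('');
  }
  return out;
}

console.log(benchInsideLoop(100) === benchHoisted(100)); // true
```

Both produce identical output; only the hoisted version isolates the operation being compared.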
|
I thought that's exactly what I want to happen. Because I want each test to: |
Ok looks similar on my machine. |
If I remove data generation from each test and hoist it to the top, and suddenly one implementation becomes faster than another, it would simply mean that the data format was slowing one of them down, but that's what would happen in real life. |
In a real implementation the data would be created once; the bench does not mimic this. To compare it to React components, the bench would look like

class Bench extends React.Component {
  render () {
    class Data extends React.Component {
      render() { return null }
    }
    return <Data />
  }
}

vs how it should be implemented:

class Data extends React.Component {
  render() { return null }
}

class Bench extends React.Component {
  render () {
    return <Data />
  }
} |
Yes, and the parsing would also happen only once. That's an assumption I work with. We agree to disagree at this point. |
It has nothing to do with parsing in this case; in the first example React will remove and re-create the component |
We are stuck. Anybody help. |
Ok, we had a chat on gitter and now it became clear. The point that hadn't been clearly communicated is that the heap idea uses just one heap for the entire application, meaning all sheets will be compiled out of one heap array. That means that every cycle in the bench doesn't need to contain both the heap and the pointers map; it only needs the pointers map. This approach will probably show a much better performance/memory footprint when there are a lot of rules/sheets. |
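The "one heap for the entire application" idea amounts to string interning. A minimal sketch, assuming a shared module-level heap and a per-sheet pointer array (`intern` is a hypothetical helper, not part of any proposed API):

```javascript
// One heap shared by every sheet in the application.
const HEAP = [];
const addrOf = new Map();

// Intern a string: return its existing heap address, or append it.
function intern(str) {
  if (!addrOf.has(str)) {
    addrOf.set(str, HEAP.length);
    HEAP.push(str);
  }
  return addrOf.get(str);
}

// Two sheets that share values only pay for them once in the heap;
// each sheet itself is just a small pointer map.
const sheetA = [intern('body'), intern('color'), intern('red')];
const sheetB = [intern('h1'), intern('color'), intern('red')];

console.log(HEAP);   // ['body', 'color', 'red', 'h1']
console.log(sheetA); // [0, 1, 2]
console.log(sheetB); // [3, 1, 2]
```

With many sheets, `color` and `red` are stored once no matter how often they are referenced, which is where the memory-footprint benefit would come from.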
Yep, this sounds like a reasonable optimisation, but it doesn't need to change the semantics of the format itself. As I've said before, talk of optimisation seems pretty premature if we don't even know for sure the total scope of the format. Once we have a working reference implementation, we can look at designing some more comprehensive benchmarks to better understand the performance tradeoffs. Personally, I favour a single inline data structure because it simplifies streaming responses and handling the injection of chunks of CSS due to dynamic values. Any kind of heap structure can be built up from that at runtime if needed. For SC, I've built a pathological performance case (a 129KB jsx file) which is a conversion of the tachyons.io homepage (64KB html + 83KB css). There's nothing dynamic and I haven't refactored the CSS at all; it's just a simple stress test on a decent-sized DOM. It might be worth doing something similar (or even converting this page, if your find/replace fu is strong) to test both the parse & serialisation phases on a few hundred components at once. |
Yeah, I totally agree that we need to understand the scope and tradeoffs better. Though I would continue experiments with performance, because performance is the primary driving force for this project and also I am genuinely curious. Also I think we need to refactor the bench to some more realistic scenario like your pathological case. |
Btw. @thysultan built the "standard" generator into stylis, so now we have a tool that can take a decent amount of regular CSS and produce a decent standard cssinjs array we can then use for benchmarks. |
Wait, already? Is he... a wizard? |
@thysultan can you please produce ~100kb for each variant of the bench? If you post them here I will update. For e.g. |
cc @trueadm who's been thinking about similar things on React side and might offer some thoughts on formats |
@kof ATM the stylis plugin only handles converting to this proposed stack/heap format, will add support for the other formats in the bench. |
we should unify our bench efforts in one repo |
@geelen I was thinking about streamability, and well, it can only be fully streamable without JavaScript in it, because functions etc. are no good for this. We never intended this format to be fully serializable, as it sits on top of JS. On the other hand, the heap can be transferred first, similar to HTTP headers, and then the body with all the refs. So yeah, the header needs to be fully loaded before the body can be parsed in chunks. Maybe that is not so bad, considering the header would contain only unique values. I think if this gives us serious perf advantages, we should go for it. It will also make up a bit for the many commas and reduce the payload. Also, I can't think of any applications for a streaming parser for this; we will most probably embed this format into JS files or into the JS bundle. |
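The header/body idea above can be sketched as a reader that receives the heap first and then consumes stack chunks as they arrive. A hedged illustration only: the marker values are assumed, chunk boundaries are assumed to fall on marker/address pairs, and `makeStreamingReader` is a hypothetical name.

```javascript
// The heap arrives first, like headers; stack chunks stream in afterwards.
function makeStreamingReader(heap) {
  let css = '';
  return {
    // Feed one chunk of [marker, addr, marker, addr, ...]
    write(chunk) {
      for (let i = 0; i < chunk.length; i += 2) {
        const op = chunk[i], addr = chunk[i + 1];
        if (op === 1) css += heap[addr] + '{';      // SELECTOR (assumed value)
        else if (op === 2) css += heap[addr] + ':'; // PROPERTY
        else if (op === 3) css += heap[addr] + ';'; // VALUE
        else if (op === 4) css += '}';              // RULE_END
      }
    },
    result: () => css,
  };
}

const reader = makeStreamingReader(['body', 'color', 'red']);
reader.write([1, 0, 2, 1]); // first chunk arrives
reader.write([3, 2, 4, 0]); // second chunk arrives later
console.log(reader.result()); // body{color:red;}
```

This also shows the constraint mentioned above: nothing in the body can be interpreted until the full heap has been received.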
I think that the heap optimization could even be done upon consumption, at build stage. Only the consumer knows the full list of styles used in the bundle and can produce a list of properties without any duplicates just once, e.g. as a webpack plugin. |
Reminds me a bit of atomic CSS: breaking apart the constituent parts into a unique'd lookup table of sorts. Cool technique! |
Instead of
producing
It could instead produce
Where the ints 0, 1, 2 are memory addresses for the corresponding values in the HEAP. This way you can use a flat Uint32Array typed array for maximum throughput on performance.