-
-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use ES6 collections for lookups in crawl(). #212
Use ES6 collections for lookups in crawl(). #212
Conversation
In a very large JSON schema (>7MiB), it was found that the overwhelming majority of time was spent on array lookups of the form `if (arr.indexOf(something) !== -1)`, i.e. existence checks. Changing a few lookup structures to sets and a map, with O(1) lookup time rather than O(N) array lookup, provided a 10x speedup. Unfortunately, the schema is proprietary and cannot easily be shared.
Calling in the big guns on this one. @P0lip what say you. Is there any way we could see the schema if an NDA is signed or something? We need test cases and the maintainers here (many of us from Stoplight) have no interest in doing anything sneaky. We like our job! :D |
Just realised a test is failing and it’s not currently working properly. I still think it’s a valid effort, but obviously I need to fix it. I’d cancel the PR and create a new one if you hadn’t already responded with a question.
Er, I can look into that; as an alternative, I might try to whip up a tool to anonymise the schema (changing all the property names to something random but consistent, say). |
No need to cancel anything lets just get it working. 🙌🏻 Either one! Anonymised would be amazing because we can try and distill a test case out of it. |
I’ve fixed the error (in both your test suite and my own use case). Unfortunately it seems I can’t hand the schema over untouched, at least not without undergoing Bureaucracy… Well, you can look at the changes (which are after all pretty minimal) and decide whether it’s Good Enough without a big hulking schema, or I will come back to this and try to find either a sample I can share or a good way of anonymising the schema (the challenge being to keep |
I think a similar issue was raised in #211, thus we already may have a nice file to verify against. It's tad smaller than yours, but still a significant one. I thought about using collections when that PR was introduced first, but I'm a bit hesitant when it comes to collections, because we aim to support IE11 (yeah, I know), and collections are not quite completely implemented in that browser. Would you mind verifying the commit above to see if it improves the situation on your front? I'm all in for merging this PR if we decide to drop support for IE11. |
No problem. And yes, that seems to improve my situation a great deal, too. Probably a little more than my fix (well, mine was pretty crude since I don’t know the internals), though it’s hard to be sure (my test isn’t that clean and precise, as I was looking for order-of-magnitude improvements, so it has a lot of other moving pieces to distract me). I suppose there might still be value in the other ES6 parts ( What are the odds of seeing that branch in a release any time soon? |
Yeah, totally. I'm in for bringing that in.
To be honest, I have a feeling we may move on and merge your PR |
@philsturgeon what's the default branch now? is it |
master should always be the default, beta was just there for us to see if semantic release was working. We can kill that off of the feature works. |
We should probably enable CI checks for each PR, if we can. |
I'm not expecting beta to live long, i think we should delete it as soon as we're past this current phase of performance improvements, so people can try comparing the two, then we delete the branch and get back to master merging. |
Sorry I misunderstood. We used to have checks enabled, I've put them back now. |
🎉 This PR is included in version 9.0.8 🎉 The release is available on:
Your semantic-release bot 📦🚀 |
In a very large JSON schema (>7MiB), it was found that the overwhelming majority of time was spent on array lookups of the form
if (arr.indexOf(something) !== -1)
, i.e. existence checks. Changing a few lookup structures to sets and a map, with O(1) lookup time rather than O(N) array lookup, provided a 10x speedup.Unfortunately, the schema is proprietary and cannot easily be shared.