Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ES6 collections for lookups in crawl(). #212

Merged
merged 2 commits into from
Feb 4, 2021

Conversation

haggholm
Copy link
Contributor

@haggholm haggholm commented Feb 3, 2021

In a very large JSON schema (>7MiB), it was found that the overwhelming majority of time was spent on array lookups of the form
if (arr.indexOf(something) !== -1), i.e. existence checks. Changing a few lookup structures to sets and a map, with O(1) lookup time rather than O(N) array lookup, provided a 10x speedup.

Unfortunately, the schema is proprietary and cannot easily be shared.

In a very large JSON schema (>7MiB), it was found that the overwhelming
majority of time was spent on array lookups of the form
`if (arr.indexOf(something) !== -1)`, i.e. existence checks. Changing
a few lookup structures to sets and a map, with O(1) lookup time rather
than O(N) array lookup, provided a 10x speedup.

Unfortunately, the schema is proprietary and cannot easily be shared.
@philsturgeon
Copy link
Member

Calling in the big guns on this one. @P0lip what say you.

Is there any way we could see the schema if an NDA is signed or something? We need test cases and the maintainers here (many of us from Stoplight) have no interest in doing anything sneaky. We like our job! :D

@haggholm haggholm changed the title Use ES6 collections for lookups in crawl(). WIP: Use ES6 collections for lookups in crawl(). Feb 3, 2021
@haggholm
Copy link
Contributor Author

haggholm commented Feb 3, 2021

Just realised a test is failing and it’s not currently working properly. I still think it’s a valid effort, but obviously I need to fix it. I’d cancel the PR and create a new one if you hadn’t already responded with a question.

Is there any way we could see the schema if an NDA is signed or something? We need test cases and the maintainers here (many of us from Stoplight) have no interest in doing anything sneaky. We like our job! :D

Er, I can look into that; as an alternative, I might try to whip up a tool to anonymise the schema (changing all the property names to something random but consistent, say).

@philsturgeon
Copy link
Member

No need to cancel anything lets just get it working. 🙌🏻

Either one! Anonymised would be amazing because we can try and distill a test case out of it.

@haggholm
Copy link
Contributor Author

haggholm commented Feb 3, 2021

I’ve fixed the error (in both your test suite and my own use case). Unfortunately it seems I can’t hand the schema over untouched, at least not without undergoing Bureaucracy… Well, you can look at the changes (which are after all pretty minimal) and decide whether it’s Good Enough without a big hulking schema, or I will come back to this and try to find either a sample I can share or a good way of anonymising the schema (the challenge being to keep $refs intact, which…well, when I’m submitting a PR to your repo, that seems pretty important).

@haggholm haggholm changed the title WIP: Use ES6 collections for lookups in crawl(). Use ES6 collections for lookups in crawl(). Feb 3, 2021
@P0lip
Copy link
Member

P0lip commented Feb 3, 2021

I think a similar issue was raised in #211, thus we already may have a nice file to verify against. It's tad smaller than yours, but still a significant one.
That said, I mitigated that perf cliff by avoiding processedObjects entirely.
I need to verify it against a few other specs to see whether it doesn't introduce a slowdown elsewhere, but so far it's been on par with 9.0.7, while still not introducing any cliff.
The commit is here stoplightio@55dee7e.

I thought about using collections when that PR was introduced first, but I'm a bit hesitant when it comes to collections, because we aim to support IE11 (yeah, I know), and collections are not quite completely implemented in that browser.
Although the usage we deal with here is likely to be perfectly fine in case of IE11, I still have some doubts.

Would you mind verifying the commit above to see if it improves the situation on your front?

I'm all in for merging this PR if we decide to drop support for IE11.

@haggholm
Copy link
Contributor Author

haggholm commented Feb 3, 2021

Would you mind verifying the commit above to see if it improves the situation on your front?

No problem. And yes, that seems to improve my situation a great deal, too. Probably a little more than my fix (well, mine was pretty crude since I don’t know the internals), though it’s hard to be sure (my test isn’t that clean and precise, as I was looking for order-of-magnitude improvements, so it has a lot of other moving pieces to distract me).

I suppose there might still be value in the other ES6 parts (parents and dereferencedCache)—my tests are too imprecise to tell—but not a very great deal.

What are the odds of seeing that branch in a release any time soon?

@P0lip
Copy link
Member

P0lip commented Feb 4, 2021

I suppose there might still be value in the other ES6 parts (parents and dereferencedCache)—my tests are too imprecise to tell—but not a very great deal.

Yeah, totally. I'm in for bringing that in.

What are the odds of seeing that branch in a release any time soon?

To be honest, I have a feeling we may move on and merge your PR
We have a test suite running in IE11, thus if this PR introduces any regression there, it should be caught.

@P0lip P0lip changed the base branch from master to beta February 4, 2021 11:23
@P0lip
Copy link
Member

P0lip commented Feb 4, 2021

@philsturgeon what's the default branch now? is it beta? I thought we changed it to beta, but for some reason GH still says it's master

@philsturgeon
Copy link
Member

master should always be the default, beta was just there for us to see if semantic release was working. We can kill that off of the feature works.

@P0lip P0lip changed the base branch from beta to master February 4, 2021 15:07
@P0lip
Copy link
Member

P0lip commented Feb 4, 2021

We should probably enable CI checks for each PR, if we can.
In any case, this is good to merge.

@philsturgeon
Copy link
Member

I'm not expecting beta to live long, i think we should delete it as soon as we're past this current phase of performance improvements, so people can try comparing the two, then we delete the branch and get back to master merging.

@philsturgeon
Copy link
Member

Sorry I misunderstood. We used to have checks enabled, I've put them back now.

@github-actions
Copy link

github-actions bot commented Jun 7, 2021

🎉 This PR is included in version 9.0.8 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants