Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terrible performance with big anyOf field #3692

Closed
4 tasks done
michal-kurz opened this issue May 24, 2023 · 23 comments · Fixed by #3721 or #3845
Closed
4 tasks done

Terrible performance with big anyOf field #3692

michal-kurz opened this issue May 24, 2023 · 23 comments · Fixed by #3721 or #3845
Labels
any-one-all-of Related to fixing anyOf, oneOf or allOf bug help wanted performance

Comments

@michal-kurz
Copy link
Contributor

michal-kurz commented May 24, 2023

Prerequisites

What theme are you using?

core

Version

5.7.0

Current Behavior

Since upgrade from v2 to v5, we're having a lot of trouble with performance on one of our bigger forms. The whole schema is a single oneOf field, with 86 different object options, and about 6000 lines total.

The form is basically unusable when typing, because validator.isValid() calls from inside Form->getDefaultFormState()->getClosestMatchingOption() and MuiSchemaField->componentDidUpdate()->getClosestMatchingOption() are clogging up the main thread completely:

image

I was hoping to solve those problems by using the discriminator property, as hinted in this comment, but it isn't making to be making any noticable difference - because isValid still gets called, even when using discriminant:

if (discriminatorField && has(option, [PROPERTIES_KEY, discriminatorField])) {
  const value = get(formData, discriminatorField);
  const discriminator = get(option, [PROPERTIES_KEY, discriminatorField], {});

  if (validator.isValid(discriminator, value, rootSchema)) {
    return i;
  }
}

I'm not sure why this is clogging the thread so badly even on strong computers, but it does. Not sure if its a bug, but it may very well be, since liveValidate runs completely OK on the form - and I don't see a reason why pre-selecting an option should be so much more demanding than liveValidating the whole form.

This modification removes my performance problems completely, although it's certainly not a suitable complete solution: https://github.com/michal-kurz/react-jsonschema-form/pull/2/files

Is there a bug with ajv validators? If not, is it appropriate to run this demanding validation in this manner? What do you think should be done about it, if anything?

Expected Behavior

No response

Steps To Reproduce

Paste this schema into playground - this is our real schema, with values replaced by hashes (to strip potentially sensitive information)

Environment

No response

Anything else?

No response

@michal-kurz michal-kurz added bug needs triage Initial label given, to be assigned correct labels and assigned labels May 24, 2023
@michal-kurz
Copy link
Contributor Author

michal-kurz commented May 24, 2023

There is also one very related concern we have, although it has nothing to do with performance:

Sometimes it happens to us, that the server-stored data for oneOf/anyOf fields become schema-invalid over time. This happens when
a. we alter the schema (for example adding a new required property) without migrating server-stored values, or
b. we modify the structure by means outside of rjsf (this happens very often to us, and we allow this by design)

In those cases, rjsf sometimes doesn't know which item to try matching (since all are invalid), and the wrong item gets displayed as selected in the oneOf/anyOf field.

We would absolutely love to have a "manual discriminator" mode, where discriminator would be the ultimate identifier of selected items. In this mode, we would be responsible for making absolutely sure that every item on the list has a unique discriminator (we're willing to accept the field breaking completely if we mess this up), but we would have the certainty of items always matching what we need, regardless of their validity.

It seems to me that implementing such "manual discriminator" mode could solve this issue by itself (by a mechanism very similar to the PR i linked above). I'm willing to implement it, because we've been struggling with this greatly, and those problems with wrong selected options keep coming back - we have spent well over 40 hours on tickets related to this in the last year alone.

@JoeTrixta
Copy link

Hi @michal-kurz thanks so much for bringing this up - we are facing the same issue when it comes to any complex form schema since the release change from anything over v4.2.3.

Ive checked out and built the branch you speak of and linked to a local example to check it works, wondering what your doing for the local discriminator? Or what your current discriminatorField field looks like?

Are you using OpenAPI? If so could you confirm your process? Trying to use this as a temp fix

@michal-kurz
Copy link
Contributor Author

michal-kurz commented May 24, 2023

Hey, @JoeTrixta !

Take everything I say here (including code) with a grain of salt, I'm not all that confident in what I'm doing. I'm also not very sure I understand your questions, but I'll try to answer you best I can, and we can go from there :)

I am using RJSF (react-jsonschema-form), which seems to implement some parts of the OpenAPI specification. So I am implicitly using OpenAPI in this regard, but not in any other way.

Discriminator seems to be one part of the OpenAPI specification adopted by RJSF. It works by manually specifying the name of a property which is unique for every item, and thus can be used to discern or "discriminate" different item types ( = tell apart from each other unambiguosuly).

You can see it in the schema I provided. By stating in my schema:

"discriminator": {
    "propertyName": "uniq_id"
}

I am specifying that uniq_id is my unique discriminating property. Then I add this property with unique value on every item.

Then, in the PR, I'm making a modification where if discriminating property is present, all other ways of finding the matching option are ignored. This way I skip the expensive validator.isValid() calls which kill my performance. It is important to note that if I declare discriminating property on the onyOf field, and then forget to specify the property on an option item, this option will never get matched.

The code in my PR is a proof of concept only. You definitely shouldn't use it in production without understanding what it does, and making sure it doesn't break anything - it probably does! But if you want to tinker with it to make it safe and usable, I believe this would be fairly easy to do - both my change and the surrounding function are quite straight-forward and self-contained.

@michal-kurz
Copy link
Contributor Author

Hey, @JoeTrixta, I'm back with more - I slept on it, and decided to move forward with this concept in our code-base asap, because it solves a high-priority bug in our codebase steming from this bug in rjsf.

I made a new PR - it functions extremely similarly to my original one. It tries to force-match first option which has the same discriminant as formData. If it fails to do that, it falls back to the default rjsf logic.

It may break some more advanced properties/usages of discriminant (hard to say, as discriminant is a very new feature, and is not yet documented) - you're probably not using those, and neither are we, but it's something that should be mentioned.

I will patch my changes into our project, and pass it to QA to test. If we don't find any issues, I will try to deploy this to production tomorrow. I think it's fairly safe for you to do the same, if you need to. If it's a nice-to-have for you, then you'll probably be better of just by waiting to see if we end up merging this mechanism to the library itself.

@heath-freenome
Copy link
Member

@michal-kurz I believe that what you are doing is very similar to what is happening in the getFirstMatchingOption() code. It probably makes sense for RJSF to provide a flag that allows users to go with the less precise getFirstMatchingOption() over the getClosestMatchingOption in situations like yours. That said, while what you provided in your PR works fine for enum based choices, it would not work when someone is using the const/title approach.

If you have the band-width we could talk about how to fix this in a more reusable way for every user of the library.

@heath-freenome heath-freenome added help wanted performance any-one-all-of Related to fixing anyOf, oneOf or allOf and removed needs triage Initial label given, to be assigned correct labels and assigned labels May 26, 2023
@evshi
Copy link
Contributor

evshi commented Jun 1, 2023

@michal-kurz do you also have any recursive refs? We're seeing something similar but our anyOf list is not large, a few dozen at most.

@heath-freenome from what I can tell, the issue seems to be that the schemas need to be compiled for each iteration or recursion, which is a pretty costly operation. The ajv cache is only leveraged if the schema has an $id field. Perhaps it's reasonable to cache by the schema hash here if there is no $id field? I have some local changes that seem to resolve the issue for me.

Also this is likely a separate issue, but since object schemas are augmented by getMatchingOption, I wasn't able to work around this with precompiled validators since there is no fallback if the schemas are changed at runtime.

@heath-freenome
Copy link
Member

@evshi Hmmm, now that I've built a schema hash function for precompiled schemas I think it may be possible to improve the AJV caching for the regular AJV8 validator. If I wasn't slammed in my regular job I'd do it myself. @evshi I'm curious what you came up with.

@heath-freenome
Copy link
Member

Also this is likely a separate issue, but since object schemas are augmented by getMatchingOption, I wasn't able to work around this with precompiled validators since there is no fallback if the schemas are changed at runtime.

@evshi The goal of the precompiled schema was to pre-run this augmentation and generate the precompiled versions. Are you seeing a different behavior? Is there an example where this isn't working? If so, please share it so that we can fix things.
Thanks

@evshi
Copy link
Contributor

evshi commented Jun 5, 2023

@heath-freenome here's a draft pr with my suggested changes to improve performance: #3721

I'll create a separate issue for the gaps between the precompile script and runtime for anyOf/allOf validation. Just to summarize quickly here:

  1. the precompile script doesn't visit subschemas under $defs, leading to broken refs
  2. the precompile script doesn't make the same augmentations to subschemas in getMatchingOption, which means the hashes don't match when looking up precompiled validators

@michal-kurz
Copy link
Contributor Author

@heath-freenome Hey Heath, sorry for the delay. I didn't really have the band-with to engage with this, but I have some now.

@evshi, I see you made a PR which might also help with the performance - If I understand it correctly, it might help with schemas without an $id prop. Unfortunately, the schema I ran into problems with does seem to have an $id - does that mean I should not expect any performance improvement from your enhancement?

If so, I'm definitely down for making a legitimate fix and streaming it into the main repo, @heath-freenome - right now I'm running on a patched dist of RJSF, which is far from ideal :)

I know my PR was pretty much a single-use-case hotfix, but I'm not really sure what you mean by "const/title approach" - do you mean approach to defining questions, or approach to defining a discriminator? Do you think it would be possible for me to join a slack channel with you, or something of that sorts, for more flexible communication?

Thanks :)

@michal-kurz
Copy link
Contributor Author

@evshi @heath-freenome Also, this discussion about $id brings me to one more thing. Our schemas jsons do have $id props, be we often slightly modify them at runtime before passing into <Form /> based on the use-case/formData. This means that sometimes we probably modify the schema on an already rendered form, but both differing versions of schema share the same $id - is this an anti-pattern, and a possible source bugs?

@michal-kurz
Copy link
Contributor Author

@michal-kurz do you also have any recursive refs? We're seeing something similar but our anyOf list is not large, a few dozen at most.

No @evshi, we don't have any recursive refs. But our schema is quite big

@michal-kurz
Copy link
Contributor Author

@nickgros Hey, why have you closed this? Afaik this is not completed! :(

@heath-freenome
Copy link
Member

heath-freenome commented Jun 16, 2023 via email

@nickgros nickgros reopened this Jun 16, 2023
@nickgros
Copy link
Contributor

Yep, was not intentional, sorry. It's because it was linked to #3721

@michal-kurz
Copy link
Contributor Author

Oh, I see! Thanks for re-opening! ❤️

@evshi
Copy link
Contributor

evshi commented Jun 17, 2023

@michal-kurz looking at your schema, it may still be helpful because your oneOf subschemas do not have $id properties so they are not automatically cached by ajv. As I understand it, getMatchingOption will check the validity of each formData against each subschema. If you have many data of the same oneOf schema, it can still save you performance by caching each subschema.

@marecica2
Copy link

Hi there, i also spend some time on this issue and found a suspect. When i reverted this commit #3575 the lag basically disappeared. From the pr it wasnt even clear to me enough why that change and if there is some side-effect

@heath-freenome
Copy link
Member

That fix was made to support the precompiled validators changes that went in later. There was an optimization done in #3721 that should have reduce the performance drop cause by the original change

@marecica2
Copy link

Thanks for reply @heath-freenome. Initially tried the discriminator approach also, but with no significant improvement. It was caching main schema either by $id or hasdh but somewhere inside recursion it was validating some partial subschemas based on selection of matching options of oneOf array options and it was just too many of them each time.
I will try to share to playground sample.

@heath-freenome
Copy link
Member

@marecica2 , @michal-kurz is working on an optimization based on the discriminator approach that hopefully speeds things up.

michal-kurz added a commit to michal-kurz/react-jsonschema-form that referenced this issue Aug 29, 2023
…() calls in getMatchingOption and getClosestMatchingOption.ts for improved performance
@michal-kurz
Copy link
Contributor Author

I just opened a PR: #3845 :)

heath-freenome pushed a commit that referenced this issue Oct 5, 2023
…in getMatchingOption and getClosestMatchingOption for improved performance (#3845)

* fix: fixes #3692 - simple discriminator now bypasses isValid() calls in getMatchingOption and getClosestMatchingOption.ts for improved performance

* chore: update changelog

* docs: add jsdoc to getOptionMatchingSimpleDiscriminator

* test: add missing test for getFirstMatchingOption non-trivial discriminator
@igorbrasileiro
Copy link
Contributor

Is there a recommended approach to adding $id in the schema? I added $id to all my definitions, and the profiling results showed improved performance.

After adding $id to definitions: Two largest tasks: ~1.15s and ~750ms
After

Before adding $id: Two largest tasks: ~3.16s and ~2.55s
Before

However, after introducing $id, rjsf seems to enter a loop when clicking on an anyOf property that's deeply nested (not at the root level) in the form. Has anyone encountered this issue, or is there a known workaround?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment