Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a different way to store MCP reports instead of orderable child nodes #2507

Closed
Joscorbe opened this issue Jan 12, 2021 · 5 comments
Closed

Comments

@Joscorbe
Copy link

Joscorbe commented Jan 12, 2021

We detected a case of the execution of MCP that generated a huge report of more than 30k rows. The report is generated using orderable child nodes, which are extremely inefficient when the amount of nodes is too big.

When a node is added or removed, the list is written entirely, updating the sequence of the orderable child nodes in a hidden property. This means, the bigger the list is, the more expensive the operation gets. When this happens, the previous list becomes garbage, which will be cleanup eventually by the Revision Garbage Collector.

Our proposal is to store the report in a different way, avoiding the creation of this content structure, for example a binary property, that would be a lot more efficient.

In our example, the revision garbage collection was failing with a timeout because with the amount of garbage it was unable to make any progress. We needed to scale up the to make it pass and clean up.

@davidjgonzalez
Copy link
Contributor

@Joscorbe would you be interested in contributing? The original MCP author isn't active in this project anymore :(

@joerghoh joerghoh self-assigned this Jun 21, 2021
@joerghoh
Copy link
Contributor

The naive approach of replacing the nt:unstructured nodetype in the row node by oak:unstructured is not going to work, because otherwise the resulting report has a changed (random) order. As the nodes below rows (which contain the actual row data) are numbered, it would be possible to reconstruct the order from there.

But as suggested it's probably more efficient not to store the report in this format alltogether, but rather store the resulting file.

@joerghoh
Copy link
Contributor

#2669 for the PR

davidjgonzalez added a commit that referenced this issue Aug 31, 2021
* #2507 add GenericBlobReport to save reports in a more efficient way

Co-authored-by: david g <[email protected]>
Co-authored-by: Konrad Windszus <[email protected]>
@stale
Copy link

stale bot commented Jan 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 9, 2022
@Joscorbe
Copy link
Author

I'm closing this issue, since it seems resolved (and tagged wrongly as wontfix).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants