Performance when Generating Newspaper Processes #5093
Merged
Related and possibly fixed issues:
The problem is caused by slow indexing of projects in Elasticsearch. The following snippet shows how a project with many processes is indexed at the moment:
For large projects with more than 10,000 processes, the search itself is still fast (Elasticsearch reports a query time of 2 ms); however, the JSON document becomes huge and takes a long time to parse. During newspaper generation, the corresponding project is saved again after each new newspaper process is generated, so potentially hundreds of times. Each time, the project is indexed again with a single new entry added to its list of processes.
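To make the cost concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, a project that already has 10,000 processes and a generation run that adds 300 more; the class and method names are hypothetical and not part of the code base. It counts how many process entries are serialized in total when the project document is re-indexed after every new process.

```java
// Illustrative estimate only; names and numbers are assumptions.
final class ReindexCost {

    /**
     * Total number of process entries written to the index when the
     * project is re-indexed once after each newly generated process.
     */
    static long entriesSerialized(long existingProcesses, long newProcesses) {
        long total = 0;
        for (long i = 1; i <= newProcesses; i++) {
            // The i-th save serializes all earlier processes plus the new one.
            total += existingProcesses + i;
        }
        return total;
    }

    public static void main(String[] args) {
        // 300 saves of a project that starts with 10,000 processes.
        System.out.println(entriesSerialized(10_000, 300)); // prints 3045150
    }
}
```

Under these assumptions, about three million list entries are serialized to add only 300 processes, whereas a document with a fixed set of fields would be written 300 times at constant size.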
The proposed solution removes the list of processes from the index mapping and replaces it with a property `hasProcesses`, which is needed so that the "delete project" icon can still be disabled when showing a list of projects.
A disadvantage of this solution is that projects can no longer be searched for by the names of their processes. To my knowledge, the user interface currently does not support a keyword-based search for projects anyway. However, if such a search scenario should be supported in the future, the next best solution would be to change the newspaper generation task so that it does not save the project multiple times while generating new processes (which has problems of its own, e.g., an inconsistent database during the generation task).
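The difference between the two document shapes can be sketched as follows. This is a minimal illustration, not the actual indexing code: the class, the method names, and every field name other than hasProcesses are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two index document shapes.
final class ProjectIndexDocument {

    /** Old shape: the document grows with every process of the project. */
    static Map<String, Object> withProcessList(String title, List<String> processTitles) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("processes", List.copyOf(processTitles));
        return doc;
    }

    /** New shape: a single boolean, so the size no longer depends on the process count. */
    static Map<String, Object> withFlag(String title, List<String> processTitles) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("hasProcesses", !processTitles.isEmpty());
        return doc;
    }

    /** The delete icon only needs to know whether any process exists. */
    static boolean deleteEnabled(Map<String, Object> doc) {
        return !Boolean.TRUE.equals(doc.get("hasProcesses"));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = withFlag("Newspaper", List.of("Issue 1", "Issue 2"));
        System.out.println(doc + " deleteEnabled=" + deleteEnabled(doc));
    }
}
```

The flag carries exactly the information the project list needs, which is why the process names can be dropped from the document without affecting the delete icon.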
A positive side effect of this solution is that both the indexing time and the query time for projects have improved, which speeds up re-indexing all projects and even loading the dashboard user interface.
@solth
This might also be a problem in the Hibernate-Search branch, see line:
effective-webwork@1a5de7d#diff-5f7246e37b075cd985e0d6b3bdeeb919b5368811377ab0a3517df08897f3c576R111