Improve performance when saving and indexing processes, tasks, etc. #5368

Closed
thomaslow opened this issue Sep 28, 2022 · 0 comments · Fixed by #5371

Comments

@thomaslow
Collaborator

tl;dr: I propose using a queue to collect save operations within the business logic and triggering the saving of multiple objects in one go at the end.

When saving things in Kitodo (especially processes and tasks), a number of related entities often need to be saved and indexed again as well. This can quickly get out of hand. Let's say there are 3 processes:

  1. Grandparent process
  2. Parent process
  3. Child process

Updating a task in the child process also affects the parent and grandparent processes. This is already handled in the save method of the ProcessService, which uses a recursive call to save parent processes. In addition, parent processes are saved twice (once before saving the child and again after the child has been saved). In total, 9 actual database and indexing operations are triggered in the above scenario: the grandparent process is saved 6 times, the parent process twice, and the child once.
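To illustrate why this grows so quickly, here is a deliberately simplified Java model of the "save the parent before and after the child" pattern. `Proc`, `NaiveRecursiveSave` and the `writes` counter are hypothetical stand-ins, not Kitodo's ProcessService. The model does not try to reproduce the exact counts above; it only shows that this pattern on its own doubles the number of writes with every level of nesting.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a Kitodo process: only a title and an optional parent.
class Proc {
    final String title;
    final Proc parent;

    Proc(String title, Proc parent) {
        this.title = title;
        this.parent = parent;
    }
}

class NaiveRecursiveSave {
    // Counts how often each process is written to the database and index.
    final Map<String, Integer> writes = new LinkedHashMap<>();

    // Models "save the parent before and after the child". Each call to save(parent)
    // recurses again, so a process k levels above the child is written 2^k times.
    void save(Proc p) {
        if (p.parent != null) {
            save(p.parent);
        }
        writes.merge(p.title, 1, Integer::sum); // stands in for one DB + index operation
        if (p.parent != null) {
            save(p.parent);
        }
    }

    public static void main(String[] args) {
        Proc grandParent = new Proc("grandparent", null);
        Proc parent = new Proc("parent", grandParent);
        Proc child = new Proc("child", parent);

        NaiveRecursiveSave model = new NaiveRecursiveSave();
        model.save(child);
        System.out.println(model.writes); // {grandparent=4, parent=2, child=1}
    }
}
```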

In addition, the business logic code is not very structured in this regard. Some methods call the BaseDAO.save methods more or less whenever they like. This makes sense when those methods are called independently, but not when they are called as part of bigger batch updates. For example, increasing the current task status (which involves assigning a user to the next task, potentially closing a prior task, etc.) triggers multiple calls to the save method for the same task. If this method also triggered a process save (as required for pull request #5360), you could get a coffee before anything is saved.

What can we do about it?

In this ticket I would like to propose adapting the general save strategy in Kitodo so that save operations can be deferred (enqueued) until the respective business logic is done; afterwards, all enqueued objects are saved and indexed in one go. In my opinion, this should greatly improve performance when saving and, at the same time, keep the convenience of "sprinkling" save calls into the business logic code wherever appropriate.
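A minimal sketch of the deferral idea, assuming a thread-bound scope: save calls made inside `DeferredSaves.inScope(...)` are collected and executed after the business logic returns, while calls made outside any scope still save immediately, so methods that are also used independently keep working. `DeferredSaves`, `inScope` and the `saver` callback are hypothetical names; in Kitodo the callback would presumably wrap the existing save and indexing logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: save requests made while a scope is open are deferred and
// executed when the outermost scope ends; outside any scope they run immediately.
final class DeferredSaves {
    private static final ThreadLocal<List<Runnable>> PENDING = new ThreadLocal<>();

    private DeferredSaves() {
    }

    // Called from business logic wherever a save seems appropriate.
    static <T> void save(T object, Consumer<T> saver) {
        List<Runnable> pending = PENDING.get();
        if (pending == null) {
            saver.accept(object); // no scope open: behave like a direct save
        } else {
            pending.add(() -> saver.accept(object)); // scope open: defer until the end
        }
    }

    // Runs the business logic, then performs all deferred saves in one go.
    // If the business logic throws, the deferred saves are discarded.
    static void inScope(Runnable businessLogic) {
        if (PENDING.get() != null) {
            businessLogic.run(); // nested scope: only the outermost scope flushes
            return;
        }
        List<Runnable> pending = new ArrayList<>();
        PENDING.set(pending);
        try {
            businessLogic.run();
        } finally {
            PENDING.remove();
        }
        pending.forEach(Runnable::run);
    }
}

class DeferredSavesDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        DeferredSaves.inScope(() -> {
            DeferredSaves.save("task 1", t -> log.add("saved " + t));
            log.add("business logic continues");
            DeferredSaves.save("task 2", t -> log.add("saved " + t));
        });
        System.out.println(log); // [business logic continues, saved task 1, saved task 2]
    }
}
```

Deduplication is intentionally left out of this part of the sketch; it only shows how save calls can stay where they are in the code while their execution moves to the end of the business logic.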

For each object type (process, task, etc.) there should be a queue of "save" intentions that are deduplicated by object id. A recursive save of a process could then easily generate 9 save intentions (6 of them for the grandparent process in the scenario above), but thanks to the deduplication each process is actually saved only once when a final "save-and-index-all-queued-objects" call is triggered.
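The deduplicated per-type queue could look roughly like the following sketch. `Identifiable`, `SaveQueue`, `DemoProcess` and the `saveAndIndex` callback are hypothetical; keying buckets by runtime class and id, letting the most recently enqueued instance win, and rejecting objects without an id are assumptions of this sketch, not decisions from the proposal. The demo enqueues the same 6 + 2 + 1 save intentions as the scenario above and ends up with only 3 actual saves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical minimal contract: anything that can be queued has a database id.
interface Identifiable {
    Integer getId();
}

// Hypothetical queue of "save intentions": one bucket per object type,
// deduplicated by id so each object is saved at most once per flush.
final class SaveQueue {
    private final Map<Class<?>, Map<Integer, Identifiable>> buckets = new LinkedHashMap<>();

    void enqueue(Identifiable object) {
        // Objects without an id (not yet persisted) cannot be deduplicated; this sketch rejects them.
        Integer id = Objects.requireNonNull(object.getId(), "object has no id yet");
        // Same type and id again: still one entry, the latest instance replaces the earlier one.
        buckets.computeIfAbsent(object.getClass(), k -> new LinkedHashMap<>()).put(id, object);
    }

    int size() {
        return buckets.values().stream().mapToInt(Map::size).sum();
    }

    // Saves and indexes every queued object exactly once, then empties the queue.
    void flush(Consumer<Identifiable> saveAndIndex) {
        List<Identifiable> toSave = new ArrayList<>();
        buckets.values().forEach(bucket -> toSave.addAll(bucket.values()));
        buckets.clear();
        toSave.forEach(saveAndIndex);
    }
}

// Hypothetical stand-in for a process entity.
class DemoProcess implements Identifiable {
    private final Integer id;

    DemoProcess(Integer id) {
        this.id = id;
    }

    @Override
    public Integer getId() {
        return id;
    }
}

class SaveQueueDemo {
    public static void main(String[] args) {
        SaveQueue queue = new SaveQueue();
        DemoProcess grandParent = new DemoProcess(1);
        DemoProcess parent = new DemoProcess(2);
        DemoProcess child = new DemoProcess(3);
        // Six save intentions for the grandparent, two for the parent, one for the child.
        for (int i = 0; i < 6; i++) {
            queue.enqueue(grandParent);
        }
        queue.enqueue(parent);
        queue.enqueue(parent);
        queue.enqueue(child);
        System.out.println(queue.size()); // 3

        List<Integer> saved = new ArrayList<>();
        queue.flush(p -> saved.add(p.getId()));
        System.out.println(saved); // [1, 2, 3]
    }
}
```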

What do you think?
