tl;dr: I propose using a queue to collect save operations within the business logic and to trigger saving multiple objects in one go at the end.
When saving things in Kitodo (especially processes and tasks), a number of related entities often need to be saved and indexed again. This can quickly get out of hand. Let's say there are 3 processes:
Grand Parent process
Parent process
Child process
Updating a task in the child process also affects the parent and grandparent process. This is already handled in the save method of the ProcessService, which recursively saves parent processes. In addition, parent processes are saved twice (before the child is saved and again afterwards). In total, 9 actual database and indexing operations are triggered in the above scenario: the grandparent process is saved 6 times, the parent process twice, and the child once.
In addition, the business logic code is not very structured in this regard. Some methods call BaseDAO.save whenever they see fit. This makes sense when they are called independently, but not when they are called as part of bigger batch updates. For example, increasing the current task status (which involves re-assigning a new user to the next task, potentially closing a prior task, etc.) triggers multiple calls to the save method of the same task. If this method also triggered a process save (as required for pull request #5360), you could get a coffee before anything is saved.
What can we do about it?
In this ticket I would like to propose adapting the general save strategy in Kitodo so that save operations can be deferred or enqueued until the respective business logic is done, and all enqueued objects are then saved and indexed in one go. In my opinion, this should greatly improve performance when saving things while keeping the convenience of "sprinkling" save calls into the business logic code wherever appropriate.
For each object type (process, task, etc.) there should be a queue containing "save" intentions that are deduplicated by object id. A recursive save of a process could easily generate many save intentions for a grandparent process, but due to the deduplication, the process is actually saved only once when a final "save and index all queued objects" call is triggered.
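Such a queue could be sketched roughly like this (class and method names are made up for illustration, not existing Kitodo API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a deduplicating save queue: save intentions
// are keyed by object id, so repeated enqueue calls for the same
// object collapse into a single pending save.
class SaveQueue<T> {

    // LinkedHashMap preserves enqueue order while deduplicating by id.
    private final Map<Integer, T> pending = new LinkedHashMap<>();

    // Record the intention to save an object; duplicates are merged.
    void enqueue(int id, T object) {
        pending.put(id, object);
    }

    // Save and index each distinct queued object exactly once;
    // returns the number of actual save operations performed.
    int flush() {
        int saved = 0;
        for (T object : pending.values()) {
            // here the real code would call e.g. dao.save(object)
            // and trigger re-indexing of that object
            saved++;
        }
        pending.clear();
        return saved;
    }
}
```

With such a queue, enqueuing the grandparent six times, the parent twice, and the child once would result in only three actual save-and-index operations at flush time.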
What do you think?