Make Process Table Columns Sortable #5360
Columns that were previously calculated on-the-fly are now added to the Elasticsearch index and calculated at indexing time instead, e.g., the user name of the user that last worked on a task of the process. The code that calculated this information has been moved to the Kitodo-DataManagement project in order to be available at indexing time.
 * @return the list of tasks of the process (and potentially its children)
 */
- private static List<Task> getListOfTasksForProgressCalculation(Process process) {
+ private static List<Task> getListOfTasksForProgressCalculation(Process process, Boolean considerChildren) {
Why are you using the nullable Boolean class instead of the primitive boolean, as the values should only be true or false?
Hi Henning, I didn't really think about it. Fixed it below.
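For context on the point raised above: a `Boolean` parameter admits a third value, `null`, and auto-unboxing `null` in a condition throws a `NullPointerException`. A minimal sketch of the difference follows; the class and method names are hypothetical and not taken from this pull request.

```java
final class BooleanParameterDemo {

    // Hypothetical helper with a boxed parameter: callers may pass null.
    static String describeBoxed(Boolean considerChildren) {
        // Auto-unboxing happens here and throws NullPointerException for null.
        return considerChildren ? "with children" : "own tasks only";
    }

    // Same helper with a primitive parameter: null cannot be passed at all,
    // so the compiler rules out the failure case.
    static String describePrimitive(boolean considerChildren) {
        return considerChildren ? "with children" : "own tasks only";
    }

    // Returns true if calling the boxed variant with null fails on unboxing.
    static boolean boxedFailsOnNull() {
        try {
            describeBoxed(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

With the primitive signature, a call such as `describePrimitive(null)` does not compile, which is the behaviour one usually wants for a plain true/false flag.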
@solth In order for the process status to be sortable, it needs to be kept up-to-date in the ElasticSearch index whenever a task status (OPEN, DONE, INWORK, LOCKED) is changed. This requires that the accompanying process is re-indexed whenever a task status is changed. This conflicts with a pull request from last year, see #4543, which presumably improved performance when saving tasks. How do we deal with this?
Edit: the same applies to "lastEditingUser", which of course also depends on changes to tasks of a process.
…s via a custom sql query.
…ery for task progress calculation.
…alculation to improve compatibility with h2database.
…d by h2 database.
The performance problem mentioned in the previous comment is now fixed by calculating the process status via a custom SQL query. This query makes it possible to efficiently count how many tasks have a certain state (open, locked, inwork, done), even for parent processes that may have hundreds of child tasks (e.g. newspaper processes). Previously, this counting would take more than 1 second for parent processes with many children. The SQL query uses an SQL statement ( In case the SQL query does not work, there is a chance that the resulting exception is intercepted and the previous bean-based status calculation is used as a fallback. However, this depends on how the query fails. I also added a Selenium test to check that sorting processes by state works.
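The fallback behaviour described above could look roughly like the following sketch. It is not the code from this pull request: the class, the method names and the use of `Supplier` are assumptions for illustration only, and the "fast" counter merely stands in for the SQL-based count.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

final class StatusCountFallbackDemo {

    // Task states as named in this thread.
    enum TaskStatus { DONE, INWORK, OPEN, LOCKED }

    // Try the fast (e.g. SQL-based) counter first; if it throws a runtime
    // exception, fall back to the slower bean-based counter.
    static Map<TaskStatus, Integer> countWithFallback(
            Supplier<Map<TaskStatus, Integer>> fastCounter,
            Supplier<Map<TaskStatus, Integer>> fallbackCounter) {
        try {
            return fastCounter.get();
        } catch (RuntimeException e) {
            // Only failures that surface as RuntimeException end up here,
            // so whether the fallback runs depends on how the query fails.
            return fallbackCounter.get();
        }
    }

    // Stand-in for a bean-based count over an in-memory list of task states.
    static Map<TaskStatus, Integer> countInMemory(Iterable<TaskStatus> states) {
        Map<TaskStatus, Integer> counts = new EnumMap<>(TaskStatus.class);
        for (TaskStatus status : TaskStatus.values()) {
            counts.put(status, 0);
        }
        for (TaskStatus status : states) {
            counts.merge(status, 1, Integer::sum);
        }
        return counts;
    }
}
```

In this sketch an error that is not a `RuntimeException` would still propagate to the caller, which illustrates why the fallback only covers some kinds of failure.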
…orted by state and duration.
Merged with master so that GitHub only shows code changes relative to the current master.
So that means re-indexing processes whenever a task is saved does not impose any performance problems anymore? Can you elaborate a little more how using a recursive SQL query helps in this case? If I am not mistaken, #4543 improved performance by reducing read/write operations on the index, not on the database. How does an optimised SQL query have an effect on that?
Thank you for this pull request. I tested it and it works very well with the small list of processes I have on my local development system. I cannot say whether the changes concerning indexing have any negative impact on the performance on larger systems with many processes, though.
On the code side I found only a handful of minor issues like a few typos and unused imports. I am unsure about how progress for processes without own tasks is calculated. Perhaps you could just comment on my question (see below) and elaborate a little more on that specific part.
Kitodo-DataManagement/src/main/java/org/kitodo/data/database/persistence/TaskDAO.java
...Management/src/main/java/org/kitodo/data/elasticsearch/index/converter/ProcessConverter.java
Map<TaskStatus, Integer> counts = countTaskStatusOfProcess(process, considerChildren);
Integer total = counts.values().stream().mapToInt(Integer::intValue).sum();

// report processes without any tasks as if they had a single locked task
I am not sure I understand this. Does this mean the task progress of a process without own tasks is always encoded as "100% locked tasks"? Shouldn't it instead be illegal (e.g. throw an exception) to try to determine the "task progress" of a process without workflow/tasks (when the parameter considerChildren is false)?
I'm not sure why this was implemented this way. This was implemented the same before in ProcessService.
Does this mean the task progress of a process without own tasks is always encoded as "100% locked tasks"?
Yes. Unless considerChildren=true and the process has child processes with tasks that are not locked.
Shouldn't it instead be illegal (e.g. throw an exception) to try to determine the "task progress" of a process without workflow/tasks (when the parameter considerChildren is false)?
Maybe.
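To make the behaviour under discussion concrete, here is a minimal sketch of how status counts could be turned into the 12-character progress string mentioned in this review. It is not the ProcessConverter implementation. It assumes four zero-padded three-digit percentages; the first segment being "done" and the last being "locked" follows from the examples in this thread, while the order of the two middle segments (inwork, open), the truncating division and all names are assumptions.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

final class CombinedProgressDemo {

    // Declaration order doubles as the assumed order of the encoded segments.
    enum TaskStatus { DONE, INWORK, OPEN, LOCKED }

    static String combinedProgress(Map<TaskStatus, Integer> counts) {
        Map<TaskStatus, Integer> effective = new EnumMap<>(TaskStatus.class);
        for (TaskStatus status : TaskStatus.values()) {
            effective.put(status, counts.getOrDefault(status, 0));
        }
        int total = effective.values().stream().mapToInt(Integer::intValue).sum();

        // Mirror the rule quoted above: a process without any tasks is
        // reported as if it had a single locked task.
        if (total == 0) {
            effective.put(TaskStatus.LOCKED, 1);
            total = 1;
        }

        StringBuilder encoded = new StringBuilder();
        for (TaskStatus status : TaskStatus.values()) {
            // Integer division truncates, so the segments need not sum to 100.
            int percent = effective.get(status) * 100 / total;
            encoded.append(String.format(Locale.ROOT, "%03d", percent));
        }
        return encoded.toString();
    }
}
```

Under these assumptions a process without tasks yields the all-locked value, which can never equal the all-done value, so any check that compares against "completed" skips such processes.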
@@ -203,7 +204,7 @@ private boolean startExport(Process process, LegacyMetsModsDigitalDocumentHelper

  private boolean exportCompletedChildren(List<Process> children) throws DataException {
      for (Process child:children) {
-         if (processService.getProgress(child.getTasks(), null).equals(COMPLETED) && !child.isExported()) {
+         if (ProcessConverter.getCombinedProgressAsString(child, false).equals(COMPLETED) && !child.isExported()) {
Wouldn't calling ProcessConverter.getCombinedProgressAsString with second parameter considerChildren = false on a parent process without own tasks always result in "000000000100", i.e. 100% locked (as opposed to the expected "100000000000", i.e. 100% completed)?
If this list of children contains processes that have no tasks, they are not caught by this if statement, correct. I don't know what this export code is doing, which is why I didn't change the implementation of how the progress status is calculated.
No, there are still performance problems related to the general save strategy in Kitodo, as outlined in #5368. The pull request #5371 slightly improves on this issue for processes with parents. Since this pull request requires that processes are re-indexed when their task status changes, it doesn't make things easier, but there is no other way at the moment.
My original assessment of the problem in this comment was a bit incomplete. The process of a task was already re-indexed when its task status changed (except in one case, see line 94 of the WorkflowControllerService.java). So, the conflict with #4543 seems to be resolved without reverting it. Still, without always re-indexing a process whenever a task changes, I cannot rule out that there is some code somewhere that modifies a task outside of the logic of
Besides that, the performance problem of this pull request was caused by two issues:
I hope this clears things up a bit.
Co-authored-by: Arved Solth <[email protected]>
@solth I committed all of your suggested changes. Thank you for your review.
Thanks for implementing the change requests. I still want to make a few tests, so I haven't merged this pull request, yet.
Since some of your other PRs have been merged in the meantime, a small code conflict arose. Could you resolve this conflict while I conduct my tests?
Related Issues:
This pull request makes all columns of the process table sortable. In order to achieve this, the data for each column needs to be available as a property in the Elasticsearch index. Some columns were previously generated on-the-fly, such that sorting them was not possible.
These column values are now calculated at indexing time. As a consequence, indexing speed may be slightly reduced, and UI performance (of the process table) should improve slightly.
This pull request requires a full re-indexing (including an update of the Elasticsearch mapping).
Demo of Process Table:
simplescreenrecorder-2022-09-23_14.26.28.mp4