Make Process Table Columns Sortable #5360

thomaslow · 2022-09-23T12:31:37Z

Related Issues:

Contributes to Sorting in lists #3792
Fixes Process list not sortable by progress #4245

This pull request makes all columns of the process table sortable. To achieve this, the data for each column needs to be available as a property in the Elasticsearch index. Some columns were previously generated on-the-fly, so sorting by them was not possible.

These column values are now calculated at indexing time. As a consequence, indexing may become slightly slower, while UI performance of the process table should improve slightly.

This pull request requires a full re-indexing (including an update of the Elasticsearch mapping).

Demo of Process Table:

simplescreenrecorder-2022-09-23_14.26.28.mp4

Columns that were previously calculated on-the-fly are now added to the Elasticsearch index and calculated at indexing time instead, e.g., the user name of the user that last worked on a task of the process. The code that calculated this information has been moved to the Kitodo-DataManagement project in order to be available at indexing time.
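
To illustrate the idea of computing a column value at indexing time, here is a minimal sketch. It is not the code from this pull request: the classes `SketchTask` and `IndexDocumentSketch` and their fields are hypothetical stand-ins, and only the property name `lastEditingUser` is taken from the discussion. The derived value is written into the document that gets indexed, so the search index can sort by it without any per-row calculation in the UI.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a task; not the Kitodo Task bean.
final class SketchTask {
    final String processingUser;     // null if nobody has worked on the task
    final Long processingTimeMillis; // null if the task was never started

    SketchTask(String processingUser, Long processingTimeMillis) {
        this.processingUser = processingUser;
        this.processingTimeMillis = processingTimeMillis;
    }
}

final class IndexDocumentSketch {

    // Derive the user who most recently worked on any of the given tasks.
    // Tasks without a user or without a timestamp are skipped to avoid
    // NullPointerExceptions.
    static String lastEditingUser(List<SketchTask> tasks) {
        return tasks.stream()
                .filter(t -> t.processingUser != null && t.processingTimeMillis != null)
                .max(Comparator.comparingLong(t -> t.processingTimeMillis))
                .map(t -> t.processingUser)
                .orElse("");
    }

    // Build the document that would be sent to the search index. The derived
    // value is stored as a plain property so that it can be used for sorting.
    static Map<String, Object> toIndexDocument(String title, List<SketchTask> tasks) {
        Map<String, Object> document = new LinkedHashMap<>();
        document.put("title", title);
        document.put("lastEditingUser", lastEditingUser(tasks));
        return document;
    }
}
```

The trade-off is the one described above: the value is computed on every indexing run, and the index entry has to be refreshed whenever a task of the process changes.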

henning-gerhardt · 2022-09-23T15:19:33Z

...Management/src/main/java/org/kitodo/data/elasticsearch/index/converter/ProcessConverter.java

     * @return the list of tasks of the process (and potentially its children)
     */
-    private static List<Task> getListOfTasksForProgressCalculation(Process process) {
+    private static List<Task> getListOfTasksForProgressCalculation(Process process, Boolean considerChildren) {


Why are you using the nullable Boolean class instead of the primitive boolean, given that the value should only ever be true or false?

Hi Henning, I didn't really think about it. Fixed it below.
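
For context, a small sketch of why the primitive type is the safer choice here. The class and method names are made up for illustration; only the parameter name `considerChildren` comes from the diff above. With the wrapper type a caller can pass `null`, which only fails at runtime when the value is unboxed.

```java
final class BooleanParameterSketch {

    // Wrapper type: compiles for a null argument and fails on auto-unboxing.
    static String describeNullable(Boolean considerChildren) {
        return considerChildren ? "with children" : "own tasks only";
    }

    // Primitive type: null cannot be passed, so only true and false are possible.
    static String describePrimitive(boolean considerChildren) {
        return considerChildren ? "with children" : "own tasks only";
    }
}
```

Calling `describeNullable(null)` throws a `NullPointerException`, whereas `describePrimitive(null)` is rejected by the compiler.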

thomaslow · 2022-09-26T10:24:57Z

@solth In order for the process status to be sortable, it needs to be kept up-to-date in the ElasticSearch index whenever a task status (OPEN, DONE, INWORK, LOCKED) is changed. This requires that the accompanying process is re-indexed whenever a task status is changed. This conflicts with a pull request from last year, see #4543, which presumably improved performance when saving tasks.

How do we deal with this?

Re-index processes whenever tasks are saved, potentially reducing performance
Don't index process progress (and thus, make it not sortable)

Edit: the same applies to "lastEditingUser", which of course also depends on changes to tasks of a process

thomaslow · 2022-10-06T13:41:09Z

The performance problem mentioned in the previous comment is now fixed by calculating the process status via a custom SQL query. This query makes it possible to efficiently count how many tasks are in a certain state (open, locked, inwork, done), even for parent processes that may have hundreds of child tasks (e.g. newspaper processes). Previously, this counting would take more than 1 second for parent processes with many children.

The SQL query uses a statement (WITH RECURSIVE) that is not supported by Hibernate but works with most databases. I checked support for MySQL 8+, MariaDB 10.2.2+ and H2Database 1.4+, but other databases such as PostgreSQL or Oracle should work too. The syntax is the same for all databases I know of.
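
As a rough sketch of what such a recursive query can look like (this is not the query from this pull request), the following builds a native SQL string in Java. The table and column names `process`, `task`, `parent_id`, `process_id` and `processingStatus` are assumptions for illustration. In line with the commit messages of this pull request, the `id` column is named explicitly and the process id is inlined instead of being passed as a query parameter; inlining is only safe because the value is an `int`.

```java
final class RecursiveStatusQuerySketch {

    // Build a native SQL query that collects a process and all of its
    // descendants via a recursive common table expression and then counts
    // the tasks of all collected processes per status.
    // All table and column names are hypothetical.
    static String buildQuery(int processId) {
        return "WITH RECURSIVE process_tree (id) AS ("
             + " SELECT id FROM process WHERE id = " + processId
             + " UNION ALL"
             + " SELECT child.id FROM process child"
             + " JOIN process_tree ON child.parent_id = process_tree.id"
             + ")"
             + " SELECT task.processingStatus, COUNT(*)"
             + " FROM task"
             + " WHERE task.process_id IN (SELECT id FROM process_tree)"
             + " GROUP BY task.processingStatus";
    }
}
```

The database returns one row per status, so the number of round trips stays constant regardless of how many child processes exist.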

In case the SQL query does not work, there is a chance that the resulting exception is intercepted and the previous bean-based status calculation is used as a fallback. However, this depends on how the query fails.
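
The fallback behaviour can be pictured with the following sketch. The helper `withFallback` is hypothetical and not part of Kitodo; it only shows the general pattern of trying the fast path first and using the slower calculation when the fast path throws.

```java
import java.util.function.Supplier;

final class FallbackSketch {

    // Try the fast calculation first; if it throws, use the slower one.
    // Only failures that surface as exceptions are covered by this pattern.
    static <T> T withFallback(Supplier<T> fast, Supplier<T> fallback) {
        try {
            return fast.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }
}
```

A call could look like `withFallback(() -> countViaSql(process), () -> countViaBeans(process))`, where both method names are placeholders.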

I also added a Selenium test to check that sorting processes by state works.

thomaslow · 2022-10-13T11:58:24Z

Merged with master so that GitHub only shows code changes relative to the current master.

solth · 2022-10-19T08:35:08Z

The performance problem mentioned in the previous comment is fixed now by calculating the process status via a custom SQL query.

So that means re-indexing processes whenever a task is saved does not impose any performance problems anymore? Can you elaborate a little more how using a recursive SQL query helps in this case? If I am not mistaken, #4543 improved performance by reducing read/write operations on the index, not on the database. How does an optimised SQL query have an effect on that?

solth

Thank you for this pull request. I tested it and it works very well with the small list of processes I have on my local development system. I cannot say whether the changes concerning indexing have any negative impact on the performance on larger systems with many processes, though.

On the code side I found only a handful of minor issues like a few typos and unused imports. I am unsure about how progress for processes without own tasks is calculated. Perhaps you could just comment on my question (see below) and elaborate a little more on that specific part.

Kitodo-DataManagement/src/main/java/org/kitodo/data/database/persistence/TaskDAO.java

...Management/src/main/java/org/kitodo/data/elasticsearch/index/converter/ProcessConverter.java

solth · 2022-10-19T09:23:34Z

...Management/src/main/java/org/kitodo/data/elasticsearch/index/converter/ProcessConverter.java

+        Map<TaskStatus, Integer> counts = countTaskStatusOfProcess(process, considerChildren);
+        Integer total = counts.values().stream().mapToInt(Integer::intValue).sum();
+
+        // report processes without any tasks as if they had a single locked task


I am not sure I understand this. Does this mean the task progress of a process without own tasks is always encoded as "100% locked tasks"? Shouldn't it instead be illegal (e.g. throw an exception) to try to determine the "task progress" of a process without workflow/tasks (when the parameter considerChildren is false)?

I'm not sure why it was implemented this way. It was implemented the same way before in ProcessService.

Does this mean the task progress of a process without own tasks is always encoded as "100% locked tasks"?

Yes. Unless considerChildren=true and the process has children processes with tasks that are not locked.

Shouldn't it instead be illegal (e.g. throw an exception) to try to determine the "task progress" of a process without workflow/tasks (when the parameter considerChildren is false)?

Maybe.
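
The rule under discussion can be summarised in a small sketch. It is a simplified stand-in, not the `countTaskStatusOfProcess` method from the diff: the enum `SketchStatus` and the class `TaskCountSketch` are hypothetical. A process without any tasks is reported as if it had a single locked task.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the task status values named in this discussion.
enum SketchStatus { DONE, INWORK, OPEN, LOCKED }

final class TaskCountSketch {

    // Count how many of the given task statuses fall into each category.
    static Map<SketchStatus, Integer> count(List<SketchStatus> statuses) {
        Map<SketchStatus, Integer> counts = new EnumMap<>(SketchStatus.class);
        for (SketchStatus status : SketchStatus.values()) {
            counts.put(status, 0);
        }
        for (SketchStatus status : statuses) {
            counts.merge(status, 1, Integer::sum);
        }
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        // Report processes without any tasks as if they had a single locked task.
        if (total == 0) {
            counts.put(SketchStatus.LOCKED, 1);
        }
        return counts;
    }
}
```

Throwing an exception for the empty case, as suggested in the review, would replace the `if (total == 0)` branch; the sketch keeps the existing behaviour.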

Kitodo/src/main/java/org/kitodo/production/helper/Helper.java

solth · 2022-10-19T10:05:19Z

Kitodo/src/main/java/org/kitodo/export/ExportDms.java

@@ -203,7 +204,7 @@ private boolean startExport(Process process, LegacyMetsModsDigitalDocumentHelper

    private boolean exportCompletedChildren(List<Process> children) throws DataException {
        for (Process child:children) {
-            if (processService.getProgress(child.getTasks(), null).equals(COMPLETED) && !child.isExported()) {
+            if (ProcessConverter.getCombinedProgressAsString(child, false).equals(COMPLETED) && !child.isExported()) {


Wouldn't calling ProcessConverter.getCombinedProgressAsString with the second parameter considerChildren = false on a parent process without own tasks always result in "000000000100", i.e. 100% locked (as opposed to the expected "100000000000", i.e. 100% completed)?

Correct: if this list of children contains processes that have no tasks, they are not caught by this if statement. I don't know what this export code is doing, which is why I didn't change how the progress status is calculated.
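
To make the two strings in this exchange concrete, here is a sketch of how such a combined progress string can be encoded. It is not the implementation of `getCombinedProgressAsString`. The two examples quoted above imply that the first three digits hold the percentage of completed tasks and the last three the percentage of locked tasks; the order of the two middle blocks (in work, then open) and the use of plain integer division are assumptions.

```java
final class ProgressStringSketch {

    // Value used in the comparison quoted above: 100% of tasks are done.
    static final String COMPLETED = "100000000000";

    // Encode four task counts as four zero-padded percentages of three digits each.
    // Integer division is used, so the percentages may not add up to exactly 100.
    static String encode(int done, int inWork, int open, int locked) {
        int total = done + inWork + open + locked;
        if (total == 0) {
            // A process without tasks is treated as having a single locked task.
            locked = 1;
            total = 1;
        }
        return String.format("%03d%03d%03d%03d",
                done * 100 / total,
                inWork * 100 / total,
                open * 100 / total,
                locked * 100 / total);
    }
}
```

With this encoding, `encode(0, 0, 0, 0)` yields `"000000000100"`, which never equals `COMPLETED`, so a child process without tasks is skipped by the `if` statement in the diff.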

thomaslow · 2022-10-19T10:18:35Z

So that means re-indexing processes whenever a task is saved does not impose any performance problems anymore?

No, there are still performance problems related to the general save strategy in Kitodo, as outlined in #5368. The pull request #5371 slightly improves this for processes with parents. Since this pull request requires that processes are re-indexed when their task status changes, it doesn't make things easier, but there is no other way at the moment.

Can you elaborate a little more how using a recursive SQL query helps in this case?

My original assessment of the problem in this comment was a bit incomplete. The process of a task was already re-indexed when its task status changed (except in one case, see line 94 of WorkflowControllerService.java). So, the conflict with #4543 seems to be resolved without reverting it. Still, without always re-indexing a process whenever a task changes, I cannot rule out that there is some code somewhere that modifies a task outside of the logic of WorkflowControllerService.java without saving and re-indexing its process (in which case the process index will not contain the most up-to-date information).

Besides that, the performance problem of this pull request was caused by two issues:

1. Changing a task status triggers a save operation for its process, which leads to the performance issue described in "Improve performance when saving and indexing processes, tasks, etc." (#5368) and causes many re-indexing operations for its parent processes. This is still a problem, but it cannot be fixed by this pull request.
2. Calculating the task status of a parent process took a very long time when retrieving all related tasks for a parent process via bean getter methods (each triggering a database query for the tasks of a child process). This issue was multiplied by problem 1, because parent processes were saved many times (re-calculating the process status multiple times). The custom SQL query makes it possible to calculate the process status for parent processes efficiently (which previously was only partially implemented by Kathrin via the sortHelperStatus property). It is still (unnecessarily) calculated multiple times, but much faster now.
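
The cost of the bean-based traversal described in the second point can be made visible with a toy model. `SketchProcess` and `BeanTraversalSketch` are hypothetical classes and the query counter is only a simulation; the sketch assumes that every call to `getTasks()` corresponds to one database query, as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a process with lazily loaded tasks.
final class SketchProcess {
    static int simulatedQueries = 0;

    private final List<String> taskStatuses = new ArrayList<>();
    final List<SketchProcess> children = new ArrayList<>();

    SketchProcess(String... statuses) {
        taskStatuses.addAll(List.of(statuses));
    }

    // Each call stands for one database query that loads the tasks of this process.
    List<String> getTasks() {
        simulatedQueries++;
        return taskStatuses;
    }
}

final class BeanTraversalSketch {

    // Count all DONE tasks of a process and, recursively, of its descendants.
    static int countDone(SketchProcess process) {
        int done = 0;
        for (String status : process.getTasks()) {
            if ("DONE".equals(status)) {
                done++;
            }
        }
        for (SketchProcess child : process.children) {
            done += countDone(child);
        }
        return done;
    }
}
```

For a parent with several hundred children, this traversal performs one simulated query per process, and it is repeated each time the parent is saved. A single aggregate query replaces all of these round trips with one.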

I hope this clears things up a bit.

thomaslow · 2022-10-19T10:57:09Z

@solth I committed all of your suggested changes. Thank you for your review.

solth

Thanks for implementing the change requests. I still want to run a few tests, so I haven't merged this pull request yet.

Since some of your other PRs have been merged in the meantime, a small code conflict arose. Could you resolve this conflict while I conduct my tests?

thomaslow added 5 commits September 21, 2022 15:42

Enable sorting of columns duration, project and status in process list.

205bd23

Fix codestyle issues.

42b6984

Fix ProcessTypeTest after adding additional properties to the elastic…

adfb44e

… search index.

Fix bug that prevents saving of process due to sortHelperStatus updat…

5bdc555

…e procedure.

henning-gerhardt reviewed Sep 23, 2022

View reviewed changes

Update Boolean parameter to primitve type.

4164caf

thomaslow marked this pull request as draft September 26, 2022 11:12

thomaslow mentioned this pull request Sep 26, 2022

Make task table columns sortable #5365

Merged

Fix NPE when tasks do not have a processing begin date.

b979ff0

thomaslow mentioned this pull request Sep 26, 2022

Make search result list columns sortable #5366

Merged

thomaslow added 3 commits September 28, 2022 12:46

Save process when task status is raised up.

e5e6c11

Update process status calculation to include tasks of grand child pro…

bb15197

…cesses via a recursive search.

Remove unused import in ProcessService.

259276e

thomaslow mentioned this pull request Sep 28, 2022

Improve performance when saving and indexing processes, tasks, etc. #5368

Closed

thomaslow added 8 commits September 30, 2022 13:52

Improve performance when indexing processes by calculating it's statu…

fabd8d9

…s via a custom sql query.

Extend catch statement to any exceptions when performing recursive qu…

41c7201

…ery for task progress calculation.

Explicitly mention id column in native SQL query for process status c…

9876856

…alculation to improve compatibility with h2database.

Do not use query parameter for recursive query, which is not supporte…

a09101c

…d by h2 database.

Merge branch 'master' into make-process-table-columns-sortable

68b43df

Merge branch 'remember-sort-order' into make-process-table-columns-so…

55ee27d

…rtable

Add selenium tests for sorting processes by their state and duration.

1b0d214

Fix checkstyle and remove accidentally added test.

b3d6e1d

Add missing list columns to MockDatabase such that processes can be s…

5371e25

…orted by state and duration.

thomaslow marked this pull request as ready for review October 6, 2022 14:10

Merge branch 'master' into make-process-table-columns-sortable

df78ccf

thomaslow mentioned this pull request Oct 18, 2022

Improve "correction K" tooltip #5403

Merged

solth requested changes Oct 19, 2022

View reviewed changes

thomaslow and others added 4 commits October 19, 2022 12:26

Remove unused import.

35ea25a

Co-authored-by: Arved Solth <[email protected]>

Combine two lines to one.

e55664f

Co-authored-by: Arved Solth <[email protected]>

Fix javadocs and small readability changes.

f9ac5d3

Co-authored-by: Arved Solth <[email protected]>

Fix javadoc.

bc87a20

Co-authored-by: Arved Solth <[email protected]>

solth reviewed Oct 21, 2022

View reviewed changes

thomaslow added 2 commits October 21, 2022 11:24

Merge branch 'master' into make-process-table-columns-sortable

f503d68

Fix javadoc format.

f3bb7e6

solth approved these changes Oct 24, 2022

View reviewed changes

solth merged commit a0b0794 into kitodo:master Oct 25, 2022

solth mentioned this pull request Oct 25, 2022

Performance when saving processes with a parent #5371

Merged

thomaslow mentioned this pull request Feb 23, 2023

Indexing of processes fails in kitodo-production 3.5.0 with mysql 5.7 due to recursive query #5563

Closed

thomaslow mentioned this pull request May 27, 2023

Wrong Color Status Update if Task is taken by Task View (since version 3.5) #5671

Open

solth mentioned this pull request Mar 6, 2024

Processes cannot be sorted by all columns #5001

Closed
