Performance improvement - Bundle #5017

andre-hohmann · 2022-02-23T12:45:23Z

In Kitodo.Production 3.x, several processes and functions are slow. This affects the display of lists (processes, users ...), opening and working in the metadata editor, and the creation of newspaper processes. In daily work, slow performance is very annoying and time-consuming. The goal is to improve the user experience by solving the performance problems.

The cost estimation is expected to be: high

High costs are expected, because analyzing and identifying the causes in each case will probably take a lot of time. It must be assumed that some problems occur only in systems with several hundred thousand processes.
Some issues may be solved by Hibernate Search. This should be examined as soon as Hibernate Search is implemented.

In the following the suggested Issues are listed:

matthias-ronge · 2022-02-23T15:55:28Z

JSF applications are always a bit sluggish, as each interaction (even expanding a drop-down menu) is sent to the server and back to the browser. There's nothing we can change about this (unless we move the whole application to a completely different software stack, so that a lot of the action happens purely in the browser).

However, I see some good chances for performance improvements:

I think one main reason for the low performance is that the JSF form classes are programmed differently than the JSF developers intended. If you look at the JSF lifecycle, you can see that the properties of the components are queried many times while Tomcat generates a single HTML response. This means that every property of the frontend components should be backed 1:1 by a field of the underlying form class. The fields should be initialized when the form class is created and changed when actions are triggered. A property should not have to fetch its value from somewhere when it is queried, because that retrieval is then repeated on every query. This also implies that all form classes must be serializable (in the Java sense) so that Tomcat can cache them on disk. I see a possible solution to this problem, one I have been thinking about for a long time: generate skeletons for the form classes from the XHTML files with a script, and implement them strictly according to these restrictions. I'd really appreciate that, but it would make a version 4.x, at least.
I see a second layer of accidental complexity in the use of the module loader. In its current form, it causes a lot of additional computing work for the application (and additional work for the administrators during updates), but brings no benefit. The basic idea of dividing the code into modules is valuable, but it would suffice if the module JARs were placed in the web application's WEB-INF/lib directory along with the libraries, where they are available to the classloader. The classes could then be accessed statically, and going through the module loader would no longer be necessary.
I still wonder to this day what we have the search engine for. I've asked this question a few times, but I've only ever gotten evasive answers. The search engine makes storing objects much slower when it's done synchronously, and so far I don't see much gain in search that couldn't be achieved with the database (or with a much more minimalistic search engine implementation) as well. A very good design of tokenization and indexing would serve well here!
With Jakarta XML Binding, we use a very slow XML processor. During the DFG development project, we had started to use a performance-optimized and RDF-enabled (which would allow native IIIF support) XML processor to replace UGH: the Kitodo − Data Access module. The module is covered by 700 tests to make it highly error-free, which is important at this point. In tests, this processor was 8 times faster than JAXB. However, in the course of the project, it was decided to stick with JAXB instead, and that's how it is now. The development from that time is not lost: the code is preserved in the commit history and could be reactivated with acceptable effort, making XML processing significantly faster in all places.
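To illustrate the first point above (properties backed 1:1 by fields), here is a minimal sketch in plain Java without any JSF dependency. All class names (`CountingTitleSource`, `FetchingForm`, `FieldBackedForm`, `FormDemo`) are hypothetical and not taken from the Kitodo code base; the counting supplier merely stands in for an expensive lookup such as a database or index query, and the loop imitates the repeated property queries during one render.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for an expensive lookup; counts how often it is hit. */
class CountingTitleSource implements Supplier<List<String>> {
    int calls = 0;

    @Override
    public List<String> get() {
        calls++;
        return List.of("Process A", "Process B");
    }
}

/** Pattern to avoid: the getter fetches its value every time it is queried. */
class FetchingForm {
    private final Supplier<List<String>> source;

    FetchingForm(Supplier<List<String>> source) {
        this.source = source;
    }

    public List<String> getTitles() {
        return source.get(); // one lookup per query
    }
}

/** Field-backed variant: loaded on creation, changed only by explicit actions. */
class FieldBackedForm implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only serializable state is kept; the source itself is not stored.
    private ArrayList<String> titles;

    FieldBackedForm(Supplier<List<String>> source) {
        this.titles = new ArrayList<>(source.get());
    }

    public List<String> getTitles() {
        return titles; // no lookup, just the field
    }

    /** Action handler: the only place where the value is refreshed. */
    public void reload(Supplier<List<String>> source) {
        this.titles = new ArrayList<>(source.get());
    }
}

class FormDemo {
    public static void main(String[] args) {
        CountingTitleSource first = new CountingTitleSource();
        FetchingForm fetching = new FetchingForm(first);
        CountingTitleSource second = new CountingTitleSource();
        FieldBackedForm backed = new FieldBackedForm(second);

        // Imitate a render phase that queries the property ten times.
        for (int i = 0; i < 10; i++) {
            fetching.getTitles();
            backed.getTitles();
        }
        System.out.println("fetching getter lookups: " + first.calls);  // 10
        System.out.println("field-backed lookups:    " + second.calls); // 1
    }
}
```

In this sketch the field-backed form hits the source once at construction and again only when `reload` is called, however often the getter is queried. How many times a real JSF lifecycle queries a given property depends on the page and is not measured here.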

M3ssman · 2022-03-31T11:55:40Z

@matthias-ronge What exactly do you mean by search engine? The ELK stuff?

matthias-ronge · 2022-03-31T12:09:58Z

Yes, I mean Elasticsearch.

M3ssman · 2022-03-31T13:40:48Z

I really appreciate your position on this topic. It has been argued that Elastic is there to have virtually all metadata indexed ... fancy at first sight, but actually, I never missed this back in Kitodo 2.

Further, I agree that a more sophisticated usage of the underlying database system with proper indexing and so forth, as you already mentioned, should do the same job way easier.

Another point against Elastic: it increases the installation and support complexity a good deal (not to mention potential license and security issues).

Regarding the XML processing: Is this serious? If so, I'd be really interested to know why a fast-processing component was dropped in the course of the project.

matthias-ronge · 2022-04-01T09:38:28Z

For XML processing: The main reason for using JAXB is that it technically prevents building wrong METS files, because it cannot handle unknown elements or attributes. (It is so strict that we found that the METS profile used in Germany up to that point contained a wrong reference by ID.) This made development much safer, because a lot of new developers joined the project at that time. We also had much more significant performance issues then, so these millisecond differences were not taken too seriously.

This is not a bad thing, since we use an agile software development approach, which means that decisions can always be revisited and made differently if that becomes necessary.

Why this could be revised: Over time, quite a lot of places have appeared throughout the application where it needs to look into the XML files (features that did not exist at the time of that decision either), so I can still imagine that using the other code may bring performance improvements. However, it is most relevant when many METS files have to be read together, and I believe, without having checked, that there are three such places: during indexing, during newspaper migration, and when opening a year process of a newspaper. Maybe there are other good solutions as well, such as storing more data in the search index.

andre-hohmann · 2023-02-20T14:45:28Z

As the main issues (performance of newspaper processes, the metadata editor, and process lists) are solved, I am closing this issue.

If other issues with regard to performance problems should be solved, it would be better to describe them separately.

andre-hohmann added the 3.x and development fund 2022 labels Feb 23, 2022

solth removed the 3.x label Jul 7, 2022

andre-hohmann closed this as completed Feb 20, 2023
