Documentation
The technical and research partners in IMPACT have developed more than 20 different tools for various stages in the OCR process. Generally speaking, all of these tools operate on image or text data, either by modifying the data or by extracting information from it. IMPACT has therefore also developed an interoperability framework that allows for a loose coupling of these tools and the exchange of data between them.
The IMPACT Interoperability Framework comprises the following components:
- The toolwrapper, a Java application for creating a web service wrapper project for command-line tools
- The web service client, a web application that can be used to test the operations of a SOAP web service
- The generic SOAP client, a Java library that can execute operations of an arbitrary SOAP web service
- The results repository service, a custom SOAP web service that stores files in a WebDAV repository
- The Taverna 2 client, a web application that can be used to remotely execute workflows on Taverna 2 Server
In the first step, command-line tools are wrapped as web services with the help of the toolwrapper and the corresponding tool specifications.
Using the derived workflow modules, it is then possible to build a pipeline of tools in which the output of one tool serves as the input for the next.
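The chaining idea can be illustrated with a minimal Python sketch. It does not use the toolwrapper, SOAP, or Taverna; it only assumes that each tool reads text on standard input and writes text to standard output. The names `wrap_command`, `run_pipeline`, `strip_whitespace` and `to_upper` are hypothetical and stand in for real OCR tools.

```python
import subprocess
import sys
from functools import reduce
from typing import Callable, Sequence

# A pipeline step takes the previous tool's output and returns its own output.
Step = Callable[[str], str]


def wrap_command(argv: Sequence[str]) -> Step:
    """Wrap a command-line tool as a function (stdin in, stdout out)."""

    def step(data: str) -> str:
        result = subprocess.run(
            list(argv),
            input=data,
            capture_output=True,
            text=True,
            check=True,  # raise if the tool exits with an error
        )
        return result.stdout

    return step


def run_pipeline(steps: Sequence[Step], data: str) -> str:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda current, step: step(current), steps, data)


# Hypothetical stand-in tools, implemented as tiny Python one-liners so the
# sketch runs anywhere; real tools would be OCR binaries.
strip_whitespace = wrap_command(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().strip())"]
)
to_upper = wrap_command(
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"]
)

if __name__ == "__main__":
    print(run_pipeline([strip_whitespace, to_upper], "  ocr text  "))
```

Because every step has the same signature, tools can be reordered or swapped without changing the pipeline code, which is the property the loose coupling in the framework is meant to provide.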
An important incentive for creating such a framework is that the historical material that libraries, archives and other content holders are digitising in large quantities varies widely in nature. Because no single combination of tools (called a workflow) is optimal for every purpose, users need to be able to try out and evaluate different combinations to find the workflow that works best for their material.
The following related articles might also be of interest:
- Dogan, Z.M., C. Neudecker, S. Schlarb and G. Zechmeister: Experimental workflow design and development in digitisation. QQML2010 Conference, 25-28 May 2010, Chania, Crete, Greece.
- Neudecker, C., S. Schlarb, M. Dogan, P. Missier, S. Sufi, A. Williams and K. Wolstencroft: An experimental workflow development platform for historical document digitisation and analysis. Workshop on Historical Document Imaging and Processing (HIP '11) at ICDAR2011, 16-17 September 2011, Beijing, China.
- Schlarb, S. and C. Neudecker: A Heuristic Measure for Detecting Influence of Lossy JP2 Compression on Optical Character Recognition in the Absence of Ground Truth. Archiving 2012 Conference, 12-15 June 2012, Copenhagen, Denmark.