Documentation
The technical and research partners in IMPACT have developed more than 20 different tools for various stages in the OCR process. Generally speaking, all of these tools operate on image or text data, either by modifying the data or by extracting information from it. IMPACT has therefore also developed an interoperability framework that allows for a loose coupling of these tools and the exchange of data between them.
The IMPACT Interoperability Framework comprises the following components:
- The toolwrapper, a Java application for creating a web service wrapper project for command line tools
- The web service client, a web application that can be used to test the operations of a SOAP web service
- The generic SOAP client, a Java library that can execute operations of an arbitrary SOAP web service
- The results repository service, a custom SOAP web service that stores files into a WebDAV repository
- The Taverna 2 client, a web application that can be used to remotely execute workflows on Taverna 2 Server
In the first step, command line tools are wrapped as web services with the help of the toolwrapper and the corresponding tool specifications. The resulting web service can in turn be wrapped again as a workflow module for the Taverna workflow system.
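The wrapping idea can be illustrated with a minimal Python sketch. It does not use the toolwrapper or its specification format: `TOOL_SPEC`, `make_operation` and the inline "tool" (a Python one-liner that upper-cases its input) are assumptions made up for this example. It only shows how a declarative description of a command line call can be turned into a callable operation, which is the step that a web service wrapper then exposes over the network.

```python
import subprocess
import sys

# Hypothetical tool description. This is NOT the toolwrapper's actual
# specification format; it only names a tool and the command to run.
TOOL_SPEC = {
    "name": "uppercase",
    "command": [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ],
}


def make_operation(spec):
    """Return a function that runs the described command on input text."""

    def operation(text: str) -> str:
        # Pass the text on stdin and collect stdout; a non-zero exit
        # status raises subprocess.CalledProcessError.
        completed = subprocess.run(
            spec["command"],
            input=text,
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout

    operation.__name__ = spec["name"]
    return operation


uppercase = make_operation(TOOL_SPEC)
```

Calling `uppercase("ocr")` runs the command in a child process and returns its standard output.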
Using the workflow modules, the tools can then be combined by drag and drop in the user interface of the workflow system to form a pipeline in which the output of one tool is used as the input for the next.
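The pipeline principle, where each tool consumes the output of the previous one, can be sketched in a few lines of Python. This is only an illustration of the chaining idea, not of Taverna itself: `run_pipeline` and the three text-processing steps are hypothetical stand-ins for wrapped tools.

```python
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[str], str]


def run_pipeline(steps: Iterable[Step], data: str) -> str:
    """Feed data through each step in order; each output is the next input."""
    return reduce(lambda value, step: step(value), steps, data)


# Hypothetical stand-ins for wrapped tools.
def normalise_whitespace(text: str) -> str:
    return " ".join(text.split())


def dehyphenate(text: str) -> str:
    # Naive rule for illustration: join words split by "- ".
    return text.replace("- ", "")


def lowercase(text: str) -> str:
    return text.lower()


result = run_pipeline(
    [normalise_whitespace, dehyphenate, lowercase],
    "Histo-\nrical   DOCU-\nments",
)
print(result)  # prints "historical documents"
```

Because a pipeline is just an ordered list of steps, trying a different combination of tools amounts to passing a different list.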
An important incentive for creating such a framework is that the historical material that libraries, archives and other content holders are digitising in large quantities is very diverse in nature. Because no single combination of tools is optimal for every source material and purpose, users need to be able to try out and evaluate various combinations to determine the best processing chain for their case.
The myExperiment environment, which is integrated with Taverna, is used as the main platform for connecting the resources, such as tools and workflows, with the users in cultural heritage institutions throughout Europe. By means of this Web 2.0 platform, people can share not only their workflows but also their experiences of applying the tools in their particular context and to their material.
To quickly try out individual web services as well as complete workflows, several clients/interfaces are available:
- Local client
For local development and execution, the Taverna Workbench can be used. It enables you to graphically create, edit and run workflows on your local computer. Taverna Workbench is open source and available for Windows, Linux and OS X.
- Web service client
For remote execution of individual web services, the Web service client is provided. It analyses the WSDL file of the web service and presents the operations and their input fields to the user. Depending on the type of an input, the user is presented with either a simple text input field or a file upload field. The entered values are sent to the web service and the returned message is displayed. If the returned message has any attached files (for example a converted image), those files can be downloaded via generated links.
- Taverna 2 client
For remote execution of workflows, the T2-client is provided. The user uploads a workflow description to be executed and the application presents the corresponding input fields for the workflow. Instead of uploading a workflow from a local disk, the user can also log in to myExperiment and select a workflow belonging to the user or to one of the user's groups.
- Web-WF-client (alpha!)
For web-based workflow development and execution, the Web-wf-design web application can be used. It provides a JavaScript interface to Taverna which enables users to design and execute workflows directly in their browser. Additional documentation is available here.
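To make the behaviour of the Web service client more concrete, the following Python sketch reads a WSDL 1.1 document, lists its operations and decides for each input whether a text field or a file upload field would be shown. It is not the client's actual code. The sample WSDL, the function names and the rule "`xsd:base64Binary` means file upload, anything else means text" are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical WSDL 1.1 fragment, for illustration only.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://example.org/demo"
             targetNamespace="http://example.org/demo">
  <message name="convertRequest">
    <part name="image" type="xsd:base64Binary"/>
    <part name="format" type="xsd:string"/>
  </message>
  <message name="convertResponse">
    <part name="result" type="xsd:base64Binary"/>
  </message>
  <portType name="DemoPortType">
    <operation name="convert">
      <input message="tns:convertRequest"/>
      <output message="tns:convertResponse"/>
    </operation>
  </portType>
</definitions>
"""


def local_name(qname):
    """Strip a namespace prefix such as 'xsd:' from a qualified name."""
    return qname.split(":", 1)[-1]


def describe_operations(wsdl_text):
    """Map each operation name to a list of (input name, field kind)."""
    root = ET.fromstring(wsdl_text)

    # Collect the parts of every message, keyed by message name.
    messages = {}
    for message in root.findall(f"{{{WSDL_NS}}}message"):
        parts = message.findall(f"{{{WSDL_NS}}}part")
        messages[message.get("name")] = [
            (part.get("name"), part.get("type") or "") for part in parts
        ]

    operations = {}
    for operation in root.iter(f"{{{WSDL_NS}}}operation"):
        input_element = operation.find(f"{{{WSDL_NS}}}input")
        if input_element is None:
            continue
        message_name = local_name(input_element.get("message", ""))
        # Assumed rule: binary inputs get a file upload, others a text field.
        operations[operation.get("name")] = [
            (name, "file" if local_name(type_) == "base64Binary" else "text")
            for name, type_ in messages.get(message_name, [])
        ]
    return operations


fields = describe_operations(SAMPLE_WSDL)
```

For the sample document, `fields` maps the `convert` operation to a file upload field for `image` and a text field for `format`.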
The following related articles might also be of interest:
- Schlarb, S. and C. Neudecker: A Heuristic Measure for Detecting Influence of Lossy JP2 Compression on Optical Character Recognition in the Absence of Ground Truth. Archiving 2012 Conference, 12-15 June 2012, Copenhagen, Denmark.
- Neudecker, C., S. Schlarb, M. Dogan, P. Missier, S. Sufi, A. Williams and K. Wolstencroft: An experimental workflow development platform for historical document digitisation and analysis. Workshop on Historical Document Imaging and Processing (HIP '11) at ICDAR2011, 16-17 September 2011, Beijing, China.
- Dogan, Z.M., C. Neudecker, S. Schlarb and G. Zechmeister: Experimental workflow design and development in digitisation. QQML2010 Conference, 25-28 May 2010, Chania, Crete, Greece.