digitalhumanists/corpus-services

Corpus Services

The Corpus Services project bundles functionality used for maintenance, curation, conversion, and visualization of corpus data in various projects.

About The Project

The (HZSK) Corpus Services were initially developed at the Hamburg Centre for Language Corpora (HZSK) as a quality control and publication framework for EXMARaLDA corpora. Since then, most development work has been done within the INEL project, with a focus on making the code adaptable to other use cases and data types. The Corpus Services project now bundles functionality used for maintenance, curation, conversion, and visualization of corpus data in various projects.

Getting Started

Additional documentation on the Corpus Services can be found in the doc folder.

You can also find some sample scripts (batch and shell) to use for calls to the corpus services jar and some further utilities here.

Information and scripts for automating the use of corpus-services are available here.

Prerequisites

Java needs to be installed.
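Whether Java is available can be checked from a shell before going further. The following is a minimal sketch: the function names have_command and check_java are hypothetical, and this README does not state which Java version is required, so the sketch only reports what is installed.

```shell
# Report whether a command is available on PATH (POSIX `command -v`).
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Print the installed Java version, or a hint when Java is missing.
check_java() {
    if have_command java; then
        # `java -version` writes to stderr, so redirect it to stdout.
        java -version 2>&1
    else
        echo "Java not found on PATH; install a Java runtime first" >&2
        return 1
    fi
}
```

Running check_java prints the version banner when Java is installed and returns a non-zero status otherwise.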

GitHub artifacts

The latest compiled .jar file can be found here.

Building

To use the services for corpora, compile the project using mvn clean compile assembly:single (see Build with Maven), or use a pregenerated artifact from GitHub Actions that you can download here using nightly.link.
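A minimal sketch of the build step as a shell function, assuming it is run from the repository root with mvn on PATH. The function name build_corpus_services is hypothetical, and the target/ location is Maven's default output directory rather than something this README states; the exact jar file name depends on the project's pom.xml.

```shell
# Hypothetical helper: build the jar and list the result.
build_corpus_services() {
    if ! command -v mvn >/dev/null 2>&1; then
        echo "mvn not found on PATH" >&2
        return 1
    fi
    # Build command as given in this README.
    mvn clean compile assembly:single || return 1
    # Assumption: Maven's default output directory is used.
    ls target/*.jar
}
```

Calling build_corpus_services from the repository root runs the build and prints the path of each jar found in target/.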

Usage

The usable functions can be found in the help output:

java -jar corpus-services-1.0.jar -h

See How to use for the usage of the corpus services.
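Calls to the jar can be wrapped in a small shell function so that the jar path is set in one place. This is a sketch, not part of the project: the function name corpus_services and the CORPUS_SERVICES_JAR variable are hypothetical, and the default file name is taken from the help command above. Only the -h option is documented here; any other arguments are passed through unchanged.

```shell
# Hypothetical wrapper around the corpus services jar.
corpus_services() {
    # Hypothetical variable; falls back to the jar name used in this README.
    jar="${CORPUS_SERVICES_JAR:-corpus-services-1.0.jar}"
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    # Forward all arguments to the jar unchanged.
    java -jar "$jar" "$@"
}
```

With the jar in the current directory, corpus_services -h is equivalent to the help command shown above.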

Libraries

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Authors

Anne Ferger

Hanna Hedeland

Daniel Jettka

Tommi Pirinen

Contact

Anne Ferger - @anneferger1 - [email protected]

Project Link: Corpus Services

Metadata

PID: http://hdl.handle.net/11022/0000-0007-D8A6-A

Citation

Please cite both the introduction to the system (journal article) and the software publication:

Ferger, A., Hedeland, H., Jettka, D. & Pirinen, T. (2020). Corpus Services (Version 1.0). Zenodo. http://doi.org/10.5281/zenodo.4725655

Hedeland, H. & Ferger, A. (2020). Towards Continuous Quality Control for Spoken Language Corpora. International Journal of Digital Curation, 15(1). https://doi.org/10.2218/ijdc.v15i1.601

Acknowledgements

Contributions to the project have been made by staff from the HZSK and several research projects at the University of Hamburg: INEL, the BMBF-funded CLARIN-D project (01UG1620G), the project WO 1886/1-2 within the DFG LIS program, and the BMBF-funded project QUEST.

Parts of this project have been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Thank you to all funders, supporters and contributors!

Logo created at LogoMakr.com. This README file was created on the basis of the Best-README-Template.

Links to generated artifacts are provided via nightly.link.