The Corpus Services project bundles functionality used for maintenance, curation, conversion, and visualization of corpus data in various projects.
Explore the docs » · Report Bug · Request Feature
- About the Project
- Getting Started
- Usage
- Roadmap
- Contributing
- License
- Authors
- Contact
- Acknowledgements
The (HZSK) Corpus Services were initially developed at the Hamburg Centre for Language Corpora (HZSK) as a quality control and publication framework for EXMARaLDA corpora. Since then, most development work has been done within the INEL project, with a particular focus on making the code adaptable to other use cases and data types. The Corpus Services project now bundles functionality used for maintenance, curation, conversion, and visualization of corpus data in various projects.
Additional documentation on the Corpus Services can be found in the doc folder.
You can also find some sample scripts (batch and shell) to use for calls to the corpus services jar and some further utilities here.
There is also some information and scripts useful for automating the use of corpus-services available here.
Java needs to be installed.
The latest compiled .jar file can be found here.
To use the services for corpora, either compile the project using mvn clean compile assembly:single (see Build with Maven) or use a pregenerated artifact from GitHub Actions, which you can download here using nightly.link.
The usable functions can be found in the help output:
java -jar corpus-services-1.0.jar -h
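Before calling the jar, it can help to confirm that both prerequisites named above (a Java installation and the jar file) are actually in place. The following is a minimal sketch, not part of the project itself: the environment variable CORPUS_SERVICES_JAR and the function check_prerequisites are hypothetical names introduced here for illustration, and the default jar path simply mirrors the file name used in the command above.

```shell
#!/bin/sh
# Assumption: the jar sits in the current directory under the name used in
# this README. CORPUS_SERVICES_JAR is a hypothetical override, not an option
# defined by the project.
JAR="${CORPUS_SERVICES_JAR:-corpus-services-1.0.jar}"

# Print one line per missing prerequisite; print nothing if all are present.
check_prerequisites() {
    command -v java >/dev/null 2>&1 || echo "java not found on PATH"
    [ -f "$1" ] || echo "jar not found: $1"
}

problems=$(check_prerequisites "$JAR")
if [ -z "$problems" ]; then
    # Both prerequisites are present: show the available functions.
    java -jar "$JAR" -h
else
    # Report what is missing instead of failing with a less readable error.
    echo "$problems" >&2
fi
```

If the jar was built locally or downloaded under a different name, point the variable at that file before running the script.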
See How to use for the usage of the corpus services.
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Anne Ferger
Hanna Hedeland
Daniel Jettka
Tommi Pirinen
Anne Ferger - @anneferger1 - [email protected]
Project Link: Corpus Services
PID: http://hdl.handle.net/11022/0000-0007-D8A6-A
Please cite both the introduction to the system (journal article) and the software publication:
Ferger, A., Hedeland, H., Jettka, D. & Pirinen, T. (2020). Corpus Services (Version 1.0). Zenodo. http://doi.org/10.5281/zenodo.4725655
Hedeland, H. & Ferger, A. (2020). Towards Continuous Quality Control for Spoken Language Corpora. International Journal for Digital Curation, 15(1). https://doi.org/10.2218/ijdc.v15i1.601
Contributions to the project have been made by staff from the HZSK and several research projects at the University of Hamburg: INEL, the BMBF-funded CLARIN-D project (01UG1620G), the project WO 1886/1-2 within the DFG LIS program, and the BMBF-funded project QUEST.
Parts of this project have been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.
Thank you to all funders, supporters and contributors!
Logo created at LogoMakr.com. This README file was created on the basis of the Best-README-Template.
Links to generated artifacts are provided via nightly.link.