
0005 integrate extract as backend library

Bruno Thomas edited this page Nov 23, 2021 · 1 revision

5. Integrate the ICIJ extract Java library as a backend library

Date: 2021-11-22

Status

Accepted

Context

extract was developed for, and used in, previous leak investigations (Panama Papers, Swiss Leaks, Luxembourg Leaks). It is based on:

  • Apache Tika
  • Tesseract OCR

Decision

Reuse extract by splitting it into:

  • a library used by Datashare, and
  • the existing extract command-line interface.

Consequences

A jar library is published to the Maven Central repository, and the following projects depend on it:

  • datashare
  • extract-cli
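The split can be pictured with a minimal Java sketch. It is hypothetical: `ExtractLib`, `ExtractCli` and `EmbeddedConsumer` are illustrative names, not classes from the real extract or Datashare code, and `extractText` only decodes and trims bytes where the real library relies on Tika and Tesseract. What it shows is the dependency direction: the library exposes a plain API with no argument parsing or console output, the CLI is a thin wrapper over it, and an embedding application such as Datashare calls the same API directly.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL sketch: class and method names are illustrative,
// not the real extract or Datashare API.

/** Library module: plain API, no argument parsing, no console output. */
final class ExtractLib {
    private ExtractLib() {}

    /** Placeholder for the Tika/Tesseract-based extraction done by extract. */
    static String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8).strip();
    }
}

/** CLI module: a thin wrapper that maps command-line arguments to library calls. */
final class ExtractCli {
    private ExtractCli() {}

    /** Treats each argument as a file path and extracts its text via the library. */
    static List<String> run(String[] args) throws IOException {
        List<String> texts = new ArrayList<>();
        for (String arg : args) {
            texts.add(ExtractLib.extractText(Files.readAllBytes(Path.of(arg))));
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        for (String text : run(args)) {
            System.out.println(text);
        }
    }
}

/** Embedding consumer (standing in for Datashare): calls the library directly. */
final class EmbeddedConsumer {
    private EmbeddedConsumer() {}

    static int indexedLength(byte[] document) {
        return ExtractLib.extractText(document).length();
    }
}
```

Keeping argument parsing and printing out of the library is what lets both consumers depend on the same jar without pulling in command-line concerns.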
