mwhirls/tatoeba-json

tatoeba-json

stability-wip License: MIT CC BY 4.0

Japanese-English example sentences from the Tatoeba Project in JSON format.

Format

Each sentence entry in the JSON contains:

  • The entry ID, corresponding to the Japanese example sentence ID from Tatoeba
  • The Japanese example sentence from Tatoeba
  • The corresponding English translation provided by Tatoeba
  • A list of the words that appear in the sentence, each containing:
    • The headword form (dictionary form)
    • An optional reading
    • An optional sense index (which refers to the correct sense in the word's JMdict dictionary entry)
    • An optional surface form, if this differs from the headword form
    • An optional field ("checked") indicating that the sentence pair is a good and checked example of the usage of the word
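As a rough illustration of the structure described above, the following Python sketch maps one entry onto dataclasses. The JSON key names (`id`, `japanese`, `english`, `words`, `headword`, `reading`, `sense_index`, `surface_form`, `checked`) and the sample values are assumptions made up for this example; check the released file for the actual key names before relying on them.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    headword: str                       # dictionary form
    reading: Optional[str] = None       # optional reading
    sense_index: Optional[int] = None   # optional JMdict sense index
    surface_form: Optional[str] = None  # only present if it differs from the headword
    checked: bool = False               # good and checked example of the word's usage


@dataclass
class Sentence:
    id: int        # Japanese sentence ID from Tatoeba
    japanese: str
    english: str
    words: List[Word] = field(default_factory=list)


def sentence_from_dict(raw: dict) -> Sentence:
    """Build a Sentence from one decoded JSON object (hypothetical key names)."""
    words = [Word(**w) for w in raw.get("words", [])]
    return Sentence(raw["id"], raw["japanese"], raw["english"], words)


# Illustrative entry only; the ID and contents are placeholders, not real data.
SAMPLE = """
{
  "id": 0,
  "japanese": "猫が好きです。",
  "english": "I like cats.",
  "words": [
    {"headword": "猫", "reading": "ねこ"},
    {"headword": "が"},
    {"headword": "好き", "sense_index": 1, "checked": true},
    {"headword": "です"}
  ]
}
"""

sentence = sentence_from_dict(json.loads(SAMPLE))
```

Optional fields default to `None` (or `False` for `checked`), so consumers can test for their presence without catching `KeyError`.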

The list of words for each example sentence is provided by Tatoeba under Japanese indices. The indices were originally compiled when the Tanaka Corpus was integrated into the WWWJDIC server, as detailed in this publication.

See here for more information on the original data format of the Tanaka Corpus.
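For orientation, here is a minimal sketch of how a single token from the original index format could be split into the fields listed under Format. It assumes the commonly documented Tanaka Corpus notation, in which a headword may be followed by a reading in `( )`, a sense number in `[ ]`, the surface form in `{ }`, and a trailing `~` for a checked example. The function name `parse_index_token` is hypothetical, this is not necessarily how this project parses the data, and less common notations are not handled.

```python
import re
from typing import Optional, TypedDict


class IndexToken(TypedDict):
    headword: str
    reading: Optional[str]
    sense_index: Optional[int]
    surface_form: Optional[str]
    checked: bool


# Assumed token layout: headword(reading)[sense]{surface}~
# Every part after the headword is optional.
TOKEN_RE = re.compile(
    r"^(?P<headword>[^(\[{~]+)"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\[(?P<sense>\d+)\])?"
    r"(?:\{(?P<surface>[^}]*)\})?"
    r"(?P<checked>~)?$"
)


def parse_index_token(token: str) -> IndexToken:
    """Split one index token into its parts; raise ValueError if it does not match."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognized index token: {token!r}")
    sense = match.group("sense")
    return {
        "headword": match.group("headword"),
        "reading": match.group("reading"),
        "sense_index": int(sense) if sense is not None else None,
        "surface_form": match.group("surface"),
        "checked": match.group("checked") is not None,
    }


# An index line is a space-separated sequence of such tokens.
tokens = [parse_index_token(t) for t in "家(いえ)[01] は なる[01]{なっている} ぼろ屋[01]~".split()]
```

Raising on unmatched tokens, rather than silently skipping them, makes it easier to spot notations the pattern does not cover.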

Releases

You can download the pre-built JSON file from the latest release. Automated releases containing the latest example sentences are scheduled weekly.

License

Tatoeba Project / Tanaka Corpus

Many of the example sentences are originally sourced from the Tanaka Corpus, which is now maintained within the Tatoeba Project. All files downloaded through the Tatoeba Project are licensed under the CC BY 2.0 FR license.

As required by the original license, all derived files containing example sentences distributed in each release are made available under the same license.

CC BY 4.0

Source Code

The original source code and other files in this project, excluding the files mentioned above, are made available under the MIT license (see LICENSE.txt).
