Japanese-English example sentences from the Tatoeba Project in JSON format.
Each sentence entry in the JSON contains:
- The entry ID, corresponding to the Japanese example sentence ID from Tatoeba
- The Japanese example sentence from Tatoeba
- The corresponding English translation provided by Tatoeba
- A list of words that appear in the sentence, each containing:
  - The headword form (dictionary form)
  - An optional reading
  - An optional sense index (which refers to the correct sense in the word's JMdict dictionary entry)
  - An optional surface form, if this differs from the headword form
  - An optional field ("checked") indicating that the sentence pair is a good, checked example of the usage of the word
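
As a rough illustration of the structure described above, the following Python sketch parses one entry and fills in the optional fields. The key names (`id`, `japanese`, `english`, `words`, `headword`, `reading`, `sense`, `surface`), the top-level array layout, and the sample sentence are all assumptions made up for this example; only `checked` is named in the description. Check the actual release file for the real key names before relying on this.

```python
import json

# Hypothetical sample data: the key names and the sentence are illustrative
# assumptions, not taken from the actual release file.
sample = '''
[
  {
    "id": 1,
    "japanese": "彼は本を読んだ。",
    "english": "He read a book.",
    "words": [
      {"headword": "彼", "reading": "かれ", "sense": 1},
      {"headword": "本", "reading": "ほん", "checked": true},
      {"headword": "読む", "surface": "読んだ"}
    ]
  }
]
'''


def summarize_words(entry):
    """Return (surface, headword, reading, sense, checked) tuples for one entry."""
    rows = []
    for word in entry["words"]:
        headword = word["headword"]
        # The surface form is only present when it differs from the headword.
        surface = word.get("surface", headword)
        rows.append((
            surface,
            headword,
            word.get("reading"),         # optional, None when absent
            word.get("sense"),           # optional sense index, None when absent
            word.get("checked", False),  # optional flag, treated as False when absent
        ))
    return rows


entries = json.loads(sample)
for entry in entries:
    print(entry["id"], entry["japanese"], "/", entry["english"])
    for row in summarize_words(entry):
        print("  ", row)
```

Using `dict.get` for every optional field keeps the loop tolerant of entries where the reading, sense index, surface form, or checked flag is missing.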
The list of words for each example sentence is provided by Tatoeba under Japanese indices. The indices were originally compiled when the Tanaka Corpus was integrated into the WWWJDIC server, as detailed in this publication.
See here for more information on the original data format of the Tanaka Corpus.
You can download the pre-built JSON file from the latest release. Automated releases containing the latest example sentences are scheduled weekly.
Many of the example sentences were originally sourced from the Tanaka Corpus, which is now maintained within the Tatoeba Project. All files downloaded through the Tatoeba Project are licensed under the CC BY 2.0 FR license.
As required by the original license, all derived files containing example sentences distributed in each release are made available under the same license.
The original source code and other files in this project, excluding the files mentioned above, are made available under the MIT license (see LICENSE.txt).