tatoeba-json

Japanese-English example sentences from the Tatoeba Project in JSON format.

Format

Each sentence entry in the JSON contains:

- The entry ID, corresponding to the Japanese example sentence's ID on Tatoeba
- The Japanese example sentence from Tatoeba
- The corresponding English translation provided by Tatoeba
- A list of the words that appear in the sentence, each containing:
  - The headword form (dictionary form)
  - An optional reading
  - An optional sense index, which identifies the correct sense in the word's JMdict dictionary entry
  - An optional surface form, if this differs from the headword form
  - An optional "checked" field, indicating that the sentence pair is a good, verified example of the word's usage
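The entry layout described above can be modeled roughly as follows. This is a minimal sketch only: the key names used here (`id`, `japanese`, `english`, `words`, `headword`, `reading`, `sense_index`, `surface_form`, `checked`) are assumptions for illustration and may not match the keys in the released JSON file, so check a real entry before relying on them. The sample entry is hypothetical, not taken from a release.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    """One item from a sentence's word list (key names are assumed)."""
    headword: str                       # dictionary (headword) form
    reading: Optional[str] = None       # optional reading
    sense_index: Optional[int] = None   # optional JMdict sense index
    surface_form: Optional[str] = None  # only present if it differs from the headword
    checked: bool = False               # True if marked as a good, verified example

    @property
    def surface(self) -> str:
        """Form as it appears in the sentence, falling back to the headword."""
        return self.surface_form or self.headword


@dataclass
class Sentence:
    """One example sentence entry (key names are assumed)."""
    id: int
    japanese: str
    english: str
    words: List[Word] = field(default_factory=list)


def parse_sentence(obj: dict) -> Sentence:
    """Convert one decoded JSON object into a Sentence, defaulting optional fields."""
    words = [
        Word(
            headword=w["headword"],
            reading=w.get("reading"),
            sense_index=w.get("sense_index"),
            surface_form=w.get("surface_form"),
            checked=bool(w.get("checked", False)),
        )
        for w in obj.get("words", [])
    ]
    return Sentence(id=obj["id"], japanese=obj["japanese"], english=obj["english"], words=words)


# Hypothetical entry using the assumed key names.
sample = json.loads("""
{
  "id": 1,
  "japanese": "彼は学生です。",
  "english": "He is a student.",
  "words": [
    {"headword": "彼", "reading": "かれ", "checked": true},
    {"headword": "は"},
    {"headword": "学生"},
    {"headword": "です"}
  ]
}
""")

sentence = parse_sentence(sample)
```

Treating every field except the headword as optional mirrors the list above, so entries that omit the reading, sense index, surface form or "checked" flag still load cleanly.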

The word list for each example sentence is provided by Tatoeba in the form of its Japanese indices. These indices were originally compiled when the Tanaka Corpus was integrated into the WWWJDIC server, as detailed in this publication.

See here for more information on the original data format of the Tanaka Corpus.
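To show how the word fields map onto the original index data, here is a sketch of a parser for a single index token. It assumes the Tanaka Corpus notation `headword(reading)[sense]{surface}~`, where every part after the headword is optional and a trailing `~` marks a checked example. This is not this project's `parse.py`; the names `parse_index_token`, `parse_index_line` and `IndexWord` are hypothetical, and the sample tokens are illustrative rather than copied from the corpus.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed index-token notation: headword(reading)[sense]{surface}~
# Every component after the headword is optional.
TOKEN_RE = re.compile(
    r"^(?P<headword>[^(\[{~]+)"
    r"(?:\((?P<reading>[^)]*)\))?"
    r"(?:\[(?P<sense>\d+)\])?"
    r"(?:\{(?P<surface>[^}]*)\})?"
    r"(?P<checked>~)?$"
)


class IndexWord(NamedTuple):
    headword: str
    reading: Optional[str]
    sense_index: Optional[int]
    surface_form: Optional[str]
    checked: bool


def parse_index_token(token: str) -> IndexWord:
    """Split one index token into headword, reading, sense, surface form and checked flag."""
    m = TOKEN_RE.match(token)
    if m is None:
        raise ValueError(f"unrecognised index token: {token!r}")
    sense = m.group("sense")
    return IndexWord(
        headword=m.group("headword"),
        reading=m.group("reading"),
        sense_index=int(sense) if sense is not None else None,
        surface_form=m.group("surface"),
        checked=m.group("checked") is not None,
    )


def parse_index_line(line: str) -> List[IndexWord]:
    """Parse a whole index line, assuming tokens are separated by whitespace."""
    return [parse_index_token(tok) for tok in line.split()]


# Illustrative tokens only.
words = parse_index_line("彼(かれ)[01]{彼の}~ は 学生 です")
```

Real index data may contain edge cases this regular expression does not cover, so it is best read as a description of the notation rather than a complete parser.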

Releases

You can download the pre-built JSON file from the latest release. Automated releases containing the latest example sentences are scheduled weekly.
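Fetching the latest release can also be scripted. The sketch below uses the GitHub REST API's "latest release" endpoint for the `mwhirls/tatoeba-json` repository and the standard `assets`, `name` and `browser_download_url` fields of the release response. It assumes the release contains an asset whose name ends in `.json`; if the file is published under another name or compressed, adjust `suffix`. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Repository slug as shown on the project page; the asset-name suffix is an assumption.
LATEST_RELEASE_URL = "https://api.github.com/repos/mwhirls/tatoeba-json/releases/latest"


def pick_json_asset(release: dict, suffix: str = ".json") -> Optional[str]:
    """Return the download URL of the first release asset whose name ends with `suffix`."""
    for asset in release.get("assets", []):
        if asset.get("name", "").endswith(suffix):
            return asset.get("browser_download_url")
    return None


def download_latest(dest: str, suffix: str = ".json") -> str:
    """Fetch the latest release metadata, then save the matching asset to `dest`."""
    req = urllib.request.Request(
        LATEST_RELEASE_URL, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(req) as resp:
        release = json.load(resp)
    url = pick_json_asset(release, suffix)
    if url is None:
        raise RuntimeError(f"no asset ending in {suffix!r} in the latest release")
    urllib.request.urlretrieve(url, dest)
    return dest
```

Calling `download_latest("tatoeba.json")` would save the file to the current directory. Because releases are rebuilt weekly, re-running the script picks up the newest sentences without hard-coding a release tag.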

License

Tatoeba Project / Tanaka Corpus

Many of the example sentences are originally sourced from the Tanaka Corpus, which is now maintained within the Tatoeba Project. All files downloaded through the Tatoeba Project are licensed under the CC BY 2.0 FR license.

As required by the original license, all derived files containing example sentences distributed in each release are made available under the same license.

Source Code

The original source code and other files in this project, excluding the files mentioned above, are made available under the MIT license (see LICENSE.txt).

mwhirls/tatoeba-json

Folders and files

Latest commit

History

Repository files navigation

tatoeba-json

Format

Releases

License

Tatoeba Project / Tanaka Corpus

Source Code

About

Resources

License

Stars

Watchers

Forks

Releases 27

Packages 0

Languages

Packages