-
I downloaded the JSONL file at the bottom of this page: https://kaikki.org/dictionary/English/words/index.html. It's named kaikki.org-dictionary-English-words.jsonl and it's 2.69 GB on my machine. I've been familiarizing myself with the structure. My immediate focus is accessing plural forms of nouns, and I have figured out how to extract them.
But the first thing I wanted to look at, the args members in head_templates, took me to a page that confuses me: https://kaikki.org/dictionary/errors/mapping/index/head_templates/list/args.html At the top, for the entries that show "seen in hern [English];", I cannot find arg key names 10-19 either in the entry for "hern" on Wiktionary or in the JSONL file. The only arg key I find is "1", and that is for Hern with a capital H. So where are the key names 10-19 for hern coming from?

I'm grateful you have provided this post-processed dump of the Wiktionary data. The requirements for creating it myself are daunting: I'm just using a MacBook Pro with a single 6-core Intel Core i9 and 16 GB of total system memory, so processing the wiki data myself is not possible at this time. Thanks!
Replies: 1 comment 3 replies
-
Please download the raw JSONL files from https://kaikki.org/dictionary/rawdata.html (they contain data extracted from all pages). You can ignore the error page. Also ignore all "*_templates" fields; inflection form data is in the "forms" field.
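For anyone following along, here is a minimal sketch of reading noun plurals from the "forms" field while streaming the file one line at a time, so a multi-gigabyte JSONL file never has to fit in memory. It assumes each line is one JSON object with "word" and "pos" keys and a "forms" list whose items carry "form" and "tags"; those key names and the "plural" tag should be checked against your own copy of the data. The function names are made up for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_noun_plurals(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (word, plural_form) pairs from kaikki-style JSONL lines.

    Assumed layout (verify against your file): one JSON object per line,
    with "word", "pos", and an optional "forms" list of
    {"form": ..., "tags": [...]} objects.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry.get("pos") != "noun":
            continue
        for form in entry.get("forms", []):
            # Assumption: plural forms are marked with a "plural" tag.
            if "plural" in form.get("tags", []):
                yield entry.get("word", ""), form.get("form", "")


def plurals_from_file(path: str) -> Iterator[Tuple[str, str]]:
    """Stream a JSONL file from disk without loading it all at once."""
    with open(path, encoding="utf-8") as handle:
        yield from iter_noun_plurals(handle)
```

Calling `plurals_from_file("kaikki.org-dictionary-English-words.jsonl")` and looping over the result would then print or collect the pairs; none of the "*_templates" fields are touched.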
The file at the link you posted only has English words; it's 2.5GB. The file that contains all pages (words in all languages) is 18.5GB.