This repository contains various word lists and related information.
-
wordlist-Internet-[version].zip
A list of more than 2 million unique words automatically extracted from the HTML content of many popular Uyghur websites as well as Wikipedia. This word list contains the majority of Uyghur words used on the Internet. Note that it contains many misspelled or erroneous words. The list also includes numerical statistics for each word: raw frequency and document frequency. Each line of the file consists of three fields separated by commas:
[Word],[Raw frequency],[Document frequency]
where Raw frequency is the number of times a word occurs across all documents (web pages), and Document frequency is the number of documents that contain the word.
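The three-field line format described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the names `WordEntry` and `parse_wordlist` are hypothetical, and skipping malformed lines is an assumption made because the list is known to contain erroneous entries. Each line is split from the right so that the last two fields are always treated as the frequencies.

```python
from typing import Iterable, Iterator, NamedTuple


class WordEntry(NamedTuple):
    """One line of the word list (hypothetical helper type)."""
    word: str
    raw_frequency: int       # occurrences across all documents
    document_frequency: int  # number of documents containing the word


def parse_wordlist(lines: Iterable[str]) -> Iterator[WordEntry]:
    """Yield entries from lines of the form [Word],[Raw frequency],[Document frequency]."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split from the right: the last two fields are the counts.
        parts = line.rsplit(",", 2)
        if len(parts) != 3:
            continue  # assumption: silently skip lines without three fields
        word, raw, doc = parts
        try:
            yield WordEntry(word, int(raw), int(doc))
        except ValueError:
            continue  # assumption: skip lines whose counts are not integers
```

The function accepts any iterable of strings, for example a text file object opened after extracting the archive. The text encoding and the name of the file inside the zip are not stated here, so check both before relying on this sketch.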
This work is licensed under a Creative Commons Attribution 4.0 International License.