Utilities to help me with Chinese-language work and other NLP tasks
- `json_texts`: Contains research files in progress, in JSON format. The format specification is at `json_format_for_prosody.txt`. The program `handle_files.py` allows an encrypted version of the data files to be pushed to the repo while keeping their contents private.
- `character_count.py`: Counts the Chinese characters (only) in a file and returns their overall percentages. The file to be opened must be in the directory `DATA`.
- `separate_pinyin/`: Takes a string of Pīnyīn as input and returns a list of the discrete component syllables. A second program, `count_syllables.py`, counts the number of syllables found.
- `convert_pinyin/`: Converts files in Pages (v. 3, "Pages '08") format so that their non-standard tonal diacritics are normalized to Unicode. Does not work with later versions of Pages. A sample font ("shyrbaw" 時報, based on Times) is included in the directory.
- `statistics/`: Little programs to calculate statistical tests.
- `poetry_flask/`: The beginnings of a web application to assist the study of medieval Chinese prosody.
- `hanamin_fonts/`: Copy of the HANAMIN fonts for use with this project.
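
To illustrate the kind of counting that `character_count.py` is described as doing, here is a minimal sketch. It is not the repository's actual code. The function names `is_chinese_char` and `character_percentages` are hypothetical, and two assumptions are made: "Chinese character" is taken to mean a code point in the CJK Unified Ideographs block or its Extension A, and "percentage" is taken to mean each character's share of all Chinese characters counted.

```python
from collections import Counter


def is_chinese_char(ch: str) -> bool:
    """Return True for code points in CJK Unified Ideographs or Extension A.

    Assumption: other ideograph blocks (Extensions B onward, compatibility
    ideographs) are ignored in this sketch.
    """
    code = ord(ch)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def character_percentages(text: str) -> dict[str, float]:
    """Map each Chinese character in text to its percentage of all
    Chinese characters found. Non-Chinese characters are skipped."""
    counts = Counter(ch for ch in text if is_chinese_char(ch))
    total = sum(counts.values())
    if total == 0:
        return {}  # no Chinese characters at all
    # most_common() orders the result from most to least frequent
    return {ch: 100 * n / total for ch, n in counts.most_common()}
```

To apply this to a file in the `DATA` directory, the file's contents could be read with `pathlib.Path("DATA", name).read_text(encoding="utf-8")` and passed to `character_percentages`, where `name` stands for whatever file is being examined.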