latin-frequencies

A simple script that generates word frequency lists for Latin texts.

Prerequisites

spaCy with the LatinCy la_core_web_lg pipeline, and XlsxWriter (used for Excel output):

  pip install -U pip setuptools wheel
  pip install -U spacy
  pip install https://huggingface.co/latincy/la_core_web_lg/resolve/main/la_core_web_lg-any-py3-none-any.whl
  pip install -U XlsxWriter
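To confirm that the prerequisites are importable before running frequency.py, you can use a small standard-library check. This is a sketch only. It assumes the import names spacy, xlsxwriter and la_core_web_lg, on the basis that the model wheel installs a package named after the model. The helper name missing_prerequisites is hypothetical and not part of the script.

```python
import importlib.util

# Assumed import names of the prerequisites; the model wheel is assumed to
# install a top-level package called la_core_web_lg.
REQUIRED = ["spacy", "xlsxwriter", "la_core_web_lg"]


def missing_prerequisites(names=REQUIRED):
    """Return the names of packages that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```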

Usage

Example:

  python frequency.py --stopwords=stopwords.txt --coverage=80 --output_type=excel --output=output.xlsx documents

Collects lemma counts for all plaintext files in the "documents" directory, excluding the stop words listed in stopwords.txt. The most frequent lemmata are then written in descending order to the Excel file output.xlsx, until together they reach at least 80% coverage.
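The counting step described above can be sketched as follows. This is not the script's actual code. The function names are hypothetical, and several details are assumptions: files are read as UTF-8, subfolders are not searched, and stop-word matching is case-sensitive. To keep the sketch runnable without spaCy, the lemmatizer is passed in as a callable. With the LatinCy pipeline loaded as `nlp`, that callable could be `lambda text: [token.lemma_ for token in nlp(text)]`.

```python
from collections import Counter
from pathlib import Path


def load_stopwords(path=None, default=()):
    """Read a stop-word file (one entry per line); use `default` if no path is given.

    With spaCy, `default` could be nlp.Defaults.stop_words. An empty file
    yields an empty set, so no stop words are removed.
    """
    if path is None:
        return set(default)
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def read_texts(path):
    """Yield the text of one file, or of every file directly inside a folder.

    Assumes UTF-8 plaintext and does not descend into subfolders.
    """
    path = Path(path)
    if path.is_file():
        files = [path]
    else:
        files = sorted(p for p in path.iterdir() if p.is_file())
    for file in files:
        yield file.read_text(encoding="utf-8")


def count_lemmas(texts, lemmatize, stopwords):
    """Count lemmata over all texts, skipping stop words (case-sensitive match).

    `lemmatize` maps a text to a list of lemma strings.
    """
    counts = Counter()
    for text in texts:
        counts.update(lemma for lemma in lemmatize(text) if lemma not in stopwords)
    return counts
```

A call along the lines of `count_lemmas(read_texts("documents"), lemmatize, load_stopwords("stopwords.txt"))` would then produce the counts that the coverage or top-n selection works on.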

Arguments

filename/folder: Path of a file or folder to process (required).
--stopwords=filename: Path of a text file containing a list of stop words (one entry per line). If not supplied, the default stop-word list of the spaCy model is used. Point this to an empty file if you don't want to use any stop words.
--output=filename: Where to store the output. If not supplied, output will be printed to stdout.
--output_type=excel/csv: What kind of output to generate. Only applies if --output is specified, defaults to "csv".
--coverage=n: Lists the most common lemmata in descending order until at least "n" percent of vocabulary coverage is achieved. Takes precedence over --top.
--top=n: Lists the "n" most frequent lemmata. If neither --coverage nor --top is supplied, all lemmata are listed.
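Below is a minimal sketch of how the arguments above could be handled, including the rule that --coverage takes precedence over --top. The parser, the positional name `path`, the helper names, and the CSV header "lemma,count" are all hypothetical. The sketch also assumes that "coverage" means the share of all counted (non-stop-word) tokens accounted for by the selected lemmata. The README does not define the term precisely.

```python
import argparse
import csv
import io


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(description="Word frequency lists for Latin texts")
    parser.add_argument("path", help="file or folder to process")
    parser.add_argument("--stopwords", help="stop-word file, one entry per line")
    parser.add_argument("--output", help="output file; stdout if omitted")
    parser.add_argument("--output_type", choices=["excel", "csv"], default="csv")
    parser.add_argument("--coverage", type=float, help="target coverage in percent")
    parser.add_argument("--top", type=int, help="number of lemmata to list")
    return parser


def select_lemmata(counts, coverage=None, top=None):
    """Pick (lemma, count) pairs from a mapping; coverage wins over top."""
    # Stable sort: lemmata with equal counts keep their original order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if coverage is not None:
        # Assumed definition: share of all counted tokens covered so far.
        total = sum(counts.values())
        selected, covered = [], 0
        for lemma, n in ranked:
            if covered * 100 >= coverage * total:
                break
            selected.append((lemma, n))
            covered += n
        return selected
    if top is not None:
        return ranked[:top]
    return ranked  # neither option given: list every lemma


def to_csv(rows):
    """Render (lemma, count) rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["lemma", "count"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Without --output, the CSV text could simply be printed. With --output, it could be written to a file opened with `newline=""`. Excel output would go through XlsxWriter instead, which this sketch does not cover.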

Repository: katharinaost/latin-frequencies
