A stopwords list for NLP work in processing Classical Latin

KotobaSuke/latin-stopwords

Latin Stopword List

The wordlist (latin) is a stopword list for processing Classical Latin literature. It is a plain-text file and can be opened as a .txt file.

The list is not complete. I collected the words myself, so it reflects my personal judgement and bias.

The words in the list are lowercase and unlemmatized, i.e. the different forms of a single lemma are listed individually. Due to some restrictions, not all forms of a stopword are listed.
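Because the entries are lowercase and unlemmatized, tokens need to be lowercased before lookup, and inflected forms only match if they are listed themselves. A minimal sketch of how the list might be used follows. It assumes the file is UTF-8 text with entries separated by whitespace or newlines, which this README does not state; `load_stopwords` and `remove_stopwords` are hypothetical helper names, and the small `sample` set stands in for the real list.

```python
from pathlib import Path


def load_stopwords(path):
    """Read the stopword file into a set.

    Assumption: UTF-8 text with entries separated by whitespace or newlines.
    """
    text = Path(path).read_text(encoding="utf-8")
    return set(text.split())


def remove_stopwords(tokens, stopwords):
    """Drop tokens whose lowercase form is in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


# Stand-in for the real list, so the example runs without the file.
sample = {"et", "in", "est", "non"}
tokens = "Gallia est omnis divisa in partes tres".split()
print(remove_stopwords(tokens, sample))
# prints ['Gallia', 'omnis', 'divisa', 'partes', 'tres']
```

With the real file, `load_stopwords("latin")` would replace `sample`. No lemmatization is applied, so a form that is missing from the list is kept.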

Most words in the following categories are included:

  • Pronouns: ego, tu, te, quis, quidam
  • Quantifiers: unus, duo, ullus, omne
  • Adverbs not derived from adjectives: mox, tunc, num, non
  • Conjunctions: et, atque, sed, seu
  • Prepositions: in, ad, trans, propter
  • Degree qualifiers: quantum, magis, minor, talis
  • Light verbs: forms of esse, facere, habere, dicere
  • Other particles: ecce, o
  • Classical name abbreviations: c, agr, m, cn
  • Roman numerals: i, ii, v, xl
  • Greek distractors: kai, ton (frequent in texts that contain Greek quotations)
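Since the list is not complete, Roman numerals that are not listed will pass through a plain lookup. One possible supplement, which is not part of this repository, is a pattern check for standard subtractive-notation numerals up to 3999. The sketch below is an assumption about how that could be done; `is_roman_numeral` is a hypothetical helper name.

```python
import re

# Standard subtractive-notation Roman numerals, 1 to 3999, in lowercase.
# The lookahead rejects the empty string, which the rest would accept.
ROMAN_RE = re.compile(
    r"(?=[mdclxvi])m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})"
)


def is_roman_numeral(token):
    """Return True if the whole token is a standard Roman numeral."""
    return ROMAN_RE.fullmatch(token.lower()) is not None


for word in ["xl", "XIV", "et", "vi", "di"]:
    print(word, is_roman_numeral(word))
```

The check is deliberately left separate from the list because it is ambiguous: real Latin words such as vi, di, mi and lix are also well-formed numerals, and additive forms such as iiii are rejected. Whether to apply it depends on the corpus.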
