sanskrit-data

Versioned Sanskrit linguistic data.

The data has been compiled from a variety of sources. Taken together, it covers almost all lexical forms found in Classical Sanskrit literature.

Quickstart

git clone https://github.com/sanskrit/data.git && cd data
python bin/make_data.py
ls all-data

The data comes from several sources, each with its own format. make_data.py converts all of the data to a common format and stores the results in the all-data directory. This is the data that downstream systems should use.
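Since the layout of the generated files is not described here, a quick way to get oriented is to list what make_data.py produced and look at the first few lines of each file. The following is a minimal sketch, not part of this repository: the helper name `preview_outputs` is hypothetical, and it assumes only that all-data contains text files.

```python
from itertools import islice
from pathlib import Path


def preview_outputs(root="all-data", n_lines=3):
    """Map each file under `root` to its first `n_lines` lines.

    Hypothetical helper: it makes no assumption about the file format
    beyond the files being readable as text.
    """
    previews = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the preview going if a file is not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            key = path.relative_to(root).as_posix()
            previews[key] = [line.rstrip("\n") for line in islice(handle, n_lines)]
    return previews


if __name__ == "__main__" and Path("all-data").is_dir():
    for name, lines in preview_outputs().items():
        print(name)
        for line in lines:
            print("   ", line)
```

Run it from the repository root after make_data.py has finished.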

About the data

The data covers verbs, participles, nouns, adjectives, pronouns, indeclinables, morphemes, and sandhi rules. If it's a Sanskrit word, it's probably here.

Each of the data sources used has its own license. Check the LICENSE files in learnsanskrit.org, sanskrit-heritage-site, and monier-williams for details.

All Sanskrit strings are written in SLP1, mainly because it represents each Sanskrit sound with a single ASCII character, which is extremely convenient when processing Sanskrit programmatically. You can convert this data to another representation, such as IAST or Devanagari, with any transliterator that supports SLP1.
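Because SLP1 uses exactly one character per sound, converting it to another scheme can be done with a plain character-for-character lookup. The sketch below converts SLP1 to IAST using only the Python standard library. The function name `slp1_to_iast` is hypothetical and not part of this repository; the table covers the core SLP1 letters and omits accent marks and other rare symbols, which pass through unchanged.

```python
import unicodedata

# Core SLP1 letters mapped to IAST. SLP1 letters not listed here
# (a, i, u, e, o, k, g, c, j, t, d, n, p, b, m, y, r, l, v, s, h)
# are identical in both schemes and pass through unchanged.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ",
    "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh",
    "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
}

_TABLE = str.maketrans(SLP1_TO_IAST)


def slp1_to_iast(text):
    """Convert an SLP1 string to IAST, one character at a time."""
    # NFC normalization keeps the output in precomposed form.
    return unicodedata.normalize("NFC", text.translate(_TABLE))


if __name__ == "__main__":
    for word in ("saMskftam", "kfzRa", "gacCati"):
        print(word, "->", slp1_to_iast(word))
```

The reverse direction needs more care, since IAST writes some sounds with two letters (for example `kh` or `ai`), so a maintained transliteration library is a better fit for anything beyond quick inspection.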
