A dataset of Belarusian expressions collected from open Internet sources.
The latest version of the dataset, in .csv format, is here.
The dataset was put together so that Notion can show a random quote every day.
Notion template:
The data was collected from:
- Web pages:
1.1. a selection of quotes from Radio Svaboda
1.2. selection from the National library website
1.3. quotes from the website dumki.org
- pdf books:
2.1. Belarusian folk art. Proverbs and sayings. In two parts
2.2. Ales Zaika. Proverbs and sayings from Kosovchyna
I wanted to start somewhere, so I began with the quotes I could find on websites. That seemed insufficient, so I decided to turn to books as well. I chose the first ones whose content fit. Extracting text from books is quite difficult, so for now I have limited myself to these two. The complexities of parsing and reading from pdf can be seen in this notebook.
There are currently 9655 entries in the dataset.
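
As a rough illustration of the daily-quote use case mentioned above, here is a minimal Python sketch that reads quotes from a CSV file and picks one per day. It makes several assumptions: the column name `quote` is hypothetical (check the real header of the .csv file), the inline sample rows are placeholders and not taken from the dataset, and the helper names `load_quotes` and `quote_of_the_day` are made up for this example. Instead of a truly random choice, it uses a date-based index so that the same day always gives the same quote. It does not cover the Notion side at all.

```python
import csv
import datetime
import io

# Placeholder data standing in for the real .csv file.
# The column name "quote" is an assumption, not taken from the dataset.
SAMPLE_CSV = """quote
First sample expression
Second sample expression
Third sample expression
"""


def load_quotes(csv_file, column="quote"):
    """Return all non-empty values of `column` from an open CSV file object."""
    quotes = []
    for row in csv.DictReader(csv_file):
        text = (row.get(column) or "").strip()
        if text:
            quotes.append(text)
    return quotes


def quote_of_the_day(quotes, day=None):
    """Pick one quote per calendar day; the same day always maps to the same quote."""
    if not quotes:
        raise ValueError("no quotes loaded")
    if day is None:
        day = datetime.date.today()
    # The day's ordinal number cycles through the list in order.
    return quotes[day.toordinal() % len(quotes)]


quotes = load_quotes(io.StringIO(SAMPLE_CSV))
print(quote_of_the_day(quotes))
```

To try this on the real file, replace `io.StringIO(SAMPLE_CSV)` with something like `open(path, newline="", encoding="utf-8")` and pass the actual column name. If a different quote on every call is preferred over one per day, `random.choice(quotes)` from the standard library would do.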