Find the full description of the hands-on task here: https://tiny.cc/versteisch-bahnhof
versteisch-bahnhof is a Swiss German dialect predictor using TF-IDF vector representations and a Random Forest classifier.
The evaluation is based on a publicly available Swiss German Kaggle competition. The dataset covers four dialects:
BE Bernese
LU Lucerne
ZH Zurich
BS Basel
The training set consists of 15573 example sentences, whereas the test set consists of 2499 example sentences.
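To make the approach concrete, here is a minimal sketch of a TF-IDF plus Random Forest classifier. It assumes scikit-learn, which this README does not name explicitly. The toy sentences, the character n-gram settings, and the number of trees are illustrative assumptions, not the project's actual data or configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Made-up toy sentences for illustration only, NOT taken from the Kaggle dataset.
sentences = [
    "ig ha di gärn",
    "ig gange hei",
    "ich ha dich gärn",
    "ich gang hei",
]
labels = ["BE", "BE", "ZH", "ZH"]

# Assumption: character n-grams inside word boundaries; the project may
# tokenize differently (for example, on whole words).
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

# Fit the vectorizer and the forest in one step, then predict a dialect label.
model.fit(sentences, labels)
predictions = model.predict(["ig ha hunger"])
print(predictions)
```

Wrapping both steps in one pipeline keeps the vectorizer vocabulary tied to the training data, so the same transformation is applied at prediction time.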
Python 3 is required.
First, install pipenv using pip:
pip install --user pipenv
To install all dependencies into a dedicated virtual environment:
pipenv install
Next, you can import the created virtual environment into your preferred IDE and activate it in your shell:
pipenv shell
You can train the model with either train_dialect (fixed parameter setting) or train_dialect_hyperparameter (grid search over different parameter settings). In both cases, the best parameters are logged to the console.
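As a rough illustration of what a grid search over parameter settings can look like, the sketch below uses scikit-learn's GridSearchCV. It is not the code behind train_dialect_hyperparameter: the library choice, the step names, the parameter grid, and the toy sentences are all assumptions made for this example.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Made-up toy sentences for illustration only, NOT taken from the Kaggle dataset.
sentences = [
    "ig ha di gärn",
    "ig gange hei",
    "ig ha hunger",
    "ich ha dich gärn",
    "ich gang hei",
    "ich ha hunger",
]
labels = ["BE", "BE", "BE", "ZH", "ZH", "ZH"]

# Hypothetical step names ("tfidf", "forest"); they only serve as prefixes
# for the parameter grid below.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb")),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Assumed grid: every combination of these values is cross-validated.
param_grid = {
    "tfidf__ngram_range": [(1, 2), (1, 3)],
    "forest__n_estimators": [10, 50],
}

# cv=2 because the toy data is tiny; a real run would use more folds.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(sentences, labels)

# Log the best parameter combination to the console.
print("Best parameters:", search.best_params_)
```

The grid grows multiplicatively with each added parameter, so a real search over the full training set can take considerably longer than a single fixed-parameter run.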