This repository contains homework assignments that I completed for the courses Statistical Methods in Natural Language Processing I and II, taught by Professor Jan Hajic.
I have always found NLP a fascinating topic, and I am glad I found a way to challenge myself by doing some work in this field.
The syllabi of the two courses are available here:
- Statistical Methods in Natural Language Processing I
- Statistical Methods in Natural Language Processing II

The assignments cover the following topics:

- Assignment 1: Study the perplexity and entropy of a text.
- Assignment 2: Use the EM algorithm to estimate, on the heldout data, the parameters that weight the probabilities obtained from the training data. Then evaluate the model by computing its cross entropy on a separate test set.
- Assignment 3: Train your own Brill tagger by defining the rule templates and choosing an appropriate maximum number of rules.
- Assignment 4: Train an HMM tagger in both a supervised and an unsupervised way (Viterbi training and Baum-Welch training, respectively).
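
To give a flavour of the quantities studied in Assignment 1, here is a minimal sketch that computes the entropy and perplexity of a text. It is not the code from the assignment itself: it assumes a plain unigram model estimated from the same tokens it is evaluated on, whitespace tokenization, and the hypothetical helper names `unigram_entropy` and `perplexity`.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Entropy, in bits, of the unigram distribution estimated from `tokens`.

    Assumption: probabilities are relative frequencies (no smoothing).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # H = -sum_w p(w) * log2 p(w)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(entropy_bits):
    """Perplexity corresponding to an entropy given in bits: 2 ** H."""
    return 2.0 ** entropy_bits


if __name__ == "__main__":
    # Toy example text; whitespace tokenization is an assumption.
    tokens = "the cat sat on the mat".split()
    h = unigram_entropy(tokens)
    print(f"entropy = {h:.3f} bits, perplexity = {perplexity(h):.3f}")
```

A text whose tokens are all distinct and equally frequent gives the highest entropy for its vocabulary size, and the perplexity then equals the number of distinct tokens. A conditional (for example bigram) model would need joint counts instead, which this sketch does not cover.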