Exercises
jasonbaldridge edited this page Apr 8, 2013
- Spelling Correction Exercise, Part 1: Error detection and an inverted index of character n-grams for candidate generation.
- Spelling Correction Exercise, Part 2: Candidate generation (character n-gram and edit-based) and scoring with unigram and bigram language models.
- Spelling Correction Exercise, Part 3: (TBA) Edit distance and error model computation and use.
- Topic Modeling Exercise: Compute topics using Mallet.
- Additional reading/resources: See my post about a Gibbs sampler in R for a toy topic model; it implements the sampler and runs it on the data from the overview paper Probabilistic Topic Models by Steyvers and Griffiths (2007).
- Spark on AWS, Exercise 1: Launch clusters on Amazon's EC2 service, use Spark to count words in files, and work with HDFS.
- Spark on AWS, Exercise 2: Run standalone Spark jobs, attach EBS volumes, and work with larger files in HDFS.
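For orientation on the n-gram candidate generation in Spelling Correction Parts 1 and 2, here is a minimal Python sketch of a character n-gram inverted index. It is not the exercise's reference solution. The padding markers `^`/`$`, the default of bigrams, and ranking by raw overlap count (rather than, say, Jaccard similarity) are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def char_ngrams(word, n=2):
    """Return the set of character n-grams of word, padded with '^' and '$' boundary markers."""
    padded = "^" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(vocabulary, n=2):
    """Inverted index: map each character n-gram to the set of vocabulary words containing it."""
    index = defaultdict(set)
    for word in vocabulary:
        for gram in char_ngrams(word, n):
            index[gram].add(word)
    return index


def candidates(misspelling, index, n=2, top_k=5):
    """Rank vocabulary words by how many n-grams they share with the misspelling."""
    counts = defaultdict(int)
    for gram in char_ngrams(misspelling, n):
        # .get avoids inserting empty entries into the defaultdict
        for word in index.get(gram, ()):
            counts[word] += 1
    # Sort by overlap count (descending), then alphabetically so ties are stable
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


vocab = ["the", "then", "there", "three", "they"]
index = build_index(vocab)
print(candidates("thre", index, top_k=3))  # ['three', 'there', 'the']
```

Only words sharing at least one n-gram with the misspelling are ever scored, which is the point of the inverted index: the candidate set stays small even for a large vocabulary.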
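Spelling Correction Part 3 is still TBA. Its edit distance computation will likely start from something like the standard Levenshtein dynamic program, sketched below in Python with unit costs for insertion, deletion, and substitution. This is an assumption: the exercise may instead use weighted costs, transpositions (Damerau-Levenshtein), or costs derived from an error model's confusion counts.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit costs for insertion, deletion, and substitution."""
    m, n = len(source), len(target)
    # dist[i][j] holds the cost of turning source[:i] into target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i  # delete all i characters of the source prefix
    for j in range(1, n + 1):
        dist[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + sub_cost,  # substitution (or free match)
            )
    return dist[m][n]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("thre", "three"))      # 1
```

Filling the table row by row costs O(mn) time; keeping the full table (rather than two rows) makes it possible to trace back the actual edit operations, which is what an error model needs for counting.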
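For the language-model scoring step in Spelling Correction Part 2, the Python sketch below trains a bigram model and scores a candidate sentence by log probability. The choice of add-one (Laplace) smoothing, the `<s>`/`</s>` boundary tokens, and counting those tokens in the vocabulary size are simplifying assumptions, not necessarily what the exercise prescribes.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_logprob(tokens, unigrams, bigrams):
    """Log probability of a token sequence under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)  # includes the boundary tokens (a simplification)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        # Unseen words and bigrams get count 0 from Counter, so the estimate stays nonzero
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total


corpus = [["i", "saw", "the", "cat"], ["the", "cat", "sat"]]
uni, bi = train_bigram(corpus)
# A correction candidate that fits the context scores higher than one that does not
print(bigram_logprob(["the", "cat", "sat"], uni, bi) > bigram_logprob(["there", "cat", "sat"], uni, bi))  # True
```

Working in log space avoids underflow on long sentences; to rank spelling candidates, one would substitute each candidate into the sentence and compare the resulting scores (possibly combined with an error model score).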