Schedule
jasonbaldridge edited this page Apr 26, 2013
- Unless otherwise noted, assignment submissions are due by 1pm on the Friday of the week indicated. (Due dates will also be clearly indicated in each homework description.)
- The schedule is subject to change as the semester progresses; however, any changes will be made at least one week in advance of the dates affected.
- Topics
- NLP Overview
- Scala Overview
- Readings (1/16)
NOTE: No class on Monday, January 21 (Martin Luther King, Jr. Day)
- Topics
- Regular expressions
- Scala: functional programming
- Readings (1/23)
- Due. Homework1: Scala
- Topics
- Authorship attribution
- Spelling correction
- Vector-space models and computing similarity
- Scala: object-oriented programming
- Build systems
- Readings (1/28)
- Readings (1/30)
- Due. Project Phase One
- Topics
- Clustering
- Data formats (CSV, XML, JSON)
- Readings (2/4)
- Basic XML processing with Scala
- Manning et al., Chapter 16, Sections 16.1-16.4
- Readings (2/6)
- Processing JSON in Scala using Jerkson
- Manning et al., Chapter 17, Sections 17.1-17.4
- Due. Homework2: Regular expressions
- Topics
- Classification
- Readings (2/11)
- Manning et al., Chapter 13, Sections 13.1, 13.2, 13.6
- Readings (2/13)
- Manning et al., Chapter 14, Sections 14.0, 14.1, 14.3-14.5
- Topics
- Evaluation
- Sentiment analysis
- Due (2/18). Project Phase Two
- Topics
- Topic models
- Due (2/25). Homework3: Clustering
- Topics
- In-class exercise: Topic modeling
- Due (3/8). Project Phase Three
Spring Break: March 11-16
- Topics
- Part-of-speech Tagging
- Label propagation
- Due. Homework4: Classification
- Topics
- Actor-based programming with Akka
- PageRank
- Topics
- Named entity recognition
- MapReduce, Hadoop and Spark
- Amazon Web Services
- In-class exercise: Launch clusters on Amazon's EC2 service, use Spark to count words in files, and work with HDFS
- Topics
- Spark
- Pseudo-grounding of text
- In-class exercise: Run standalone jobs using Spark, attach EBS volumes, and work with larger files in HDFS.
- Due (4/10). Project4
- Topics
- Text processing pipelines
- In-class exercise: Chalk tutorial for sentence detection, tagging, and NER
- Topics
- Randomized data structures and algorithms for NLP and machine learning.
- Whistle-stop tour of further topics.
- Guest talk by Elizabeth Winkler of Mass Relevance.
- Due (4/24). Homework 5: Twitter sentiment analysis
- Topics
- Project demos/presentations
- Wrap-up
- Due (5/6). Project: step 5
- Due (5/13, 1pm). Project code and write-up