The objective of this project is to get started with big data processing using Hadoop. The goal is to implement basic text processing tasks from scratch on the Hadoop framework.
Task 1: Implement a MapReduce algorithm that produces the count of every word in the document.
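A minimal sketch of the word-count logic. It is written as plain Python that simulates the map, shuffle, and reduce phases in a single process; it does not use the Hadoop Java API. The tokenization rule (lowercase, runs of `\w` characters) and the helper names `mapper`, `reducer`, and `run_local` are assumptions for illustration.

```python
from collections import defaultdict
import re


def mapper(line):
    # Emit (word, 1) for every token in the line; tokens are lowercased \w+ runs (assumed rule).
    for word in re.findall(r"\w+", line.lower()):
        yield word, 1


def reducer(word, counts):
    # Sum all partial counts emitted for one word.
    yield word, sum(counts)


def run_local(lines):
    # Simulate Hadoop's shuffle: group mapper output by key, then reduce keys in sorted order.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(kv for key in sorted(groups) for kv in reducer(key, groups[key]))
```

For example, `run_local(["the quick brown fox", "the lazy dog", "The fox"])` gives `the` a count of 3 and `fox` a count of 2. Because addition is associative, the same reducer could also act as a combiner to shrink the data sent through the shuffle.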
Task 2: Implement a MapReduce algorithm that produces modified tri-grams around the keywords, with each keyword replaced by ‘$’.
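One possible reading of this task, sketched as local Python that simulates MapReduce. Each mapper slides a three-token window over a line and keeps the windows that contain a keyword. It then rewrites every keyword as `$`. The reducer counts each resulting pattern. The keyword set `KEYWORDS`, the tokenizer, and the choice to build tri-grams within a single line are all hypothetical assumptions.

```python
from collections import defaultdict
import re

# Hypothetical keyword set; the project's actual keyword list is not given here.
KEYWORDS = {"science", "fire"}


def mapper(line):
    # Slide a 3-token window over the line; for each window containing a keyword,
    # emit the window with every keyword replaced by "$", paired with a count of 1.
    tokens = re.findall(r"\w+", line.lower())
    for i in range(len(tokens) - 2):
        window = tokens[i:i + 3]
        if any(t in KEYWORDS for t in window):
            yield " ".join("$" if t in KEYWORDS else t for t in window), 1


def reducer(trigram, counts):
    # Total occurrences of one modified tri-gram.
    yield trigram, sum(counts)


def run_local(lines):
    # Simulate the shuffle by grouping mapper output per tri-gram.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    return dict(kv for key in sorted(groups) for kv in reducer(key, groups[key]))
```

With the hypothetical keyword `science`, the lines `data science is fun` and `the science is fun` yield `$ is fun` twice, and `data $ is` and `the $ is` once each.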
Task 3: Implement a MapReduce algorithm that produces an inverted index for the dataset.
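A minimal local sketch of an inverted index in Python. It simulates MapReduce and does not call Hadoop. It assumes each input record is already a `(doc_id, line)` pair; in a real Hadoop job the document id would typically be derived from the input split's file name. The mapper emits `(word, doc_id)`, and the reducer turns each group into a sorted posting list.

```python
from collections import defaultdict
import re


def mapper(doc_id, line):
    # Emit (word, doc_id) once per distinct word in this line.
    for word in set(re.findall(r"\w+", line.lower())):
        yield word, doc_id


def reducer(word, doc_ids):
    # Posting list: the sorted, de-duplicated ids of documents containing the word.
    yield word, sorted(set(doc_ids))


def run_local(records):
    # records: iterable of (doc_id, line) pairs (an assumed input shape).
    groups = defaultdict(list)
    for doc_id, line in records:
        for key, value in mapper(doc_id, line):
            groups[key].append(value)
    return dict(kv for key in sorted(groups) for kv in reducer(key, groups[key]))
```

For instance, the records `("d1", "big data")`, `("d2", "big hadoop")` and `("d1", "data again")` map `big` to `["d1", "d2"]` and `data` to `["d1"]`.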
Task 4: Implement a MapReduce algorithm to join two datasets using a primary key.
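A sketch of a reduce-side join, which is one common way to join on a key in MapReduce. It is written as local Python rather than Hadoop code. It assumes both datasets are CSV lines whose first field is the primary key. The mapper tags each value with its source (`L` or `R`, hypothetical tags). This lets the reducer pair left and right rows that share a key, producing an inner join.

```python
from collections import defaultdict


def mapper(tag, line):
    # Split a CSV line into its join key (first field) and the remaining fields,
    # tagging the value with the dataset it came from.
    key, *rest = line.split(",")
    yield key, (tag, rest)


def reducer(key, tagged_values):
    # Separate rows by source, then emit every left/right pairing for this key (inner join).
    left_rows = [rest for tag, rest in tagged_values if tag == "L"]
    right_rows = [rest for tag, rest in tagged_values if tag == "R"]
    for left_row in left_rows:
        for right_row in right_rows:
            yield key, left_row + right_row


def run_local(left_lines, right_lines):
    # Simulate the shuffle: all tagged rows sharing a key reach the same reducer call.
    groups = defaultdict(list)
    for tag, lines in (("L", left_lines), ("R", right_lines)):
        for line in lines:
            for key, value in mapper(tag, line):
                groups[key].append(value)
    return [kv for key in sorted(groups) for kv in reducer(key, groups[key])]
```

Keys present in only one dataset produce no output. Emitting unmatched rows padded with empty fields would turn this into an outer join.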
Task 5: Implement the k-nearest neighbors (KNN) algorithm using MapReduce on the training and test data.
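A sketch of one common MapReduce formulation of KNN, written as local Python. The test set is assumed to be small enough to give to every mapper, for example through Hadoop's distributed cache. Each mapper computes the distance from its training record to every test point. The reducer for each test id keeps the `k` nearest neighbours and takes a majority vote. Euclidean distance, a default of `k = 3`, CSV rows of the form `f1,...,fn,label`, and the helper names are all assumptions.

```python
from collections import Counter, defaultdict
import heapq
import math


def mapper(train_line, test_points):
    # Parse "f1,...,fn,label" and emit (test_id, (distance, label)) for every test point.
    *features, label = train_line.split(",")
    x = [float(f) for f in features]
    for test_id, point in test_points.items():
        yield test_id, (math.dist(x, point), label)


def reducer(test_id, candidates, k):
    # Keep the k nearest training points and return the majority label among them.
    nearest = heapq.nsmallest(k, candidates)
    votes = Counter(label for _, label in nearest)
    yield test_id, votes.most_common(1)[0][0]


def run_local(train_lines, test_points, k=3):
    # Simulate the shuffle: group every (distance, label) pair by test id.
    groups = defaultdict(list)
    for line in train_lines:
        for key, value in mapper(line, test_points):
            groups[key].append(value)
    return dict(kv for key in sorted(groups) for kv in reducer(key, groups[key], k))
```

With large training sets, each mapper would usually keep only its local top `k` per test point before emitting. The global `k` nearest are always among the union of the local top-`k` lists, so this reduces shuffle volume without changing the result.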