Big-Data-Processing-with-Hadoop

This project is an introduction to big data processing with Hadoop. The goal is to implement basic text-processing tasks from scratch on the Hadoop framework.

Task 1: Implement a MapReduce algorithm that produces the count of every word in the document.
Task 2: Implement a MapReduce algorithm that produces modified tri-grams around the keywords, with each keyword replaced by ‘$’.
Task 3: Implement a MapReduce algorithm that produces an inverted index for the dataset.
Task 4: Implement a MapReduce algorithm that joins two datasets on a primary key.
Task 5: Implement the k-nearest neighbors (KNN) algorithm using MapReduce on the test and training data.
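
As a rough illustration of the map, shuffle, and reduce pattern behind Tasks 1, 3, and 4, here is a minimal in-memory sketch in Python. It is not the code in this repository and it does not use the Hadoop API. The helper `run_mapreduce`, the mapper and reducer names, the `"A"`/`"B"` table tags, and the sample records are all hypothetical, chosen only to show how key-value pairs flow through each phase.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate the map, shuffle (group by key), and reduce phases in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Hadoop sorts keys before the reduce phase; sorted() mimics that here.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Task 1 style: word count. Emit (word, 1), then sum the ones per word.
def wc_map(record):
    _doc_id, text = record
    for word in text.lower().split():
        yield word, 1


def wc_reduce(_word, counts):
    return sum(counts)


# Task 3 style: inverted index. Emit (word, doc_id), then collect ids per word.
def index_map(record):
    doc_id, text = record
    for word in set(text.lower().split()):  # one posting per document
        yield word, doc_id


def index_reduce(_word, doc_ids):
    return sorted(doc_ids)


# Task 4 style: reduce-side join. Tag each row with its source table,
# group by the join key, then pair rows from the two tables.
def join_map(record):
    table, key, payload = record
    yield key, (table, payload)


def join_reduce(_key, tagged):
    left = [payload for table, payload in tagged if table == "A"]
    right = [payload for table, payload in tagged if table == "B"]
    return [(l, r) for l in left for r in right]  # inner join


# Hypothetical sample data, not taken from the project's datasets.
docs = [("d1", "big data with hadoop"), ("d2", "hadoop processes big files")]
rows = [("A", 1, "alice"), ("B", 1, "order-7"), ("A", 2, "bob")]

word_counts = run_mapreduce(docs, wc_map, wc_reduce)
inverted_index = run_mapreduce(docs, index_map, index_reduce)
joined = run_mapreduce(rows, join_map, join_reduce)
```

On a real cluster the grouping step is performed by Hadoop's shuffle across machines, and the mapper and reducer would typically be written as Java classes or as Hadoop Streaming scripts. The sketch only mirrors the data flow on a single machine.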