Big-Data-Processing-with-Hadoop

The objective of this project is to get started with big data processing on Hadoop. The goal is to implement basic text processing tasks from scratch on the Hadoop framework.

Task 1: Implement a MapReduce algorithm that produces a count of every word in the document.
Task 2: Implement a MapReduce algorithm that produces modified tri-grams around the keywords, with each keyword replaced by ‘$’.
Task 3: Implement a MapReduce algorithm that produces an inverted index for the dataset.
Task 4: Implement a MapReduce algorithm that joins two datasets on a primary key.
Task 5: Implement the KNN algorithm using MapReduce on the training and test data.
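
To make the map, shuffle, and reduce phases behind Task 1 and Task 3 concrete, here is a minimal sketch that simulates them in memory with plain Python. It is not the code in this repository and does not use any Hadoop API. The names `run_mapreduce`, `tokenize`, `wc_mapper`, `wc_reducer`, `index_mapper`, and `index_reducer`, as well as the two sample documents, are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in memory.

    Hypothetical helper for illustration; a real job would run on Hadoop.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group emitted values by key
    # reduce: one call per key, in sorted key order
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Task 1 style: word count
def wc_mapper(record):
    _doc_id, text = record
    for word in tokenize(text):
        yield word, 1  # emit (word, 1) for every occurrence


def wc_reducer(_word, counts):
    return sum(counts)  # total occurrences of the word


# Task 3 style: inverted index
def index_mapper(record):
    doc_id, text = record
    for word in tokenize(text):
        yield word, doc_id  # emit (word, document id)


def index_reducer(_word, doc_ids):
    return sorted(set(doc_ids))  # unique documents containing the word


# Made-up sample input: (document id, text) pairs
docs = [
    ("d1", "big data with Hadoop"),
    ("d2", "Hadoop processes big files"),
]

word_counts = run_mapreduce(docs, wc_mapper, wc_reducer)
inverted_index = run_mapreduce(docs, index_mapper, index_reducer)

print(word_counts)
print(inverted_index)
```

Both tasks share the same driver and differ only in what the mapper emits and how the reducer combines the grouped values, which is the main point of the sketch. On a real cluster the grouping step is performed by the framework between the map and reduce phases.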

Folders and files:
Task 1
Task 2
Task 3
Task 4
Task 5
data
README.md
Report.docx
haddop_commands_used.txt
project_description.pdf

manishabiswas/Big-Data-Processing-with-Hadoop

Folders and files

Latest commit

History

Repository files navigation

Big-Data-Processing-with-Hadoop

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages