A Reliable Topical Diversity Measure for Text Summarization

This project creates a summary (<= 665 bytes) for a given cluster of documents.
The main inspiration for this work is the paper by Jeff Bilmes, "A Class of Submodular Functions for Document Summarization".
Submodular functions were chosen for their inherent property of diminishing returns: adding a sentence to a smaller summary increases the function value at least as much as adding the same sentence to a larger summary that contains the smaller one.
I implemented the whole paper and tried to improve the results by experimenting with the clustering techniques.
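
The diminishing-returns property can be illustrated with a toy example. This is a minimal sketch, not code from this repository: it assumes a simple word-coverage function (the number of distinct words covered by the chosen sentences), and the names `coverage`, `marginal_gain`, and the toy data are hypothetical.

```python
def coverage(summary, sentence_words):
    """Count the distinct words covered by the chosen sentences (a submodular function)."""
    covered = set()
    for idx in summary:
        covered |= sentence_words[idx]
    return len(covered)


def marginal_gain(summary, candidate, sentence_words):
    """Increase in coverage obtained by adding `candidate` to `summary`."""
    return coverage(summary | {candidate}, sentence_words) - coverage(summary, sentence_words)


# Hypothetical toy data: each sentence is represented by its set of words.
sentence_words = {
    0: {"flood", "river", "city"},
    1: {"river", "rescue", "teams"},
    2: {"city", "rescue", "shelter"},
}

small = {0}      # a small summary
large = {0, 1}   # a larger summary that contains the small one

# Adding sentence 2 helps the small summary more than the large one.
gain_small = marginal_gain(small, 2, sentence_words)
gain_large = marginal_gain(large, 2, sentence_words)
print(gain_small, gain_large)
```

Sentence 2 contributes two new words to the small summary but only one to the large summary, because "rescue" is already covered there.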

I use the standard DUC-2004 dataset, which is commonly used to test generic multi-document summarization.

The first approach follows the method described in the paper by Jeff Bilmes exactly.
The second approach uses spectral clustering, since I feel that the inherent distribution/shape that the sentences follow should not be disturbed by partition-based clustering.
The final approach uses another technique: we first find the most similar cluster of sentences and pick from it the sentence with the best coverage. We repeat this until we have extracted a sufficiently long summary.
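
The first approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this repository: it assumes a precomputed sentence similarity matrix `sim`, a given partition of sentence indices `clusters` (which could come from any clustering method, partition-based or spectral), an objective of the form coverage plus a weighted diversity reward, and a greedy selection by gain per byte. The function names and the parameters `alpha`, `lam`, and `r` are hypothetical, and the toy values are made up.

```python
import math


def coverage(selected, sim, alpha):
    """Saturated coverage: each sentence i is covered up to alpha times its total similarity."""
    total = 0.0
    for row in sim:
        covered = sum(row[j] for j in selected)
        total += min(covered, alpha * sum(row))
    return total


def diversity(selected, sim, clusters):
    """Diversity reward: the square root per cluster favors picking from new clusters."""
    n = len(sim)
    total = 0.0
    for cluster in clusters:
        reward = sum(sum(sim[i][j] for i in range(n)) / n
                     for j in cluster if j in selected)
        total += math.sqrt(reward)
    return total


def objective(selected, sim, clusters, alpha, lam):
    return coverage(selected, sim, alpha) + lam * diversity(selected, sim, clusters)


def greedy_summary(sentences, sim, clusters, budget=665, alpha=0.5, lam=1.0, r=1.0):
    """Greedily add the sentence with the best gain per (scaled) byte until nothing fits."""
    selected, used = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        base = objective(selected, sim, clusters, alpha, lam)
        best, best_ratio, best_cost = None, 0.0, 0
        for k in remaining:
            cost = max(len(sentences[k].encode("utf-8")), 1)
            if used + cost > budget:
                continue  # sentence does not fit in the remaining byte budget
            gain = objective(selected + [k], sim, clusters, alpha, lam) - base
            ratio = gain / (cost ** r)
            if ratio > best_ratio:  # only strictly positive gains are accepted
                best, best_ratio, best_cost = k, ratio, cost
        if best is None:
            break
        selected.append(best)
        used += best_cost
        remaining.discard(best)
    return [sentences[k] for k in selected]


# Hypothetical toy input: sentences 0 and 1 are near-duplicates, sentence 2 is different.
sentences = ["Floods hit the city.", "The city was flooded.", "Rescue teams arrived."]
sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.1],
       [0.1, 0.1, 1.0]]
clusters = [[0, 1], [2]]
summary = greedy_summary(sentences, sim, clusters, budget=45)
print(summary)
```

With a 45-byte budget only two sentences fit. The sketch picks one of the two near-duplicate sentences and then the sentence from the other cluster, since the saturated coverage term and the per-cluster square root both give little credit for redundant content.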

python .py

Many Python libraries and dependencies are needed, such as sklearn, joblib, multiprocessing, PyStemmer, etc.
CLUTO is used for clustering.
ROUGE is used to check the quality of the summaries.
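
To give a rough idea of what ROUGE measures, here is a simplified sketch of a ROUGE-1 recall computation (unigram overlap with a reference summary). It is an illustration only and is not the ROUGE toolkit: it assumes plain whitespace tokenization and lowercasing, with no stemming or stopword handling, and the function name `rouge1_recall` is hypothetical.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


# Hypothetical example: 4 of the 6 reference words appear in the candidate.
score = rouge1_recall("floods hit the city", "the city was hit by floods")
print(round(score, 3))
```

For actual evaluation, the scores should come from the ROUGE package itself so that they are comparable with published DUC-2004 results.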
