The goal of this project is to help users detect plagiarism quickly and effectively. ‘Plagiarisation Detector’ checks two documents for plagiarism and also reports word, line and paragraph counts and spelling errors, all through a robust, user-friendly GUI. String matching is done with the Knuth-Morris-Pratt (KMP) algorithm, which has the following time complexity:
- Preprocessing time (building the prefix table for the pattern): O(m)
- Matching time (scanning the main string): O(n)
Here m is the length of the pattern string and n is the length of the main string, so the total time complexity of KMP is O(m + n), which is linear.
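A minimal Python sketch of KMP follows. It assumes a Python code base (the project's use of TextBlob suggests Python, but its matching code is not shown), and the names `prefix_table` and `kmp_search` are hypothetical. The functions accept any sequence, so the same code can match characters in a string or whole words in a list of tokens.

```python
from collections.abc import Sequence


def prefix_table(pattern: Sequence) -> list[int]:
    """KMP failure function: table[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it. Runs in O(m)."""
    table = [0] * len(pattern)
    k = 0  # length of the prefix matched so far
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: Sequence, pattern: Sequence) -> list[int]:
    """Return every start index where pattern occurs in text. Runs in O(n)."""
    if not pattern:
        return []
    table = prefix_table(pattern)
    matches = []
    k = 0  # number of pattern items currently matched
    for i, item in enumerate(text):
        while k > 0 and item != pattern[k]:
            k = table[k - 1]  # reuse the already-matched prefix instead of restarting
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # continue so overlapping matches are also found
    return matches


print(kmp_search("abababca", "abab"))  # [0, 2]
print(kmp_search("the cat sat on the mat".split(), ["on", "the", "mat"]))  # [3]
```

`prefix_table` is the O(m) preprocessing step and the single pass over `text` is the O(n) matching step: `k` grows by at most one per text item, so the fallback loop runs at most n times in total.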
Plagiarism detection:
- Input document: The input document, which can be any journal article or paper, is provided for checking.
- Keyword extraction: Keywords are extracted from the document so that it can be compared with the other document.
- Algorithm: The KMP algorithm is used to search the document for matching (suspicious) text.
- Discovery of similarity: The matches found by the algorithm show which parts of the two documents are similar.
- Plagiarism percentage: The plagiarism percentage is calculated as
  2 × (number of similar words) × 100 / (parsed words in doc1 + parsed words in doc2)
  which has the form of the Sørensen-Dice coefficient, expressed as a percentage.
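The percentage formula can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code: `parse_words` and `plagiarism_percent` are hypothetical names, every lower-cased word token is treated as a keyword (the document does not say how keywords are chosen), and "similar words" is taken to be the multiset intersection of the two word lists, counted with `collections.Counter`, rather than the output of the KMP step.

```python
import re
from collections import Counter


def parse_words(text: str) -> list[str]:
    """Hypothetical stand-in for keyword extraction: lower-cased word tokens."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def plagiarism_percent(doc1: str, doc2: str) -> float:
    """2 * similar * 100 / (words in doc1 + words in doc2)."""
    words1, words2 = parse_words(doc1), parse_words(doc2)
    total = len(words1) + len(words2)
    if total == 0:
        return 0.0  # two empty documents: nothing to compare
    # Assumption: a word shared by both documents counts as many times as it
    # appears in the document that uses it less often (Counter intersection).
    similar = sum((Counter(words1) & Counter(words2)).values())
    return 2 * similar * 100 / total


print(round(plagiarism_percent("The cat sat on the mat", "The cat lay on a mat"), 2))  # 66.67
```

Counting the intersection, rather than every word of doc1 that appears somewhere in doc2, keeps the result between 0 and 100: with the latter, "a a a a" against "a" would score 160%.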
Document analysis:
- Input document: The document to be analysed is provided.
- Preprocessing: The document is broken down into paragraphs, sentences and words.
- Discovery of spelling errors: The TextBlob library is used to find misspelled words.
- Output: The word, paragraph and sentence counts are displayed.
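The analysis steps can be sketched in Python as follows. The counting rules (blank lines separate paragraphs, sentences end in '.', '!' or '?') are assumptions, and `count_units` and `find_misspellings` are hypothetical names. Spell checking uses TextBlob's `Word.spellcheck()`, which returns candidate corrections as (word, confidence) pairs; here a word is flagged when its top candidate differs from the word itself.

```python
import re

# Assumed word rule: letters/digits, optionally with one apostrophe suffix ("don't").
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def count_units(text: str) -> dict[str, int]:
    """Count paragraphs, sentences and words with simple regex rules."""
    # Assumption: paragraphs are separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    sentences = [
        s
        for p in paragraphs
        for s in re.split(r"(?<=[.!?])\s+", p.strip())
        if s
    ]
    words = WORD_RE.findall(text)
    return {"paragraphs": len(paragraphs), "sentences": len(sentences), "words": len(words)}


def find_misspellings(words: list[str]) -> list[tuple[str, str]]:
    """Return (word, suggestion) pairs for words TextBlob would correct."""
    # Imported here so count_units() works even without TextBlob installed.
    from textblob import Word

    errors = []
    for w in words:
        lower = w.lower()
        best, _confidence = Word(lower).spellcheck()[0]
        if best != lower:
            errors.append((w, best))
    return errors


sample = "This is the first paragraph. It has two sentences.\n\nSecond paragraph here!"
print(count_units(sample))  # {'paragraphs': 2, 'sentences': 3, 'words': 12}
```

Regex sentence splitting miscounts abbreviations such as "Dr. Smith"; TextBlob's own `TextBlob(text).sentences` is an alternative when its corpora are downloaded. Proper nouns and technical terms may also be flagged as misspellings.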
I would like to thank the collaborators.