A machine learning API to detect duplicate questions.
Here are the details of the files:
- Feature engineering - A set of features was extracted from the available dataset using various NLTK techniques and Google's word2vec word vectors pretrained on a news corpus.
- Model selection - Different classifiers were applied to the dataset, and the one with the highest accuracy was selected.
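
The two steps above can be illustrated with a minimal sketch. It is not the project's actual code: the three features (token overlap, length difference, embedding cosine), the regex tokenizer standing in for NLTK, the `toy_vectors` dict standing in for the pretrained word2vec vectors, and the `ThresholdClassifier` stand-in model are all assumptions chosen to keep the example self-contained. Any object with `fit` and `predict` methods could be passed as a candidate.

```python
import math
import re


def tokenize(question):
    """Lowercase and split into word tokens (a simple stand-in for NLTK tokenization)."""
    return re.findall(r"[a-z0-9]+", question.lower())


def mean_vector(tokens, vectors):
    """Average the embeddings of tokens present in `vectors`; None if none are present."""
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return None
    return [sum(dim) / len(found) for dim in zip(*found)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_features(q1, q2, vectors):
    """Return [Jaccard token overlap, token-count difference, embedding cosine]."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    union = s1 | s2
    jaccard = len(s1 & s2) / len(union) if union else 0.0
    v1, v2 = mean_vector(t1, vectors), mean_vector(t2, vectors)
    sim = cosine(v1, v2) if v1 is not None and v2 is not None else 0.0
    return [jaccard, abs(len(t1) - len(t2)), sim]


class ThresholdClassifier:
    """Hypothetical stand-in model: predicts duplicate (1) when one feature reaches a cutoff."""

    def __init__(self, feature_index, cutoff):
        self.feature_index = feature_index
        self.cutoff = cutoff

    def fit(self, X, y):
        return self  # nothing to learn here; a real classifier would train on X, y

    def predict(self, X):
        return [1 if row[self.feature_index] >= self.cutoff else 0 for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def select_most_accurate(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate and return (name, accuracy) of the best one on the validation set."""
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(y_val, model.predict(X_val))
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy usage with made-up vectors and feature rows (not real data).
toy_vectors = {"python": [1.0, 0.0], "java": [0.0, 1.0]}
feats = pair_features("How do I learn Python?", "How to learn Python", toy_vectors)

X = [[0.9, 0, 0.9], [0.1, 3, 0.2], [0.8, 1, 0.7], [0.2, 2, 0.1]]
y = [1, 0, 1, 0]
candidates = {
    "overlap_cutoff": ThresholdClassifier(feature_index=0, cutoff=0.5),
    "always_duplicate": ThresholdClassifier(feature_index=0, cutoff=0.0),
}
best_name, best_acc = select_most_accurate(candidates, X, y, X, y)
```

In practice the validation rows would be held out from training (here the same toy rows are reused only to keep the sketch short), and the candidates would be real classifiers rather than fixed thresholds.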