The goal was to predict whether the two questions in a pair are duplicates of each other (similar) or not.
The major focus was on feature engineering to estimate how similar the two questions are.
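
As an illustration of what pairwise feature engineering can look like, here is a minimal sketch that computes a few token-overlap features for one question pair. The function names (`tokenize`, `pair_features`) and the particular features are assumptions chosen for illustration, not necessarily the features used in this project.

```python
# Hypothetical helpers: names and feature choices are illustrative assumptions.

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def pair_features(q1, q2):
    """Return simple similarity features for a pair of questions."""
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    common = s1 & s2
    union = s1 | s2
    return {
        # Number of distinct words that appear in both questions.
        "common_words": len(common),
        # Jaccard similarity: shared words over all distinct words.
        "jaccard": len(common) / len(union) if union else 0.0,
        # Shared words relative to the combined vocabulary sizes.
        "word_share": len(common) / (len(s1) + len(s2)) if union else 0.0,
        # Absolute difference in token counts.
        "len_diff": abs(len(t1) - len(t2)),
        # 1 if both questions start with the same word, else 0.
        "same_first_word": int(bool(t1) and bool(t2) and t1[0] == t2[0]),
    }


features = pair_features("How do I learn Python",
                         "How can I learn Python quickly")
print(features)
```

Features like these would typically be computed for every pair and combined with the vectorized text before training a classifier.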
The performance metric was log-loss, which needed to be reduced using various machine learning algorithms together with bag-of-words (BOW) and TF-IDF vectorizers.
Used Logistic Regression, Linear SVM, and XGBoost for modelling.
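
For reference, the binary log-loss used as the performance metric can be computed as in the sketch below. This is a plain-Python version written for illustration. The function name `log_loss`, the clipping constant `eps`, and the sample labels and probabilities are assumptions, not values from this project.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Made-up example: 1 = duplicate pair, 0 = non-duplicate pair.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # predictions that lean the right way
uninformed = [0.5, 0.5, 0.5, 0.5]  # always predicts 50 percent

print(log_loss(labels, confident))
print(log_loss(labels, uninformed))
```

A model that always predicts 0.5 scores ln 2 (about 0.693), so any useful model should score below that value. Lower is better.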