This project demonstrates:
- the ability to write structured code in Python
- the ability to use existing libraries for data processing
- data preprocessing skills
- text preprocessing:
  - lemmatization
  - working with regular expressions
  - converting text to TF-IDF features
- using Machine Learning models
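
The regular-expression step listed above could look roughly like the sketch below. It is a minimal illustration and not the project's actual code: the function name `clean_text` and the choice of which characters to keep are assumptions. Lemmatization (for example with `nltk`) would normally follow this step and is left out here because it needs downloaded language data.

```python
import re

# Assumption: keep only lowercase Latin letters, apostrophes and spaces.
_NON_LETTERS = re.compile(r"[^a-z' ]+")
_SPACES = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase a comment, strip non-letter characters, collapse whitespace."""
    text = text.lower()
    # Replace every run of disallowed characters (digits, punctuation,
    # newlines) with a single space.
    text = _NON_LETTERS.sub(" ", text)
    # Collapse repeated spaces and trim the ends.
    return _SPACES.sub(" ", text).strip()


cleaned = clean_text("Hello!!! You're   GREAT :)")
```

The cleaned strings are what would then be passed to a lemmatizer and a TF-IDF vectorizer.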
The project includes:
- Working with NLP:
  - preprocessing and transforming text for Machine Learning models
- Working with Machine Learning models:
  - Logistic Regression
  - Decision Tree
  - Random Forest
  - XGBoost
  - LightGBM
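
As a rough sketch of how TF-IDF features feed one of these models, the example below chains a vectorizer and Logistic Regression using scikit-learn. Several things here are assumptions rather than the project's setup: the use of scikit-learn itself, the toy comments (invented for illustration), and the hyperparameters. The score is computed on the training data only to show the API; a real evaluation needs a held-out test set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy, made-up comments; the real project data is not reproduced here.
texts = [
    "thanks for the helpful edit",
    "great clarification, much clearer now",
    "you are an idiot and your edit is garbage",
    "this is stupid, stop ruining the page",
    "nice work on the product description",
    "what a worthless, idiotic change",
]
labels = [0, 0, 1, 1, 0, 1]  # 1 = toxic, 0 = not toxic

# TF-IDF turns each comment into a sparse numeric vector;
# Logistic Regression then learns a linear decision boundary on it.
model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(texts, labels)

predictions = model.predict(texts)
train_f1 = f1_score(labels, predictions)  # training-set F1, illustration only
```

The tree-based and boosting models in the list can be swapped in as the final pipeline step in the same way.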
Provided data: comments on edits, labeled for toxicity.
Column | Description | Column type |
---|---|---|
text | Comment | features |
toxic | Indicator of whether the comment is toxic | target |
An online store is launching a new service and needs a tool that detects toxic comments and flags them for editing or review. The task is to build a model that classifies comments as positive or negative.
The customer is concerned about:
- The F1 score must be at least 0.75.
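
For reference, F1 is the harmonic mean of precision and recall, so the threshold corresponds to 0.75 on the usual 0 to 1 scale (75%). The sketch below computes it by hand for binary labels; the function name `f1_binary` and the sample labels are made up for illustration.

```python
def f1_binary(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive (toxic) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

score = f1_binary(y_true, y_pred)
meets_requirement = score >= 0.75
```

In this example precision and recall are both 3/4, so the score sits exactly at the required threshold.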
Libraries used: pandas, numpy, matplotlib, seaborn, scipy, nltk, xgboost, lightgbm, time, re