NLP Specific Projects

Introduction

This repository contains three projects focusing on Natural Language Processing (NLP):

Character Text Generation: Training a character language model to predict the next character in a sequence and generate new text sequences. (Nov-Dec 2023)
Text Pre-processing and Features: Exploring text pre-processing techniques and feature extraction methods, including Bag of Words (BoW), TF-IDF, and manual word embeddings. (Nov-Dec 2023)
POS Tagger: Part-of-speech (POS) tagging with a hidden Markov model (HMM), decoded using the Viterbi algorithm.

Installation

To use the projects in this repository, ensure you have Python installed along with Jupyter Notebooks or JupyterLab. Follow these steps:

# Clone my repo
git clone <repository-url>

# Go to the repository directory
cd NLP-projects

# Install necessary Python dependencies
pip install numpy matplotlib keras nltk scikit-learn

Additional Setup
For Text Pre-processing and Features, you may need to download additional NLTK data:

import nltk
nltk.download('punkt')

Usage

Character Text Generation: This notebook (CharacterTextGeneration.ipynb) guides you through training an LSTM model for character-based text generation using Keras. It includes:

Data preparation and preprocessing.
Model architecture and training.
Generating text with different strategies like temperature sampling and beam search.
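
As a rough illustration of temperature sampling, the sketch below rescales a next-character probability distribution before drawing from it. It is not taken from the notebook: the function names `apply_temperature` and `sample_next_char` and the toy three-character distribution are assumptions made for this example. In practice the probabilities would come from the trained model's softmax output.

```python
import numpy as np


def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution by a temperature (hypothetical helper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    probs = np.asarray(probs, dtype=np.float64)
    # Work in log space; the small constant avoids log(0).
    logits = np.log(probs + 1e-12) / temperature
    # Subtract the maximum for numerical stability before exponentiating.
    logits -= logits.max()
    scaled = np.exp(logits)
    return scaled / scaled.sum()


def sample_next_char(probs, chars, temperature=1.0, rng=None):
    """Draw one character from `chars` according to the rescaled distribution."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = apply_temperature(probs, temperature)
    index = rng.choice(len(scaled), p=scaled)
    return chars[int(index)]


# Toy distribution standing in for a model's softmax output.
chars = ["a", "b", "c"]
probs = [0.7, 0.2, 0.1]
rng = np.random.default_rng(0)

for temperature in (0.5, 1.0, 2.0):
    rescaled = apply_temperature(probs, temperature).round(3)
    print(temperature, rescaled, sample_next_char(probs, chars, temperature, rng))
```

Temperatures below 1 sharpen the distribution toward the most likely character, while temperatures above 1 flatten it and produce more varied text.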

Text Pre-processing and Features: This notebook (Lab_Preprocessing_and_features.ipynb) explores text pre-processing methods and feature extraction techniques, covering:

Bag of Words and TF-IDF representations.
Using Scikit-learn and NLTK for text processing.
Manual extraction of word embeddings through dimension reduction.

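POS Tagger: The viterbi-pos-tagger folder contains a part-of-speech tagger built on a hidden Markov model and decoded with the Viterbi algorithm.

For more on part-of-speech tagging with HMMs, see the following article: https://medium.com/analytics-vidhya/parts-of-speech-pos-and-viterbi-algorithm-3a5d54dfb346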
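
To make the idea concrete, here is a minimal sketch of Viterbi decoding for an HMM tagger. It is not the code from this repository: the function name `viterbi`, the three-tag set, the three-word vocabulary, and all probabilities are invented for illustration. A real tagger would estimate these tables from a tagged corpus.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely state index sequence for observation indices `obs`.

    start_p[i]    = P(first state is i)
    trans_p[i, j] = P(next state is j | current state is i)
    emit_p[i, k]  = P(observation k | state i)
    """
    if len(obs) == 0:
        return []
    # Log space avoids underflow on long sequences; log(0) becomes -inf.
    with np.errstate(divide="ignore"):
        log_start = np.log(np.asarray(start_p, dtype=float))
        log_trans = np.log(np.asarray(trans_p, dtype=float))
        log_emit = np.log(np.asarray(emit_p, dtype=float))

    n_steps, n_states = len(obs), len(log_start)
    score = np.full((n_steps, n_states), -np.inf)
    back = np.zeros((n_steps, n_states), dtype=int)

    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]

    # Follow back-pointers from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Toy model: all values below are made up for this example.
tags = ["DET", "NOUN", "VERB"]
words = ["the", "dog", "barks"]
start_p = [0.8, 0.1, 0.1]
trans_p = [[0.05, 0.90, 0.05],
           [0.10, 0.20, 0.70],
           [0.50, 0.30, 0.20]]
emit_p = [[0.90, 0.05, 0.05],
          [0.05, 0.70, 0.25],
          [0.05, 0.15, 0.80]]

sentence = ["the", "dog", "barks"]
obs = [words.index(w) for w in sentence]
predicted = [tags[i] for i in viterbi(obs, start_p, trans_p, emit_p)]
print(list(zip(sentence, predicted)))
```

The dynamic program keeps only the best-scoring path into each tag at each position, so decoding costs time proportional to the sentence length times the square of the number of tags, rather than enumerating every tag sequence.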

Requirements

Python 3.x
Jupyter Notebook or JupyterLab
numpy
matplotlib
keras
nltk
scikit-learn
