Based on sentdex's tutorial at https://www.youtube.com/watch?v=6rDWwL6irG0
Contains code for DNN, LSTM (RNN) and CNN.
NOTES:
- LSTM (RNN) code is buggy.
- CNN only tested with MNIST dataset.
Required dependencies:
- Anaconda Python distribution, which provides:
  - numpy
  - scipy
  - pandas
  - matplotlib
- NLTK
- tensorflow (ideally with GPU support)
- tqdm (`pip install tqdm`)
Make sure you have the following folder structure (create folders if needed):
<project_dir>
|
|--- preprocessing
|
|--- large_data
| |
| |--- data
| |--- saved
| |--- temp
|
|--- small_data
|
|--- data
|--- saved
Only the small dataset is included in the repo.
Download the large dataset (see the bottom of this README for the link) and place it at
`preprocessing/large_data/data/training.1600000.processed.noemoticon.csv`
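The large CSV follows the Sentiment140 layout: six unnamed columns (polarity, id, date, query, user, text) with no header row. A hedged loading sketch with pandas (the column names below are the conventional Sentiment140 ones, not something defined by this repo; the fake in-memory rows stand in for the real file, which additionally usually needs `encoding="latin-1"`):

```python
import pandas as pd
from io import StringIO

# Two fake rows in the Sentiment140 layout (the real file has no header row).
fake_csv = StringIO(
    '"0","1","Mon Apr 06","NO_QUERY","user1","sad tweet"\n'
    '"4","2","Mon Apr 06","NO_QUERY","user2","happy tweet"\n'
)

# Illustrative column names; pass encoding="latin-1" for the real CSV.
cols = ["polarity", "id", "date", "query", "user", "text"]
df = pd.read_csv(fake_csv, names=cols)
```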
`python3 preprocessing/create_sentiment_feature_sets_small.py`
This takes the data in `preprocessing/small_data/data` as input and saves its output to `preprocessing/small_data/saved`.
`python3 preprocessing/create_sentiment_feature_sets_large.py`
This takes the data in `preprocessing/large_data/data` as input and saves its output to `preprocessing/large_data/saved`. Temporary files are stored in `preprocessing/large_data/temp`.
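The exact contents of these scripts aren't reproduced here, but feature-set creation in this tutorial follows the usual bag-of-words recipe: build a lexicon from the corpus, then map each sample to a vector of word counts. A minimal sketch of that idea (function names and parameters are illustrative, not the scripts' actual API; the real scripts use NLTK's tokenizer and `WordNetLemmatizer`, while this sketch uses a plain lower-cased split to stay dependency-free):

```python
from collections import Counter

def build_lexicon(samples, min_count=1, max_count=1000):
    # Count every token in the corpus and keep the words whose frequency
    # falls in [min_count, max_count], dropping ultra-rare/ultra-common words.
    counts = Counter(w for s in samples for w in s.lower().split())
    return sorted(w for w, c in counts.items() if min_count <= c <= max_count)

def to_feature_vector(sample, lexicon):
    # Bag of words: one count per lexicon entry, in lexicon order.
    words = sample.lower().split()
    return [words.count(w) for w in lexicon]

samples = ["good movie", "bad movie", "good good plot"]
lexicon = build_lexicon(samples)
vec = to_feature_vector("good good movie", lexicon)
```

Each row of the resulting feature matrix is mostly zeros, which is why the sparse-matrix storage described below pays off.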
`python3 tf_nn.py`
Use the following options in the `__main__` function to specify behaviour:
SMALL_DATA = False
USE_MNIST = False
# TODO: LSTM (RNN) model is buggy (chunking part during training)
network_model = MODELS[0] # (0 - DNN, 1 - LSTM (RNN), 2 - CNN)
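`MODELS` is presumably a list of model-building callables, so indexing it selects which architecture gets trained. A hypothetical sketch of that dispatch pattern (the stub builders below are stand-ins, not the real constructors in `tf_nn.py`):

```python
# Stand-in builders; in tf_nn.py these construct the actual TensorFlow graphs.
def build_dnn():
    return "dnn-graph"

def build_lstm():
    return "lstm-graph"

def build_cnn():
    return "cnn-graph"

MODELS = [build_dnn, build_lstm, build_cnn]  # 0 - DNN, 1 - LSTM (RNN), 2 - CNN

SMALL_DATA = False
USE_MNIST = False
network_model = MODELS[0]  # pick the architecture by index

graph = network_model()
```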
The original tutorial stores large sparse matrices in regular Python lists and serializes them either with the `pickle` module or as CSV files, which produces very large files: in one case, a generated CSV file is 19.6 GB. By converting these lists to scipy sparse matrices and serializing them as zipped numpy arrays, the size is reduced dramatically; in that same case, from 19.6 GB to 26.7 MB. Using sparse matrices also means all of the data fits in RAM!
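The saving comes from storing only the nonzero entries and compressing the result. A small round-trip sketch of the technique using `scipy.sparse.save_npz` (the array contents and file name are made up for illustration):

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# A mostly-zero bag-of-words matrix stored as a dense list of lists.
dense_rows = [[0, 0, 3, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

# CSR format keeps only the nonzero entries plus index arrays.
mat = sparse.csr_matrix(np.array(dense_rows))

# save_npz writes a compressed .npz archive (zipped numpy arrays).
path = os.path.join(tempfile.mkdtemp(), "features.npz")
sparse.save_npz(path, mat)
loaded = sparse.load_npz(path)
```

For real bag-of-words matrices, where well over 99% of entries are zero, the on-disk saving is in the same spirit as the 19.6 GB to 26.7 MB reduction mentioned above.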
Added progress bars (using `tqdm`) for:
- generating the bag of words (word vectors) for the large dataset
- training epochs
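Adding such a bar is a one-line change: `tqdm` wraps any iterable and renders progress on stderr as the loop advances. A minimal usage sketch (the loop body is a placeholder for real per-sample or per-epoch work):

```python
from tqdm import tqdm

total = 0
# Wrapping the iterable is all that is needed; `desc` labels the bar.
for i in tqdm(range(1000), desc="epochs"):
    total += i  # placeholder for the actual training work
```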
Download links:
- small dataset (`pos.txt` and `neg.txt`): https://pythonprogramming.net/static/downloads/machine-learning-data/
- large dataset: http://help.sentiment140.com/for-students