GitHub - sjtu-cs/web-mining

SJTU WebMining homework project

We hope everyone can take part in this project and help improve the quality of our homework.

This project was created to complete our web mining homework. It is based on dennybritz/cnn-text-classification-tf, which classifies movie reviews as positive or negative. We modified it to fit our homework, which has 5 labels (0, 1, 2, 3, 4).
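As an illustration of the move from 2 classes to 5, the sketch below turns an integer label into a one-hot vector of length 5. The names `NUM_CLASSES` and `one_hot` are hypothetical and chosen for this example only; the actual label handling in data_helpers.py may differ.

```python
NUM_CLASSES = 5  # labels 0, 1, 2, 3, 4, as described above


def one_hot(label, num_classes=NUM_CLASSES):
    """Return a one-hot list for an integer label (hypothetical helper)."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in the range 0..%d" % (num_classes - 1))
    vector = [0] * num_classes
    vector[label] = 1
    return vector


labels = [one_hot(y) for y in (0, 3, 4)]
print(labels[1])  # [0, 0, 0, 1, 0]
```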

The base project code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.

It is a slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in Tensorflow.
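The core idea of that architecture can be sketched in plain NumPy: slide filters of several widths over the embedded sentence, apply a ReLU, keep the maximum response of each filter over all positions, and concatenate the results. This is a rough sketch, not the code in text_cnn.py; the function name `conv_max_pool`, the random weights, the sentence length of 20, and the omission of bias terms are all assumptions made for illustration.

```python
import numpy as np


def conv_max_pool(embedded, filters):
    """Convolve, apply ReLU, then max-pool over time (hypothetical helper).

    embedded: array of shape (seq_len, embedding_dim)
    filters:  array of shape (filter_size, embedding_dim, num_filters)
    returns:  array of shape (num_filters,)
    """
    seq_len = embedded.shape[0]
    filter_size = filters.shape[0]
    if seq_len < filter_size:
        raise ValueError("sentence is shorter than the filter")
    # Every window of `filter_size` consecutive word vectors.
    windows = np.stack(
        [embedded[i:i + filter_size] for i in range(seq_len - filter_size + 1)]
    )  # shape: (num_windows, filter_size, embedding_dim)
    # Dot each window with each filter, then apply ReLU.
    feature_maps = np.maximum(
        np.tensordot(windows, filters, axes=([1, 2], [0, 1])), 0.0
    )  # shape: (num_windows, num_filters)
    # Max-over-time pooling: one value per filter.
    return feature_maps.max(axis=0)


rng = np.random.default_rng(0)
sentence = rng.normal(size=(20, 128))  # 20 tokens, embedding_dim = 128
pooled = [
    conv_max_pool(sentence, rng.normal(size=(k, 128, 128)))  # 128 filters each
    for k in (3, 4, 5)  # the default filter sizes
]
features = np.concatenate(pooled)
print(features.shape)  # (384,)
```

With the default settings, three filter sizes times 128 filters each give a 384-dimensional feature vector, which would then feed a final layer over the 5 labels.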

Requirements

Python 3
Tensorflow > 0.12
Numpy

Install tensorflow

Please see the installation guide on the TensorFlow website.

Training

Print parameters:

./train.py --help

optional arguments:
  -h, --help            show this help message and exit
  --data_file DATA_FILE
            File for training or evaluation (default: /data/train.txt for training, /data/dev.txt for evaluation)
  --embedding_dim EMBEDDING_DIM
            Dimensionality of character embedding (default: 128)
  --filter_sizes FILTER_SIZES
            Comma-separated filter sizes (default: '3,4,5')
  --num_filters NUM_FILTERS
            Number of filters per filter size (default: 128)
  --l2_reg_lambda L2_REG_LAMBDA
            L2 regularization lambda (default: 0.0)
  --dropout_keep_prob DROPOUT_KEEP_PROB
            Dropout keep probability (default: 0.5)
  --batch_size BATCH_SIZE
            Batch Size (default: 64)
  --num_epochs NUM_EPOCHS
            Number of training epochs (default: 100)
  --evaluate_every EVALUATE_EVERY
            Evaluate model on dev set after this many steps
            (default: 100)
  --checkpoint_every CHECKPOINT_EVERY
            Save model after this many steps (default: 100)
  --allow_soft_placement ALLOW_SOFT_PLACEMENT
            Allow soft device placement
  --noallow_soft_placement
  --log_device_placement LOG_DEVICE_PLACEMENT
            Log placement of ops on devices
  --nolog_device_placement
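To get a feel for how the batch, epoch, evaluation, and checkpoint defaults above interact, the following sketch estimates how many steps a run takes. It assumes one training step per batch and that a final partial batch still counts as a step; the function name `training_schedule` and the dataset size of 10,000 examples are hypothetical.

```python
import math


def training_schedule(num_examples, batch_size=64, num_epochs=100,
                      evaluate_every=100, checkpoint_every=100):
    """Estimate step counts from the default flag values (hypothetical helper)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "evaluations": total_steps // evaluate_every,
        "checkpoints": total_steps // checkpoint_every,
    }


schedule = training_schedule(num_examples=10_000)
print(schedule)
# {'steps_per_epoch': 157, 'total_steps': 15700, 'evaluations': 157, 'checkpoints': 157}
```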

Train:

./train.py

Train, including the kaggle dataset:

./train.py --data_file ./data/train-all.txt

Evaluating

./eval.py --checkpoint_dir="./runs/1459637919/checkpoints/"

Replace the checkpoint dir with the output from the training. To use your own data, change the eval.py script to load your data.
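If you do not want to copy the directory name by hand, a small helper along these lines could pick the most recent run. It assumes that each run is stored under ./runs in a directory named with a numeric timestamp, as the example path above suggests, and that each run contains a checkpoints subdirectory; the function name `latest_checkpoint_dir` is hypothetical.

```python
from pathlib import Path


def latest_checkpoint_dir(runs_root="./runs"):
    """Return the checkpoints directory of the newest run, or None.

    Assumes run directories are named with a numeric timestamp,
    e.g. ./runs/1459637919/checkpoints/ (hypothetical helper).
    """
    root = Path(runs_root)
    if not root.is_dir():
        return None
    run_dirs = [
        d for d in root.iterdir()
        if d.is_dir() and d.name.isdigit() and (d / "checkpoints").is_dir()
    ]
    if not run_dirs:
        return None
    # The largest timestamp is the most recent run.
    newest = max(run_dirs, key=lambda d: int(d.name))
    return newest / "checkpoints"


print(latest_checkpoint_dir())
```

The printed path can then be passed to eval.py through --checkpoint_dir.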
