(Note: You can read an in-depth tutorial about the implementation in this blog post.)
This is an implementation of an image captioning model based on Vinyals et al., with a few differences:

- For the CNN we use Inception v3 instead of Inception v1.
- For the RNN we use a multi-layered LSTM instead of a single-layered one.
- We don't have a special start-of-sentence word, so we feed the first word at t = 1 instead of t = 2.
- We use different values for some hyperparameters (an illustrative sketch of the resulting architecture follows the table):
Hyperparameter | Value |
---|---|
Learning rate | 0.00051 |
Batch size | 32 |
Epochs | 33 |
Dropout rate | 0.22 |
Embedding size | 300 |
LSTM output size | 300 |
LSTM layers | 3 |
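For illustration, here is a minimal Keras-style sketch of the architecture described above. This is not the repository's actual code: `VOCAB_SIZE` and the exact wiring are assumptions; only the hyperparameter values come from the table.

```python
# A minimal sketch of the architecture above -- NOT the repository's code.
# VOCAB_SIZE is an assumption; the real value depends on the dataset.
from keras.applications.inception_v3 import InceptionV3
from keras.layers import (Dense, Dropout, Embedding, Input, LSTM,
                          RepeatVector, TimeDistributed, concatenate)
from keras.models import Model
from keras.optimizers import Adam

VOCAB_SIZE = 8000        # assumption
EMBEDDING_SIZE = 300     # from the table above
LSTM_SIZE = 300
LSTM_LAYERS = 3
DROPOUT_RATE = 0.22
LEARNING_RATE = 0.00051

# CNN encoder: Inception v3 (instead of Inception v1), pooled to one vector.
image_input = Input(shape=(299, 299, 3))
cnn = InceptionV3(include_top=False, pooling='avg', weights='imagenet')
image_embedding = Dense(EMBEDDING_SIZE)(cnn(image_input))

# Word embeddings for the caption tokens.
caption_input = Input(shape=(None,), dtype='int32')
word_embeddings = Embedding(VOCAB_SIZE, EMBEDDING_SIZE)(caption_input)

# The image acts as step t = 0 and the first word follows at t = 1,
# so no special start-of-sentence token is needed.
sequence = concatenate([RepeatVector(1)(image_embedding), word_embeddings],
                       axis=1)

# RNN decoder: a multi-layered LSTM instead of a single-layered one.
x = sequence
for _ in range(LSTM_LAYERS):
    x = LSTM(LSTM_SIZE, return_sequences=True)(x)
    x = Dropout(DROPOUT_RATE)(x)

# Per-timestep softmax over the vocabulary.
outputs = TimeDistributed(Dense(VOCAB_SIZE, activation='softmax'))(x)

model = Model(inputs=[image_input, caption_input], outputs=outputs)
model.compile(optimizer=Adam(lr=LEARNING_RATE),
              loss='categorical_crossentropy')
```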
Quantitatively, the proposed model's performance is on par with Vinyals' model on the Flickr8k dataset:
Metric | Proposed Model | Vinyals' Model |
---|---|---|
BLEU-1 | 61.8 | 63 |
BLEU-2 | 40.8 | 41 |
BLEU-3 | 27.8 | 27 |
BLEU-4 | 19.0 | N/A |
METEOR | 21.5 | N/A |
CIDEr | 41.5 | N/A |
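As a point of reference, BLEU-n scores like those in the table can be computed with NLTK (the repository itself downloads pycocoevalcap data for evaluation; see the setup steps below). This toy example, with made-up sentences, shows the 0-100 scaling used in the table:

```python
# Illustrative BLEU-1..4 computation with NLTK on toy data -- not the
# repository's evaluation pipeline (which uses pycocoevalcap).
from nltk.translate.bleu_score import corpus_bleu

# One image, two reference captions, one hypothesis (all made up).
references = [[['a', 'dog', 'runs', 'on', 'the', 'grass'],
               ['a', 'dog', 'is', 'running', 'outside']]]
hypotheses = [['a', 'dog', 'runs', 'on', 'grass']]

for n in range(1, 5):
    weights = tuple([1.0 / n] * n)  # uniform n-gram weights for BLEU-n
    score = corpus_bleu(references, hypotheses, weights=weights)
    print('BLEU-%d: %.1f' % (n, 100 * score))  # table uses a 0-100 scale
```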
- Download the dataset needed.

  ```sh
  ./scripts/download_dataset.sh
  ```

- Download pretrained word vectors.

  ```sh
  ./scripts/download_pretrained_word_vectors.sh
  ```

- Download pycocoevalcap data.

  ```sh
  ./scripts/download_pycocoevalcap_data.sh
  ```

- Install the dependencies. Note: It was only tested on Python 2.7; it may need minor code changes to work on Python 3.

  ```sh
  # Optional: Create and activate your virtualenv / Conda environment
  pip install -r requirements.txt
  ```
- Set up `PYTHONPATH`.

  ```sh
  source ./scripts/setup_pythonpath.sh
  ```
- Download a pretrained model from the releases page.

- Copy `model-weights.hdf5` to `keras-image-captioning/results/flickr8k/final-model`.

- Now you can run inference from that checkpoint by executing the command below from the `keras-image-captioning` directory:

  ```sh
  python -m keras_image_captioning.inference \
    --dataset-type test \
    --method beam_search \
    --beam-size 3 \
    --training-dir results/flickr8k/final-model
  ```
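Conceptually, `--method beam_search --beam-size 3` keeps the three most probable partial captions at each decoding step instead of greedily taking the single best word. Here is a generic sketch of the idea, not the repository's implementation; `next_word_log_probs`, `start_token`, and `end_token` are hypothetical stand-ins:

```python
# A generic beam search sketch -- not the repository's implementation.
# `next_word_log_probs` is a hypothetical callable that maps a partial
# caption to (token, log-probability) pairs for the next word.
import heapq

def beam_search(next_word_log_probs, start_token, end_token,
                beam_size=3, max_length=20):
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, [start_token])]
    for _ in range(max_length):
        candidates = []
        for log_prob, seq in beams:
            if seq[-1] == end_token:       # finished captions pass through
                candidates.append((log_prob, seq))
                continue
            for token, token_lp in next_word_log_probs(seq):
                candidates.append((log_prob + token_lp, seq + [token]))
        # Keep only the `beam_size` highest-scoring (partial) captions.
        beams = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        if all(seq[-1] == end_token for _, seq in beams):
            break                          # every beam has finished
    return max(beams, key=lambda b: b[0])[1]
```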
To reproduce the model, execute:

```sh
python -m keras_image_captioning.training \
  --training-label repro-final-model \
  --from-training-dir results/flickr8k/final-model
```

There are many other arguments available; see `training.py` for the full list.
Once training finishes, you can run inference on the reproduced model:

```sh
python -m keras_image_captioning.inference \
  --dataset-type test \
  --method beam_search \
  --beam-size 3 \
  --training-dir var/flickr8k/training-results/repro-final-model
```
Notes:

- `--dataset-type` can be either `validation` or `test`.
- You can view the generated captions at `var/flickr8k/training-results/repro-final-model/test-predictions-3-20.yaml` and compare them with my results at `results/flickr8k/final-model/test-predictions-3-20.yaml`.
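If you want to inspect the predictions programmatically, something like the following should work, assuming PyYAML is installed. The exact layout of the YAML file is repository-specific, so this snippet only loads it and peeks at a few entries:

```python
# Hypothetical snippet for peeking at the generated captions.
# The YAML layout is repository-specific; we only load and print a sample.
import yaml

path = ('var/flickr8k/training-results/repro-final-model/'
        'test-predictions-3-20.yaml')
with open(path) as f:
    predictions = yaml.safe_load(f)

for entry in list(predictions)[:3]:  # first few items (or keys, if a mapping)
    print(entry)
```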
MIT License. See LICENSE file for details.