Show and Tell: A Neural Image Caption Generator

Brief

Pull requests and issues: @litleCarl

A Core ML implementation of the image-to-text model described in the paper:

"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge."

Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan.

IEEE Transactions on Pattern Analysis and Machine Intelligence (2016).

Full text available at: http://arxiv.org/abs/1609.06647

Demo

Usage

Simple use

let showAndTell = ShowAndTell()
let results = showAndTell.predict(image: uiimage2predict, beamSize: 3, maxWordNumber: 30)

// Parameter explanation
//    image:         The image to generate a caption for.
//    beamSize:      Maximum number of candidate captions kept at each step of the beam search (greatly affects performance).
//    maxWordNumber: Maximum number of words in a predicted sentence.
class ShowAndTell {
  ...
  func predict(image: UIImage, beamSize: Int = 3, maxWordNumber: Int = 20) -> PriorityQueue<Caption>
  ...
}
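The beamSize parameter controls a beam search over candidate captions. Below is a minimal, generic sketch of that technique in Swift, not the code inside ShowAndTell: `Hypothesis`, `beamSearch` and the toy `nextWordLogProbs` model are hypothetical stand-ins for the real Core ML model, and integer word ids replace actual vocabulary words. At each step every kept partial caption is extended by every word, only the beamSize highest-scoring candidates survive, and a caption that emits the end token is set aside as finished.

```swift
import Foundation

// One partial or finished caption: word ids so far and their summed log-probability.
// (Hypothetical type for illustration; ShowAndTell uses its own Caption type.)
struct Hypothesis {
    var words: [Int]
    var logProb: Double
}

// Generic beam search. `nextWordLogProbs` is a hypothetical stand-in for the
// captioning model: given the words so far, it returns one log-probability per vocabulary id.
func beamSearch(startToken: Int,
                endToken: Int,
                beamSize: Int,
                maxWordNumber: Int,
                nextWordLogProbs: ([Int]) -> [Double]) -> [Hypothesis] {
    var beam = [Hypothesis(words: [startToken], logProb: 0)]
    var finished: [Hypothesis] = []

    for _ in 0..<maxWordNumber {
        // Extend every kept caption by every possible next word.
        var candidates: [Hypothesis] = []
        for hyp in beam {
            for (word, lp) in nextWordLogProbs(hyp.words).enumerated() {
                candidates.append(Hypothesis(words: hyp.words + [word],
                                             logProb: hyp.logProb + lp))
            }
        }
        // Keep only the beamSize highest-scoring candidates.
        candidates.sort { $0.logProb > $1.logProb }
        beam = []
        for cand in candidates.prefix(beamSize) {
            if cand.words.last == endToken {
                finished.append(cand)   // complete caption: stop extending it
            } else {
                beam.append(cand)
            }
        }
        if beam.isEmpty { break }
    }
    // Captions cut off at maxWordNumber are returned too, best first.
    finished.append(contentsOf: beam)
    return finished.sorted { $0.logProb > $1.logProb }
}

// Toy model over the vocabulary {0: <start>, 1: "cat", 2: <end>}:
// after <start> it prefers "cat", afterwards it prefers <end>.
let toyModel: ([Int]) -> [Double] = { words in
    words.count == 1
        ? [log(0.0001), log(0.7), log(0.2999)]
        : [log(0.0001), log(0.1), log(0.8999)]
}

let results = beamSearch(startToken: 0, endToken: 2, beamSize: 2,
                         maxWordNumber: 5, nextWordLogProbs: toyModel)
print(results.first?.words ?? [])  // [0, 1, 2]
```

Because each step runs the model once per kept caption, the work grows with both beamSize and maxWordNumber; with beamSize = 1 the search reduces to greedy decoding.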

Benchmark (tested on iPhone 7 Plus; PRs with results for more devices are welcome)

maxWordNumber = 20

beamSize	Time (ms)
1	480.12
2	845.78
3	1443.82
4	2001.30
5	2648.48
6	3158.53
7	4179.14
8	4861.66
9	6003.65
10	7087.97
11	8134.95
12	9627.79

maxWordNumber = 30

beamSize	Time (ms)
1	451.12
2	1194.65
3	1965.27
4	2971.92
5	3798.28
6	4391.35
7	5714.87
8	6937.60
9	8482.03
10	10421.52
11	12460.80
12	13777.67

Figure: line chart of time vs. beamSize (maxWordNumber = 30)

It is therefore recommended to set beamSize = 1 on mobile devices: it uses the least GPU/CPU time and so saves battery life.
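For contributors adding numbers for other devices, a small timing helper keeps measurements comparable. This is a minimal sketch under assumptions: `measureMeanMilliseconds` is a hypothetical helper, not part of ShowAndTell, and the untimed warm-up call is a design choice of this sketch rather than how the figures above were measured.

```swift
import Dispatch

// Hypothetical benchmarking helper (not part of ShowAndTell): runs `body` once
// untimed as a warm-up, then `runs` more times, and returns the mean wall-clock
// time in milliseconds together with the last result.
func measureMeanMilliseconds<T>(runs: Int = 10, _ body: () -> T) -> (meanMs: Double, lastResult: T) {
    precondition(runs > 0, "runs must be positive")
    var result = body()  // warm-up: first calls are often slower (lazy initialization, caches)
    var totalNanos: UInt64 = 0
    for _ in 0..<runs {
        let start = DispatchTime.now().uptimeNanoseconds
        result = body()
        totalNanos += DispatchTime.now().uptimeNanoseconds - start
    }
    return (Double(totalNanos) / Double(runs) / 1_000_000, result)
}

// Example with a dummy workload; on a device, time the caption prediction instead.
let (ms, sum) = measureMeanMilliseconds(runs: 5) { (1...100_000).reduce(0, +) }
print("sum = \(sum), mean = \(ms) ms")
```

On a device you might wrap the call from Simple use, e.g. `measureMeanMilliseconds(runs: 5) { showAndTell.predict(image: image, beamSize: beamSize, maxWordNumber: 30) }`, once per beamSize value.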

Original Model

This Core ML model was exported from a Keras model trained on the MSCOCO dataset for about 40k epochs. It is not yet state of the art, so you probably should not use it in production. I trained it on a single GeForce GTX 1080 Ti for about 48 hours and currently don't have time to train it further; I hope the community will continue the work.

Requirements

iOS 11.0+
Xcode 9.0+ (Swift 4.x)

TODO

Train on the dataset to 100k epochs (currently 40k).
Open-source the original Keras model used for training.
Support more languages (Chinese).

Thanks to the third-party libraries used in the demo

Contact

曹佳鑫 (tsao), an iOS developer with experience in deep learning, living in Shanghai.
Pull requests and issues are welcome.
Mail: [email protected]

License

ShowAndTell is available under the MIT license. See the LICENSE file for more info.
