VGG16 class probabilities and audio forced alignments for the Flickr8k dataset
Generated by Mark Hasegawa-Johnson, 3/22/2017, using Davi
Frossard's code and data from M. Hodosh, P.
Young and J. Hockenmaier.
- vgg_flickr8k.tgz contains
the neural network outputs (posterior class
probabilities) and a simple convenience program,
showprobs.py, to help you browse them.
contains the neural net "penults", that is, the activations
of the layer just below the softmax. These are intended to
be the same features used by Harwath
and Glass in their ASRU 2015
paper. images_40k.npz contains the
same data in npz format.
forced alignments to the audio, as computed by Markus
Mueller for WS17.
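Because this README does not list the array names stored in images_40k.npz, a quick way to see what it contains is to print each array's name, shape and dtype. The following is a minimal sketch using only standard NumPy calls; `summarize_npz` is a hypothetical helper (not part of the distribution), and the demo archive it builds is synthetic, not the real feature file.

```python
import numpy as np

def summarize_npz(path):
    """Return {array_name: (shape, dtype_name)} for every array in an .npz archive."""
    # np.load on an .npz file returns a lazy NpzFile; the with-block closes it.
    with np.load(path) as archive:
        return {name: (archive[name].shape, str(archive[name].dtype))
                for name in archive.files}

if __name__ == "__main__":
    import os
    import tempfile
    # Synthetic demo archive: the name "features" and the 3x8 size are
    # hypothetical stand-ins, NOT the contents of images_40k.npz.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.npz")
    np.savez(demo_path, features=np.zeros((3, 8), dtype=np.float32))
    print(summarize_npz(demo_path))
```

On the real file you would call `summarize_npz("images_40k.npz")` and then index the loaded archive with whichever names it reports.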
- License: Creative Commons Attribution-ShareAlike license.
is part of the Flickr8k dataset, publicly distributed under a
Creative Commons Attribution-ShareAlike license by M. Hodosh,
P. Young and J. Hockenmaier (2013), "Framing
Image Description as a Ranking Task: Data, Models and
Evaluation Metrics," Journal of Artificial Intelligence
Research, Volume 47, pages 853-899.
was written by Davi Frossard.
The ImageNet class probabilities were computed using Davi
Frossard's TensorFlow port of the VGG16 network.
David Harwath and Jim Glass collected the spoken
captions for the Flickr8k corpus.
The following code chooses an image file at random, shows you the top five ImageNet classes that the VGG16 network associates with that image, and then lists the five text captions that Amazon Mechanical Turk workers ("Turkers") wrote for it in the Flickr8k dataset.
>>> import showprobs
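Picking the "top five" classes amounts to sorting one image's 1000-element posterior vector and keeping the five largest entries. The sketch below shows that step in plain NumPy, assuming the probabilities for one image are a 1-D array and the ImageNet class names are a list in the same order. `top_k_classes` is a hypothetical helper for illustration; it is not claimed to be how showprobs.py is implemented.

```python
import numpy as np

def top_k_classes(probs, class_names, k=5):
    """Return the k (class_name, probability) pairs with the highest probability, best first."""
    probs = np.asarray(probs)
    # argsort sorts ascending, so keep the last k indices and reverse them.
    idx = np.argsort(probs)[-k:][::-1]
    return [(class_names[i], float(probs[i])) for i in idx]

if __name__ == "__main__":
    # Toy 4-class example; real VGG16 outputs cover 1000 ImageNet classes.
    names = ["cat", "dog", "bicycle", "beach"]
    print(top_k_classes([0.1, 0.6, 0.05, 0.25], names, k=2))
    # prints [('dog', 0.6), ('beach', 0.25)]
```

Using a full `argsort` is simple and fast enough for a 1000-element vector; for much larger vectors, `np.argpartition` would avoid sorting everything.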
If you want to see all 1000 of the class probabilities for your randomly selected image, do this: