VGG16 class probabilities and audio forced alignments for the Flickr8k dataset
Generated by Mark Hasegawa-Johnson, 3/22/2017, using Davi Frossard's code and the Flickr8k data of Hodosh, Young and Hockenmaier.
DOWNLOAD:
- Download files:
- vgg_flickr8k.tgz contains the neural network outputs (posterior class probabilities) and a simple convenience program, showprobs.py, to help you browse them.
- vgg_flickr8k_nnet_penults.tgz contains the neural net penults, that is, the activations of the layer just below the softmax. These are intended to be the same features used by Harwath and Glass in their ASRU 2015 paper. images_40k.npz contains the same features in npz format.
- flickr_labels.txt contains forced alignments to the audio, as computed by Markus Mueller for WS17.
- License: Creative Commons Attribution-ShareAlike.
- The file Flickr8k.tokens.txt is part of the Flickr8k dataset, publicly distributed under a Creative Commons Attribution-ShareAlike license by M. Hodosh, P. Young and J. Hockenmaier (2013), "Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics," Journal of Artificial Intelligence Research, Volume 47, pages 853-899.
- The file imagenet_classes.py was written by Davi Frossard.
- The ImageNet class probabilities were computed using Davi Frossard's TensorFlow port of the VGG16 network.
- David Harwath and Jim Glass distribute spoken captions for the Flickr8k corpus.
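The array names stored inside images_40k.npz are not documented here, so a reasonable first step is to list them along with their shapes and dtypes. The following is a minimal sketch using NumPy's np.load; the helper name describe_npz is hypothetical and is not part of showprobs.py. It assumes the archive holds ordinary numeric arrays (np.load refuses pickled object arrays by default).

```python
import numpy as np

def describe_npz(path):
    """Hypothetical helper: return {array_name: (shape, dtype)} for an .npz file.

    Assumes the archive holds plain numeric arrays; object arrays would
    need allow_pickle=True, which is deliberately not enabled here.
    """
    info = {}
    # np.load on an .npz returns an NpzFile; the context manager closes the zip.
    with np.load(path) as data:
        for name in data.files:
            arr = data[name]  # each access reads the array from the archive
            info[name] = (arr.shape, str(arr.dtype))
    return info
```

For example, describe_npz("images_40k.npz") should reveal which key holds the feature matrix and how many dimensions each feature vector has, without guessing at the names in advance.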
USAGE:
The following code chooses an image file at random, shows you the top five ImageNet classes that the VGG16 network associates with that image, and then lists the five text captions that Amazon Mechanical Turk workers ("Turkers") provided for it in the Flickr8k dataset.
$ python
>>> import showprobs
>>> showprobs.showrandom()
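Ranking one image's class probabilities, as showrandom() does for the top five, amounts to sorting the probability vector in descending order. The sketch below is an illustration, not the code in showprobs.py. The function name top_k_classes is hypothetical, and it assumes you already have a length-1000 probability vector and a matching list of class names (for example, the names defined in imagenet_classes.py).

```python
import numpy as np

def top_k_classes(probs, class_names, k=5):
    """Hypothetical helper: return the k (class_name, probability) pairs
    with the highest probability, most probable first."""
    probs = np.asarray(probs)
    # argsort sorts ascending; reverse for descending order, then keep k indices.
    order = np.argsort(probs)[::-1][:k]
    return [(class_names[i], float(probs[i])) for i in order]
```

Passing k=1000 returns the full ranking, which is the same idea as showrandom(1000).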
If you want to see all 1000 of the class probabilities for your randomly selected image, do this:
>>> showprobs.showrandom(1000)
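To get the captions for an image without going through showprobs.py, you can group the lines of Flickr8k.tokens.txt by image filename. This sketch assumes the standard Flickr8k token format, one caption per line written as `<image>.jpg#<n>`, a tab, then the caption. Check a few lines of your copy before relying on it. The function name load_captions is hypothetical.

```python
from collections import defaultdict

def load_captions(path):
    """Hypothetical helper: map each image filename to its list of captions.

    Assumes lines of the form '<image>.jpg#<n>\t<caption>'.
    """
    captions = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            key, caption = line.split("\t", 1)
            image = key.split("#", 1)[0]  # drop the '#<n>' caption index
            captions[image].append(caption)
    return dict(captions)
```

With this format, each Flickr8k image should map to a list of five captions, matching the five that showrandom() prints.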