This project is funded by NSF grant 0807329. Its goal is to achieve the following research results:

Data Transformations

Transform audio into a visual representation that facilitates rapid search for anomalous events.

Software Testbeds

Develop software testbeds both for public outreach and for evaluating data transformations in controlled experiments.

Audio Class Discovery

Anomalies, by definition, don't happen very often. Zipf's law applies: if your training data contain an example of the acoustic event you're looking for, then by definition, that event is not anomalous. It is therefore necessary to develop techniques that detect classes in the test data that were never heard in the training data, and that learn models of those newly discovered classes. Tentatively we plan to do this by extending the methods described in Jui-Ting's 2008 Speech Prosody paper.
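One simple version of this idea can be sketched as follows. This is not the method from the Speech Prosody paper, just a hedged toy illustration: test frames that are far from every known training-class model are flagged as candidates for a never-before-heard class, and a new model is then fit to the flagged frames. The feature dimensions, class locations, and the distance threshold below are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 2-D feature frames from two known acoustic classes.
train_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
train_b = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(100, 2))
class_means = np.array([train_a.mean(axis=0), train_b.mean(axis=0)])

# Test data include frames from a third class never seen in training.
test_known = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
test_novel = rng.normal(loc=[0.0, 5.0], scale=0.3, size=(50, 2))
test = np.vstack([test_known, test_novel])

# Flag frames far from every known class mean as candidate anomalies.
dists = np.linalg.norm(test[:, None, :] - class_means[None, :, :], axis=2)
novel_mask = dists.min(axis=1) > 1.5  # the threshold is a free parameter

# Learn a model (here, just a mean) for the newly discovered class.
new_class_mean = test[novel_mask].mean(axis=0)
```

In practice the per-class models would be richer than single means (e.g., Gaussian mixtures over spectral features), and the flagged frames would be clustered rather than pooled into one new class, but the flag-then-model structure is the same.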

Web-Based Multimedia Analytics: Audio Attribute Extraction

Visual analysis of text often leverages parametric semantic spaces, computed using methods such as latent semantic analysis. Typically, a parametric semantic space is computed by creating a feature vector for each document, then transforming the document vector using a transform matrix: x = W d. The feature vector d characterizes a single document. It is typically of very high dimension; for example, it may contain one entry for each word in the dictionary. The semantic vector x must be of much lower dimension, so that it can be easily visualized; the transform matrix W is therefore a short, wide matrix, computed to summarize the important semantic distinctions among the documents in a training database.

Audio documents can be easily inserted into the x = W d framework by computing a feature vector d for each audio document. An audio feature vector is unlikely to be commensurate with feature vectors computed for text databases, so it will not be immediately possible to merge the d spaces computed for text and audio databases; the goal during the first part of this research will simply be to generate feature vectors d and transform matrices W that summarize audio documents well enough to allow the creation of visually analyzable document clusters, themes, and semantic spaces.