Data Description

These data are the result of an NSF project, "Audio Diarization - Towards Comprehensive Description of Audio Events," IIS 08-03219, 2008-2010. The principal investigators were Mark Hasegawa-Johnson, Thomas S. Huang, and Dirk Bernhardt-Walther. All of the annotations listed below were created by undergraduate annotators under the supervision of post-doctoral fellow Kyung-Tae Kim, who then worked with Kai-Hsiang Lin to create algorithms. The goals of this research included creation of the following two types of annotation:

  • Salient Events: Annotators were told: "Imagine that you were in the conference room you're listening to. You might focus on the conversation between members in the conference room or not. During listening you should mark the moment when you hear any sound which you unintentionally pay attention to or which attracts your attention. The sound might be any sound, including speech." The resulting raw annotations were time-aligned to create a set of "salient" audio segments, with saliency ratings ranging from 0 to 12.
  • Non-Speech Acoustic Event Labels: Annotators then went back through the corpus more carefully and labeled occurrences of non-speech acoustic events. For each file, two annotators labeled the events by listening to the audio without video, and two others labeled them while listening and watching the video simultaneously.

Download Information

This site provides only annotations; the corresponding audio and video files are available for download from the AMI project. The salient-event and non-speech acoustic event annotations are provided in the following formats:

  • eaf.tgz - All annotations were originally produced using ELAN. These are the original annotation files. Filenames are coded in the format [AMI_FILE_NAME]_[TYPE]_[OBS]_[SUB][WEEK].eaf, where the components are defined as:
    • AMI_FILE_NAME: base name of the corresponding audio and video files in the AMI distribution.
    • TYPE: sa (salience) or ae (acoustic event)
    • OBS: av (audiovisual) or ax (audio only)
    • SUB: subject identifier (one letter, A through M)
    • WEEK: week of the experiment in which this annotation was produced (1 through 12).
  • lab.tgz - Annotations in HTK label-file format, with start and end times coded in units of 100 ns (hnsu). Root filenames are identical to those in eaf.tgz, except that the week number is omitted.
  • scp_hms.tgz - One line per annotation, similar to HTK format, but with start times and end times coded in hh:mm:ss format.
  • SA.tgz - Salient event listings only.
    • FILENAME_sa_ax_X.flt: [float] binary (on/off) saliency annotation for each subject
    • FILENAME_sa_ax.flt: [float] number of agreements of saliency over 12 human subjects
    • total.flt: concatenation of all the *_sa_ax.flt files
    These were compiled from the .lab files using scp2sal.m.
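
The filename coding and the HTK time units described above can be handled with a few lines of Python. The following is a minimal sketch, not part of the distribution: the function names (parse_eaf_name, parse_lab_line, hnsu_to_seconds) and the example filename are hypothetical, and the label-line layout is assumed to be the usual HTK "start end label" form separated by whitespace.

```python
import re
from typing import NamedTuple, Tuple

# Assumed pattern for [AMI_FILE_NAME]_[TYPE]_[OBS]_[SUB][WEEK].eaf
_EAF_NAME = re.compile(
    r"^(?P<ami>.+)_(?P<kind>sa|ae)_(?P<obs>av|ax)_"
    r"(?P<sub>[A-M])(?P<week>1[0-2]|[1-9])\.eaf$"
)


class AnnotationName(NamedTuple):
    ami_file_name: str  # name of the audio/video file in the AMI distribution
    kind: str           # "sa" (salience) or "ae" (acoustic event)
    obs: str            # "av" (audiovisual) or "ax" (audio only)
    subject: str        # one letter, A through M
    week: int           # 1 through 12


def parse_eaf_name(filename: str) -> AnnotationName:
    """Split an .eaf filename into its coded components."""
    m = _EAF_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a recognized annotation filename: {filename!r}")
    return AnnotationName(m["ami"], m["kind"], m["obs"], m["sub"], int(m["week"]))


def hnsu_to_seconds(t: int) -> float:
    """Convert a time in 100 ns units (HTK convention) to seconds."""
    return t / 10_000_000


def parse_lab_line(line: str) -> Tuple[float, float, str]:
    """Parse one 'start end label' line; times are returned in seconds.

    Assumes whitespace-separated fields with integer times in 100 ns units.
    """
    start, end, label = line.split(maxsplit=2)
    return hnsu_to_seconds(int(start)), hnsu_to_seconds(int(end)), label.strip()
```

For example, parse_eaf_name("ES2002a_sa_ax_B3.eaf") would yield a salience annotation, audio only, by subject B in week 3. For the .lab files, the same pattern would need to be relaxed so that the week number is optional, since it is not coded there.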