Spring 2017 SST Group Meetings

  • Tuesday, January 24, 4:00-5:00, BI 2169
    Organization
  • Tuesday, January 31, 4:00-5:00, BI 2169, NO MEETING
  • Tuesday, February 7, 4:00-5:00, BI 2169
    Xiang Kong: I want to talk about the landmark-based consonant voicing detector and the progress on landmark-based ASR that Xuesong and I made last semester. Any time after February (except March 7th) will be fine for me to present. References related to my presentation: (Stevens, 2002), (Stevens and Klatt, 1974), (Abdelatty Ali, van der Spiegel and Mueller, 2001).
  • Tuesday, February 14, 4:00-5:00, BI 2169
    Mary Pietrowicz: "Discovering Dimensions of Perceived Vocal Expression in Semi-Structured, Unscripted Oral History Accounts," ICASSP practice talk.
  • Tuesday, February 21, 4:00-5:00, BI 2169
    Meeting cancelled
  • Tuesday, February 28, 4:00-5:00, BI 2169
    Wenda Chen: Unsupervised and supervised learning for clusters of zero-resource language data. In this talk, I will first review general unsupervised learning techniques (PhD thesis, Kamper, https://arxiv.org/pdf/1701.00851.pdf) and the task of learning probabilistic transcriptions from mismatched crowdsourcing data (Jyothi and Hasegawa-Johnson, http://www.isle.illinois.edu/sst/pubs/2014/jyothi14aaai.pdf). Then I will build on my own work (Chen et al., http://aclweb.org/anthology/W/W16/W16-3714.pdf) and discuss recent approaches. Thanks a lot!
  • Tuesday, March 7, 4:00-5:00, BI 2169
    Meeting cancelled
  • Tuesday, March 14, 4:00-5:00, BI 2169
    Meeting cancelled
  • Tuesday, March 28, 4:00-5:00, BI 2169
    Kaizhi Qian: I will talk about speech enhancement using WaveNet, a deep generative neural network (https://arxiv.org/pdf/1609.03499.pdf).
  • Tuesday, April 4, 4:00-5:00, BI 2169, CANCELLED
  • Tuesday, April 11, CANCELLED
  • Tuesday, April 18, 4:00-5:00, BI 2169
    Yang Zhang, Title: RNN-TA: F0 Model with Semantics
    Abstract: F0 models with deep learning structures are widely used in speech synthesis systems. The most common paradigm of these models is to fit the F0 contour, or its state-wise statistics, directly. One limitation of this approach is that much of the model's memory and modeling power is spent capturing local F0 movement, leaving little to capture the long-term semantic information encoded in F0. As a result, most F0 models focus mainly on phonetic, lexical and syntactic information. The recently proposed RNN-TA offers a promising alternative: it frees the RNN's memory and modeling power by introducing a pitch target model for local F0 movement. This talk discusses our work investigating RNN-TA's ability to capture long-term relations and semantic information when word embedding features are fed to it directly.
  • Tuesday, April 25, 4:00-5:00, BI 2169
    Leda Sari: Audiovisual speech recognition using CNN+HMM (Mroueh, Marcheret and Goel, 2015). Speaker normalization using fMLLR: Mark Gales, "Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition."
  • Tuesday, May 2, 4:00-5:00, BI 2169
    Mary Pietrowicz: The function of laughter in the veterans history corpus and its relationship to emotion, voice quality, prosody, etc., and how laughter functions within the different expressive dimensions we have found. An LSA paper would be useful: Thomas K. Landauer, Peter W. Foltz, Darrell Laham, "An Introduction to Latent Semantic Analysis," Discourse Processes, 25(2,3): 259-284, 1998. A useful LSA tutorial I found on the web: Alex Thomo, "Latent Semantic Analysis (Tutorial)." Papers on laughter detection would be useful as background (although we will probably do something different): Lakshmish Kaushik, Abhijeet Sangwan, and John H.L. Hansen, "Laughter and Filler Detection in Naturalistic Audio," INTERSPEECH 2015.