Publications of the Statistical Speech Technology Group, Beckman Institute, University of Illinois at Urbana-Champaign
- Speech-to-Text: Acoustic Features | Acoustic Modeling | Articulatory and Multimodal | Pronunciation Modeling
- Speech-to-Meaning: Information Retrieval | Meta-Linguistics: Emotion, Cognitive State, Social Networks | Task Semantics | Automatic Recognition of Prosody
- Human-Computer Interaction: Text-to-Speech | Speech and Language Technology in Education | Universal Access and Clinical Applications | Multimedia Analytics
- Computer Audition: Acoustic Event Detection | Audio Enhancement | Speech Coding | Speaker Recognition and Language ID
- Computer Cognition: Computer Vision | Machine Learning | Natural Language Processing
- Human Speech and Language: Speech Production | Neurophysiology | Auditory Perception | Distinctive Features | Formant Tracking | Prosody Analysis
- Cartoons
Speech Recognition
Acoustic Features
- Sarah King and Mark Hasegawa-Johnson, Accurate Speech Segmentation by Mimicking Human Auditory Processing, Proc. ICASSP 2013 (NSF 0807329)
- Po-Sen Huang, Li Deng, Mark Hasegawa-Johnson and Xiaodong He, Random Features for Kernel Deep Convex Network, Proc. ICASSP 2013
- Sarah King and Mark Hasegawa-Johnson, Detection of Acoustic-Phonetic Landmarks in Mismatched Conditions Using a Biomimetic Model of Human Auditory Processing, CoLing 2012 (QNRF NPRP 09-410-1-069 and NSF CCF 0807329)
- Mark Hasegawa-Johnson, Elabbas Benmamoun, Eiman Mustafawi, Mohamed Elmahdy and Rehab Duwairi, On The Definition of the Word `Segmental', Speech Prosody 2012 (QNRF NPRP 410-1-069)
- Sarah Borys, An SVM Front End Landmark Speech Recognition System, M.S. Thesis, 2008.
- Bryce Lobdell, Mark Hasegawa-Johnson, and Jont B. Allen, Human Speech Perception and Feature Extraction, Interspeech 2008
- Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, and Zhengyou Zhang, Frequency Domain Correspondence for Speaker Normalization, in Proc. Interspeech pp. 274-7, Antwerp, August, 2007.
- Xi Zhou, Yun Fu, Ming Liu, Mark Hasegawa-Johnson, and Thomas Huang, "Robust Analysis and Weighting on MFCC Components for Speech Recognition and Speaker Identification," ICME 2007 (VACE NBCHC060160; NSF 0426627).
- Sarah Borys and Mark Hasegawa-Johnson, "Distinctive Feature Based SVM Discriminant Features for Improvements to Phone Recognition on Telephone Band Speech." ISCA Interspeech, October 2005 (NSF 0132900; Software to intertranslate HTK and libsvm).
- Mark Hasegawa-Johnson, Sarah Borys and Ken Chen, "Experiments in Landmark-Based Speech Recognition." Sound to Sense: Workshop in Honor of Kenneth N. Stevens, June, 2004 (NSF 0132900).
- Mohammed Kamal Omar and Mark Hasegawa-Johnson, Model Enforcement: A Unified Feature Transformation Framework for Classification and Recognition, IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 2701-2710, 2004 (NSF 0132900).
- Stefan Geirhofer, Feature Reduction with Linear Discriminant Analysis and its Performance on Phoneme Recognition. Undergraduate research project.
- Mohamed Kamal Mahmoud Omar, Acoustic Feature Design for Speech Recognition: A Statistical Information-Theoretic Approach. Ph.D. Thesis, 2003.
- Mohammed Kamal Omar and Mark Hasegawa-Johnson, Approximately Independent Factors of Speech Using Nonlinear Symplectic Transformation, IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 660-671, 2003 (NSF 0132900).
- Mohammed Kamal Omar and Mark Hasegawa-Johnson, Non-Linear Independent Component Analysis for Speech Recognition, International Conference on Computer, Communication and Control Technologies (CCCT '03), 2003 (NSF 0132900).
- Mohammed Kamal Omar and Mark Hasegawa-Johnson, Strong-Sense Class-Dependent Features for Statistical Recognition, IEEE Workshop on Statistical Signal Processing, St. Louis, MO, 2003, 473-476 (NSF 0132900).
- Mohammed Kamal Omar and Mark Hasegawa-Johnson, Maximum Conditional Mutual Information Projection For Speech Recognition, Interspeech, September, 2003, 505-508 (NSF 0132900).
- Mohammed Kamal Omar and Mark Hasegawa-Johnson, Non-Linear Maximum Likelihood Feature Transformation For Speech Recognition, Interspeech, September, 2003, 2497-2500 (NSF 0132900).
- Mark Hasegawa-Johnson, Finding the Best Acoustic Measurements for Landmark-Based Speech Recognition, Accumu Magazine, Kyoto Computer Gakuin, Kyoto, Japan, 2002 (NSF 0132900).
- Mohammed Kamal Omar, Ken Chen, Mark Hasegawa-Johnson and Yigal Brandman, An Evaluation of using Mutual Information for Selection of Acoustic-Features Representation of Phonemes for Speech Recognition, Interspeech, Denver, CO, September 2002, pp. 2129-2132 (Phonetact, Inc.).
- Zhinian Jing and Mark Hasegawa-Johnson, Auditory-Modeling Inspired Methods of Feature Extraction for Robust Automatic Speech Recognition, ICASSP Student Session, May 2002, IV:4176 (NSF 0132900).
- Mohammed Kamal Omar and Mark Hasegawa-Johnson, "Maximum Mutual Information Based Acoustic Features Representation of Phonological Features for Speech Recognition," ICASSP, May 2002, I:81-84.
- Zhinian Jing, Voice Index and Frame Index for Recognition of Digits in Speech Background. M.S. Thesis, 2002.
- Wira Gunawan and Mark Hasegawa-Johnson, "PLP Coefficients can be Quantized at 400 bps," ICASSP, Salt Lake City, UT, pp. 2.2.1-4, 2001.
Acoustic Modeling
- Sujeeth Bharadwaj, Mark Hasegawa-Johnson, Jitendra Ajmera, Om Deshmukh, and Ashish Verma, Sparse Hidden Markov Models for Purer Clusters, Proc. ICASSP 2013
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi, A Baseline Speech Recognition System for Levantine Colloquial Arabic, Proceedings of ESOLEC 2012 (QNRF NPRP 410-1-069)
- Po-Sen Huang and Mark Hasegawa-Johnson, Cross-Dialectal Data Transferring for Gaussian Mixture Model Training in Arabic Speech Recognition, International Conference on Arabic Language Processing CITALA 2012 (QNRF NPRP 410-1-069)
- Mohamed Elmahdy, Mark Hasegawa-Johnson, Eiman Mustafawi, Rehab Duwairi, and Wolfgang Minker, Challenges and Techniques for Dialectal Arabic Speech Recognition and Machine Translation, Qatar Foundation Annual Research Forum, p. 244 (QNRF NPRP 410-1-069)
- Jui-Ting Huang, Semi-Supervised Learning for Acoustic and Prosodic Modeling in Speech Applications, Ph.D. thesis, University of Illinois, 2012
- Mark Hasegawa-Johnson, Jui-Ting Huang, Roxana Girju, Rehab Mustafa Mohamma Duwairi, Eiman Mohd Tayyeb H B Mustafawi, and Elabbas Benmamoun, Learning to Recognize Speech from a Small Number of Labeled Examples, Qatar Foundation Annual Research Forum, p. 269 (QNRF NPRP 410-1-069)
- Mark Hasegawa-Johnson, Jui-Ting Huang, Sarah King and Xi Zhou, Normalized recognition of speech and audio events, Journal of the Acoustical Society of America 130:2524 (NSF 0807329)
- Mark Hasegawa-Johnson, Jui-Ting Huang, and Xiaodan Zhuang, Semi-supervised learning for speech and audio processing, Journal of the Acoustical Society of America 130:2408 (NSF 0703624)
- Boon Pang Lim, Computational Differences between Whispered and Non-whispered Speech, Ph.D. Thesis, University of Illinois, 2011
- Jui-Ting Huang, Mark Hasegawa-Johnson, and Jennifer Cole, How Unlabeled Data Change the Acoustic Models For Phonetic Classification, Workshop on New Tools and Methods for Very Large Scale Phonetics Research, University of Pennsylvania, Jan. 2011
- Jui-Ting Huang, Po-Sen Huang, Yoonsook Mo, Mark Hasegawa-Johnson, Jennifer Cole, Prosody-Dependent Acoustic Modeling Using Variable-Parameter Hidden Markov Models, Speech Prosody 2010 100623:1-4 (NSF 0703624).
- Hao Tang, Mark Hasegawa-Johnson, Thomas S. Huang, Toward Robust Learning of the Gaussian Mixture State Emission Densities for Hidden Markov Models, ICASSP 2010 (NSF 0803219)
- Jui-Ting Huang, Xi Zhou, Mark Hasegawa-Johnson and Thomas Huang, Kernel Metric Learning for Phonetic Classification, ASRU 2009 pp. 141-5 (NSF 0703624 and 0534133)
- Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis Goldstein, and Elliot Saltzman, Articulatory Phonological Code for Word Recognition, Interspeech, 34549:1-4, Brighton, September 2009 (NSF 0703624)
- Bowon Lee and Mark Hasegawa-Johnson, A Phonemic Restoration Approach for Automatic Speech Recognition with Highly Nonstationary Background Noise, DSP in Cars workshop, Dallas, July 2009
- Jui-Ting Huang and Mark Hasegawa-Johnson, On semi-supervised learning of Gaussian mixture models for phonetic classification, NAACL HLT Workshop on Semi-Supervised Learning, 2009, pp. 75-83 (NSF 0534106 and NSF 0703624).
- Jui-Ting Huang and Mark Hasegawa-Johnson, Maximum Mutual Information Estimation with Unlabeled Data for Phonetic Classification. Proc. Interspeech 2008 (NSF 0534133).
- Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis Goldstein, and Elliot Saltzman, The Entropy of Articulatory Phonological Code: Recognizing Gestures from Tract Variables, Interspeech 2008 (NSF 0703624, NSF 0703782, NIH DC02717).
- Arthur Kantor and Mark Hasegawa-Johnson, Stream Weight Tuning in Dynamic Bayesian Networks, Proc. ICASSP pp. 4525-8, 2008 (NSF 0703624).
- Bowon Lee, Robust Speech Recognition in a Car Using a Microphone Array. Ph.D. thesis, 2006 (Motorola RPS19; Software; Data)
- Rahul Chitturi and Mark Hasegawa-Johnson, Novel Entropy-Based Moving Average Refiners for HMM Landmarks. Interspeech, September 2006 (NSF 0132900).
- Mark Hasegawa-Johnson, James Baker, Sarah Borys, Ken Chen, Emily Coogan, Steven Greenberg, Amit Juneja, Katrin Kirchhoff, Karen Livescu, Srividya Mohan, Jennifer Muller, Kemal Sönmez, and Tianyu Wang, "Landmark-Based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop." ICASSP, March 2005, pp. 1213-1216 (NSF 0121285).
- Yeojin Kim and Mark Hasegawa-Johnson, Phonetic Segment Rescoring Using SVMs. Midwest Computational Linguistics Colloquium, Columbus, OH, 2005 (NSF 0132900).
- Mark Hasegawa-Johnson, James Baker, Steven Greenberg, Katrin Kirchhoff, Jennifer Muller, Kemal Sönmez, Sarah Borys, Ken Chen, Amit Juneja, Karen Livescu, Srividya Mohan, Emily Coogan, and Tianyu Wang, Landmark-Based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop. Technical report of the Johns Hopkins Center for Language and Speech Processing, 2005 (NSF 0121285).
- Mark Hasegawa-Johnson, Landmark-Based Speech Recognition: The Marriage of High-Dimensional Machine Learning Techniques with Modern Linguistic Representations, talk given at Tsinghua University, October 2004 (NSF 0132900).
- Ameya Deoras and Mark Hasegawa-Johnson, "A Factorial HMM Approach to Robust Isolated Digit Recognition in Background Music." Interspeech, October, 2004 (NSF 0132900).
- Ameya Deoras and Mark Hasegawa-Johnson, A Factorial HMM Approach to Simultaneous Recognition of Isolated Digits Spoken by Multiple Talkers on One Audio Channel, ICASSP 2004 (NSF 0132900).
- Yanli Zheng and Mark Hasegawa-Johnson, Acoustic segmentation using switching state Kalman Filter, ICASSP 2003, April 2003, I:752-755 (NSF 0132900).
- Ameya Deoras, A Factorial HMM Approach to Robust Isolated Digit Recognition in Non-Stationary Noise. B.S. Thesis, 2003.
- Mohammed K. Omar, Mark Hasegawa-Johnson and Stephen E. Levinson, Gaussian Mixture Models of Phonetic Boundaries for Speech Recognition, ASRU 2001 (NSF 0132900).
- Mark Hasegawa-Johnson, Multivariate-State Hidden Markov Models for Simultaneous Transcription of Phones and Formants, ICASSP, Istanbul, pp. 1323-26, 2000
Articulatory and Multimodal
- Sujeeth Bharadwaj, Raman Arora, Karen Livescu and Mark Hasegawa-Johnson, Multi-View Acoustic Feature Learning Using Articulatory Measurements, IWSML (Internat. Worksh. on Statistical Machine Learning for Sign. Process.), 2012 (NSF 0905633)
- İ. Yücel Ozbek, Mark Hasegawa-Johnson and Mübeccel Demirekler, On Improving Dynamic State Space Approaches to Articulatory Inversion with MAP based Parameter Estimation, IEEE Transactions on Audio, Speech, and Language Processing, in press
- İ. Yücel Ozbek, Mark Hasegawa-Johnson and Mübeccel Demirekler, Estimation of Articulatory Trajectories Based on Gaussian Mixture Model (GMM) with Audio-Visual Information Fusion and Dynamic Kalman Smoothing, IEEE Transactions on Audio, Speech, and Language Processing 19(5):1180-1195, 2011
- Thomas S. Huang, Mark A. Hasegawa-Johnson, Stephen M. Chu, Zhihong Zeng, and Hao Tang, Sensitive Talking Heads, IEEE Signal Processing Magazine 26(4):67-72, July 2009
- Mark Hasegawa-Johnson, Multi-Stream Approach to Audiovisual Automatic Speech Recognition, IEEE 9th Workshop on Multimedia Signal Processing (MMSP) pp. 328-31, 2007
- Mark Hasegawa-Johnson, Karen Livescu, Partha Lal and Kate Saenko, Audiovisual Speech Recognition with Articulator Positions as Hidden Variables, in Proc. International Congress on Phonetic Sciences (ICPhS) 1719:297-302, Saarbrücken, August, 2007 (NSF 0121285).
- Mark Hasegawa-Johnson, Audio-Visual Speech Recognition: Audio Noise, Video Noise, and Pronunciation Variability, talk given to the Signal Processing Society, IEEE Japan, June 2007 (NSF 0534106; NIH DC008090A).
- Yun Fu, Xi Zhou, Ming Liu, Mark Hasegawa-Johnson, and Thomas S. Huang, Lipreading by Locality Discriminant Graph, IEEE International Conference on Image Processing (ICIP) III:325-8, 2007 (VACE NBCHC060160; NSF 0426627).
- Karen Livescu, Özgür Çetin, Mark Hasegawa-Johnson, Simon King, Chris Bartels, Nash Borges, Arthur Kantor, Partha Lal, Lisa Yung, Ari Bezman, Stephen Dawson-Haggerty, Bronwyn Woods, Joe Frankel, Mathew Magimai-Doss, and Kate Saenko, Articulatory Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer Workshop. ICASSP, May 2007, pp. 621-4 (NSF 0121285).
- Karen Livescu, Özgür Çetin, Mark Hasegawa-Johnson, Simon King, Chris Bartels, Nash Borges, Arthur Kantor, Partha Lal, Lisa Yung, Ari Bezman, Stephen Dawson-Haggerty, Bronwyn Woods, Joe Frankel, Mathew Magimai-Doss, and Kate Saenko, Articulatory-Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: 2006 JHU Summer Workshop Final Report. Johns Hopkins Center for Language and Speech Processing, 2007 (NSF 0121285).
- Mark Hasegawa-Johnson, Object Tracking and Asynchrony in Audio-Visual Speech Recognition. talk given to the Artificial Intelligence, Vision, and Robotics seminar series, August, 2006 (NSF 0534106; NIH DC008090A).
- Mark Hasegawa-Johnson, Dealing with Acoustic Noise. Part III: Video. Tutorial presentation given at WS06, Center for Language and Speech Processing, July 2006 (NSF 0121285).
- Camille Goudeseune and Bowon Lee, AVICAR: Audio-Visual Speech Recognition in a Car Environment. Promotional Film, 2006 (Motorola RPS19).
- Bowon Lee, Mark Hasegawa-Johnson, Camille Goudeseune, Suketu Kamdar, Sarah Borys, Ming Liu, and Thomas Huang, AVICAR: Audio-Visual Speech Corpus in a Car Environment. Interspeech, October 2004, pp. 380-383 (Motorola RPS19; Data)
- Stephen E. Levinson, Thomas S. Huang, Mark A. Hasegawa-Johnson, Ken Chen, Stephen Chu, Ashutosh Garg, Zhinian Jing, Danfeng Li, J. Lin, Mohammed Kamal Omar and Z. Wen, Multimodal Dialog Systems Research at Illinois, ARPA Workshop on Multimodal Speech Recognition and SPINE, June, 2002 (NSF 0132900).
Pronunciation Modeling
- Mahmoud Abunasser, Abbas Benmamoun, and Mark Hasegawa-Johnson, Pronunciation Variation Metric for Four Dialects of Arabic, presentation at AIDA 10 (Association Internationale de Dialectologie Arabe), Qatar University, 2013
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi, Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition, International Journal of Computational Linguistics, volume 3, issue 1, 2012
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi, "Hybrid Pronunciation Modeling for Arabic Large Vocabulary Speech Recognition," Qatar Foundation Annual Research Forum, 2012
- Arthur Kantor and Mark Hasegawa-Johnson, HMM-based Pronunciation Dictionary Generation, Workshop on New Tools and Methods for Very Large Scale Phonetics Research, University of Pennsylvania, Jan. 2011 (NSF 0703624, 0913188; Software).
- Arthur Kantor, Pronunciation modeling for large vocabulary speech recognition, Ph.D. Thesis 2010, University of Illinois (NSF 0703624, 0913188; Software).
- Chi Hu, FSM-Based Pronunciation Modeling using Articulatory Phonological Code, M.S. Thesis 2010, University of Illinois (NSF 0703624 and NSF 0623805).
- Chi Hu, Xiaodan Zhuang, and Mark Hasegawa-Johnson, FSM-Based Pronunciation Modeling using Articulatory Phonological Code, Proceedings of Interspeech 2010 pp. 2274-2277, (NSF 0703624).
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Espy-Wilson, and Mark Hasegawa-Johnson, A procedure for estimating gestural scores from natural speech, Proceedings of Interspeech 2010 (NSF 0703624)
- Karen Livescu, Özgür Çetin, Mark Hasegawa-Johnson, Simon King, Chris Bartels, Nash Borges, Arthur Kantor, Partha Lal, Lisa Yung, Ari Bezman, Stephen Dawson-Haggerty, Bronwyn Woods, Joe Frankel, Mathew Magimai-Doss, and Kate Saenko, Articulatory-Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: 2006 JHU Summer Workshop Final Report. Johns Hopkins Center for Language and Speech Processing, 2007 (NSF 0121285).
- Karen Livescu, Özgür Çetin, Mark Hasegawa-Johnson, Simon King, Chris Bartels, Nash Borges, Arthur Kantor, Partha Lal, Lisa Yung, Ari Bezman, Stephen Dawson-Haggerty, Bronwyn Woods, Joe Frankel, Mathew Magimai-Doss, and Kate Saenko, Articulatory Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer Workshop. ICASSP, May 2007 (NSF 0121285).
- Ken Chen and Mark Hasegawa-Johnson, Modeling pronunciation variation using artificial neural networks for English spontaneous speech. Interspeech, October 2004, pp. 400-403 (NSF 0414117).
Speech-to-Meaning
Information Retrieval
- Rania Al-Sabbagh, Roxana Girju, Mark Hasegawa-Johnson, Elabbas Benmamoun, Rehab Duwairi, and Eiman Mustafawi, Using Web-Mining Techniques to Build a Multi-Dialect Lexicon of Arabic, Linguistics in the Gulf Conference, March 2011 (QNRF NPRP 410-1-069)
- Xiaodan Zhuang, Jui-Ting Huang, and Mark Hasegawa-Johnson, Speech Retrieval in Unknown Languages: a Pilot Study, NAACL HLT Cross-Lingual Information Access Workshop (CLIAWS) pp. 3-11, 2009 (NSF 0534106 and NSF 0703624)
Meta-Linguistics and Emotion
- Shobhit Mathur, Marshall Scott Poole, Feniosky Pena-Mora, Mark Hasegawa-Johnson and Noshir Contractor, Detecting interaction links in a collaborating group using manually annotated data, Social Networks doi:10.1016/j.socnet.2012.04.002, 2012 (NSF 0941268)
- Hao Tang, Stephen M. Chu, Mark Hasegawa-Johnson, Thomas S. Huang, Emotion Recognition from Speech via Boosted Gaussian Mixture Models, 2009 International Conference on Multimedia & Expo (ICME'09), pp. 294-7 (NIH R21 DC008090 A)
- Tong Zhang, Mark Hasegawa-Johnson and Stephen E. Levinson, Cognitive State Classification in a spoken tutorial dialogue system, Speech Communication 48(6):616-632, 2006 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen E. Levinson, Children's Emotion Recognition in an Intelligent Tutoring Scenario. Interspeech, October, 2004, pp. 735-738 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen E. Levinson, An empathic-tutoring system using spoken language, Australian conference on computer-human interaction (OZCHI), 2003, pp. 498-501 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen E. Levinson, Mental State Detection of Dialogue System Users via Spoken Language, ISCA/IEEE Workshop on Spontaneous Speech Processing and Recognition (SSPR), April 2003, MAP17.1-4 (NSF 0085980).
Task Semantics
- Tong Zhang, Mark Hasegawa-Johnson and Stephen E. Levinson, "Extraction of Pragmatic and Semantic Salience from Spontaneous Spoken English," Speech Communication, 2007 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen E. Levinson, A Hybrid Model for Spontaneous Speech Understanding. AAAI 2005, 10.1.1.80.879:1-8 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson and Stephen E. Levinson, Automatic detection of contrast for speech understanding. Interspeech, October, 2004 (NSF 0085980).
- Yuexi Ren, Mark Hasegawa-Johnson and Stephen E. Levinson. Semantic analysis for a speech user interface in an intelligent-tutoring system, Intl. Conf. on Intelligent User Interfaces. Madeira, Portugal, 2004 (NSF 0085980).
Automatic Recognition of Prosody
- Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Jennifer Cole, Mark Hasegawa-Johnson and Margaret Fleck, Feature Sets for the Automatic Detection of Prosodic Prominence, New Tools and Methods for Very Large Scale Phonetics Research, University of Pennsylvania, Jan. 2011
- Jui-Ting Huang and Mark Hasegawa-Johnson, Unsupervised Prosodic Break Detection in Mandarin Speech, SpeechProsody 2008 pp. 165-8 (NSF 0534133).
- Xiaodan Zhuang and Mark Hasegawa-Johnson, Towards Interpretation of Creakiness in Switchboard, SpeechProsody 2008 pp. 37-40 (NSF 0414117).
- Taejin Yoon, Jennifer Cole, and Mark Hasegawa-Johnson, Detecting Non-Modal Phonation in Telephone Speech, SpeechProsody, 2008 pp. 33-6 (NSF 0414117).
- Taejin Yoon, A Predictive Model of Prosody Through Grammatical Interface: A Computational Approach, Ph.D. Thesis, 2007.
- Ken Chen, Mark Hasegawa-Johnson and Jennifer Cole, "A Factored Language Model for Prosody-Dependent Speech Recognition," in Speech Synthesis and Recognition, Robust Speech Recognition and Understanding, Michael Grimm and Kristian Kroschel (Ed.), INTECH Publishing, pp. 319-332, 2007.
- Mark Hasegawa-Johnson, Jennifer Cole, Ken Chen, Partha Lal, Amit Juneja, Taejin Yoon, Sarah Borys, and Xiaodan Zhuang, Prosodically Organized Automatic Speech Recognition. Linguistic Patterns in Spontaneous Speech, Language and Linguistics Monograph Series A25, Academia Sinica, Taiwan, 2008, pp. 101-128 (NSF 0414117; NSF 0121285).
- Mark Hasegawa-Johnson, Phonology and the Art of Automatic Speech Recognition, Director's Seminar Series, Beckman Institute, University of Illinois at Urbana-Champaign, November 2006 (NSF 0414117).
- Taejin Yoon, Xiaodan Zhuang, Jennifer Cole, and Mark Hasegawa-Johnson, Voice Quality Dependent Speech Recognition, Linguistic Patterns in Spontaneous Speech, Language and Linguistics Monograph Series A25, Academia Sinica, Taiwan, 2008, pp. 77-100 (NSF 0414117).
- Taejin Yoon, Xiaodan Zhuang, Jennifer Cole, and Mark Hasegawa-Johnson, Voice Quality Dependent Speech Recognition. Midwest Computational Linguistics Colloquium, Urbana, IL, 2006 (NSF 0414117).
- Rajiv Reddy and Mark Hasegawa-Johnson, Analysis of Pitch Contours in Repetition-Disfluency using Stem-ML, Midwest Computational Linguistics Colloquium, 2006
- Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, Sarah Borys, Sung-Suk Kim, Jennifer Cole and Jeung-Yoon Choi, Prosody Dependent Speech Recognition on Radio News Corpus of American English. IEEE Transactions on Speech and Audio Processing, 14(1):232-245, 2006 (NSF 0132900).
- Jennifer Cole, Mark Hasegawa-Johnson, Chilin Shih, Eun-Kyung Lee, Heejin Kim, H. Lu, Yoonsook Mo, and Tae-Jin Yoon, Prosodic Parallelism as a Cue to Repetition and Hesitation Disfluency, Proceedings of DISS'05 (An ISCA Tutorial and Research Workshop), Aix-en-Provence, France, 2005, pp. 53-58 (NSF 0414117).
- Mark Hasegawa-Johnson, Ken Chen, Jennifer Cole, Sarah Borys, Sung-Suk Kim, Aaron Cohen, Tong Zhang, Jeung-Yoon Choi, Heejin Kim, Taejin Yoon, and Sandra Chavarria, Simultaneous Recognition of Words and Prosody in the Boston University Radio Speech Corpus. Speech Communication 46(3-4):418-439, 2005 (NSF 0132900).
- Tae-Jin Yoon, Jennifer Cole, Mark Hasegawa-Johnson, and Chilin Shih, Detecting Non-modal Phonation in Telephone Speech. Unpublished manuscript, 2005 (NSF 0414117).
- Tae-Jin Yoon, Jennifer Cole, Mark Hasegawa-Johnson, and Chilin Shih, Acoustic correlates of non-modal phonation in telephone speech, The Journal of the Acoustical Society of America 117(4), p. 2621, 2005 (NSF 0414117).
- Sarah Borys, Mark Hasegawa-Johnson, Ken Chen, and Aaron Cohen, Modeling and Recognition of Phonetic and Prosodic Factors for Improvements to Acoustic Speech Recognition Models. Interspeech, October, 2004 (NSF 0132900).
- Mark Hasegawa-Johnson, Speech Recognition Models of the Interdependence Among Syntax, Prosody, and Segmental Acoustics, talk given at Tsinghua University, October 2004 (NSF 0414117).
- Mark Hasegawa-Johnson, Jennifer Cole, Chilin Shih, Ken Chen, Aaron Cohen, Sandra Chavarria, Heejin Kim, Taejin Yoon, Sarah Borys, and Jeung-Yoon Choi, Speech Recognition Models of the Interdependence Among Syntax, Prosody, and Segmental Acoustics, Human Language Technologies: Meeting of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL), Workshop on Higher-Level Knowledge in Automatic Speech Recognition and Understanding, May, 2004, pp. 56-63 (NSF 0414117).
- Ken Chen and Mark Hasegawa-Johnson, How Prosody Improves Word Recognition, SpeechProsody 2004, Nara, Japan, March 2004, 583-586 (NSF 0132900).
- Aaron Cohen, A Survey of Machine Learning Methods for Predicting Prosody in Radio Speech. M.S. Thesis, 2004.
- Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, and Jennifer Cole, A Maximum Likelihood Prosody Recognizer, SpeechProsody 2004, Nara, Japan, March 2004, 509-512 (NSF 0132900; Illinois CRI; Software).
- Ken Chen and Mark Hasegawa-Johnson, An Automatic Prosody Labeling System Using ANN-Based Syntactic-Prosodic Model and GMM-Based Acoustic-Prosodic Model, ICASSP 2004 (NSF 0132900; Illinois CRI).
- Sung-Suk Kim, Mark Hasegawa-Johnson, and Ken Chen, Automatic Recognition of Pitch Movements Using Multilayer Perceptron and Time-Delay Recursive Neural Network, IEEE Signal Processing Letters 11(7):645-648, 2004 (NSF 0132900; Illinois CRI).
- Yuexi Ren, Sung-Suk Kim, Mark Hasegawa-Johnson, and Jennifer Cole, Speaker-Independent Automatic Detection of Pitch Accent, SpeechProsody 2004, Nara, Japan, March 2004, 521-524 (NSF 0085980).
- Ken Chen, Mark Hasegawa-Johnson and Sung-Suk Kim, An Intonational Phrase Boundary and Pitch Accent Dependent Speech Recognizer. International Conference on Systems, Cybernetics, and Intelligence, 2003 (Illinois CRI).
- Ken Chen and Mark Hasegawa-Johnson, "Improving the robustness of prosody dependent language modeling based on prosody syntax cross-correlation." ASRU, 2003 (Illinois CRI).
- Ken Chen, Mark Hasegawa-Johnson and Jennifer Cole, "Prosody Dependent Speech Recognition on Radio News," IEEE Workshop on Statistical Signal Processing, St. Louis, MO, 2003 (Illinois CRI).
- Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, Sarah Borys, and Jennifer Cole, Prosody Dependent Speech Recognition with Explicit Duration Modelling at Intonational Phrase Boundaries. Interspeech, September, 2003, 393-396 (Illinois CRI; Software diffs, TGZ, ZIP)
- Sarah Borys, Recognition of Prosodic Factors and Detection of Landmarks for Improvements to Continuous Speech Recognition Systems. B.S. Thesis, 2003.
- Sarah Borys, Mark Hasegawa-Johnson and Jennifer Cole, The Importance of Prosodic Factors in Phoneme Modeling with Applications to Speech Recognition, ACL Student Session, 2003 (NSF 0132900).
- Sarah Borys, Mark Hasegawa-Johnson and Jennifer Cole, Prosody as a Conditioning Variable in Speech Recognition, Illinois Journal of Undergraduate Research, 2003 (Illinois CRI).
Human-Computer Interaction
Speech Synthesis
- Xiaodan Zhuang, Lijuan Wang, Frank Soong, and Mark Hasegawa-Johnson, A Minimum Converted Trajectory Error (MCTE) Approach to High Quality Speech-to-Lips Conversion, Proceedings of Interspeech 2010 pp. 1736-1739, (NSF 0703624)
- Thomas S. Huang, Mark A. Hasegawa-Johnson, Stephen M. Chu, Zhihong Zeng, and Hao Tang, Sensitive Talking Heads, IEEE Signal Processing Magazine 26(4):67-72, July 2009
- Hao Tang, Yun Fu, Jilin Tu, Mark Hasegawa-Johnson, and Thomas S. Huang, Humanoid Audio-Visual Avatar with Emotive Text-to-Speech Synthesis, IEEE Trans. Multimedia 10(6):969-981, 2008
- Hao Tang, Yuxiao Hu, Yun Fu, Mark Hasegawa-Johnson and Thomas S. Huang, Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar, IEEE International Conference on Multimedia and Expo (ICME) 2008, pp. 1205-8
- Hao Tang, Xi Zhou, Matthias Odisio, Mark Hasegawa-Johnson, and Thomas Huang, Two-Stage Prosody Prediction for Emotional Text-to-Speech Synthesis, Interspeech 2008, pp. 2138-41 (VACE; NSF 0426227).
- Hao Tang, Yun Fu, Jilin Tu, Thomas Huang, and Mark Hasegawa-Johnson, EAVA: A 3D Emotive Audio-Visual Avatar, IEEE Workshop on Applications of Computer Vision (IEEE WACV '08) pp. 1-6, 2008 (VACE; NSF 0426227).
Speech and Language Technology in Education
- Mark Hasegawa-Johnson, Camille Goudeseune, Jennifer Cole, Hank Kaczmarski, Heejin Kim, Sarah King, Timothy Mahrt, Jui-Ting Huang, Xiaodan Zhuang, Kai-Hsiang Lin, Harsh Vardhan Sharma, Zhen Li, and Thomas S. Huang, Multimodal Speech and Audio User Interfaces for K-12 Outreach, APSIPA 2011 (NSF 0534106; 0703624; 0807329)
- Suma Bhat, Mark Hasegawa-Johnson and Richard Sproat, Automatic Fluency Assessment by Signal-Level Measurement of Spontaneous Speech, 2010 INTERSPEECH Satellite Workshop on Second Language Studies: Acquisition, Learning, Education and Technology
- Su-Youn Yoon, Mark Hasegawa-Johnson, and Richard Sproat, Landmark-based Automated Pronunciation Error Detection, Proceedings of Interspeech 2010 pp. 614-617
- Suma Bhat, Richard Sproat, Mark Hasegawa-Johnson and Fred Davidson, "Automatic fluency assessment using thin-slices of spontaneous speech," Language Testing Research Colloquium 2010, Denver, CO
- Su-Youn Yoon, Richard Sproat, and Mark Hasegawa-Johnson, Automated Pronunciation Scoring using Confidence Scoring and Landmark-based SVM, Interspeech 80100:1-4, Brighton, September 2009
- Su-Youn Yoon, Mark Hasegawa-Johnson and Richard Sproat, Automated Pronunciation Scoring for L2 English Learners, CALICO workshop, 2009
- Su-Youn Yoon, Lisa Pierce, Amanda Huensch, Eric Juul, Samantha Perkins, Richard Sproat, and Mark Hasegawa-Johnson, Construction of a rated speech corpus of L2 learners' speech, CALICO Journal, 2009 (Data Access: Rated L2 Speech Corpus (public data))
- Su-Youn Yoon, Lisa Pierce, Amanda Huensch, Eric Juul, Samantha Perkins, Richard Sproat, and Mark Hasegawa-Johnson, "Construction of a rated speech corpus of L2 learners' speech," CALICO workshop, 2008
- Tong Zhang, Mark Hasegawa-Johnson and Stephen E. Levinson, Cognitive State Classification in a spoken tutorial dialogue system, Speech Communication 48(6):616-632, 2006 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen E. Levinson, Children's Emotion Recognition in an Intelligent Tutoring Scenario. Interspeech, October, 2004 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen E. Levinson, An empathic-tutoring system using spoken language, Australian conference on computer-human interaction (OZCHI), 2003 (NSF 0085980).
Universal Access and Clinical Applications
- Harsh Vardhan Sharma and Mark Hasegawa-Johnson, Acoustic Model Adaptation using in-domain Background Models for Dysarthric Speech Recognition, Computer Speech and Language, in press
- Panying Rong, Torrey Loucks, Heejin Kim, and Mark Hasegawa-Johnson, Relationship between kinematics, F2 slope and speech intelligibility in dysarthria due to cerebral palsy, Clinical Linguistics and Phonetics, September 2012, Vol. 26, No. 9, pp. 806-822 (doi:10.3109/02699206.2012.706686)
- Harsh Vardhan Sharma, Acoustic Model Adaptation for Recognition of Dysarthric Speech, Ph.D. Thesis, University of Illinois, 2012
- Heejin Kim, Mark Hasegawa-Johnson and Adrienne Perlman, Temporal and spectral characteristics of fricatives in dysarthria, Journal of the Acoustical Society of America 130:2446
- Heejin Kim, Mark Hasegawa-Johnson, and Adrienne Perlman, "Vowel Contrast and Speech Intelligibility in Dysarthria," Folia Phoniatrica et Logopaedica 63(4):187-194, 2011 (NIH DC0032301)
- Heejin Kim, Katie Martin, Mark Hasegawa-Johnson, and Adrienne Perlman, "Frequency of consonant articulation errors in dysarthric speech," Clinical Linguistics & Phonetics 24(10):759-770, 2010 (NIH DC0032301)
- Harsh Vardhan Sharma and Mark Hasegawa-Johnson, State Transition Interpolation and MAP Adaptation for HMM-based Dysarthric Speech Recognition, HLT/NAACL Workshop on Speech and Language Processing for Assistive Technology (SLPAT) pp. 72-79, 2010 (NSF 0534106).
- Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman, Acoustic Cues to Lexical Stress in Spastic Dysarthria, Speech Prosody 2010 100891:1-4 (NIH R21-DC008090-A).
- Heejin Kim, Panying Rong, Torrey M. Loucks and Mark Hasegawa-Johnson, Kinematic Analysis of Tongue Movement Control in Spastic Dysarthria, Proceedings of Interspeech 2010, pp. 2578-2581 (NSF 0534106).
- Harsh Vardhan Sharma, Mark Hasegawa-Johnson, Jon Gunderson, and Adrienne Perlman, Universal Access: Speech Recognition for Talkers with Spastic Dysarthria, Interspeech 42862:1-4, Brighton, September 2009 (NIH R21 DC008090A)
- Harsh Vardhan Sharma, Universal Access: Experiments in Automatic Recognition of Dysarthric Speech, M.S. Thesis, 2008 (NSF 0534106).
- Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman, Jon Gunderson, Thomas Huang, Kenneth Watkin, and Simone Frame, Dysarthric Speech Database for Universal Access Research, Interspeech 2008, pp. 1741-4 (NSF 0534106; NIH DC008090A; Data).
- Weimo Zhu, Mark Hasegawa-Johnson, Karen Chapman-Novakofski, and Arthur Kantor, Cellphone-Based Nutrition E-Diary. National Nutrient Database Conference, 2007 (Robert Wood Johnson Foundation).
- Weimo Zhu, Mark Hasegawa-Johnson, Arthur Kantor, Dan Roth, Yong Gao, Youngsik Park, and Lin Yang, "E-coder for Automatic Scoring Physical Activity Diary Data: Development and Validation." ACSM, 2007 (Robert Wood Johnson Foundation).
- Mark Hasegawa-Johnson, Jonathan Gunderson, Adrienne Perlman, and Thomas Huang, HMM-Based and SVM-Based Recognition of the Speech of Talkers with Spastic Dysarthria, ICASSP III:1060-3, May 2006 (NSF 0534106; NIH DC008090A).
- Weimo Zhu, Mark Hasegawa-Johnson, and Mital Arun Gandhi, Accuracy of Voice-Recognition Technology in Collecting Behavior Diary Data. Association of Test Publishers (ATP): Innovations in Testing, March 2005 (Robert Wood Johnson Foundation).
Multimedia Analytics
- Kai-Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King, Mark Hasegawa-Johnson, and Thomas S. Huang, Saliency-Maximized Audio Visualization and Efficient Audio-Visual Browsing for Faster-than-Real-Time Human Acoustic Event Detection, ACM Transactions on Applied Perception, in press (NSF 0807329)
- Camille Goudeseune, 2012. Effective browsing of long audio recordings. ACM International Workshop on Interactive Multimedia on Mobile and Portable Devices, 2012 (NSF 0807329; Software).
- Kai-Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King, Mark Hasegawa-Johnson and Thomas Huang, Improving Faster-than-Real-Time Human Acoustic Event Detection by Saliency-Maximized Audio Visualization, ICASSP 2012, pp. 2277-2280 (NSF 0807329)
- Xiaodan Zhuang, Modeling Audio and Visual Cues for Real-world Event Detection, Ph.D. Thesis, University of Illinois, April 2011
- David Cohen, Camille Goudeseune and Mark Hasegawa-Johnson. 2009. Efficient Simultaneous Multi-Scale Computation of FFTs. Technical report GT-FODAVA-09-01 (NSF 0807329; Software).
- David Petruncio, Evaluation of Various Features for Music Genre Classification with Hidden Markov Models. B.S. Thesis, 2002.
- James Beauchamp, Heinrich Taube, Sever Tipei, Scott Wyatt, Lippold Haken and Mark Hasegawa-Johnson, "Acoustics, Audio, and Music Technology Education at the University of Illinois," JASA, 110(5):2961, 2001.
- Mark Hasegawa-Johnson, Jul Cha, Shamala Pizza and Katherine Haker, CTMRedit: A case study in human-computer interface design, International Conference On Public Participation and Information Tech., Lisbon, pp. 575-584, 1999 (NIH DC0032301; Software).
Computer Audition
- Christopher Co, 2013. Room Reconstruction and Navigation Using Acoustically Obtained Room Impulse Responses and a Mobile Robot Platform. Ph.D. Thesis, University of Illinois (Software).
Acoustic Event Detection
- Po-Sen Huang, Mark Hasegawa-Johnson, Wotao Yin and Tom Huang, Opportunistic Sensing: Unattended Acoustic Sensor Selection Using Crowdsourcing Models, IEEE Workshop on Machine Learning in Signal Processing 2012
- Po-Sen Huang, Jianchao Yang, Mark Hasegawa-Johnson, Feng Liang, Thomas S. Huang, Pooling Robust Shift-Invariant Sparse Representations of Acoustic Signals, Interspeech 2012
- Mark Hasegawa-Johnson, Xiaodan Zhuang, Xi Zhou, Camille Goudeseune, Hao Tang, Kai-Hsiang Lin, Mohamed Omar, and Thomas Huang, Toward Better Real-world Acoustic Event Detection, Presentation given at Seoul National University, May 30, 2012
- Po-Sen Huang, Robert Mertens, Ajay Divakaran, Gerald Friedland, and Mark Hasegawa-Johnson, How to Put it into Words---Using Random Forests to Extract Symbol Level Descriptions from Audio Content for Concept Detection, ICASSP 2012 (ARO W911NF-09-1-0383)
- R. Mertens, P.-S. Huang, L. Gottlieb, G. Friedland, A. Divakaran, On the Application of Speaker Diarization to Audio Concept Detection for Multimedia Retrieval, IEEE International Symposium on Multimedia, pp. 446-451, 2011
- Po-Sen Huang, Mark Hasegawa-Johnson, and Thyagaraju Damarla, Exemplar Selection Methods to Distinguish Human from Animal Footsteps, Second Annual Human and Light Vehicle Detection Workshop, Maryland, pp. 14:1-10, 2011 (ARO W911NF-09-1-0383)
- Po-Sen Huang, Thyagaraju Damarla and Mark Hasegawa-Johnson, Multi-sensory features for Personnel Detection at Border Crossings, Fusion 2011, to appear (ARO W911NF-09-1-0383)
- Xiaodan Zhuang, Modeling Audio and Visual Cues for Real-world Event Detection, Ph.D. Thesis, University of Illinois, April 2011
- Po-Sen Huang, Xiaodan Zhuang, and Mark Hasegawa-Johnson, Improving Acoustic Event Detection using Generalizable Visual Features and Multi-modality Modeling, ICASSP 2011, pp. 349-352 (ARO W911NF-09-1-0383)
- Xiaodan Zhuang, Xi Zhou, Mark A. Hasegawa-Johnson, and Thomas S. Huang, Real-world Acoustic Event Detection, Pattern Recognition Letters, 31, 2 (Sep. 2010), 1543-1551 (NSF 0807329).
- Mark Hasegawa-Johnson, Xiaodan Zhuang, Xi Zhou, Camille Goudeseune, and Thomas S. Huang, Adaptation of tandem HMMs for non-speech audio event detection, JASA 125:2730, 2009.
- Xiaodan Zhuang, Jing Huang, Gerasimos Potamianos and Mark Hasegawa-Johnson, Acoustic Fall Detection using Gaussian Mixture Models and GMM Supervectors, ICASSP 2009, pp. 69-72 (NetCarity).
- Xiaodan Zhuang, Xi Zhou, Thomas S. Huang and Mark Hasegawa-Johnson, Feature Analysis and Selection for Acoustic Event Detection, ICASSP pp. 17-20, 2008 (VACE; NSF 0414117; NSF 0534106).
- Xi Zhou, Xiaodan Zhuang, Ming Liu, Hao Tang, Mark Hasegawa-Johnson and Thomas Huang, HMM-Based Acoustic Event Detection with AdaBoost Feature Selection, Lecture Notes in Computer Science, 2008, Volume 4625/2008, 345-353 (VACE; NSF 0414117; NSF 0534106).
Audio Enhancement
- Po-Sen Huang, Scott Deeann Chen, Paris Smaragdis, and Mark Hasegawa-Johnson, Singing-Voice Separation from Monaural Recordings using Robust Principal Component Analysis, ICASSP 2012 (ARO W911NF-09-1-0383)
- Lae-Hoon Kim, Statistical Model Based Multi-Microphone Speech Processing: Toward Overcoming Mismatch Problem, Ph.D. Thesis, August 2010, University of Illinois (NSF 0913188)
- Lae-Hoon Kim and Mark Hasegawa-Johnson, Toward Overcoming Fundamental Limitation in Frequency-Domain Blind Source Separation for Reverberant Speech Mixtures, Proceedings of Asilomar 2010 (NSF 0913188)
- Lae-Hoon Kim, Kyung-Tae Kim, and Mark Hasegawa-Johnson, Robust Automatic Speech Recognition with Decoder Oriented Ideal Binary Mask Estimation, Proceedings of Interspeech 2010 pp. 2066-2069 (NSF 0913188)
- Lae-Hoon Kim, Kyungtae Kim, and Mark Hasegawa-Johnson, "Speech enhancement beyond minimum mean squared error with perceptual noise shaping," 2010 spring meeting of the ASA (Illinois CIRS)
- Lae-Hoon Kim, Mark Hasegawa-Johnson, Gerasimos Potamianos, and Vit Libal, Joint Estimation of DOA and Speech Based on EM Beamforming, ICASSP 2010 (Illinois CIRS)
- Lae-Hoon Kim and Mark Hasegawa-Johnson, Optimal Multi-Microphone Speech Enhancement in Cars, DSP in Cars workshop, Dallas, July 2009 (NSF 0803219 and 0807329)
- Lae-Hoon Kim, Mark Hasegawa-Johnson, Jun-Seok Lim, and Koeng-Mo Sung, Acoustic model for robustness analysis of optimal multipoint room equalization, JASA 123(4):2043-2053, 2008 (Illinois CIRS).
- Lae-Hoon Kim and Mark Hasegawa-Johnson, Optimal Speech Estimator Considering Room Response as well as Additive Noise: Different Approaches in Low and High Frequency Range, ICASSP pp. 4573-6, 2008 (Illinois CIRS).
- Bowon Lee and Mark Hasegawa-Johnson, Minimum Mean Squared Error A Posteriori Estimation of High Variance Vehicular Noise, in 2007 Biennial on DSP for In-Vehicle and Mobile Systems, Istanbul, June, 2007 (Motorola RPS19; NSF 0534106; Software).
- Bowon Lee, Robust Speech Recognition in a Car Using a Microphone Array. Ph.D. thesis, 2006 (Motorola RPS19; Software; Data)
- Mark Hasegawa-Johnson, Dealing with Acoustic Noise. Part II: Beamforming. tutorial presentation given at WS06, Center for Language and Speech Processing, July 2006 (NSF 0121285).
- Mark Hasegawa-Johnson, Dealing with Acoustic Noise. Part I: Spectral Estimation. tutorial presentation given at WS06, Center for Language and Speech Processing, July 2006 (NSF 0121285).
- Lae-Hoon Kim, Mark Hasegawa-Johnson and Koeng-Mo Sung, Generalized Optimal Multi-Microphone Speech Enhancement Using Sequential Minimum Variance Distortionless Response (MVDR) Beamforming and Postfiltering, ICASSP III:65-8, May 2006 (Illinois CIRS).
- Lae-Hoon Kim and Mark Hasegawa-Johnson, Generalized multi-microphone spectral amplitude estimation based on correlated noise model. 119th Convention of the Audio Engineering Society, New York, October 2005 (Illinois CIRS).
- Mital Gandhi and Mark Hasegawa-Johnson, Source Separation using Particle Filters. Interspeech, October 2004 (NSF 0132900).
- Bowon Lee, Mark Hasegawa-Johnson, and Camille Goudeseune, Open Loop Multichannel Inversion of Room Impulse Response, JASA 113(4):2202-3, 2003 (NSF 0132900; Data).
Speech Coding
- Mark Hasegawa-Johnson and Abeer Alwan, Speech Coding: Fundamentals and Applications, Wiley Encyclopedia of Telecommunications and Signal Processing, J. Proakis, Ed., Wiley and Sons, NY, December 2002 (NSF 0132900).
- Wira Gunawan and Mark Hasegawa-Johnson, "PLP Coefficients can be Quantized at 400 bps," ICASSP, Salt Lake City, UT, pp. 2.2.1-4, 2001.
- Tomohiko Taniguchi and Mark Johnson, Speech coding and decoding system (transform stochastic codebook so that, after perceptual weighting, it will be orthogonal to the adaptive codebook), U.S. Patent 5799131, August 25, 1998 (Fujitsu).
- Tomohiko Taniguchi, Mark Johnson, Yasuji Ohta, Hideki Kurihara, Yoshinori Tanaka, and Yoshihito Sakai, Speech coding system having codebook storing differential vectors between each two adjoining code vectors, U.S. Patent 5323486, June 21, 1994 (Fujitsu).
- Tomohiko Taniguchi and Mark Johnson, Speech coding system (hexagonal lattice code), U.S. Patent 5245662, September 14, 1993 (Fujitsu).
- Tomohiko Taniguchi, Mark Johnson, Hideki Kurihara, Yoshinori Tanaka, and Yasuji Ohta, Speech coding and decoding system (sparse adaptive codebook), U.S. Patent 5199076, March 30, 1993 (Fujitsu).
- Mark Hasegawa-Johnson and Tomohiko Taniguchi, "On-line and off-line computational reduction techniques using backward filtering in CELP speech coders," IEEE Transactions Acoustics, Speech, and Signal Processing, vol. 40, pp. 2090-2093, 1992 (Fujitsu).
- Mark A. Johnson and Tomohiko Taniguchi, "Low-complexity multi-mode VXC using multi-stage optimization and mode selection," ICASSP, Toronto, Canada, pp. 221-224, 1991 (Fujitsu).
- Tomohiko Taniguchi, Mark A. Johnson, and Yasuji Ohta, Pitch sharpening for perceptually improved CELP, and the sparse-delta codebook for reduced computation, ICASSP, Toronto, Canada, pp. 241-244, 1991 (Fujitsu).
- Tomohiko Taniguchi, Fumio Amano, and Mark A. Johnson, "Improving the performance of CELP-based speech coding at low bit rates," International Symposium on Circuits and Systems, Singapore, 1991 (Fujitsu).
- Mark A. Johnson and Tomohiko Taniguchi, "Computational reduction in sparse-codebook CELP using backward-weighting of the input," Institute of Electr., Information, and Comm. Eng. Symposium, DSP 90-15, Hakata, 61-66, 1990 (Fujitsu).
- Tomohiko Taniguchi, Mark A. Johnson and Yasuji Ohta, "Multi-vector pitch-orthogonal LPC: quality speech with low complexity at rates between 4 and 8 kbps," ICSLP, Kobe, pp. 113-116, 1990 (Fujitsu).
- Mark A. Johnson and Tomohiko Taniguchi, "Pitch-orthogonal code-excited LPC," IEEE Global Telecommunications Conference (GLOBECOM), San Diego, CA, pp. 542-546, 1990 (Fujitsu).
Speaker Recognition
- Hao Tang, Stephen Chu, Mark Hasegawa-Johnson, and Thomas Huang, Partially Supervised Speaker Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 34(5):959-971, 2012
- David Harwath and Mark Hasegawa-Johnson, Phonetic Landmark Detection for Automatic Language Identification, Speech Prosody 2010 100231:1-4 (NSF 0703624).
- Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, and Zhengyou Zhang, "Frequency Domain Correspondence for Speaker Normalization," in Proc. Interspeech, pp. 274-277, Antwerp, August, 2007.
- Xi Zhou, Yun Fu, Ming Liu, Mark Hasegawa-Johnson, and Thomas Huang, Robust Analysis and Weighting on MFCC Components for Speech Recognition and Speaker Identification, International Conference on Multimedia and Expo 2007, pp. 188-191 (VACE NBCHC060160; NSF 0426627).
- Ming Liu, Zhengyou Zhang, Mark Hasegawa-Johnson, and Thomas Huang, Exploring Discriminative Learning for Text-Independent Speaker Recognition, ICME 2007, pp. 56-59 (NSF 0426627).
Computer Cognition
Computer Vision
- Xiaodan Zhuang, Modeling Audio and Visual Cues for Real-world Event Detection, Ph.D. Thesis, University of Illinois, April 2011
- Xi Zhou, Xiaodan Zhuang, Hao Tang, Mark A. Hasegawa-Johnson, and Thomas S. Huang, Novel Gaussianized Vector Representation for Improved Natural Scene Categorization, Pattern Recognition Letters, 31, 8 (Jun. 2010), 702-708 (NSF 0807329).
- Hao Tang, Mark Hasegawa-Johnson, and Thomas S. Huang, Non-Frontal View Facial Expression Recognition, ICME 2010, pp. 1202-7
- Xiaodan Zhuang, Xi Zhou, Mark A. Hasegawa-Johnson and Thomas S. Huang, Efficient Object Localization with Gaussianized Vector Representation, IMCE 2009 pp. 89-96 (NSF 0803219).
- Xiaodan Zhuang, Xi Zhou, Mark Hasegawa-Johnson, and Thomas Huang, Face Age Estimation Using Patch-based Hidden Markov Model Supervectors, ICPR 2008 10.1.1.139.846:1-4 (NSF 0534106; VACE).
- Xi Zhou, Xiaodan Zhuang, Hao Tang, Mark Hasegawa-Johnson, and Thomas Huang, A Novel Gaussianized Vector Representation for Natural Scene Categorization, ICPR 2008 10.1.1.139.846:1-4 (NSF 0534106; VACE).
- Xi Zhou, Xiaodan Zhuang, Shuicheng Yan, Shih-Fu Chang, Mark Hasegawa-Johnson, and Thomas S. Huang, SIFT-Bag Kernel for Video Event Analysis, ACM Multimedia 2008 (NSF 0534106; VACE).
- Shuicheng Yan, Xi Zhou, Ming Liu, Mark Hasegawa-Johnson, and Thomas S. Huang, Regression from Patch Kernel, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2008, pp. 1-8
Machine Learning
- Mark Hasegawa-Johnson, David Harwath, Harsh Vardhan Sharma, and Po-Sen Huang, Transfer Learning for Multi-Person and Multi-Dialect Spoken Language Interface, presentation given at the 2012 Urbana Neuroengineering Conference (NSF 0905633)
- Jui-Ting Huang and Mark Hasegawa-Johnson, Semi-Supervised Training of Gaussian Mixture Models by Conditional Entropy Minimization, Proceedings of Interspeech 2010 pp. 1353-1356 (NSF 0703624)
- Mark Hasegawa-Johnson, Camille Goudeseune, Kai-Hsiang Lin, David Cohen, Xi Zhou, Xiaodan Zhuang, Kyungtae Kim, Hank Kaczmarski and Thomas Huang, Visual Analytics for Audio, NIPS Workshop on Visual Analytics, 2009 (NSF 0807329)
- Mark Hasegawa-Johnson, Pattern Recognition in Acoustic Signal Processing, Machine Learning Summer School, University of Chicago, 2009 (NSF 0807329)
- Jui-Ting Huang and Mark Hasegawa-Johnson, On semi-supervised learning of Gaussian mixture models for phonetic classification, NAACL HLT Workshop on Semi-Supervised Learning, 2009 (NSF 0534106 and NSF 0703624).
- Mark Hasegawa-Johnson, Tutorial: Pattern Recognition in Signal Processing, JASA 125:2698, 2009 (NSF 0803219 and 0807329).
- Yang Li, Incremental Training and Growth of Artificial Neural Networks, M.S. Thesis, 2008 (NSF 0534106).
- Mohammad Kamal Omar and Mark Hasegawa-Johnson, Model Enforcement: A Unified Feature Transformation Framework for Classification and Recognition, IEEE Transactions on Signal Processing, vol. 52, no. 10, pp. 2701-2710, 2004 (NSF 0132900).
Natural Language Processing
- Ali Sakr and Mark Hasegawa-Johnson, Topic Modeling of Phonetic Latin-Spelled Arabic for the Relative Analysis of Genre-Dependent and Dialect-Dependent Variation, CITALA 2012 (QNRF NPRP 410-1-069).
- Rania Al-Sabbagh, Roxana Girju, Mark Hasegawa-Johnson, Elabbas Benmamoun, Rehab Duwairi, and Eiman Mustafawi, Using Web Mining Techniques to Build a Multi-Dialect Lexicon of Arabic, Linguistics in the Gulf Conference, 2011.
Human Speech and Language
Speech Production
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Mark Hasegawa-Johnson, Carol Espy-Wilson, Elliot Saltzman, and Louis Goldstein, "A procedure for estimating gestural scores from speech acoustics," J. Acoustical Society of America, 132(6):3980-3989
- Mark Hasegawa-Johnson, Shamala Pizza, Abeer Alwan, Jul Cha, and Katherine Haker, Vowel Category Dependence of the Relationship Between Palate Height, Tongue Height, and Oral Area, Journal of Speech, Language, and Hearing Research, vol. 46, no. 3, pp. 738-753, 2003 (NIH DC0032301; Data).
- Yanli Zheng, Mark Hasegawa-Johnson, and Shamala Pizza, PARAFAC Analysis of the Three-Dimensional Tongue Shape, Journal of the Acoustical Society of America, vol. 113, no. 1, pp. 478-486, January 2003 (NIH DC0032301).
- Mark Hasegawa-Johnson, Line Spectral Frequencies are the Poles and Zeros of a Discrete Matched-Impedance Vocal Tract Model, Journal of the Acoustical Society of America, vol. 108, no. 1, pp. 457-460, 2000 (NIH DC0032301).
- Yanli Zheng and Mark Hasegawa-Johnson, Three-Dimensional Tongue Shape Factor Analysis, American Speech-Language Hearing Association National Convention, Washington, DC, 2000. Published in the magazine ASHA Leader, 5(16):144 (NIH 0032301).
- Mark Hasegawa-Johnson, Preliminary Work and Proposed Continuation: Imaging of Speech Anatomy and Behavior. Talk given at the Universities of Illinois Inter-campus Biomedical Imaging Forum, 2001 (NIH 0032301).
- Mark Hasegawa-Johnson, Jul Cha and Katherine Haker, CTMRedit: A Matlab-based tool for segmenting and interpolating MRI and CT images in three orthogonal planes, 21st Annual International Conference of the IEEE/EMBS Society, pp. 1170. 1999 (NIH 0032301).
- Mark Hasegawa-Johnson, "Combining magnetic resonance image planes in the Fourier domain for improved spatial resolution." International Conference On Signal Processing Applications and Technology, Orlando, FL, pp. 81.1-5, 1999 (NIH 0032301)
- Mark Hasegawa-Johnson, Electromagnetic Exposure Safety of the Carstens Articulograph AG100, Journal of the Acoustical Society of America, vol. 104, pp. 2529-2532, 1998 (NIH 0032301).
- Mark A. Johnson, "Using beam elements to model the vocal fold length in breathy voicing," JASA 91:2420-2421, 1992.
Neurophysiology
- Soo-Eun Chang, Nicoline Ambrose, Kirk Erickson, and Mark Hasegawa-Johnson, "Brain Anatomy Differences in Childhood Stuttering." Neuroimage, in press (NIH DC05210, Illinois Research Board).
- Soo-Eun Chang, Kirk I. Erickson, Nicoline G. Ambrose, Mark Hasegawa-Johnson, and C.L. Ludlow, "Deficient white matter development in left hemisphere speech-language regions in children who stutter." Society for Neuroscience, Atlanta, GA, 2006 (NIH DC05210, Illinois Research Board).
- Soo-Eun Chang, Nicoline Ambrose, and Mark Hasegawa-Johnson, "An MRI (DTI) study on children with persistent developmental stuttering." 2004 ASHA Convention, American Speech Language and Hearing Association, November, 2004 (Illinois Research Board).
Auditory Perception
- Kyung-Tae Kim, Kai-Hsiang Lin, Dirk B. Walther, Mark Hasegawa-Johnson and Thomas S. Huang, "Automatic Detection of Auditory Salience with Optimal Filters Derived from Human Annotation," in preparation (Data)
- Jeremy Tidemann, Characterization of the Head-Related Transfer Function using Chirp and Maximum Length Sequence Excitation Signals, M.S. Thesis, 2011.
- Bryce E Lobdell, Jont B Allen, Mark A Hasegawa-Johnson, Intelligibility predictors and neural representation of speech, Speech Communication, in press
- Bryce Lobdell, Models of Human Phone Transcription in Noise Based on Intelligibility Predictors, Ph.D. Thesis, 2009
- Yoonsook Mo, Jennifer Cole and Mark Hasegawa-Johnson, How do ordinary listeners perceive prosodic prominence? Syntagmatic vs. Paradigmatic comparison. Spring Meeting of the ASA, 2009 (NSF 0703624)
- Bryce Lobdell, Mark Hasegawa-Johnson, and Jont B. Allen, Human Speech Perception and Feature Extraction, Interspeech 2008
- Yoonsook Mo, Jennifer Cole and Mark Hasegawa-Johnson, Frequency and repetition effects outweigh phonetic detail in prominence perception, LabPhon 11 pp. 29-30, 2008.
- Mark Hasegawa-Johnson, Bayesian Learning for Models of Human Speech Perception, IEEE Workshop on Statistical Signal Processing, St. Louis, MO, 2003, 393-396 (NSF 0132900).
- Sumiko Takayanagi, Mark Hasegawa-Johnson, Laurie S. Eisner and Amy Schaefer-Martinez, Information theory and variance estimation techniques in the analysis of category rating data and paired comparisons. JASA, 102:3091, 1997
Distinctive Features
- Elabbas Benmamoun and Mark Hasegawa-Johnson, How Different are Arabic Dialects from Each Other and from Classical Arabic, 6th Annual Arabic Linguistics Symposium, Ifrane, Morocco, June 2013.
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Mark Hasegawa-Johnson, Carol Espy-Wilson, Elliot Saltzman and Louis Goldstein, "Automatic gestural annotation of the U. Wisconsin X-ray Microbeam corpus," Workshop on New Tools and Methods for Very Large Scale Phonetics Research, University of Pennsylvania, Jan. 2011
- Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis Goldstein, and Elliot Saltzman, Articulatory Phonological Code for Word Recognition, Interspeech, 34549:1-4, Brighton, September 2009 (NSF 0703624)
- Sarah Borys, An SVM Front End Landmark Speech Recognition System, M.S. Thesis, 2008.
- Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis Goldstein, and Elliot Saltzman, The Entropy of Articulatory Phonological Code: Recognizing Gestures from Tract Variables, Interspeech 2008 (NSF 0703624, NSF 0703782, NIH DC02717).
- Rahul Chitturi and Mark Hasegawa-Johnson, Novel Time-Domain Multi-class SVMs for Landmark Detection, Interspeech, September 2006.
- Mark Hasegawa-Johnson, "Time-Frequency Distribution of Partial Phonetic Information Measured Using Mutual Information," Interspeech IV:133-136, Beijing, 2000 (Data).
- Mark A. Hasegawa-Johnson, "Burst spectral measures and formant frequencies can be used to accurately discriminate stop place of articulation," JASA, 98:2890, 1995 (Comments on the data: http://mickey.ifp.illinois.edu/speechWiki/index.php/TIMIT)
- Mark A. Johnson, "A mapping between trainable generalized properties and the acoustic correlates of distinctive features," MIT Speech Communication Group Working Papers, vol. 9, pp. 94-105, 1994.
- Mark Johnson, Automatic context-sensitive measurement of the acoustic correlates of distinctive features, ICSLP, Yokohama, pp. 1639-1643, 1994
- Mark A. Johnson, "A mapping between trainable generalized properties and the acoustic correlates of distinctive features," JASA, vol. 94, p. 1865, 1993.
Formant Tracking
- İ. Yücel Özbek, Mark Hasegawa-Johnson, and Mübeccel Demirekler, Formant Trajectories for Acoustic-to-Articulatory Inversion, Interspeech 95957:1-4, Brighton, September 2009
- Yanli Zheng, Feature Extraction and Acoustic Modeling for Speech Recognition. Ph.D. Thesis, 2005 (NSF 0132900; Software)
- Yanli Zheng, Mark Hasegawa-Johnson, and Sarah Borys, Stop Consonant Classification by Dynamic Formant Trajectory. Interspeech pp. 396-9, October, 2004 (NSF 0132900).
- Yanli Zheng and Mark Hasegawa-Johnson, Formant Tracking by Mixture State Particle Filter, ICASSP 2004 (NSF 0132900).
- Yanli Zheng and Mark Hasegawa-Johnson, Particle Filtering Approach to Bayesian Formant Tracking, IEEE Workshop on Statistical Signal Processing, September, 2003, 581-584 (NSF 0132900).
Prosody Analysis
- Tim Mahrt, Jennifer Cole, Margaret Fleck and Mark Hasegawa-Johnson, Accounting for Speaker Variation in the Production of Prominence using the Bayesian Information Criterion, Speech Prosody 2012 (NSF 0703624)
- Jui-Ting Huang, Semi-Supervised Learning for Acoustic and Prosodic Modeling in Speech Applications, Ph.D. thesis, University of Illinois, 2012
- Yoonsook Mo, Jennifer Cole, and Mark Hasegawa-Johnson, Prosodic effects on temporal structure of monosyllabic CVC words in American English, Speech Prosody 2010 100208:1-4 (NSF 0703624).
- Jennifer Cole, Yoonsook Mo, and Mark Hasegawa-Johnson, Signal-based and expectation-based factors in the perception of prosodic prominence, Journal of Laboratory Phonology, in press (NSF 0703624)
- Yoonsook Mo, Jennifer Cole and Mark Hasegawa-Johnson, Prosodic effects on vowel production: evidence from formant structure, Interspeech 19096:1-4, Brighton, September 2009 (NSF 0703624)
- Yoonsook Mo, Jennifer Cole and Mark Hasegawa-Johnson, How do ordinary listeners perceive prosodic prominence? Syntagmatic vs. Paradigmatic comparison. Spring Meeting of the ASA, 2009 (NSF 0703624)
- Taejin Yoon, Jennifer Cole and Mark Hasegawa-Johnson, On the edge: Acoustic cues to layered prosodic domains, in Proc. International Congress on Phonetic Sciences (ICPhS) 1264:1017-1020 Saarbrücken, August, 2007 (NSF 0414117).
- Taejin Yoon, Jennifer Cole and Mark Hasegawa-Johnson, On the edge: Acoustic cues to layered prosodic domains. 81st Annual Meeting of the Linguistic Society of America, Anaheim, CA, January 5, 2007 (NSF 0414117).
- Jennifer Cole, Heejin Kim, Hansook Choi, and Mark Hasegawa-Johnson, "Prosodic effects on acoustic cues to stop voicing and place of articulation: Evidence from Radio News speech." J Phonetics 35:180-209, 2007 (NSF 0414117).
- Heejin Kim, Taejin Yoon, Jennifer Cole and Mark Hasegawa-Johnson, Acoustic differentiation of L- and L-L% in Switchboard and Radio News speech. Proceedings of Speech Prosody 2006, Dresden (NSF 0414117).
- Taejin Yoon, "Mapping Syntax and Prosody." Midwest Computational Linguistics Colloquium, Columbus, OH, 2005 (NSF 0414117).
- Jeung-Yoon Choi, Mark Hasegawa-Johnson, and Jennifer Cole, "Finding Intonational Boundaries Using Acoustic Cues Related to the Voice Source." Journal of the Acoustical Society of America 118(4):2579-88, 2005 (Illinois CRI).
- Cole, Jennifer, Mark Hasegawa-Johnson, Chilin Shih, Eun-Kyung Lee, Heejin Kim, H. Lu, Yoonsook Mo, Tae-Jin Yoon. (2005). "Prosodic Parallelism as a Cue to Repetition and Hesitation Disfluency," Proceedings of DISS'05 (An ISCA Tutorial and Research Workshop), Aix-en-Provence, France, pp. 53-58 (NSF 0414117).
- Taejin Yoon, Sandra Chavarria, Jennifer Cole, and Mark Hasegawa-Johnson, Intertranscriber Reliability of Prosodic Labeling on Telephone Conversation Using ToBI. Interspeech, October, 2004 (Illinois CRI).
- Tae-Jin Yoon, Heejin Kim, and Sandra Chavarría. "Local Acoustic Cues Distinguishing Two Levels of Prosodic Phrasing: Speech Corpus Evidence," LabPhon 9, University of Illinois at Urbana-Champaign, 2004 (Illinois CRI).
- Heejin Kim, Jennifer Cole, Hansook Choi, and Mark Hasegawa-Johnson, The Effect of Accent on Acoustic Cues to Stop Voicing and Place of Articulation in Radio News Speech, Speech Prosody 2004, Nara, Japan, March 2004, 29-32 (Illinois CRI).
- Sandra Chavarria, Taejin Yoon, Jennifer Cole, and Mark Hasegawa-Johnson, Acoustic differentiation of ip and IP boundary levels: Comparison of L- and L-L% in the Switchboard corpus, Speech Prosody 2004, Nara, Japan, March 2004, 333-336 (Illinois CRI).
- Jennifer Cole, Hansook Choi, Heejin Kim, and Mark Hasegawa-Johnson, The Effect of Accent on the Acoustic Cues to Stop Voicing in Radio News Speech, Proceedings of the International Congress of Phonetic Sciences, Barcelona, Spain, August, 2003 (Illinois CRI).
- Mark A. Johnson, "Analysis of durational rhythms in two poems by Robert Frost," MIT Speech Communication Group Working Papers, vol. 8, pp. 29-42, 1992.
Cartoons
- Lovable Indestructible Grad Student of Chaos, the cartoons.
- Lovable Indestructible Grad Student of Chaos, the Ph.D. thesis