Publications of the Statistical Speech Technology Group
Speech-to-Text
Acoustic Modeling
-
Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill,
Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, and Chang D. Yoo, ``
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction,''
accepted to Interspeech 2023
-
Jialu Li and Mark
Hasegawa-Johnson, Autosegmental
Neural Nets 2.0: An Extensive Study of Training
Synchronous and Asynchronous Phones and Tones for
Under-Resourced Tonal Languages, IEEE Transactions
on Audio, Speech and Language 30:1918-1926, 5/2022,
doi:10.1109/TASLP.2022.3178238
- Heting Gao, Xiaoxuan Wang, Sunghun Kang, Rusty Mina,
Dias Issa, John Harvill, Leda Sari, Mark Hasegawa-Johnson
and Chang
D. Yoo, Seamless
Equal Accuracy Ratio for Inclusive CTC Speech
Recognition, Speech Communication 136:76-83, 2022
-
Leda Sarı, Mark Hasegawa-Johnson and Chang D. Yoo,
Counterfactually
Fair Automatic Speech Recognition,
IEEE Transactions on Audio, Speech, and
Language 29:3515-3525, 2021, doi:10.1109/TASLP.2021.3126949
-
Heting Gao, Junrui Ni, Yang Zhang, Kaizhi Qian, Shiyu Chang and Mark Hasegawa-Johnson,
Zero-shot
Cross-Lingual Phonetic Recognition with External Language
Embedding, Proc. Interspeech 2021
-
Leda Sarı, Mark Hasegawa-Johnson and Samuel Thomas,
Auxiliary
Networks for Joint Speaker Adaptation and Speaker Change
Detection, IEEE Transactions on Audio, Speech, and
Language, accepted for publication.
-
Mark Hasegawa-Johnson, Leanne Rolston, Camille Goudeseune,
Gina-Anne Levow and Katrin
Kirchhoff, Grapheme-to-Phoneme Transduction for Cross-Language ASR,
Proceedings of Statistical Language and Speech Processing,
Lecture Notes in Computer Science 12379:3-19, 2020.
-
Jialu Li and Mark
Hasegawa-Johnson, Autosegmental
Neural Nets: Should Phones and Tones be Synchronous or
Asynchronous? in Proc. Interspeech 2020, accepted for
publication
-
Piotr Zelasko, Laureano Moro-Velazquez, Mark
Hasegawa-Johnson, Odette Scharenborg and Najim Dehak,
That Sounds
Familiar: an Analysis of Phonetic Representations Transfer
Across Languages, in Proc. Interspeech 2020, accepted
for publication
-
Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson and Michael Picheny,
Speaker
Adaptation of Neural Networks with Learning Speaker Aware
Offsets, Interspeech 2019
-
Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark
Hasegawa-Johnson and Deming
Chen, When CTC Training Meets
Acoustic Landmarks, ICASSP 2019, pp. 1-5, paper 2150
-
Van Hai Do, Nancy F. Chen, Boon Pang Lim, and Mark
Hasegawa-Johnson, Multitask
Learning for Phone Recognition of Underresourced Languages Using
Mismatched Transcription, IEEE/ACM Transactions on Audio, Speech
and Language Processing (TASLP) 26(3):501-514, March 2018,
doi:10.1109/TASLP.2017.2782360
- Jialu Li and Mark Hasegawa-Johnson,
A Comparable Phone Set for the
TIMIT Dataset Discovered in Clustering of Listen, Attend and
Spell, in Proceedings of the Workshop on
Interpretability
and Robustness in Audio, Speech, and Language, NeurIPS 2018
- Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson
and Najim
Dehak,
Visualizing
Phoneme Category Adaptation in Deep Neural Networks,
in Proc. Interspeech 2018, pp. 1707:1-5,
doi:10.21437/Interspeech.2018-1707
-
Leda Sari and Mark
Hasegawa-Johnson,
Speaker Adaptation with an Auxiliary Network,
MLSLP (ISCA Workshop on Machine
Learning for Speech and Language Processing), 2018
-
Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson and
Deming Chen, Improved ASR for
under-resourced languages through Multi-task Learning with
Acoustic Landmarks, in Proc. Interspeech 2018, pp. 1124:1-5,
doi:10.21437/Interspeech.2018-1124
-
Amit Das, Speech Recognition with Probabilistic
Transcriptions and End-to-End Systems Using Deep Learning, Ph.D. Thesis,
University of Illinois, 2018
-
Amit Das and Mark Hasegawa-Johnson, Improving
DNNs Trained With Non-Native Transcriptions Using Knowledge
Distillation and Target Interpolation, in Proc. Interspeech
2018, pp. 1450:1-5, doi:10.21437/Interspeech.2018-1450
- Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen,
Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux,
Lukas Burget, François Yvon, and Sanjeev Khudanpur, ``Bayesian
Models for Unit Discovery on a Very Low Resource Language,''
Proc. ICASSP 2018
-
Wenda Chen, Mark Hasegawa-Johnson, and Nancy Chen,
Recognizing
Zero-resourced Languages based on Mismatched Machine
Transcriptions, Proc. ICASSP 2018, pp. 5979-5983, doi:
10.1109/ICASSP.2018.8462481
-
Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas,
Bhuvana Ramabhadran, and Mark
Hasegawa-Johnson, Joint
Modeling of Accents and Acoustics for Multi-Accent Speech
Recognition, in Proc. ICASSP 2018, pp. 5989-5993, doi:
10.1109/ICASSP.2018.8462557
-
Odette Scharenborg, Francesco Ciannella, Shruti Palaskar, Alan
Black, Florian Metze, Lucas Ondel, and Mark Hasegawa-Johnson,
Building an ASR System for a
Low-Resource Language Through the Adaptation of a High-Resource
Language ASR System: Preliminary Results, in
Proc. Internat. Conference on Natural Language, Signal and
Speech Processing (ICNLSSP) 2017, Casablanca, Morocco
- Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, and Boon Pang Lim,
Mismatched Crowdsourcing from
Multiple Annotator Languages For Recognizing Zero-resourced
Languages: A Nullspace Clustering Approach, Proc. Interspeech
2017, paper 1567
-
Pavlos Papadopoulos, Ruchir Travadi, Colin Vaz, Nikolaos
Malandrakis, Ulf Hermjakob, Nima Pourdamghani, Michael
Pust, Boliang Zhang, Xiaoman Pan, Di Lu, Ying Lin,
Ondrej Glembek, Murali Karthick B, Martin Karafiat,
Lukas Burget, Mark Hasegawa-Johnson, Heng Ji, Jonathan
May, Kevin Knight, and Shrikanth Narayanan,
Team
ELISA System for DARPA LORELEI Speech Evaluation 2016,
Proc. Interspeech 2017
- Amit Das, Mark Hasegawa-Johnson and Karel Vesely,
Deep Autoencoder Based
Multi-task Learning Using Probabilistic Transcription,
Proc. Interspeech 2017 582:1-5,
doi:10.21437/Interspeech.2017-582
- Yang Zhang, Generative Models
for Speech and Time Domain Signals, Ph.D. Thesis,
2017
- Van Hai Do, Nancy F. Chen, Boon Pang Lim, and Mark
Hasegawa-Johnson, Multi-task
Learning using Mismatched Transcription for Under-resourced Speech
Recognition, Proc. Interspeech, 2017
- Preethi Jyothi and Mark Hasegawa-Johnson,
Low-Resource
Grapheme-to-Phoneme Conversion using Recurrent Neural
Networks,
Proc. ICASSP 2017, Paper ID: 2093
-
Mark Hasegawa-Johnson, Preethi Jyothi, Daniel McCloy, Majid
Mirbagheri, Giovanni di Liberto, Amit Das, Bradley Ekin, Chunxi Liu,
Vimal Manohar, Hao Tang, Edmund C. Lalor, Nancy Chen, Paul Hager,
Tyler Kekona, Rose Sloan, and Adrian KC
Lee, ASR for
Under-Resourced Languages from Probabilistic Transcription,
IEEE/ACM Trans. Audio, Speech and Language 25(1):46-59, 2017
(Print ISSN 2329-9290, Online ISSN 2329-9304,
doi:10.1109/TASLP.2016.2621659)
(Data)
-
Van Hai Do, Nancy F. Chen, Boon Pang Lim and Mark
Hasegawa-Johnson, ``Speech recognition of under-resourced
languages using mismatched transcriptions,'' International
Conference on Asian Language Processing IALP 2016,
Tainan, Taiwan, 11/21-23, 2016
-
Van Hai Do, Nancy F. Chen, Boon Pang Lim and Mark
Hasegawa-Johnson, ``A many-to-one phone mapping approach for
cross-lingual speech recognition,'' 12th IEEE-RIVF International
Conference on Computing and Communication Technologies, Hanoi,
Vietnam, 11/7-9, 2016
-
Amit Das and Mark
Hasegawa-Johnson, An
investigation on training deep neural networks using
probabilistic transcription. Interspeech 2016
(Software)
- Amit Das, Preethi Jyothi and Mark
Hasegawa-Johnson, Automatic speech
recognition using probabilistic transcriptions in Swahili,
Amharic and Dinka. Interspeech 2016
(Software)
(Data)
- Chunxi Liu, Preethi Jyothi, Hao Tang, Vimal Manohar, Rose Sloan,
Tyler Kekona, Mark Hasegawa-Johnson, Sanjeev
Khudanpur, Adapting ASR for
Under-Resourced Languages Using Mismatched Transcriptions,
Proc. ICASSP 2016
- Amit Das and Mark
Hasegawa-Johnson, Cross-lingual
transfer learning during supervised training in low resource
scenarios, Interspeech 2015, pp. 3531-3535
- Xiayu Chen, Yang Zhang and Mark
Hasegawa-Johnson, An Iterative
Approach to Decision-Tree Training, Interspeech 2014
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi,
Automatic Long Audio
Alignment and Confidence Scoring for Conversational Arabic Speech,
The 9th edition of the Language Resources and Evaluation
Conference (LREC 2014), ISBN 9782951740884, Reykjavik, Iceland,
(QNRF NPRP 09-410-1-069)
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman
Mustafawi, Development
of a TV Broadcasts Speech Recognition System for Qatari Arabic,
The 9th edition of the Language Resources and Evaluation Conference
(LREC 2014), pp. 3057-3061, ISBN 9782951740884, Reykjavik, Iceland,
(QNRF NPRP 09-410-1-069)
-
Raymond
Yeh, Divergence Guided Two Beams
Viterbi Algorithm on Factorial HMMs, B.S. Thesis, 2014
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman
Mustafawi, A Transfer
Learning
Approach for Under-Resourced Arabic Dialects Speech Recognition,
Workshop on Less Resourced Languages, new technologies, new
challenges and opportunities (LTC 2013), pp. 60-64 (QNRF NPRP
09-410-1-069)
-
Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi,
``Automatic Long Audio Alignment for Conversational Arabic Speech,''
Qatar Foundation Annual Research Conference 2013,
DOI: 10.5339/qfarf.2013.ICTP-03
-
Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi,
``Development of a Spontaneous Large Vocabulary Speech Recognition
System for Qatari Arabic,'' Qatar Foundation Annual Research
Conference 2013, DOI: 10.5339/qfarf.2013.ICTP-053
-
Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman
Mustafawi, A Framework for
Conversational Arabic Speech Long Audio Alignment, Proc. 6th Language
and Technology Conference (LTC 2013), pp. 290-293 (QNRF NPRP 09-410-1-069)
- Sujeeth Bharadwaj, Mark Hasegawa-Johnson, Jitendra Ajmera, Om
Deshmukh, and Ashish
Verma, Sparse Hidden Markov
Models for Purer Clusters, Proc. ICASSP 2013
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman
Mustafawi, A Baseline Speech
Recognition System for Levantine Colloquial Arabic, Proceedings of
ESOLEC 2012 (QNRF NPRP 410-1-069)
-
Po-Sen Huang and Mark
Hasegawa-Johnson, Cross-Dialectal
Data Transferring for Gaussian Mixture Model Training in Arabic
Speech Recognition,
International Conference on Arabic
Language Processing CITALA 2012, pp. 119-122, ISBN
978-9954-9135-0-5 (QNRF NPRP 410-1-069).
-
Mohamed Elmahdy, Mark Hasegawa-Johnson, Eiman Mustafawi, Rehab
Duwairi, and Wolfgang
Minker, Challenges and
Techniques for Dialectal Arabic Speech Recognition and Machine
Translation, Qatar Foundation Annual Research Forum, p. 244 (QNRF
NPRP 410-1-069)
(Abstract)
-
Jui-Ting Huang, Semi-Supervised
Learning for Acoustic and Prosodic Modeling in Speech
Applications,
Ph.D. thesis, University of Illinois, 2012
-
Mark Hasegawa-Johnson, Jui-Ting Huang, Roxana Girju, Rehab Mustafa
Mohamma Duwairi, Eiman Mohd Tayyeb H B Mustafawi, and Elabbas
Benmamoun,
Learning to Recognize Speech from a Small Number of Labeled Examples,
Qatar Foundation Annual Research Forum, p. 269 (QNRF NPRP 410-1-069)
-
Mark Hasegawa-Johnson, Jui-Ting Huang, Sarah King and Xi Zhou,
Normalized recognition of speech and audio events,
Journal of the Acoustical Society of
America 130:2524 (NSF 0807329)
-
Mark Hasegawa-Johnson, Jui-Ting Huang, and Xiaodan
Zhuang, Semi-supervised
learning for speech and audio processing, Journal of the
Acoustical Society of America 130:2408 (NSF 0703624)
-
Boon Pang Lim, Computational
Differences between Whispered and Non-whispered Speech,
Ph.D. Thesis, University of Illinois, 2011
-
Jui-Ting Huang, Mark Hasegawa-Johnson, and Jennifer Cole,
How Unlabeled Data Change the Acoustic
Models For Phonetic Classification, Workshop on New Tools and
Methods for Very Large Scale Phonetics Research, University of
Pennsylvania, Jan. 2011
-
Jui-Ting Huang, Po-Sen Huang, Yoonsook Mo, Mark Hasegawa-Johnson,
Jennifer
Cole, Prosody-Dependent
Acoustic Modeling Using Variable-Parameter Hidden Markov Models,
Speech Prosody 2010 100623:1-4 (NSF 0703624).
-
Hao Tang, Mark Hasegawa-Johnson, and Thomas S. Huang,
Toward
Robust Learning of the Gaussian Mixture State Emission Densities for
Hidden Markov Models, ICASSP 2010 (NSF 0803219)
-
Jui-Ting Huang, Xi Zhou, Mark Hasegawa-Johnson and Thomas
Huang, Kernel
Metric Learning for Phonetic Classification, ASRU 2009 pp. 141-5
(NSF 0703624 and 0534133)
-
Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis
Goldstein, and Elliot
Saltzman, Articulatory
Phonological Code for Word Recognition, Interspeech, 34549:1-4,
Brighton, September 2009 (NSF 0703624)
-
Bowon Lee and Mark Hasegawa-Johnson, A
Phonemic Restoration Approach for Automatic Speech Recognition with
Highly Nonstationary Background Noise, DSP in Cars workshop,
Dallas, July 2009
-
Jui-Ting Huang and Mark
Hasegawa-Johnson, On
semi-supervised learning of Gaussian mixture models for phonetic
classification, NAACL HLT Workshop on Semi-Supervised Learning,
2009, pp. 75-83 (NSF 0534106 and NSF 0703624).
-
Jui-Ting Huang and Mark
Hasegawa-Johnson,
Maximum Mutual Information
Estimation with Unlabeled Data for Phonetic Classification.
Proc. Interspeech 2008 (NSF 0534133)
(software).
-
Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis
Goldstein, and Elliot
Saltzman, The
Entropy of Articulatory Phonological Code: Recognizing Gestures from
Tract Variables, Interspeech 2008 (NSF 0703624, NSF 0703782, NIH
DC02717).
-
Arthur Kantor and Mark Hasegawa-Johnson,
Stream Weight Tuning in Dynamic Bayesian
Networks, Proc. ICASSP pp. 4525-8, 2008 (NSF 0703624).
-
Bowon Lee, Robust Speech
Recognition in a Car Using a Microphone Array. Ph.D. thesis, 2006
(Motorola
RPS19; Software;
Data)
-
Rahul Chitturi and Mark
Hasegawa-Johnson,
Novel Entropy-Based Moving Average Refiners for HMM Landmarks.
Interspeech, September 2006 (NSF 0132900).
-
Mark Hasegawa-Johnson, James Baker, Sarah Borys, Ken Chen, Emily
Coogan, Steven Greenberg, Amit Juneja, Katrin Kirchhoff, Karen
Livescu, Srividya Mohan, Jennifer Muller, Kemal Sönmez, and Tianyu
Wang, "Landmark-Based
Speech Recognition: Report of the 2004 Johns Hopkins Summer
Workshop." ICASSP, March 2005, pp. 1213-1216 (NSF 0121285).
-
Yeojin Kim and Mark
Hasegawa-Johnson, Phonetic Segment
Rescoring Using SVMs. Midwest Computational Linguistics
Colloquium, Columbus, OH, 2005 (NSF 0132900).
-
Mark Hasegawa-Johnson, James Baker, Steven Greenberg, Katrin
Kirchhoff, Jennifer Muller, Kemal Sonmez, Sarah Borys, Ken Chen,
Amit Juneja, Karen Livescu, Srividya Mohan,
Emily Coogan, and Tianyu Wang,
Landmark-Based Speech Recognition: Report of the 2004 Johns
Hopkins Summer Workshop. technical report of the Johns
Hopkins Center for Language and Speech Processing, 2005 (NSF
0121285).
-
Mark
Hasegawa-Johnson,
Landmark-Based Speech Recognition: The Marriage of
High-Dimensional Machine Learning Techniques with Modern
Linguistic Representations, talk given at Tsinghua
University, October 2004 (NSF 0132900).
-
Ameya Deoras and Mark
Hasegawa-Johnson,
A Factorial HMM Approach to Robust Isolated Digit
Recognition in Background Music.
Interspeech, October, 2004 (NSF 0132900).
-
Ameya Deoras and Mark
Hasegawa-Johnson,
A Factorial HMM Approach to Simultaneous Recognition of Isolated Digits Spoken
by Multiple Talkers on One Audio Channel,
ICASSP 2004 (NSF 0132900).
-
Yanli Zheng and Mark
Hasegawa-Johnson, Acoustic
segmentation using switching state Kalman Filter, ICASSP
2003, April 2003, I:752-755 (NSF 0132900).
-
Ameya Deoras, A Factorial HMM
Approach to Robust Isolated Digit Recognition in Non-Stationary
Noise.
B.S. Thesis, 2003.
-
Mohammed K. Omar, Mark Hasegawa-Johnson and Stephen
E. Levinson, Gaussian Mixture
Models of Phonetic Boundaries for Speech Recognition,
ASRU 2001 (NSF 0132900).
-
Mark
Hasegawa-Johnson,
Multivariate-State Hidden Markov Models for Simultaneous
Transcription of Phones and Formants,
ICASSP, Istanbul, pp. 1323-26, 2000
Landmarks and Features
-
Mahir Morshed and Mark Hasegawa-Johnson, ``Cross-lingual
articulatory feature information transfer for speech recognition
using recurrent progressive neural networks,'' Interspeech 2022
-
Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson,
and Deming
Chen,
Acoustic landmarks contain more information about the
phone string than other frames for automatic speech
recognition with deep neural network acoustic model,
Journal of the Acoustical Society of America
143(6):3207-3219, doi:10.1121/1.5039837
-
Xiang Kong, Xuesong Yang, Jeung-Yoon Choi, Mark
Hasegawa-Johnson and Stefanie Shattuck-Hufnagel,
``Landmark-based consonant voicing detection on
multilingual corpora,'' Acoustics 17, Boston, June 25,
2017
- Kaizhi Qian, Yang Zhang and Mark
Hasegawa-Johnson, Application of Local
Binary Patterns for SVM based Stop Consonant Detection, Speech
Prosody 2016
- Sarah King and Mark
Hasegawa-Johnson, Accurate Speech
Segmentation by Mimicking Human Auditory Processing, Proc. ICASSP
2013 (NSF 0807329)
- Po-Sen Huang, Li Deng, Mark Hasegawa-Johnson and Xiaodong He,
Random Features for Kernel Deep
Convex Network, Proc. ICASSP 2013, pp. 8096-8900
- Sarah King and Mark
Hasegawa-Johnson, Detection of
Acoustic-Phonetic Landmarks in Mismatched Conditions Using a
Biomimetic Model of Human Auditory Processing, CoLing 2012,
pp. 589-598 (QNRF NPRP 09-410-1-069 and NSF CCF 0807329)
- Mark Hasegawa-Johnson, Elabbas Benmamoun, Eiman Mustafawi, Mohamed
Elmahdy and Rehab
Duwairi, On The
Definition of the Word `Segmental', Speech Prosody 2012,
pp. 159-162 (ISBN 978-7-5608-486-3, QNRF
NPRP 410-1-069)
- Sarah
Borys, An
SVM Front End Landmark Speech Recognition System, M.S. Thesis,
2008.
- Bryce Lobdell, Mark Hasegawa-Johnson, and Jont
B. Allen, Human Speech Perception
and Feature Extraction, Interspeech 2008
- Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, and
Zhengyou
Zhang, Frequency
Domain Correspondence for Speaker Normalization, in
Proc. Interspeech pp. 274-7, Antwerp, August, 2007.
-
Xi Zhou, Yun Fu, Ming Liu, Mark Hasegawa-Johnson, and
Thomas Huang, Robust
Analysis and Weighting on MFCC Components for Speech
Recognition and Speaker Identification,
ICME 2007 (VACE NBCHC060160; NSF 0426627).
- Sarah Borys and Mark
Hasegawa-Johnson,
Distinctive Feature Based SVM Discriminant Features for
Improvements to Phone Recognition on Telephone Band
Speech.
ISCA Interspeech, October 2005 (NSF 0132900;
Software to intertranslate HTK
and libsvm).
-
Yanli Zheng,
Feature Extraction and Acoustic Modeling for Speech Recognition.
Ph.D. Thesis, 2005 (NSF 0132900;
Software)
-
Mark Hasegawa-Johnson, Sarah Borys and Ken
Chen,
Experiments in Landmark-Based Speech Recognition.
Sound to Sense: Workshop in Honor of Kenneth N. Stevens,
June, 2004 (NSF 0132900).
-
Mohammed Kamal Omar and Mark
Hasegawa-Johnson,
Model Enforcement: A Unified Feature Transformation Framework for
Classification and Recognition, IEEE Transactions on Signal
Processing, vol. 52, no. 10, pp. 2701-2710, 2004 (NSF 0132900).
-
Stefan
Geirhofer,
Feature Reduction with Linear Discriminant Analysis and its
Performance on Phoneme Recognition.
Undergraduate research project.
-
Mohamed Kamal Mahmoud Omar,
Acoustic Feature Design for Speech Recognition: A Statistical
Information-Theoretic Approach.
Ph.D. Thesis, 2003.
-
Mohammed Kamal Omar and Mark
Hasegawa-Johnson,
Approximately Independent Factors of Speech Using Nonlinear Symplectic
Transformation, IEEE Transactions on Speech and Audio Processing,
vol. 11, no. 6, pp. 660-671, 2003 (NSF 0132900).
-
Mohammed Kamal Omar and Mark
Hasegawa-Johnson,
Non-Linear Independent Component Analysis for Speech
Recognition,
International Conference on Computer, Communication and
Control Technologies (CCCT '03), 2003 (NSF 0132900).
-
Mohammed Kamal Omar and Mark
Hasegawa-Johnson,
Strong-Sense Class-Dependent Features for Statistical
Recognition,
IEEE Workshop on Statistical Signal Processing,
St. Louis, MO, 2003, 473-476 (NSF 0132900).
-
Mohammed Kamal Omar and Mark
Hasegawa-Johnson,
Maximum Conditional Mutual Information Projection For Speech
Recognition,
Interspeech, September, 2003, 505-508 (NSF 0132900).
-
Mohammed Kamal Omar and
Mark Hasegawa-Johnson,
Non-Linear Maximum Likelihood Feature Transformation For Speech
Recognition,
Interspeech, September, 2003, 2497-2500 (NSF
0132900).
- Mark
Hasegawa-Johnson,
Finding the Best Acoustic Measurements for Landmark-Based
Speech Recognition,
Accumu Magazine, Kyoto Computer Gakuin, Kyoto, Japan,
2002 (NSF 0132900).
-
Mohammed Kamal Omar, Ken Chen, Mark Hasegawa-Johnson and
Yigal Brandman, An
Evaluation of using Mutual Information for Selection of
Acoustic-Features Representation of Phonemes for Speech
Recognition, Interspeech, Denver, CO, September 2002,
pp. 2129-2132 (Phonetact, Inc.).
-
Zhinian Jing and Mark
Hasegawa-Johnson,
Auditory-Modeling Inspired Methods of Feature Extraction
for Robust Automatic Speech Recognition, ICASSP
Student Session, May 2002, IV:4176 (NSF 0132900).
-
Mohammed Kamal Omar and Mark
Hasegawa-Johnson, Maximum
Mutual Information Based Acoustic Features Representation
of Phonological Features for Speech Recognition,
ICASSP, May 2002, I:81-84.
-
Zhinian Jing, Voice Index
and Frame Index for Recognition of Digits in Speech
Background. M.S. Thesis, 2002.
-
Wira Gunawan and Mark
Hasegawa-Johnson, PLP
Coefficients can be Quantized at 400 bps, ICASSP, Salt
Lake City, UT, pp. 2.2.1-4, 2001.
-
Wira Gunawan,
Distributed
Speech Recognition, M.S. Thesis, 2000
Pronunciation Modeling
-
Liming Wang, Siyuan Feng, Mark A. Hasegawa-Johnson and
Chang D. Yoo, Self-supervised Semantic-driven Phoneme
Discovery for Zero-resource Speech Recognition, ACL 2022
-
Piotr Zelasko, Siyuan Feng, Laureano Moro-Velazquez,
Ali Abavisani, Saurabhchand Bhati, Odette
Scharenborg, Mark Hasegawa-Johnson, and Najim Dehak,
``Discovering Phonetic Inventories with Crosslingual
Automatic Speech Recognition,''
Computer Speech and Language, 2021
-
Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali
Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg,
Najim Dehak,
How
Phonotactics Affect Multilingual and Zero-shot ASR
Performance, Proc. ICASSP 2021
pp. 7238-7242
-
Mark Hasegawa-Johnson, Leanne Rolston, Camille Goudeseune,
Gina-Anne Levow, and Katrin
Kirchhoff, Grapheme-to-Phoneme
Transduction for Cross-Language ASR,
Proc. International Conference on Statistical Language and Speech Processing,
Lecture Notes in Computer Science 12379:3-19, 2020
-
Preethi Jyothi and Mark
Hasegawa-Johnson, Improving
Hindi Broadcast ASR by Adapting the Language Model and Pronunciation
Model Using A Priori Syntactic and Morphophonemic Knowledge,
Interspeech 2015, pp. 3164-3168
- Mahmoud Abunasser, Abbas Benmamoun, and Mark
Hasegawa-Johnson, Pronunciation
Variation Metric for Four Dialects of Arabic, presentation at AIDA
10 (Association Internationale de Dialectologie Arabe), Qatar
University, 2013
-
Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi,
Hybrid Phonemic and Graphemic
Modeling for Arabic Speech Recognition,
International Journal of
Computational Linguistics, volume 3, issue 1, pp. 88-96, ISSN
2180-1266 (QNRF NPRP 09-410-1-069), 2012
- Mohamed Elmahdy, Mark Hasegawa-Johnson and Eiman Mustafawi,
Hybrid
Pronunciation Modeling for Arabic Large Vocabulary Speech
Recognition, Qatar Foundation Annual Research Forum, 2012 (QNRF
09-410-1-069)
- Arthur Kantor and Mark
Hasegawa-Johnson, HMM-based
Pronunciation Dictionary Generation, Workshop on New Tools and
Methods for Very Large Scale Phonetics Research, University of
Pennsylvania, Jan. 2011 (NSF 0703624,
0913188; Software).
- Arthur
Kantor,
Pronunciation modeling for large vocabulary speech
recognition,
Ph.D. Thesis 2010, University of Illinois (NSF
0703624, 0913188;
Software
).
- Chi Hu, FSM-Based
Pronunciation Modeling using Articulatory Phonological Code,
M.S. Thesis 2010, University of Illinois (NSF 0703624 and NSF
0623805).
- Chi Hu, Xiaodan Zhuang, and Mark
Hasegawa-Johnson, FSM-Based
Pronunciation Modeling using Articulatory Phonological Code,
Proceedings of Interspeech 2010 pp. 2274-2277, (NSF 0703624).
- Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman,
Louis Goldstein, Carol Espy-Wilson, and Mark Hasegawa-Johnson,
A procedure for estimating
gestural scores from natural speech, Proceedings of Interspeech
2010 (NSF 0703624)
- Karen Livescu, Özgür Çetin, Mark Hasegawa-Johnson, Simon King,
Chris Bartels, Nash Borges, Arthur Kantor, Partha Lal, Lisa Yung, Ari
Bezman, Stephen Dawson-Hagerty, Bronwyn Woods, Joe Frankel, Mathew
Magimai-Doss, and Kate
Saenko,
Articulatory-Feature-Based Methods for Acoustic and Audio-Visual
Speech Recognition: 2006 JHU Summer Workshop Final Report. Johns
Hopkins Center for Language and Speech Processing, 2007 (NSF 0121285).
-
Karen Livescu, Ozgur Cetin, Mark Hasegawa-Johnson, Simon
King, Chris Bartels, Nash Borges, Arthur Kantor, Partha Lal, Lisa
Yung, Ari Bezman, Stephen Dawson-Haggerty, Bronwyn Woods, Joe Frankel,
Matthew Magimai-Doss, and Kate
Saenko,
Articulatory Feature-Based Methods for Acoustic and Audio-Visual
Speech Recognition: Summary from the 2006 JHU Summer Workshop.
ICASSP, May 2007 (NSF 0121285).
-
Ken Chen and Mark
Hasegawa-Johnson,
Modeling pronunciation variation using artificial neural networks for
English spontaneous speech. Interspeech, October 2004, pp. 400-403
(NSF 0414117).
Multimodal: Articulatory, Audiovisual and EEG
-
Justin van der Hout, Mark Hasegawa-Johnson and Odette Scharenborg,
Evaluating
Automatically Generated Phoneme Captions for Images,
Proc. Interspeech 2020, accepted for publication.
-
Liming Wang and Mark
Hasegawa-Johnson, A
DNN-HMM-DNN Hybrid Model for Discovering Word-like Units
from Spoken Captions and Image Regions,
Proc. Interspeech 2020, accepted for publication.
-
Liming Wang and Mark
Hasegawa-Johnson, Multimodal
word discovery and retrieval with spoken descriptions and
visual concepts, IEEE Transactions on Audio, Speech
and Language 28:1560-1573, 2020,
doi:10.1109/TASLP.2020.2996082
-
Odette Scharenborg and Mark
Hasegawa-Johnson, Position
Paper: Brain Signal-based Dialogue
Systems, International
Workshop on Spoken Dialog Systems, 2019
-
Leda Sari, Mark Hasegawa-Johnson, S. Kumaran, Georg Stemmer, and
N. Nair Krishnakumar, Speaker
Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of
AVICAR, in Proc. Interspeech 2018, pp. 2359:1-5,
doi:10.21437/Interspeech.2018-2359
-
Sujeeth Bharadwaj, Raman Arora, Karen Livescu and Mark
Hasegawa-Johnson,
Multi-View Acoustic Feature
Learning Using Articulatory Measurements,
IWSML (International Workshop on Statistical Machine Learning for
Signal Processing), 2012 (NSF 0905633)
- İ. Yücel Ozbek, Mark Hasegawa-Johnson and Mübeccel Demirekler,
On Improving Dynamic State Space Approaches to
Articulatory Inversion with MAP based Parameter
Estimation, IEEE Transactions on Audio,
Speech, and Language, in press
- İ. Yücel Ozbek, Mark Hasegawa-Johnson and Mübeccel Demirekler,
Estimation of Articulatory Trajectories
Based on Gaussian Mixture Model (GMM) with Audio-Visual Information
Fusion and Dynamic Kalman Smoothing, IEEE Transactions on Audio,
Speech, and Language 19(5):1180-1195, 2011
- Thomas S. Huang, Mark A. Hasegawa-Johnson, Stephen M. Chu, Zhihong
Zeng, and Hao Tang, Sensitive Talking
Heads, IEEE Signal Processing Magazine 26(4):67-72, July 2009
- Mark Hasegawa-Johnson, Multi-Stream
Approach to Audiovisual Automatic Speech Recognition, IEEE 9th
Workshop on Multimedia Signal Processing (MMSP) pp. 328-31, 2007
- Mark Hasegawa-Johnson, Karen Livescu, Partha Lal and Kate Saenko,
Audiovisual Speech Recognition with Articulator Positions as Hidden
Variables, in Proc. International Congress on Phonetic Sciences
(ICPhS) 1719:297-302, Saarbrücken, August, 2007 (NSF 0121285).
- Mark
Hasegawa-Johnson,
Audio-Visual Speech Recognition: Audio Noise, Video Noise, and
Pronunciation Variability, talk given to the Signal Processing
Society, IEEE Japan, June 2007 (NSF 0534106; NIH DC008090A).
- Yun Fu, Xi Zhou, Ming Liu, Mark Hasegawa-Johnson, and Thomas
S. Huang,
Lipreading by Locality Discriminant Graph, IEEE International
Conference on Image Processing (ICIP) III:325-8, 2007 (VACE NBCHC060160; NSF
0426627).
-
Karen Livescu, Ozgur Cetin, Mark Hasegawa-Johnson, Simon
King, Chris Bartels, Nash Borges, Arthur Kantor, Partha
Lal, Lisa Yung, Ari Bezman, Stephen Dawson-Haggerty,
Bronwyn Woods, Joe Frankel, Matthew Magimai-Doss, and Kate
Saenko,
Articulatory Feature-Based Methods for Acoustic and
Audio-Visual Speech Recognition: Summary from the 2006
JHU Summer Workshop.
ICASSP, May 2007, pp. 621-4 (NSF 0121285).
-
Karen Livescu, Özgür Çetin, Mark Hasegawa-Johnson, Simon
King, Chris Bartels, Nash Borges, Arthur Kantor, Partha
Lal, Lisa Yung, Ari Bezman, Stephen Dawson-Hagerty,
Bronwyn Woods, Joe Frankel, Mathew Magimai-Doss, and Kate
Saenko,
Articulatory-Feature-Based Methods for Acoustic and
Audio-Visual Speech Recognition: 2006 JHU Summer Workshop
Final Report.
Johns Hopkins Center for Language and Speech Processing,
2007 (NSF 0121285).
- Mark
Hasegawa-Johnson,
Object Tracking and Asynchrony in Audio-Visual Speech
Recognition. talk given to the Artificial Intelligence, Vision,
and Robotics seminar series, August, 2006 (NSF 0534106; NIH
DC008090A).
- Mark
Hasegawa-Johnson,
Dealing with Acoustic Noise. Part IV: Video, tutorial
presentation given at WS06, Center for Language and Speech Processing,
July 2006 (NSF 0121285).
-
Camille Goudeseune and Bowon
Lee,
AVICAR: Audio-Visual Speech Recognition in a Car Environment.
Promotional Film, 2006 (Motorola RPS19).
-
Bowon Lee, Mark Hasegawa-Johnson, Camille Goudeseune, Suketu
Kamdar, Sarah Borys, Ming Liu, and Thomas
Huang, AVICAR: Audio-Visual
Speech Corpus in a Car Environment. Interspeech, October 2004,
pp. 380-383 (Motorola
RPS19; Data)
-
Stephen E. Levinson, Thomas S. Huang, Mark
A. Hasegawa-Johnson, Ken Chen, Stephen Chu, Ashutosh Garg, Zhinian
Jing, Danfeng Li, J. Lin, Mohammed Kamal Omar and
Z. Wen,
Multimodal Dialog Systems Research at Illinois, ARPA Workshop on
Multimodal Speech Recognition and SPINE, June, 2002 (NSF 0132900).
Speech-to-Meaning
-
Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu
Chang and Mark
Hasegawa-Johnson, WAVPROMPT:
Towards Few-Shot Spoken Language Understanding with Frozen
Language Models, Interspeech 2022
-
Leda Sari, Samuel Thomas and Mark Hasegawa-Johnson,
Training
Spoken Language Understanding Systems with Non-Parallel
Speech and Text, in Proc. ICASSP 2020, pp. 8109-8113
Speech Synthesis and Voice Conversion
-
Wonjune Kang, Mark Hasegawa-Johnson and Deb Roy,
"End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions,"
accepted to Interspeech 2023
-
Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang
Zhang, Shiyu Chang and Mark
Hasegawa-Johnson, Unsupervised
Text-to-Speech Synthesis by Unsupervised Automatic Speech
Recognition, Interspeech 2022
-
Kaizhi Qian, Yang Zhang, Shiyu Chang, Chuang Gan, David
D. Cox, Mark Hasegawa-Johnson and Jinjun
Xiong, Global
Rhythm Style Transfer Without Text Transcriptions,
ICML 2021
-
Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg,
Show
and Speak: Directly Synthesize Spoken Description of
Images, Proc. ICASSP 2021
-
Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark
Hasegawa-Johnson, and David
Cox, Unsupervised
Speech Decomposition via Triple Information Bottleneck,
in Proc. International Conference on Machine Learning
(ICML 2020), accepted for publication
-
Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson and Gautham
Mysore, F0-Consistent
Many-to-Many Non-Parallel Voice Conversion via Conditional
Autoencoder, in Proc. ICASSP 2020, pp. 6284-6288
-
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark
Hasegawa-Johnson, AutoVC:
Zero-Shot Voice Style Transfer with Only Autoencoder Loss,
Proceedings of Machine Learning Research 97:5210-5219, 2019
- Mark Hasegawa-Johnson, Alan Black, Lucas Ondel, Odette Scharenborg, and
Francesco Ciannella, Image2speech: Automatically
generating audio descriptions of images, Journal of the International
Science and General Applications (ISGA), vol. 1, no. 1, 2018
- Yang Zhang, Zhijian Ou and Mark Hasegawa-Johnson,
``Incorporating AM-FM effect in voiced speech for probabilistic
acoustic tube model,'' Proc. WASPAA 2015
- Yang Zhang, Zhijian Ou, and Mark
Hasegawa-Johnson, Improvement of
Probabilistic Acoustic Tube Model for Speech Decomposition, ICASSP
2014 (Illinois Interdisciplinary Innovation Initiative)
- Xiaodan Zhuang, Lijuan Wang, Frank Soong, and Mark Hasegawa-Johnson,
A Minimum Converted Trajectory
Error (MCTE) Approach to High Quality Speech-to-Lips Conversion,
Proceedings of Interspeech 2010 pp. 1736-1739, (NSF 0703624)
- Thomas S. Huang, Mark A. Hasegawa-Johnson, Stephen M. Chu, Zhihong
Zeng, and Hao Tang, Sensitive Talking
Heads, IEEE Signal Processing Magazine 26(4):67-72, July 2009
- Hao Tang, Yun Fu, Jilin Tu, Mark Hasegawa-Johnson, and Thomas
S. Huang, Humanoid Audio-Visual Avatar
with Emotive Text-to-Speech Synthesis, IEEE Trans. Multimedia
10(6):969-981, 2008
-
Hao Tang, Yuxiao Hu, Yun Fu, Mark Hasegawa-Johnson and
Thomas S. Huang,
Real-time conversion from a single 2D face image to a 3D
text-driven emotive audio-visual avatar,
IEEE International Conference on Multimedia and Expo
(ICME) 2008, pp. 1205-8
-
Hao Tang, Xi Zhou, Matthias Odisio, Mark Hasegawa-Johnson,
and Thomas Huang,
Two-Stage Prosody Prediction for Emotional Text-to-Speech
Synthesis,
Interspeech 2008, pp. 2138-41 (VACE; NSF 0426227).
-
Hao Tang, Yun Fu, Jilin Tu, Thomas Huang, and Mark Hasegawa-Johnson,
EAVA: A 3D Emotive Audio-Visual Avatar,
IEEE Workshop on Applications of Computer Vision (IEEE
WACV '08) pp. 1-6, 2008 (VACE; NSF 0426227).
-
Jul Setsu Cha,
Articulatory Speech Synthesis of Female and Male
Talkers,
M.S. Thesis, UCLA, 2000
Audio Enhancement
-
Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson,
Multi-Decoder
DPRNN: High Accuracy Source Counting and Separation,
Proc. ICASSP 2021, pp. 3420-3424
-
Teck Yian Lim, Raymond Yeh, Yijia Xu, Minh Do, Mark
Hasegawa-Johnson, Time-Frequency Networks for Audio
Super-Resolution, Proc. ICASSP 2018
-
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei
Florencio, and Mark
Hasegawa-Johnson, Deep
Learning Based Speech Beamforming, Proc. ICASSP 2018, pp. 5389-5393, doi:
10.1109/ICASSP.2018.8462430
- Yang Zhang, Xuesong Yang, Zhijian Ou, and Mark Hasegawa-Johnson,
Glottal Residual Assisted
Beamforming, Proc. Interspeech 2017
- Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei
Florencio, and Mark
Hasegawa-Johnson, Speech
Enhancement Using Bayesian Wavenet, Proc. Interspeech
2017
- Ruobai Wang, Yang Zhang, Zhijian Ou and Mark Hasegawa-Johnson,
Use of Particle Filtering and MCMC
for Inference in Probabilistic Acoustic Tube Model,
IEEE Workshop on Statistical Signal Processing, 2016
- Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson and Paris
Smaragdis, Joint
Optimization of Masks and Deep Recurrent Neural Networks
for Monaural Source Separation, IEEE Trans. Audio,
Speech and Language Processing 23(12):2136-2147, 2015
(Software, Examples).
- Po-Sen Huang, Minje Kim, Paris Smaragdis and Mark Hasegawa-Johnson,
Deep Learning for Monaural Speech
Separation, ICASSP 2014 (ARO W911NF-09-1-0383)
-
Po-Sen Huang, Scott Deeann Chen, Paris Smaragdis, and Mark
Hasegawa-Johnson,
Singing-Voice Separation
from Monaural Recordings using Robust Principal Component Analysis,
ICASSP 2012 (ARO W911NF-09-1-0383)
(Software,
Examples)
- Dongeek Shin,
Speech Bandwidth Extension Using Articulatory Features,
B.S. Thesis, 2011
- Lae-Hoon
Kim, Statistical Model
Based Multi-Microphone Speech Processing: Toward Overcoming Mismatch
Problem, Ph.D. Thesis, August 2010, University of Illinois (NSF
0913188)
- Lae-Hoon Kim and Mark
Hasegawa-Johnson, Toward Overcoming
Fundamental Limitation in Frequency-Domain Blind Source Separation for
Reverberant Speech Mixtures, Proceedings of Asilomar 2010 (NSF
0913188)
- Lae-Hoon Kim, Kyung-Tae Kim, and Mark
Hasegawa-Johnson, Robust
Automatic Speech Recognition with Decoder Oriented Ideal Binary Mask
Estimation, Proceedings of Interspeech 2010 pp. 2066-2069 (NSF
0913188; software)
- Lae-Hoon Kim, Kyungtae Kim, and Mark Hasegawa-Johnson, "Speech
enhancement beyond minimum mean squared error with perceptual noise
shaping," 2010 spring meeting of the ASA (Illinois CIRS)
- Lae-Hoon Kim, Mark Hasegawa-Johnson, Gerasimos Potamianos, and Vit
Libal, Joint Estimation of DOA and
Speech Based on EM Beamforming, ICASSP 2010 (Illinois CIRS)
- Lae-Hoon Kim and Mark Hasegawa-Johnson,
Optimal Multi-Microphone Speech
Enhancement in Cars, DSP in Cars workshop, Dallas, July 2009 (NSF
0803219 and 0807329)
- Lae-Hoon Kim, Mark Hasegawa-Johnson, Jun-Seok Lim, and Koeng-Mo
Sung, Acoustic model for robustness
analysis of optimal multipoint room equalization, JASA
123(4):2043-2053, 2008 (Illinois CIRS).
- Lae-Hoon Kim and Mark Hasegawa-Johnson,
Optimal
Speech Estimator Considering Room Response as well as Additive Noise:
Different Approaches in Low and High Frequency Range, ICASSP
pp. 4573-6, 2008 (Illinois CIRS).
- Bowon Lee and Mark
Hasegawa-Johnson,
Minimum Mean Squared Error A Posteriori Estimation of High Variance
Vehicular Noise, in 2007 Biennial on DSP for In-Vehicle and Mobile
Systems, Istanbul, June, 2007 (Motorola RPS19; NSF 0534106;
Software).
- Bowon Lee, Robust Speech
Recognition in a Car Using a Microphone Array. Ph.D. thesis, 2006
(Motorola
RPS19; Software;
Data)
- Mark
Hasegawa-Johnson, Dealing
with Acoustic Noise. Part II: Beamforming. tutorial presentation
given at WS06, Center for Language and Speech Processing, July 2006
(NSF 0121285).
- Mark
Hasegawa-Johnson,
Dealing with Acoustic Noise. Part I: Spectral Estimation.
tutorial presentation given at WS06, Center for Language and Speech
Processing, July 2006 (NSF 0121285).
- Lae-Hoon Kim, Mark Hasegawa-Johnson and Koeng-Mo Sung, Generalized
Optimal Multi-Microphone Speech Enhancement Using Sequential Minimum
Variance Distortionless Response (MVDR) Beamforming and Postfiltering,
ICASSP III:65-8, May 2006 (Illinois CIRS).
- Lae-Hoon Kim and Mark
Hasegawa-Johnson,
Generalized multi-microphone spectral amplitude estimation based on
correlated noise model. 119th Convention of the Audio Engineering
Society, New York, October 2005 (Illinois CIRS).
- Mital Gandhi and Mark
Hasegawa-Johnson,
Source Separation using Particle Filters. Interspeech, October
2004 (NSF 0132900).
- Bowon Lee, Mark Hasegawa-Johnson, and Camille
Goudeseune,
Open Loop Multichannel Inversion of Room Impulse Response, JASA
113(4):2202-3, 2003 (NSF 0132900;
Data).
Speech Coding
- Mark Hasegawa-Johnson and Abeer
Alwan,
Speech Coding: Fundamentals and Applications, Wiley Encyclopedia
of Telecommunications and Signal Processing, J. Proakis, Ed., Wiley
and Sons, NY, December 2002 (NSF 0132900).
- Wira Gunawan and Mark
Hasegawa-Johnson, "PLP
Coefficients can be Quantized at 400 bps," ICASSP, Salt Lake City,
UT, pp. 2.2.1-4, 2001.
- Wira Gunawan, Distributed
Speech Recognition, M.S. Thesis, 2000
- Tomohiko Taniguchi and Mark
Johnson, Speech
coding and decoding system (transform stochastic codebook so that,
after perceptual weighting, it will be orthogonal to the adaptive
codebook), U.S. Patent 5799131, August 25, 1998 (Fujitsu).
- Tomohiko Taniguchi, Mark Johnson, Yasuji Ohta, Hideki Kurihara,
Yoshinori Tanaka, and Yoshihito
Sakai, Speech
coding system having codebook storing differential vectors between
each two adjoining code vectors, U.S. Patent 5323486, June 21,
1994 (Fujitsu).
- Tomohiko Taniguchi and Mark
Johnson, Speech
coding system (hexagonal lattice code), U.S. Patent 5245662,
September 14, 1993 (Fujitsu).
-
Tomohiko Taniguchi, Mark Johnson, Hideki Kurihara, Yoshinori
Tanaka, and Yasuji Ohta,
Speech coding and decoding system
(sparse adaptive codebook), U.S. Patent
5199076, March 30, 1993 (Fujitsu).
- Mark Hasegawa-Johnson and Tomohiko
Taniguchi, "On-line
and off-line computational reduction techniques using backward
filtering in CELP speech coders," IEEE Transactions Acoustics,
Speech, and Signal Processing, vol. 40, pp. 2090-2093, 1992
(Fujitsu).
- Mark A. Johnson and Tomohiko
Taniguchi, "Low-complexity
multi-mode VXC using multi-stage optimization and mode selection,"
ICASSP, Toronto, Canada, pp. 221-224, 1991 (Fujitsu).
- Tomohiko Taniguchi, Mark A. Johnson, and Yasuji
Ohta,
Pitch sharpening for perceptually improved CELP, and the sparse-delta
codebook for reduced computation, ICASSP, Toronto, Canada,
pp. 241-244, 1991 (Fujitsu).
- Tomohiko Taniguchi, Fumio Amano, and Mark A. Johnson, "Improving the
performance of CELP-based speech coding at low bit rates,"
International Symposium on Circuits and Systems, Singapore, 1991
(Fujitsu).
- Mark A. Johnson and Tomohiko Taniguchi, "Computational reduction in
sparse-codebook CELP using backward-weighting of the input," Institute
of Electr., Information, and Comm. Eng. Symposium, DSP 90-15, Hakata,
61-66, 1990 (Fujitsu).
- Tomohiko Taniguchi, Mark A. Johnson and Yasuji Ohta, "Multi-vector
pitch-orthogonal LPC: quality speech with low complexity at rates
between 4 and 8 kbps," ICSLP, Kobe, pp. 113-116, 1990 (Fujitsu).
- Mark A. Johnson and Tomohiko Taniguchi, "Pitch-orthogonal code-excited
LPC," IEEE Global Telecommunications Conference (GLOBECOM), San Diego,
CA, pp. 542-546, 1990 (Fujitsu).
Speech Technology for Unwritten Languages
-
Liming Wang, Mark Hasegawa-Johnson and Chang D. Yoo,
``A Theory of Unsupervised Speech Recognition,'' ACL 2023
-
Liming Wang, Junrui Ni, Heting Gao, Jialu Li, Kai Chieh Chang,
Xulin Fan, Junkai Wu, Mark Hasegawa-Johnson and Chang D. Yoo,
``Speak, Decipher and Sign: Toward Unsupervised Speech-to-Sign
Language Recognition.'' Findings of ACL 2023
-
Liming Wang, Xinsheng Wang, Mark Hasegawa-Johnson, Odette
Scharenborg, Najim Dehak,
Align
or Attend? Toward More Efficient and Accurate Spoken Word
Discovery Using Speech-to-Image Retrieval, Proc. ICASSP
2021
- Liming Wang, Mark
A. Hasegawa-Johnson,
Multimodal
Word Discovery and Retrieval with Phone Sequence and Image
Concepts,
in Proc. Interspeech 2019, pp. 2684-2687
-
Mark Hasegawa-Johnson, Najim Dehak and Odette
Scharenborg, Position
Paper: Indirect Supervision for Dialog Systems in Unwritten
Languages, International
Workshop on Spoken Dialog Systems, 2019
-
Odette Scharenborg, Patrick Ebel, Francesco Ciannella, Mark
Hasegawa-Johnson and Najim
Dehak, Building an ASR
System for Mboshi Using a Cross-language Definition of Acoustic
Units Approach, in Proc. SLTU (Speech and Language Technology
for Under-resourced languages), 2018
-
Odette Scharenborg, Laurent Besacier, Alan Black, Mark
Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian
Stüker, Pierre Godard, Markus Müller, Lucas Ondel,
Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing
Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, and
Emmanuel Dupoux, ``Linguistic Unit Discovery from Multi-Modal
Inputs in Unwritten Languages: Summary of the "Speaking Rosetta"
JSALT 2017 Workshop,'' in Proc. ICASSP 2018
-
Wenda Chen, Mark Hasegawa-Johnson and Nancy F.Y. Chen,
Topic and Keyword
Identification for Low-resourced Speech Using Cross-Language
Transfer Learning, in Proc. Interspeech 2018, pp. 1283:1-5,
doi:10.21437/Interspeech.2018-1283
-
Mark Hasegawa-Johnson, Alan Black, Lucas Ondel, Odette
Scharenborg, and Francesco
Ciannella, Image2speech:
Automatically generating audio descriptions of images, in
Proc. Internat. Conference on Natural Language, Signal and
Speech Processing (ICNLSSP) 2017, Casablanca, Morocco.
-
Lee Estelle, Lim Zhi Yi Vanessa, Ang Hui Shan and Lim Boon Pang,
Singapore Hokkien Speech Recognition and Applications,
A*STAR research symposium, 2015
-
Rania Al-Sabbagh, Roxana Girju, Mark Hasegawa-Johnson, Elabbas
Ben-Mamoun, Rahab Duwairi, and Eiman Mustafawi,
Using Web-Mining
Techniques to Build a Multi-Dialect Lexicon of Arabic, Linguistics
in the Gulf Conference, March 2011 (QNRF NPRP 410-1-069)
-
Xiaodan Zhuang, Jui-Ting Huang, and Mark
Hasegawa-Johnson, Speech
Retrieval in Unknown Languages: a Pilot Study, NAACL HLT
Cross-Lingual Information Access Workshop (CLIAWS) pp. 3-11, 2009 (NSF
0534106 and NSF 0703624)
Social Cues
-
Jialu Li, Mark Hasegawa-Johnson and Nancy McElwain,
"Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio,"
accepted to Interspeech 2023
-
Jialu Li, Mark Hasegawa-Johnson and Nancy McElwain,
Analysis
of Acoustic and Voice Quality Features for the
Classification of Infant and Mother Vocalizations,
Speech Communication 133:41-61, 2021
-
Yijia Xu, Acoustic Event,
Spoken Keyword and Emotional Outburst Detection, M.S. Thesis, 2019
-
Yijia Xu, Mark Hasegawa-Johnson, and Nancy
L. McElwain, Infant emotional
outbursts detection in infant-parent spoken interactions, in
Proc. Interspeech 2018, pp. 2429:1-5, doi:10.21437/Interspeech.2018-2429
-
Di He, Zuofu Cheng, Mark Hasegawa-Johnson and Deming Chen,
Using Approximated Auditory
Roughness as a Pre-filtering Feature for Human Screaming and
Affective Speech AED, Proc. Interspeech 2017
-
Mary Pietrowicz, Exposing
the Hidden Vocal Channel: Analysis of Vocal Expression,
Ph.D. Thesis, 2017
-
Mary Pietrowicz, Mark Hasegawa-Johnson, and Karrie
Karahalios, Discovering
Dimensions of Perceived Vocal Expression in Semi-Structured,
Unscripted Oral History Accounts, Proc. ICASSP 2017, Paper ID:
2901
-
Mary Pietrowicz, Mark Hasegawa-Johnson and Karrie
Karahalios, Acoustic
Correlates for Perceived Effort Levels in Expressive Speech,
Interspeech 2015, pp. 3720-3724
-
Shobhit Mathur, Marshall Scott Poole, Feniosky Pena-Mora, Mark
Hasegawa-Johnson and Noshir
Contractor, Detecting interaction links in
a collaborating group using manually annotated data, Social
Networks doi:10.1016/j.socnet.2012.04.002, 2012 (NSF 0941268)
-
Hao Tang, Stephen M. Chu, Mark Hasegawa-Johnson, Thomas S. Huang,
Emotion
Recognition from Speech via Boosted Gaussian Mixture Models, 2009
International Conference on Multimedia & Expo (ICME'09), pp. 294-7 (NIH R21
DC008090 A)
-
Tong Zhang, Mark Hasegawa-Johnson and Stephen
E. Levinson,
Cognitive State Classification in a spoken tutorial dialogue
system, Speech Communication 48(6):616-632, 2006 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen
E. Levinson,
Children's Emotion Recognition in an Intelligent Tutoring
Scenario. Interspeech, October, 2004, pp. 735-738 (NSF 0085980).
-
Tong Zhang, Mark Hasegawa-Johnson, and Stephen
E. Levinson,
An empathic-tutoring system using spoken language, Australian
conference on computer-human interaction (OZCHI), 2003, pp. 498-501 (NSF 0085980).
-
Tong Zhang, Mark Hasegawa-Johnson, and Stephen
E. Levinson,
Mental State Detection of Dialogue System Users via Spoken
Language,
ISCA/IEEE Workshop on Spontaneous Speech Processing and
Recognition (SSPR), April 2003, MAP17.1-4 (NSF 0085980).
Automatic Recognition of Prosody
-
Wanyue Zhai and Mark Hasegawa-Johnson, ``Wav2ToBI: a new approach to
automatic ToBI transcription,'' accepted to Interspeech 2023
-
Andrew Rosenberg and Mark Hasegawa-Johnson, Automatic
Prosody Labeling and Assessment,
in Oxford
Handbook of Language Prosody, Carlos Gussenhoven and
Aoju Chen, eds., Oxford University Press, 2021
- Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Jennifer Cole, Mark
Hasegawa-Johnson and Margaret
Fleck, Feature Sets for the Automatic
Detection of Prosodic Prominence, New Tools and Methods for Very
Large Scale Phonetics Research, University of Pennsylvania, Jan. 2011
- Jui-Ting Huang and Mark
Hasegawa-Johnson,
Unsupervised Prosodic Break Detection in Mandarin Speech,
SpeechProsody 2008 pp. 165-8 (NSF 0534133).
-
Xiaodan Zhuang and Mark
Hasegawa-Johnson,
Towards Interpretation of Creakiness in Switchboard, SpeechProsody
2008 pp. 37-40 (NSF 0414117).
-
Taejin Yoon, Jennifer Cole, and Mark
Hasegawa-Johnson,
Detecting Non-Modal Phonation in Telephone Speech, SpeechProsody,
2008 pp. 33-6 (NSF 0414117).
-
Taejin Yoon,
A Predictive Model of Prosody Through Grammatical Interface: A Computational Approach,
Ph.D. Thesis, 2007.
- Ken Chen, Mark Hasegawa-Johnson and Jennifer Cole,
A Factored Language Model for Prosody-Dependent Speech Recognition,
in Robust Speech Recognition and Understanding, Michael Grimm
and Kristian Kroschel (Eds.), INTECH Publishing, pp. 319-332, 2007.
-
Mark Hasegawa-Johnson, Jennifer Cole, Ken Chen, Partha Lal,
Amit Juneja, Taejin Yoon, Sarah Borys, and Xiaodan
Zhuang,
Prosodically Organized Automatic Speech Recognition. Linguistic
Processes in Spontaneous Speech, Language and Linguistics Monograph
Series A25, Academica Sinica, Taiwan, 2008, pp. 101-128 (NSF 0414117;
NSF 0121285).
- Mark
Hasegawa-Johnson,
Phonology and the Art of Automatic Speech Recognition, Director's
Seminar Series, Beckman Institute, University of Illinois at
Urbana-Champaign, November 2006 (NSF 0414117).
- Taejin Yoon, Xiaodan Zhuang, Jennifer Cole, and Mark
Hasegawa-Johnson,
Voice Quality Dependent Speech Recognition,
Linguistic Patterns in
Spontaneous Speech, Language and Linguistics Monograph Series A25,
Academica Sinica, Taiwan, 2008, pp. 77-100 (NSF 0414117).
- Tong Zhang, Mark Hasegawa-Johnson and Stephen
E. Levinson, "Extraction
of Pragmatic and Semantic Salience from Spontaneous Spoken
English," Speech Communication, 2007 (NSF 0085980).
- Taejin Yoon, Xiaodan Zhuang, Jennifer Cole, and Mark
Hasegawa-Johnson,
Voice Quality Dependent Speech Recognition. Midwest Computational
Linguistics Colloquium, Urbana, IL, 2006 (NSF 0414117).
- Rajiv Reddy and Mark
Hasegawa-Johnson, Analysis of Pitch
Contours in Repetition-Disfluency using Stem-ML, Midwest
Computational Linguistics Colloquium, 2006
- Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, Sarah Borys,
Sung-Suk Kim, Jennifer Cole and Jeung-Yoon
Choi, Prosody Dependent Speech
Recognition on Radio News Corpus of American English. IEEE
Transactions on Speech and Audio Processing, 14(1):232-245, 2006 (NSF
0132900).
-
Jennifer Cole, Mark Hasegawa-Johnson, Chilin Shih, Eun-Kyung
Lee, Heejin Kim, H. Lu, Yoonsook Mo and Tae-Jin Yoon,
Prosodic Parallelism as a Cue to Repetition and Hesitation
Disfluency, Proceedings of DISS'05 (An ISCA Tutorial and Research
Workshop), Aix-en-Provence, France, pp. 53-58 (NSF 0414117).
- Mark Hasegawa-Johnson, Ken Chen, Jennifer Cole, Sarah Borys,
Sung-Suk Kim, Aaron Cohen, Tong Zhang, Jeung-Yoon Choi, Heejin Kim,
Taejin Yoon, and Sandra
Chavarria,
Simultaneous Recognition of Words and Prosody in the Boston University
Radio Speech Corpus. Speech Communication 46(3-4):418-439, 2005
(NSF 0132900).
- Tae-Jin Yoon, Jennifer Cole, Mark Hasegawa-Johnson and
Chilin Shih,
Detecting Non-modal Phonation in Telephone Speech. Unpublished
manuscript, 2005 (NSF 0414117).
-
Tae-Jin Yoon, Jennifer Cole, Mark Hasegawa-Johnson and
Chilin Shih,
Acoustic correlates of non-modal phonation in telephone speech,
The Journal of the Acoustical Society of America 117(4), p. 2621,
2005 (NSF 0414117).
-
Tong Zhang, Mark Hasegawa-Johnson, and Stephen
E. Levinson,
A Hybrid Model for Spontaneous Speech Understanding.
AAAI 2005,
10.1.1.80.879:1-8 (NSF 0085980).
-
Tong Zhang, Mark Hasegawa-Johnson and Stephen
E. Levinson,
Automatic detection of contrast for speech understanding.
Interspeech, October, 2004 (NSF 0085980).
- Yuexi Ren, Mark Hasegawa-Johnson and Stephen
E. Levinson. Semantic
analysis for a speech user interface in an intelligent-tutoring
system, Intl. Conf. on Intelligent User Interfaces. Madeira,
Portugal, 2004 (NSF 0085980).
- Sarah Borys, Mark Hasegawa-Johnson, Ken Chen, and Aaron
Cohen,
Modeling and Recognition of Phonetic and Prosodic Factors for
Improvements to Acoustic Speech Recognition Models. Interspeech,
October, 2004 (NSF 0132900).
- Ken Chen, Prosody Dependent Speech
Recognition on American Radio News Speech, Ph.D. Thesis, 2004
- Mark
Hasegawa-Johnson,
Speech Recognition Models of the Interdependence Among Syntax,
Prosody, and Segmental Acoustics, talk given at Tsinghua
University, October 2004 (NSF 0414117).
- Mark Hasegawa-Johnson, Jennifer Cole, Chilin Shih, Ken Chen,
Aaron Cohen, Sandra Chavarria, Heejin Kim, Taejin Yoon, Sarah Borys,
and Jeung-Yoon
Choi,
Speech Recognition Models of the Interdependence Among Syntax,
Prosody, and Segmental Acoustics, Human Language Technologies:
Meeting of the North American Chapter of the Association for
Computational Linguistics (HLT/NAACL), Workshop on Higher-Level
Knowledge in Automatic Speech Recognition and Understanding, May,
2004, pp. 56-63 (NSF 0414117).
-
Ken Chen and Mark
Hasegawa-Johnson,
How Prosody Improves Word Recognition, SpeechProsody 2004, Nara,
Japan, March 2004, 583-586 (NSF 0132900).
-
Aaron
Cohen,
A Survey of Machine Learning Methods for Predicting Prosody in Radio
Speech. M.S. Thesis, 2004.
- Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, and Jennifer Cole,
A
Maximum Likelihood Prosody Recognizer, SpeechProsody 2004, Nara,
Japan, March 2004, 509-512 (NSF 0132900; Illinois CRI;
Software).
- Ken Chen and Mark
Hasegawa-Johnson,
An Automatic Prosody Labeling System Using ANN-Based
Syntactic-Prosodic Model and GMM-Based Acoustic-Prosodic Model,
ICASSP 2004 (NSF 0132900; Illinois CRI).
-
Sung-Suk Kim, Mark Hasegawa-Johnson, and Ken
Chen,
Automatic Recognition of Pitch Movements Using Multilayer Perceptron
and Time-Delay Recursive Neural Network, IEEE Signal Processing
Letters 11(7):645-648, 2004 (NSF 0132900; Illinois CRI).
-
Yuexi Ren, Sung-Suk Kim, Mark Hasegawa-Johnson, and Jennifer
Cole,
Speaker-Independent Automatic Detection of Pitch Accent,
SpeechProsody 2004, Nara, Japan, March 2004, 521-524 (NSF 0085980).
-
Ken Chen, Mark Hasegawa-Johnson and Sung-Suk
Kim, An
Intonational Phrase Boundary and Pitch Accent Dependent Speech
Recognizer. International Conference on Systems, Cybernetics, and
Intelligence, 2003 (Illinois CRI).
-
Ken Chen and Mark Hasegawa-Johnson,
Improving the robustness of prosody
dependent language modeling based on prosody syntax
cross-correlation.
ASRU, 2003 (Illinois CRI).
-
Ken Chen, Mark Hasegawa-Johnson and Jennifer Cole,
Prosody Dependent Speech Recognition on Radio News,
IEEE Workshop on Statistical Signal Processing,
St. Louis, MO, 2003 (Illinois CRI).
-
Ken Chen, Mark Hasegawa-Johnson, Aaron Cohen, Sarah Borys,
and Jennifer
Cole,
Prosody Dependent Speech Recognition with Explicit
Duration Modelling at Intonational Phrase Boundaries.
Interspeech, September, 2003, 393-396 (Illinois CRI;
Software
diffs,
TGZ,
ZIP)
-
Sarah
Borys,
Recognition of Prosodic Factors and Detection of Landmarks for
Improvements to Continuous Speech Recognition Systems.
B.S. Thesis, 2003.
-
Sarah Borys, Mark Hasegawa-Johnson and Jennifer
Cole, The
Importance of Prosodic Factors in Phoneme Modeling with Applications
to Speech Recognition, ACL Student Session, 2003 (NSF 0132900).
-
Sarah Borys, Mark Hasegawa-Johnson and Jennifer
Cole,
Prosody as a Conditioning Variable in Speech Recognition, Illinois
Journal of Undergraduate Research, 2003 (Illinois CRI).
Speaker Recognition
-
Junzhe Zhu, Mark Hasegawa-Johnson, Nancy
McElwain, A
Comparison Study on Infant-Parent Voice Diarization,
Proc. ICASSP 2021, pp. 7178-7182
-
Junzhe Zhu, Mark Hasegawa-Johnson and Leda
Sari, Identify
Speakers in Cocktail Parties with End-to-End
Attention, in Proc. Interspeech 2020, pp. 3092-3096
-
Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson, Michael
Picheny, Pre-Training of
Speaker Embeddings for Low-Latency Speaker Change
Detection in Broadcast News, in Proc. ICASSP 2019,
pages 1-5, paper 3093
-
Kaizhi Qian, Regularized
Estimation of Gaussian Mixture Models for SVM Based
Speaker Recognition,
B.S. Thesis, May 2014
- Hao Tang, Stephen Chu, Mark Hasegawa-Johnson, and Thomas Huang,
Partially Supervised Speaker
Clustering, IEEE Transactions on Pattern Analysis and Machine
Intelligence 34(5):959-971, 2012
- David Harwath and Mark Hasegawa-Johnson,
Phonetic
Landmark Detection for Automatic Language Identification, Speech
Prosody 2010 100231:1-4 (NSF 0703624).
- Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, and
Zhengyou Zhang, "Frequency Domain
Correspondence for Speaker Normalization," in Proc. Interspeech,
pp. 274-277, Antwerp, August, 2007.
-
Xi Zhou, Yun Fu, Ming Liu, Mark Hasegawa-Johnson, and Thomas
Huang,
Robust Analysis and Weighting on MFCC Components for Speech
Recognition and Speaker Identification, International Conference
on Multimedia and Expo 2007, pp. 188-191 (VACE NBCHC060160; NSF
0426627).
-
Ming Liu, Zhengyou Zhang, Mark Hasegawa-Johnson, and Thomas Huang,
Exploring
Discriminative Learning for Text-Independent Speaker Recognition,
ICME 2007, pp. 56-59 (NSF 0426627).
Artificial Intelligence
Machine Learning
-
Raymond Yeh, Mark Hasegawa-Johnson and Alexander Schwing,
``Equivariance Discovery by Learned Parameter-Sharing,''
AISTATS 2022
-
Hui Shi, Yang Zhang, Hao Wu, Shiyu Chang, Kaizhi Qian,
Mark Hasegawa-Johnson, and Jishen
Zhao, Continuous
CNN for Nonuniform Time Series, Proc. ICASSP 2021
-
Leda Sari and Mark
Hasegawa-Johnson, Deep
F-measure Maximization for End-to-End Speech
Understanding, Proc. Interspeech 2020, accepted for
publication.
-
Tom Le Paine, Pooya Khorrami, Shiyu Chang, Yang Zhang, Prajit
Ramachandran, Mark A. Hasegawa-Johnson, and Thomas S. Huang,
Fast Wavenet Generation Algorithm,
2016
-
Mark Hasegawa-Johnson, Preethi Jyothi, Wenda Chen, and Van Hai
Do, Mismatched
Crowdsourcing: Mining Latent Skills to Acquire Speech
Transcriptions, in Proceedings of Asilomar, 2017 (DARPA
LORELEI)
-
Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan,
Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, and Thomas
Huang, Dilated
Recurrent Neural Networks, NIPS 2017
- Shiyu Chang, Yang Zhang, Jiliang Tang, Dawei Yin, Yi Chang, Mark
Hasegawa-Johnson and Thomas
Huang, Streaming Recommender
Systems, WWW 2017
- Mark Hasegawa-Johnson, Preethi Jyothi, Daniel McCloy, Majid
Mirbagheri, Giovanni di Liberto, Amit Das, Bradley Ekin, Chunxi Liu,
Vimal Manohar, Hao Tang, Edmund C. Lalor, Nancy Chen, Paul Hager,
Tyler Kekona, Rose Sloan, and Adrian KC
Lee, ASR for
Under-Resourced Languages from Probabilistic Transcription,
IEEE/ACM Trans. Audio, Speech and Language 25(1):46-59, 2017 (Print
ISSN: 2329-9290, Online ISSN: 2329-9304, Digital Object Identifier:
10.1109/TASLP.2016.2621659)
(Data)
- Wenda Chen, Mark Hasegawa-Johnson, Nancy Chen, Preethi Jyothi, and
Lav
Varshney, Clustering-based
Phonetic Projection in Mismatched Crowdsourcing Channels for
Low-resourced ASR, WSSANLP (Workshop on South and Southeast Asian
Natural Language Processing), 2016, pp. 133-141
- Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson and
Minh N. Do, Semantic
Image Inpainting with Perceptual and Contextual Losses, 26 Jul
2016. Covered in the blog
post Image
Completion with Deep Learning in TensorFlow by Brandon Amos,
August 9, 2016.
- Van Hai Do, Nancy F. Chen, Boon Pang Lim and Mark
Hasegawa-Johnson, Analysis of
Mismatched Transcriptions Generated by Humans and Machines for
Under-Resourced Languages. Interspeech 2016
- Shiyu Chang, Yang Zhang, Jiliang Tang, Dawei Yin, Yi Chang, Mark
Hasegawa-Johnson and Thomas
Huang, Positive-Unlabeled
Learning in Streaming Networks, KDD 2016
- Xiang Kong, Preethi Jyothi, and Mark Hasegawa-Johnson,
Performance
Improvement of Probabilistic Transcriptions with Language-specific
Constraints. Procedia Computer Science 81:30-36, 2016
(doi:10.1016/j.procs.2016.04.026; DARPA LORELEI)
- Lav Varshney, Preethi Jyothi and Mark Hasegawa-Johnson,
Language Coverage for Mismatched
Crowdsourcing, Workshop on Information Theory and Applications,
2016 (NSF 1550145)
- Raymond Yeh, Mark Hasegawa-Johnson, Minh
Do, Stable and Symmetric Filter
Convolutional Neural Network, Proc. ICASSP 2016
- Mark Hasegawa-Johnson, Ed Lalor, KC Lee, Preethi Jyothi,
Majid Mirbagheri, Amit Das, Giovanni Di Liberto, Brad Ekin,
Chunxi Liu, Vimal Manohar, Hao Tang, Paul Hager, Tyler
Kekona, and Rose
Sloan, Probabilistic
Transcription WS15 Group Final Presentation slides.
-
Preethi Jyothi and Mark
Hasegawa-Johnson, Transcribing
Continuous Speech Using Mismatched Crowdsourcing,
Interspeech 2015, pp. 2774-2778
- Mark Hasegawa-Johnson, Jennifer Cole, Preethi Jyothi and Lav Varshney,
Models of Dataset Size, Question
Design, and Cross-Language Speech Perception for Speech
Crowdsourcing Applications, Journal of Laboratory Phonology,
6(3-4):381-431, 2015
(published
copy).
- Preethi Jyothi and Mark Hasegawa-Johnson,
Acquiring Speech Transcriptions
Using Mismatched Crowdsourcing,
Proc. AAAI, 2015, pp. 1263-1269
- Mark Hasegawa-Johnson, David Harwath, Harsh Vardhan Sharma, and
Po-Sen Huang, Transfer
Learning for Multi-Person and Multi-Dialect Spoken Language
Interface, presentation given at the 2012 Urbana Neuroengineering
Conference (NSF 0905633)
- Jui-Ting Huang and Mark
Hasegawa-Johnson, Semi-Supervised
Training of Gaussian Mixture Models by Conditional Entropy
Minimization, Proceedings of Interspeech 2010 pp. 1353-1356 (NSF
0703624)
- Mark Hasegawa-Johnson, Camille Goudeseune, Kai-Hsiang
Lin, David Cohen, Xi Zhou, Xiaodan Zhuang, Kyungtae Kim,
Hank Kaczmarski and Thomas
Huang, Visual
Analytics for Audio, NIPS Workshop on Visual Analytics,
2009 (NSF 0807329)
-
Mark
Hasegawa-Johnson, Pattern
Recognition in Acoustic Signal Processing, Machine
Learning Summer School, University of Chicago, 2009 (NSF
0807329)
- Jui-Ting Huang and Mark
Hasegawa-Johnson, On
semi-supervised learning of Gaussian mixture models for phonetic
classification, NAACL HLT Workshop on Semi-Supervised Learning,
2009 (NSF 0534106 and NSF 0703624).
-
Mark
Hasegawa-Johnson,
Tutorial: Pattern Recognition in Signal Processing,
JASA 125:2698, 2009 (NSF 0803219 and 0807329).
- Yang Li, Incremental
Training and Growth of Artificial Neural Networks,
M.S. Thesis, 2008 (NSF 0534106).
- Mohammad Kamal Omar and Mark
Hasegawa-Johnson,
Model Enforcement: A Unified Feature Transformation Framework for
Classification and Recognition, IEEE Transactions on Signal
Processing, vol. 52, no. 10, pp. 2701-2710, 2004 (NSF 0132900).
Natural Language Processing
-
Hee Suk Yoon, Eunseop Yoon, John Harvill, Sunjae Yoon, Mark Hasegawa-Johnson and Chang D. Yoo,
``SMSMix: Sense Maintained Sentence Mixup for Word Sense Disambiguation,''
EMNLP 2022, accepted for publication
-
John Harvill, Roxana Girju and Mark Hasegawa-Johnson,
Syn2Vec: Synset Colexification Graphs for Lexical Semantic
Similarity, Proc. NAACL 2022
-
Kiran Ramnath, Leda Sarı, Mark Hasegawa-Johnson and Chang Yoo,
Worldly
Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based
Visual Spoken-Question Answering, Proc. NAACL 2021,
pp. 1908–1919
-
Sujeeth Bharadwaj and Mark Hasegawa-Johnson,
A PAC-Bayesian Approach to Minimum Perplexity Language
Modeling,
Proceedings of CoLing 2014 (NSF 0941268).
- Ali Sakr and Mark Hasegawa-Johnson,
Topic Modeling of Phonetic Latin-Spelled Arabic for the
Relative Analysis of Genre-Dependent and
Dialect-Dependent Variation,
CITALA 2012 pp. 153-158, ISBN 978-9954-9135-0-5 (QNRF NPRP
410-1-069).
-
Rania Al-Sabbagh, Roxana Girju, Mark Hasegawa-Johnson,
Elabbas Benmamoun, Rehab Duwairi, and Eiman
Mustafawi, Using
Web Mining Techniques to Build a Multi-Dialect Lexicon of
Arabic, Linguistics in the Gulf Conference,
2011. Abstract
here.
Computer Vision
- Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-mei
Hwu, Mark Hasegawa-Johnson and Humphrey
Shi, Interpretable
Visual Reasoning via Induced Symbolic
Space, ICCV 2021
- Raymond A. Yeh, Teck Yian Lim, Chen Chen, Alexander G. Schwing,
Mark Hasegawa-Johnson, and Minh
N. Do, Image
Restoration with Deep Generative Models, Proc. IEEE ICASSP
pp. 6772-6772, doi:10.1109/ICASSP.2018.8462317
- Raymond Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing,
Mark Hasegawa-Johnson, Minh
N. Do, Semantic Image Inpainting with
Deep Generative Networks, CVPR 2017
-
Kai-Hsiang Lin, Pooya Khorrami, Jiangping Wang, Mark
Hasegawa-Johnson, and Thomas
S. Huang, Foreground Object
Detection in Highly Dynamic Scenes Using Saliency,
Proceedings of ICIP 2014
- Zhaowen Wang, Zhangyang Wang, Mark Moll, Po-Sen Huang, Devin
Grady, Nasser Nasrabadi, Thomas Huang, Lydia Kavraki, and Mark
Hasegawa-Johnson, Active Planning, Sensing
and Recognition Using a Resource-Constrained Discriminant POMDP,
CVPR Multi-Sensor Fusion Workshop, 2014 (ARO W911NF-09-1-0383)
-
Xiaodan Zhuang, Modeling
Audio and Visual Cues for Real-world Event Detection,
Ph.D. Thesis, University of Illinois, April 2011
-
Xi Zhou, Xiaodan Zhuang, Hao Tang, Mark
A. Hasegawa-Johnson, and Thomas
S. Huang, Novel
Gaussianized Vector Representation for Improved Natural
Scene Categorization, Pattern Recognition Letters,
31, 8 (Jun. 2010), 702-708 (NSF 0807329).
-
Hao Tang, Mark Hasegawa-Johnson, and Thomas
S. Huang, Non-Frontal
View Facial Expression Recognition, ICME 2010,
pp. 1202-7
-
Xiaodan Zhuang, Xi Zhou, Mark A. Hasegawa-Johnson and
Thomas S. Huang,
Efficient
Object Localization with Gaussianized Vector Representation, IMCE
2009 pp. 89-96 (NSF 0803219).
-
Xiaodan Zhuang, Xi Zhou, Mark Hasegawa-Johnson, and Thomas
Huang,
Face Age Estimation Using Patch-based Hidden Markov
Model Supervectors,
ICPR 2008, pp. 1-4 (NSF 0534106; VACE).
-
Xi Zhou, Xiaodan Zhuang, Hao Tang, Mark Hasegawa-Johnson, and
Thomas Huang, A Novel Gaussianized
Vector Representation for Natural Scene Categorization, ICPR 2008
pp. 1-4 (NSF 0534106; VACE).
-
Xi Zhou, Xiaodan Zhuang, Shuicheng Yan, Shih-Fu Chang, Mark
Hasegawa-Johnson, and Thomas S.
Huang, SIFT-Bag Kernel for Video
Event Analysis, ACM Multimedia 2008 (NSF 0534106; VACE).
- Shuicheng Yan, Xi Zhou, Ming Liu, Mark
Hasegawa-Johnson, and Thomas
S. Huang,
Regression from Patch Kernel,
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 2008, pp. 1-8
Acoustic Event Detection and Modeling
-
Yang Zhang, Nasser Nasrabadi and Mark Hasegawa-Johnson,
``Multichannel Transient Acoustic Signal Classification
Using Task-Driven Dictionary with Joint Sparsity and
Beamforming,'' Proc. ICASSP 2015, 2591:1--5 (ARO
W911NF-09-1-0383 and AHRQ)
- Austin Chen and Mark Hasegawa-Johnson, Mixed Stereo Audio
Classification Using a Stereo-Input Mixed-to-Panned Level Feature,
IEEE Trans. Speech and Audio Proc. 22(12):2025-2033, 2014 (doi
10.1109/TASLP.2014.2359628; QNRF NPRP 09-410-1-069)
-
Mark Hasegawa-Johnson,
Probabilistic Segmental Model For Doppler Ultrasound
Heart Rate Monitoring,
United States Patent Number 8727991 B2 for Salutron
Corporation, May 20, 2014
- Austin Chen, Automatic
Classification of Electronic Music and Speech/Music Audio Content,
M.S. Thesis, 2014
- Austin Chen and Mark Hasegawa-Johnson, Mixed Stereo Audio
Classification Using Estimated Voice-to-Music Ratios, Manuscript in
review (Database
Definition)
- Robert Mertens, Po-Sen Huang, Luke Gottlieb, Gerald Friedland,
Ajay Divakaran, Mark Hasegawa-Johnson, ``On the Application of Speaker
Diarization to Audio Indexing of Non-Speech and Mixed
Non-Speech/Speech Video Soundtracks,'' International Journal of
Multimedia Data Engineering and Management (IJDEM), April 2013, Volume
3, Issue 3, pp. 1--19, DOI: 10.4018/jmdem.2012070101
-
Po-Sen Huang, Mark Hasegawa-Johnson, Wotao Yin and Tom
Huang,
Opportunistic Sensing: Unattended
Acoustic Sensor Selection Using Crowdsourcing Models, IEEE
Workshop on Machine Learning in Signal Processing 2012
- Christopher Co, 2012. Room
Reconstruction and Navigation Using Acoustically Obtained Room Impulse
Responses and a Mobile Robot Platform. M.S. Thesis, University
of Illinois
(Software).
- Po-Sen Huang, Jianchao Yang, Mark Hasegawa-Johnson, Feng Liang,
Thomas S. Huang, Pooling Robust
Shift-Invariant Sparse Representations of Acoustic Signals,
Interspeech 2012
- Mark Hasegawa-Johnson, Xiaodan Zhuang, Xi Zhou, Camille
Goudeseune, Hao Tang, Kai-Hsiang Lin, Mohamed Omar, and Thomas Huang,
Toward Better Real-world
Acoustic Event Detection, Presentation given at Seoul National
University, May 30, 2012
- Po-Sen Huang, Robert Mertens, Ajay Divakaran, Gerald Friedland, and Mark
Hasegawa-Johnson, How to Put it into
Words---Using Random Forests to Extract Symbol Level Descriptions from
Audio Content for Concept Detection, ICASSP 2012 (ARO
W911NF-09-1-0383)
-
R. Mertens, P.-S. Huang, L. Gottlieb, G. Friedland,
A. Divakaran, On the Application of
Speaker Diarization to Audio Concept Detection for Multimedia
Retrieval, IEEE International Symposium on Multimedia,
pp. 446-451, 2011
-
Po-Sen Huang, Mark Hasegawa-Johnson, and Thyagaraju Damarla,
Exemplar Selection Methods to
Distinguish Human from Animal Footsteps, Second Annual Human and
Light Vehicle Detection Workshop, Maryland, pp. 14:1-10, 2011
(ARO W911NF-09-1-0383)
- Po-Sen Huang, Thyagaraju Damarla and Mark Hasegawa-Johnson,
Multi-sensory features for
Personnel Detection at Border Crossings, Fusion 2011, to appear
(ARO W911NF-09-1-0383)
- Xiaodan Zhuang, Modeling Audio
and Visual Cues for Real-world Event Detection, Ph.D. Thesis,
University of Illinois, April 2011
- Po-Sen Huang, Xiaodan Zhuang, and Mark
Hasegawa-Johnson, Improving Acoustic
Event Detection using Generalizable Visual Features and Multi-modality
Modeling, ICASSP 2011, pp. 349-352 (ARO W911NF-09-1-0383)
- Xiaodan Zhuang, Xi Zhou, Mark A. Hasegawa-Johnson, and Thomas S.
Huang, Real-world Acoustic Event
Detection, Pattern Recognition Letters, 31, 12 (Sep. 2010),
1543-1551 (NSF 0807329).
- Mark Hasegawa-Johnson, Xiaodan Zhuang, Xi Zhou, Camille
Goudeseune, and Thomas
S. Huang, Adaptation
of tandem HMMs for non-speech audio event detection,
JASA 125:2730, 2009.
- Xiaodan Zhuang, Jing Huang, Gerasimos Potamianos and Mark
Hasegawa-Johnson,
Acoustic Fall Detection using Gaussian Mixture Models and GMM Supervectors,
ICASSP 2009, pp. 69-72 (NetCarity).
- Xiaodan Zhuang, Xi Zhou, Thomas S. Huang and Mark
Hasegawa-Johnson,
Feature Analysis and Selection for
Acoustic Event Detection, ICASSP pp. 17-20, 2008 (VACE; NSF 0414117; NSF
0534106).
- Xi Zhou, Xiaodan Zhuang, Ming Liu, Hao Tang, Mark Hasegawa-Johnson
and Thomas
Huang,
HMM-Based Acoustic Event Detection with AdaBoost Feature
Selection, Lecture Notes in Computer Science, 2008, Volume
4625/2008, 345-353 (VACE; NSF 0414117; NSF 0534106).
Human-Computer Interaction
Speech and Language Technology in Education
- Xuesong Yang, Xiang Kong, Mark Hasegawa-Johnson and Yanlu Xie,
Landmark-based Pronunciation Error
Identification on L2 Mandarin Chinese, Speech Prosody 2016
- Jia-Chen Ren, Lawrence Angrave and Mark
Hasegawa-Johnson, ClassTranscribe: A New
Tool with New Educational Opportunities for Student Crowdsourced
College Lecture Transcriptions, SLaTE 2015 (the
Workshop on Speech and Language Technology in Education)
-
Jia Chen Ren, Mark Hasegawa-Johnson, and Lawrence Angrave,
``ClassTranscribe,'' ICER Conference 2015
- Mark Hasegawa-Johnson, Camille Goudeseune, Jennifer Cole, Hank
Kaczmarski, Heejin Kim, Sarah King, Timothy Mahrt, Jui-Ting Huang,
Xiaodan Zhuang, Kai-Hsiang Lin, Harsh Vardhan Sharma, Zhen Li, and
Thomas
S. Huang, Multimodal
Speech and Audio User Interfaces for K-12 Outreach, APSIPA 2011, pages 256:1-8
(NSF 0534106; 0703624; 0807329)
-
Suma Bhat, Mark Hasegawa-Johnson and Richard Sproat,
Automatic Fluency Assessment by Signal-Level Measurement
of Spontaneous Speech,
2010 INTERSPEECH Satellite Workshop on Second Language
Studies: Acquisition, Learning, Education and Technology
- Su-Youn Yoon, Mark Hasegawa-Johnson, and Richard Sproat,
Landmark-based Automated
Pronunciation Error Detection, Proceedings of Interspeech 2010
pp. 614-617
- Suma Bhat, Richard Sproat, Mark Hasegawa-Johnson and
Fred Davidson, ``Automatic fluency assessment using
thin-slices of spontaneous speech,'' Language Testing
Research Colloquium 2010, Denver, CO
- Su-Youn Yoon, Richard Sproat, and Mark Hasegawa-Johnson,
Automated Pronunciation Scoring
using Confidence Scoring and Landmark-based SVM, Interspeech
80100:1-4, Brighton, September 2009
- Su-Youn Yoon, Mark Hasegawa-Johnson and Richard
Sproat, Automated
Pronunciation Scoring for L2 English Learners, CALICO workshop,
2009
- Tong Zhang, Mark Hasegawa-Johnson and Stephen
E. Levinson,
Cognitive State Classification in a spoken tutorial dialogue
system, Speech Communication 48(6):616-632, 2006 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen
E. Levinson,
Children's Emotion Recognition in an Intelligent Tutoring
Scenario. Interspeech, October, 2004 (NSF 0085980).
- Tong Zhang, Mark Hasegawa-Johnson, and Stephen
E. Levinson,
An empathic-tutoring system using spoken language, Australian
conference on computer-human interaction (OZCHI), 2003 (NSF 0085980).
Medical Applications
-
John Harvill, Mark Hasegawa-Johnson and Chang D. Yoo,
Frame-Level
Stutter Detection, Interspeech 2022
-
John Harvill, Yash Wani, Narendra Ahuja, Mark
Hasegawa-Johnson, David Chestek, Mustafa Alam, and David
Beiser, Estimation of Respiratory Rate from Breathing
Audio, 44th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, 2022
-
John Harvill, Yash R. Wani, Mark Hasegawa-Johnson,
Narendra Ahuja, David Beiser, and David
Chestek, Classification
of COVID-19 from Cough Using Autoregressive Predictive
Coding Pretraining and Spectral Data Augmentation, Proc. Interspeech 2021
-
John Harvill, Dias Issa, Mark Hasegawa-Johnson, Chang D. Yoo,
Synthesis
of New Words for Improved Dysarthric Speech Recognition on
an Expanded Vocabulary, Proc. ICASSP 2021, pp. 6428-6432
-
Tarek Sakakini, Jong Yoon Lee, Aditya Srinivasa, Renato
Azevedo, Victor Sadauskas, Kuangxiao Gu, Suma Bhat, Dan
Morrow, James Graumlich, Saqib Walayat, Mark
Hasegawa-Johnson, Donald Wilpern, and Ann
Willemsen-Dunlap, ``Automatic Text Simplification of
Health Materials in Low-Resource
Domains,'' LOUHI: 11th
International Workshop on Health Text Mining and
Information Analysis, 2020
-
Ali Abavisani and Mark
Hasegawa-Johnson, Automatic
Estimation of Intelligibility Measure for Consonants in
Speech, in Proc. Interspeech 2020, accepted for
publication
-
Daniel Morrow, Renato F.L. Azevedo, Leda Sari, Kuangxiao
Gu, Tarek Sakakini, Mark Hasegawa-Johnson, Suma Bhat,
James Graumlich, Thomas Huang, Andrew Hariharan, Yunxin
Shao, and Elizabeth Cox.
Closing the Loop in Computer
Agent/Patient Communication. Proceedings of the 2020
Human Factors and Ergonomics Society Annual Meeting,
Chicago, IL.
-
Daniel Morrow, Renato Azevedo, Leitão Ferreira, Rocio
Garcia-Retamero, Mark Hasegawa-Johnson, Thomas Huang, William Schuh,
Kuangxiao Gu, Yang Zhang, ``Contextualizing numeric clinical test
results for gist comprehension: Implications for EHR patient
portals,'' in Journal of Experimental Psychology: Applied, 25(1),
41-61, 2019
http://dx.doi.org/10.1037/xap0000203
-
Laureano Moro-Velazquez, JaeJin Cho, Shinji Watanabe, Mark
A. Hasegawa-Johnson, Odette Scharenborg, Heejin Kim, Najim
Dehak, Study
of the Performance of Automatic Speech Recognition Systems
in Speakers with Parkinson’s Disease, in
Proc. Interspeech 2019, pp. 3875-3879
-
Azevedo, R. F. L., Morrow, D., Gu, K., Huang, T.,
Hasegawa-Johnson, M., Soni, P., Tang, S., Sakakini, T., Bhat, S.,
Willemsen-Dunlap, A., and Graumlich,
J. (2019). The Influence of
Computer Agent Characteristics on User Preferences in Health
Contexts. Proceedings of the 2019 Human Factors and Ergonomics
Society Health Care Symposium.
-
Renato F. L. Azevedo, Dan Morrow, James Graumlich, Ann
Willemsen-Dunlap, Mark Hasegawa-Johnson, Thomas S. Huang, Kuangxiao
Gu, Suma Bhat, Tarek Sakakini, Victor Sadauskas, and Donald
J. Halpin, Using
conversational agents to explain medication instructions to older
adults, AMIA Annu Symp Proc. 2018, pp. 185–194. PMID:
30815056
- Azevedo, Renato; Morrow, Daniel G; Gu, Kuangxiao; Thomas Huang;
Hasegawa-Johnson, Mark Allan; James Graumlich; Victor Sadauskas;
Sakakini, Tarek J; Bhat, Suma Pallathadka; Willemsen-Dunlap, Ann
M.; Halpin, Donald J., Computer
Agents and Patient Memory for Medication Information. APA
Annual Meeting, 2018.
- Daniel Morrow, Mark Hasegawa-Johnson, Thomas Huang, William
Schuh, Renato Azevedo, Kuangxiao Gu, Yang Zhang, Bidisha Roy, Rocio
Garcia-Retamero, ``A Multidisciplinary Approach to Designing and
Evaluating Electronic Medical Record Portal Messages that Support
Patient Self-Care,'' Journal of Biomedical Informatics, in press
- Daniel Morrow, Mark Hasegawa-Johnson, Thomas Huang, William
Schuh, Rocio Garcia-Retamero, Renato Azevedo, Kuangxiao Gu, Yang
Zhang, and Bidisha
Roy, Multimedia formats can
improve older adult comprehension of clinical test results:
Implications for Designing Patient Portals, 28th APS Annual
Convention (Association for Psychological Science), May, 2016
(AHRQ R21HS022948)
-
Renato F. L. Azevedo, Daniel Morrow, Mark
Hasegawa-Johnson, Kuangxiao Gu, Dan Soberal, Thomas Huang,
William Schuh , Rocio Garcia-Retamero,
Improving Patient Comprehension of Numeric Health
Information,
Human Factors Conference, 2015 (AHRQ R21HS022948)
-
Harsh Vardhan Sharma and Mark
Hasegawa-Johnson, Acoustic
Model Adaptation using in-domain Background Models for
Dysarthric Speech Recognition, Computer Speech and
Language, Volume 27, Issue 6, September 2013, Pages
1147–1162, http://dx.doi.org/10.1016/j.csl.2012.10.002
- Panying Rong, Torrey Loucks, Heejin Kim, and Mark
Hasegawa-Johnson, Relationship
between kinematics, F2 slope and speech intelligibility in dysarthria
due to cerebral palsy, in Clinical Linguistics and Phonetics,
September 2012, Vol. 26, No. 9 , Pages 806-822
(doi:10.3109/02699206.2012.706686)
- Harsh Vardhan
Sharma, Acoustic Model Adaptation
for Recognition of Dysarthric Speech, Ph.D. Thesis, University of
Illinois, 2012
- Heejin Kim, Mark Hasegawa-Johnson and Adrienne
Perlman, Temporal and spectral
characteristics of fricatives in dysarthria, Journal of the
Acoustical Society of America 130:2446
- Heejin Kim, Mark Hasegawa-Johnson, and Adrienne Perlman, "Vowel
Contrast and Speech Intelligibility in Dysarthria," Folia Phoniatrica et
Logopaedica 63(4):187-194, 2011 (NIH DC0032301)
- Heejin Kim, Katie Martin, Mark Hasegawa-Johnson, and
Adrienne
Perlman, Frequency
of consonant articulation errors in dysarthric speech,
Clinical Linguistics & Phonetics 24(10):759-770, 2010 (NIH
DC0032301)
- Harsh Vardhan Sharma and Mark
Hasegawa-Johnson, State Transition
Interpolation and MAP Adaptation for HMM-based Dysarthric Speech
Recognition, HLT/NAACL Workshop on Speech and Language Processing
for Assistive Technology (SLPAT) pp. 72-79, 2010 (NSF 0534106).
- Heejin Kim, Mark Hasegawa-Johnson, Adrienne
Perlman, Acoustic
Cues to Lexical Stress in Spastic Dysarthria, Speech Prosody 2010
100891:1-4 (NIH R21-DC008090-A).
- Heejin Kim, Panying Rong, Torrey M. Loucks and Mark Hasegawa-Johnson,
Kinematic Analysis of Tongue
Movement Control in Spastic Dysarthria, Proceedings of Interspeech
2010, pp. 2578-2581 (NSF 0534106).
- Harsh Vardhan Sharma, Mark Hasegawa-Johnson, Jon Gunderson, and
Adrienne Perlman, Universal
Access: Speech Recognition for Talkers with Spastic Dysarthria,
Interspeech 42862:1-4, Brighton, September 2009 (NIH R21 DC008090A)
- Harsh Vardhan
Sharma,
Universal Access: Experiments in Automatic Recognition of Dysarthric
Speech, M.S. Thesis, 2008 (NSF 0534106).
- Heejin Kim, Mark Hasegawa-Johnson, Adrienne Perlman, Jon
Gunderson, Thomas Huang, Kenneth Watkin, and Simone
Frame,
Dysarthric Speech Database for Universal Access Research,
Interspeech 2008, pp. 1741-4 (NSF 0534106; NIH DC008090A;
Data).
- Weimo Zhu, Mark Hasegawa-Johnson, Karen Chapman-Novakofski,
and Arthur
Kantor,
Cellphone-Based Nutrition E-Diary. National Nutrient Database
Conference, 2007 (Robert Wood Johnson Foundation).
- Weimo Zhu, Mark Hasegawa-Johnson, Arthur Kantor, Dan Roth, Yong
Gao, Youngsik Park, and Lin Yang, "E-coder for Automatic Scoring
Physical Activity Diary Data: Development and Validation." ACSM,
2007 (Robert Wood Johnson Foundation).
- Mark Hasegawa-Johnson, Jonathan Gunderson, Adrienne Perlman,
and Thomas
Huang,
HMM-Based and SVM-Based Recognition of the Speech of Talkers with
Spastic Dysarthria, ICASSP III:1060-3, May 2006 (NSF 0534106; NIH
DC008090A).
-
Arthur Kantor, Weimo Zhu and Mark Hasegawa-Johnson,
Restricted domain speech classification using automatic
transcription and SVMs,
Midwest Computational Linguistics Colloquium, 2005
- Weimo Zhu, Mark Hasegawa-Johnson, and Mital Arun
Gandhi,
Accuracy of Voice-Recognition Technology in Collecting Behavior Diary
Data. Association of Test Publishers (ATP): Innovations in
Testing, March 2005 (Robert Wood Johnson Foundation).
Multimedia Analytics
- Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson and Paris
Smaragdis, Singing-Voice Separation
From Monaural Recordings Using Deep Recurrent Neural Networks.
Proceedings of ISMIR 2014
- Kai-Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King,
Mark Hasegawa-Johnson, and Thomas
S. Huang, Saliency-Maximized Audio
Visualization and Efficient Audio-Visual Browsing for
Faster-than-Real-Time Human Acoustic Event Detection, ACM
Transactions on Applied Perception, in press (NSF 0807329)
-
Camille Goudeseune, 2012.
Effective browsing of long audio recordings.
ACM International Workshop on Interactive Multimedia on
Mobile and Portable Devices, 2012 (NSF 0807329;
Software on github,
Software as a TGZ).
- Kai-Hsiang Lin, Xiaodan Zhuang, Camille Goudeseune, Sarah King,
Mark Hasegawa-Johnson and Thomas
Huang, Improving Faster-than-Real-Time
Human Acoustic Event Detection by Saliency-Maximized Audio
Visualization, ICASSP 2012, pp. 2277-2280 (NSF 0807329)
- Xiaodan Zhuang, Modeling Audio
and Visual Cues for Real-world Event Detection, Ph.D. Thesis,
University of Illinois, April 2011
- David Cohen, Camille Goudeseune and Mark Hasegawa-Johnson. 2009.
Efficient Simultaneous Multi-Scale
Computation of FFTs. Technical report GT-FODAVA-09-01 (NSF
0807329;
Software).
- David
Petruncio, Evaluation
of Various Features for Music Genre Classification with Hidden Markov
Models. B.S. Thesis, 2002.
- James Beauchamp, Heinrich Taube, Sever Tipei, Scott Wyatt, Lippold
Haken and Mark
Hasegawa-Johnson, Acoustics,
Audio, and Music Technology Education at the University of Illinois,
JASA, 110(5):2961, 2001.
- Mark Hasegawa-Johnson, Jul Cha, Shamala Pizza and Katherine
Haker,
CTMRedit: A case study in human-computer interface design,
International Conference On Public Participation and Information
Tech., Lisbon, pp. 575-584, 1999 (NIH DC0032301;
Software).
- Robin Bargar, Insook Choi, Sumit Das, Camille Goudeseune.
1994.
Model-based interactive sound for an immersive virtual environment.
Proc. Intl. Computer Music Conf., 471-474, Aarhus, Denmark.
(software,
tutorial)
Human Speech and Language
Speech Production
- Roger Serwy, Hilbert Phase Methods
for Glottal Activity Detection, Ph.D. Thesis, 2017
- Karen Livescu, Frank Rudzicz, Eric Fosler-Lussier, Mark
Hasegawa-Johnson and Jeff Bilmes, ``Speech Production in Speech
Technologies: Introduction to the CSL Special Issue,'' Computer
Speech and Language 36:165-172, 2016
- G. Andrew, R. Arora, S. Bharadwaj, J. Bilmes,
M. Hasegawa-Johnson, and K. Livescu, ``Using articulatory
measurements to learn better acoustic features.'' In Proc. Workshop
on Speech Production in Automatic Speech Recognition, Lyon, France,
2013
- Amit Juneja and Mark Hasegawa-Johnson, ``Experiments on
context-awareness and phone error propagation in human and machine
speech recognition,'' Proc. Workshop on Speech Production in
Automatic Speech Recognition, Lyon, France, 2013
-
Hosung Nam, Vikramjit Mitra, Mark Tiede, Mark
Hasegawa-Johnson, Carol Espy-Wilson, Elliot Saltzman, and
Louis Goldstein, ``A procedure for estimating gestural
scores from speech acoustics,'' in J. Acoustical
Society of America, 132(6):3980-3989, 2012
-
Mark Hasegawa-Johnson, Shamala Pizza, Abeer Alwan, Jul Cha, and
Katherine
Haker,
Vowel Category Dependence of the Relationship Between Palate Height,
Tongue Height, and Oral Area, Journal of Speech, Language, and
Hearing Research, vol. 46, no. 3, pp. 738-753, 2003 (NIH DC0032301;
Data).
-
Yanli Zheng, Mark Hasegawa-Johnson, and Shamala
Pizza, PARAFAC Analysis of
the Three-Dimensional Tongue Shape, Journal of the
Acoustical Society of America, vol. 113, no. 1,
pp. 478-486, January 2003 (NIH DC0032301).
-
Mark
Hasegawa-Johnson,
Line Spectral Frequencies are the Poles and Zeros of a Discrete
Matched-Impedance Vocal Tract Model, Journal of the Acoustical
Society of America, vol. 108, no. 1, pp. 457-460, 2000 (NIH
DC0032301).
-
Yanli Zheng and Mark
Hasegawa-Johnson,
Three-Dimensional Tongue Shape Factor Analysis, American
Speech-Language Hearing Association National Convention, Washington,
DC, 2000. Published in the magazine ASHA Leader, 5(16):144 (NIH
0032301).
-
Mark
Hasegawa-Johnson,
Preliminary Work and Proposed Continuation: Imaging of Speech Anatomy
and Behavior. Talk given at the Universities of Illinois
Inter-campus Biomedical Imaging Forum, 2001 (NIH 0032301).
-
Mark Hasegawa-Johnson, Jul Cha and
Katherine Haker,
CTMRedit: A Matlab-based tool for segmenting and interpolating MRI and
CT images in three orthogonal planes, 21st Annual International
Conference of the IEEE/EMBS Society, pp. 1170. 1999 (NIH 0032301).
-
Mark Hasegawa-Johnson, "Combining magnetic resonance image
planes in the Fourier domain for improved spatial resolution."
International Conference On Signal Processing Applications and
Technology, Orlando, FL, pp. 81.1-5, 1999 (NIH 0032301)
-
Mark
Hasegawa-Johnson,
Electromagnetic Exposure Safety of the Carstens Articulograph
AG100, Journal of the Acoustical Society of America, vol. 104,
pp. 2529-2532, 1998 (NIH 0032301).
-
Mark A. Johnson, "Using beam elements to model the vocal fold length
in breathy voicing," JASA 91:2420-2421, 1992.
Acquisition
-
Junrui Ni, Mark Hasegawa-Johnson and Odette Scharenborg,
The
Time-Course of Phoneme Category Adaptation in Deep Neural
Networks, in Lecture Notes in Artificial Intelligence 11816:3-18,
Proceedings of the 7th International Conference SLSP, Statistical
Language and Speech Processing, Ljubljana, Slovenia, October 14-16
2019
- Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson
and Najim
Dehak, Visualizing
Phoneme Category Adaptation in Deep Neural Networks,
Interspeech 2018
- Su-Youn Yoon, Lisa Pierce, Amanda Huensch, Eric Juul, Samantha
Perkins, Richard Sproat, and Mark
Hasegawa-Johnson, Construction of a rated
speech corpus of L2 learners' speech, CALICO journal, 2009
(Data Access:
Rated L2 Speech Corpus (public data))
- Su-Youn Yoon, Lisa Pierce, Amanda Huensch, Eric Juul, Samantha
Perkins, Richard Sproat, and Mark Hasegawa-Johnson, "Construction of a
rated speech corpus of L2 learners' speech," CALICO workshop, 2008
- Soo-Eun Chang, Nicoline Ambrose, Kirk Erickson, and Mark
Hasegawa-Johnson,
"Brain Anatomy Differences in Childhood Stuttering." Neuroimage,
(NIH DC05210, Illinois Research Board).
-
Soo-Eun Chang, Kirk I. Erickson, Nicoline G. Ambrose, Mark
Hasegawa-Johnson, and C.L. Ludlow, "Deficient white matter development
in left hemisphere speech-language regions in children who stutter."
Society for Neuroscience, Atlanta, GA, 2006 (NIH DC05210, Illinois
Research Board).
-
Soo-Eun Chang, Nicoline Ambrose, and Mark Hasegawa-Johnson,
"An MRI (DTI) study on children with persistent developmental
stuttering." 2004 ASHA Convention, American Speech Language and
Hearing Association, November, 2004 (Illinois Research Board).
Auditory Perception
- Odette Scharenborg, Jiska Koemans, Cybelle Smith, Mark
A. Hasegawa-Johnson, Kara
D. Federmeier, The
Neural Correlates Underlying Lexically-Guided Perceptual
Learning, in Proc. Interspeech 2019, pp. 1223-1227
- Mary Pietrowicz, Carla Agurto, Jonah Casebeer, Mark
Hasegawa-Johnson, Karrie Karahalios, Guillermo Cecchi,
Dimensional
Analysis of Laughter in Female Conversational Speech, in
Proc. ICASSP 2019, pp. 6600-6604, doi: 10.1109/ICASSP.2019.8683566
- Wenda Chen, Mark Hasegawa-Johnson, and Nancy F. Chen,
``Mismatched Crowdsourcing based Language Perception for
Under-resourced Languages.'' Procedia Computer Science 81:23--29,
2016 (doi:10.1016/j.procs.2016.04.025; ASTAR ADSC)
- Yanlu Xie, Mark Hasegawa-Johnson, Leyuan Qu, Jinsong Zhang,
Landmark of Mandarin Nasal Codas and its
Application in Pronunciation Error Detection, Proc. ICASSP 2016
- Kyungtae Kim, Kai-Hsiang Lin, Dirk B Walther, Mark A
Hasegawa-Johnson, and Thomas S
Huang, Automatic Detection of
Auditory Salience with Optimized Linear Filters Derived from Human
Annotation, Pattern Recognition Letters 38(1):78-85, doi:10.1016/j.patrec.2013.11.010
(Data) (NSF 0803219)
- Jeremy
Tidemann, Characterization of the
Head-Related Transfer Function using Chirp and Maximum Length Sequence
Excitation Signals, M.S. Thesis, 2011.
- Bryce E Lobdell, Jont B Allen, Mark A Hasegawa-Johnson,
Intelligibility predictors and neural
representation of speech, Speech Communication, in press
- Bryce
Lobdell, Models
of Human Phone Transcription in Noise Based on Intelligibility
Predictors, Ph.D. Thesis, 2009
- Yoonsook Mo, Jennifer Cole and Mark
Hasegawa-Johnson, How do ordinary
listeners perceive prosodic prominence? Syntagmatic vs. Paradigmatic
comparison. Spring Meeting of the ASA, 2009 (NSF 0703624)
- Bryce Lobdell, Mark Hasegawa-Johnson, and Jont
B. Allen, Human Speech Perception
and Feature Extraction, Interspeech 2008
- Yoonsook Mo, Jennifer Cole and Mark Hasegawa-Johnson,
Frequency
and repetition effects outweigh phonetic detail in
prominence perception, LabPhon 11 pp. 29-30, 2008.
- Mark
Hasegawa-Johnson,
Bayesian Learning for Models of Human Speech Perception, IEEE
Workshop on Statistical Signal Processing, St. Louis, MO, 2003,
393-396 (NSF 0132900).
-
Sumiko Takayanagi, Mark Hasegawa-Johnson, Laurie S. Eisner and
Amy Schaefer-Martinez,
Information theory and variance estimation techniques in the analysis
of category rating data and paired comparisons. JASA, 102:3091,
1997
Distinctive Features
- Xiang Kong, Xuesong Yang, Jeung-Yoon Choi, Mark
Hasegawa-Johnson and Stefanie
Shattuck-Hufnagel, Landmark-based
consonant voicing detection on multilingual corpora,
Acoustics 17, Boston, June 25, 2017
- Di He, Boon Pang Lim, Xuesong Yang, Mark
Hasegawa-Johnson, and Deming
Chen, Selecting
frames for automatic speech recognition based on acoustic
landmarks, Acoustics 17, Boston, June 25, 2017
-
Mahmoud
Abunasser, Computational
Measures of Linguistic Variation: A Study of Arabic
Varieties, Ph.D. Thesis, UIUC, 2015
-
Elabbas Benmamoun and Mark Hasegawa-Johnson,
How Different
are Arabic Dialects from Each Other and from Classical
Arabic, 6th Annual Arabic Linguistics Symposium,
ISBN 9789027236180, Ifrane, Morocco, June 2013.
-
Hosung Nam, Vikramjit Mitra, Mark Tiede, Mark Hasegawa-Johnson,
Carol Espy-Wilson, Elliot Saltzman and Louis Goldstein, ``Automatic
gestural annotation of the U. Wisconsin X-ray Microbeam corpus,''
Workshop on New Tools and Methods for Very Large Scale Phonetics
Research, University of Pennsylvania, Jan. 2011
-
Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis
Goldstein, and Elliot
Saltzman, Articulatory
Phonological Code for Word Recognition, Interspeech, 34549:1-4,
Brighton, September 2009 (NSF 0703624)
-
Sarah Borys, An SVM Front
End Landmark Speech Recognition System, M.S. Thesis,
2008.
-
Xiaodan Zhuang, Hosung Nam, Mark Hasegawa-Johnson, Louis
Goldstein, and Elliot
Saltzman,
The Entropy of Articulatory Phonological Code: Recognizing Gestures
from Tract Variables, Interspeech 2008 (NSF 0703624, NSF 0703782,
NIH DC02717).
-
Rahul Chitturi and Mark
Hasegawa-Johnson, Novel
Time-Domain Multi-class SVMs for Landmark Detection, Interspeech,
September 2006.
-
Mark
Hasegawa-Johnson, "Time-Frequency
Distribution of Partial Phonetic Information Measured Using Mutual
Information," Interspeech IV:133-136, Beijing, 2000
(Data).
-
Mark
A. Hasegawa-Johnson, Burst
spectral measures and formant frequencies can be used to
accurately discriminate stop place of articulation,
JASA, 98:2890, 1995
-
Mark A. Johnson, A mapping between trainable generalized
properties and the acoustic correlates of distinctive
features, MIT Speech Communication Group Working Papers,
vol. 9, pp. 94-105, 1994.
-
Mark Johnson, Automatic
context-sensitive measurement of the acoustic correlates of
distinctive features, ICSLP, Yokohama, pp. 1639-1643, 1994
-
Mark
A. Johnson, A
mapping between trainable generalized properties and the
acoustic correlates of distinctive features, JASA,
vol. 94, p. 1865, 1993.
Phonetics
-
Mark
Hasegawa-Johnson, Unwritten
Languages as a Test Case for the Theory of Phonetic
Universals, ISCSLP 2018
-
İ. Yücel Özbek, Mark Hasegawa-Johnson, and Mübeccel
Demirekler,
Formant Trajectories for
Acoustic-to-Articulatory Inversion, Interspeech 95957:1-4,
Brighton, September 2009
-
Yanli Zheng,
Feature Extraction and Acoustic Modeling for Speech
Recognition. Ph.D. Thesis, 2005 (NSF 0132900;
Software)
- Yanli Zheng, Mark Hasegawa-Johnson, and Sarah
Borys, Stop
Consonant Classification by Dynamic Formant
Trajectory. Interspeech pp. 396-9, October, 2004 (NSF
0132900).
-
Yanli Zheng and Mark
Hasegawa-Johnson,
Formant Tracking by Mixture State Particle Filter, ICASSP 2004
(NSF 0132900).
-
Yanli Zheng and Mark
Hasegawa-Johnson,
Particle Filtering Approach to Bayesian Formant Tracking, IEEE
Workshop on Statistical Signal Processing, September, 2003, 581-584
(NSF 0132900).
-
Mark
A. Hasegawa-Johnson, Formant
and Burst Spectral Measurements with Quantitative Error
Models for Speech Sound Classification, Ph.D. Thesis,
MIT, 1996
Prosody Analysis
- Yang Zhang, Gautham Mysore, Florian Berthouzoz and Mark
Hasegawa-Johnson Analysis of Prosody
Increment Induced by Pitch Accents for Automatic Emphasis
Correction, Speech Prosody 2016
- Mark Hasegawa-Johnson, ``Four lectures on phonology, prosody, and
automatic speech recognition.'' Winter School on Speech and Audio
Processing, January 8-11, 2016 (WiSSAP16), Chennai,
India. lecture
1, lecture
2, lecture
3, lecture 4
- Preethi Jyothi, Jennifer Cole, Mark Hasegawa-Johnson and Vandana Puri, An Investigation of Prosody in Hindi Narrative Speech, Proceedings of Speech Prosody 2014 (QNRF 09-410-1-069)
- Tim Mahrt, Jennifer Cole, Margaret Fleck and Mark
Hasegawa-Johnson, Accounting
for Speaker Variation in the Production of Prominence using the
Bayesian Information Criterion, Speech Prosody 2012 (NSF
0703624)
- Jui-Ting Huang, Semi-Supervised
Learning for Acoustic and Prosodic Modeling in Speech
Applications, Ph.D. thesis, University of Illinois, 2012
- Tim Mahrt, Jui-Ting Huang, Yoonsook Mo, Mark Hasegawa-Johnson, and
Jennifer
Cole, Optimal
models of prosodic prominence using the Bayesian information
criterion, Proc. Interspeech pp. 2037-2040, 2011
- Yoonsook Mo, Jennifer Cole, and Mark Hasegawa-Johnson,
Prosodic effects on temporal structure of monosyllabic CVC words in
American English, Speech Prosody 2010 100208:1-4 (NSF
0703624).
- Jennifer Cole, Yoonsook Mo, and Mark Hasegawa-Johnson,
Signal-based
and expectation-based factors in the perception of prosodic
prominence, Journal of Laboratory Phonology, in press (NSF
0703624)
- Yoonsook Mo, Jennifer Cole and Mark
Hasegawa-Johnson, Prosodic effects
on vowel production: evidence from formant structure, Interspeech
19096:1-4, Brighton, September 2009 (NSF 0703624)
- Yoonsook Mo, Jennifer Cole and Mark
Hasegawa-Johnson, How do ordinary
listeners perceive prosodic prominence? Syntagmatic vs. Paradigmatic
comparison. Spring Meeting of the ASA, 2009 (NSF 0703624)
- Taejin Yoon, Jennifer Cole and Mark
Hasegawa-Johnson,
On the edge: Acoustic cues to layered prosodic domains, in
Proc. International Congress on Phonetic Sciences (ICPhS) 1264:1017-1020
Saarbrücken, August, 2007 (NSF 0414117).
- Taejin Yoon, Jennifer Cole and Mark
Hasegawa-Johnson,
On the edge: Acoustic cues to layered prosodic domains. 81st
Annual Meeting of the Linguistic Society of America, Anaheim, CA,
January 5, 2007 (NSF 0414117).
- Jennifer Cole, Heejin Kim, Hansook Choi, and Mark
Hasegawa-Johnson, "Prosodic effects on acoustic cues to stop voicing
and place of articulation: Evidence from Radio News speech." Journal of
Phonetics 35:180-209, 2007 (NSF 0414117).
- Heejin Kim, Taejin Yoon, Jennifer Cole and Mark Hasegawa-Johnson,
Acoustic
differentiation of L- and L-L% in Switchboard and Radio News
speech. Proceedings of Speech Prosody 2006, Dresden (NSF
0414117).
- Rajiv Reddy, Analysis of Pitch Contours
in Repetition-Disfluency Using Stem-ML, B.S. Thesis, 2006
- Taejin Yoon, "Mapping Syntax and
Prosody." Midwest Computational Linguistics Colloquium, Columbus,
OH, 2005 (NSF 0414117).
-
Jeung-Yoon Choi, Mark Hasegawa-Johnson, and Jennifer Cole, "Finding Intonational Boundaries Using
Acoustic Cues Related to the Voice Source." Journal of the Acoustical
Society of America 118(4):2579-88, 2005 (Illinois CRI).
-
Jennifer Cole, Mark Hasegawa-Johnson, Chilin Shih, Eun-Kyung Lee,
Heejin Kim, H. Lu, Yoonsook Mo, and Tae-Jin Yoon (2005). "Prosodic Parallelism as a Cue to
Repetition and Hesitation Disfluency," Proceedings of DISS'05 (An
ISCA Tutorial and Research Workshop), Aix-en-Provence, France,
pp. 53-58 (NSF 0414117).
-
Taejin Yoon, Sandra Chavarria, Jennifer Cole, and Mark
Hasegawa-Johnson,
Intertranscriber Reliability of Prosodic Labeling on Telephone
Conversation Using ToBI. Interspeech, October 2004 (Illinois
CRI).
-
Tae-Jin Yoon, Heejin Kim, and Sandra Chavarría, "Local Acoustic Cues Distinguishing Two
Levels of Prosodic Phrasing: Speech Corpus Evidence," LabPhon 9,
University of Illinois at Urbana-Champaign, 2004 (Illinois CRI).
-
Heejin Kim, Jennifer Cole, Hansook Choi, and Mark
Hasegawa-Johnson,
The Effect of Accent on Acoustic Cues to Stop Voicing and Place of
Articulation in Radio News Speech, Speech Prosody 2004, Nara,
Japan, March 2004, pp. 29-32 (Illinois CRI).
-
Sandra Chavarria, Taejin Yoon, Jennifer Cole, and Mark
Hasegawa-Johnson,
Acoustic differentiation of ip and IP boundary levels: Comparison of
L- and L-L% in the Switchboard corpus, Speech Prosody 2004, Nara,
Japan, March 2004, 333-336 (Illinois CRI).
-
Jennifer Cole, Hansook Choi, Heejin Kim and Mark
Hasegawa-Johnson, The
effect of accent on the acoustic cues to stop voicing in Radio
News speech, ICPhS 2003, pp. 2665-2668
- Mark A. Johnson, "Analysis of durational rhythms in two poems by
Robert Frost," MIT Speech Communication Group Working Papers, vol. 8,
pp. 29-42, 1992.