Tutorial: Self-supervised Representation Learning for Speech Processing

Hung-yi Lee

National Taiwan University

Hung-yi Lee received his Ph.D. from National Taiwan University (NTU) and was a visiting scientist at the Spoken Language Systems Group of MIT CSAIL. He is an associate professor at NTU. He co-organized the special session on "New Trends in Self-Supervised Speech Processing" at Interspeech (2020) and the workshop on "Self-Supervised Learning for Speech and Audio Processing" at NeurIPS (2020).

Abdelrahman Mohamed

Meta AI

Abdelrahman Mohamed is a research scientist at Meta AI. He received his Ph.D. from the University of Toronto, where he was part of the team that started the deep learning revolution in spoken language processing in 2009. His recent work focuses on improving, using, and benchmarking learned speech representations, e.g., HuBERT, wav2vec 2.0, TextlessNLP, and SUPERB.

Shinji Watanabe

Carnegie Mellon University

Shinji Watanabe is an Associate Professor at CMU. He was previously a research scientist at NTT, Japan, a visiting scholar at Georgia Tech, a senior principal research scientist at MERL, and an associate research professor at JHU. He has published more than 200 peer-reviewed papers and served as an Associate Editor of IEEE TASLP. He has been a member of several technical committees, including APSIPA SLA, IEEE SPS SLTC, and MLSP.

Tara Sainath

Google Research

Tara Sainath is a Principal Research Scientist at Google. She received her Ph.D. from MIT in the Spoken Language Systems Group. She is an IEEE and ISCA Fellow and the recipient of the 2021 IEEE SPS Industrial Innovation Award. Her research focuses on applications of deep neural networks to automatic speech recognition, and she has been very active in the community, organizing workshops and special sessions on this topic.

Karen Livescu

Toyota Technological Institute at Chicago (TTIC)

Karen Livescu is a Professor at TTI-Chicago. She completed her Ph.D. at MIT in the Spoken Language Systems Group. She is an ISCA Fellow and an IEEE Distinguished Lecturer, and has served as a program chair for ICLR 2019 and Interspeech 2022. Her recent work includes multi-view representation learning, acoustic word embeddings, visually grounded speech models, spoken language understanding, and automatic sign language recognition.

Shang-Wen Li

Meta AI

Shang-Wen Li is a Research and Engineering Manager at Meta AI. He previously worked on Apple Siri, Amazon Alexa, and AWS. He completed his Ph.D. in 2016 in the Spoken Language Systems Group of MIT CSAIL. He co-organized the workshop on "Self-Supervised Learning for Speech and Audio Processing" at NeurIPS (2020) and AAAI (2022), and the workshop on "Meta Learning and Its Applications to Natural Language Processing" at ACL (2021).

Shu-wen Yang

National Taiwan University

Shu-wen Yang is a Ph.D. student at National Taiwan University. He co-created the Speech processing Universal PERformance Benchmark (SUPERB), a benchmark for self-supervised learning in speech. Before SUPERB, he created the S3PRL toolkit with Andy T. Liu, which supports numerous pretrained models and recipes for both pre-training and benchmarking. He gave a tutorial at the Machine Learning Summer School, Taiwan, in 2021.

Katrin Kirchhoff

Amazon

Katrin Kirchhoff is a Director of Applied Science at Amazon Web Services, where she heads several teams in speech and audio processing. She was a Research Professor at the University of Washington, Seattle, for 17 years, where she co-founded the Signal, Speech and Language Interpretation Lab. She has served on the editorial boards of Speech Communication and Computer Speech and Language, and was a member of the IEEE Speech Technical Committee.