Siddhant Arora

I am a Ph.D. student at Carnegie Mellon University's Language Technologies Institute, advised by Prof. Shinji Watanabe. My research interests lie in Natural Language Processing (NLP) and Speech Processing, particularly Spoken Dialogue Systems, Spoken Language Understanding, Spoken Language Models, and Speech Foundation Models. My vision is to build real-time conversational agents that can truly listen, think, and speak like humans.

I’m passionate about building the next generation of end-to-end spoken dialogue systems. My recent work explores duplex interactions and chain-of-thought training, with a strong focus on evaluating real-time conversational abilities such as turn-taking. I’m excited about the potential of large language models and speech foundation models to transform the conversational AI space.

During my Ph.D., I have worked as a research intern at leading technology companies, including Meta Reality Labs (spoken dialogue systems), Apple Intelligence (audio foundation models), and IBM Research (streaming ASR models). These experiences have given me valuable insights into both academic research and industrial applications.

I have been fortunate to receive several recognitions for my work, including the IEEE Ganesh N. Ramaswamy Memorial Student Grant, awarded for one of the top papers at ICASSP 2023, and first-place wins in multiple tracks at the ICASSP 2023 STOP Challenge. I was also awarded the Institute Silver Medal for securing Department Rank 1 in my undergraduate program at IIT Delhi.

Email / CV / Google Scholar / LinkedIn / Twitter / GitHub

Recent and Selected Publications

Chain-of-Thought Training for Open E2E Spoken Dialogue Systems

Siddhant Arora, Yifan Peng, Jinchuan Tian, Hayato Futami, Jee-weon Jung, Jiatong Shi, Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe

INTERSPEECH, 2025

ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems

Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, Shinji Watanabe

NAACL, 2025 Conference Demo Track

GitHub / Demo

On the landscape of spoken language models: A comprehensive survey

Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, Yifan Peng, Haibin Wu, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

arXiv, 2025

On the Evaluation of Speech Foundation Models for Spoken Language Understanding

Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

ACL, 2024 Findings

Semi-Autoregressive Streaming ASR With Label Context

Siddhant Arora, George Saon, Shinji Watanabe, Brian Kingsbury

ICASSP, 2024 Conference

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe

ICASSP, 2023 Conference (IEEE Ganesh N. Ramaswamy Memorial Student Grant)

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

ICASSP, 2022

Project page / GitHub / Demo

Total Publications: 49+ Conference/Journal Papers
For a complete list of publications, please see my CV or Google Scholar.