I am a Ph.D. student at Carnegie Mellon University's Language Technologies Institute, advised by Prof. Shinji Watanabe. My research interests span Natural Language Processing (NLP) and Speech Processing, particularly Spoken Dialogue Systems, Spoken Language Understanding, Spoken Language Models, and Speech Foundation Models. My vision is to build real-time conversational agents that can truly listen, think, and speak like humans.
I’m passionate about building the next generation of end-to-end spoken dialogue systems. My recent work explores duplex interactions and chain-of-thought training, with a strong focus on evaluating real-time conversational abilities such as turn-taking. I’m excited about the potential of large language models and speech foundation models to transform the conversational AI space.
During my Ph.D., I have worked as a research intern at leading technology companies, including Meta Reality Labs (spoken dialogue systems), Apple Intelligence (audio foundation models), and IBM Research (streaming ASR models). These experiences have given me valuable insights into both academic research and industrial applications.
I have been fortunate to receive several recognitions for my work, including the IEEE Ganesh N. Ramaswamy Memorial Student Grant, awarded for one of the top papers at ICASSP 2023, and first-place wins in multiple tracks at the ICASSP 2023 STOP Challenge. I was also awarded the Institute Silver Medal for securing Department Rank 1 during my undergraduate studies at IIT Delhi.
Total Publications: 49+ Conference/Journal Papers
For a complete list of publications, please see my CV or Google Scholar.