Sparsh Jain

Namaste, I'm Sparsh

I am a Research Associate at the renowned AI4Bharat Lab at IIT Madras, where I have the privilege of being guided by Prof. Mitesh M. Khapra, Prof. Anoop Kunchukuttan, and Dr. Raj Dabre. My current research focuses on developing resources and evaluation methodologies for large-scale multilingual models across various modalities, including vision and audio. Previously, I was a research intern right here at AI4Bharat (I liked the work so much, I decided to stay!), where I had the valuable opportunity to contribute to the IndicLLMSuite project. I also gained industry experience as a data science intern at Culinda, working under the supervision of Shrasthi Singal. I earned my Bachelor's degree in Computer Science and Engineering from Maharaja Agrasen Institute of Technology, Delhi. I am deeply passionate about new advances in generative AI, exploring the frontiers of AI, and building robust, scalable models, datasets, and benchmarks. If you'd like to chat about research, academia, or potential collaborations, please feel free to reach out to me at sjshiva8287@gmail.com.

Latest News

7th Aug 2025

Our paper, “Mark My Words: A Robust Multilingual Model for Punctuation in Text and Speech Transcripts,” has been accepted to the MELT Workshop at COLM 2025.

1st Jul 2025

Our new preprint, “Mark My Words: A Robust Multilingual Model for Punctuation in Text and Speech Transcripts,” is now available on arXiv. Check it out and let us know your thoughts!

16th Jun 2025

Thrilled to be selected as a volunteer for ACL 2025 in Vienna, Austria 🇦🇹! Grateful for the opportunity.

25th May 2025

After cooking for a while, our work “Bhasaanuvaad” has been accepted to the main conference at ACL 2025.

14th Aug 2024

Our paper, IndicLLMSuite, has received an Outstanding Paper Award 🏅 at ACL 2024!

Recent Publications

Mark My Words: A Robust Multilingual Model for Punctuation in Text and Speech Transcripts

Sidharth Pulipaka, Sparsh Jain, Ashwin Sankar, Raj Dabre

1st Workshop on Multilingual and Equitable Language Technologies (MELT) @ COLM 2025

MELT 2025

Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages

Ashwin Sankar*, Sparsh Jain*, Nikhil Narasimhan, Devilal Choudhary, Dhairya Suman, Mohammed Safi Ur Rahman Khan, Anoop Kunchukuttan, Mitesh M. Khapra, Raj Dabre

The 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria

ACL 2025

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

Mohammed Safi Ur Rahman Khan*, Priyam Mehta*, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad B, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra

The 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand

ACL 2024
🏆 Outstanding Paper Award

© 2025 Sparsh Jain. All rights reserved.