About me

I’m a fourth-year Ph.D. candidate at X-LANCE Lab, Shanghai Jiao Tong University, supervised by Prof. Mengyue Wu and Prof. Kai Yu. My research interests include audio / speech / music understanding and generation, and large language models.

You can find my CV here.

I expect to graduate in June 2025 and am open to job opportunities in 2025. Please feel free to contact me via LinkedIn or WeChat.

🚀 Research & Projects

My research mainly focuses on general audio understanding and generation, including tasks such as audio captioning, text-to-audio grounding, audio-text retrieval, and text-to-audio generation. I am also interested in speech / music understanding and generation, and their interaction with general audio.

Here are some selected publications and projects:

Audio Understanding
- Audio captioning with enhanced accuracy, diversity, temporal accuracy and efficiency
- Text-to-audio grounding: the task and a weakly-supervised training paradigm
Audio-Text Data Curation / Augmentation and Audio-Text Pre-Training
Audio Generation
- Visual-enhanced diverse generation
- PicoAudio with a temporal-sensitive evaluation benchmark
Audio Codec for Audio LLM
- SemantiCodec
Content Creation with LLM Agent
- AI storytelling for children

For my full publication list, please refer to the publication page.

📖 Education Experience

2019.9 ~ 2025.6, Ph.D., Shanghai Jiao Tong University, Shanghai, China
- Supervised by Prof. Mengyue Wu and Prof. Kai Yu.
- Member of the Wu Wenjun Honorary Doctoral Class in Artificial Intelligence
2023.10 ~ 2024.4, Visiting Ph.D. student, University of Surrey, Guildford, UK
- Supervised by Prof. Mark D. Plumbley and Prof. Wenwu Wang
2015.9 ~ 2019.6, Bachelor, Shanghai Jiao Tong University, Shanghai, China
- Member of the international pilot class
- Supervised by Leyun Wang
- 2016 National Scholarship

Xuenan Xu