About me
I'm a fourth-year Ph.D. candidate at X-LANCE Lab, Shanghai Jiao Tong University, supervised by Prof. Mengyue Wu and Prof. Kai Yu. My research interests include audio / speech / music understanding and generation, and large language models.
You can find my CV here.
I expect to graduate in June 2025 and am open to job opportunities in 2025. Please feel free to contact me via LinkedIn or WeChat.
Research & Projects
My research mainly focuses on general audio understanding and generation, including tasks such as audio captioning, text-to-audio grounding, audio-text retrieval, and text-to-audio generation. I am also interested in speech / music understanding and generation, and their interaction with general audio.
Here are some selected publications and projects:
- Audio Understanding
- Audio captioning with enhanced accuracy, diversity, temporal accuracy and efficiency
- Text-to-audio grounding: the task and weakly-supervised training paradigm
- Audio-Text Data Curation / Augmentation and Audio-Text Pre-Training
- Audio Generation
- Visual-enhanced diverse generation
- PicoAudio with a temporal-sensitive evaluation benchmark
- Audio Codec for Audio LLM
- Content Creation with LLM Agent
For my full publication list, please refer to the publication page.
Education Experience
- 2019.9 ~ 2025.6, Ph.D., Shanghai Jiao Tong University, Shanghai, China
  - Supervised by Prof. Mengyue Wu and Prof. Kai Yu
  - Member of the Wu Wenjun Honorary Doctoral Class in Artificial Intelligence
- 2023.10 ~ 2024.4, Visiting Ph.D. student, University of Surrey, Guildford, UK
  - Supervised by Prof. Mark D. Plumbley and Prof. Wenwu Wang
- 2015.9 ~ 2019.6, Bachelor, Shanghai Jiao Tong University, Shanghai, China
  - Member of the international pilot class
  - Supervised by Leyun Wang
  - 2016 National Scholarship