Publications
You can also find my articles on my Google Scholar profile.
Audio Captioning
- Efficient Audio Captioning with Encoder-Level Knowledge Distillation, Xuenan Xu, Haohe Liu, Mengyue Wu, et al.
- Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu.
- Enhance Temporal Relations in Audio Captioning with Sound Event Detection, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu.
- Can Audio Captions be Evaluated with Image Caption Metrics?, Zelin Zhou, Zhiling Zhang, Xuenan Xu, Mengyue Wu, Kai Yu.
- Diversity-Controllable and Accurate Audio Captioning Based on Neural Condition, Xuenan Xu, Mengyue Wu, Kai Yu.
- Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu.
- A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu.
Audio-Text Retrieval
- Audio-Text Retrieval in Context, Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu.
Audio Event Detection and Separation
- Towards Weakly Supervised Text-to-Audio Grounding, Xuenan Xu, Ziyang Ma, Mengyue Wu, Kai Yu.
- Category-Adapted Sound Event Enhancement with Weakly Labeled Data, Guangwei Li, Xuenan Xu, Mengyue Wu, Kai Yu.
- Navigating Audio-Visual Event Detection across Mismatched Modalities, Guangwei Li, Xuenan Xu, Mengyue Wu, Kai Yu.
- A Lightweight Framework for Online Voice Activity Detection in the Wild, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu.
- Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events, Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu.
- Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training, Heinrich Dinkel, Shuai Wang, Xuenan Xu, Mengyue Wu, Kai Yu.
Audio-Text Data Curation
- Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning, Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie.
- A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds, Xuenan Xu, Xiaohang Xu, Zeyu Xie, et al.
- BLAT: Bootstrapping Language-Audio Pre-Training Based on AudioSet Tag-Guided Synthetic Data, Xuenan Xu, Zhiling Zhang, Zelin Zhou, et al.
Audio Generation
- PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation, Zeyu Xie, Xuenan Xu, Zhizheng Wu, Mengyue Wu.
- DiveSound: LLM-Assisted Automatic Taxonomy Construction for Diverse Audio Generation, Baihan Li, Zeyu Xie, Xuenan Xu, et al.
- Enhancing Audio Generation Diversity with Visual Information, Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu.
- Diverse and Vivid Sound Generation from Text Descriptions, Guangwei Li, Xuenan Xu, Mengyue Wu, Kai Yu.