Xiufeng Song
Shanghai Jiao Tong University
Shanghai Artificial Intelligence Laboratory
About Me

Hi, I am Xiufeng Song, a third-year MSc student at Shanghai Jiao Tong University, supervised by Prof. Xiaohong Liu. I currently work as a research intern at Shanghai Artificial Intelligence Laboratory, advised by Prof. Lei Bai.

My research focuses on building embodied agents capable of optimal decision-making, strategic planning, and generalizable manipulation, including:

  • Robotic planning and manipulation;
  • Autonomous and collaborative multi-agent intelligence;
  • World models for bridging Sim2Real gaps.

Feel free to contact me for collaboration if you share a similar interest.

Service:

  • Conference reviewer: AAAI 2025, ICLR 2025, CVPR 2025.
  • Journal reviewer: TPAMI.

Education
  • Shanghai Jiao Tong University
    SJTU Multimedia Lab
    MSc in Computer Science and Technology
    Sep. 2023 - present
  • Shanghai Jiao Tong University
    B.S. in Computer Science and Technology
    Sep. 2019 - Jun. 2023
Experience
  • Shanghai Artificial Intelligence Laboratory
    Research Intern, advised by Prof. Lei Bai
    Sep. 2024 - present
News
2025
We release the MARS Challenge at the NeurIPS 2025 SpaVLE Workshop.
Sep 18
Viki-r has been accepted to NeurIPS 2025 Benchmark.
Sep 18
RoboFactory has been accepted to ICCV 2025 and received the Best Paper Award at the CVPR 2025 MEIS Workshop.
Jun 30
M2F2-Det and UniSTD have been accepted to CVPR 2025.
Feb 27
2024
MMDet has been accepted to NeurIPS 2024.
Sep 27
Selected Publications
Viki-r: Coordinating embodied multi-agent cooperation via reinforcement learning

Li Kang*, Xiufeng Song*, Heng Zhou*, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin# (* equal contribution, # corresponding author)

Neural Information Processing Systems (NeurIPS) Benchmark 2025

Robofactory: Exploring embodied agent collaboration with compositional constraints

Yiran Qin*, Li Kang*, Xiufeng Song*, Zhenfei Yin, Xiaohong Liu, Xihui Liu, Ruimao Zhang, Lei Bai# (* equal contribution, # corresponding author)

International Conference on Computer Vision (ICCV) 2025 (Best Paper Award at CVPR 2025 MEIS Workshop)

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

Chen Tang, Xinzhu Ma, Encheng Su, Xiufeng Song, Xiaohong Liu, Wei-Hong Li, Lei Bai, Wanli Ouyang, Xiangyu Yue# (# corresponding author)

Computer Vision and Pattern Recognition Conference (CVPR) 2025

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang, Xiaohong Liu, Xiaoming Liu# (# corresponding author)

Computer Vision and Pattern Recognition Conference (CVPR) 2025 (Oral)

On learning multi-modal forgery representation for diffusion generated video detection

Xiufeng Song, Xiao Guo, Jiache Zhang, Qirui Li, Lei Bai, Xiaoming Liu, Guangtao Zhai, Xiaohong Liu# (# corresponding author)

Neural Information Processing Systems (NeurIPS) 2024
