Xiufeng Song

Hi, I am Xiufeng Song, a third-year MSc student at Shanghai Jiao Tong University, supervised by Prof. Xiaohong Liu. I currently work as a research intern at Shanghai Artificial Intelligence Laboratory, advised by Prof. Lei Bai.

My research focuses on building embodied agents capable of optimal decision-making, strategic planning, and generalizable manipulation, including:

  • Robotic planning and manipulation;
  • Autonomous and collaborative multi-agent intelligence;
  • World models for bridging Sim2Real gaps.

Feel free to contact me about collaboration if you share similar interests.

Service:

  • Conference reviewer: AAAI 2025, ICLR 2025, CVPR 2025.
  • Journal reviewer: TPAMI.


Education
  • Shanghai Jiao Tong University
    SJTU Multimedia Lab
    MSc in Computer Science and Technology
    Sep. 2023 - present
  • Shanghai Jiao Tong University
    B.S. in Computer Science and Technology
    Sep. 2019 - Jun. 2023
Experience
  • Shanghai Artificial Intelligence Laboratory
    Research Intern, advised by Prof. Lei Bai
    Sep. 2024 - present
Honors
2025
Best Paper Award at CVPR 2025 MEIS Workshop
Oct 18
2024
John Hopcroft Excellent Master Award
Dec 14
2023
2nd place in the ICCV workshop challenge on Deepfake Detection
Sep 22
News
2025
We released the MARS Challenge at the NeurIPS 2025 SpaVLE Workshop.
Sep 18
Viki-r has been accepted to NeurIPS 2025 Benchmark.
Sep 18
RoboFactory has been accepted to ICCV 2025 and received the Best Paper Award at the CVPR 2025 MEIS Workshop.
Jun 30
M2F2-Det and UniSTD have been accepted to CVPR 2025.
Feb 27
2024
MMDet has been accepted to NeurIPS 2024.
Sep 27
Selected Publications
Viki-r: Coordinating embodied multi-agent cooperation via reinforcement learning

Li Kang*, Xiufeng Song*, Heng Zhou*, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin# (* equal contribution, # corresponding author)

Neural Information Processing Systems (NeurIPS) Benchmark 2025

RoboFactory: Exploring embodied agent collaboration with compositional constraints

Yiran Qin*, Li Kang*, Xiufeng Song*, Zhenfei Yin, Xiaohong Liu, Xihui Liu, Ruimao Zhang, Lei Bai# (* equal contribution, # corresponding author)

International Conference on Computer Vision (ICCV) 2025. Best Paper Award at CVPR 2025 MEIS Workshop.

UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines

Chen Tang, Xinzhu Ma, Encheng Su, Xiufeng Song, Xiaohong Liu, Wei-Hong Li, Lei Bai, Wanli Ouyang, Xiangyu Yue# (# corresponding author)

Computer Vision and Pattern Recognition Conference (CVPR) 2025

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang, Xiaohong Liu, Xiaoming Liu# (# corresponding author)

Computer Vision and Pattern Recognition Conference (CVPR) 2025 (Oral)

On learning multi-modal forgery representation for diffusion generated video detection

Xiufeng Song, Xiao Guo, Jiache Zhang, Qirui Li, Lei Bai, Xiaoming Liu, Guangtao Zhai, Xiaohong Liu# (# corresponding author)

Neural Information Processing Systems (NeurIPS) 2024
