Captured in 2021 with our MonoStage algorithm.

Qing Shuai 帅青

I am a researcher in computer vision, with a focus on the multimodal understanding and generation of 3D content. My work spans reconstruction, animation, and generation of humans and scenes, with an emphasis on bringing these capabilities from research prototypes to real applications.

Since 2024, I have been a Senior Researcher at Tencent Hunyuan, where I work on production-grade 3D human motion generation, conditioned on a range of modalities including text, audio, and video. One recent output from this line of work is HY-Motion.

I received my B.Eng. from Zhejiang University in 2019 and my Ph.D. from the same institution in 2024, advised by Prof. Xiaowei Zhou. During my doctorate I led the development of EasyMocap, an open-source system for markerless motion capture from multi-view video, and published on neural rendering and free-viewpoint video of interacting people.

I have been particularly excited by recent progress in LLMs and agents — AI is evolving from a chatbot into a system that can invoke tools and interact with a computer, and increasingly observe, reason, and act the way a person does. I am interested in equipping such agents with multimodal understanding and generation capabilities, and, more broadly, in how AI can genuinely engage with the physical world.

Email GitHub Google Scholar Zhihu


Experience

2024 —

Senior Researcher

Tencent Hunyuan · Shenzhen, China

Main driver of 3D human motion generation from multimodal control inputs, taking the technology from research prototype to production. Along the way, built out a full pipeline spanning data curation, model training, evaluation, and deployment. Also open-sourced part of this work as HY-Motion, which reached the Hugging Face weekly trending list.
2019 — 2024

Ph.D. in Computer Science

Zhejiang University · State Key Lab of CAD & CG · Hangzhou, China

Advised by Prof. Xiaowei Zhou. Focused on markerless motion capture and neural scene representations. Designed and maintained EasyMocap, a widely used open-source toolkit and currently the most starred motion capture system on GitHub. Published at CVPR, ICCV, ECCV, and SIGGRAPH — building a solid foundation in computer vision and computer graphics.
2015 — 2019

B.Eng. in Mechanical Engineering

Zhejiang University · Chu Kochen Honors College · Hangzhou, China

Coursework centered on modern control theory and the theory of wheeled and legged robots. The engineering mindset — sensing, actuation, and feedback — later shaped how I approach vision and 3D.

Featured projects

A thread runs through my work: how do we turn the physical world and human behavior into something a computer can perceive, model, and act within? The three projects below attack three sides of that question — perceiving people, modeling the world they inhabit, and acting inside it. The long-term goal is an AI that understands and moves through the real world the way we do.

EasyMocap

Open-source infrastructure for capturing human data.

Understanding people starts with data. EasyMocap turns a handful of ordinary cameras into a full pipeline for markerless motion capture — calibration, keypoint estimation, and SMPL fitting — bringing 3D human data within reach of anyone with a few cameras.

zju3dv/EasyMocap
4.7k stars
LoG

Real-time interactive photorealistic 3D scenes.

Behavior only makes sense inside a world. LoG (Level-of-Gaussians) proposes an adaptive hierarchical Gaussian representation whose level of detail follows the viewpoint, delivering real-time interaction across scales — from a single object to an entire city.

zju3dv/LoG
770 stars
HY-Motion 1.0

Understanding and generating human motion.

The last step is turning human intent into motion. We curated a large-scale human motion dataset and trained an MMDiT generative model on it, producing 3D human motion from language, audio, and control signals — with strong instruction understanding and fine-grained controllability.

Tencent-Hunyuan/HY-Motion-1.0
2.5k stars

Publications

Selected papers, most recent first. See Google Scholar for the complete list.

2026

PRISM: Streaming Human Motion Generation with Per-Joint Latent Decomposition

Zeyu Ling, Qing Shuai, Teng Zhang, Shiyang Li, Bo Han, Changqing Zou

arXiv Code
CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos

Chengfeng Zhao, Jiazhi Shu, Yubo Zhao, Tianyu Huang, Jiahao Lu, Zekai Gu, Chengwei Ren, Zhiyang Dou, Qing Shuai, Yuan Liu

arXiv Code Project
AnyAct: Towards Human Reenactment of Character Motion From Video

Liuhan Chen, Lei Zhong, Jiewei Wang, Qing Shuai, Li Yuan, Leidong Fan, Qing Li, Kanglin Liu

arXiv
AnchorCrafter: Animate Cyber-Anchors Selling Your Products via Human-Object Interacting Video Generation

Ziyi Xu, Ziyao Huang, Juan Cao, Yong Zhang, Xiaodong Cun, Qing Shuai, Yuchen Wang, Linchao Bao, Fan Tang

TVCG Code Project

2025

HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation

Yuxin Wen, Qing Shuai, Di Kang, Jing Li, Cheng Wen, Yue Qian, Ningxin Jiao, Changhai Chen, Weijie Chen, Yiran Wang, et al.

arXiv Code Demo
Motion-2-to-3: Leveraging 2D Motion Data for 3D Motion Generations

Ruoxi Guo, Huaijin Pi, Zehong Shen, Qing Shuai, Zechen Hu, Zhumei Wang, Yajiao Dong, Ruizhen Hu, Taku Komura, Sida Peng, et al.

ICCV Code Project
IDOL: Instant Photorealistic 3D Human Creation from a Single Image

Yiyu Zhuang, Jiaxi Lv, Hao Wen, Qing Shuai, Ailing Zeng, Hao Zhu, Shifeng Chen, Yujiu Yang, Xun Cao, Wei Liu

CVPR Code
Dyn-e: Local Appearance Editing of Dynamic Neural Radiance Fields

Yinji ShenTu, Shangzhan Zhang, Mingyue Xu, Qing Shuai, Tianrun Chen, Sida Peng, Xiaowei Zhou

Computers & Graphics
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation

Zhi Cen, Huaijin Pi, Sida Peng, Qing Shuai, Yujun Shen, Hujun Bao, Xiaowei Zhou, Ruizhen Hu

ICLR Code Project

2024

Animatable Implicit Neural Representations for Creating Realistic Avatars from Videos

Sida Peng, Zhen Xu, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Hujun Bao, Xiaowei Zhou

TPAMI
AniDress: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model

Beijia Chen, Yuefan Shen, Qing Shuai, Xiaowei Zhou, Kun Zhou, Youyi Zheng

arXiv

2023

Reconstructing Close Human Interactions from Multiple Views

Qing Shuai, Zhiyuan Yu, Zhize Zhou, Lixin Fan, Haijun Yang, Can Yang, Xiaowei Zhou

SIGGRAPH Asia
Implicit Neural Representations with Structured Latent Codes for Human Body Modeling

Sida Peng, Chen Geng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Xiaowei Zhou, Hujun Bao

TPAMI
Representing Volumetric Videos As Dynamic MLP Maps

Sida Peng, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou

CVPR
Learning Analytical Posterior Probability for Human Mesh Recovery

Qi Fang, Kang Chen, Yinghui Fan, Qing Shuai, Jiefeng Li, Weidong Zhang

CVPR Code
Learning Human Mesh Recovery in 3D Scenes

Zehong Shen, Zhi Cen, Sida Peng, Qing Shuai, Hujun Bao, Xiaowei Zhou

CVPR

2022

Novel View Synthesis of Human Interactions from Sparse Multi-view Videos

Qing Shuai, Chen Geng, Qi Fang, Sida Peng, Wenhao Shen, Xiaowei Zhou, Hujun Bao

SIGGRAPH PDF
QuickPose: Real-Time Multi-View Multi-Person Pose Estimation in Crowded Scenes

Zhize Zhou, Qing Shuai, Yize Wang, Qi Fang, Xiaopeng Ji, Fashuai Li, Hujun Bao, Xiaowei Zhou

SIGGRAPH PDF
Efficient Neural Radiance Fields for Interactive Free-viewpoint Video

Haotong Lin, Sida Peng, Zhen Xu, Yunzhi Yan, Qing Shuai, Hujun Bao, Xiaowei Zhou

SIGGRAPH Asia
Reconstructing Hand-Held Objects from Monocular Video

Di Huang, Xiaopeng Ji, Xingyi He, Jiaming Sun, Tong He, Qing Shuai, Wanli Ouyang, Xiaowei Zhou

SIGGRAPH Asia Code
Shape Prior Guided Instance Disparity Estimation for 3D Object Detection

Linghao Chen, Jiaming Sun, Yiming Xie, Siyu Zhang, Qing Shuai, Qinhong Jiang, Guofeng Zhang, Hujun Bao, Xiaowei Zhou

TPAMI

2021

Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans

Sida Peng, Yuanqing Zhang, Yinghao Xu, Qianqian Wang, Qing Shuai, Hujun Bao, Xiaowei Zhou

CVPR Project
Reconstructing 3D Human Pose by Watching Humans in the Mirror

Qi Fang, Qing Shuai*, Junting Dong, Hujun Bao, Xiaowei Zhou

CVPR
Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies

Sida Peng, Junting Dong, Qianqian Wang, Shangzhan Zhang, Qing Shuai, Xiaowei Zhou, Hujun Bao

ICCV

2020

Motion Capture from Internet Videos

Junting Dong*, Qing Shuai*, Yuanqing Zhang, Xian Liu, Xiaowei Zhou, Hujun Bao

ECCV Project
