Novel View Synthesis of Human Interactions From Sparse Multi-view Videos
Results on ZJUMoCap
Results in the wild
The videos are captured by 8 GoPro cameras.
Download the example data. To keep the download small, we only provide the compressed videos, so you should first extract images from them:
data=<path/to/example/data>
# extract the images
python3 apps/preprocess/extract_image.py ${data}
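If the provided script is not convenient, frames can also be extracted with a plain ffmpeg call. This is only a generic sketch: the `input.mp4` filename and the `frames/` output layout are placeholders, not the directory structure the project's scripts expect.

```shell
# Extract JPEG frames from one video with ffmpeg.
# -qscale:v 2 keeps high JPEG quality; %06d.jpg numbers frames from 000001.
mkdir -p frames
ffmpeg -i input.mp4 -qscale:v 2 frames/%06d.jpg
```

Repeat per camera; the project's own `extract_image.py` handles all views at once.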
Then extract the vertices from the SMPL parameters:
python3 apps/postprocess/write_vertices.py ${data}/output-smpl-3d/smpl ${data}/output-smpl-3d/vertices --cfg_model ${data}/output-smpl-3d/cfg_model.yml --mode vertices
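For context, the conversion from SMPL parameters to mesh vertices is ultimately a linear blend skinning (LBS) step: each vertex is a weighted combination of per-joint rigid transforms applied to the rest pose. Below is a toy NumPy sketch of LBS only; it is not the project's `write_vertices.py` implementation, which also applies SMPL's shape and pose blend shapes.

```python
import numpy as np

def lbs(vertices, weights, transforms):
    """Linear blend skinning.

    vertices:   (V, 3) rest-pose vertex positions
    weights:    (V, J) skinning weights; each row sums to 1
    transforms: (J, 4, 4) per-joint rigid transformation matrices
    returns:    (V, 3) posed vertex positions
    """
    V = vertices.shape[0]
    # homogeneous coordinates: (V, 4)
    homo = np.concatenate([vertices, np.ones((V, 1))], axis=1)
    # blend the joint transforms per vertex: (V, 4, 4)
    blended = np.einsum('vj,jab->vab', weights, transforms)
    # apply each vertex's blended transform
    posed = np.einsum('vab,vb->va', blended, homo)
    return posed[:, :3]
```

With identity transforms the mesh is unchanged; translating one joint's transform moves the vertices weighted to it.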
Install
First, install the easymocap environment. This project additionally depends on pytorch-lightning and spconv; see requirements_neuralbody.txt for details.
pip install -r requirements_neuralbody.txt
Train
data=/path/to/dataset
# Training with 4x RTX 3090 is recommended
python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0,1,2,3
# Reduce the number of rays if you train with a GTX 1080 Ti/RTX 3060
python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0, data_share_args.sample_args.nrays 1024
Demo
# render with 4x3090
python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0,1,2,3 --demo
# (not recommended) render on a single GPU with fewer rays
python3 apps/neuralbody/demo.py --mode soccer1_6 ${data} --gpus 0, data_share_args.sample_args.nrays 1024 --demo
Limitations and future work
Currently, the proposed approach is limited to settings with multiple human performers, balls as the only objects, a simple background, and a calibrated camera array. As future work, the system can be extended in several ways to handle more general settings.
- Recovering the human interaction from moving cameras or even a monocular video can be further investigated.
- More general objects can be handled by tracking the 6DoF poses with object pose trackers.
- If offline scanning of the background is available, the rendering quality of the background can be further improved.
Related Works
Many excellent works inspired this project.
Bibtex
@inproceedings{shuai2022multinb,
  title={Novel View Synthesis of Human Interactions from Sparse Multi-view Videos},
  author={Shuai, Qing and Geng, Chen and Fang, Qi and Peng, Sida and Shen, Wenhao and Zhou, Xiaowei and Bao, Hujun},
  booktitle={SIGGRAPH Conference Proceedings},
  year={2022}
}
Acknowledgement
The authors would like to acknowledge support from NSFC (No. 62172364).
We would like to thank Haian Jin for processing the instance segmentation, and Zhengdong Hong for his advice on generating the visualizations.
Special thanks to the Women’s campus football team of Zhejiang University and Beijia Chen.