videos
Papers with tag videos
2022
- Spatio-temporal Tendency Reasoning for Human Body Pose and Shape Estimation from Videos. Boyang Zhang, SuPing Wu, Hu Cao, Kehua Ma, Pan Li, and Lei Lin. In 2022
In this paper, we present a spatio-temporal tendency reasoning (STR) network for recovering human body pose and shape from videos. Previous approaches have focused on how to extend 3D human datasets and temporal-based learning to promote accuracy and temporal smoothing. Different from them, our STR aims to learn accurate and natural motion sequences in an unconstrained environment through temporal and spatial tendency and to fully excavate the spatio-temporal features of existing video data. To this end, our STR learns the representation of features in the temporal and spatial dimensions respectively, to concentrate on a more robust representation of spatio-temporal features. More specifically, for efficient temporal modeling, we first propose a temporal tendency reasoning (TTR) module. TTR constructs a time-dimensional hierarchical residual connection representation within a video sequence to effectively reason about temporal sequences' tendencies and retain effective dissemination of human information. Meanwhile, to enhance the spatial representation, we design a spatial tendency enhancing (STE) module that further learns to excite spatially time-frequency domain sensitive features in human motion information representations. Finally, we introduce integration strategies to integrate and refine the spatio-temporal feature representations. Extensive experimental findings on large-scale publicly available datasets reveal that our STR remains competitive with the state-of-the-art on three datasets. Our code is available at https://github.com/Changboyang/STR.git.
Uses an integration strategy to improve performance; a toy sketch of the TTR idea follows below.
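As a rough illustration of the hierarchical temporal residual idea behind TTR (not the authors' implementation; the grouping scheme, layer sizes, and feature dimensions below are assumptions), one could refine per-frame features group by group, carrying a residual from each temporal group into the next:

```python
import torch
import torch.nn as nn

class TemporalTendencyReasoning(nn.Module):
    """Toy hierarchical temporal residual block (assumed design).

    The frame sequence is split into temporal groups; each group is
    refined with a residual carried over from the previous group, so
    later frames can reason over tendencies of earlier ones.
    """

    def __init__(self, feat_dim: int, groups: int = 4):
        super().__init__()
        self.groups = groups
        self.refine = nn.ModuleList(
            [nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
             for _ in range(groups)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim); time must be divisible by groups
        prev, outs = 0.0, []
        for chunk, layer in zip(x.chunk(self.groups, dim=1), self.refine):
            prev = layer(chunk + prev)  # hierarchical residual along time
            outs.append(prev)
        return torch.cat(outs, dim=1)

# Toy usage: 2 clips of 16 frames with 256-d per-frame features
feats = torch.randn(2, 16, 256)
print(TemporalTendencyReasoning(feat_dim=256)(feats).shape)  # (2, 16, 256)
```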
- Streaming Radiance Fields for 3D Video Synthesis. Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, and Ping Tan. In 2022
We present an explicit-grid based method for efficiently reconstructing streaming radiance fields for novel view synthesis of real-world dynamic scenes. Instead of training a single model that combines all the frames, we formulate the dynamic modeling problem with an incremental learning paradigm in which a per-frame model difference is trained to complement the adaptation of a base model to the current frame. By exploiting a simple yet effective tuning strategy with narrow bands, the proposed method realizes a feasible framework for handling video sequences on-the-fly with high training efficiency. The storage overhead induced by using explicit grid representations can be significantly reduced through model-difference-based compression. We also introduce an efficient strategy to further accelerate model optimization for each frame. Experiments on challenging video sequences demonstrate that our approach is capable of achieving a training speed of 15 seconds per frame with competitive rendering quality, which attains a 1000× speedup over the state-of-the-art implicit methods. Code is available at https://github.com/AlgoHunt/StreamRF.
Improves training speed. Proposes an explicit-grid based method that represents a 3D video as a base model plus per-frame model differences; see the sketch below.
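A minimal sketch of the streaming idea, under assumed shapes and a made-up least-squares objective (the narrow-band tuning and compression in the paper are far more involved): fit the grid to each incoming frame, store only the sparse difference against the running base, and let the base adapt frame by frame.

```python
import numpy as np

def fit_frame(grid, target, steps=50, lr=0.2):
    """Toy per-frame optimization: gradient steps on ||grid - target||^2."""
    for _ in range(steps):
        grid = grid - lr * (grid - target)
    return grid

rng = np.random.default_rng(0)
frames = [rng.random((32, 32, 32)) for _ in range(3)]  # stand-in supervision

base = np.zeros((32, 32, 32))   # explicit voxel grid (toy scale)
stream = []                     # compressed per-frame differences
for target in frames:
    tuned = fit_frame(base.copy(), target)
    diff = tuned - base
    # Sparsify the difference: keep only voxels that changed noticeably,
    # which is what keeps per-frame storage small.
    idx = np.flatnonzero(np.abs(diff) > 1e-3)
    stream.append((idx, diff.ravel()[idx]))
    base = tuned                # the base model adapts to the current frame

# Playback: frame t is recovered by replaying differences 0..t onto frame 0.
```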
- NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields. Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, and Andreas Geiger. In 2022
Visually exploring a real-world 4D spatiotemporal space freely in VR has been a long-term quest. The task is especially appealing when only a few or even a single RGB camera is used for capturing the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a hybrid-representation-based feature streaming scheme for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays, achieving rendering quality and speed comparable or superior to recent state-of-the-art methods, with reconstruction in 10 seconds per frame and real-time rendering.
Represents the dynamic scene with a hybrid representation-based feature streaming scheme; see the sketch below.
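The decomposition could look roughly like the sketch below, with an assumed tiny MLP classifier and placeholder per-category fields (the actual method uses hybrid grid/MLP representations and streams their features): each 4D sample gets soft probabilities over static / deforming / new, and the three field outputs are blended by those probabilities.

```python
import torch
import torch.nn as nn

class DecomposedField(nn.Module):
    """Toy three-way decomposed field (assumed architecture).

    A small MLP predicts, per 4D point (x, y, z, t), probabilities of
    the static / deforming / new categories; three placeholder fields
    are queried and blended by those probabilities.
    """

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 3)
        )
        self.fields = nn.ModuleList(
            [nn.Sequential(nn.Linear(4, hidden), nn.ReLU(),
                           nn.Linear(hidden, 4))  # RGB + density per point
             for _ in range(3)]
        )

    def forward(self, xyzt: torch.Tensor) -> torch.Tensor:
        probs = self.classifier(xyzt).softmax(dim=-1)               # (n, 3)
        outs = torch.stack([f(xyzt) for f in self.fields], dim=-1)  # (n, 4, 3)
        return (outs * probs.unsqueeze(1)).sum(dim=-1)              # (n, 4)

pts = torch.rand(1024, 4)  # random (x, y, z, t) samples in [0, 1]
print(DecomposedField()(pts).shape)  # torch.Size([1024, 4])
```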