text2pose
Papers with tag text2pose
2022
- PoseScript: 3D Human Poses from Natural Language. Ginger Delmas, Philippe Weinzaepfel, Thomas Lucas, Francesc Moreno-Noguer, and Grégory Rogez. In 2022
Natural language is leveraged in many computer vision tasks such as image captioning, cross-modal retrieval or visual question answering, to provide fine-grained semantic information. While human pose is key to human understanding, current 3D human pose datasets lack detailed language descriptions. In this work, we introduce the PoseScript dataset, which pairs a few thousand 3D human poses from AMASS with rich human-annotated descriptions of the body parts and their spatial relationships. To increase the size of this dataset to a scale compatible with typical data-hungry learning algorithms, we propose an elaborate captioning process that generates automatic synthetic descriptions in natural language from given 3D keypoints. This process extracts low-level pose information – the posecodes – using a set of simple but generic rules on the 3D keypoints. The posecodes are then combined into higher-level textual descriptions using syntactic rules. Automatic annotations substantially increase the amount of available data, and make it possible to effectively pretrain deep models for finetuning on human captions. To demonstrate the potential of annotated poses, we show applications of the PoseScript dataset to retrieval of relevant poses from large-scale datasets and to synthetic pose generation, both based on a textual pose description.
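To make the captioning idea in the abstract concrete, here is a minimal sketch of a rule-based pipeline: simple geometric rules on 3D keypoints yield categorical codes, and a template joins them into a sentence. It is an illustration only, not the authors' implementation. The joint names, the toy coordinates, the thresholds, the category labels, and the functions `joint_angle`, `angle_posecode`, `height_posecode` and `describe` are all assumptions made for this example.

```python
import math

# Hypothetical toy keypoints as (x, y, z) in meters, with y pointing up.
# These values are made up for illustration and are not taken from AMASS.
pose = {
    "left_shoulder": (0.2, 1.4, 0.0),
    "left_elbow": (0.2, 1.1, 0.0),
    "left_wrist": (0.2, 1.1, 0.3),
    "right_wrist": (-0.2, 1.7, 0.0),
    "head": (0.0, 1.6, 0.0),
}


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def angle_posecode(angle):
    """Bin a joint angle into a category (thresholds are assumed, not from the paper)."""
    if angle < 75:
        return "completely bent"
    if angle < 105:
        return "bent at a right angle"
    if angle < 160:
        return "partially bent"
    return "straight"


def height_posecode(a, b, tol=0.05):
    """Compare the heights (y coordinate) of two keypoints with an assumed tolerance."""
    diff = a[1] - b[1]
    if diff > tol:
        return "above"
    if diff < -tol:
        return "below"
    return "at the same height as"


def describe(p):
    """Combine the extracted codes into one sentence with a fixed template."""
    elbow = angle_posecode(
        joint_angle(p["left_shoulder"], p["left_elbow"], p["left_wrist"])
    )
    hand = height_posecode(p["right_wrist"], p["head"])
    clauses = [f"left elbow is {elbow}", f"right hand is {hand} the head"]
    return "The " + " and the ".join(clauses) + "."


if __name__ == "__main__":
    print(describe(pose))
```

A fixed template like this is far simpler than the syntactic rules the abstract mentions; it only shows how categorical codes derived from keypoints can be turned into text.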