# Keypoints Definition

## Extract keypoints

A quick overview for selecting a model:
| Model      | Install | Comment                |
| ---------- | ------- | ---------------------- |
| mediapipe  | easy    | only supports 1 person |
| yolo+hrnet | medium  | no feet keypoints      |
| openpose   | hard    | multi-person + feet    |
In most cases, we use the body25 format of OpenPose[^1] as our standard keypoint definition. Outputs of other methods, such as HRNet[^2] and MediaPipe[^3], are converted to the body25 format.
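As an illustration of that conversion, here is a minimal sketch that maps COCO-17 keypoints (the native HRNet output) to body25 order. The index table and the synthesized neck/mid-hip joints follow the standard definitions of the two formats, but the function itself is ours, not the repository's converter:

```python
import numpy as np

# body25 index -> COCO-17 index (-1 where COCO-17 has no such joint:
# neck, mid-hip and the six foot keypoints).
BODY25_FROM_COCO17 = [0, -1, 6, 8, 10, 5, 7, 9, -1, 12, 14, 16,
                      11, 13, 15, 2, 1, 4, 3, -1, -1, -1, -1, -1, -1]

def coco17_to_body25(kpts):
    """kpts: (17, 3) array of [x, y, conf]; returns (25, 3) in body25 order."""
    kpts = np.asarray(kpts, dtype=np.float32)
    out = np.zeros((25, 3), dtype=np.float32)
    for i25, i17 in enumerate(BODY25_FROM_COCO17):
        if i17 >= 0:
            out[i25] = kpts[i17]
    # Synthesize neck (1) and mid-hip (8) as midpoints of the shoulders/hips,
    # keeping the smaller confidence of the two parent joints.
    out[1, :2] = (kpts[5, :2] + kpts[6, :2]) / 2
    out[1, 2] = min(kpts[5, 2], kpts[6, 2])
    out[8, :2] = (kpts[11, :2] + kpts[12, :2]) / 2
    out[8, 2] = min(kpts[11, 2], kpts[12, 2])
    return out
```

The foot keypoints (19-24) stay at zero confidence, which marks them as invisible in the body25 convention.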
For each image, we record its 2D pose in a JSON file. For an image at `root/images/1/000000.jpg`, the 2D pose will be stored at `root/annots/1/000000.json`.
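In code, this path convention can be expressed as follows (a minimal sketch; `annot_path` is our hypothetical helper, not part of the repository):

```python
from pathlib import Path

def annot_path(image_path):
    """root/images/<sub>/<frame>.jpg -> root/annots/<sub>/<frame>.json"""
    p = Path(image_path)
    root = p.parents[2]  # strip <frame>.jpg, <sub> and 'images'
    return root / 'annots' / p.parent.name / (p.stem + '.json')

print(annot_path('root/images/1/000000.jpg'))  # root/annots/1/000000.json
```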
The content of the annotation file is:
```
{
    "filename": "images/0/000000.jpg",
    "height": <the height of the image>,
    "width": <the width of the image>,
    "annots": [
        {
            "personID": 0,  # ID of the person
            "bbox": [l, t, r, b, conf],
            "keypoints": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
            "area": <the area of the bbox>
        },
        {
            "personID": 1,  # ID of the person
            "bbox": [l, t, r, b, conf],
            "keypoints": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
            "area": <the area of the bbox>
        }
    ]
}
```
For each keypoint, `[x0, y0, c0]` gives its (x, y) position in the image and its confidence. The confidence is supposed to be 0 if the keypoint is invisible.
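To make the format concrete, here is a minimal sketch of reading such an annotation file and masking invisible keypoints (the helper name and the zero-confidence test are ours; they simply follow the layout described above):

```python
import json
import numpy as np

def load_annots(path):
    """Return a list of (personID, keypoints, visible) per person."""
    with open(path) as f:
        data = json.load(f)
    people = []
    for annot in data['annots']:
        kpts = np.array(annot['keypoints'], dtype=np.float32)  # (N, 3)
        visible = kpts[:, 2] > 0  # confidence 0 marks an invisible keypoint
        people.append((annot['personID'], kpts, visible))
    return people
```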
If you also use hand and face keypoints, the annotation of each person is defined as:

```
{
    "personID": i,
    "bbox": [l, t, r, b, conf],
    "keypoints": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
    "bbox_handl2d": [l, t, r, b, conf],
    "bbox_handr2d": [l, t, r, b, conf],
    "bbox_face2d": [l, t, r, b, conf],
    "handl2d": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
    "handr2d": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
    "face2d": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]]
}
```
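For reference, a hand or face bounding box in the `[l, t, r, b, conf]` form can be derived from its keypoints roughly as below. This is a generic sketch, not necessarily how the repository computes `bbox_handl2d` and friends; the confidence threshold and padding are arbitrary choices:

```python
import numpy as np

def bbox_from_keypoints(kpts, min_conf=0.1, pad=0.1):
    """Compute [l, t, r, b, conf] from an (N, 3) array of [x, y, conf]."""
    kpts = np.asarray(kpts, dtype=np.float32)
    valid = kpts[kpts[:, 2] > min_conf]
    if len(valid) == 0:
        return [0., 0., 0., 0., 0.]  # nothing detected
    l, t = valid[:, :2].min(axis=0)
    r, b = valid[:, :2].max(axis=0)
    w, h = r - l, b - t
    # Pad the box slightly and use the mean keypoint confidence as its score.
    return [float(l - pad * w), float(t - pad * h),
            float(r + pad * w), float(b + pad * h),
            float(valid[:, 2].mean())]
```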
### YOLOv4+HRNet

Download the pretrained HRNet and YOLOv4 models from their official websites and place them as follows:

```
data/models
├── pose_hrnet_w48_384x288.pth
└── yolov4.weights
```

No other requirements are needed; just run:

```bash
python3 apps/preprocess/extract_keypoints.py ${data} --mode yolo-hrnet
```
### OpenPose

OpenPose[^1] can detect body, hand, face, and foot keypoints. You should install it following the official tutorial.

```bash
openpose=<path/to/openpose/installation>
# detect the body and feet keypoints
python3 apps/preprocess/extract_keypoints.py ${data} --mode openpose --openpose ${openpose}
# detect the hand and face keypoints if needed
python3 apps/preprocess/extract_keypoints.py ${data} --mode openpose --openpose ${openpose} --hand --face
```
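OpenPose writes one JSON file per frame with flat keypoint lists; the script above converts them to the annotation format for you, but for reference the reshaping looks roughly like this (a sketch assuming OpenPose's standard output keys):

```python
import json
import numpy as np

def read_openpose(path):
    """Read an OpenPose output JSON into a list of (25, 3) keypoint arrays."""
    with open(path) as f:
        data = json.load(f)
    people = []
    for person in data['people']:
        # pose_keypoints_2d is a flat list [x0, y0, c0, x1, y1, c1, ...]
        kpts = np.array(person['pose_keypoints_2d'],
                        dtype=np.float32).reshape(-1, 3)
        people.append(kpts)
    return people
```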
### Mediapipe

Install it with pip:

```bash
python3 -m pip install mediapipe
```

Run full-body detection:

```bash
python3 apps/preprocess/extract_keypoints.py ${data} --mode mp-holistic
```
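If you want to inspect MediaPipe's raw output before conversion, a minimal holistic example looks like this. MediaPipe returns normalized coordinates, so they must be scaled by the image size; the snippet uses the legacy `mp.solutions` API and is an illustration, not the repository's code:

```python
import cv2
import mediapipe as mp

image = cv2.imread('root/images/1/000000.jpg')
h, w = image.shape[:2]

with mp.solutions.holistic.Holistic(static_image_mode=True) as holistic:
    results = holistic.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_landmarks is not None:
    # Scale normalized landmark coordinates to pixels.
    kpts = [[lm.x * w, lm.y * h, lm.visibility]
            for lm in results.pose_landmarks.landmark]
```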
### YOLOv4+OpenPose

This mode first performs human detection with YOLOv4[^4] and then runs OpenPose on the cropped images.

```bash
python3 apps/preprocess/extract_keypoints.py ${data} --mode openposecrop --openpose ${openpose}
```
### YOLOv4+HRNet+OpenPose

This mode first performs human detection and then runs HRNet on the cropped images. Finally, it uses OpenPose to detect the feet keypoints.

```bash
python3 apps/preprocess/extract_keypoints.py ${data} --mode yolo-hrnet && \
python3 apps/preprocess/extract_keypoints.py ${data} --mode feetcrop --openpose ${openpose} --force
```
[^1]: Cao, Z., Hidalgo, G., Simon, T., Wei, S. E., Sheikh, Y. "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields." arXiv preprint arXiv:1812.08008 (2018).
[^2]: Sun, Ke, et al. "Deep high-resolution representation learning for human pose estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[^3]: Lugaresi, Camillo, et al. "MediaPipe: A framework for building perception pipelines." arXiv preprint arXiv:1906.08172 (2019).
[^4]: Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. "YOLOv4: Optimal speed and accuracy of object detection." arXiv preprint arXiv:2004.10934 (2020).