Keypoints Definition

  1. Extract keypoints
    1. YOLOv4+HRNet
    2. OpenPose
    3. Mediapipe
    4. YOLOv4+OpenPose
    5. YOLOv4+HRNet+OpenPose

Extract keypoints

Quick overview for selecting the model:

Model        Install   Comment
mediapipe    easy      supports only 1 person
yolo+hrnet   medium    no feet keypoints
openpose     hard      multi-person, with feet

In most cases, we use the body25 format of OpenPose [1] as our standard keypoint format. Outputs of other methods, such as HRNet [2] and mediapipe [3], are converted to the body25 format.
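To illustrate what such a conversion involves, here is a minimal sketch that maps a 17-keypoint COCO-style pose onto the 25 body25 slots. It assumes the detector outputs keypoints in the standard COCO order; the function names and the rule for synthesizing Neck and MidHip (midpoint of the two shoulders or hips, with the lower confidence) are illustrative assumptions, not necessarily the conversion used by this project. The feet slots stay at zero because COCO has no feet keypoints.

```python
# For each body25 slot, the COCO-17 index that fills it (None = no counterpart).
# Orders follow the public COCO and OpenPose BODY_25 conventions.
COCO17_TO_BODY25 = [
    0,           # 0 Nose
    None,        # 1 Neck (synthesized from the shoulders below)
    6, 8, 10,    # 2-4 RShoulder, RElbow, RWrist
    5, 7, 9,     # 5-7 LShoulder, LElbow, LWrist
    None,        # 8 MidHip (synthesized from the hips below)
    12, 14, 16,  # 9-11 RHip, RKnee, RAnkle
    11, 13, 15,  # 12-14 LHip, LKnee, LAnkle
    2, 1, 4, 3,  # 15-18 REye, LEye, REar, LEar
    None, None, None, None, None, None,  # 19-24 feet, absent in COCO
]


def midpoint(a, b):
    """Average two [x, y, c] keypoints; keep the weaker confidence."""
    return [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2, min(a[2], b[2])]


def coco17_to_body25(coco):
    """Convert 17 COCO keypoints [[x, y, c], ...] to a 25-entry body25 list."""
    if len(coco) != 17:
        raise ValueError("expected 17 COCO keypoints")
    body25 = [[0.0, 0.0, 0.0] for _ in range(25)]
    for dst, src in enumerate(COCO17_TO_BODY25):
        if src is not None:
            body25[dst] = list(coco[src])
    body25[1] = midpoint(coco[5], coco[6])    # Neck from L/R shoulder
    body25[8] = midpoint(coco[11], coco[12])  # MidHip from L/R hip
    return body25
```

Slots left at `[0.0, 0.0, 0.0]` carry zero confidence, which matches the convention for invisible keypoints in the annotation format.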

For each image, we record its 2D pose in a JSON file. For an image at root/images/1/000000.jpg, the 2D pose is stored at root/annots/1/000000.json. The content of the annotation file is:

    {
        "filename": "images/0/000000.jpg",
        "height": <the height of image>,
        "width": <the width of image>,
        "annots": [
            {
                "personID": 0, # ID of person
                "bbox": [l, t, r, b, conf],
                "keypoints": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
                "area": <the area of bbox>
            },
            {
                "personID": 1, # ID of person
                "bbox": [l, t, r, b, conf],
                "keypoints": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
                "area": <the area of bbox>
            }
        ]
    }

For each keypoint, [x0, y0, c0] gives its (x, y) position in the image and its confidence. The confidence should be 0 if the keypoint is invisible.
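The conventions above can be sketched in a few lines of Python. The helper names below (`annot_path_for`, `visible_keypoints`, `bbox_area`) are hypothetical and not part of this project's API. The sketch assumes that the annotation path is obtained by swapping the last `images` directory for `annots` and the extension for `.json`, and that `area` is simply width times height of the box.

```python
from pathlib import Path


def annot_path_for(image_path):
    """Map .../images/<sub>/<frame>.jpg to .../annots/<sub>/<frame>.json."""
    parts = list(Path(image_path).parts)
    # Swap the last 'images' component; raises ValueError if there is none.
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "annots"
    return Path(*parts).with_suffix(".json")


def visible_keypoints(person, min_conf=0.0):
    """Return {index: (x, y)} for keypoints with confidence above min_conf."""
    return {
        i: (x, y)
        for i, (x, y, c) in enumerate(person["keypoints"])
        if c > min_conf
    }


def bbox_area(bbox):
    """Area of a [l, t, r, b, conf] box; degenerate boxes give 0."""
    l, t, r, b = bbox[:4]
    return max(r - l, 0) * max(b - t, 0)
```

With the default `min_conf=0.0`, `visible_keypoints` drops exactly the keypoints marked invisible (confidence 0); a higher threshold can be passed to discard weak detections as well.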

If hand and face keypoints are used, the annotation of each person is defined as:

    "personID": i,
    "bbox": [l, t, r, b, conf],
    "keypoints": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
    "bbox_handl2d": [l, t, r, b, conf],
    "bbox_handr2d": [l, t, r, b, conf],
    "bbox_face2d": [l, t, r, b, conf],
    "handl2d": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
    "handr2d": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]],
    "face2d": [[x0, y0, c0], [x1, y1, c1], ..., [xn, yn, cn]]
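Since the hand and face fields appear only when those detectors were run, code that reads an annotation may want to treat them as optional. The following is a minimal sketch under that assumption; `collect_parts` and `count_confident` are hypothetical helper names, not functions provided by this project.

```python
# Optional per-person fields, named as in the annotation layout above.
OPTIONAL_PARTS = ("handl2d", "handr2d", "face2d")


def collect_parts(person):
    """Return {part_name: keypoints} for the body and any optional parts present."""
    parts = {"body": person["keypoints"]}
    for key in OPTIONAL_PARTS:
        if key in person:
            parts[key] = person[key]
    return parts


def count_confident(parts, min_conf=0.0):
    """Count keypoints per part whose confidence exceeds min_conf."""
    return {
        name: sum(1 for _x, _y, c in kpts if c > min_conf)
        for name, kpts in parts.items()
    }
```

For example, `count_confident(collect_parts(person))` gives a quick per-part summary of how many keypoints were actually detected for one person.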


YOLOv4+HRNet

Download the pretrained models (YOLOv4 [4] and HRNet [2]) from their official websites:

├── pose_hrnet_w48_384x288.pth
└── yolov4.weights

No other requirements are needed; just run:

python3 apps/preprocess/ ${data} --mode yolo-hrnet


OpenPose

OpenPose [1] can detect body, hand, face, and foot keypoints. Install it by following the official tutorial.

# detect the body and feet keypoints
python3 apps/preprocess/ ${data} --mode openpose --openpose ${openpose}
# detect the hand and face if needed
python3 apps/preprocess/ ${data} --mode openpose --openpose ${openpose} --hand --face


Mediapipe

Install it with pip:

python3 -m pip install mediapipe

Run full-body detection:

python3 apps/preprocess/ ${data} --mode mp-holistic


YOLOv4+OpenPose

This mode first performs human detection [4] and then runs OpenPose on the cropped images.

python3 apps/preprocess/ ${data} --mode openposecrop --openpose ${openpose}


YOLOv4+HRNet+OpenPose

This mode first performs human detection and then runs HRNet on the cropped images. Finally, it uses OpenPose to detect the feet keypoints.

python3 apps/preprocess/ ${data} --mode yolo-hrnet && python3 apps/preprocess/ ${data} --mode feetcrop --openpose ${openpose} --force

  1. Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.: OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008 (2018)

  2. Sun, K., et al.: Deep high-resolution representation learning for human pose estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019)

  3. Lugaresi, C., et al.: MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172 (2019)

  4. Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)