Top-Down Multi-person Pose Estimation

Introduction

Pose estimation finds the keypoints that belong to each person in an image. There are two main approaches to pose estimation.

Bottom-Up first finds all keypoints and then associates them with the different people in the image. (Generally faster but less accurate)
Top-Down first detects the people in the image and then estimates the keypoints of each person. (Generally more computationally intensive but more accurate)

This repo includes only top-down pose estimation models.
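The top-down flow described above can be sketched in a few lines. This is a minimal illustration and not the code used in this repo: `detect_people`, `estimate_keypoints`, and the two toy stand-ins are hypothetical names, and the sketch assumes the pose model returns keypoints relative to the top-left corner of each person box.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels
Keypoint = Tuple[float, float]           # (x, y)


def top_down_pose(
    image,
    detect_people: Callable[[object], Sequence[Box]],
    estimate_keypoints: Callable[[object, Box], Sequence[Keypoint]],
) -> List[List[Keypoint]]:
    """Detect people first, then run a single-person pose model per box."""
    poses = []
    for box in detect_people(image):
        x1, y1, _, _ = box
        # Assumption: keypoints come back relative to the box's top-left corner.
        local = estimate_keypoints(image, box)
        # Shift each keypoint back into full-image coordinates.
        poses.append([(x + x1, y + y1) for x, y in local])
    return poses


# Toy stand-ins so the sketch runs without any model weights (hypothetical).
def fake_detector(image):
    return [(10.0, 20.0, 60.0, 120.0), (100.0, 30.0, 150.0, 130.0)]


def fake_pose_model(image, box):
    x1, y1, x2, y2 = box
    # Return the box centre as a single "keypoint", in box-local coordinates.
    return [((x2 - x1) / 2, (y2 - y1) / 2)]


poses = top_down_pose(None, fake_detector, fake_pose_model)
print(poses)
```

The pose model runs once per detected person, so the cost grows with the number of people in the frame. That is one reason top-down methods tend to be more computationally intensive.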

Model Zoo

COCO-val with 56.4 Detector AP

Model	Backbone	Image Size	AP	AP⁵⁰	AP⁷⁵	Params (M)	FLOPs (B)	FPS	Weights
PoseHRNet	HRNet-w32	256x192	74.4	90.5	81.9	29	7	25	download
PoseHRNet	HRNet-w48	256x192	75.1	90.6	82.2	64	15	24	download
SimDR	HRNet-w32	256x192	75.3	-	-	31	7	25	download
SimDR	HRNet-w48	256x192	75.9	90.4	82.7	66	15	24	download

Note: FPS is measured on a GTX 1660 Ti with one person per frame and includes pre-processing, model inference, and post-processing. Both the detection and pose models run in PyTorch FP32.
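An end-to-end FPS figure of this kind could be measured roughly as follows. This is a sketch under stated assumptions, not the benchmark script behind the table: `measure_fps` and `process_frame` are hypothetical names, and `process_frame` stands in for the whole per-frame path (pre-processing, inference, post-processing). On a GPU, CUDA calls are asynchronous, so a real measurement would also need `torch.cuda.synchronize()` before each clock reading.

```python
import time


def measure_fps(process_frame, frames, warmup=2):
    """Return average frames per second over the full per-frame path."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one frame to measure FPS")
    # Warm-up runs are excluded from timing (caches, lazy initialisation).
    for frame in frames[:warmup]:
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def process_frame(frame):
    # Hypothetical stand-in for pre-processing + inference + post-processing.
    time.sleep(0.001)
    return frame


fps = measure_fps(process_frame, range(20))
print(f"{fps:.1f} FPS")
```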

COCO-test with 60.9 Detector AP

Model	Backbone	Image Size	AP	AP⁵⁰	AP⁷⁵	Params (M)	FLOPs (B)	Weights
SimDR*	HRNet-w48	256x192	75.4	92.4	82.7	66	15	download
RLEPose	HRNet-w48	384x288	75.7	92.3	82.9	-	-	-
UDP+PSA	HRNet-w48	256x192	78.9	93.6	85.8	70	16	-

Download Backbone Models' Weights

Model	Weights
HRNet-w32	download
HRNet-w48	download

Requirements

torch >= 1.8.1
torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.
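To confirm that an environment meets the minimum versions listed above, a small check along these lines may help. It is only a sketch: `parse_release` and `check_requirements` are hypothetical helpers, not part of this repo, and the version parsing is deliberately simple (it keeps the leading digits of the first three components and ignores local suffixes such as `+cu111`).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the Requirements list.
MINIMUMS = {"torch": "1.8.1", "torchvision": "0.9.1"}


def parse_release(v):
    """Turn '1.8.1+cu111' into (1, 8, 1); non-numeric tails are ignored."""
    core = v.split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Report 'ok', 'missing', or 'too old' for each required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_release(installed) >= parse_release(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


print(check_requirements())
```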

Clone the repository recursively:

$ git clone --recursive https://github.com/sithu31296/pose-estimation.git

Inference

Download a YOLOv5m model trained on the CrowdHuman dataset from here. (The weights are from deepakcrk/yolov5-crowdhuman.)
Download a pose estimation model's weights from the tables above.
Run the following command:

$ python infer.py --source TEST_SOURCE --det-model DET_MODEL_PATH --pose-model POSE_MODEL_PATH --img-size 640

Arguments:

source: Input to run inference on
- To test an image, set it to the image file path. (For example, assests/test.jpg)
- To test a folder of images, set it to the folder path. (For example, assests/)
- To test a video, set it to the video file path. (For example, assests/video.mp4)
- To test with a webcam, set it to 0.
det-model: Path to the YOLOv5 model's weights
pose-model: Path to the pose estimation model's weights
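The arguments above could be parsed and dispatched along the following lines. This is an illustrative sketch, not the contents of infer.py: `build_parser`, `classify_source`, the extension sets, and the default of 640 for `--img-size` (borrowed from the example command) are all assumptions.

```python
import argparse
from pathlib import Path

# Assumed extension sets; the real script may accept a different list.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def build_parser():
    parser = argparse.ArgumentParser(description="Top-down pose inference (sketch)")
    parser.add_argument("--source", required=True, help="image, folder, video, or webcam index")
    parser.add_argument("--det-model", required=True, help="path to YOLOv5 weights")
    parser.add_argument("--pose-model", required=True, help="path to pose model weights")
    parser.add_argument("--img-size", type=int, default=640, help="detector input size")
    return parser


def classify_source(source):
    """Map a --source value to 'webcam', 'image', 'video', or 'folder'."""
    if source.isdigit():
        return "webcam"  # e.g. "0" selects the first camera
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    if source.endswith("/") or path.is_dir():
        return "folder"
    raise ValueError(f"unrecognised source: {source!r}")


# The weight file names below are placeholders.
args = build_parser().parse_args(
    ["--source", "assests/test.jpg", "--det-model", "det.pt", "--pose-model", "pose.pth"]
)
print(classify_source(args.source), args.det_model, args.pose_model, args.img_size)
```

Note that argparse exposes `--det-model` as `args.det_model`, with the hyphen converted to an underscore.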

Example inference results (image credit: [1, 2]):

References

https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
https://github.com/ultralytics/yolov5

Citations

@article{WangSCJDZLMTWLX19,
  title={Deep High-Resolution Representation Learning for Visual Recognition},
  author={Jingdong Wang and Ke Sun and Tianheng Cheng and 
          Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and 
          Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
  journal={TPAMI},
  year={2019}
}

@misc{li20212d,
  title={Is 2D Heatmap Representation Even Necessary for Human Pose Estimation?}, 
  author={Yanjie Li and Sen Yang and Shoukui Zhang and Zhicheng Wang and Wankou Yang and Shu-Tao Xia and Erjin Zhou},
  year={2021},
  eprint={2107.03332},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
