Popular research-driven alternatives to MediaPipe span the spectrum from academic frameworks to commercial SDKs. OpenPose, from CMU, offers real-time multi-person 2D keypoint detection across body, face, and hands. Detectron2, from Facebook AI Research, provides state-of-the-art object detection, segmentation, and DensePose extensions within an extensible PyTorch ecosystem. AlphaPose, from MVIG-SJTU, achieves high mAP on COCO and real-time multi-person tracking with an integrated online pose tracker. High-Resolution Network (HRNet) maintains high-resolution feature maps throughout the entire network, boosting accuracy on pose benchmarks like COCO and PoseTrack. TensorFlow's MoveNet offers ultra-fast 17-keypoint 2D pose inference with "Lightning" and "Thunder" variants for latency-versus-accuracy trade-offs. Intel OpenVINO™ hosts deployable human-pose-estimation models (based on OpenPose or EfficientHRNet) optimized for CPU/GPU inference in its Model Zoo. 3DiVi's Nuitrack SDK supplies full-body 3D skeleton tracking with both classical and AI-based engines for RGB-D sensors, suitable for embedded hardware. Microsoft's Azure Kinect Body Tracking SDK yields robust 3D joint detection for Azure Kinect DK devices on Windows 10/11, complete with C# bindings and Unity samples. DeepLabCut leverages transfer learning for markerless pose estimation of animals, and by extension humans, in Python with excellent accuracy on limited data. Finally, Apple's Vision framework supports on-device 2D and 3D body pose detection with up to 19 joint landmarks on iOS and macOS.
OpenPose, developed by CMU's Perceptual Computing Lab, is one of the earliest open-source real-time multi-person keypoint detection libraries. It estimates 2D skeletons for body, face, hands, and feet in a single pipeline and ships with extensive demos and language bindings (C++, Python, Unity).
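When run with its `--write_json` flag, OpenPose emits one JSON file per frame, where each detected person's keypoints are stored as a flat `[x, y, confidence, ...]` list. The sketch below, with a synthetic (truncated) sample and an illustrative helper name `parse_people`, shows how those flat lists can be regrouped into per-keypoint triples; a real BODY_25 output would hold 25 triples per person.

```python
import json

# Synthetic OpenPose --write_json output, truncated to 3 of the 25 BODY_25
# keypoints for brevity; each keypoint is an (x, y, confidence) triple
# stored in one flat list per detected person.
sample = """
{
  "version": 1.3,
  "people": [
    {"pose_keypoints_2d": [320.5, 112.0, 0.93, 318.2, 160.4, 0.88, 290.1, 161.0, 0.81]}
  ]
}
"""

def parse_people(doc: str):
    """Return, per detected person, a list of (x, y, confidence) keypoints."""
    data = json.loads(doc)
    people = []
    for person in data["people"]:
        flat = person["pose_keypoints_2d"]
        # Regroup the flat [x1, y1, c1, x2, y2, c2, ...] list into triples.
        people.append([tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)])
    return people

keypoints = parse_people(sample)
```

The same regrouping applies to the `face_keypoints_2d` and hand keypoint lists in the same files.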
Detectron2 is Facebook AI Research's next-generation platform for object detection and segmentation; DensePose, which maps image pixels to a 3D human surface, is maintained as a Detectron2 project (it originally shipped with the Caffe2-based Detectron). Its modular design and PyTorch foundation make it highly extensible for custom pose-estimation research and production use cases.
AlphaPose is an accurate multi-person pose estimator, the first to surpass 70 mAP on COCO, and integrates PoseFlow for online pose tracking across video frames. It runs in real time on GPUs and provides ready-to-use demos in PyTorch and ONNX formats.
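AlphaPose can export its detections as a COCO-style results file: a JSON list with one entry per detected person, holding the 17 COCO keypoints as a flat `[x, y, score, ...]` list plus an overall pose score. A minimal sketch of filtering and regrouping such a file, using a synthetic two-person sample (truncated to 2 keypoints each) and an illustrative helper name `confident_poses`:

```python
import json

# Synthetic COCO-format results; real AlphaPose entries carry 17 keypoints
# (51 floats) per person. Values here are illustrative only.
results = json.loads("""
[
  {"image_id": "frame_001.jpg", "category_id": 1,
   "keypoints": [100.0, 50.0, 0.9, 110.0, 48.0, 0.8], "score": 2.7},
  {"image_id": "frame_001.jpg", "category_id": 1,
   "keypoints": [400.0, 60.0, 0.4, 410.0, 58.0, 0.3], "score": 0.6}
]
""")

def confident_poses(entries, min_score=1.0):
    """Keep detections above min_score, with keypoints grouped into triples."""
    kept = []
    for e in entries:
        if e["score"] >= min_score:
            flat = e["keypoints"]
            kept.append({
                "image_id": e["image_id"],
                "keypoints": [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)],
            })
    return kept

poses = confident_poses(results)
```

Score thresholding like this is the usual first step before feeding detections into a downstream tracker.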
High-Resolution Network (HRNet) maintains high-resolution feature representations throughout its layers rather than recovering them from low-resolution maps. This architecture achieves state-of-the-art accuracy on multiple pose benchmarks while remaining compatible with standard PyTorch training and inference pipelines.
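The core idea can be sketched schematically: HRNet keeps parallel branches at different resolutions and repeatedly exchanges information between them by resampling. The toy sketch below (pure NumPy, with nearest-neighbour resampling standing in for HRNet's learned strided and upsampling convolutions, and an illustrative helper name `exchange`) shows only the exchange pattern, not the real network:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling (stand-in for HRNet's learned upsampling).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    # Strided subsampling (stand-in for HRNet's strided 3x3 convolutions).
    return x[::2, ::2]

def exchange(high, low):
    """One HRNet-style multi-resolution exchange: each branch receives
    resampled information from the other while keeping its own resolution."""
    return high + upsample2x(low), low + downsample2x(high)

high = np.random.rand(8, 8)  # high-resolution feature map (kept throughout)
low = np.random.rand(4, 4)   # half-resolution branch
new_high, new_low = exchange(high, low)
```

Because the high-resolution branch is never discarded, the final heatmaps can be predicted directly from spatially precise features.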
MoveNet is TensorFlow's ultra-fast 2D pose-estimation model, available via TF Hub in two variants: Lightning for sub-15 ms inference and Thunder for higher accuracy. It achieves 30+ FPS on most devices and also integrates smoothly into TensorFlow.js for browser-based pose detection.
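MoveNet's output is a `[1, 1, 17, 3]` tensor of `(y, x, score)` per keypoint, with coordinates normalised to `[0, 1]`. The sketch below decodes that tensor into pixel coordinates; a random array stands in for a real TF Hub inference call, and the helper name `to_pixels` is illustrative:

```python
import numpy as np

# Synthetic stand-in for MoveNet's [1, 1, 17, 3] output tensor of
# (y, x, score) keypoints with coordinates normalised to [0, 1].
output = np.random.rand(1, 1, 17, 3).astype(np.float32)

def to_pixels(output, height, width, min_score=0.3):
    """Map normalised keypoints to pixel coordinates, dropping low scores."""
    kps = output[0, 0]  # (17, 3): one (y, x, score) row per keypoint
    pts = []
    for y, x, score in kps:
        if score >= min_score:
            # Note the (y, x) ordering in the model output.
            pts.append((int(x * width), int(y * height), float(score)))
    return pts

points = to_pixels(output, height=480, width=640)
```

The `min_score` threshold is the usual way to suppress occluded or out-of-frame joints before drawing a skeleton overlay.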
Intel OpenVINO™ offers optimized human-pose-estimation models in its Open Model Zoo, including human-pose-estimation-0001 (a MobileNet-based OpenPose variant) and human-pose-estimation-0007 (EfficientHRNet-based), for efficient CPU/GPU inference. Example notebooks and demos (webcam, SDK) enable rapid prototyping on Intel hardware.
Nuitrack by 3DiVi provides 3D skeleton-tracking middleware with two engines, a classical one for speed and embedded use and an AI one for complex poses, running on Windows, Linux, Android, and Unity via C#/C++ APIs. It supports RGB-D sensors such as RealSense, Orbbec, and Azure Kinect.
Microsoft's Azure Kinect Body Tracking SDK delivers robust 3D joint detection from the Azure Kinect DK's depth and RGB streams, with official C/C++/C# APIs, community Python wrappers, and Unity samples, and is updated regularly on GitHub.
DeepLabCut is a Python-based toolbox for markerless pose estimation, initially designed for animal behavior but adaptable to humans, that leverages transfer learning to achieve human-level accuracy from minimal training frames.
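DeepLabCut exports its predictions as a CSV (alongside an HDF5 file) with three header rows, scorer, bodyparts, and coords, followed by one row per frame, where each body part contributes x, y, and likelihood columns. A stdlib-only sketch of reading that layout, assuming this CSV shape and using a synthetic two-bodypart sample and an illustrative helper name `read_dlc_csv`:

```python
import csv
import io

# Synthetic DeepLabCut-style prediction CSV: three header rows, then one
# row per frame with (x, y, likelihood) columns per body part.
raw = """scorer,model,model,model,model,model,model
bodyparts,nose,nose,nose,tail,tail,tail
coords,x,y,likelihood,x,y,likelihood
0,10.0,20.0,0.99,50.0,60.0,0.95
1,11.0,21.0,0.98,51.0,61.0,0.10
"""

def read_dlc_csv(text, min_likelihood=0.5):
    """Per frame, map each body part to (x, y), or None if uncertain."""
    rows = list(csv.reader(io.StringIO(text)))
    bodyparts = rows[1][1::3]  # one name per (x, y, likelihood) block
    frames = []
    for row in rows[3:]:
        vals = [float(v) for v in row[1:]]
        frame = {}
        for i, part in enumerate(bodyparts):
            x, y, p = vals[3 * i: 3 * i + 3]
            # Mask low-likelihood predictions instead of keeping bad points.
            frame[part] = (x, y) if p >= min_likelihood else None
        frames.append(frame)
    return frames

tracks = read_dlc_csv(raw)
```

Likelihood masking like this is the standard first step before computing trajectories or kinematics from DLC output.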
Apple's Vision framework on iOS and macOS supports both 2D (up to 19 keypoints) and 3D body pose detection, enabling on-device inference on live camera feeds without external dependencies.