Yes – by late 2025 there are models that report higher accuracy than GVHMR, but it depends on which setting you care about (global/world-grounded vs camera-space, single-shot vs multi-shot, biomechanics, etc.).
Here’s the short version by category.
GVHMR (SIGGRAPH Asia 2024) introduced Gravity-View coordinates and was clearly SOTA at the time for world-grounded motion from monocular video, outperforming WHAM and others on RICH, EMDB, and 3DPW while being quite fast.
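For intuition, the Gravity-View idea anchors each frame's output in a coordinate frame built from the gravity direction and the camera's viewing direction, so recovered motion stays upright and view-consistent. Here is a minimal numpy sketch, assuming both directions are known; the axis conventions are illustrative, not GVHMR's exact definition:

```python
import numpy as np

def gravity_view_rotation(gravity_dir, cam_forward):
    """Build a rotation whose +y axis opposes gravity and whose +z axis is
    the camera forward direction projected onto the horizontal plane.
    Illustrative axis conventions only -- not GVHMR's exact construction.
    Assumes the camera is not looking straight up or down."""
    up = -gravity_dir / np.linalg.norm(gravity_dir)      # +y opposes gravity
    fwd = cam_forward - np.dot(cam_forward, up) * up     # drop the vertical part
    fwd = fwd / np.linalg.norm(fwd)                      # +z: horizontal view dir
    right = np.cross(up, fwd)                            # +x completes the frame
    return np.stack([right, up, fwd], axis=1)            # columns are the axes

# Example: gravity straight down, camera pitched slightly downward.
R = gravity_view_rotation(np.array([0.0, -1.0, 0.0]),
                          np.array([0.0, -0.2, 1.0]))
```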
Since then:
WATCH jointly models camera and human trajectories and directly compares against GVHMR and WHAM on the same benchmarks and metrics.
On EMDB-2 (world space), using DPVO camera trajectories:
- GVHMR: WA-MPJPE₁₀₀ ≈ 111.0 mm, W-MPJPE₁₀₀ ≈ 276.5 mm, RTE ≈ 2.0 m
- WATCH (w/ cam traj.): WA-MPJPE₁₀₀ ≈ 107.6 mm, W-MPJPE₁₀₀ ≈ 272.2 mm, RTE ≈ 1.9 m, and lower jitter/foot-sliding.
With GT gyro data, WATCH improves further (WA-MPJPE₁₀₀ ≈ 106.4 mm vs 109.1 mm for GVHMR, plus lower RTE and jitter).
In camera space (EMDB-1 / RICH / 3DPW), WATCH also shows slightly better MPJPE / PVE and smoother motion than GVHMR on most benchmarks, while using a similar training recipe.
The authors explicitly state that WATCH achieves superior global and camera-space performance compared to GVHMR on RICH and EMDB.
👉 Conclusion: In the same “world-grounded from monocular video” regime, WATCH is numerically more accurate than GVHMR on the standard global-motion benchmarks (RICH, EMDB, 3DPW) while also improving smoothness.
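For reference, the WA-/W-MPJPE₁₀₀ numbers above both split the sequence into 100-frame segments and measure mean per-joint error in world space after a rigid alignment; they differ in how much of the segment the alignment may use (the whole segment vs. only its start). Here is a sketch of the common definitions, assuming (T, J, 3) joint arrays; exact conventions vary slightly between papers (e.g., first-frame vs. first-two-frame alignment):

```python
import numpy as np

def rigid_align(src, dst):
    """Kabsch: rotation R and translation t minimizing ||R @ src + t - dst||."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

def w_mpjpe_100(pred, gt, seg=100):
    """W-MPJPE@100: per 100-frame segment, align prediction to ground truth
    using the first frame only, then average the per-joint error (mm if
    inputs are in mm).  pred, gt: (T, J, 3) world-space joints."""
    errs = []
    for s in range(0, len(pred) - seg + 1, seg):
        p, g = pred[s:s+seg], gt[s:s+seg]
        R, t = rigid_align(p[0], g[0])                 # first-frame alignment
        errs.append(np.linalg.norm(p @ R.T + t - g, axis=-1).mean())
    return float(np.mean(errs))

def wa_mpjpe_100(pred, gt, seg=100):
    """WA-MPJPE@100: same, but the rigid alignment may use the whole segment,
    so it forgives global drift within each 100-frame window."""
    errs = []
    for s in range(0, len(pred) - seg + 1, seg):
        p, g = pred[s:s+seg], gt[s:s+seg]
        R, t = rigid_align(p.reshape(-1, 3), g.reshape(-1, 3))
        errs.append(np.linalg.norm(p @ R.T + t - g, axis=-1).mean())
    return float(np.mean(errs))
```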
HumanMM: Global Human Motion Recovery from Multi-shot Videos targets multi-shot sequences (multiple cuts) from a single camera and compares directly to SLAHMR, WHAM, and GVHMR on their ms-Motion benchmark.
On ms-Motion (multi-shot AIST & Human3.6M), the paper's results table shows:
- Comparing the GVHMR and HumanMM rows: HumanMM achieves substantially lower PA-MPJPE, WA-MPJPE, RTE, and ROE across shot counts, indicating more accurate global trajectories and orientations than GVHMR on this benchmark.
👉 Conclusion: For multi-shot monocular global motion, HumanMM is clearly more accurate than GVHMR on the published benchmark.
WATCH’s paper notes that some camera-trajectory–centric approaches like TRAM and variants of PromptHMR-vid can achieve even better numerical global-motion metrics on some datasets by relying heavily on high-precision SLAM trajectories. However, they may suffer from physically implausible poses and discontinuities, so “better accuracy” depends on whether you care more about raw error numbers or physical realism.
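To see why camera-trajectory quality can dominate these world-space metrics: world joints are simply camera-space joints pushed through the estimated camera trajectory, so SLAM drift moves every joint rigidly and shows up directly in WA-/W-MPJPE and RTE. A generic sketch of that composition (not any specific paper's code):

```python
import numpy as np

def to_world(cam_joints, R_wc, t_wc):
    """Lift per-frame camera-space joints into world space with a SLAM
    camera trajectory.  cam_joints: (T, J, 3), R_wc: (T, 3, 3) camera-to-
    world rotations, t_wc: (T, 3) camera positions in world coordinates.
    Any drift in (R_wc, t_wc) rotates/translates whole frames at once,
    so global-metric error is dominated by trajectory quality."""
    return np.einsum('tij,tnj->tni', R_wc, cam_joints) + t_wc[:, None, :]
```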
If by “human motion recovery” you mainly mean pose and shape in camera coordinates from monocular images/videos, there are also newer, more accurate methods than GVHMR:
PromptHMR is a transformer-based, promptable HMR model that processes full images and accepts spatial & semantic prompts (boxes, masks, language). The paper reports state-of-the-art accuracy for camera-space pose/shape on standard HPS benchmarks (e.g., 3DPW) with both image and video versions.
It doesn’t focus on world-grounded trajectories; it’s more about flexible, highly accurate 3D pose/shape from monocular inputs, especially in crowded/complex scenes.
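To make “promptable” concrete, here is a purely hypothetical interface sketch; every name below is a placeholder rather than PromptHMR's actual API (check the official repo for the real entry points):

```python
# Purely hypothetical sketch -- every name here is a placeholder, not the
# real PromptHMR API.  It only illustrates the idea: encode the full image
# once, then let each spatial/semantic prompt select one person to recover.
def recover_people(model, image, person_boxes, descriptions=None):
    results = []
    for i, box in enumerate(person_boxes):
        prompt = {
            "box": box,                                        # (4,) xyxy pixels
            "text": descriptions[i] if descriptions else None, # e.g. "person in red"
        }
        results.append(model(image, prompt))                   # -> per-person pose/shape
    return results
```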
BioPose aims at biomechanically accurate 3D poses from monocular videos, combining an MQ-HMR backbone, a Neural IK stage, and 2D-informed refinement. It shows significantly reduced joint errors against motion-capture ground truth and prior learning-based HPE/HMR methods on biomechanics benchmarks.
This is a different notion of “accuracy” (matching real joint locations / kinematics rather than 3DPW-style metrics), but if you care about biomechanics, BioPose can be “more accurate” in that sense than general HMR models like GVHMR.
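As a flavor of what a 2D-informed refinement stage typically does (a generic sketch under standard assumptions, not BioPose's actual implementation): nudge the 3D joints so their perspective projection matches detected 2D keypoints, while a regularizer keeps the result near the initial estimate.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_with_2d(joints3d, joints2d, K, w_reg=1e-2):
    """Generic 2D-informed refinement (illustrative only -- not BioPose's
    actual stage): adjust camera-space 3D joints so their perspective
    projection matches detected 2D keypoints, with a soft prior keeping
    them near the initial estimate.
    joints3d: (J, 3) initial joints, joints2d: (J, 2) pixel detections,
    K: (3, 3) camera intrinsics."""
    def residuals(x):
        j3d = x.reshape(-1, 3)
        proj = (K @ j3d.T).T                   # project to the image plane
        proj = proj[:, :2] / proj[:, 2:3]      # perspective divide
        reproj = (proj - joints2d).ravel()     # match the 2D detections
        reg = w_reg * (j3d - joints3d).ravel() # stay close to the initial pose
        return np.concatenate([reproj, reg])

    return least_squares(residuals, joints3d.ravel()).x.reshape(-1, 3)
```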
So, as of November 2025:
- Yes – for world-grounded monocular motion, both WATCH (2025) and HumanMM (CVPR 2025) report better quantitative performance than GVHMR on their respective benchmarks, under comparable conditions.
- For camera-space pose/shape, methods like PromptHMR surpass older HMR models on standard 3DPW / Human3.6M metrics, and BioPose improves biomechanical realism relative to prior work.
- GVHMR is still a strong, widely used baseline and is often used inside later pipelines (e.g., as a motion estimator), but it’s no longer the clear top performer on global-motion benchmarks.
If you tell me your exact use-case (single long shot vs multi-shot, static vs moving camera, need for strict biomechanics, etc.), I can suggest which of WATCH / HumanMM / PromptHMR / BioPose (or combinations) is likely the best fit.