InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Publication
Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Abstract

We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate joint human-object reconstruction in 3D. This is challenging due to occlusions, depth ambiguities, and widely varying object shapes. Existing methods rely on 3D contact annotations collected via expensive motion-capture systems or tedious manual labeling, limiting scalability and generalization. To overcome this, InteractVLM harnesses the broad visual knowledge of large Vision-Language Models (VLMs), fine-tuned with limited 3D contact data. However, directly applying these models is non-trivial, as they reason “only” in 2D, while human-object contact is inherently 3D. Thus, we introduce a novel “Render-Localize-Lift” module that: (1) embeds 3D body and object surfaces in 2D space via multi-view rendering, (2) trains a novel multi-view localization model (MV-Loc) to infer contacts in 2D, and (3) lifts these to 3D. Additionally, we propose a new task called Semantic Human Contact estimation, where human contact predictions are conditioned explicitly on object semantics, enabling richer interaction modeling. InteractVLM outperforms existing work on contact estimation and also facilitates 3D reconstruction from an in-the-wild image. To estimate 3D human and object pose, we infer initial body and object meshes, then infer contacts on both via InteractVLM, and finally exploit these contacts to fit the meshes to the image evidence. Results show that our approach performs promisingly in the wild. Code and models are available at https://interactvlm.is.tue.mpg.de.
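Since the abstract describes the “Render-Localize-Lift” idea only at a high level, below is a minimal, hypothetical sketch of the three steps in plain NumPy. All names (render_views, mv_loc_stub, lift_to_vertices) and the orthographic z-buffer renderer are illustrative assumptions, not the authors' implementation; in particular, the real MV-Loc is a fine-tuned VLM conditioned on the image and object semantics, whereas here it is replaced by a stub that returns dummy per-pixel contact probabilities.

```python
"""Sketch of Render -> Localize -> Lift for surface contact estimation.

Assumed, illustrative names only; not the InteractVLM codebase.
"""
import numpy as np


def render_views(vertices, n_views=4, res=128):
    """(1) Render: orthographically project mesh vertices from several azimuths.

    Returns, per view, a depth map plus a map of which vertex is visible at
    each pixel (-1 where none), so 2D predictions can be mapped back to 3D.
    """
    views = []
    for k in range(n_views):
        theta = 2.0 * np.pi * k / n_views
        R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                      [0.0,           1.0, 0.0          ],
                      [-np.sin(theta), 0.0, np.cos(theta)]])
        v = vertices @ R.T                        # rotate into the view frame
        xy, z = v[:, :2], v[:, 2]
        lo, hi = xy.min(0), xy.max(0)
        px = ((xy - lo) / (hi - lo + 1e-8) * (res - 1)).astype(int)
        depth = np.full((res, res), np.inf)
        vis = np.full((res, res), -1, dtype=int)  # visible vertex index per pixel
        for i, (x, y) in enumerate(px):
            if z[i] < depth[y, x]:                # simple z-buffer: keep closest vertex
                depth[y, x] = z[i]
                vis[y, x] = i
        views.append({"depth": depth, "visible_vertex": vis})
    return views


def mv_loc_stub(views, object_name):
    """(2) Localize: stand-in for the multi-view localization model (MV-Loc).

    The real model predicts contact from the input image conditioned on the
    object semantics; this stub only returns dummy probability maps of the
    right shape for each rendered view.
    """
    rng = np.random.default_rng(0)
    return [rng.random(v["depth"].shape) for v in views]


def lift_to_vertices(views, contact_maps, n_vertices, thresh=0.5):
    """(3) Lift: map 2D contact maps back onto the 3D surface.

    A vertex is marked in contact if any view sees it at a pixel whose
    predicted contact probability exceeds the threshold.
    """
    contact = np.zeros(n_vertices, dtype=bool)
    for view, prob in zip(views, contact_maps):
        hit = (prob > thresh) & (view["visible_vertex"] >= 0)
        contact[view["visible_vertex"][hit]] = True
    return contact


if __name__ == "__main__":
    # Toy "body": random points on a unit sphere stand in for mesh vertices.
    rng = np.random.default_rng(1)
    verts = rng.normal(size=(2000, 3))
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)

    views = render_views(verts)                           # (1) Render
    maps = mv_loc_stub(views, object_name="cup")          # (2) Localize in 2D
    contact = lift_to_vertices(views, maps, len(verts))   # (3) Lift to 3D
    print(f"{contact.sum()} of {len(verts)} vertices predicted in contact")
```

In the full pipeline described above, such per-vertex contact predictions on both the body and the object would then serve as constraints when fitting the initial meshes to the image evidence.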

Authors
Sai Kumar Dwivedi
Dimitrije Antić
Shashank Tripathi
Omid Taheri