To pick up a fragile object, a robot hand has to control how hard each fingertip presses, not just where the fingers go. Press too lightly and the object slips; press too hard and an egg cracks. Every dexterous hand also senses and produces force differently, so a skill learned on one hand rarely works on another.
Our research addresses this by making contact itself speak a common language. We calibrate each hand's raw effort readings into physical joint torques and map them to fingertip forces in newtons, so "I'm squeezing this hard" means the same thing on every hand. Motion is shared the same way, through a common hand-pose representation. The result is one unified force–position interface and one policy that transfers to new hands and tasks with little retuning.
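As a rough illustration of the effort-to-force idea, the sketch below fits a linear map from raw effort to joint torque and then solves the static relation τ = Jᵀf for the fingertip force of a planar two-joint finger. Everything here is an assumption made for illustration: the linear calibration model, the two-link geometry, and the names `fit_effort_to_torque` and `fingertip_force` are hypothetical and are not taken from the paper's system identification procedure.

```python
import math


def fit_effort_to_torque(efforts, torques):
    """Least-squares fit of torque ~= gain * effort + offset.

    `efforts` are raw, unitless motor readings; `torques` are reference
    joint torques in N*m measured during a calibration run (assumed data).
    """
    n = len(efforts)
    mean_e = sum(efforts) / n
    mean_t = sum(torques) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(efforts, torques))
    var = sum((e - mean_e) ** 2 for e in efforts)
    gain = cov / var
    offset = mean_t - gain * mean_e
    return gain, offset


def fingertip_force(q1, q2, tau1, tau2, l1, l2):
    """Solve tau = J^T f for a planar two-link finger (illustrative model).

    q1, q2: joint angles in rad; tau1, tau2: joint torques in N*m;
    l1, l2: link lengths in m. Returns (fx, fy) in newtons.
    """
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    # Jacobian of the fingertip position (x, y) with respect to (q1, q2).
    j11 = -l1 * s1 - l2 * s12
    j12 = -l2 * s12
    j21 = l1 * c1 + l2 * c12
    j22 = l2 * c12
    det = j11 * j22 - j21 * j12  # equals l1 * l2 * sin(q2)
    if abs(det) < 1e-9:
        raise ValueError("finger is near a singular (fully extended) pose")
    # Invert the 2x2 system: tau1 = j11*fx + j21*fy, tau2 = j12*fx + j22*fy.
    fx = (j22 * tau1 - j21 * tau2) / det
    fy = (-j12 * tau1 + j11 * tau2) / det
    return fx, fy
```

Once each hand has its own `gain` and `offset`, the same downstream code sees torques in N·m and forces in newtons regardless of which hand produced the raw reading, which is the property the shared interface relies on.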
A unified force–position encoding: a VLM grounds instructions into keypoints, a state machine dispatches each phase to a contact-aware primitive, and the same policy drives structurally different hands.
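The dispatch step in this pipeline can be pictured as a small state machine. The sketch below is a minimal, hypothetical version: the phase names, the observation keys (`dist_to_keypoint_m`, `grip_force_n`, `target_force_n`, `height_m`), and the thresholds are all assumptions chosen for illustration, and each "primitive" is reduced to a completion check rather than a learned policy.

```python
from enum import Enum, auto


class Phase(Enum):
    REACH = auto()
    GRASP = auto()
    LIFT = auto()
    DONE = auto()


# Each primitive returns True once its phase is complete.
# The observation keys and thresholds are hypothetical.
def reach_done(obs):
    return obs["dist_to_keypoint_m"] < 0.01


def grasp_done(obs):
    # Contact-aware check: finish only when the measured grip force
    # has reached the commanded target.
    return obs["grip_force_n"] >= obs["target_force_n"]


def lift_done(obs):
    return obs["height_m"] > 0.10


PRIMITIVES = {
    Phase.REACH: reach_done,
    Phase.GRASP: grasp_done,
    Phase.LIFT: lift_done,
}
NEXT_PHASE = {
    Phase.REACH: Phase.GRASP,
    Phase.GRASP: Phase.LIFT,
    Phase.LIFT: Phase.DONE,
}


def step(phase, obs):
    """Advance to the next phase when the current primitive reports done."""
    if phase is Phase.DONE:
        return phase
    return NEXT_PHASE[phase] if PRIMITIVES[phase](obs) else phase
```

Because the grasp check is expressed in newtons rather than in hand-specific effort units, the same transition logic could in principle be reused across hands.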
One Unitree G1 humanoid, four dexterous hands. The same robot body and the same three training embodiments (Inspire, Wuji, and Wuji12) are used across every task; the fourth hand, Wuji15, is held out for zero-shot transfer.
🧤 50 teleoperated demos per hand (Manus gloves + Vive tracker + foot-pedal clutch)
The same pipeline on Inspire, Wuji, and Wuji12 — rigid tools and fragile deformables.
Rigid object pick
Compliant object pick
Zero-shot transfer to unseen Wuji15 hand
Hand over tools
Place fruit (long-horizon)
Failure without force feedback
The same unified force–position interface, evaluated across embodiments to test calibration gains, force-channel causality, and zero-shot transfer.
Mean reach-and-lift success per hand. MARC improves every embodiment and carries over zero-shot to a hand it never trained on. Cross-hand mean success rises from 0.44 to 0.71, with the largest jump on the contact-critical slim marker (Inspire 0.2 → 1.0, Wuji 0.1 → 0.9).
Seen hands: mean over all objects, 10 trials each. Wuji₁₅: no retraining, evaluated on cup, bun, fruit & ice scoop — within 0.1 of in-distribution Wuji on its objects.
Mean success per object. The biggest gains are exactly the contact-critical cases: the occluded slim marker, and compliant objects with a narrow grip window.
Full per-object tables and ablations are in the paper.
Dexterous grasping depends on contact regulation, not motion alone. Stable manipulation requires fingers to maintain appropriate object loading as contacts slip, deform, or become visually occluded. Existing cross-embodiment dexterous policies unify motion through retargeted hand poses or latent actions, but force feedback remains tied to each hand's sensing and actuation, limiting transfer. This work introduces a cross-embodiment force-position interface for contact-aware manipulation across heterogeneous dexterous hands. Motion intent is represented in a shared hand-pose latent, while each hand's effort signal is calibrated through system identification into physical joint torque in N·m. These torques are mapped to fingertip forces and compact per-finger load descriptors, giving the policy comparable observations of where the hand should move and how the object is loaded. Using this interface, a flow-matching visuomotor policy is trained on vision, proprioception, and calibrated contact, with structured visual masking that encourages reliance on force under grasp-relevant occlusion. The same calibrated signal drives a hybrid force-position controller for demonstration collection and execution, keeping force targets consistent across training and deployment. Experiments across structurally different hands show that calibrated contact feedback enables transferable compliant grasping, with learned primitives reusable in long-horizon manipulation pipelines.
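To make the hybrid force-position idea concrete, here is a toy one-dimensional sketch: a finger moves in position mode until it senses contact, then switches to an admittance-style force mode that nudges the position command until the measured force matches the target. This is not the controller from the paper. The spring model of the object, the gains, the contact threshold, and the function names are all assumptions chosen so the example is easy to follow.

```python
def contact_force(x, surface=0.02, stiffness=500.0):
    """Hypothetical object: a linear spring, in newtons per meter of squeeze."""
    return stiffness * max(0.0, x - surface)


def hybrid_step(x, x_goal, f_meas, f_target,
                contact_threshold=0.05, max_step=0.001, admittance=0.001):
    """One control tick; returns the next position command in meters."""
    if f_meas < contact_threshold:
        # Free space: track the position goal with a bounded step.
        delta = max(-max_step, min(max_step, x_goal - x))
        return x + delta
    # In contact: move in proportion to the force error (admittance law).
    return x + admittance * (f_target - f_meas)


def simulate(f_target=2.0, steps=200):
    """Close the finger on the spring object and return (position, force)."""
    x = 0.0
    for _ in range(steps):
        f = contact_force(x)
        x = hybrid_step(x, x_goal=0.05, f_meas=f, f_target=f_target)
    return x, contact_force(x)
```

With these assumed values the force error halves on every tick once contact is made, because `admittance * stiffness` equals 0.5. A stiffer object or a larger admittance gain would need retuning to stay stable, which is one reason a force signal expressed in consistent physical units is useful when the same targets are reused during data collection and deployment.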
@misc{atar2026transferringcontactjustmotion,
title={Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands},
author={Soofiyan Atar and Yao-Ting Huang and Michael Yip},
year={2026},
eprint={2606.15516},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2606.15516},
}