⚖️ Unified force–position 🪶 Compliant grasping 🤲 Cross-embodiment 🔗 Long-horizon

At a Glance

To pick up a fragile object, a robot hand has to control how hard each fingertip presses, not just where the fingers go. Press too lightly and it slips, press too hard and the egg cracks. And every dexterous hand feels and produces that force differently, so a skill learned on one hand rarely works on another.

Our research fixes this by making contact itself speak a common language. We turn each hand's raw effort readings into real physical force in newtons, so "I'm squeezing this hard" means the same thing on every hand. Motion is shared the same way. The result is one unified force–position interface and one policy that transfers to new hands and tasks with little retuning.

Because the policy can feel how hard it presses, it holds compliant objects without crushing them. And because those grasps stay stable, they become reusable building blocks for long-horizon jobs like reach, lift, carry, place, and hand over, with no new policy per task.

0.44 → 0.71 · Mean reach-and-lift success vs. DiT-flow baseline
3 + 1 · Structurally different hands, plus one unseen hand, zero-shot
8 · Objects: rigid tools to eggs, buns & stacked cups
+0.53 · Biggest gain on the slim marker, the most occluded, contact-critical grasp

Overview figure

A unified force–position encoding: a VLM grounds instructions into keypoints, a state machine dispatches each phase to a contact-aware primitive, and the same policy drives structurally different hands.
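
The dispatch structure described in this caption can be sketched as a small sequential state machine. This is a minimal illustration only, not the authors' implementation: the phase names are borrowed from the task list on this page, and `Context`, `make_primitive`, and `run_pipeline` are hypothetical names. The keypoint is assumed to have been produced upstream by the VLM, and each primitive is a stand-in that simply records its call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical phase order, borrowed from the tasks named on this page.
PHASES = ["reach", "lift", "carry", "place"]


@dataclass
class Context:
    """Shared state passed between phases (illustrative only)."""
    keypoint: Tuple[float, float, float]  # assumed to come from the VLM
    log: List[str]


def make_primitive(name: str, succeeds: bool = True) -> Callable[[Context], bool]:
    """Return a stand-in primitive that records its call and reports success."""
    def primitive(ctx: Context) -> bool:
        ctx.log.append(f"{name}@{ctx.keypoint}")
        return succeeds
    return primitive


def run_pipeline(ctx: Context, primitives: Dict[str, Callable[[Context], bool]]) -> str:
    """Run phases in order and stop at the first primitive that fails."""
    for phase in PHASES:
        if not primitives[phase](ctx):
            return f"failed:{phase}"
    return "done"


primitives = {p: make_primitive(p) for p in PHASES}
ctx = Context(keypoint=(0.4, 0.0, 0.9), log=[])
status = run_pipeline(ctx, primitives)
```

Keeping the primitives behind a plain dictionary means a new long-horizon task only changes the phase list, which is one way to read the "no new policy per task" claim.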

Setup

One Unitree G1 humanoid, four dexterous hands. The same robot body and the same three training embodiments are used across every task; the fourth hand, Wuji15, is held out for zero-shot evaluation.

Inspire · 5 fingers · 12 DoF (6 actuated) · mimic coupling
Wuji · 5 fingers · 20 DoF · full actuation
Wuji12 · 12 DoF · ring & little fingers removed 🚫
Wuji15 · 15 DoF · little finger removed · unseen at training · zero-shot

🧤 50 teleoperated demos per hand (Manus gloves + Vive tracker + foot-pedal clutch)

Method Video

Compliant Grasping Across Hands

The same pipeline on Inspire, Wuji, and Wuji12 — rigid tools and fragile deformables.

Rigid & Compliant Objects

Rigid object pick

Compliant object pick

Unseen Hands

Zero-shot transfer to unseen Wuji15 hand

Handover

Hand over tools

Long Horizon

Place fruit (long-horizon)

Without Force — Failure

Failure without force feedback

Results

The same unified force–position interface, evaluated across embodiments to test calibration gains, force-channel causality, and zero-shot transfer.

Cross-Embodiment Transfer

Mean reach-and-lift success per hand. MARC improves every embodiment — and carries over zero-shot to a hand it never trained on. Cross-hand mean rises from 0.44 to 0.71, with the largest jump on the contact-critical marker (Inspire 0.2 → 1.0, Wuji 0.1 → 0.9).

Figure: mean reach-and-lift success per hand (scale 0 to 1.0), sDiT baseline → MARC (ours).
Inspire (12 DoF, coupled): 0.54 → 0.84 (▲ +0.30)
Wuji (20 DoF, full): 0.48 → 0.79 (▲ +0.31)
Wuji₁₂ (no ring & little): 0.29 → 0.51 (▲ +0.22)
Wuji₁₅ ✨ (unseen at training): 0.45 zero-shot

Seen hands: mean over all objects, 10 trials each. Wuji₁₅: no retraining, evaluated on cup, bun, fruit & ice scoop — within 0.1 of in-distribution Wuji on its objects.
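
The cross-hand means quoted above can be checked from the per-hand numbers. The short sketch below assumes the cross-hand mean is the unweighted average of the three seen hands; that averaging rule is an assumption here, not something the page states.

```python
# Per-hand mean success for the three seen hands, copied from the figure above.
baseline = {"Inspire": 0.54, "Wuji": 0.48, "Wuji12": 0.29}
marc = {"Inspire": 0.84, "Wuji": 0.79, "Wuji12": 0.51}


def mean(values):
    """Unweighted arithmetic mean (assumed averaging rule)."""
    values = list(values)
    return sum(values) / len(values)


baseline_mean = round(mean(baseline.values()), 2)  # 0.44
marc_mean = round(mean(marc.values()), 2)          # 0.71

# Per-hand improvement, rounded to two decimals as on the chart.
gains = {hand: round(marc[hand] - baseline[hand], 2) for hand in baseline}
```

Under that assumption the rounded means reproduce the reported 0.44 and 0.71, and the per-hand gains match the +0.30, +0.31, and +0.22 labels.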

Where Compliance Matters Most

Mean success per object. The biggest gains are exactly the contact-critical cases: the occluded slim marker, and compliant objects with a narrow grip window.

Object: sDiT baseline → MARC (ours)
🖊️ Marker: 0.20 → 0.73
🥤 Stacked cups: 0.43 → 0.83
🥐 Brioche bun: 0.65 → 0.90
🥢 Tongs: 0.53 → 0.80
🍎 Toy fruit: 0.53 → 0.77
🥚 Egg: 0.50 → 0.73
🥄 Ice cream scoop: 0.57 → 0.70
🍪 Cookie: 0.17 → 0.37

+0.27 · Every hand improves. Inspire 0.54 → 0.84, Wuji 0.48 → 0.79, Wuji12 0.29 → 0.51.
0.75 → 0.48 · The force channel is the source of the gain. Ablating calibrated contact collapses performance back to the baseline, so the improvement comes from the force signal rather than the architecture or the masking.
zero-shot · Unseen 15-DoF hand. Within 0.1 of in-distribution Wuji numbers, with no retraining at all.

Full per-object tables and ablations are in the paper.

Abstract

Dexterous grasping depends on contact regulation, not motion alone. Stable manipulation requires fingers to maintain appropriate object loading as contacts slip, deform, or become visually occluded. Existing cross-embodiment dexterous policies unify motion through retargeted hand poses or latent actions, but force feedback remains tied to each hand's sensing and actuation, limiting transfer. This work introduces a cross-embodiment force-position interface for contact-aware manipulation across heterogeneous dexterous hands. Motion intent is represented in a shared hand-pose latent, while each hand's effort signal is calibrated through system identification into physical joint torque in N·m. These torques are mapped to fingertip forces and compact per-finger load descriptors, giving the policy comparable observations of where the hand should move and how the object is loaded. Using this interface, a flow-matching visuomotor policy is trained on vision, proprioception, and calibrated contact, with structured visual masking that encourages reliance on force under grasp-relevant occlusion. The same calibrated signal drives a hybrid force-position controller for demonstration collection and execution, keeping force targets consistent across training and deployment. Experiments across structurally different hands show that calibrated contact feedback enables transferable compliant grasping, with learned primitives reusable in long-horizon manipulation pipelines.
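
The chain from raw effort to fingertip force described in the abstract can be illustrated with a toy example. This is a rough sketch under strong assumptions, not the paper's method: calibration is modeled as a linear least-squares fit (torque = gain × effort + offset), the finger is a planar two-link chain, and fingertip force is recovered from the static relation τ = Jᵀf. All function names, link lengths, and calibration data below are hypothetical.

```python
import math


def fit_effort_to_torque(efforts, torques):
    """Least-squares fit of torque = gain * effort + offset (assumed linear model)."""
    n = len(efforts)
    mean_e = sum(efforts) / n
    mean_t = sum(torques) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(efforts, torques))
    var = sum((e - mean_e) ** 2 for e in efforts)
    gain = cov / var
    return gain, mean_t - gain * mean_e


def planar_jacobian(q1, q2, l1, l2):
    """Fingertip Jacobian of a planar two-link finger (rows: x, y)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def fingertip_force(jac, tau):
    """Solve tau = J^T f for the 2-D fingertip force f."""
    (a, b), (c, d) = jac
    det = a * d - b * c  # determinant of J (equal to that of J^T)
    if abs(det) < 1e-9:
        raise ValueError("finger is at a singular configuration")
    # Apply the inverse of J^T = [[a, c], [b, d]] to tau.
    fx = (d * tau[0] - c * tau[1]) / det
    fy = (-b * tau[0] + a * tau[1]) / det
    return fx, fy


# Made-up calibration pairs: raw effort counts vs. measured torque in N·m.
gain, offset = fit_effort_to_torque([0, 100, 200, 300], [0.02, 0.12, 0.22, 0.32])
tau = [gain * e + offset for e in (30, -20)]        # calibrated joint torques, N·m
jac = planar_jacobian(0.0, math.pi / 2, 0.05, 0.03)  # link lengths in metres
force = fingertip_force(jac, tau)                    # fingertip force, N
```

Once every hand reports force in the same physical units, the same downstream observation can be built regardless of how each hand measures effort, which is the property the interface relies on.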

BibTeX

@misc{atar2026transferringcontactjustmotion,
      title={Transferring Contact, Not Just Motion: Compliant Grasping Across Dexterous Hands}, 
      author={Soofiyan Atar and Yao-Ting Huang and Michael Yip},
      year={2026},
      eprint={2606.15516},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2606.15516}, 
}