In-the-Wild Compliant Manipulation with UMI-FT

Published 15 Jan 2026 in cs.RO | (2601.09988v1)

Abstract: Many manipulation tasks require careful force modulation. With insufficient force the task may fail, while excessive force could cause damage. The high cost, bulky size and fragility of commercial force/torque (F/T) sensors have limited large-scale, force-aware policy learning. We introduce UMI-FT, a handheld data-collection platform that mounts compact, six-axis force/torque sensors on each finger, enabling finger-level wrench measurements alongside RGB, depth, and pose. Using the multimodal data collected from this device, we train an adaptive compliance policy that predicts position targets, grasp force, and stiffness for execution on standard compliance controllers. In evaluations on three contact-rich, force-sensitive tasks (whiteboard wiping, skewering zucchini, and lightbulb insertion), UMI-FT enables policies that reliably regulate external contact forces and internal grasp forces, outperforming baselines that lack compliance or force sensing. UMI-FT offers a scalable path to learning compliant manipulation from in-the-wild demonstrations. We open-source the hardware and software to facilitate broader adoption at: https://umi-ft.github.io/.

Summary

The paper introduces UMI-FT, a novel system using finger-level CoinFT sensors to capture precise six-axis force data for compliant manipulation.
It employs deep MLP calibration and transformer-based fusion of RGB, depth, and tactile data to drive adaptive, multimodal control policies.
Experimental results demonstrate robust performance in tasks like whiteboard wiping, zucchini skewering, and lightbulb insertion, outperforming traditional baselines.

Introduction and Motivation

Force modulation is central to dexterous robotic manipulation, particularly when interacting with soft or fragile objects. Conventional F/T sensors have inhibited data-driven policy learning due to cost and structural constraints. This paper introduces UMI-FT, a handheld manipulation interface that leverages highly compact CoinFT sensors mounted on each finger, collecting six-axis wrench information alongside high-frequency RGB, depth, and pose modalities. This architecture enables precise measurement of both external contact forces and internal grasp forces during human demonstrations, addressing major limitations of wrist-based and tactile-only approaches by providing low-latency, finger-level force sensing with high scalability.

Hardware and Calibration

UMI-FT integrates CoinFT sensors and an iPhone Pro in a rigidly coupled structure, supporting data synchronization across modalities and real-time feedback. Sensor placement prioritizes compressive loads to avoid delamination, enabling robust grasping. CoinFT calibration is performed via supervised regression against an ATI Gamma reference sensor, using a deep MLP to map nonlinear capacitance readings to six-axis forces and torques.

Figure 1: UMI-FT finger calibration methodology, with random force/torque sampling, MLP mapping, and evaluation results on shear force axes.

The calibrated sensors achieve low error (reported as <0.6 N for force and <0.23 Nm for torque), supporting reliable force feedback across a broad range of loads and disturbances.
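
The calibration idea (supervised regression from raw capacitance channels to reference forces) can be illustrated with a toy example. The following is a minimal sketch, not the authors' pipeline: the data is synthetic, the network has a single hidden layer and one output axis instead of six, and every name and hyperparameter (`make_sample`, `init_mlp`, the learning rate, the layer width) is an assumption made for illustration.

```python
import math
import random

def make_sample(rng):
    """Synthetic stand-in for one calibration pair (NOT real CoinFT data)."""
    force = rng.uniform(-1.0, 1.0)                      # normalised reference force
    c1 = math.tanh(1.5 * force) + rng.gauss(0.0, 0.01)  # saturating channel
    c2 = 0.5 * force ** 2 + rng.gauss(0.0, 0.01)        # second nonlinear channel
    return (c1, c2), force

def init_mlp(n_in, n_hidden, rng):
    """One hidden tanh layer and a linear output (hypothetical sizes)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return {"w1": w1, "b1": [0.0] * n_hidden, "w2": w2, "b2": 0.0}

def forward(params, x):
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(params["w1"], params["b1"])
    ]
    output = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return hidden, output

def sgd_step(params, x, target, lr):
    """One stochastic gradient step on the loss 0.5 * (output - target)^2."""
    hidden, output = forward(params, x)
    err = output - target
    for j, hj in enumerate(hidden):
        # Back-propagate through the output weight before updating it.
        grad_pre = err * params["w2"][j] * (1.0 - hj * hj)
        params["w2"][j] -= lr * err * hj
        for i, xi in enumerate(x):
            params["w1"][j][i] -= lr * grad_pre * xi
        params["b1"][j] -= lr * grad_pre
    params["b2"] -= lr * err

def mse(params, data):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)

rng = random.Random(0)
train = [make_sample(rng) for _ in range(400)]
held_out = [make_sample(rng) for _ in range(100)]
params = init_mlp(n_in=2, n_hidden=8, rng=rng)

mse_before = mse(params, held_out)
for _ in range(50):                      # epochs
    for x, t in train:
        sgd_step(params, x, t, lr=0.05)
mse_after = mse(params, held_out)
print(f"held-out MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

In a real calibration the inputs would be the sensor's capacitance channels and the targets the six wrench components from the reference sensor; the held-out error plays the role of the per-axis evaluation mentioned above.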

Controller Architecture and Adaptive Compliance Policy

The UMI-FT controller consists of three parallel feedback loops: a slow learned multimodal policy, a high-frequency wrist admittance controller, and a gripper force controller. The policy takes the most recent RGB, depth, and pose frames plus a 32-step history of force readings, encoded with CNN encoders and fused via transformer layers. It outputs the robot end-effector target pose, virtual compliance targets, and explicit stiffness and grasp-force references.
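
One way to picture the force-history input is a fixed-length rolling buffer that always hands the policy the latest 32 wrench samples. The sketch below is illustrative only: the class name `ForceHistory`, the concatenated 12-value row layout (two fingers times six axes), and the choice to pad a short history with its earliest sample are assumptions, not details taken from the paper.

```python
from collections import deque

HISTORY_STEPS = 32   # history length stated in the summary
WRENCH_DIM = 6       # fx, fy, fz, tx, ty, tz

class ForceHistory:
    """Rolling window of per-finger wrench samples (hypothetical helper)."""

    def __init__(self, steps=HISTORY_STEPS):
        self.steps = steps
        self.buffer = deque(maxlen=steps)  # old samples fall off automatically

    def push(self, left_wrench, right_wrench):
        if len(left_wrench) != WRENCH_DIM or len(right_wrench) != WRENCH_DIM:
            raise ValueError("each wrench must have 6 components")
        # Assumed layout: left finger wrench followed by right finger wrench.
        self.buffer.append(tuple(left_wrench) + tuple(right_wrench))

    def window(self):
        """Return exactly `steps` rows, oldest first.

        If fewer samples have arrived, the start is padded by repeating the
        earliest sample (an assumed padding policy).
        """
        if not self.buffer:
            raise RuntimeError("no force samples received yet")
        rows = list(self.buffer)
        padding = [rows[0]] * (self.steps - len(rows))
        return padding + rows

history = ForceHistory()
for k in range(40):
    history.push([float(k)] * 6, [float(-k)] * 6)
policy_input = history.window()   # 32 rows of 12 values each
```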

Figure 2: Overall controller structure, detailing sensor-to-policy mapping and compliance/ref. output routing.

Figure 3: Adaptive compliance policy (ACP) block diagram, showing expanded input and output channels to exploit richer multimodal and force/tactile data.

Grasp force ($f_G$) is regulated by velocity-resolved admittance control with tunable gains, allowing dynamic adaptation to object geometry and environmental changes. Wrist compliance modulates external contacts based on fused wrench measurements from both fingers, transformed to a unified tool coordinate frame.
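
The two feedback signals described here can be sketched in a few lines. This is a hedged illustration rather than the paper's implementation: it assumes each sensor reports the wrench applied on its finger, uses the standard rigid-body wrench transform (rotate, then add the moment arm term), sums the two tool-frame wrenches as the external wrench, and takes the grasp force as the averaged squeeze along the grasp axis. The frame offsets, the gain, the speed limit, and all function names are hypothetical.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def mat_vec(rot, v):
    return tuple(sum(rot[r][c] * v[c] for c in range(3)) for r in range(3))

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def wrench_to_tool(rot, origin, force, torque):
    """Express a sensor-frame wrench in the tool frame.

    rot: orientation of the sensor frame in the tool frame (3x3).
    origin: sensor origin in tool coordinates (metres).
    """
    f_tool = mat_vec(rot, force)
    tau_tool = add(mat_vec(rot, torque), cross(origin, f_tool))
    return f_tool, tau_tool

def grasp_force(f1_tool, f2_tool, axis):
    """Averaged squeeze along the grasp axis (axis points from finger 2 to finger 1)."""
    return 0.5 * (dot(f1_tool, axis) - dot(f2_tool, axis))

def closing_speed(f_target, f_measured, gain, max_speed):
    """Velocity-resolved admittance: positive result closes the gripper."""
    speed = gain * (f_target - f_measured)
    return max(-max_speed, min(max_speed, speed))

# Example: 10 N squeeze plus 3 N of upward contact force on each finger.
GRASP_AXIS = (0.0, 1.0, 0.0)
f1, tau1 = wrench_to_tool(IDENTITY, (0.0, 0.04, 0.10), (0.0, 10.0, 3.0), (0.0, 0.0, 0.0))
f2, tau2 = wrench_to_tool(IDENTITY, (0.0, -0.04, 0.10), (0.0, -10.0, 3.0), (0.0, 0.0, 0.0))

external_force = add(f1, f2)      # squeeze components cancel in the sum
external_torque = add(tau1, tau2)
f_g = grasp_force(f1, f2, GRASP_AXIS)
speed = closing_speed(f_target=15.0, f_measured=f_g, gain=0.002, max_speed=0.05)
```

In this example the equal and opposite squeeze forces cancel in the summed wrench, leaving only the 6 N contact force for the wrist loop, while the grasp loop sees the 10 N squeeze and commands a slow closing motion toward the 15 N target.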

Empirical Evaluation and Results

The approach is benchmarked against three baselines: Diffusion Policy with Force (no compliance), vanilla Diffusion Policy (position control only), and Diffusion Policy with contact microphone input (audio-tactile, no force control). Tasks encompass whiteboard wiping, skewering soft zucchini slices, and lightbulb insertion—each presenting unique force modulation and generalization challenges.

Whiteboard Wiping

The UMI-FT-based ACP robustly modulates normal and shear forces, maintaining a consistent grasp while wiping across variations in board and eraser pose, height, and object geometry. Its success rate reaches 92%, while the competing baselines (force only, audio, or position control) fail because of excessive or insufficient contact force or an inability to adapt the grasp to the object.

Figure 4: Representative wipe-task policy rollouts, test setup variants, and typical baseline failure modes.

Skewering Zucchini

The platform markedly improves secure grasp maintenance and rotational slip mitigation versus baselines lacking explicit force feedback. ACP achieves 80% success across unseen object variations (thickness, color, fork), while force-only and position-only policies suffer substantial performance degradation.

Figure 5: Skewering task progression, with task variants and baseline failure analysis.

In extensive "in-the-wild" generalization tests, ACP trained with scene-diverse data reaches 100% success, whereas training on in-lab data alone drops success below 20%.

Figure 6: Representative in-the-wild skewering scenarios with diverse environmental clutter and adaptation.

Lightbulb Insertion

Insertion tasks require both a robust grasp and nuanced compliance for haptic search. The ACP achieves 95% overall success, consistently locating the slot and completing insertion via controlled contact forces, a regime where vision-only approaches fail completely. Compliance proves critical for alignment and for avoiding slip, even under increased socket resistance.

Implications and Future Directions

The UMI-FT platform demonstrates scalable, multimodal data collection for compliant manipulation via low-cost, finger-level F/T sensors. Policies trained with this data generalize substantially better to unseen environments, adaptively modulate force, and perform haptic search with greater stability than position/vision or audio-tactile baselines. The open-source hardware and software provide a path toward broader deployment and large-scale policy learning.

Future work includes wireless sensor upgrades, improved sensor mechanical reliability, and expanded utilization of ultrawide visual data. The UMI-FT paradigm paves the way for richer imitation learning from authentic human demonstrations, supporting manipulation tasks with stringent force and generalization demands.

Conclusion

UMI-FT establishes a practical framework for compliant manipulation leveraging scalable, fine-grained force sensing at the fingers. Experimental results substantiate the claim that explicit compliance and grasp force control are critical enablers of robust policy generalization and execution in complex manipulation tasks. The integration of finger-level F/T measurements into adaptive multimodal controllers represents a significant technical advancement for in-the-wild manipulation learning and deployment.

Explain it Like I'm 14

What is this paper about?

This paper introduces UMI-FT, a handheld tool that helps robots learn how to use the right amount of force when they touch and move things. Think of tasks like wiping a whiteboard, poking a slice of zucchini onto a skewer, or twisting a lightbulb into a socket: too little force and nothing happens; too much force and you break or drop something. UMI-FT adds tiny, coin-sized “feel” sensors to each finger of a gripper and uses a phone camera to record what’s happening. With this data, the team trains a robot to move like a “smart spring,” pushing just hard enough and gripping just tight enough.

What goals or questions does the paper explore?

The paper sets out to:

Build a low-cost, tough, and compact force sensor for each gripper finger, so robots can “feel” what humans feel during demonstrations.
Collect video, depth, pose, and finger-level force data while a person shows how to do a task.
Train a robot policy (a learned set of rules) that decides where to move, how tight to grip, and how “stiff” or “soft” to act, so the robot can safely handle contact-rich tasks.
Test this on real tasks—whiteboard wiping, skewering zucchini, and inserting a lightbulb—and compare it to methods that don’t use force feedback or compliance.
Share the hardware and software openly so others can build and use it.

How did they do it? (Methods in simple terms)

Hardware: teaching robots to “feel” with their fingers

Each gripper finger gets a tiny, six-axis force/torque sensor (called CoinFT). “Six-axis” means it senses pushes and pulls in 3 directions and twists in 3 ways—like knowing how hard and in which direction someone presses or twists.
An iPhone 15 Pro is attached to the tool to record RGB video, depth (how far things are), and the tool’s pose (where it is and how it’s oriented).
The sensors run fast (hundreds of times per second), so the robot gets quick, detailed signals about contact.
They carefully redesigned the fingers so the sensors mostly feel compressive forces (pushing), which they handle well.

Calibration: teaching sensors what their numbers mean

Raw sensor readings are like numbers from a bathroom scale that hasn’t been set yet.
The team pressed and twisted the fingers in controlled ways while using a trusted commercial sensor as ground truth.
They trained a small neural network (a multi-layer perceptron, or MLP) to translate the raw readings (capacitance) into real forces and torques. This makes the finger sensors accurate and consistent.

Robot control: moving like a “smart spring”

The robot uses a three-part control setup:
- 1) A learned policy decides where to move next, how tight to grip, and how stiff or soft to be.
- 2) A wrist compliance controller makes the arm act like a spring-damper: if the fingers feel a push, the robot yields just enough, like a soft but steady hand.
- 3) A grasp force controller adjusts how fast the gripper closes or opens to reach a target grip force (strong enough to hold, not so strong it slips or crushes).
“Compliance” here means the robot doesn’t act rigid. It flexes slightly under force—like a shock absorber—so it can stay stable while sliding along surfaces, aligning parts, or keeping contact during rotation.
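
The "smart spring" idea can be made concrete with a one-dimensional simulation of a virtual spring-mass-damper driven by a measured contact force. This is a minimal sketch under stated assumptions, not the controller from the paper: the mass, the critical-damping choice, the 500 Hz step (dt = 0.002 s), and the function name `simulate_admittance` are all illustrative.

```python
import math

def simulate_admittance(f_ext, stiffness, mass=1.0, damping=None,
                        x_ref=0.0, dt=0.002, duration=2.0):
    """Integrate mass*a + damping*v + stiffness*(x - x_ref) = f_ext.

    f_ext: constant external force in newtons (what the fingers feel).
    stiffness: virtual spring constant in N/m (the "stiff or soft" setting).
    Returns the position in metres after `duration` seconds.
    """
    if damping is None:
        # Critical damping avoids overshoot (an assumed tuning choice).
        damping = 2.0 * math.sqrt(stiffness * mass)
    x, v = x_ref, 0.0
    for _ in range(int(round(duration / dt))):
        a = (f_ext - damping * v - stiffness * (x - x_ref)) / mass
        v += a * dt      # semi-implicit Euler: update velocity first,
        x += v * dt      # then position with the new velocity
    return x

soft_offset = simulate_admittance(f_ext=5.0, stiffness=500.0)
stiff_offset = simulate_admittance(f_ext=5.0, stiffness=2000.0)
print(f"soft: {soft_offset * 1000:.2f} mm, stiff: {stiff_offset * 1000:.2f} mm")
```

Under the same 5 N push, the softer setting settles farther from its reference (force divided by stiffness), which is the yielding behavior described above; a policy that outputs stiffness is effectively choosing how much to give way.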

Learning from demonstrations

People perform tasks holding the UMI-FT tool. The system records video, depth, pose, and finger forces.
The policy uses:
- A vision encoder (ViT) for RGB and depth images,
- A force encoder for recent finger forces,
- A transformer to fuse these signals,
- A diffusion-based action generator to output motion targets, stiffness, and grip commands.
At runtime, the policy’s outputs drive the compliance controllers, making the robot move smoothly and grip intelligently.

What did they find, and why does it matter?

The team tested three contact-heavy tasks and compared their method (UMI-FT + compliance) to baselines that lacked compliance or fine force sensing.

Whiteboard wiping:
- Their method reliably wiped clean across new positions, heights, drawings, and a narrower eraser.
- Policies without compliance often used too much or too little force, failing the wipe or losing the eraser.
- Using only a contact microphone (audio) detected touch events but couldn’t control continuous force, causing failures.
- Bottom line: sensing and regulating both grip and contact force makes wiping effective and safe.
Skewering zucchini:
- Grip force regulation was crucial. Without it, the zucchini slipped or rotated out of grasp during puncture.
- With finger-level force sensing, success rose noticeably, including with thicker slices and a fork (harder to puncture).
- Training on “in-the-wild” data (many different scenes) dramatically improved generalization: the robot handled unseen clutter and layouts much better.
- Bottom line: measuring and controlling how tight you hold an item prevents slips during forceful actions, and varied training data boosts robustness.
Lightbulb insertion:
- Compliance enabled “haptic search”: staying in gentle contact while rotating, so the bayonet pin slides into the slot even when the camera view is blocked.
- Baselines struggled to keep contact; they slipped, rotated too early, or missed alignment.
- The method also handled a stiffer socket (needing more push) by modulating force without overdoing it.
- Bottom line: smart, spring-like behavior is essential when vision is occluded and precise alignment depends on feel.

Across tasks, the key wins were:

Finger-level force sensing helps the robot understand both outside contact forces and inside grip forces—just like a person’s hand feels.
Compliance control keeps contact stable and smooth, especially when sliding, aligning, or rotating under force.
The system is low-cost and robust (around $10 per sensor), so it’s more practical to scale than expensive, fragile wrist sensors.

What’s the impact?

This work shows a practical path to teaching robots “in the wild”—outside polished lab setups—to handle everyday, force-sensitive tasks safely and reliably. By combining:

Affordable, tough finger sensors,
Multimodal data (video, depth, pose, force),
A learned policy that controls motion, grip, and stiffness,
robots can:
Wipe surfaces without damage,
Skewer soft foods without dropping them,
Insert parts like lightbulbs by feel, even when vision isn’t enough.

Because the hardware and software are open-source (https://umi-ft.github.io/), other teams can build on this to tackle home chores, assembly lines, kitchens, and more. In short, teaching robots to “feel” at their fingertips—and act like smart springs—makes them better partners for real-world manipulation.

Knowledge Gaps

Knowledge Gaps, Limitations, and Open Questions

Below is a consolidated, actionable list of what remains uncertain or unexplored in the paper and could guide future research.

Sensing and Hardware

Long-term stability of CoinFT: No characterization of drift, hysteresis, thermal sensitivity, humidity effects, or aging under prolonged, real-world use; unclear recalibration frequency needed to maintain accuracy.
Mechanical robustness boundaries: Tensile delamination and pillar/bonding failure modes are acknowledged but not systematically quantified (number of impact cycles to failure, allowable off-axis moments, overload protection, survivability after drops).
Force/torque range and saturation: Sensor calibrated up to 25 N (normal), 20 N (shear), 500 mNm—no study of performance near saturation, soft-clipping strategies, or extension to higher-force tasks.
Bandwidth and latency: 360 Hz sensing but no end-to-end latency/bandwidth characterization through the full pipeline (sensor → conversion MLP → transport → controller); impact of latency and jitter on stability/passivity not analyzed.
Contact localization along the finger: UMI-FT measures load through the structural path but cannot localize contact on the finger surface; unclear how ambiguity in contact location affects control or learning, and whether hybridization with tactile arrays would help.
Structural compliance and frame transforms: Rigid-body adjoint transforms assume negligible finger deformation; no quantification of how finger compliance distorts wrench transformation to the tool frame and affects wrist-level force feedback.
Wireless operation: Tetherless sensing is proposed but not realized; unknown effects of wireless links (Bluetooth) on reliability, timing, and safety-critical control loops.
Sensor-to-sensor and unit-to-unit consistency: While “standard and consistent measurements” are claimed, there is no cross-unit inter-calibration study quantifying variance across multiple devices or how well a model trained on one unit transfers to others.

Calibration and Metrology

Calibration generalization: MLP-based capacitance-to-wrench mapping is trained per finger with a specific finger geometry; unclear transferability to other fingers/variants and sensitivity to mechanical changes (e.g., after wear).
Low-load and near-zero performance: No quantification of resolution, noise floor, or thresholding behavior at very small forces/torques relevant to delicate manipulation.
Cross-axis coupling: The calibration errors reported per axis lack analysis of coupling between axes; no decoupling metrics, linearity plots, or hysteresis curves.
Field recalibration workflows: Dependence on an ATI Gamma for ground-truth calibration limits scalability; no lightweight, in-situ, low-cost recalibration protocol (e.g., self-calibration with gravity, known fixtures, or self-supervised schemes).
Time synchronization accuracy: Training relies on post hoc synchronization; no error bounds on timestamp alignment between iPhone (60/30 Hz) and CoinFT (360 Hz), nor analysis of how sync errors degrade learning/performance.
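
To make the synchronization question concrete, the sketch below pairs each camera frame with the nearest force sample by timestamp and drops frames whose nearest sample is too far away. It is an illustrative assumption about how such alignment could be done, not the paper's procedure; `nearest_index`, `align`, and the `max_gap` threshold are hypothetical names and parameters.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Index of the entry in sorted `timestamps` closest to time `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if (after - t) < (t - before) else i - 1

def align(frame_times, force_times, max_gap):
    """Pair each frame with its nearest force sample.

    Returns (frame_index, force_index) tuples; frames whose nearest force
    sample is more than `max_gap` seconds away are skipped.
    """
    pairs = []
    for k, t in enumerate(frame_times):
        j = nearest_index(force_times, t)
        if abs(force_times[j] - t) <= max_gap:
            pairs.append((k, j))
    return pairs

# One second of data at the rates quoted above: 30 Hz frames, 360 Hz forces.
force_times = [i / 360 for i in range(360)]
frame_times = [i / 30 for i in range(30)]
pairs = align(frame_times, force_times, max_gap=0.5 / 360)
```

With nearest-sample pairing, the residual misalignment is bounded by half the force sampling period only if both clocks share a common time base; any clock offset between the phone and the sensor adds directly to that error, which is the quantity the item above says is not reported.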

Control and Modeling

Internal vs. external force decomposition: Grasp force estimated by averaging along the grasp axis; no formal validation against ground-truth internal/external force separation, particularly under asymmetric contacts, object torque, or multi-contact scenarios.
Stability/passivity analysis: Admittance control with measurement noise, delays, and compliance at the finger level lacks formal stability or passivity guarantees; controller gain selection guidelines and robustness margins are not provided.
Full 6-DoF compliance: Experiments use translational compliance only; implications for tasks requiring rotational compliance or torque regulation (e.g., tight-tolerance insertions) are untested.
Multi-contact ambiguity: When both fingers contact the environment or when object-environment-finger tri-contacts occur, the summed wrench feedback may confound control; no strategy to disambiguate or prioritize contacts.
Gripper control bandwidth: Grasp force controller runs at 30 Hz; no analysis of whether this rate suffices for slip prevention under fast transients or how performance scales with gripper dynamics and friction changes.
Environment parameter estimation: The policy selects stiffness scalars but there is no online estimation of environment stiffness/friction; unclear whether adaptive identification would improve safety, speed, or success on novel surfaces.

Learning and Policy

Ground-truth label quality: Stiffness and virtual target labels are derived from post-processing; robustness of these labels to sensor noise, sync errors, and modeling assumptions (spring approximation) is not studied.
Representation limits: Stiffness is encoded via a scalar rather than a full 6×6 matrix; open question whether richer compliance parameterization (anisotropy, damping/inertia) would improve performance/generalization.
Rate and timing of the learned loop: The policy is the slowest loop, but its actual frequency and impact on task performance/stability are not reported; optimal policy/control co-design remains unexplored.
Visual encoders and depth: The iPhone depth quality at close range and under different materials is not assessed; no ablation on the value of depth vs RGB, or ultrawide camera utility for occlusion-heavy or larger workspaces.
Data efficiency and scaling: No study of sample complexity, performance vs number/diversity of demos, or transfer across tasks/objects without retraining.
Cross-user generalization: Demonstrations appear to come from a limited set of users; no evaluation of how inter-operator variability affects learned policies or whether user-agnostic models are feasible.
Comparative baselines: No comparison to a strong wrist F/T baseline with compliance control (e.g., ATI at the wrist) or to state-of-the-art visuotactile methods; difficult to isolate the unique contribution of per-finger F/T.
Slip detection and mitigation: Grasp force control does not explicitly detect slip; integrating force transients, torsional cues, or micro-vibration for slip-aware policies remains open.

Evaluation Scope and External Validity

Task diversity and difficulty: Evaluations cover three tasks with modest force ranges; no tests on high-precision assembly (e.g., tight peg-in-hole), brittle objects, highly dynamic contacts, or tools with complex kinematics.
Statistical rigor: Small numbers of rollouts per condition; no confidence intervals, variance analysis, or significance testing to support robustness claims.
Robustness to environment variation: Limited exploration of frictional variability (wet/dirty surfaces), contact materials, lighting extremes, and occlusions; effect of ARKit pose drift on policy performance unquantified.
Generalization breadth: In-the-wild generalization is only demonstrated for one task (zucchini); unclear transfer to other tasks or to unseen robots/end-effectors.
Safety guarantees: Safety thresholds rely on CoinFT accuracy; no independent safety monitors, fault detection for sensor failures, or formal safety certification pathway.

System Integration and Practicality

Assembly, calibration, and deployment cost/time: While BOM per sensor is low, the overall system assembly complexity, calibration time per unit, and maintenance burden for large-scale deployments are not quantified.
Multi-robot and cross-hardware portability: Results are on a UR5e and WSG50; no evidence of portability to other arms/grippers, different control stacks, or non-industrial platforms.
Open-source reproducibility: Hardware/software are open-sourced, but reproducibility studies across labs (with their own prints, materials, phones) are absent.

Open Technical Extensions

Hybrid sensing: How to fuse per-finger F/T with tactile arrays or vision-based tactile for contact localization and slip detection without overwhelming compute and latency budgets.
Self-calibration/self-diagnostics: Methods for on-robot, task-driven calibration, anomaly detection, and automatic re-zeroing to maintain accuracy without lab-grade references.
Learning stability-aware controllers: Joint learning of policy and low-level compliance gains with formal stability constraints or passivity layers.
6-DoF impedance/admittance learning: Policies predicting full 6-DoF stiffness/damping/inertia and their scheduling across phases of contact-rich tasks.

Practical Applications

Immediate Applications

The following applications can be deployed using the paper’s open-sourced hardware/software and standard industrial robots/grippers, leveraging per-finger 6-axis force/torque sensing plus adaptive compliance policies.

Force-aware assembly and insertion on cobots (manufacturing, electronics, automotive)
- Use cases: bayonet-twist insertions (lightbulb-like), push-fit and snap-fit connectors, cable seating, trim panel clips, screw starting with controlled preload.
- Tools/workflows: UMI-FT kit for data capture; ACP-based compliance controller for admittance; grasp-force controller plugin for common grippers (e.g., WSG); task-specific “skill packs” trained from in-the-wild demos.
- Assumptions/dependencies: Backdrivable or admittance-capable arms; gripper with velocity/force control; calibration per finger; forces within CoinFT range; basic vision for pose.
Reliable surface contact tasks (facilities, labs, manufacturing QA, service robotics)
- Use cases: whiteboard/monitor wiping, adhesive application, deburring/polishing light passes, paint touch-ups with pressure limits, panel probing/gauging.
- Tools/workflows: Pressure-consistent wiping/polishing policies; force-limit watchdog using per-finger F/T to prevent overloading.
- Assumptions/dependencies: Known max contact force; simple path/coverage planner; environmental compliance within safe limits.
Food handling and prep pilots (food service, CPG R&D, test kitchens)
- Use cases: skewering/piercing, gentle grasp-and-place of soft produce, controlled pressing/spreading.
- Tools/workflows: Grasp-force regulation to prevent slip; small library of food-contact skills trained via UMI-FT.
- Assumptions/dependencies: Food-safe materials and sanitation; variable object compliance; policies tuned for slip detection/avoidance.
Teach-by-demonstration for force-critical skills (industrial engineering, integration)
- Use cases: Quickly programming cobots for new insertions, latching/unlatching, compliant alignment/homing.
- Tools/workflows: Handheld UMI-FT for on-site demos; auto-labeling of stiffness/virtual targets; one-click policy training to deploy on the line.
- Assumptions/dependencies: Sufficient demonstration coverage; line cycle-time constraints; process tolerances similar to demo conditions.
Safety monitoring and force governance (EHS, cobot safety)
- Use cases: Real-time enforcement of per-finger force limits; detection of pinch or crush risk at fingertips.
- Tools/workflows: Software watchdog reading CoinFT streams; logs for post-incident analysis.
- Assumptions/dependencies: Reliable synchronization; validated thresholds; integration with robot stop/safe-stop.
Academic datasets and benchmarks for compliant manipulation (academia, consortia)
- Use cases: Curating in-the-wild, multimodal RGB–depth–pose–per-finger F/T datasets; benchmarking haptic search and grasp-force control.
- Tools/workflows: Open-source pipeline for calibration, alignment, and policy training; standardized tasks (wiping, insertion, piercing).
- Assumptions/dependencies: Consistent calibration across labs; public dataset licensing; reproducible hardware builds.
Gripper firmware/software upgrade for grasp-force control (robotics, gripper OEMs)
- Use cases: Drop-in “force-mode” for position-only grippers using per-finger F/T feedback; slip reduction in pick-and-place.
- Tools/workflows: v-G (velocity-resolved admittance) grasp controller; SDK for integrating CoinFT into existing gripper stacks.
- Assumptions/dependencies: Access to gripper velocity commands; stable closed-loop rates (~30 Hz or higher).
Educational kits for haptics and compliance (education, workforce training)
- Use cases: Teaching compliance control, force tuning, and multimodal policy learning in robotics courses or bootcamps.
- Tools/workflows: Low-cost UMI-FT build; lab exercises on calibration, admittance, and policy ablations.
- Assumptions/dependencies: Basic machine shop/3D printing; iPhone/ARKit; safety protocols for contact tasks.

Long-Term Applications

The following opportunities are promising but require further R&D, scaling, robustness engineering, and/or regulatory progress.

General-purpose home service robots with compliant manipulation (consumer robotics)
- Use cases: Plug insertion/removal, appliance knobs/switches, faucet/valve operation, safe cleaning of diverse surfaces, lightbulb replacement.
- Tools/products: Consumer-grade compliant gripper with embedded per-finger F/T; “compliance policy studio” for households.
- Dependencies: Cost/robustness at scale; robust perception in clutter; long-horizon task planning; household safety certification.
Healthcare and assistive manipulation (healthcare, eldercare)
- Use cases: Dressing assistance, catheter/tube insertion aids, orthosis handling, gentle ADLs (activities of daily living).
- Tools/products: Sterilizable compliant end-effectors; clinical-grade calibration/traceability; intent-aware policies with force caps.
- Dependencies: Strict regulatory approval; redundant safety; fail-safe behaviors; high reliability beyond lab demos.
Full 6D compliance for precision assembly and finishing (advanced manufacturing)
- Use cases: Peg-in-hole with tight tolerances, press-fit bearings, precision deburring/polishing with torque control.
- Tools/products: Expanded ACP to rotational DoFs; torque-aware grasp modulation; multi-sensor arrays per finger.
- Dependencies: Higher-rate controllers; improved torque accuracy; sensor durability under moments and tensile loads.
Dexterous hands with pervasive per-link F/T (robotic hands, logistics)
- Use cases: In-hand manipulation with stable force sharing; slip-robust pick of deformable/fragile items; tool use requiring fingertip torques.
- Tools/products: Multi-finger grippers integrating several CoinFT-class sensors; learned grasp distribution and compliance profiles.
- Dependencies: Mechanical robustness to tensile/delamination; wiring/packaging; controller complexity and bandwidth.
Shared autonomy and teleoperation with haptic mirroring (teleop, remote service)
- Use cases: Remote maintenance/repair with force-aware execution; assistive teleop that “fills in” compliant micro-motions.
- Tools/products: Bilateral haptic devices mapped to fingertip F/T; ACP policies that stabilize and augment operator inputs.
- Dependencies: Low-latency comms; high-fidelity haptic rendering; safety envelopes for force amplification.
Standardized, in-the-wild multimodal force datasets and evaluation suites (ecosystem, policy)
- Use cases: Cross-institution benchmarks for compliant tasks; policy evaluation protocols tied to force safety limits.
- Tools/products: Dataset hubs; task taxonomies; compliance metrics and certification-like tests.
- Dependencies: Community coordination; IP and data governance; hardware standardization for comparability.
Wireless, battery-powered data collection at scale (platform evolution)
- Use cases: Large-scale field data capture without tethers; crowd-sourced demos in real homes/workplaces.
- Tools/products: Bluetooth/Thread-enabled CoinFT microcontrollers; phone-only logging apps with secure sync.
- Dependencies: Robust time sync; energy management; BLE bandwidth and reliability; privacy/consent frameworks.
Safety certification frameworks leveraging per-finger force logs (regulatory, insurance)
- Use cases: Evidence-based compliance for cobot deployments near humans; insurance underwriting informed by force profiles.
- Tools/products: Force audit trails; conformance tests for grasp and contact limits; continuous safety monitoring services.
- Dependencies: Standards development (e.g., ISO/ANSI) updates; trustworthy logging; tamper resistance.
Skill libraries for sector-specific compliant tasks (vertical solutions)
- Use cases: Pretrained “haptic search” modules for sockets/connectors; “gentle wipe” for sensitive displays; “pierce-and-place” for food assembly lines.
- Tools/products: Marketplace of ACP policies; adapters for popular robot stacks; simulation-to-real validation packs.
- Dependencies: Domain adaptation; interoperability across robot brands; maintenance of skill libraries over product lifecycles.
Energy and utilities field tasks with fragile infrastructure (energy, utilities)
- Use cases: Operating aging valves/switchgear with force caps; manipulating glass/ceramic indicators.
- Tools/products: Ruggedized compliant end-effectors; on-site teach-by-demonstration with UMI-FT.
- Dependencies: Environmental robustness (dust, moisture, temperature); intrinsic safety; operator training.

Notes on Cross-Cutting Assumptions

Hardware: Admittance/impedance-capable robots; grippers that accept velocity/force commands; sensor calibration stability; mechanical designs that minimize tensile loading on sensors.
Software: Real-time control loops (robot ~500 Hz, gripper ~30 Hz); accurate frame transforms; synchronized multimodal logging; policies trained with adequate task diversity, including in-the-wild data.
Safety: Enforced force thresholds; watchdogs for out-of-range contact; task-level guard conditions.
Scaling: Manufacturing of robust CoinFT-class sensors; wireless comms for field capture; standardized datasets/protocols to enable reproducibility and certification.
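
As a rough illustration of the "enforced force thresholds" and "watchdogs for out-of-range contact" listed under Safety, the following minimal Python sketch latches a stop flag once contact or grasp force stays out of range for several consecutive samples. The class names, the limit values, and the debounce count are hypothetical choices made for this example; they are not taken from the paper or from the UMI-FT codebase.

```python
import math
from dataclasses import dataclass


@dataclass
class ForceLimits:
    # All values are illustrative assumptions, not limits from the paper.
    max_contact_force_n: float          # cap on external contact force magnitude
    max_grasp_force_n: float            # cap on internal grasp force
    max_consecutive_violations: int = 3  # debounce against single noisy samples


class ForceWatchdog:
    """Latches `tripped` after several consecutive out-of-range samples."""

    def __init__(self, limits: ForceLimits) -> None:
        self.limits = limits
        self._violations = 0
        self.tripped = False

    def update(self, contact_force_xyz, grasp_force_n: float) -> bool:
        """Feed one force sample; return True once the watchdog has tripped."""
        magnitude = math.sqrt(sum(c * c for c in contact_force_xyz))
        out_of_range = (
            magnitude > self.limits.max_contact_force_n
            or abs(grasp_force_n) > self.limits.max_grasp_force_n
        )
        # Count consecutive violations; any in-range sample resets the count.
        self._violations = self._violations + 1 if out_of_range else 0
        if self._violations >= self.limits.max_consecutive_violations:
            self.tripped = True  # stays latched until a new watchdog is created
        return self.tripped


# Usage: one isolated spike is tolerated, a sustained overload trips the latch.
watchdog = ForceWatchdog(ForceLimits(max_contact_force_n=10.0, max_grasp_force_n=20.0))
samples = [([0.0, 0.0, 5.0], 8.0), ([0.0, 0.0, 12.0], 8.0), ([0.0, 0.0, 5.0], 8.0)]
samples += [([0.0, 0.0, 12.0], 8.0)] * 3
history = [watchdog.update(force, grasp) for force, grasp in samples]
```

In a real deployment a check like this would sit inside the control loop and trigger a controlled stop; the debounce count trades reaction time against sensitivity to sensor noise.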


Glossary

Adjoint transformation matrix: A matrix that transforms wrenches or twists between coordinate frames in rigid-body mechanics. "where ${\rm Ad}_{S_1T}, {\rm Ad}_{S_2T}$ denotes the adjoint transformation matrix from one of the sensor frames to the robot tool frame."
Admittance control: A compliance control strategy that converts measured force into motion on robots with stiff actuation. "The wrist compliance controller implements 6D task space admittance control to move the robot arm like a virtual spring-mass-damper system."
ARKit: Apple’s augmented reality framework used for device pose and depth sensing. "It provides synchronized main RGB ..., depth, and pose data via ARKit."
ArUco markers: Fiducial markers used for pose estimation and measurement in vision systems. "to better capture the finger workspace and ArUco markers (for measuring gripper width)."
Backdrivable robots: Robots whose actuators can be moved by external forces, enabling compliant behaviors. "specific compliance profiles can be achieved with standard impedance control on backdrivable robots"
Bayonet pin: A mechanical feature on bulbs that mates with a socket via insertion and rotation. "align the bayonet pins with the slot on the socket"
Capacitive sensing: Measuring changes in capacitance to infer mechanical quantities like force or deformation. "due to both its mechanical properties and the physics of capacitive sensing"
Causal convolutional network: A temporal CNN that respects time order for sequence encoding. "Force/torque measurements from each CoinFT are encoded using a causal convolutional network"
CoinFT: A compact, capacitive six-axis force/torque sensor designed for robotic fingertips. "The compact design of CoinFT enables a sensor to be mounted at each finger."
Compliance: The elastic response of a system to external forces, often parameterized by stiffness and damping. "Compliance refers to the elastic behavior of a physical body under external force"
Compliance controller: A controller that enforces a desired compliance profile (e.g., stiffness) during interaction. "predicts position targets, grasp force, and stiffness for execution on standard compliance controllers."
Delamination: Separation of bonded layers in a layered structure under stress. "long moment arms can lead to delamination of the sensor."
Dielectric layer: The insulating layer in a capacitive sensor that affects sensitivity and range. "The dielectric layer consists of pillars with oval cross section"
Diffusion Policy: A visuomotor policy framework that generates actions via diffusion models. "The original diffusion policy \cite{chi2024diffusionpolicy} without force observation."
End-effector: The tool or gripper mounted at the robot’s wrist that interacts with the environment. "the robot end-effector retains the same design as the handheld UMI-FT"
Fin-ray finger design: A compliant finger structure inspired by fish fins that distributes load for grasping. "Significant modifications were made to the original fin-ray finger design of the UMI to accommodate the constraints of CoinFT"
Force/torque (F/T) sensor: A sensor that measures forces and torques along multiple axes. "The first employs a wrist-mounted commercial six-axis F/T sensor"
Grasp force modulation: Adjusting the gripper’s applied force to maintain secure yet safe contact. "benefits of compliance control with grasp force modulation."
Haptic feedback: Tactile and force sensations perceived during manipulation. "due to their flexibility and natural haptic feedback."
Haptic search: Using controlled contact and touch sensing to locate features not reliably visible. "Compliance control is critical for haptic search."
Impedance control: A compliance strategy that commands force/torque based on motion error on backdrivable robots. "specific compliance profiles can be achieved with standard impedance control on backdrivable robots"
Kinesthetic teaching: Providing demonstrations by physically guiding the robot to record motion and force. "the robot tool frame (TCP) was set close to the robot flange for comfortable kinesthetic teaching"
Moment arm: The perpendicular distance from a force’s line of action to the rotation axis, scaling torque. "excessive moments caused by grasp forces and long moment arms can lead to delamination of the sensor."
Proprioception: The robot’s internal sensing of its own states (e.g., joint angles, end-effector pose). "Sensor modalities are listed on the left, with proprioception omitted for clarity."
Spring-mass-damper system: A canonical compliant model used to describe motion under forces. "move the robot arm like a virtual spring-mass-damper system"
Stiffness matrix: A matrix encoding directional resistance to displacement under force in task space. "The policy outputs the position target for the robot, the stiffness matrix, the reference grasp force, and the gripper action"
TCP (Tool Center Point): The reference point on the end-effector used for control and measurement. "we found it helpful to set TCP to the center of the two fingertips."
Tool frame: The coordinate frame attached to the robot’s tool used for control and sensing. "measured wrenches can be transformed into any desired coordinate frame (typically the robot tool frame)."
Ultrawide RGB: A wide field-of-view color imaging stream from the device’s camera. "It provides synchronized main RGB (approximately 80° diagonal FoV), ultrawide RGB (120° FoV), depth, and pose data via ARKit."
Vision-based tactile sensors: Tactile sensors that use embedded cameras to infer contact geometry and forces. "Vision-based tactile sensors have also grown in popularity due to their high resolution and sensitivity"
Virtual spring target: A reference point in a compliant controller that the system tracks with spring-like behavior. "We use the external force measurement in a standard compliance controller to track a virtual spring target."
Virtual target pose: The actual pose command used by the compliance controller after policy decoding. "Virtual target pose: another 9D pose vector representing the actual set target of the compliance controller."
Wrench: A 6D vector combining forces and torques acting on a body. "UMI-FT uses coin-sized force/torque sensors to record the wrench at each compliant fingertip."
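
To make the "Adjoint transformation matrix", "Tool frame", and "Wrench" entries more concrete, here is a small Python sketch that re-expresses each finger sensor's wrench in the tool frame and sums the results. It uses the standard rigid-body relation f_T = R f_S and τ_T = R τ_S + p × (R f_S), which is what the adjoint-based transform computes. The function names, sensor poses, and force values are assumptions for illustration only; the paper's exact frame and ordering conventions for Ad_{S_1T} and Ad_{S_2T} are not reproduced here.

```python
def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def wrench_to_tool(R_ts, p_ts, wrench_s):
    """Re-express a wrench [fx, fy, fz, tx, ty, tz] from sensor frame S in tool frame T.

    R_ts: orientation of S expressed in T (3x3 rotation matrix).
    p_ts: origin of S expressed in T, in meters.
    """
    f_s, tau_s = wrench_s[:3], wrench_s[3:]
    f_t = mat_vec(R_ts, f_s)
    # The torque picks up the moment of the force about the tool-frame origin.
    tau_t = [r + m for r, m in zip(mat_vec(R_ts, tau_s), cross(p_ts, f_t))]
    return f_t + tau_t


def net_tool_wrench(sensors):
    """Sum the tool-frame wrenches of all (R_ts, p_ts, wrench_s) sensor entries."""
    total = [0.0] * 6
    for R_ts, p_ts, wrench_s in sensors:
        total = [a + b for a, b in zip(total, wrench_to_tool(R_ts, p_ts, wrench_s))]
    return total


# Hypothetical setup: two sensors aligned with the tool frame, offset +/-4 cm in y
# and 10 cm in z. Each finger squeezes with 5 N toward the other and feels 1 N in z.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
finger_1 = (IDENTITY, [0.0, 0.04, 0.1], [0.0, -5.0, 1.0, 0.0, 0.0, 0.0])
finger_2 = (IDENTITY, [0.0, -0.04, 0.1], [0.0, 5.0, 1.0, 0.0, 0.0, 0.0])
net = net_tool_wrench([finger_1, finger_2])
```

In this symmetric example the opposing squeeze forces and their moments cancel in the sum, so `net` contains only the shared 2 N component along z. This is one way to see how per-finger sensing can separate internal grasp force from external contact force.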
