Training time: 1 minute (trained on a single NVIDIA RTX 3090 GPU)

Free-motion data required: 10 minutes (no contact or force sensor data needed during collection)

Error reduction vs. FILIC (contact): 87.6% (L1 torque error: NEXT 0.547 Nm vs. FILIC 4.395 Nm)

Task progress improvement: >17% (FIRST vs. best prior force-aware policy across 5 tasks)

Arm cost: $2,500 for an AgileX Piper, vs. ~$30,000 for a Franka with built-in sensors

User study participants: 20 (NEXT-based teleoperation rated comparable to dedicated-sensor baseline)

FACTR 2: Force Sensing for Cheap Robot Arms

Think about the last time you tightened a screw, slid a drawer shut, or pressed a plug into a socket. You weren't thinking about force. Your fingers were. They registered resistance, slippage, and the subtle click of contact, and your hands responded without conscious thought. That feedback loop is so fundamental to dexterous manipulation that you barely notice it exists — until it's absent. Give a robot the same task without that sense of touch, and it fails in surprisingly human ways: pushing too hard, missing alignment by a millimeter, breaking brittle objects, or jamming plugs in crooked.

Force sensing is not a luxury in robotics. It's a prerequisite for anything that involves contact. And yet the vast majority of affordable robot arms — the kind that research labs, small companies, and university students actually work with — ship without it. The hardware to add it properly costs more than the arm itself.

A team at Carnegie Mellon University, in a paper published in June 2025, may have found a way around this bottleneck. Their system, FACTR 2, gives commodity arms a convincing sense of touch using nothing more than motor current readings and ten minutes of free-space motion. No added sensors. No expensive calibration rigs. No expert knowledge of the robot's mechanical properties. Just data, a small neural network, and one minute of training time (Oh et al., 2025).

The Science

The core insight behind FACTR 2 is deceptively simple: if you know what a robot's motors should be doing when nothing is touching the arm, then anything extra those motors are working against must be an external force. The gap between predicted free-space effort and measured actual effort is your contact signal.

Roboticists have tried this idea before. The classic approach, called a disturbance observer, builds an analytical model of the robot's dynamics (its mass distribution, gravity, friction, and joint stiffness) and uses that model to predict expected motor torques. The residual, the difference between prediction and measurement, is attributed to contact. The problem is that real robots are messy. They have nonlinear friction, temperature-dependent drive behavior, hysteresis in the gearboxes, and tiny deadband zones where motors don't respond to small commands. Analytical models almost never capture all of this. The "residual" ends up swamped by modeling errors, especially on cheaper arms built to looser tolerances and specifications.

NEXT — Neural External Torque Estimation — sidesteps the modeling problem entirely. Instead of deriving a physics model from first principles, NEXT learns one from data. Specifically, it trains a recurrent neural network (an LSTM, a type of sequence model good at capturing time-dependent patterns) to predict the motor torque each joint requires for any given movement in free space. The inputs are a short history of joint positions, joint velocities, and the gap between commanded and actual joint positions — all readings that any modern robot arm already provides. The training data is collected without any contact: just wave the arm around for ten minutes, recording what the motors do.
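To make the input description concrete, here is a minimal sketch of how such a history window might be assembled from logged free-space data. The function name `build_windows`, the log layout, and the window length are illustrative assumptions, not details taken from the paper; the network itself is left out.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_windows(
    q: List[Vector],       # measured joint positions, one vector per timestep
    dq: List[Vector],      # measured joint velocities
    q_cmd: List[Vector],   # commanded joint positions
    tau_m: List[Vector],   # measured motor torques (the training targets)
    history: int = 3,      # hypothetical window length, not from the paper
) -> List[Tuple[List[List[float]], List[float]]]:
    """Pair each timestep's measured torque with the preceding feature history."""
    if not (len(q) == len(dq) == len(q_cmd) == len(tau_m)):
        raise ValueError("all logs must cover the same number of timesteps")

    # Per-timestep features: positions, velocities, and the gap between
    # commanded and actual positions.
    features = []
    for pos, vel, cmd in zip(q, dq, q_cmd):
        tracking_error = [c - p for c, p in zip(cmd, pos)]
        features.append(list(pos) + list(vel) + tracking_error)

    # Slide a fixed-length window over the log; the target is the torque
    # at the last timestep of each window.
    samples = []
    for t in range(history - 1, len(features)):
        window = features[t - history + 1 : t + 1]
        samples.append((window, list(tau_m[t])))
    return samples
```

Each returned pair is one training example for a sequence model: a short feature history as input and the free-space motor torque as the target.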

$\hat{\tau}_{ext} = \tau_{m} - \hat{\tau}_{f}$

At deployment, the math is simple. Measure the actual motor torque $\tau_{m}$ (inferred from motor current). Subtract the network's predicted free-space torque $\hat{\tau}_{f}$. The difference is your external torque estimate $\hat{\tau}_{ext}$, a signal that spikes when the robot touches something and stays near zero when it doesn't.
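The subtraction can be sketched in a few lines. This is only an illustration under stated assumptions: `external_torque` is a hypothetical helper, the per-joint torque constants are assumed to be known, and the free-space prediction is passed in as a plain list standing in for the network's output.

```python
from typing import List, Sequence


def external_torque(
    currents: Sequence[float],          # measured motor current per joint (A)
    torque_constants: Sequence[float],  # torque constant per joint (Nm/A), assumed known
    tau_free_pred: Sequence[float],     # predicted free-space torque per joint (Nm)
) -> List[float]:
    """Estimate external joint torque as measured minus predicted free-space torque."""
    if not (len(currents) == len(torque_constants) == len(tau_free_pred)):
        raise ValueError("inputs must have one entry per joint")
    # Measured joint torque: current multiplied by the torque constant.
    tau_measured = [k * i for k, i in zip(torque_constants, currents)]
    # Residual attributed to contact.
    return [m - f for m, f in zip(tau_measured, tau_free_pred)]
```

When the prediction matches the measurement, the residual is zero for that joint; any effort beyond the predicted free-space torque shows up as a nonzero entry.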

Figure 2: External Force Estimation Deployment. At deployment time, we first obtain the measured joint torque from multiplying each joint’s measured current by its torque constant $K$. We then use an LSTM trained on free-space data to estimate free-space torque, which is then subtracted from the measured joint torque to obtain external joint torque. Source: Steven Oh, Jason Jingzhou Liu

The team evaluated NEXT against two prior methods: FILIC, a recent learning-based force estimation approach, and a classical disturbance observer. They used a Franka Emika Panda — a $30,000 research arm with built-in joint torque sensors — as their ground truth, then tested how well each method could recover those sensor readings.

FIRST — Force-Informed Re-Sampling Training — is the second half of the system. It takes the force estimates from NEXT and uses them to make robot learning smarter. The key observation is that when a robot arm learns from human demonstrations, most of the data is boring: the arm moving through open air toward an object. The moments that actually matter — the last fraction of a second before contact, and the seconds of active interaction — are rare in the dataset. If you train a policy by sampling uniformly from demonstrations, those critical moments get drowned out. FIRST fixes this by identifying pre-contact and contact segments using the torque signal and up-sampling them during training, giving the model more exposure to exactly the situations where it tends to fail.
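The segment-then-resample idea can be sketched as follows. Everything specific here is an assumption made for illustration: the threshold value, the length of the pre-contact window, the phase weights, and the function names are hypothetical and not taken from the paper.

```python
import random
from typing import Dict, List, Sequence

FREE, PRE_CONTACT, CONTACT = "free", "pre_contact", "contact"


def label_phases(
    tau_ext_mag: Sequence[float],  # per-timestep external torque magnitude (Nm)
    threshold: float,              # hypothetical fixed contact threshold
    pre_window: int,               # hypothetical number of steps counted as pre-contact
) -> List[str]:
    """Label each timestep as free-space, pre-contact, or contact."""
    labels = [CONTACT if m > threshold else FREE for m in tau_ext_mag]
    for t, lab in enumerate(labels):
        # A contact onset is a contact step whose predecessor is not contact.
        if lab == CONTACT and (t == 0 or labels[t - 1] != CONTACT):
            for s in range(max(0, t - pre_window), t):
                if labels[s] == FREE:
                    labels[s] = PRE_CONTACT
    return labels


def sample_indices(
    labels: Sequence[str],
    weights: Dict[str, float],  # relative sampling weight per phase
    k: int,
    seed: int = 0,
) -> List[int]:
    """Draw k training indices, favoring phases with larger weights."""
    rng = random.Random(seed)
    per_step = [weights[lab] for lab in labels]
    return rng.choices(range(len(labels)), weights=per_step, k=k)
```

With weights such as `{FREE: 1.0, PRE_CONTACT: 4.0, CONTACT: 4.0}`, a training loop would see pre-contact and contact timesteps several times more often than under uniform sampling, which is the effect the paragraph above describes.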

Figure 1: Overview of our approach. (a) Neural External Torque estimation (NEXT) produces high quality joint torque estimates using only 10 minutes of data without dedicated force sensors or explicit system-identification, enabling force-feedback teleoperation on low-cost arms, such as the Piper, YAM, and Nero. (b) Force-Informed Re-Sampling Training (FIRST) uses learned external torque estimates to segment demonstrations into free-space, pre-contact, and contact phases, then up-samples contact-relevant segments during training to improve policy performance. Source: Steven Oh, Jason Jingzhou Liu

What They Found

The accuracy results for NEXT are striking. During contact — when a human was deliberately pushing on the Franka arm — NEXT achieved an average $L_{1}$ error of just 0.547 Nm across joints, compared to 4.395 Nm for FILIC and 1.471 Nm for the disturbance observer (Oh et al., 2025). That's an 87.6% improvement over the prior learning-based method, and a 62.8% improvement over the classical physics approach.
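The two percentages follow directly from the reported errors. A short check, using a hypothetical helper named `relative_reduction`:

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - new) / baseline


# Contact-setting L1 errors in Nm, as reported in the article.
vs_filic = relative_reduction(4.395, 0.547)
vs_observer = relative_reduction(1.471, 0.547)
print(f"vs. FILIC: {vs_filic:.1f}%")
print(f"vs. disturbance observer: {vs_observer:.1f}%")
```

Rounded to one decimal place, these reproduce the 87.6% and 62.8% figures quoted above.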

External Torque Estimation Error: NEXT vs. Baselines (Franka Arm)

Average L1 joint torque error (Nm) across contact and free-space settings. Lower is better. NEXT achieves the lowest error in both conditions.

Method	Contact (Nm)	Free space (Nm)
FILIC	4.395	2.46
Disturbance Observer	1.471	2.429
External Sensor	n/a	0.449
NEXT (Ours)	0.547	0.414

In free space, where the true external torque should be exactly zero, NEXT achieved an average error of 0.414 Nm — lower than even the Franka's own factory-installed torque sensor, which logged 0.449 Nm. The implication is that the learned model, trained on data from the specific hardware unit, captures idiosyncratic mechanical effects — particular friction profiles, gearbox quirks, temperature-dependent behaviors — that a general-purpose factory sensor cannot. NEXT is, in a meaningful sense, better tuned to its own robot than the sensor that came with it.

Table 1: Average $L_{1}$ joint torque error on the Franka in free-space and contact settings. Free-space errors are measured against zero external torque, while contact errors are measured against the Franka built-in external torque estimate. NEXT achieves the lowest error in both settings. Source: Steven Oh, Jason Jingzhou Liu
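For readers unfamiliar with the metric, an average $L_{1}$ error of this kind can be computed as below. This is a sketch under an assumption: it averages absolute error uniformly over all timesteps and joints, and the paper's exact averaging scheme may differ. The name `mean_l1_error` is hypothetical.

```python
from typing import Sequence


def mean_l1_error(
    estimates: Sequence[Sequence[float]],   # estimated external torque, [timestep][joint]
    references: Sequence[Sequence[float]],  # reference torque with the same layout
) -> float:
    """Average absolute torque error over all timesteps and joints."""
    total = 0.0
    count = 0
    for est_row, ref_row in zip(estimates, references):
        for est, ref in zip(est_row, ref_row):
            total += abs(est - ref)
            count += 1
    if count == 0:
        raise ValueError("no samples to average")
    return total / count
```

In the free-space setting the reference is all zeros, since nothing is touching the arm; in the contact setting the reference would be the Franka's built-in external torque estimate.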

The team also ran a user study with 20 participants performing a wiping task with force-feedback teleoperation. Participants used five different conditions: no feedback at all, a disturbance observer, position-position leader-follower control, the original FACTR system with Franka's built-in sensors, and FACTR with NEXT estimates substituted in. Both the participants' ratings and the measured joint torque, a proxy for how much unnecessary force users applied, showed NEXT performing on par with the dedicated-sensor baseline. In practice, users rated the two as comparable in feel.

For the policy learning results, the team tested five long-horizon manipulation tasks on a bimanual setup using two low-cost AgileX Piper arms, each costing roughly $2,500. The tasks involved object insertion, assembly, and deformable-object manipulation — the kind of contact-rich work that vision-only policies notoriously struggle with. Each task was evaluated over 20 rollouts, using a task-progress metric that captures partial completion.

Figure 3: FIRST is evaluated on five long-horizon, contact-rich tasks. Each task comprises multiple stages, many requiring precise alignment or fine-grained, force-sensitive adjustments. Please see videos of these tasks at https://jasonjzliu.com/factr2 Source: Steven Oh, Jason Jingzhou Liu

FIRST outperformed all four baselines — including the original FACTR method, a policy trained with an auxiliary torque reconstruction objective, and vanilla behavior cloning with or without torque as an input — by more than 17 percentage points in average task progress (Oh et al., 2025).

FIRST vs. Baselines: Average Task Progress Across 5 Contact-Rich Tasks

Task progress scores (higher is better) for each policy variant evaluated on five long-horizon manipulation tasks using a flow-matching policy on bimanual Piper arms.

Policy	Average task progress (%)
Base Policy	42
Base Policy + Torque	47
FACTR	51
TA-VLA	46
FIRST (Ours)	68

The mechanism behind this improvement is visible in the validation loss curves. Under standard uniform sampling, pre-contact and contact segments consistently have higher validation loss than free-space segments, a sign that the model is underfitting the hard parts. FIRST's resampling directly reduces those losses. The policy isn't just getting more force data; it's getting more of the right force data, at the right moments.
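A per-phase breakdown like this is straightforward to compute once each validation sample carries a phase label. The sketch below assumes per-sample losses and string labels are already available; `loss_by_phase` is a hypothetical helper, not part of the released code.

```python
from collections import defaultdict
from typing import Dict, Sequence


def loss_by_phase(losses: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Average per-sample validation loss within each phase label."""
    if len(losses) != len(labels):
        raise ValueError("need exactly one label per loss value")
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for loss, label in zip(losses, labels):
        sums[label] += loss
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}
```

Comparing the resulting averages before and after resampling would show whether the pre-contact and contact phases are the ones that improve.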

An ablation experiment revealed something genuinely interesting: pre-contact samples are often more valuable than in-contact samples for improving policy performance. The moment just before touch, when the arm needs to be precisely aligned and moving in the right direction, is where the policy most frequently goes wrong. This is a practically useful empirical result: if you had to choose between overrepresenting approach trajectories and overrepresenting grasp interactions in your training data, the approach trajectories win.

Why This Changes Things

To appreciate the significance here, consider the economics. A Franka Panda with built-in joint torque sensors costs around $30,000. An AgileX Piper costs about $2,500. The gap between those prices is, in large part, the cost of precision sensing hardware. Retrofitting a cheap arm with an end-effector force-torque sensor typically costs several thousand dollars more — sometimes more than the arm itself — and introduces calibration headaches, mechanical fragility, and additional software complexity.

NEXT collapses that gap without any of those costs. You need no new hardware. You need no prior knowledge of the robot's physical parameters. You need no paired contact data for training: the supervision signal is just the motor current during normal, contact-free movement. And you need only ten minutes of data and about a minute of training on a single GPU.

This matters for several overlapping reasons. First, it dramatically expands the population of robot arms that can participate in contact-rich research. The most exciting work in robot manipulation — insertion tasks, assembly, cooking, medical procedures — requires force sensing, which has effectively restricted serious research to labs wealthy enough to afford high-end platforms. A method that works on a $2,500 arm changes who can do that research.

Second, FIRST's resampling insight is broadly applicable. The observation that pre-contact moments are underrepresented in uniform sampling, and that overweighting them improves downstream performance, isn't specific to force sensing. It's a lesson about data distributions in imitation learning: tasks have temporal structure, and that structure determines where the hard problems live. Methods that exploit this structure should outperform methods that ignore it, and FIRST provides a clean, force-signal-based way to find that structure automatically.

Figure 7: Default sampling yields higher validation loss on pre-contact and contact phases. By upsampling these phases during training, FIRST reduces their validation losses. Source: Steven Oh, Jason Jingzhou Liu

Third, the system works on real, commodity hardware right now. The team demonstrated it on the Piper, on the YAM, and on the Nero — three low-cost arms with very different mechanical properties — suggesting the approach generalizes across platforms without re-engineering. The code and pretrained models are publicly available, which means the barrier to adoption is genuinely low.

Robot Arm Cost Comparison: Sensorized vs. NEXT-Enabled

Approximate retail cost of robot arms with dedicated force sensing versus low-cost arms enabled by NEXT, illustrating the democratization potential of the approach.

Arm	Approximate cost (USD)
Franka (built-in sensors)	30,000
AgileX Piper (NEXT-enabled)	2,500

There's also a philosophical point worth pausing on. The finding that NEXT outperforms the Franka's own built-in sensor in free-space settings, where a learned model tracks the robot's behavior more closely than the hardware's own sensor, is a small but striking demonstration of what data-driven modeling can do when it's allowed to learn from the actual system rather than an idealized physical model. The real robot is always more complicated than the equations. Data that comes from the robot itself tends to capture those complications more faithfully. NEXT exploits this in a remarkably efficient way.

What's Next

The most immediate open question is generalization under distribution shift. NEXT trains on free-motion data collected on a specific arm, in a specific configuration, at a specific temperature and wear state. What happens when the arm ages, or is redeployed in a different environment, or is swapped for a different unit of the same model? The paper doesn't systematically address this, though the fact that NEXT trains in one minute from ten minutes of data suggests that periodic retraining is a viable solution — a brief recalibration session rather than a full pipeline rebuild.

The segmentation logic in FIRST also uses fixed thresholds for contact detection. More adaptive or learned thresholds could improve performance, particularly on tasks where contact forces are subtle or where the transition from free space to pre-contact is gradual rather than abrupt.

Looking further out, NEXT and FIRST point toward a broader principle: much of what robotics has historically required expensive hardware for may be achievable through better modeling of cheap hardware. Tactile sensing, compliance estimation, contact localization — all of these currently demand specialized sensors, but all of them, in principle, leave signals in the motor currents that a sufficiently expressive model might recover. Whether NEXT's approach scales to these harder problems is an open and exciting question.

For now, FACTR 2 does something practically important: it lowers the cost of entry into contact-rich robotics research by more than an order of magnitude. That means more labs, more experiments, more data, and more discovered insights. The Franka has been the standard platform for manipulation research for nearly a decade, in part because its sensing made everything else possible. FACTR 2 suggests that the Franka's monopoly on that capability may finally be over — and that the next generation of robot manipulation research might run on hardware anyone can afford.

Force sensitivity, it turns out, doesn't have to live in the sensor. Sometimes it can live in the model.

