Embodied AI stands to redefine the autonomy economy by enabling machines to manipulate the physical world in a human-like manner.4 Manual dexterity underpins over 90% of daily human actions, emphasizing how central hands are to manipulation. Without similarly dexterous hands, robots remain limited to perception and locomotion, able to observe and move through the world but not to act on it. This post examines why dexterous robotic hands are the frontier of embodied AI, outlining their complexity, the current state of the art, and the transformative potential they unlock.
Dexterous hands are the decisive interface between embodied AI and the physical world: technical advances that bring robotic hands to millimeter-level precision with sub-Newton force control would unlock real-world applications such as minimally invasive surgery, a market expected to exceed U.S.$70 billion in 2025. Similarly, reliable grasping and manipulation of irregular objects at high throughput, on par with human workers, would transform parcel logistics, a sector valued at roughly U.S.$520 billion in 2025 and delivering well over 100 billion parcels globally each year. These numbers underline how incremental gains in dexterity can convert robots from laboratory curiosities into tools integral to key global industries.
Yet achieving that level of dexterity remains among robotics’ toughest challenges.2 The human hand, crafted by evolution with 27 bones, more than 30 muscles, and thousands of touch receptors, merges strength, precision, and perception effortlessly. Building a robotic counterpart means tightly integrating compact mechanics, dense actuation, tactile sensing, and adaptive control into a system that must function reliably outside controlled settings. Many designs face trade-offs: highly controllable designs tend to be heavy and fragile, while more streamlined mechanisms sacrifice adaptability. Actuators must balance precision, power density, and compliance; no technology yet excels at all three. Tactile sensors still fall short of human skin in durability and resolution, and control strategies, whether classical or learning-based, often fail to generalize beyond laboratory conditions or across the simulation-to-reality gap.
These intertwined challenges collectively define the frontier of embodied AI. Dexterous robotic hands are not merely a subsystem but the linchpin: progress here unlocks entirely new domains of robotic application.
Current Technical Landscape
Progress in dexterous hands is paced by four threads that must converge on the same low-latency hardware: mechanics and actuation, perception, control, and compute. Each has advanced in the last few years, but the gap to robust, human-level manipulation persists because all four must mature together:
Mechanics and Actuation: Tendon-driven electric designs remain the workhorse for controllability and packaging, while soft/pneumatic elements add compliance but complicate precise control; recent reviews map these trade-offs and the rise of compact transmissions for multi-degree-of-freedom (DOF) fingers.8
Perception (tactile/visuo-tactile): Camera-based fingertip sensors now provide high-resolution, local 3D contact sensing on curved or omnidirectional surfaces.5,7 Beyond hardware, advances in signal processing and learning are driving further gains in accuracy and robustness. Together, these developments broaden what robotic hands can feel, although full-hand coverage with durable “skin” remains an open challenge.
Control: The state of the art increasingly fuses model-based methods with learning (reinforcement learning and imitation) on visuo-tactile inputs; one common instantiation, a model-based controller plus a learned residual, is sketched after this list. Evaluation is shifting from single-task demos to generalization on new benchmarks such as DexArt.1 Despite progress, policies that look strong in simulation still degrade on physical hardware, underscoring persistent sim-to-real gaps.
Compute: Multi-camera visuo-tactile streams and high-DOF control loops enforce tight real-time budgets (≈10–20 ms for perception, ≈1–5 ms for control), which typically sizes onboard inference at roughly 3–5 TOPS; a back-of-the-envelope sizing sketch also follows this list. The bottleneck is not peak TOPS but the sweet spot among latency, energy, and compute-unit cost: sustaining low-jitter throughput within hand/wrist power, thermal, and memory-bandwidth limits without over-specifying the module.
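To ground the control thread, here is a minimal sketch of one common way to fuse a model with learning: a classical baseline controller plus a small learned residual acting on fused visuo-tactile features. Everything in it (the 16-DOF hand, the placeholder proportional baseline, the feature dimension) is an illustrative assumption, not the method of any paper cited here.

```python
# Generic "model + learned residual" control pattern, sketched with
# plain NumPy. All dimensions and gains are illustrative assumptions.
import numpy as np

N_JOINTS = 16  # hypothetical hand with 16 actuated DOF


def model_based_baseline(target_pose: np.ndarray) -> np.ndarray:
    """Stand-in for a classical controller (e.g., inverse kinematics
    plus a grasp heuristic) mapping a target pose to joint commands."""
    return 0.1 * target_pose  # placeholder proportional law


class ResidualPolicy:
    """Tiny two-layer network producing a residual correction from
    concatenated visual and tactile features (weights untrained here)."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        rng = np.random.default_rng(0)
        self.w1 = rng.normal(0.0, 0.1, (feat_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, N_JOINTS))

    def __call__(self, features: np.ndarray) -> np.ndarray:
        return np.tanh(features @ self.w1) @ self.w2


# One control step: baseline command plus learned visuo-tactile correction.
policy = ResidualPolicy(feat_dim=32)
target = np.zeros(N_JOINTS)
visuo_tactile_features = np.zeros(32)  # fused perception features
command = model_based_baseline(target) + policy(visuo_tactile_features)
```

The appeal of this pattern is that the model-based term keeps the policy safe and roughly correct from the start, while the learned residual absorbs the contact dynamics that models capture poorly; it is exactly this learned part that tends to degrade across the sim-to-real gap.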
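And to ground the compute thread, a back-of-the-envelope sizing sketch: the latency budgets are the ones quoted above, while the per-inference model costs and the 30% sustained-utilization figure are illustrative assumptions, not measurements of any particular hand.

```python
# Back-of-the-envelope sizing of onboard inference for a robotic hand.
# Latency budgets come from the text; model costs and the assumed 30%
# sustained utilization are illustrative, not measured.

def required_tops(gflops_per_inference: float,
                  latency_budget_ms: float,
                  utilization: float = 0.30) -> float:
    """Peak TOPS needed to finish one inference within the budget,
    assuming the accelerator sustains only `utilization` of its peak."""
    ops_per_second = gflops_per_inference * 1e9 / (latency_budget_ms * 1e-3)
    return ops_per_second / 1e12 / utilization

# Hypothetical workloads: a ~15 GFLOP visuo-tactile perception network
# on a 15 ms budget, and a ~0.2 GFLOP control policy on a 2 ms budget.
print(f"perception: {required_tops(15.0, 15.0):.1f} TOPS")  # ~3.3 TOPS
print(f"control:    {required_tops(0.2, 2.0):.2f} TOPS")    # ~0.33 TOPS
```

With these assumptions the perception stream alone lands at about 3.3 TOPS, consistent with the 3–5 TOPS module sizing above; halving either the utilization or the budget doubles the requirement, which is why latency and jitter, not peak throughput, dominate the design.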
The Dexterity Gap
Despite decades of progress, dexterous robotic hands still fall short of matching the precision, strength, and versatility of human hands, and the gap is clearest when compared on measurable metrics.
Human fingertips resolve spatial detail on the order of a few millimeters, with recent clinical data placing adult static two-point discrimination around 2.8–3.5 mm at the distal pads. Modern vision-based tactile fingertips, by contrast, can reconstruct local contact geometry at 30–100 µm, and newer variants resolve 95-µm surface grains at sub-Newton contact forces. These advances mean robots can “see by touch” far more finely than they can yet use that information across an entire hand.
Force sensitivity shows a similar split. In the light-contact regime, human just-noticeable differences in fingertip force scale with a Weber fraction near 16% (e.g., around 0.16 N at 1 N).9 State-of-the-art visuo-tactile fingertips can estimate local forces with median errors near 0.026 N over a roughly 0.03–0.8 N range under favorable “tactile fovea” conditions, underscoring how performance still depends strongly on sensor design and calibration.6
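To see where these figures cross over, Weber’s law models the human just-noticeable difference (JND) as ΔF ≈ k·F, with k ≈ 0.16 from the study above. The sketch below compares that against the sensor’s median error, treating the error as constant across force, which is a simplifying assumption:

```python
# Weber's law for touch: the just-noticeable force difference grows
# with the reference force, JND ≈ k * F. The 16% fraction and the
# 0.026 N median sensor error come from the text; a force-independent
# sensor error is a simplifying assumption.

WEBER_FRACTION = 0.16   # human fingertip, light-contact regime
SENSOR_ERROR_N = 0.026  # median error of the visuo-tactile fingertip

for reference_force_n in (0.1, 0.2, 0.5, 1.0):
    human_jnd_n = WEBER_FRACTION * reference_force_n
    finer = "sensor" if SENSOR_ERROR_N < human_jnd_n else "human"
    print(f"F = {reference_force_n:.1f} N: human JND ≈ {human_jnd_n:.3f} N "
          f"vs. sensor ≈ {SENSOR_ERROR_N:.3f} N -> {finer} is finer")
```

Under these assumptions the sensor out-resolves the human fingertip once the reference force exceeds roughly 0.026/0.16 ≈ 0.16 N, while at the very lightest contacts the human still has the edge, consistent with the component-versus-system picture in the summary below.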
In strength and manipulation headroom, typical adult key/lateral pinch spans roughly 70–100 N, while precision two-point pinch averages about 7 kgf (≈69 N) in healthy populations. Current research hands advertise per-fingertip forces on the order of a few dozen newtons, sufficient for many assembly tasks but still shy of human peaks when compact packaging and durability are required.3
At the task scale, humans routinely and rapidly handle sub-centimeter parts; a standard clinical dexterity test uses pegs 7 mm in diameter, each placed and removed in seconds. Robotic benchmarks increasingly target difficult geometries and articulated objects, yet many controlled evaluations remain at centimeter scale, underscoring how far robots still lag behind humans in fine, rapid manipulation.
In summary, at the level of individual sensing or actuation components, robotic hands can already surpass human fingertips in local resolution or small-force estimation. But at the system level, where perception, actuation, control, and durability must all work seamlessly together, robots remain far from the versatility and reliability of the human hand.
References:
1. Bao, C., Xu, H., Qin, Y., and Wang, X., 2023. DexArt: Benchmarking generalizable dexterous manipulation with articulated objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 21190–21200).
2. Bicchi, A., 2000. Hands for dexterous manipulation and robust grasping: A difficult road toward simplicity. IEEE Transactions on Robotics and Automation, 16 (6), pp. 652–662.
3. Kim, U., Jung, D., Jeong, H., Park, J., Jung, H.M., Cheong, J., Choi, H.R., Do, H., and Park, C., 2021. Integrated linkage-driven dexterous anthropomorphic robotic hand. Nature Communications, 12 (1), p. 7177.
4. Liu, S., 2024. Shaping the outlook for the autonomy economy. Communications of the ACM, 67 (6), pp. 10–12.
5. Padmanabha, A., Ebert, F., Tian, S., Calandra, R., Finn, C., and Levine, S., 2020. OmniTact: A multi-directional high-resolution touch sensor. In 2020 IEEE International Conference on Robotics and Automation (ICRA) (pp. 618–624). IEEE.
6. Sun, H., Kuchenbecker, K.J., and Martius, G., 2022. A soft thumb-sized vision-based sensor with accurate all-round force perception. Nature Machine Intelligence, 4 (2), pp. 135–145.
7. Tippur, M.H. and Adelson, E.H., 2023. GelSight360: An omnidirectional camera-based tactile sensor for dexterous robotic manipulation. In 2023 IEEE International Conference on Soft Robotics (RoboSoft) (pp. 1–8). IEEE.
8. Vertongen, J., Kamper, D.G., Smit, G., and Vallery, H., 2021. Mechanical aspects of robot hands, active hand orthoses, and prostheses: A comparative review. IEEE/ASME Transactions on Mechatronics, 26 (2), pp. 955–965.
9. Wheat, H.E., Salo, L.M., and Goodwin, A.W., 2004. Human ability to scale and discriminate forces typical of those occurring during grasp and manipulation. Journal of Neuroscience, 24 (13), pp. 3394–3401.

Shaoshan Liu is a member of the ACM U.S. Technology Policy Committee and a member of the U.S. National Academy of Public Administration’s Technology Leadership Panel Advisory Group. He holds a Ph.D. in Computer Engineering from the University of California, Irvine, and a Master of Public Administration (MPA) from Harvard Kennedy School.