
Can LLMs Make Robots Smarter?

Large language models may be used to do a lot of the planning that robots require, from within the robot.


Advances in robotics have arrived at a fast and furious pace. Yet, for all their remarkable capabilities, these machines remain fairly dumb. They understand only a handful of human commands, they cannot adapt to conditions that fall outside their rigid programming, and they are unable to adjust to events on the fly.

These challenges are prompting researchers to explore ways to build large language models (LLMs) into robotic devices. These semantic frameworks, which would serve as something resembling a “brain,” could imbue robotics with conversational skills, better reasoning, and an ability to process complex commands—whether a request to prepare an omelet or tend to a patient in a care facility.

“Adding LLMs could fundamentally change the way robots operate and how humans interact with them,” said Anirudha Majumdar, an associate professor in the Department of Mechanical and Aerospace Engineering at Princeton University. “An ability to process open-ended instructions via natural language has been a grand ambition for decades. It is now becoming feasible.”

The ultimate goal is to develop “agentic computing” systems that use LLMs to power robots through complex scenarios that require numerous steps. Yet, developing these more advanced robots is fraught with obstacles. For one thing, GPT and other models lack grounding—the context required to address real-world situations. For another, AI is subject to errors and fabrications, also known as hallucinations. This could lead to unexpected and even disastrous outcomes—including unintentionally injuring or killing humans.

As a result, researchers are proceeding cautiously. Although agentic computing systems that use LLMs introduce better reasoning and a more interactive experience, “fully autonomous systems that control a robot from the start to the end remain in the future,” said Sergey Levine, an associate professor in the Department of Electrical Engineering and Computer Sciences at the University of California (UC) Berkeley.

Beyond Words

Despite impressive leaps in robotics over the past few years—including the emergence of lint-trapping Roombas, burger-flipping kitchen assistants, surgical robots, and warehouse bots—the sleek, autonomous devices of films like Bicentennial Man, I, Robot, and Ex Machina remain a sci-fi fantasy.

Plugging in LLMs—or perhaps more specialized Kitchen-GPT or Medical-GPT models—could reboot robotics and help agentic computing take shape. With the ability to execute multi-step processes within an overall plan, a robot can function in a more iterative and human-like way. At their best, agentic systems can take actions independently, based on pre-determined goals, rules, or learning models; they do not require direct and continuous human intervention. The LLM can spot problems, gaps, and inconsistencies, and revise its approach on the fly.

At the University of Southern California, third-year Ph.D. student Ishika Singh is exploring ways to construct robots that can do more than cook french fries to perfection. She wants to build robots that prepare entire meals—from slicing and chopping veggies and mixing spices to sautéing foods, setting the table, serving the meal, and then clearing the table and cleaning up.

Today, robots can handle discrete tasks—pouring oil into a pan, cracking eggs, scrambling them, and placing them on a plate, for example. Yet here’s the catch: stringing these actions together into one smooth, intuitive process is not yet possible. The problem is that robots require a rigid, step-by-step planning pipeline that meticulously dictates each action. Deviations from the script? Not an option. Even tweaking the code won’t help. Consequently, a real-world request like “use the Crock-Pot” or “make it gluten-free” will likely fall on deaf (robot) ears.

Robots powered by LLM models could change the recipe. They would prepare meals and do the laundry; aid patients at the bedside; handle dangerous construction work; and perhaps even accelerate the development of autonomous vehicles. “Moving beyond a system wholly dependent on code and using language to control a robot fundamentally changes things,” Singh said. “You suddenly could accomplish practical tasks and do many useful things.”

Yet building an LLM-powered control system for a robot is a complex process. There’s a need to train the language model, followed by teaching the system how to interpret words and context. Next comes integrating the LLM with the robot’s control system, breaking down complex commands into actionable steps, and using onboard sensors—microphones, cameras, pressure sensors, and LiDAR—to navigate through real-world conditions. Finally, engineers must add robust error handling and feedback loops, along with adaptive algorithms that allow the robot to continuously improve.
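To make that pipeline concrete, the minimal sketch below shows one way the pieces might fit together: an LLM decomposes a spoken request into primitive steps, each step is dispatched to the robot’s controller, and sensor feedback triggers a replan when a step fails. The names here (query_llm, RobotPrimitives, and the primitive verbs) are placeholders for illustration, not any particular robot’s API.

```python
from dataclasses import dataclass

def query_llm(prompt: str) -> str:
    """Stand-in for a call to a hosted or on-board language model."""
    # A real system would call an LLM endpoint here; this stub returns a fixed plan.
    return "locate(eggs); crack(eggs, bowl); whisk(bowl); pour(bowl, pan)"

@dataclass
class StepResult:
    success: bool
    sensor_note: str  # e.g., "no eggs detected by camera"

class RobotPrimitives:
    """Hypothetical low-level skills exposed by the robot's controller."""
    def execute(self, step: str) -> StepResult:
        print(f"executing: {step}")
        return StepResult(success=True, sensor_note="ok")

def run_command(command: str, robot: RobotPrimitives, max_replans: int = 2) -> bool:
    # 1. Ask the LLM to decompose the request into primitive steps.
    plan = [s.strip() for s in
            query_llm(f"Break this request into robot primitives: {command}").split(";")]
    # 2. Execute each step, watching sensor feedback.
    for step in plan:
        result = robot.execute(step)
        if not result.success:
            if max_replans == 0:
                return False
            # 3. Feedback loop: fold the failure back into the next request.
            return run_command(
                f"{command} (note: '{step}' failed, {result.sensor_note})",
                robot, max_replans - 1)
    return True

run_command("make scrambled eggs", RobotPrimitives())
```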

Singh’s research demonstrated that an LLM could nudge robots to better performance. The method—ProgPrompt—relies on a hybrid approach that involves direct interaction with an LLM along with using ChatGPT to write code. The approach led to the desired action as much as 75% of the time.a Yet, problems persisted. These mostly centered on the robot’s inability to understand commands, and it would sometimes become confused and stop altogether. “The results were far from perfect, but this is better than what we can achieve with conventional programming,” she said.
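ProgPrompt’s central idea is to present the robot’s available actions and the objects in the scene to the LLM as a partial Python program, so the model completes the plan as code. The snippet below is only an illustrative approximation of that prompting style; the import line, object list, and example tasks are invented for this sketch and do not reproduce the paper’s actual prompts.

```python
# Illustrative approximation of a ProgPrompt-style prompt: available robot
# actions and scene objects are presented as Python, and the LLM is asked to
# complete a new task as code. The function and object names are examples,
# not the actual ProgPrompt specification.

PROMPT = '''from actions import grab, put_on, open_obj, close_obj, switch_on

objects = ["bread", "toaster", "egg", "bowl", "pan", "stove", "fridge"]

def make_toast():
    grab("bread")
    put_on("bread", "toaster")
    switch_on("toaster")

def scramble_eggs():
'''

def query_llm(prompt: str) -> str:
    """Stub for an LLM call that completes the partial program above."""
    return '    open_obj("fridge")\n    grab("egg")\n    put_on("egg", "bowl")'

# The generated body would then be parsed and executed primitive by primitive.
print(PROMPT + query_llm(PROMPT))
```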

Rethinking Robotics

Large language models might dial up a robot’s ability to process requests and handle various tasks, but at least for the foreseeable future, they bring robots no closer to a fully functional system. The inherent vagueness of words, phrases, and thinking can lead to “seemingly silly mistakes,” Levine noted. “Because LLMs are not ‘grounded,’ they lack the context needed to solve problems.” Those problems can be as mundane as opening a jar or a pill container.

Solving the problem involves more than simply plugging in GPT-powered intelligence. There is a need for agentic AI workflows and multi-agent collaboration. LLMs lack an intrinsic understanding of the physical world. They rely on probabilistic patterns to transform ideas into actions. For example, “You might ask a robot to hand you a knife,” Majumdar said. “Depending on the type of knife, how you plan to use it, and who is asking for it—it might be a child—it may or may not be correct or safe to hand it over.” A carving knife isn’t a butter knife. “As you dive into a process and try to understand all the possibilities and anomalies, you realize how complicated things become,” he added.

Hallucinations represent another obstacle. It is one thing for an LLM to spit out an absurd declaration or serve up incorrect math on ChatGPT or Gemini; it is an entirely different thing for the glitch to cause a robot to go haywire and damage property or put humans at risk. “Because autonomous robots are physical systems and they do physical things, they introduce potential risks. The reliability level must be extraordinarily high,” explained Andrew Hundt, a computing innovation fellow at Carnegie Mellon University.

In fact, understanding what safety checks to install in a robot—and how to build a fail-safe mechanism that’s dependable—is a core area of research. Beyond the basic need to shut down a robot on command if it begins doing something questionable, incorrect, or dangerous, there’s a secondary issue: what happens when a human can’t respond quickly enough to avert a problem? “There’s a need to build robots with a defense-in-depth framework that has multiple layers of systems and safeguards,” Hundt pointed out.
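As a rough illustration of that layered approach, the sketch below shows several independent checks that can each veto an LLM-proposed action, with a watchdog path that halts the robot without waiting for a human. The check names and thresholds are hypothetical and not drawn from any particular system.

```python
from typing import Callable, Dict, List

def within_workspace(action: Dict) -> bool:
    # Layer 1: hard geometric limits enforced outside the LLM.
    return all(abs(v) <= 1.0 for v in action.get("target_xyz", [0.0, 0.0, 0.0]))

def below_force_limit(action: Dict) -> bool:
    # Layer 2: cap commanded force regardless of what the planner requested.
    return action.get("force_n", 0.0) <= 20.0

def passes_rule_filter(action: Dict) -> bool:
    # Layer 3: simple rule-based deny list (e.g., no sharp tools near people).
    return not (action.get("tool") == "knife" and action.get("person_nearby"))

SAFETY_LAYERS: List[Callable[[Dict], bool]] = [
    within_workspace, below_force_limit, passes_rule_filter,
]

def gate(action: Dict) -> bool:
    """Approve an LLM-proposed action only if every layer signs off."""
    return all(check(action) for check in SAFETY_LAYERS)

proposed = {"tool": "knife", "person_nearby": True,
            "target_xyz": [0.4, 0.1, 0.2], "force_n": 5.0}
if not gate(proposed):
    print("action vetoed; triggering safe stop")  # watchdog path, no human needed
```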

There’s also a more basic question about whether it’s wise or practical to use robotics in certain situations, Hundt said. “Just because we can use a robot for a given task—and it can do that thing effectively—doesn’t mean we should use it. Sometimes, the conventional way of doing things is better.” For example, care facilities in Japan that experimented with robots over the course of several weeks found that practitioners suddenly had to tend to both robots and patients. Human workload increased and, within a few days, workers typically stopped using the robots.b

Some robotics experts are skeptical LLMs will ever serve as effective robot “brains.” “LLMs are new and shiny and not deployed at scale anywhere,” said Rodney Brooks, Panasonic Professor of Robotics (emeritus) at the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory (MIT CSAIL), and CTO of robotics firm Robust AI. “Language is not related to the hard problems of robotics in any way.”

Flipping the Script

There’s a paradox associated with large language models serving as robot “brains.” While the goal is to simplify interactions between human and machine, an LLM introduces layers of complexity. These extend from the physical actions and operations of the robot—including how language maps to various sensors, actuators, and actions—to the broader external environment and the logic required to navigate a space and accomplish tasks.

“The LLM must be aware of the context of the robot embodiment. What can the robot do or not do? What are its mechanical constraints? What are practical safety constraints?” Levine explained. One way to build smarter robots and boost semantic reasoning, he said, is to fine-tune LLMs on robotic data. The resulting vision-language-action models (VLAs) fuse vision, language, and actions. Levine and a group of researchers have used this technique to develop RT-2,c an AI model capable of embodied chain-of-thought and sequential reasoning.
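In schematic terms, a VLA treats robot actions as just another vocabulary: the model reads a camera image and an instruction, emits discrete action tokens, and those tokens are decoded into motor commands. The toy sketch below illustrates only that interface; it is not the RT-2 implementation, and the model stub, bin count, and action layout are assumptions for illustration.

```python
import numpy as np

N_BINS = 256  # each action dimension is discretized into 256 token bins

def vla_model(image: np.ndarray, instruction: str) -> list:
    """Stand-in for a fine-tuned vision-language-action model; emits one
    discrete token per action dimension."""
    return [128, 140, 100, 127, 127, 127, 255]  # x, y, z, roll, pitch, yaw, gripper

def detokenize(tokens: list) -> np.ndarray:
    """Map token bins back to continuous commands in [-1, 1]."""
    return (np.array(tokens) / (N_BINS - 1)) * 2.0 - 1.0

frame = np.zeros((224, 224, 3), dtype=np.uint8)  # camera image
action = detokenize(vla_model(frame, "pick up the sponge"))
print(action)  # continuous end-effector deltas plus a gripper command
```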

The RT-2 model is part of a broad Open X-Embodiment Collaboration framework that spans the likes of UC Berkeley, Carnegie Mellon University, Stanford University, ETH Zurich, the University of Tokyo, Max Planck Institute, Google DeepMind, and more than two dozen other institutions. Together, participants have conducted nearly a million robotic trials with 527 distinct skills across 22 types of robots.d Using LLMs in conjunction with VLAs, they have pushed up success rates by upwards of 50% and witnessed emergent capabilities in areas like spatial recognition and dexterity.

The hope is that sharing data will fuel more advanced robotics in homes, factories, and vehicles. “If we can train a single large neural network on data from many different robots, then we can drastically lower the barrier to entry for new robotics applications,” Levine said. Ultimately, a peer-to-peer approach like the Open X-Embodiment Collaboration framework could make it possible for robots to take a giant leap forward. “This would allow researchers to adapt models rapidly to new robots and robotic systems,” he added.

Using agentic design with AI workflows, researchers are also exploring ways to incorporate cues from the surrounding environment into a robotic framework. This approach would augment a robot’s reasoning by introducing actual physical data and contextual information, including external code. Embedded sensors, Internet of Things (IoT) devices, and a combination of cloud and edge LLM data could feed robots critical information in real time, Majumdar explained. This already takes place with less-complex robots in factories and other industrial settings. “Greater integration between the robot and the environment could increase the odds that the robot and the LLM operate correctly,” he said.
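One simple way to supply that grounding, sketched below under assumed sensor names and fields, is to serialize recent sensor and IoT readings into the prompt so the planner reasons over the actual state of the space rather than a guess.

```python
import json

# Hypothetical snapshot of readings from on-board sensors and IoT devices.
sensor_state = {
    "camera": {"objects_seen": ["pan", "egg_carton", "person"]},
    "lidar": {"nearest_obstacle_m": 0.8},
    "iot": {"stove_on": False, "room_temp_c": 22},
}

def build_prompt(task: str, state: dict) -> str:
    # Serialize the environment state so the planner works from real data.
    return ("You control a kitchen robot. Current environment state:\n"
            f"{json.dumps(state, indent=2)}\n"
            f"Task: {task}\n"
            "Return a numbered plan that respects the state above.")

print(build_prompt("heat the pan and crack two eggs", sensor_state))
# The assembled prompt would be sent to a cloud or edge LLM endpoint.
```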

Minding the Robots

Despite groundbreaking advances in LLMs and a wave of commercial startups focused on the robotics space, building a general-purpose brain for robots remains a futuristic vision. Current generative AI models excel at producing text, code, and images, thanks to their training on vast amounts of Internet data. However, their AI capabilities don’t integrate naturally or seamlessly with the mechanical complexities of robots. Researchers must continue to search for ways to bridge the gaps.

Nevertheless, Levine believes researchers ultimately will solve the problem—and agentic systems that incorporate LLMs will play a key role. This could lead to better robots, autonomous vehicles, smart cities, and more. “Robots have the unique advantage that the more they attempt a task, the more experience they get, the better they can collectively perfect their skills through real-world use cases,” he said.

