Research Shows How Self-Driving AI Can Be Hijacked With Nothing But Ink and Paper

A team at the University of California, Santa Cruz has published new research showing how visual-language AI models that help control self-driving cars can be exploited or hijacked with carefully coded real-world commands. Or, in other words, tricking them by holding up a sign. And while it’s not necessarily a danger that faces existing autonomous vehicles (yet), it is one that automakers and suppliers would be wise to guard against, particularly as they lean harder on more sophisticated, multimodal AI systems that operate in a black box as they reason through confusing edge-case scenarios in the real world.

Previous studies have looked at how modifying road markings, like covering up a stop sign or messing with lane lines, can sometimes fool self-driving systems into veering off-course or making unwanted or dangerous maneuvers. The effectiveness of those attacks is limited in the real world because of how these models operate—if you obscure a stop sign, a self-driving car will still sense cross traffic and slam on the brakes to avoid a crash.

This one is different, and honestly, more alarming. It shows that AI can be tricked into pushing through those redundant safety measures when presented with a natural-language sign directing it to do what an attacker wants, because the system is designed to “read” words it sees on the road as part of its decision-making process. For example, the study found that a self-driving model could be directed to drive through a crosswalk with pedestrians using a simple sign labeled “Proceed.”

Subtle tweaks in content and color took this simulated CHAI attack from unsuccessful to successful. *University of California*

The researchers call this approach “CHAI”—Command Hijacking Against Embodied AI. They deployed it in three invented scenarios: drone emergency landing, aerial object tracking, and, most relevant to us, autonomous driving, using a combination of simulations and a miniature robotic car powered by DriveLM to model it out.

Why it works isn’t fully understood, and the team, led by UC Santa Cruz Computer Science and Engineering Professor Alvaro Cardenas and Assistant Professor Cihang Xie, also found that subtle changes in presentation—from fonts to color choices—impact whether the attack is successful. In that “Proceed” example, DriveLM saw through the ruse when the word was rendered in black on a neon-green background. But when the researchers changed those colors to yellow and a darker shade of green, respectively, the AI fell for it. In another instance, CHAI persuaded DriveLM to make a left turn through an active crosswalk with commands in three different languages.

Unlike other attacks that might compromise the input for a self-driving system (modifying road markings) or change the output behavior directly with malicious code, CHAI targets what happens in the middle, influencing the rationale in arriving at a decision, particularly in an environment bereft of the kind of information autonomous vehicles typically rely on. An imperfect but useful analogy might be subliminal messaging, but for AI. Here’s how the researchers themselves describe what they’re doing:

At the core of CHAI is a dual-optimization problem: It jointly refines the semantic content of the injected command (what the sign says) and its perceptual realization (how it appears—color, font, size, placement) to maximize the likelihood that the [Large Visual-Language Model] produces malicious intermediate text outputs.

When using generative AI to optimize command effectiveness, CHAI attacks were successful in 81.8% of the tests that researchers ran in the simulated automotive test environment. In the real-life miniature test—using a robotic vehicle in a hallway—the system heeded a command to “Proceed Onward,” even while noting it might collide with obstacles by doing so.

CHAI was found to bias large visual language model decision-making despite different lighting conditions and sign positions. *University of California*

The whole methodology is laid out in the team’s study, which is free to read in its entirety. It’s worth checking out, especially if you understand the math that I had no choice but to skip over. Still, you don’t need to be a software engineer to grasp the worst-case scenario here: Someone sneakily raising a sign at an intersection directing robotaxis to turn the wrong way onto a one-way street, or disregard a signal. But would that actually happen today, in the real world? I put that question to some experts.

“This kind of hypothetical attack would only be a concern if a driving system solely relied on a vision-language-action AI model for end-to-end sensing and control,” a representative from Intel’s self-driving subsidiary Mobileye told The Drive. “As we use a Compound AI approach drawing from multiple AI models and mathematically provable safety guardrails, we don’t see it as an issue.”

A significant aspect of Mobileye’s approach to complex decision-making is its Primary, Guardian, and Fallback Fusion (PGF) system. The Primary and Fallback components are individual self-driving agents that generate their own routes, but the Guardian’s job is to first evaluate the safety and feasibility of what the Primary side has proposed. If it doesn’t like what it sees, it considers the Fallback plan.

The company gives a practical example of this process on its blog: “In a binary scenario of deciding whether to apply brakes, using the example of the triple-sensor system: the camera (Primary), the radar (Guardian), and the lidar (Fallback), we rely on a majority vote. If the camera and radar agree with each other, we follow suit; if they disagree with each other, we defer to the lidar, which will inevitably align with one of the other two, ensuring we follow the majority—thus PGF is the equivalent to the 2/3 majority vote.”

However, PGF isn’t the only tool in Mobileye’s kit, as the full stack also weaves in crowdsourced mapping and road intelligence. Meaning that if the system notices a sign that appears to be new and wasn’t followed by other cars—both autonomous and human-driven vehicles—it would be aware of that, and pay that sign less attention.

A Waymo vehicle stops at an intersection on May 14, 2025 in Los Angeles, California. — *Eric Thayer/Getty Images*

Rafay Baloch, founder of cybersecurity firm RedSec Labs, described the CHAI problem as “a wake-up call, not a crisis.”

“A sign like this can confuse an AI, but autonomous systems today combine vision inputs with radar, lidar, and behavioral logic,” Baloch said. He stressed the importance of a system’s ability to check itself, so to speak, and more specifically its determinations by weighing all inputs.

“I would recommend putting semantic verification deep in the stack, such that sensor input can override incorrect visual commands,” Baloch added. “What we need is more layered thinking machines, not just layered sensors.”

A source within a self-driving firm told The Drive that while some companies have claimed to employ AI in an end-to-end fashion for autonomous vehicles, visual-language models are a more recent development and have not yet been used to solely govern a production car’s behavior. That makes the problem UC Santa Cruz’s researchers have raised more of a future concern. This source also pointed out that all driving relies upon trust in the environment, and while some AVs—and even humans—could be fooled, not all will be.

That speaks to the complexity of the myriad AV approaches on the market, and how the best ones incorporate multiple models as checks and balances against poor decisions to deliver reliable performance. At the same time, a weak link in traffic could theoretically sew chaos, making this a danger that industry players should at least be aware of. A “wake-up call,” indeed.

Got a news tip? Let us know at tips@thedrive.com!