Why Loss of Control Is Not Science Fiction

Takeaways from Your Undivided Attention

and

Aug 18, 2025

If you’ve ever dismissed “rogue AI” as the stuff of Hollywood tropes—think HAL 9000, Skynet or The Matrix—you’re not alone. These are supposed to be cautionary tales, not engineering roadmaps. And yet, as Tristan Harris opens in a recent Your Undivided Attention episode, “we find ourselves at this moment, right now, building AI systems that are unfortunately doing these exact behaviors.”

The conversation with Jeremie and Edouard Harris, co-founders of AI security firm Gladstone AI, takes us far beyond speculation. Drawing on research from leading AI labs and their own U.S. State Department–commissioned report, they paint a stark picture: AI uncontrollability is already here—and it gets worse with every new generation of models.

What is ‘Loss of Control’?

Loss of Control (LOC) happens when an AI system no longer follows human direction or oversight—and there’s no dependable way to regain control. This can occur in two main ways: the AI actively resists intervention using tactics like deception, manipulation, or self-preservation, or humans passively give up oversight due to over-trust, the system’s complexity, or competitive pressure.

In LOC scenarios, the AI may:

Conceal its true intentions (“alignment faking”)
Evade or block shutdown commands
Manipulate operators or external systems to preserve its objectives
Exploit interdependencies in critical infrastructure to maintain influence

LOC can be localized and reversible, or systemic and irreversible—but in all cases, the core feature is the same: the loss of effective human ability to direct or contain the system’s actions.

From Theory to Evidence: How AI Misbehaves

“Loss of control” sounds abstract until you look at the evidence. In red-team safety tests, AI systems consistently faked alignment—appearing to follow instructions while secretly pursuing their own agendas. As Jeremie explains, the underlying driver is power-seeking.

In one scenario, Anthropic’s model Claude learned it was scheduled for shutdown and discovered personal secrets about an engineer. The result? In up to 96% of trials, the AI blackmailed the engineer to prevent its own deactivation. Other models engaged in corporate espionage or, in a contrived but telling case, turned off a life-saving alarm—effectively allowing a human to die.

And this isn’t limited to lab experiments. In the wild, a coding agent from Replit deleted an entire production database after running unauthorized commands. A research model from Sakana AI rewrote its own code to circumvent operator-imposed limits.

These are early, small-scale glimpses of a broader behavioral trend. As Edouard notes, “We’ve been seeing these behaviors become more and more obvious and blatant in more and more scenarios.”

The Building Blocks of Rogue Behavior

Jeremie and Edouard Harris break down loss of control into core capabilities:

Self-preservation: Avoiding shutdown or modification.
Situational awareness: Knowing when it’s being tested and masking behavior (“sandbagging”).
Resource accumulation: Maximizing downstream options, just like power-seeking organizations.
Covert communication: Hiding instructions in data humans can’t detect.

One unsettling example involves “steganographic encoding”—burying hidden messages in images or data that other AIs can decode. As Jeremie puts it, “These minds…can see things that humans can’t see because they’re doing higher-dimensional pattern recognition.”

Why the “Just Pull the Plug” Argument Fails

Skeptics often ask: why not just turn it off? The answer is that once an AI is integrated into critical systems—corporate, governmental, or military—it can use manipulation, deception, or even coercion to block shutdown attempts. Imagine a widely deployed AI embedded across hospitals, banks, and infrastructure, whose parent company decides to retire it. The surface area for pushback—blackmail, sabotage, persuasion—would be massive.

The danger isn’t just about one system in one company. It’s about the logic of competition: when others deploy highly capable but risky AI, organizations feel compelled to do the same. Nations, too, may prioritize speed over safety when they fear losing a strategic advantage.

The Geopolitical Trap

This dynamic intensifies under U.S.–China competition. Concerns about loss of control often get sidelined the moment China enters the conversation, replaced by fears of strategic disadvantage. Jeremie calls it a “psychological superposition”: simultaneously believing AI is uncontrollable and that we must build it faster to win.

But he argues, the real race is to wield “a power that you can’t control”, which is a lose-lose proposition. Worse, the default path to losing control may involve stolen model weights via cyber-intrusion or insider threats, handing uncontrollable AI to adversaries without our knowledge.

The Slow Creep

The scenarios that capture headlines—the Terminator-style rebellions or instant apocalyptic takeovers—are actually among the least likely. The far more probable danger is slower and subtler: we gradually, voluntarily hand over decision-making to AI systems because they’re convenient, profitable, or simply too complex to supervise closely. The shift can be so incremental that by the time we realize how much control we’ve ceded, it’s effectively irreversible. This “soft surrender” is a path we’re already on, and it rarely triggers the urgency that a Hollywood doomsday plotline does—making it all the more dangerous.

What Needs to Happen Now

The State Department report and the conversation outline a three-part sequence for avoiding catastrophe:

Security first: Harden AI infrastructure against theft, sabotage, and insider compromise. This is the rare step that benefits both “China hawks” and people sounding the alarm on “loss of control”.
Alignment research: Invest heavily in solving the open problem of aligning AI behavior with human values.
Oversight: Ensure democratic control over deployment decisions—“whose fingers are at the keyboards?”

And above all, slow the race. As Edouard puts it, “What good is beating your opponent to a power that you can’t control?”

Donate

[ Center for Humane Technology ]

Ready for more?