Zero-trust for AI agents: trust nothing, verify everything
Zero-trust was designed for human users and network perimeters. But the same principles apply — arguably more urgently — to AI agents operating inside your infrastructure.
The zero-trust security model can be summarized in one phrase: "never trust, always verify." Every request — regardless of where it comes from, what identity it carries, or whether it's inside your network — must be explicitly verified and authorized.
That model was developed in response to a specific failure: the assumption that anything inside the corporate perimeter was trustworthy. Breaches repeatedly proved otherwise. An attacker who compromised a single machine inside the network could move laterally without restriction because internal traffic was implicitly trusted.
With AI agents, we're making the same mistake — again. And the failure modes are worse.
Why AI agents break traditional security models
Traditional access control assumes a human is making decisions. A user logs in, gets a role, and that role determines what they can do. The human is the decision-maker; the access control is the enforcement mechanism.
AI agents invert this. The agent has credentials, but the decision-making is opaque and non-deterministic. The same agent, with the same role, can produce wildly different outputs depending on the prompt, the model version, the context window, or subtle changes in input. Access control was never designed to secure a non-deterministic decision-maker.
There's also the velocity problem. A human with production write access might make ten decisions per day. An AI agent with the same access can make ten thousand decisions per minute. Any misconfiguration or unexpected behavior is amplified at machine speed.
And there's prompt injection — a category of attack that simply doesn't exist for human operators. A malicious document, API response, or user input can redirect an agent's behavior in ways that bypass all intent-level controls. The agent is doing exactly what it was told; it just got told the wrong thing.
The five zero-trust principles, applied to agents
1. Verify explicitly — for every action, not just at login
In the traditional model, the human logs in once and their session is trusted for its duration. Every action within that session inherits the session's authorization.
Authentication at session start is necessary but not sufficient. Each action the agent takes — especially irreversible ones — should be explicitly authorized at the moment of execution, not inherited from an earlier authentication event.
In practice: a human-in-the-loop approval checkpoint for actions that haven't been pre-approved. The agent authenticates when it connects; individual commands are authorized when they happen. This isn't just about security — it's about correctness. The agent authenticated hours ago; the current command may not match the intent behind that authentication.
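A per-action gate can be sketched as follows. This is a minimal illustration, not a real authorization system; the whitelist entries and irreversible-command prefixes are hypothetical examples, and a production implementation would match on structured command metadata rather than raw strings.

```python
from enum import Enum

class Verdict(Enum):
    AUTO_APPROVED = "auto_approved"
    NEEDS_REVIEW = "needs_review"

# Hypothetical examples: patterns a reviewer has already approved,
# and prefixes that mark operations as irreversible.
WHITELIST = {"kubectl get pods", "git status"}
IRREVERSIBLE_PREFIXES = ("kubectl delete", "rm -rf", "DROP TABLE")

def authorize(command: str) -> Verdict:
    """Authorize each command at the moment of execution,
    independent of when the agent authenticated."""
    # Irreversible operations always go to a human reviewer.
    if command.startswith(IRREVERSIBLE_PREFIXES):
        return Verdict.NEEDS_REVIEW
    # Previously reviewed, low-risk commands are auto-approved.
    if command in WHITELIST:
        return Verdict.AUTO_APPROVED
    # Default is review, not trust: the agent's session-level
    # authentication grants nothing at the action level.
    return Verdict.NEEDS_REVIEW
```

The key property: the decision is made per command, at execution time, so a command issued hours after authentication is evaluated on its own merits.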
2. Use least privilege access — and actually enforce it
Least privilege is probably the most cited and least followed security principle. The reason it gets ignored for AI agents is that "give it what it needs" is hard to determine in advance. So teams give agents broad access "just in case," intending to tighten it later. Later never comes.
The practical solution isn't to get permissions right up front — it's to start with nothing and expand based on actual observed behavior. Let the agent attempt things. When it hits a permissions boundary, evaluate whether that capability should be granted. Build the access model from evidence, not prediction.
This is the right way to build a whitelist too: start with manual approval for everything, then whitelist patterns you've actually reviewed and approved. The whitelist grows to reflect real observed behavior, not anticipated behavior.
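The evidence-based whitelist can be sketched as a small approval memory. The class and method names here are illustrative, not from any particular product: everything starts as manual review, and a command is only whitelisted when a reviewer explicitly says so.

```python
class ApprovalMemory:
    """Grow a whitelist from commands a human has actually reviewed.

    Starts empty: every command needs review until a reviewer
    explicitly approves it with remember=True.
    """

    def __init__(self) -> None:
        self.whitelist: set[str] = set()  # start with nothing

    def review(self, command: str, approved: bool,
               remember: bool = False) -> bool:
        # A reviewer can approve once, or approve-and-whitelist.
        if approved and remember:
            self.whitelist.add(command)
        return approved

    def needs_review(self, command: str) -> bool:
        return command not in self.whitelist

mem = ApprovalMemory()
assert mem.needs_review("kubectl get pods")      # nothing pre-approved
mem.review("kubectl get pods", approved=True, remember=True)
assert not mem.needs_review("kubectl get pods")  # learned from evidence
```

The whitelist is built from decisions that actually happened, so it encodes reviewed behavior rather than predicted need.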
3. Assume breach — design for containment
Zero-trust assumes that breaches happen and designs to minimize their impact. For agents, this means designing as if the agent will, at some point, behave unexpectedly — because it will. Models get updated. Prompts drift. Inputs get poisoned. The question isn't whether the agent will do something unexpected; it's how much damage it can do when it does.
Containment means:
- Scope limitation: the agent can only access what it needs for its specific task (see blast radius post)
- Approval gating: irreversible operations require human sign-off, so unexpected behavior is caught before it causes permanent damage
- Anomaly detection: unusual patterns (unexpected targets, unusual hours, command frequency spikes) trigger review even for nominally whitelisted operations
- Session isolation: agent sessions are isolated from each other; a compromised agent context doesn't affect others
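The containment properties above can be expressed as per-agent configuration. This is a sketch under assumed names (the fields and defaults are illustrative); the point is that scope, gating, and rate bounds live in policy, not in the agent.

```python
from dataclasses import dataclass, field

@dataclass
class AgentPolicy:
    """Containment as per-agent configuration (illustrative fields)."""
    agent_id: str
    allowed_namespaces: set = field(default_factory=set)    # scope limitation
    require_approval_for: tuple = ("delete", "drop")        # approval gating
    max_commands_per_minute: int = 60                       # frequency bound

    def in_scope(self, namespace: str) -> bool:
        # Anything outside the declared scope is denied outright,
        # limiting the blast radius of unexpected behavior.
        return namespace in self.allowed_namespaces

policy = AgentPolicy("deploy-bot", allowed_namespaces={"staging"})
assert policy.in_scope("staging")
assert not policy.in_scope("production")  # outside the blast radius
```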
4. Inspect and log everything
Zero-trust networks don't just block threats — they observe everything so that breaches can be detected and investigated. For agents, this means complete audit trails: every command attempted, every approval decision, every denial, every anomaly flag.
Log collection is the easy part. The harder part is making logs actionable. That requires structure: not just "this command ran," but "this command ran, at this time, in this session, by this agent, was approved by this reviewer, with this latency." That structure is what makes compliance reporting possible, and what makes incident investigation practical instead of theoretical.
There's also a more subtle benefit: logs tell you when your zero-trust controls are working. If an operation is being manually approved every time and never denied, it should be in the whitelist. If an operation is being denied regularly, the agent's scope is too broad. The approval rate is a signal about whether your controls are calibrated.
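Both points can be sketched together: a structured audit record with the fields named above, and an approval-rate metric computed from those records. The record shape is an assumption for illustration, not a prescribed schema.

```python
from datetime import datetime, timezone

def audit_record(command, agent_id, session_id, outcome,
                 reviewer=None, latency_ms=None):
    """One structured entry: not just 'this command ran', but who,
    when, in which session, decided by whom, with what latency."""
    return {
        "command": command,
        "agent_id": agent_id,
        "session_id": session_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "outcome": outcome,        # "approved" / "denied" / "auto_approved"
        "reviewer": reviewer,
        "latency_ms": latency_ms,
    }

def approval_rate(records):
    """Calibration signal: a command that is manually approved every
    time belongs in the whitelist; frequent denials suggest the
    agent's scope is too broad."""
    manual = [r for r in records if r["outcome"] in ("approved", "denied")]
    if not manual:
        return None
    return sum(r["outcome"] == "approved" for r in manual) / len(manual)
```

An approval rate near 1.0 for a given command is the log telling you to whitelist it; a low rate is the log telling you to narrow the agent's scope.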
5. Authenticate every identity — agents included
In traditional zero-trust, every user and device must have a verified identity. For agents, this means each agent has a distinct identity — not a shared service account — with its own credentials, its own audit trail, and its own access scope.
Shared service accounts are a common shortcut that undermines the entire model. If three different agents share one set of credentials, you can't tell from the audit log which agent issued a specific command. Attribution disappears; investigation becomes impossible.
Per-agent credentials also make revocation practical. If one agent's credentials are compromised or the agent misbehaves, you revoke that agent's access without affecting others. With shared credentials, revocation requires rotating the shared secret and updating every agent simultaneously — so it rarely happens.
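Per-agent identity can be sketched as a small credential registry. This is not a real secrets store (a production system would use a vault with hashed tokens and expiry); it only illustrates the two properties the text argues for: attribution and independent revocation.

```python
import secrets

class AgentCredentials:
    """Distinct identity per agent: issue, attribute, and revoke
    without touching any other agent (illustrative sketch)."""

    def __init__(self) -> None:
        self._tokens: dict[str, str] = {}  # agent_id -> credential

    def issue(self, agent_id: str) -> str:
        token = secrets.token_hex(16)
        self._tokens[agent_id] = token
        return token

    def identify(self, token: str):
        # Attribution: a credential maps back to exactly one agent,
        # so the audit log can name the agent that issued a command.
        for agent_id, t in self._tokens.items():
            if secrets.compare_digest(t, token):
                return agent_id
        return None

    def revoke(self, agent_id: str) -> None:
        # Revoking one agent leaves every other agent untouched.
        self._tokens.pop(agent_id, None)
```

With a shared service account, both `identify` and `revoke` are impossible at the per-agent level, which is exactly the failure the section describes.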
The approval layer as zero-trust enforcement
Human-in-the-loop approval is zero-trust enforcement at the action level. The agent presents an action to be performed; the authorization system evaluates it; a human or automated policy decides whether it proceeds.
The whitelist is the persistent state of that authorization: operations that have been reviewed, approved, and deemed safe to auto-approve in the future. New operations — ones the authorization system hasn't seen before — require explicit review. This maps directly to zero-trust's "verify explicitly" principle: the default is denial; explicit authorization is required.
Anomaly detection is zero-trust's "assume breach" principle in action. Even whitelisted operations get flagged when the context is unusual — because the assumption is that something may have gone wrong, and an unusual operation in an unusual context is evidence worth examining.
What this looks like operationally
A zero-trust agent deployment doesn't mean every command requires a human to click Approve. That's unsustainable at any meaningful scale. It means:
- New agents start with full manual review — every command goes to a reviewer until you've built enough history to know what's normal
- Known-safe operations are whitelisted — after review, routine low-risk commands are auto-approved. The reviewer sees less noise; genuine anomalies stand out
- Irreversible operations always require review — no matter how many times you've seen kubectl delete, it goes to a reviewer. The cost of irreversibility justifies the friction
- Anomalies surface for review — a whitelisted command that runs at 3am when it normally runs at 10am gets flagged. Zero-trust doesn't just check what; it checks when, where, and how often
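The time-of-day anomaly check can be sketched as a toy baseline detector. The threshold and minimum-observation count are arbitrary illustrative values; a real detector would also track targets and command frequency, as the bullets above note.

```python
from collections import Counter

class HourAnomalyDetector:
    """Flag whitelisted commands that run far outside their usual
    hours (toy baseline; thresholds are illustrative)."""

    def __init__(self, min_observations: int = 20) -> None:
        self.hours: dict[str, Counter] = {}
        self.min_observations = min_observations

    def observe(self, command: str, hour: int) -> None:
        self.hours.setdefault(command, Counter())[hour] += 1

    def is_anomalous(self, command: str, hour: int) -> bool:
        seen = self.hours.get(command, Counter())
        total = sum(seen.values())
        if total < self.min_observations:
            return False  # no baseline yet; rely on manual review
        # Unusual if this hour accounts for under 5% of past runs.
        return seen[hour] / total < 0.05

det = HourAnomalyDetector(min_observations=10)
for _ in range(50):
    det.observe("kubectl rollout restart", hour=10)  # normally 10am
assert not det.is_anomalous("kubectl rollout restart", 10)
assert det.is_anomalous("kubectl rollout restart", 3)  # 3am gets flagged
```

Note that the detector only becomes useful once a baseline exists, which is why the manual-review phase doubles as baseline collection.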
Over time, the approval workflow teaches itself. The whitelist grows to reflect real operational patterns. Review burden decreases as safe operations are identified and approved. The human reviewer's attention is focused on exactly the operations that warrant it.
The alternative
The alternative to zero-trust for agents is implicit trust: give the agent broad credentials, assume it will behave as intended, review logs after the fact.
The problem is that "review logs after the fact" doesn't help when the damage is already done. A dropped database, a leaked credential, a mis-deployed configuration — none of these benefit from post-hoc log analysis. The event has already happened.
Post-hoc review is useful for understanding how something went wrong. It's not useful for preventing damage. Zero-trust for agents — explicit authorization at the moment of action — is the only model that provides prevention rather than just explanation.
The question isn't whether your AI agent will eventually do something unexpected. It will. The question is whether your infrastructure is designed to catch it before the damage is irreversible.
Getting started
If you're deploying AI agents and haven't thought about zero-trust principles yet, here's a minimal starting point:
- Give each agent a distinct identity. No shared service accounts. Per-agent credentials from day one.
- Start with full manual review. Understand what your agent actually does before whitelisting anything. Two weeks of manual review is worth months of cleanup after an incident.
- Log everything with structure. Command, agent ID, timestamp, outcome, reviewer, latency. You'll need this for the first investigation.
- Build the whitelist from observed behavior. Don't predict what the agent needs; observe what it does and approve it explicitly. The whitelist is your authorization record.
- Set anomaly thresholds early. Unusual hours, unusual targets, unusual frequency. These thresholds are easier to set when you have a baseline; establish them during the manual review phase.
Zero-trust wasn't designed for AI agents, but it fits them better than any alternative. The principles are the same; the stakes are higher; the implementation needs to account for non-deterministic behavior at machine speed.
Trust nothing. Verify at the point of action. Assume the unexpected will happen and design to contain it. That's zero-trust. That's the right model for agents.
Apply zero-trust principles to your AI agents
Expacti is purpose-built for zero-trust agent deployments: per-action authorization, whitelist-based approval memory, anomaly detection, and complete audit trails. Start reviewing every command from day one.
Get started free →